Computer Organisation
UNIT II: Basic Computer Organization and Design Instruction Codes - Computer
Registers – Computer Instruction – Timing & Control Instruction Cycles – Memory
Reference Instruction.
UNIT III: Computer Arithmetic Introduction – Addition & Subtraction – Multiplication &
Division Algorithm – Floating Point Arithmetic Operations.
UNIT IV: I/O Organization – Peripheral Devices – I/O Interface – Mode of Transfers
– DMA.
TEXT BOOK:
1. Computer System Architecture, Morris Mano, Pearson Publication, Third
Edition.
REFERENCE BOOK:
1. Computer Organization, Prabhakar Gupta, Vineet Agarwal, Manish Varshey
– Word Press.
2. Computer Architecture & Organization – John L. Hennessy & David A. Patterson.
UNIT I
Basic Structure of Computers
Computer Types
A computer is a fast electronic calculating machine which accepts digital input,
processes it according to internally stored instructions (programs) and produces
the result on an output device. Computers can be broadly classified as:
Micro Computer
Laptop Computer
Work Station
Super Computer
Main Frame
Hand Held
Multi core
An instruction consists of an OPCODE and one or more OPERANDs. As an example, the
steps to add the operand at memory location LOCA to register R0 are:
Step 1: Fetch the instruction from main memory into the processor.
Step 2: Fetch the operand at location LOCA from main memory into processor register R1.
Step 3: Add the contents of register R1 and the contents of register R0.
Step 4: Store the result (sum) in R0.
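The four steps above can be traced with a small sketch; all addresses and memory contents below are illustrative values, not from the text:

```python
# Minimal sketch of the four fetch/execute steps for an "Add LOCA, R0" instruction.
# The instruction address (100), LOCA (200), and the data values are invented.
memory = {100: "Add LOCA, R0", 200: 23}   # instruction word and operand
LOCA = 200                                 # operand address
R0 = 10                                    # initial content of register R0

# Step 1: fetch the instruction from main memory into the processor
instruction = memory[100]

# Step 2: fetch the operand at location LOCA into processor register R1
R1 = memory[LOCA]

# Steps 3 and 4: add R1 to R0 and store the sum back in R0
R0 = R0 + R1
print(R0)  # 33
```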
Figure 3 below shows how the memory and the processor are connected. As shown
in the diagram, in addition to the ALU and the control circuitry, the processor
contains a number of registers used for several different purposes. The instruction
register holds the instruction that is currently being executed.
The program counter keeps track of the execution of the program. It contains the
memory address of the next instruction to be fetched and executed. There are n
general purpose registers R0 to Rn-1 which can be used by the programmers during
writing programs.
Figure 3: Connections between the processor and the memory
The interaction between the processor and the memory and the direction of flow of
information is as shown in the diagram below:
SOFTWARE
Figure 5: Single bus structure
If a user wants to enter and run an application program, he/she needs a System
Software. System Software is a collection of programs that are executed as needed
to perform functions such as:
Receiving and interpreting user commands
Entering and editing application programs and storing them as files in
secondary storage devices
Running standard application programs such as word processors, spreadsheets,
games, etc.
Operating system - the key system software component, which helps the user
exploit the underlying hardware through programs.
PERFORMANCE
The most important measure of the performance of a computer is how quickly it can
execute programs. The speed with which a computer executes programs is affected
by the design of its hardware. For best performance, it is necessary to design the
compiler, the machine instruction set, and the hardware in a coordinated way.
The total time required to execute a program, the elapsed time, is a measure of the
performance of the entire computer system. It is affected by the speed of the
processor, the disk, and the printer. The time needed to execute an instruction is
called the processor time.
Just as the elapsed time for the execution of a program depends on all units in a
computer system, the processor time depends on the hardware involved in the
execution of individual machine instructions. This hardware comprises the processor
and the memory which are usually connected by the bus.
The pertinent parts of fig. c are repeated in fig. d, which includes the cache memory
as part of the processor unit.
Let us examine the flow of program instructions and data between the memory and
the processor. At the start of execution, all program instructions and the required
data are stored in the main memory. As the execution proceeds, instructions are
fetched one by one over the bus into the processor, and a copy is placed in the
cache. Later, if the same instruction or data item is needed a second time, it is read
directly from the cache.
The processor and relatively small cache memory can be fabricated on a single IC
chip. The internal speed of performing the basic steps of instruction processing on
chip is very high and is considerably faster than the speed at which the instruction
and data can be fetched from the main memory. A program will be executed faster if
the movement of instructions and data between the main memory and the processor
is minimized, which is achieved by using the cache.
For example, suppose a number of instructions are executed repeatedly over a
short period of time as happens in a program loop. If these instructions are available
in the cache, they can be fetched quickly during the period of repeated use. The
same applies to the data that are used repeatedly.
Processor clock:
Processor circuits are controlled by a timing signal called the clock. The clock defines
regular time intervals called clock cycles. To execute a machine instruction, the
processor divides the action to be performed into a sequence of basic steps such that
each step can be completed in one clock cycle. The length P of one clock cycle is an
important parameter that affects processor performance.
Processors used in today's personal computers and workstations have clock rates
that range from a few hundred million to over a billion cycles per second.
T = (N × S) / R
where N is the number of machine instructions executed, S is the average number of
basic steps (clock cycles) needed per instruction, and R is the clock rate in cycles per
second. This is often referred to as the basic performance equation.
We must emphasize that N, S, and R are not independent parameters; changing one
may affect another. Introducing a new feature in the design of a processor will lead
to improved performance only if the overall result is to reduce the value of T.
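The basic performance equation can be evaluated directly; the values of N, S, and R below are invented for illustration:

```python
# Basic performance equation: T = (N * S) / R
# N = number of machine instructions executed
# S = average number of basic steps (clock cycles) per instruction
# R = clock rate in cycles per second
# All values below are illustrative, not measurements of any real machine.
N = 50_000_000      # instructions
S = 4               # cycles per instruction
R = 1_000_000_000   # 1 GHz clock

T = (N * S) / R     # program execution time in seconds
print(T)  # 0.2
```

Halving S (a better instruction set) or doubling R (a faster clock) each halves T, which is why the text stresses designing them together.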
Performance measurements:
It is very important to be able to assess the performance of a computer; computer
designers use performance estimates to evaluate the effectiveness of new features.
The previous argument suggests that the performance of a computer is given by the
execution time T, for the program of interest.
In spite of the performance equation being so simple, the evaluation of T is
highly complex. Moreover, parameters like the clock speed and various
architectural features are not reliable indicators of the expected performance.
Hence computer performance is measured using benchmark programs; to make
comparisons possible, standardized programs must be used.
The performance measure is the time taken by the computer to execute a given
benchmark. Initially some attempts were made to create artificial programs that
could be used as benchmark programs. But synthetic programs do not properly
predict the performance obtained when real application programs are run.
A nonprofit organization called SPEC (Standard Performance Evaluation Corporation)
selects and publishes benchmarks.
The programs selected range from game playing, compiler, and database
applications to numerically intensive programs in astrophysics and quantum
chemistry. In each case, the program under test is compiled, and the running time on
a real computer is measured. The same program is also compiled and run on one
computer selected as a reference.
The SPEC rating is computed as follows:
SPEC rating = (running time on the reference computer) / (running time on the computer under test)
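A sketch of the rating computation follows. The running times are invented; the sketch also assumes SPEC's convention that the overall rating over n benchmark programs is the geometric mean of the individual ratings:

```python
# SPEC rating for one benchmark:
#   rating = time on reference computer / time on computer under test
# Overall rating: geometric mean of the n individual ratings (SPEC convention).
# All running times below are illustrative.
ref_times  = [500.0, 300.0, 800.0]   # seconds on the reference machine
test_times = [100.0, 150.0, 200.0]   # seconds on the machine under test

ratings = [r / t for r, t in zip(ref_times, test_times)]

n = len(ratings)
overall = 1.0
for x in ratings:
    overall *= x
overall = overall ** (1 / n)          # geometric mean

print(ratings)           # [5.0, 2.0, 4.0]
print(round(overall, 2)) # about 3.42
```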
Decimal   Binary   Octal   Hexadecimal
00        0000     00      0
01        0001     01      1
02        0010     02      2
03        0011     03      3
04        0100     04      4
05        0101     05      5
06        0110     06      6
07        0111     07      7
08        1000     10      8
09        1001     11      9
10        1010     12      A
11        1011     13      B
12        1100     14      C
13        1101     15      D
14        1110     16      E
15        1111     17      F
Fixed-Point Representation:
This is the representation for integers only, where the point is always fixed, i.e. at
the rightmost end. It can in turn be represented in two ways.
1. Sign and Magnitude Representation
In this system, the most significant (leftmost) bit in the word is the sign bit. If the sign
bit is 0, the number is positive; if the sign bit is 1, the number is negative.
The simplest form of representing sign bit is the sign magnitude representation.
One of the drawbacks of sign-magnitude numbers is that addition and subtraction
need to consider both the signs of the numbers and their relative magnitudes.
Another drawback is that there are two representations for 0 (zero), i.e. +0 and -0.
2. One’s Complement (1’s) Representation
In this representation negative values are obtained by complementing each bit of the
corresponding positive number.
For example, the 1's complement of 0101 is 1010. The process of forming the 1's
complement of a given number is equivalent to subtracting that number from 2^n - 1,
i.e. from 1111 for a 4-bit number.
Two's Complement (2's) Representation
Forming the 2's complement of a number is done by subtracting that number from
2^n. So the 2's complement of a number is obtained by adding 1 to the 1's
complement of that number.
Ex: the 2's complement of 0101 is 1010 + 1 = 1011
NB: In all systems, the leftmost bit is 0 for a positive number and 1 for a negative
number.
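The two complement rules above can be checked with a short sketch for 4-bit words (the word length N = 4 matches the text's examples):

```python
# 1's and 2's complement of a 4-bit value, following the definitions above.
N = 4  # word length in bits

def ones_complement(x):
    # Equivalent to subtracting x from 2**N - 1 (i.e. from 1111 for 4 bits)
    return (2**N - 1) - x

def twos_complement(x):
    # Equivalent to subtracting x from 2**N: 1's complement plus 1
    return (ones_complement(x) + 1) % 2**N

x = 0b0101                                # +5
print(format(ones_complement(x), "04b"))  # 1010
print(format(twos_complement(x), "04b"))  # 1011
```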
Floating-point representation
Floating-point numbers are so called because the decimal or binary point floats over
the base depending on the exponent value. A floating-point number consists of two
components:
• Exponent
• Mantissa
Example: Avogadro's number can be written as 6.02 × 10^23 in base 10, where the
mantissa and exponent are 6.02 and 23 respectively. But computer floating-point
numbers are usually based on base two. So 6.02 × 10^23 is approximately
(1 + 63/64) × 2^78, or 1.111111 (base two) × 2^1001110 (base two).
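The base-2 mantissa/exponent split can be inspected with the standard library's `math.frexp`, which returns a mantissa in [0.5, 1) and a base-2 exponent; this is Python's normalization convention, not the exact format described above, so the exponent comes out as 79 rather than 78:

```python
import math

# Decompose a value into a base-2 mantissa and exponent: x = m * 2**e
x = 6.02e23
m, e = math.frexp(x)        # m in [0.5, 1), e an integer
print(m, e)                 # mantissa near 0.996, exponent 79

# Reassembling the two components gives back the original value exactly
assert math.ldexp(m, e) == x
```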
Error Detection Codes
• Parity System
• Hamming Distance
• CRC
• Checksum
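Of the schemes listed, parity is the simplest to sketch: a parity bit is appended so that the total number of 1s is even, and any single flipped bit is then detected as a parity violation. This is a generic sketch, not any particular standard's format:

```python
def parity_bit(data_bits):
    # Even parity: choose the bit so the total number of 1s (data + parity) is even
    return sum(data_bits) % 2

def check(word):
    # A word is valid under even parity iff its total count of 1s is even
    return sum(word) % 2 == 0

data = [1, 0, 1, 1]
word = data + [parity_bit(data)]   # transmitted word: data plus parity bit
print(check(word))                 # True: no error

word[2] ^= 1                       # a single-bit error in transit...
print(check(word))                 # False: the error is detected
```

Note that two flipped bits cancel out, which is why the stronger schemes (Hamming codes, CRC) in the list above exist.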
Every statement written in register transfer notation implies the presence of the
required hardware construction.
It is assumed that all transfers occur during a clock edge transition. All
microoperations written on a single line are to be executed at the same time:
T: R2 ← R1, R1 ← R2
Multiplexers select the source register whose binary information is then placed on
the bus. The select lines are connected to the selection inputs of the multiplexers and
choose the bits of one register.
In general, a bus system will multiplex k registers of n bits each to produce an n-line
common bus
This requires n multiplexers – one for each bit
The size of each multiplexer must be k x 1
The number of select lines required is log2 k
To transfer information from the bus to a register, the bus lines are connected to the
inputs of all destination registers and the corresponding load control line must be
activated
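The k-register, n-bit bus above can be modelled behaviourally: each of the n bus lines is one k x 1 multiplexer, all sharing the same select lines. The register contents and the widths k = 4, n = 8 below are invented:

```python
import math

# k registers of n bits each, multiplexed onto one n-bit common bus.
k, n = 4, 8
registers = [0b00001111, 0b10100101, 0b11110000, 0b00111100]  # illustrative contents

select_lines = math.ceil(math.log2(k))   # select lines needed: log2 k
print(select_lines)  # 2

def bus(select):
    # Behaviourally equivalent to n separate k x 1 multiplexers,
    # each picking bit i of the register named by the select lines.
    return registers[select]

print(format(bus(0b01), "08b"))  # 10100101: register 1 placed on the bus
```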
Rather than listing each step as
BUS ← C, R1 ← BUS
use R1 ← C, since the bus is implied
Instead of using multiplexers, three-state gates can be used to
construct the bus system
A three-state gate is a digital circuit that exhibits three states
Two of the states are signals equivalent to logic 1 and 0
The third state is a high-impedance state – this behaves like an open circuit, which
means the output is disconnected and does not have a logic significance
The three-state buffer gate has a normal input and a control input which determines
the output state
With control 1, the output equals the normal input
With control 0, the gate goes to a high-impedance state. This enables a large number
of three-state gate outputs to be connected with wires to form a common bus line
Arithmetic Microoperations
To implement the add microoperation with hardware, we need the registers that hold
the data and the digital component that performs the addition
A full-adder adds two bits and a previous carry
A binary adder is a digital circuit that generates the arithmetic sum of two binary
numbers of any length
A binary adder is constructed with full-adder circuits connected in cascade
An n-bit binary adder requires n full-adders
The subtraction A-B can be carried out by the following steps
o Take the 1's complement of B (invert each bit)
o Get the 2's complement by adding 1
o Add the result to A
The addition and subtraction operations can be combined into one common circuit
by including an XOR gate with each full-adder
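The combined adder-subtractor can be sketched behaviourally: a common sub line is XORed with every bit of B and also fed in as the initial carry, so sub = 1 yields A plus the 2's complement of B. The 4-bit width is an assumption for the example:

```python
N = 4  # assumed word length; all arithmetic is modulo 2**N

def add_sub(a, b, sub):
    # sub = 0: plain addition.
    # sub = 1: every bit of b is XORed with 1 (giving the 1's complement)
    # and the carry-in of 1 supplies the "+1" that completes the 2's complement.
    b_in = b ^ (2**N - 1) if sub else b
    return (a + b_in + sub) % 2**N

print(format(add_sub(0b0111, 0b0011, 0), "04b"))  # 1010  (7 + 3 = 10)
print(format(add_sub(0b0111, 0b0011, 1), "04b"))  # 0100  (7 - 3 = 4)
```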
Logic operations specify binary operations for strings of bits stored in registers and
treat each bit separately
Example: the XOR of R1 and R2 is symbolized by
P: R1 ← R1 ⊕ R2
Example: R1 = 1010 and R2 = 1100
1010 Content of R1
1100 Content of R2
0110 Content of R1 after P = 1
Logic microoperations can be used to change bit values, delete a group of bits, or
insert new bit values into a register
The selective-set operation sets to 1 the bits in A where there are corresponding 1's
in B
1010 A before
1100 B (logic operand)
1110 A after
A ← A ∨ B
The selective-complement operation complements bits in A where there are
corresponding 1's in B
1010 A before
1100 B (logic operand)
0110 A after
A ← A ⊕ B
The selective-clear operation clears to 0 the bits in A only where there are
corresponding 1's in B
1010 A before
1100 B (logic operand)
0010 A after
A ← A ∧ B′
The mask operation is similar to the selective-clear operation, except that the bits of
A are cleared only where there are corresponding 0's in B
1010 A before
1100 B (logic operand)
1000 A after
A ← A ∧ B
The insert operation inserts a new value into a group of bits
This is done by first masking the bits to be replaced and then ORing them with the
bits to be inserted
0110 1010 A before
0000 1111 B (mask)
0000 1010 A after masking
0000 1010 A before
1001 0000 B (insert)
1001 1010 A after insertion
The clear operation compares the bits in A and B and produces an all-0's result if the
two numbers are equal
1010 A
1010 B
0000 A ← A ⊕ B
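The worked examples above map directly onto bitwise operators; this sketch replays them with the same 4-bit values from the text:

```python
# Logic microoperations on the text's example values A = 1010, B = 1100.
A, B = 0b1010, 0b1100
MASK4 = 0b1111  # keep results to 4 bits

selective_set        = A | B                 # 1110: set where B has 1s
selective_complement = A ^ B                 # 0110: flip where B has 1s
selective_clear      = A & (~B & MASK4)      # 0010: clear where B has 1s
mask_op              = A & B                 # 1000: clear where B has 0s
clear_op             = 0b1010 ^ 0b1010       # 0000 when the operands are equal

# Insert: mask the target bits, then OR in the new value (8-bit example)
A2 = 0b0110_1010
A2 = A2 & 0b0000_1111   # after masking:   0000 1010
A2 = A2 | 0b1001_0000   # after insertion: 1001 1010

for v in (selective_set, selective_complement, selective_clear, mask_op, clear_op):
    print(format(v, "04b"))
print(format(A2, "08b"))  # 10011010
```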
Shift Microoperations
The arithmetic shift shifts a signed binary number to the left or right
To the left is multiplying by 2, to the right is dividing by 2
Arithmetic shifts must leave the sign bit unchanged
A sign reversal occurs if the bit in Rn-1 changes in value after the shift
This happens if the multiplication causes an overflow
An overflow flip-flop Vs can be used to detect the overflow: Vs = Rn-1 ⊕ Rn-2
A bi-directional shift unit with parallel load could be used to implement this
Two clock pulses are necessary with this configuration: one to load the value and
another to shift
In a processor unit with many registers it is more efficient to implement the shift
operation with a combinational circuit
The content of the register to be shifted is first placed onto a common bus whose
output is connected to the combinational shifter; the shifted number is then loaded
back into the register
This can be constructed with multiplexers
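The arithmetic left shift with the Vs = Rn-1 ⊕ Rn-2 overflow check can be sketched as follows; the 4-bit register width is an assumption for the example:

```python
N = 4  # assumed register width

def arithmetic_shift_left(r):
    # Shift left = multiply by 2, kept to N bits.
    # Overflow is flagged when the sign bit changes, i.e. Vs = R(n-1) XOR R(n-2).
    sign_bit = (r >> (N - 1)) & 1    # bit R(n-1) before the shift
    next_bit = (r >> (N - 2)) & 1    # bit R(n-2) before the shift
    vs = sign_bit ^ next_bit         # overflow flip-flop Vs
    return (r << 1) % 2**N, vs

print(arithmetic_shift_left(0b0011))  # (6, 0): 3 -> 6, no sign reversal
print(arithmetic_shift_left(0b0101))  # (10, 1): sign bit would change, overflow
```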
Arithmetic Logic Shift Unit
The arithmetic logic unit (ALU) is a common operational unit connected to a number
of storage registers
To perform a microoperation, the contents of specified registers are placed in the
inputs of the ALU
The ALU performs an operation and the result is then transferred to a destination
register
The ALU is a combinational circuit so that the entire register transfer operation from
the source registers through the ALU and into the destination register can be
performed during one clock pulse period
UNIT II
Instruction Formats:
A computer will usually have a variety of instruction code formats. It is the function
of the control unit within the CPU to interpret each instruction code and provide the
necessary control functions needed to process the instruction.
The format of an instruction is usually depicted in a rectangular box symbolizing
the bits of the instruction as they appear in memory words or in a control register. The
bits of the instruction are divided into groups called fields. The most common fields
found in instruction formats are:
1. An operation code field that specifies the operation to be performed.
2. An address field that designates a memory address or a processor register.
3. A mode field that specifies the way the operand or the effective address is determined.
Other special fields are sometimes employed under certain circumstances, as for
example a field that gives the number of shifts in a shift-type instruction.
The operation code field of an instruction is a group of bits that define various
processor operations, such as add, subtract, complement, and shift. The bits that define
the mode field of an instruction code specify a variety of alternatives for choosing the
operands from the given address.
Operations specified by computer instructions are executed on some data stored
in memory or processor registers. Operands residing in processor registers are
specified with a register address. A register address is a binary number of k bits that
defines one of 2^k registers in the CPU. Thus a CPU with 16 processor registers R0
through R15 will have a register address field of four bits. The binary number 0101, for
example, will designate register R5.
Computers may have instructions of several different lengths containing varying
number of addresses. The number of address fields in the instruction format of a
computer depends on the internal organization of its registers. Most computers fall into
one of three types of CPU organizations:
1. Single accumulator organization.
2. General register organization.
3. Stack organization.
All operations are performed with an implied accumulator register. The instruction
format in this type of computer uses one address field. For example, the instruction that
specifies an arithmetic addition is defined by an assembly language instruction as
ADD X
where X is the address of the operand. The ADD instruction in this case results in
the operation AC ← AC + M[X]. AC is the accumulator register and M[X] symbolizes the
memory word located at address X.
An example of a general register type of organization was presented in Fig. 7.1.
The instruction format in this type of computer needs three register address fields. Thus
the instruction for an arithmetic addition may be written in an assembly language as
ADD R1, R2, R3
To denote the operation R1 ← R2 + R3. The number of address fields in the
instruction can be reduced from three to two if the destination register is the same as
one of the source registers. Thus the instruction
ADD R1, R2
Would denote the operation R1 ← R1 + R2. Only register addresses for R1 and
R2 need be specified in this instruction.
Computers with multiple processor registers use the move instruction with a
mnemonic MOV to symbolize a transfer instruction. Thus the instruction
MOV R1, R2
Denotes the transfer R1 ← R2 (or R2 ← R1, depending on the particular
computer). Thus transfer-type instructions need two address fields
to specify the source and the destination.
General register-type computers employ two or three address fields in their
instruction format. Each address field may specify a processor register or a memory
word. An instruction symbolized by
ADD R1, X
would specify the operation R1 ← R1 + M[X]. It has two address fields, one for
register R1 and the other for the memory address X.
The stack-organized CPU was presented in Fig. 8-4. Computers with stack
organization would have PUSH and POP instructions which require an address field.
Thus the instruction
PUSH X
Will push the word at address X to the top of the stack. The stack pointer is
updated automatically. Operation-type instructions do not need an address field in
stack-organized computers. This is because the operation is performed on the two
items that are on top of the stack. The instruction ADD in a stack computer consists of
an operation code only with no address field. This operation has the effect of popping
the two top numbers from the stack, adding the numbers, and pushing the sum into the
stack. There is no need to specify operands with an address field since all operands are
implied to be in the stack.
To illustrate the influence of the number of addresses on computer programs, we
will evaluate the arithmetic statement X = (A + B) ∗ (C + D).
Using zero, one, two, or three address instruction. We will use the symbols ADD,
SUB, MUL, and DIV for the four arithmetic operations; MOV for the transfer-type
operation; and LOAD and STORE for transfers to and from memory and AC register.
We will assume that the operands are in memory addresses A, B, C, and D, and the
result must be stored in memory at address X.
Three-Address Instructions
Computers with three-address instruction formats can use each address field to
specify either a processor register or a memory operand. The program in assembly
language that evaluates X = (A + B) ∗ (C + D) is shown below, together with comments
that explain the register transfer
operation of each instruction.
ADD R1, A, B    R1 ← M[A] + M[B]
ADD R2, C, D    R2 ← M[C] + M[D]
MUL X, R1, R2   M[X] ← R1 ∗ R2
It is assumed that the computer has two processor registers, R1 and R2. The symbol M
[A] denotes the operand at memory address symbolized by A.
The advantage of the three-address format is that it results in short programs when
evaluating arithmetic expressions. The disadvantage is that the binary- coded
instructions require too many bits to specify three addresses. An example of a
commercial computer that uses three-address instructions is the Cyber 170. The
instruction formats in the Cyber computer are restricted to either three register address
fields or two register address fields and one memory address field.
Two-Address Instructions
Two address instructions are the most common in commercial computers. Here again
each address field can specify either a processor register or a memory word. The
program to evaluate X = (A + B) ∗ (C + D) is as follows:
MOV R1, A R1 ← M [A]
ADD R1, B R1 ← R1 + M [B]
MOV R2, C R2 ← M [C]
ADD R2, D R2 ← R2 + M [D]
MUL R1, R2 R1 ← R1∗R2
MOV X, R1 M [X] ← R1
The MOV instruction moves or transfers the operands to and from memory and
processor registers. The first symbol listed in an instruction is assumed to be both a
source and the destination where the result of the operation is transferred.
One-Address Instructions
One-address instructions use an implied accumulator (AC) register for all data
manipulation. For multiplication and division there is a need for a second register.
However, here we will neglect the second register and assume that the AC contains
the result of all operations. The program to evaluate X = (A + B) ∗ (C + D) is
LOAD A     AC ← M[A]
ADD B      AC ← AC + M[B]
STORE T    M[T] ← AC
LOAD C     AC ← M[C]
ADD D      AC ← AC + M[D]
MUL T      AC ← AC ∗ M[T]
STORE X    M[X] ← AC
All operations are done between the AC register and a memory operand. T is the
address of a temporary memory location required for storing the intermediate result.
Zero-Address Instructions
A stack-organized computer does not use an address field for the instructions ADD
and MUL. The PUSH and POP instructions, however, need an address field to specify
the operand that communicates with the stack. The following program shows how X =
(A + B) ∗ (C + D) will be written for a stack-organized computer. (TOS stands for top of
stack.)
PUSH A     TOS ← A
PUSH B     TOS ← B
ADD        TOS ← (A + B)
PUSH C     TOS ← C
PUSH D     TOS ← D
ADD        TOS ← (C + D)
MUL        TOS ← (C + D) ∗ (A + B)
POP X      M[X] ← TOS
To evaluate arithmetic expressions in a stack computer, it is necessary to convert the
expression into reverse Polish notation. The name "zero-address" is given to this type
of computer because of the absence of an address field in the computational
instructions.
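The zero-address program for X = (A + B) ∗ (C + D) can be traced with a tiny stack-machine sketch; the memory contents for A through D are invented:

```python
# Trace of the zero-address (stack) program for X = (A + B) * (C + D).
# Memory contents are illustrative.
M = {"A": 2, "B": 3, "C": 4, "D": 5}
stack = []

def PUSH(addr): stack.append(M[addr])               # TOS <- M[addr]
def ADD():      stack.append(stack.pop() + stack.pop())
def MUL():      stack.append(stack.pop() * stack.pop())
def POP(addr):  M[addr] = stack.pop()               # M[addr] <- TOS

PUSH("A"); PUSH("B"); ADD()   # TOS = A + B
PUSH("C"); PUSH("D"); ADD()   # TOS = C + D
MUL()                         # TOS = (A + B) * (C + D)
POP("X")
print(M["X"])  # 45, i.e. (2 + 3) * (4 + 5)
```

Note the instruction stream is exactly the reverse Polish form A B + C D + ∗, which is why stack computers need expressions converted to RPN first.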
Instruction Codes
An instruction set is a set of instructions that specify the operations, operands, and
the sequence by which processing has to occur. An instruction code is a group of bits
that tells the computer to perform a specific operation.
Format of Instruction
The format of an instruction is depicted in a rectangular box symbolizing the bits of an
instruction. Basic fields of an instruction format are given below:
1. An operation code field that specifies the operation to be performed.
2. An address field that designates the memory address or register.
3. A mode field that specifies the way the operand or effective address is determined.
Computers may have instructions of different lengths containing varying number of
addresses. The number of address field in the instruction format depends upon the
internal organization of its registers.
Addressing Modes
To understand the various addressing modes to be presented in this section, it is
imperative that we understand the basic operation cycle of the computer. The control
unit of a computer is designed to go through an instruction cycle that is divided into
three major phases:
1. Fetch the instruction from memory
2. Decode the instruction.
3. Execute the instruction.
There is one register in the computer called the program counter or PC that keeps
track of the instructions in the program stored in memory. PC holds the address of the
instruction to be executed next and is incremented each time an instruction is fetched
from memory. The decoding done in step 2 determines the operation to be performed,
the addressing mode of the instruction and the location of the operands. The computer
then executes the instruction and returns to step 1 to fetch the next instruction in
sequence.
In some computers the addressing mode of the instruction is specified with a
distinct binary code, just like the operation code is specified. Other computers use a
single binary code that designates both the operation and the mode of the instruction.
Instructions may be defined with a variety of
addressing modes, and sometimes, two or more addressing modes are combined in
one instruction.
The operation code specifies the operation to be performed. The mode field is
used to locate the operands needed for the operation. There may or may not be
an address field in the instruction. If there is an address field, it may designate a
memory address or a processor register. Moreover, as discussed in the
preceding section, the instruction may have more than one address field, and
each address field may be associated with its own particular addressing mode.
Although most addressing modes modify the address field of the instruction, there
are two modes that need no address field at all. These are the implied and immediate
modes.
1. Implied Mode: In this mode the operands are specified
implicitly in the definition of the instruction. For example, the instruction "complement
accumulator" is an implied-mode instruction because the operand in the accumulator
register is implied in the definition of the instruction. In fact, all register reference
instructions that use an accumulator are implied-mode instructions.
Op code Mode Address
Figure 1: Instruction format with mode field
Zero-address instructions in a stack-organized computer are implied- mode
instructions since the operands are implied to be on top of the stack.
2. Immediate Mode: In this mode the operand is specified in the
instruction itself. In other words, an immediate-mode instruction has an operand field
rather than an address field. The operand field contains the actual operand to be used
in conjunction with the operation specified in the instruction. Immediate-mode
instructions are useful for initializing registers to a constant value.
It was mentioned previously that the address field of an instruction may specify
either a memory word or a processor register. When the address field specifies a
processor register, the instruction is said to be in the register mode.
3. Register Mode: In this mode the operands are in registers that
reside within the CPU. The particular register is selected from a register field in the
instruction.
Base Register Addressing Mode: In this mode a base register is assumed to hold a
base address and the address field of the instruction gives a displacement relative to
this base address. The base register addressing mode is used in computers to
facilitate the relocation of programs in memory. When programs and data are moved
from one segment of memory to another, as required in multiprogramming systems,
the address value of the base register requires updating to reflect the beginning of the
new memory segment.
Computer Registers
Computer Instructions:
The basic computer has a 16-bit instruction register (IR) which can denote either a
memory-reference, register-reference, or input-output instruction.
1. Memory Reference – These instructions refer to a memory address as an operand.
The other operand is always the accumulator. The format specifies a 12-bit address, a
3-bit opcode (other than 111), and 1 addressing-mode bit for direct and indirect
addressing.
Example –
The IR register contains 0001XXXXXXXXXXXX, i.e. ADD: after fetching and decoding
the instruction we find that it is a memory-reference instruction for the ADD operation.
Instruction Cycle
The CPU performs a sequence of microoperations for each instruction. The sequence
for each instruction of the Basic Computer can be refined into 4 abstract phases:
1. Fetch instruction
2. Decode
3. Fetch operand
4. Execute
Program execution can be represented as a top-down design:
1. Program execution
   a. Instruction 1
      i. Fetch instruction
      ii. Decode
      iii. Fetch operand
      iv. Execute
   b. Instruction 2
      i. Fetch instruction
      ii. Decode
      iii. Fetch operand
      iv. Execute
   c. Instruction 3 ...
T0: AR ← PC
T1: IR ← M[AR], PC ← PC + 1
T2: D0-7 ← decoded IR(12-14), AR ← IR(0-11), I ← IR(15)
For every timing cycle, we assume SC ← SC + 1 unless it is stated that SC ← 0.
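The T0-T2 sequence above can be mimicked in a few lines. The 16-bit word layout (bit 15 = I, bits 14-12 = opcode, bits 11-0 = address) follows the memory-reference format described earlier; the memory contents are invented:

```python
# Simplified fetch/decode in the spirit of timing steps T0..T2 above.
# 16-bit word: bit 15 = I, bits 14-12 = opcode, bits 11-0 = address.
M = {0: 0b0_001_000000001010}   # ADD (opcode 001), direct, address 10 (illustrative)
PC = 0

AR = PC                          # T0: AR <- PC
IR = M[AR]; PC = PC + 1          # T1: IR <- M[AR], PC <- PC + 1
opcode = (IR >> 12) & 0b111      # T2: D0-7 <- decoded IR(12-14)
AR = IR & 0x0FFF                 #     AR <- IR(0-11)
I  = (IR >> 15) & 1              #     I  <- IR(15)

print(opcode, AR, I)  # 1 10 0: ADD, operand address 10, direct addressing
```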
Micro Programmed Control:
Control Memory
The control unit in a digital computer initiates sequences of microoperations
The complexity of the digital system is derived from the number of sequences that are
performed
When the control signals are generated by hardware, it is hardwired
In a bus-oriented system, the control signals that specify microoperations are groups of
bits that select the paths in multiplexers, decoders, and ALUs.
The control unit initiates a series of sequential steps of microoperations
The control variables can be represented by a string of 1's and 0's called a control word
A microprogrammed control unit is a control unit whose binary control variables are
stored in memory
A sequence of microinstructions constitutes a microprogram
The control memory can be a read-only memory
Dynamic microprogramming permits a microprogram to be loaded and uses a writable
control memory
A computer with a microprogrammed control unit will have two separate memories: a
main memory and a control memory
The microprogram consists of microinstructions that specify various internal control
signals for execution of register microoperations
These microinstructions generate the microoperations to:
o fetch the instruction from main memory
o evaluate the effective address
o execute the operation
o return control to the fetch phase for the next instruction
The control memory address register specifies the address of the microinstruction
The control data register holds the microinstruction read from memory
The microinstruction contains a control word that specifies one or more microoperations
for the data processor
The location of the next microinstruction may or may not be the next in sequence
Some bits of the present microinstruction control the generation of the address of the
next microinstruction
The next address may also be a function of external input conditions
While the microoperations are being executed, the next address is computed in the next
address generator circuit (sequencer) and then transferred into the CAR to read the
next microinstructions
Typical functions of a sequencer are:
o incrementing the CAR by one
o loading into the CAR an address from control memory
o transferring an external address
o loading an initial address to start the control operations
A clock is applied to the CAR, and the control word and next-address information are read from control memory
The main advantage of microprogrammed control is that once the hardware
configuration is established, there should be no need for hardware or wiring changes
To establish a different control sequence, specify a different set of microinstructions for
control memory
Address Sequencing
Microinstructions are stored in control memory in groups, with each group specifying a
routine
Each computer instruction has its own microprogram routine to generate the
microoperations
The hardware that controls the address sequencing of the control memory must be
capable of sequencing the microinstructions within a routine and be able to branch from
one routine to another
Steps the control must undergo during the execution of a single computer instruction:
o Load an initial address into the CAR when power is turned on in the computer. This
address is usually the address of the first microinstruction that activates the instruction
fetch routine – IR holds instruction
o The control memory then goes through the routine to determine the effective address of
the operand – AR holds operand address
o The next step is to generate the microoperations that execute the instruction by
considering the opcode and applying a mapping
o After execution, control must return to the fetch routine by executing an unconditional
branch
The microinstruction in control memory contains a set of bits to initiate microoperations
in computer registers and other bits to specify the method by which the next address is
obtained
Conditional branching is obtained by using part of the microinstruction to select a
specific status bit in order to determine its condition
The status conditions are special bits in the system that provide parameter information
such as the carry-out of an adder, the sign bit of a number, the mode bits of an
instruction, and i/o status conditions
The status bits, together with the field in the microinstruction that specifies a branch
address, control the branch logic
The branch logic tests the condition, if met then branches, otherwise, increments the
CAR
If there are 8 status bit conditions, then 3 bits in the microinstruction are used to specify
the condition and provide the selection variables for the multiplexer
For unconditional branching, fix the value of one status bit to be one and load the
branch address from control memory into the CAR
A special type of branch exists when a microinstruction specifies a branch to the first
word in control memory where a microprogram routine is located
The status bits for this type of branch are the bits in the opcode
Assume an opcode of four bits and a control memory of 128 locations
The mapping process converts the 4-bit opcode to a 7-bit address for control memory
This provides for each computer instruction a microprogram routine with a capacity of
four microinstructions
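This mapping can be sketched as follows, assuming (as above) a 4-bit opcode, a 128-word control memory, and the common scheme of forming the 7-bit address as 0 xxxx 00, so each instruction gets a routine of four microinstructions:

```python
# A sketch of the mapping described above: a 4-bit opcode becomes a
# 7-bit control memory address of the form 0 xxxx 00.

def map_opcode_to_address(opcode):
    """Map a 4-bit opcode to a 7-bit control memory address."""
    assert 0 <= opcode < 16, "opcode must fit in 4 bits"
    return opcode << 2              # 0 xxxx 00: shift in the two 0 bits

for op in (0b0000, 0b0001, 0b1111):
    print(f"opcode {op:04b} -> address {map_opcode_to_address(op):07b}")
```

Successive opcodes map to addresses four locations apart, which is exactly the routine capacity stated above.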
Subroutines are programs that are used by other routines to accomplish a particular
task and can be called from any point within the main body of the microprogram
Frequently many microprograms contain identical sections of code
Microinstructions can be saved by employing subroutines that use common sections of
microcode
Microprograms that use subroutines must have a provision for storing the return
address during a subroutine call and restoring it during a subroutine return
A subroutine register is used as the source and destination for the addresses
UNIT III
The operations or tasks that the CPU must perform are:
• Fetch Instruction: The CPU reads an instruction from memory.
• Interpret Instruction: The instruction is decoded to determine what
action is required.
• Fetch Data: The execution of an instruction may require reading data
from memory or I/O module.
• Process data: The execution of an instruction may require performing
some arithmetic or logical operation on data.
• Write data: The result of an execution may require writing data to
memory or an I/O module.
To do these tasks, it should be clear that the CPU needs to store some data
temporarily. It must remember the location of the last instruction so that it can know
where to get the next instruction. It needs to store instructions and data temporarily
while an instruction is being executed. In other words, the CPU needs a small internal
memory. These storage locations are generally referred to as registers.
The major components of the CPU are an arithmetic and logic unit (ALU) and a control
unit (CU). The ALU does the actual computation or processing of data. The CU controls
the movement of data and instruction into and out of the CPU and controls the operation
of the ALU.
The CPU is connected to the rest of the system through the system bus. Through the
system bus, data and information are transferred between the CPU and the other
components of the system. The system bus may have three components:
Data Bus: The data bus is used to transfer data between main memory and the CPU.
Address Bus: The address bus is used to access a particular memory location by putting
the address of that memory location on it.
Control Bus: The control bus is used to carry the different control signals generated by
the CPU to the different parts of the system.
For example, memory read is a signal generated by the CPU to indicate that a memory
read operation is to be performed. Through the control bus this signal is transferred to
the memory module to indicate the required operation.
Figure 1: CPU with the system bus.
There are three basic components of CPU: register bank, ALU and Control Unit. There
are several data movements between these units and for that an internal CPU bus is
used. Internal CPU bus is needed to transfer data between the various registers and the
ALU.
Figure 2 : Internal Structure of CPU
Stack Organization:
A useful feature that is included in the CPU of most computers is a stack or last in, first
out (LIFO) list. A stack is a storage device that stores information in such a manner that
the item stored last is the first item retrieved. The operation of a stack can be compared
to a stack of trays. The last tray placed on top of the stack is the first to be taken off.
The stack in digital computers is essentially a memory unit with an address register that
can only count (after an initial value is loaded into it). The register that holds the address
for the stack is called a stack pointer (SP) because its value always points at the top item
in the stack. Contrary to a stack of trays, where the tray itself may be taken out or inserted,
the physical registers of a stack are always available for reading or writing.
The two operations of a stack are the insertion and deletion of items. The operation of
insertion is called PUSH because it can be thought of as the result of pushing a new
item on top. The operation of deletion is called POP because it can be thought of as the
result of removing one item so that the stack pops up. However, nothing is pushed or
popped in a computer stack. These operations are simulated by incrementing or
decrementing the stack pointer register.
Register stack:
In a 64-word stack, the stack pointer contains 6 bits because 2^6 = 64. Since SP has
only six bits, it cannot hold a number greater than 63 (111111 in binary). When 63 is
incremented by 1, the result is 0, since 111111 + 1 = 1000000 in binary, but SP can
accommodate only the six least significant bits. Similarly, when 000000 is decremented
by 1, the result is 111111. The one-bit register FULL is set to 1 when the stack is full, and
the one-bit register EMTY is set to 1 when the stack is empty of items. DR is the data
register that holds the binary data to be written into or read out of the stack.
The stack pointer is incremented so that it points to the address of the next-higher word.
A memory write operation inserts the word from DR into the top of the stack. Note that
SP holds the address of the top of the stack and that M[SP] denotes the memory word
specified by the address presently available in SP. The first item stored in the stack is at
address 1. The last item is stored at address 0; if SP reaches 0, the stack is full of items,
so FULL is set to 1. This condition is reached if the top item prior to the last push was
in location 63 and, after incrementing SP, the last item is stored in location 0. Once an
item is stored in location 0, there are no more empty registers in the stack. If an item is
written into the stack, the stack obviously cannot be empty, so EMTY is cleared to 0.
The top item is read from the stack into DR. The stack pointer is then decremented. If its
value reaches zero, the stack is empty, so EMTY is set to 1. This condition is reached if
the item read was in location 1. Once this item is read out, SP is decremented and
reaches the value 0, which is the initial value of SP. Note that if a pop operation reads
the item from location 0 and then SP is decremented, SP changes to 111111, which is
equal to decimal 63. In this configuration, the word in address 0 receives the last item in
the stack. Note also that an erroneous operation will result if the stack is pushed when
FULL = 1 or popped when EMTY = 1.
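The push and pop behaviour described above can be sketched as a small simulation (an illustration of the register transfers, not the hardware itself):

```python
# A minimal simulation of the 64-word register stack described above:
# SP wraps modulo 64, and the one-bit FULL and EMTY flags follow the
# rules in the text.

class RegisterStack:
    SIZE = 64                                 # SP has 6 bits: 2^6 = 64

    def __init__(self):
        self.mem = [0] * self.SIZE
        self.sp = 0                           # initial value of SP
        self.full = 0
        self.emty = 1

    def push(self, word):                     # word plays the role of DR
        if self.full:
            raise OverflowError("push with FULL = 1 is erroneous")
        self.sp = (self.sp + 1) % self.SIZE   # SP <- SP + 1
        self.mem[self.sp] = word              # M[SP] <- DR
        if self.sp == 0:                      # SP wrapped: stack is full
            self.full = 1
        self.emty = 0

    def pop(self):
        if self.emty:
            raise IndexError("pop with EMTY = 1 is erroneous")
        word = self.mem[self.sp]              # DR <- M[SP]
        self.sp = (self.sp - 1) % self.SIZE   # SP <- SP - 1
        if self.sp == 0:                      # back at the initial value
            self.emty = 1
        self.full = 0
        return word

s = RegisterStack()
s.push(10); s.push(20)
print(s.pop())                                # 20: last in, first out
```

The first item pushed lands at address 1 and FULL is set only when SP wraps back to 0, exactly as described in the text.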
Memory Stack :
As shown in Figure 4, the initial value of SP is 4001 and the stack grows with decreasing
addresses. Thus the first item stored in the stack is at address 4000, the second item is
stored at address 3999, and the last address that can be used for the stack is 3000. No
provisions are available for stack limit checks. We assume that the items in the stack
communicate with a data register DR. A new item is inserted with the push operation as
follows: SP ← SP - 1, then M[SP] ← DR.
INSTRUCTION FORMATS:
We know that a machine instruction has an opcode and zero or more operands.
Encoding an instruction set can be done in a variety of ways. Architectures are
differentiated from one another by the number of bits allowed per instruction (16, 32,
and 64 are the most common), by the number of operands allowed per instruction, and
by the types of instructions and data each can process. More specifically, instruction
sets are differentiated by the following features:
1. Operand storage in the CPU (data can be stored in a stack structure or
in registers)
2. Number of explicit operands per instruction (zero, one, two, and three
being the most common)
3. Operand location (instructions can be classified as register-to-register,
register-to-memory, or memory-to-memory, which simply refer to the combinations of
operands allowed per instruction)
4. Operations (including not only types of operations but also which
instructions can access memory and which cannot)
5. Type and size of operands (operands can be addresses, numbers, or even characters)
Number of Addresses:
One of the characteristics of the ISA (Instruction Set Architecture) that shapes the
architecture is the number of addresses used in an instruction. Most operations can be
divided into binary or unary operations. Binary operations such as addition and
multiplication require two input operands whereas the unary operations such as the
logical NOT need only a single operand. Most operations produce a single result. There
are exceptions, however. For example, the division operation produces two outputs: a
quotient and a remainder. Since most operations are binary, we need a total of three
addresses: two addresses to specify the two input operands and one to specify where
the result should go.
Three-Address Machines:
In three-address machines, instructions carry all three addresses explicitly. The RISC
processors use three addresses. Table X1 gives some sample instructions of a three-
address machine.
Instruction           Semantics
add dest,src1,src2    Adds the values at src1 and src2 and stores the result in dest:
                      M(dest) = [src1] + [src2]
sub dest,src1,src2    Subtracts the value at src2 from the value at src1 and stores the
                      result in dest: M(dest) = [src1] - [src2]
mult dest,src1,src2   Multiplies the values at src1 and src2 and stores the result in
                      dest: M(dest) = [src1] * [src2]
We use the notation that each variable represents a memory address that stores the
value associated with that variable. This translation from symbol name to the memory
address is done by using a symbol table.
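The three-address code that the next paragraph comments on does not survive in the text. A plausible reconstruction, assuming the same running example A = B + C*D - E + F + A used later in the two-address section, is:

```
mult T,C,D   ; T = C*D
add  T,T,B   ; T = B + C*D
sub  T,T,E   ; T = B + C*D - E
add  T,T,F   ; T = B + C*D - E + F
add  A,A,T   ; A = B + C*D - E + F + A
```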
As you can see from this code, there is one instruction for each arithmetic operation.
Also notice that all instructions, barring the first one, use an address twice. In the middle
three instructions, it is the temporary T and in the last one, it is A. This is the motivation
for using two addresses, as we show next.
Two-Address Machines :
In two-address machines, one address doubles as a source and destination. Usually,
we use dest to indicate that the address is used for destination. But you should note that
this address also supplies one of the source operands. The Pentium is an example of a
processor that uses two addresses. Sample instructions of a two-address machine are
shown in Table T2.
On these machines, the C statement
A = B + C*D - E + F + A
is converted to the following code:
load T,C ; T = C
mult T,D ; T = C*D
add T,B ; T = B + C*D
sub T,E ; T = B + C*D - E
add T,F ; T = B + C*D - E + F
add A,T ; A = B + C*D - E + F + A
Table T2: Sample two-address machine instructions
Instruction     Semantics
load dest,src   Copies the value at src to dest: M(dest) = [src]
add dest,src    Adds the values at src and dest and stores the result in dest:
                M(dest) = [dest] + [src]
sub dest,src    Subtracts the value at src from the value at dest and stores the
                result in dest: M(dest) = [dest] - [src]
mult dest,src   Multiplies the values at src and dest and stores the result in dest:
                M(dest) = [dest] * [src]
Since we use only two addresses, we use a load instruction to first copy the C value into
a temporary represented by T. If you look at these six instructions, you will notice that
the operand T is common. If we make this our default, then we don't even need two
addresses: we can get away with just one address.
One-Address Machines :
In the early machines, when memory was expensive and slow, a special set of
registers was used to provide an input operand as well as to receive the result from the
ALU. Because of this, these registers are called the accumulators. In most machines,
there is just a single accumulator register. This kind of design, called accumulator
machines, makes sense if memory is expensive.
Immediate Addressing:
The simplest form of addressing is immediate addressing, in which the operand is
actually present in the instruction:
OPERAND = A
This mode can be used to define and use constants or set initial values of
variables. The advantage of immediate addressing is that no memory reference other
than the instruction fetch is required to obtain the operand. The disadvantage is that the
size of the number is restricted to the size of the address field, which, in most instruction
sets, is small compared with the word length.
Figure 4.1: Immediate Addressing Mode
The instruction format for Immediate Addressing Mode is shown in the Figure 4.1.
Direct Addressing:
A very simple form of addressing is direct addressing, in which the address field
contains the effective address of the operand:
EA = A
Indirect Addressing:
With direct addressing, the length of the address field is usually less than the
word length, thus limiting the address range. One solution is to have the address field
refer to the address of a word in memory, which in turn contains a full-length address of
the operand. This is known as indirect addressing:
EA = (A)
Displacement Addressing:
A very powerful mode of addressing combines the capabilities of direct
addressing and register indirect addressing, which is broadly categorized as
displacement addressing:
EA = A + (R)
Displacement addressing requires that the instruction have two address fields, at
least one of which is explicit. The value contained in one address field (value = A) is
used directly. The other address field, or an implicit reference based on opcode, refers
to a register whose contents are added to A to produce the effective address. The
general format of Displacement Addressing is shown in the Figure 4.6.
Three of the most common uses of displacement addressing are:
• Relative addressing
• Base-register addressing
• Indexing
Figure 4.6: Displacement Addressing
Relative Addressing:
For relative addressing, the implicitly referenced register is the program counter
(PC). That is, the current instruction address is added to the address field to produce
the EA. Thus, the effective address is a displacement relative to the address of the
instruction.
Base-Register Addressing:
The reference register contains a memory address, and the address field
contains a displacement from that address. The register reference may be explicit or
implicit. In some implementations, a single segment/base register is employed and is
used implicitly. In others, the programmer may choose a register to hold the base
address of a segment, and the instruction must reference it explicitly.
Indexing:
The address field references a main memory address, and the reference register
contains a positive displacement from that address. In this case also the register
reference is sometimes explicit and sometimes implicit.
Index registers are generally used for iterative tasks, so there is typically a need
to increment or decrement the index register after each reference to it. Because
this is such a common operation, some systems will automatically do this as part of the
same instruction cycle.
This is known as auto-indexing. There are two types of auto-indexing: auto-incrementing
and auto-decrementing.
If certain registers are devoted exclusively to indexing, then auto-indexing can be
invoked implicitly and automatically. If general-purpose registers are used, the auto-index
operation may need to be signaled by a bit in the instruction.
Auto-indexing using increment can be depicted as follows:
EA = A + (R)
R = (R) + 1
Auto-indexing using decrement can be depicted as follows:
EA = A + (R)
R = (R) - 1
In some machines, both indirect addressing and indexing are provided, and it is
possible to employ both in the same instruction. There are two possibilities: The
indexing is performed either before or after the indirection.
If indexing is performed after the indirection, it is termed post indexing
EA = (A) + (R)
First, the contents of the address field are used to access a memory location
containing an address. This address is then indexed by the register value.
With pre-indexing, the indexing is performed before the indirection: EA = (A + (R))
An address is first calculated; the calculated address contains not the operand, but the
address of the operand.
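The effective-address formulas above can be illustrated with a toy memory and register file. All names, addresses, and values below are invented for the example:

```python
# A toy illustration of the EA formulas above; the memory contents,
# register name R1, and address values are made up for this example.

memory = {100: 500, 120: 700, 500: 42}   # address -> contents, M(x)
regs = {"R1": 20}                        # register file, (R)

A = 100                                  # address field of the instruction

ea_direct = A                            # direct:        EA = A
ea_indirect = memory[A]                  # indirect:      EA = (A)
ea_displacement = A + regs["R1"]         # displacement:  EA = A + (R)
ea_postindex = memory[A] + regs["R1"]    # post-indexing: EA = (A) + (R)
ea_preindex = memory[A + regs["R1"]]     # pre-indexing:  EA = (A + (R))

# auto-increment: compute EA = A + (R), then R <- (R) + 1
ea_auto = A + regs["R1"]
regs["R1"] += 1

print(ea_direct, ea_indirect, ea_displacement, ea_postindex, ea_preindex)
# 100 500 120 520 700
```

Note how post-indexing dereferences A first and then indexes, while pre-indexing indexes first and then dereferences, giving different effective addresses from the same A and R.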
Stack Addressing:
A stack is a linear array or list of locations. It is sometimes referred to as a
pushdown list or last-in-first-out queue. A stack is a reserved block of locations. Items
are appended to the top of the stack so that, at any given time, the block is partially
filled. Associated with the stack is a pointer whose value is the address of the top of the
stack. The stack pointer is maintained in a register. Thus, references to stack locations
in memory are in fact register indirect addresses.
The stack mode of addressing is a form of implied addressing. The machine
instructions need not include a memory reference but implicitly operate on the top of the
stack.
COMPUTER ARITHMETIC
Introduction:
Data is manipulated by arithmetic instructions in digital computers to produce the results
necessary to solve computational problems. Addition, subtraction, multiplication and
division are the four basic arithmetic operations; if we want, we can derive other
operations from these four.
To execute arithmetic operations there is a separate section, called the arithmetic
processing unit, in the central processing unit. The arithmetic instructions are performed
generally on binary or decimal data. Fixed-point numbers are used to represent integers
or fractions, and may be signed or unsigned. Fixed-point addition is the simplest
arithmetic operation.
If we want to solve a problem, we use a sequence of well-defined steps, collectively
called an algorithm. Arithmetic instructions thus play a great part in processing data in a
digital computer and, as stated above, with the four basic operations of addition,
subtraction, multiplication and division it is possible to derive other arithmetic operations
and solve scientific problems by means of numerical analysis methods.
A processor has an arithmetic processor (as a sub-part of it) that executes arithmetic
operations. The data type is assumed to reside in processor registers during the
execution of an arithmetic instruction. Negative numbers may be in a signed-magnitude
or signed-complement representation. There are three ways of representing negative
fixed-point binary numbers: signed magnitude, signed 1's complement, or signed 2's
complement. Most computers use the signed-magnitude representation for the mantissa.
Addition and Subtraction :
Addition and Subtraction with Signed –Magnitude Data
We designate the magnitudes of the two numbers by A and B. When the signed
numbers are added or subtracted, we find that there are eight different conditions to
consider, depending on the signs of the numbers and the operation performed. These
conditions are listed in the first column of Table 4.1. The other columns in the table
show the actual operation to be performed with the magnitudes of the numbers. The last
column is needed to prevent a negative zero. In other words, when two equal numbers
are subtracted, the result should be +0, not -0. The algorithms for addition and
subtraction are derived from the table and can be stated as follows (the words in
parentheses should be used for the subtraction algorithm).
Addition and Subtraction of Signed-Magnitude Numbers
SIGNED 2’S COMPLEMENT ADDITION AND SUBTRACTION
Hardware: the AC and B registers, a complementer and parallel adder, and an
overflow flip-flop V.
Algorithm:
Add: augend in AC, addend in B; AC ← AC + B, V ← overflow; END.
Subtract: minuend in AC, subtrahend in B; AC ← AC + B' + 1, V ← overflow; END.
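These rules can be sketched for an N-bit register as follows. The carry-based overflow test (XOR of the carries into and out of the sign position) is a standard equivalent of the V flip-flop's behaviour, not a detail taken from the text:

```python
# A minimal sketch of signed-2's-complement add/subtract for an N-bit
# AC and B register: addition is AC <- AC + B, subtraction is
# AC <- AC + B' + 1, and V is the XOR of the carries into and out of
# the sign position.

N = 8
MASK = (1 << N) - 1

def _add_with_carry(ac, b, cin):
    total = ac + b + cin
    result = total & MASK
    c_out = total >> N                          # carry out of the adder
    # carry into the sign (most significant) position:
    c_msb = ((ac & (MASK >> 1)) + (b & (MASK >> 1)) + cin) >> (N - 1)
    return result, c_out ^ c_msb                # V = c_out XOR c_msb

def twos_add(ac, b):
    """AC <- AC + B; returns (AC, V)."""
    return _add_with_carry(ac, b, 0)

def twos_sub(ac, b):
    """AC <- AC + B' + 1; returns (AC, V)."""
    return _add_with_carry(ac, (~b) & MASK, 1)

print(twos_add(0b01110000, 0b01110000))         # (0b11100000, 1): overflow
```

Adding 112 + 112 in eight bits produces a bit pattern with the sign set and V = 1, showing how the overflow flip-flop flags a result that no longer fits.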
Algorithm:
The flowchart is shown in Figure 7.1. The two signs As and Bs are compared by an
exclusive-OR gate.
If the output of the gate is 0, the signs are identical; if it is 1, the signs are different.
For an add operation, identical signs dictate that the magnitudes be added. For a
subtract operation, different signs dictate that the magnitudes be added.
The magnitudes are added with a microoperation EA ← A + B, where EA is a register
that combines E and A. The carry in E after the addition constitutes an overflow if it is
equal to 1. The value of E is transferred into the add-overflow flip-flop AVF.
The two magnitudes are subtracted if the signs are different for an add operation or
identical for a subtract operation. The magnitudes are subtracted by adding A to the
2's complement of B. No overflow can occur if the numbers are subtracted, so AVF is
cleared to 0.
A 1 in E indicates that A >= B and that the number in A is the correct result. If this
number is zero, the sign As must be made positive to avoid a negative zero.
A 0 in E indicates that A < B. For this case it is necessary to take the 2's complement
of the value in A. The operation can be done with one microoperation A ← A' + 1.
However, we assume that the A register has circuits for the complement and
increment microoperations, so the 2's complement is obtained from these two
microoperations.
In the other paths of the flowchart, the sign of the result is the same as the sign of A,
so no change in As is required. However, when A < B, the sign of the result is the
complement of the original sign of A. It is then necessary to complement As to obtain
the correct sign.
The final result is found in register A and its sign in As. The value in AVF provides an
overflow indication. The final value of E is immaterial.
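The flowchart just described can be sketched as a simulation. This is an illustration under the stated sign-plus-magnitude representation (sign bit 0 for +, 1 for -), not the book's circuit:

```python
# A sketch of the signed-magnitude algorithm above: magnitudes are
# subtracted as A + B' + 1, with the end carry playing the role of E.

N = 8
MASK = (1 << N) - 1

def addsub(a_s, a, b_s, b, op):
    """op is 'add' or 'sub'; returns the result as (sign, magnitude)."""
    if op == "sub":
        b_s ^= 1                       # subtracting B is adding -B
    if a_s == b_s:                     # identical signs: add magnitudes
        ea = a + b
        a = ea & MASK                  # the carry (ea >> N) would set AVF
    else:                              # different signs: A + B' + 1
        ea = a + ((~b) & MASK) + 1
        e, a = ea >> N, ea & MASK
        if e:                          # E = 1: A >= B, result is correct
            if a == 0:
                a_s = 0                # avoid a negative zero
        else:                          # E = 0: A < B, take 2's complement
            a = ((~a) + 1) & MASK
            a_s ^= 1                   # sign is complemented
    return a_s, a

print(addsub(0, 25, 1, 40, "add"))     # (1, 15): 25 + (-40) = -15
```

The E = 0 branch shows both corrections from the text: the magnitude in A is 2's complemented and the sign As is complemented.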
Figure 7.2 shows a block diagram of the hardware for implementing the addition and
subtraction operations.
It consists of registers A and B and sign flip-flops As and Bs. Subtraction is done by
adding A to the 2's complement of B.
The output carry is transferred to flip-flop E, where it can be checked to determine
the relative magnitudes of two numbers. The add-overflow flip-flop AVF holds the
overflow bit when A and B are added. The A register provides other microoperations
that may be needed when we specify the sequence of steps in the algorithm.
Multiplication Algorithm:
In the beginning, the multiplicand is in B and the multiplier in Q. Their corresponding
signs are in Bs and Qs respectively. The signs in Bs and Qs are compared, and both
As and Qs are set to the corresponding sign of the product, since a double-length
product will be stored in registers A and Q. Registers A and E are cleared and the
sequence counter SC is set to the number of bits of the multiplier. Since an operand
must be stored with its sign, one bit of the word will be occupied by the sign and the
magnitude will consist of n-1 bits.
Now, the low-order bit of the multiplier in Qn is tested. If it is 1, the multiplicand (B) is
added to the present partial product (A); if it is 0, nothing is added. Register EAQ is then
shifted once to the right to form the new partial product. The sequence counter is
decremented by 1 and its new value checked. If it is not equal to zero, the process is
repeated and a new partial product is formed. When SC = 0, the process stops.
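The loop just described can be sketched for unsigned n-bit magnitudes as:

```python
# A sketch of the shift-and-add loop above: B holds the multiplicand,
# Q the multiplier, and A (with the carry bit E) accumulates the
# partial product; EAQ is shifted right once per step.

def multiply(b, q, n):
    """Return the double-length product of two n-bit magnitudes."""
    mask = (1 << n) - 1
    a = e = 0
    sc = n                              # sequence counter SC
    while sc != 0:
        if q & 1:                       # test Qn (low-order multiplier bit)
            ea = a + b                  # EA <- A + B
            e, a = ea >> n, ea & mask
        # shift EAQ one position to the right
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (e << (n - 1))
        e = 0
        sc -= 1                         # SC <- SC - 1
    return (a << n) | q                 # product ends up in A and Q

print(multiply(13, 11, 4))              # 143
```

After n iterations the double-length product sits in the combined A and Q registers, which is why the result is assembled as (a << n) | q.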
Booth’s algorithm :
Booth's algorithm gives a procedure for multiplying binary integers in signed 2's
complement representation.
It operates on the fact that strings of 0's in the multiplier require no addition but just
shifting, and a string of 1's in the multiplier from bit weight 2^k down to weight 2^m can be
treated as 2^(k+1) - 2^m.
For example, the binary number 001110 (+14) has a string of 1's from 2^3 to 2^1 (k = 3,
m = 1), so it can be treated as 2^4 - 2^1 = 16 - 2 = 14.
Thus the product M x 14 can be obtained by shifting the binary multiplicand M four times to
the left and subtracting M shifted left once.
As in all multiplication schemes, Booth's algorithm requires examination of the
multiplier bits and shifting of the partial product. Prior to the shifting, the multiplicand may
be added to the partial product, subtracted from the partial product, or left unchanged
according to the following rules:
1. The multiplicand is subtracted from the partial product upon encountering the first
least significant 1 in a string of 1's in the multiplier.
2. The multiplicand is added to the partial product upon encountering the first 0 in a
string of 0's in the multiplier.
3. The partial product does not change when multiplier bit is identical to the previous
multiplier bit.
This is because a negative multiplier ends with a string of 1's and the last operation
will be a subtraction of the appropriate weight.
The two bits of the multiplier in Qn and Qn+1 are inspected.
If the two bits are equal to 10, it means that the first 1 in a string of 1 's has been
encountered. This requires a subtraction of the multiplicand from the partial product
in AC.
If the two bits are equal to 01, it means that the first 0 in a string of 0's has been
encountered. This requires the addition of the multiplicand to the partial product in
AC.
When the two bits are equal, the partial product does not change.
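The procedure can be sketched as follows, assuming n-bit 2's-complement operands held as bit patterns, with AC, Q, and Qn+1 shifted arithmetically right each step:

```python
# A sketch of Booth's procedure above: the pair (Qn, Qn+1) selects
# subtract (1,0), add (0,1), or no change (0,0)/(1,1), followed by an
# arithmetic right shift of AC,Q,Qn+1.

def booth(multiplicand, multiplier, n):
    """Return the signed product of two n-bit 2's-complement integers."""
    mask = (1 << n) - 1
    m = multiplicand & mask
    ac, q, q_extra = 0, multiplier & mask, 0    # Qn+1 starts at 0
    for _ in range(n):
        pair = (q & 1, q_extra)                 # (Qn, Qn+1)
        if pair == (1, 0):                      # first 1 in a string of 1s
            ac = (ac - m) & mask                # AC <- AC - M
        elif pair == (0, 1):                    # first 0 in a string of 0s
            ac = (ac + m) & mask                # AC <- AC + M
        # arithmetic shift right of AC, Q, Qn+1
        q_extra = q & 1
        q = (q >> 1) | ((ac & 1) << (n - 1))
        ac = (ac >> 1) | (ac & (1 << (n - 1)))  # sign bit is replicated
    product = (ac << n) | q
    if product >> (2 * n - 1):                  # interpret the 2n-bit sign
        product -= 1 << (2 * n)
    return product

print(booth(7, -3, 5))                          # -21
```

Because the shift is arithmetic (the sign bit is replicated), negative multipliers and multiplicands are handled without any special cases.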
Division Algorithms
Division of two fixed-point binary numbers in signed magnitude representation is
performed with paper and pencil by a process of successive compare, shift and
subtract operations. Binary division is much simpler than decimal division because
here the quotient digits are either 0 or 1 and there is no need to estimate how many
times the dividend or partial remainder fits into the divisor. The division process is
described in Figure
The divisor is compared with the five most significant bits of the dividend. Since the
5-bit number is smaller than B, we again repeat the same process. Now the 6-bit
number is greater than B, so we place a 1 for the quotient bit in the sixth position
above the dividend. Now we shift the divisor once to the
right and subtract it from the dividend. The difference is known as a partial remainder
because the division could have stopped here to obtain a quotient of 1 and a
remainder equal to the partial remainder. Comparing a partial remainder with the
divisor continues the process. If the partial remainder is greater than or equal to the
divisor, the quotient bit is equal to
1. The divisor is then shifted right and subtracted from the partial remainder. If the
partial remainder is smaller than the divisor, the quotient bit is 0 and no subtraction is
needed. The divisor is shifted once to the right in any case. Obviously the result
gives both a quotient and a remainder.
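The compare/shift/subtract process can be sketched as restoring division of unsigned integers (a simplification of the register-level procedure):

```python
# The process above as restoring division: the quotient bit is 1
# whenever the partial remainder is greater than or equal to the
# divisor, in which case the divisor is subtracted.

def divide(dividend, divisor, n):
    """n-bit unsigned division; returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divide by zero")
    quotient, remainder = 0, 0
    for i in range(n - 1, -1, -1):
        # bring down the next dividend bit into the partial remainder
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:    # compare with the divisor
            remainder -= divisor    # subtract: quotient bit is 1
            quotient |= 1 << i
        # otherwise the quotient bit is 0 and no subtraction is needed
    return quotient, remainder

print(divide(35, 5, 6))             # (7, 0)
```

As the text notes, each quotient digit is just 0 or 1, so the only decision per step is the compare, with no estimation of how many times the divisor fits.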
m x r^e
The mantissa may be a fraction or an integer. The position of the radix point and the
value of the radix r are not included in the registers. For example, assume a fraction
representation and a radix of 10. The decimal number 537.25 is represented in a register
with m = 53725 and e = 3 and is interpreted to represent the floating-point number
.53725 x 10^3
A floating-point number is said to be normalized if the most significant digit of the
mantissa is nonzero, so that the mantissa contains the maximum possible number of
significant digits. We cannot normalize a zero because it does not have a nonzero
digit. It is represented in floating-point by all 0's in the mantissa and exponent.
Floating-point representation increases the range of numbers for a given register.
Consider a computer with 48-bit words. Since one bit must be reserved for the sign,
the range of fixed-point integer numbers will be ±(2^47 - 1), which is approximately
±10^14. The 48 bits can be used to represent a floating-point number with 36 bits for
the mantissa and 12 bits for the exponent. Assuming fraction representation for the
mantissa and taking the two sign bits into consideration, the range of numbers that
can be represented is
±(1 - 2^-35) x 2^2047
This number is derived from a fraction that contains 35 1's and an exponent of 11 bits
(excluding its sign), since 2^11 - 1 = 2047. For example, to add the numbers
.5372400 x 10^2
+ .1580000 x 10^-1
the mantissas must first be aligned so that the two exponents are equal.
Floating-point multiplication and division need not do an alignment of the mantissas.
Multiplying the two mantissas and adding the exponents can form the product.
Dividing the mantissas and subtracting the exponents perform division.
The operations done with the mantissas are the same as in fixed-point numbers, so
the two can share the same registers and circuits. The operations performed with the
exponents are compare and increment (for aligning the mantissas), add and subtract
(for multiplication and division), and decrement (to normalize the result). We can
represent the exponent in any one of three representations: signed magnitude,
signed 2's complement, or signed 1's complement.
Biased exponents have the advantage that they contain only positive numbers. Now
it becomes simpler to compare their relative magnitude without bothering about their
signs. Another advantage is that the smallest possible biased exponent contains all
zeros. The floating-point representation of zero is then a zero mantissa and the
smallest possible exponent.
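Multiplication by the multiply-mantissas, add-exponents, normalize procedure described above can be illustrated with decimal (radix 10) fractions. Exact Fractions stand in for the mantissa register here, so this is an illustration, not the register-level algorithm:

```python
# A sketch of floating-point multiplication as described above:
# multiply the mantissas, add the exponents, then normalize by
# shifting the mantissa and decrementing the exponent.

from fractions import Fraction

def fp_multiply(m1, e1, m2, e2):
    """Operands are normalized decimal mantissas (1/10 <= |m| < 1)
    with integer exponents; returns the normalized product (m, e)."""
    m = m1 * m2                        # multiply the mantissas
    e = e1 + e2                        # add the exponents
    # normalize: make the leading digit of the mantissa nonzero
    while m != 0 and abs(m) < Fraction(1, 10):
        m *= 10
        e -= 1                         # decrement to normalize
    return m, e

# (.2 x 10^2) * (.3 x 10^1) = .06 x 10^3, normalized to .6 x 10^2 = 60
print(fp_multiply(Fraction(2, 10), 2, Fraction(3, 10), 1))
```

Note that no mantissa alignment is needed, matching the remark above that multiplication and division, unlike addition, work directly on the mantissas and exponents.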
Register Configuration
So, while discussing each way of transferring data asynchronously, we see the sequence
of control both when it is initiated by the source and when it is initiated by the
destination. In this way, each way of data transfer can be further divided into two parts:
source-initiated and destination-initiated.
We can also specify, asynchronous transfer between two independent units by
means of a timing diagram that shows the timing relationship that exists between the
control and the data buses.
Now, we will discuss each method of asynchronous data transfer in detail one by
one.
1. Strobe Control:
The strobe control method of asynchronous data transfer employs a single
control line to time each transfer. This control line is also known as a strobe, and it
may be activated by either the source or the destination, depending on which initiates
the transfer.
In the block diagram we see that the strobe is initiated by the destination: as shown in
the timing diagram, the destination unit first activates the strobe pulse, informing the
source to provide the data. The source unit responds by placing the requested binary
information on the data bus. The data must be valid and remain on the bus long
enough for the destination unit to accept it. The falling edge of the strobe pulse can be
used again to trigger a destination register. The destination unit then disables
the strobe, and the source removes the data from the data bus after a predetermined
time interval.
In an actual computer, in the first case (strobe initiated by the source), the
strobe may be a memory-write control signal from the CPU to a memory unit. The
source, the CPU, places the word on the data bus and informs the memory unit, which is
the destination, that this is a write operation.
In the second case (strobe initiated by the destination), the strobe may
be a memory-read control signal from the CPU to a memory unit. The destination, the CPU,
initiates the read operation to inform the memory, which is the source unit, to place
the selected word on the data bus.
2. Handshaking:
The disadvantage of the strobe method is that the source unit that initiates the transfer
has no way of knowing whether the destination has actually received the data that was
placed on the bus. Similarly, a destination unit that initiates the transfer has no way of
knowing whether the source unit has actually placed data on the bus.
This problem is solved by the handshaking method.
Handshaking introduces a second control line that provides a reply to
the unit that initiates the transfer.
One control line is in the same direction as the data flow on the bus, from the
source to the destination; it is used by the source unit to inform the destination unit
whether there is valid data on the bus. The other control line is in the opposite
direction, from destination to source; it is used by the destination unit to inform the
source whether it can accept data. Here too, the sequence of control depends on which
unit initiates the transfer, i.e. whether the transfer is initiated by the source or by the
destination. The sequence of control for both cases is described below:
In the block diagram, the two handshaking lines are "data valid",
generated by the source unit, and "ready for data", generated by the destination unit.
Note that the signal named "data accepted", generated by the destination unit, has been
renamed "ready for data" to reflect its new meaning.
Here the transfer is initiated by the destination, so the source unit does not place data
on the data bus until it receives the "ready for data" signal from the destination unit.
After that, the handshaking process is the same as in the source-initiated case.
The sequence of events is shown in the sequence diagram, and the timing
relationship between the signals is shown in the timing diagram.
Thus we can say that the sequence of events in both cases is
identical if we consider the "ready for data" signal as the complement of "data
accepted". In other words, the only difference between source-initiated and
destination-initiated transfer is the choice of initial state.
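The four-phase sequence of the source-initiated handshake can be sketched as a simple simulation. The signal names follow the text; the bus dictionary and the event list are hypothetical bookkeeping for illustration:

```python
# A sketch of the source-initiated two-wire handshake. The signal names
# ("data valid", "data accepted") follow the text; the bus dictionary
# and the event list are hypothetical, for illustration only.

def source_initiated_transfer(word):
    bus = {"data": None, "data_valid": 0, "data_accepted": 0}
    events = []

    bus["data"] = word              # 1. source places data on the bus
    bus["data_valid"] = 1           #    and enables "data valid"
    events.append("data valid raised")

    latched = bus["data"]           # 2. destination accepts the word
    bus["data_accepted"] = 1        #    and enables "data accepted"
    events.append("data accepted raised")

    bus["data_valid"] = 0           # 3. source sees the reply, disables
    bus["data"] = None              #    "data valid", removes the data
    events.append("data valid dropped")

    bus["data_accepted"] = 0        # 4. destination disables "data accepted"
    events.append("data accepted dropped")
    return latched, events

word, seq = source_initiated_transfer(0xAB)
```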
Modes of I/O Data Transfer
Data transfer between the central processing unit and I/O devices can generally
be handled in three modes, given below:
1. Programmed I/O
2. Interrupt Initiated I/O
3. Direct Memory Access
Programmed I/O
Programmed I/O data transfers are the result of I/O instructions written in the
computer program. Each data item transfer is initiated by an instruction in the
program.
Usually the program controls the data transfer to and from the CPU and the peripheral.
Transferring data under programmed I/O requires constant monitoring of the
peripherals by the CPU.
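The constant monitoring mentioned above can be sketched as a polling loop; the device model with a status flag and a data register is hypothetical, for illustration:

```python
# Programmed I/O sketch: the CPU checks the device status flag before
# every data item, busy-waiting for the whole transfer.
# The Device class is a hypothetical peripheral model.

class Device:
    def __init__(self, data):
        self._data = list(data)
    def status_ready(self):          # status register: true while a byte waits
        return len(self._data) > 0
    def read_data(self):             # data register
        return self._data.pop(0)

def programmed_io_read(device):
    received = []
    while device.status_ready():     # CPU keeps polling the status flag
        received.append(device.read_data())
    return received

result = programmed_io_read(Device([10, 20, 30]))
```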
Priority Interrupt
A priority interrupt is a system that decides the priority in which various
devices, which generate interrupt signals at the same time, will be serviced by
the CPU. The system also decides which conditions are allowed to
interrupt the CPU while another interrupt is being serviced. Generally, devices
with high-speed transfer, such as magnetic disks, are given high priority, and slow
devices, such as keyboards, are given low priority.
When two or more devices interrupt the computer simultaneously, the computer
services the device with the higher priority first.
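The priority decision can be sketched as a software priority encoder; the device list and its ordering below are assumptions for illustration:

```python
# Priority interrupt sketch: among simultaneously pending interrupt
# requests, the highest-priority line wins. Lower index = higher
# priority here (disk fastest, keyboard slowest), as the text suggests.

DEVICES = ["disk", "tape", "printer", "keyboard"]  # priority order (assumed)

def highest_priority(pending):
    """pending: one boolean per device line, in DEVICES order."""
    for i, is_pending in enumerate(pending):
        if is_pending:               # first set line wins, like a priority encoder
            return DEVICES[i]
    return None                      # no interrupt pending

# Disk and keyboard interrupt simultaneously -> the disk is serviced first:
assert highest_priority([True, False, False, True]) == "disk"
```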
Figure: DMA transfer in a computer system. The CPU and the random-access memory
unit (RAM) are connected by the address bus, data bus, and read/write (RD/WR)
control lines; the DMA controller sits between them and the peripheral device,
exchanging BR (bus request), BG (bus grant), DS (DMA select), RS (register select),
DMA request/acknowledge, and interrupt signals.
Input/output Processor
An input-output processor (IOP) is a processor with direct memory access capability.
In this scheme, the computer system is divided into a memory unit and a number of
processors.
Each IOP controls and manages its input-output tasks. The IOP is similar to a CPU
except that it handles only the details of I/O processing. The IOP can fetch and
execute its own instructions. These IOP instructions are designed to manage I/O
transfers only.
Block Diagram Of I/O Processor:
Below is a block diagram of a computer along with various I/O Processors. The
memory unit occupies the central position and can communicate with each
processor.
The CPU processes the data required for solving the computational tasks. The IOP
provides a path for transfer of data between peripherals and memory. The CPU
assigns the task of initiating the I/O program.
The IOP operates independently of the CPU and transfers data between peripherals
and memory.
The communication between the IOP and the devices is similar to the program
control method of transfer. And the communication with the memory is similar to the
direct memory access method.
In large scale computers, each processor is independent of other processors and
any processor can initiate the operation.
The CPU acts as the master and the IOP as the slave processor. The CPU assigns
the task of initiating operations, but it is the IOP, not the CPU, that executes the
instructions. CPU instructions provide operations to start an I/O transfer. The IOP
requests the CPU's attention through an interrupt.
Instructions that are read from memory by an IOP are called commands, to
distinguish them from instructions read by the CPU. Commands are prepared by
programmers and stored in memory. The command words make up the program for the
IOP. The CPU informs the IOP where to find the commands in memory.
Pipelining and vector processing
Parallel processing
Execution of Concurrent Events in the computing process to achieve faster
Computational Speed
Levels of Parallel Processing
- Job or Program level
- Task or Procedure level
- Inter-Instruction level
- Intra-Instruction level
PARALLEL COMPUTERS
Architectural Classification - Flynn's classification:
» Based on the multiplicity of instruction streams and data streams
» Instruction stream: the sequence of instructions read from memory
» Data stream: the operations performed on the data in the processor
What is Pipelining?
Pipelining is the process of feeding instructions to the processor
through a pipeline. It allows storing and executing instructions in an orderly process.
It is also known as pipeline processing.
Pipelining is a technique where multiple instructions are overlapped during
execution. Pipeline is divided into stages and these stages are connected with one
another to form a pipe like structure. Instructions enter from one end and exit from
another end.
Pipelining increases the overall instruction throughput.
In a pipeline system, each segment consists of an input register followed by a
combinational circuit. The register holds the data and the combinational circuit
performs operations on it. The output of the combinational circuit is applied to the
input register of the next segment.
A pipeline system is like a modern-day assembly line in a factory. For example,
in a car manufacturing plant, huge assembly lines are set up, and at each point
there are robotic arms performing a certain task; the car then moves on to
the next arm.
Types of Pipeline
It is divided into 2 categories:
1. Arithmetic Pipeline
2. Instruction Pipeline
Arithmetic Pipeline
Arithmetic pipelines are usually found in most computers. They are used for
floating-point operations, multiplication of fixed-point numbers, etc. For example, the
inputs to a floating-point adder pipeline are:
X = A × 2^a
Y = B × 2^b
Here A and B are mantissas (the significant digits of the floating-point numbers),
while a and b are the exponents. Floating-point addition and subtraction is done in 4 parts:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Produce the result.
Registers are used for storing the intermediate results between the above
operations.
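The four steps above can be sketched as pipeline stage functions. Decimal mantissas of the form A × 10^a are used here for readability; real hardware uses base 2:

```python
# A sketch of the four floating-point adder stages listed above, using
# decimal (base 10) numbers of the form A * 10**a for readability.

def stage1_compare(A, a, B, b):
    return (A, a, B, b, max(a, b))              # 1. choose the larger exponent

def stage2_align(A, a, B, b, e):
    return A * 10**(a - e), B * 10**(b - e), e  # 2. shift the smaller mantissa

def stage3_add(A, B, e):
    return A + B, e                             # 3. add the aligned mantissas

def stage4_normalize(M, e):
    while abs(M) >= 1:                          # 4. keep mantissa in 0.xxx form
        M, e = M / 10, e + 1
    return M, e

# 0.9504 * 10^3 + 0.8200 * 10^2  (i.e. 950.4 + 82.0 = 1032.4)
s = stage1_compare(0.9504, 3, 0.8200, 2)
A, B, e = stage2_align(*s)
M, e = stage3_add(A, B, e)
M, e = stage4_normalize(M, e)
```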
Instruction Pipeline
In an instruction pipeline, a stream of instructions is executed by overlapping the
fetch, decode, and execute phases of the instruction cycle. This technique is used to
increase the throughput of the computer system.
An instruction pipeline reads instructions from memory while previous instructions
are being executed in other segments of the pipeline. Thus we can execute multiple
instructions simultaneously. The pipeline is more efficient if the instruction cycle
is divided into segments of equal duration.
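A quick sketch of the resulting timing, using the standard textbook formulas (a k-segment pipeline finishes n tasks in (k + n − 1) clock cycles versus n·k cycles without pipelining; these formulas are assumed here, not stated above):

```python
# Pipeline timing sketch: with k segments and clock period tp, n tasks
# finish in (k + n - 1) cycles; without pipelining (each task taking
# k * tp), they take n * k cycles. Standard textbook formulas.

def pipelined_time(k, n, tp):
    return (k + n - 1) * tp

def nonpipelined_time(k, n, tp):
    return n * k * tp

k, n, tp = 4, 100, 20e-9       # 4 stages, 100 instructions, 20 ns clock
speedup = nonpipelined_time(k, n, tp) / pipelined_time(k, n, tp)
# speedup = 400 / 103, approaching k (= 4) as n grows
```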
Advantages of Pipelining
1. The cycle time of the processor is reduced.
2. It increases the throughput of the system
3. It makes the system reliable.
Disadvantages of Pipelining
1. The design of a pipelined processor is complex and costly to manufacture.
2. The instruction latency is more.
Vector(Array) Processing
There is a class of computational problems that is beyond the capabilities of
a conventional computer. These problems require a vast number of computations on
multiple data items that would take a conventional computer (with a scalar processor)
days or even weeks to complete.
Such complex instructions, which operate on multiple data items at the same time,
require a better way of instruction execution, which was achieved by vector
processors.
Scalar CPUs can manipulate only one or two data items at a time, which is not very
efficient. Also, simple instruction sequences like "ADD A to B, and store into C" are not
practically efficient.
Addresses are used to point to the memory locations where the data to be operated
on will be found, which adds the overhead of data lookup. Until the data is
found, the CPU would be sitting idle, which is a big performance issue.
Hence the concept of the instruction pipeline, in which the
instruction passes through several sub-units in turn. These sub-units perform various
independent functions: the first decodes the instruction, the second fetches the
data, and the third performs the arithmetic itself. Therefore, while the data is being
fetched for one instruction, the CPU does not sit idle; it works on decoding the next
instruction, ending up working like an assembly line.
Vector processors not only use an instruction pipeline but also pipeline the data,
working on multiple data items at the same time.
A normal scalar processor instruction would be ADD A, B, which adds two
operands. But what if we could instruct the processor to add a group of
numbers (from memory locations 0 to n) to another group of numbers (say, locations
n to k)? This is what vector processors achieve.
In a vector processor, a single instruction can request multiple data operations, which
saves time: the instruction is decoded once and then operates on a stream of
different data items.
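The scalar-versus-vector contrast can be sketched as follows; in Python this is only an analogy, since real vector hardware applies one pipelined instruction to all elements:

```python
# Sketch contrasting scalar and vector-style addition. The "vector"
# form models a single instruction decoded once and applied to whole
# operand groups; the element-wise loop models repeated scalar ADDs.

def scalar_add(a, b):
    c = []
    for i in range(len(a)):          # one ADD issued per element
        c.append(a[i] + b[i])
    return c

def vector_add(a, b):
    # decoded once, applied across all elements (vector-processor model)
    return [x + y for x, y in zip(a, b)]

assert vector_add([1, 2, 3], [10, 20, 30]) == scalar_add([1, 2, 3], [10, 20, 30])
```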
Applications of Vector Processors
Computers with vector processing capabilities are in demand in specialized
applications. The following are some areas where vector processing is used:
1. Petroleum exploration.
2. Medical diagnosis.
3. Data analysis.
4. Weather forecasting.
5. Aerodynamics and space flight simulations.
6. Image processing.
7. Artificial intelligence.
UNIT – 5
Memory Hierarchy
Direct Mapping
The 15-bit CPU address is divided into two fields: the 9 least
significant bits constitute the index field and the remaining 6 bits constitute the tag
field. The number of bits in the index field equals the number of address bits required
to access the cache memory.
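The 15-bit address split described above can be sketched directly:

```python
# Direct-mapping address split: the low 9 bits index the 512-word
# cache, the remaining 6 high bits form the tag.
INDEX_BITS = 9
TAG_BITS = 6

def split_address(addr):
    index = addr & ((1 << INDEX_BITS) - 1)   # 9 least significant bits
    tag = addr >> INDEX_BITS                 # remaining 6 bits
    return tag, index

# A word whose tag is 3 and whose cache index is 5:
assert split_address((3 << INDEX_BITS) | 5) == (3, 5)
# The largest 15-bit address uses all tag and index bits:
assert split_address((1 << (TAG_BITS + INDEX_BITS)) - 1) == (63, 511)
```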
Set Associative Mapping
The disadvantage of direct mapping is that two words with the same index address
cannot reside in cache memory at the same time. This problem can be overcome by
set-associative mapping.
In this we can store two or more words of memory under the same index address.
Each data word is stored together with its tag and this forms a set.
Replacement Algorithms
Data is continuously replaced with new data in the cache memory using replacement
algorithms.
Following are the 2 replacement algorithms used:
FIFO - First in First out. Oldest item is replaced with the latest item.
LRU - Least Recently Used. Item which is least recently used by CPU is removed.
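A minimal LRU sketch using Python's OrderedDict; the capacity of 2 is arbitrary, for the demo only:

```python
# LRU replacement sketch: the least recently used entry is evicted
# when the cache is full. OrderedDict keeps entries in usage order.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
    def access(self, key, value):
        if key in self.store:
            self.store.move_to_end(key)    # mark as most recently used
        elif len(self.store) >= self.capacity:
            self.store.popitem(last=False) # evict the least recently used
        self.store[key] = value

cache = LRUCache(2)
cache.access("A", 1)
cache.access("B", 2)
cache.access("A", 1)   # A becomes most recently used
cache.access("C", 3)   # cache full -> evicts B, the LRU item
assert list(cache.store) == ["A", "C"]
```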
Writing into cache and cache initialization:
The benefit of write-through to main memory is that it simplifies the design of
the computer system. With write-through, the main memory always has an up-to-
date copy of the line. So when a read is done, main memory can always reply with
the requested data.
If write-back is used, sometimes the up-to-date data is in a processor cache, and
sometimes it is in main memory. If the data is in a processor cache, then that
processor must stop main memory from replying to the read request, because the
main memory might have a stale copy of the data. This is more complicated than
write-through.
Also, write-through can simplify the cache coherency protocol because it doesn't
need the Modify state. The Modify state records that the cache must write back the
cache line before it invalidates or evicts the line. In write-through, a cache line can
always be invalidated without writing back, since memory already has an up-to-date
copy of the line.
Cache Coherence:
In a shared-memory multiprocessor with a separate cache memory for each
processor, it is possible to have many copies of any one instruction operand: one
copy in main memory and one in each cache memory. When one copy of an
operand is changed, the other copies of the operand must be changed also. Cache
coherence is the discipline that ensures that changes in the values of shared
operands are propagated throughout the system in a timely fashion.
Virtual Memory
Virtual memory is the separation of logical memory from physical memory. This
separation provides a large virtual memory for programmers when only a small physical
memory is available. Virtual memory gives programmers the illusion that
they have a very large memory even though the computer has a small main memory.
It makes the task of programming easier because the programmer no longer needs
to worry about the amount of physical memory available.
Address mapping using pages:
The table implementation of the address mapping is simplified if the information in
the address space and the memory space is each divided into groups of fixed size.
The physical memory is broken down into groups of equal size called
blocks, which may range from 64 to 4096 words each.
The term page refers to groups of address space of the same size.
Consider a computer with an address space of 8K and a memory space of 4K.
If we split each into groups of 1K words, we obtain eight pages and four blocks, as
shown in the figure.
At any given time, up to four pages of address space may reside in main memory in
any one of the four blocks.
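The 8K/4K example can be sketched in code; the page-table contents below are assumed for illustration:

```python
# Page mapping sketch for the example above: an 8K address space and a
# 4K memory space with 1K pages gives 8 pages and 4 blocks. The page
# table maps page -> block (None = page not currently in main memory);
# its contents here are invented for the demo.

PAGE_SIZE = 1024
page_table = {0: None, 1: 3, 2: 1, 3: None,
              4: 0, 5: None, 6: 2, 7: None}

def translate(virtual_addr):
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    block = page_table[page]
    if block is None:
        raise RuntimeError("page fault")   # page must be brought in first
    return block * PAGE_SIZE + offset

# Virtual address 2050 lies in page 2 (offset 2), mapped to block 1:
assert translate(2050) == 1 * PAGE_SIZE + 2
```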
Associative memory page table:
The implementation of the page table is vital to the efficiency of the
virtual memory technique, for each memory reference must also include a reference
to the page table. The fastest solution is a set of dedicated registers to hold the page
table but this method is impractical for large page tables because of the expense.
But keeping the page table in main memory could cause intolerable delays, because
even a single memory access for the page table involves a slowdown of 100
percent, and large page tables can require more than one memory access. The
solution is to augment the page table with a special high-speed memory made up of
associative registers or translation lookaside buffers (TLBs), which is called an
associative memory.
Page replacement
The advantage of virtual memory is that processes can use more
memory than exists in the machine; when memory that is not present is accessed (a
page fault), the page must be paged in (sometimes referred to as being "swapped in",
although some people reserve "swapped in" for bringing in an entire address
space).
Swapping in pages is very expensive (it requires using the disk), so we'd like to avoid
page faults as much as possible. The algorithm that we use to choose which pages
to evict to make space for the new page can have a large impact on the number of
page faults that occur.
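The effect of the eviction policy on fault counts can be sketched by counting page faults under FIFO replacement (the reference string and frame count below are arbitrary):

```python
# Page-fault counting sketch under FIFO replacement: the oldest resident
# page is evicted when a new page must be brought in and all frames
# are occupied.
from collections import deque

def fifo_page_faults(references, num_frames):
    frames = deque()
    faults = 0
    for page in references:
        if page not in frames:           # page fault: page not resident
            faults += 1
            if len(frames) >= num_frames:
                frames.popleft()         # evict the oldest page
            frames.append(page)
    return faults

# 1,2,3 fault; 1 hits; 4 evicts 1; the final 1 then faults again:
assert fifo_page_faults([1, 2, 3, 1, 4, 1], 3) == 5
```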