C1. Basic Structure of Computer
Appendix A Proofs
Features of Book
Use of clear, plain and lucid language making the understanding very easy.
Book provides detailed insight into the subject.
Approach of the book resembles classroom teaching.
Excellent theory well supported with the practical examples and illustrations.
Syllabus
(Chapters - 1, 2)
Functional units - Basic operational concepts - Bus structures - Performance and metrics - Instructions and instruction sequencing - Hardware - software interface - Instruction set
architecture - Addressing modes - RISC - CISC. ALU design - Fixed point and floating point
operations.
(Chapter - 3)
Fundamental concepts - Execution of a complete instruction - Multiple bus organization - Hardwired control - Micro programmed control - Nano programming.
3. Pipelining
(Chapter - 4)
Basic concepts - Data hazards - Instruction hazards - Influence on instruction sets - Data path
and control considerations - Performance considerations - Exception handling.
4. Memory System
(Chapter - 5)
Basic concepts - Semiconductor RAM - ROM - Speed - Size and cost - Cache memories - Improving cache performance - Virtual memory - Memory management requirements - Associative memories - Secondary storage devices.
5. I/O Organization
(Chapter - 6)
Accessing I/O devices - Programmed input/output - Interrupts - Direct memory access - Buses - Interface circuits - Standard I/O interfaces (PCI, SCSI, USB), I/O devices and
processors.
Table of Contents
(Detail)
1.8.5 Branching
1.8.6 Conditional Codes
1.8.7 Generating Memory Addresses
1.9 Instruction Set Architecture
1.1 Introduction
At the beginning of the text, I felt it necessary to make a distinction between
computer organisation and architecture. Although it is difficult to give precise
definitions for these terms, we define them as follows :
The architectural attributes include the instruction set, data types, number of bits
used to represent data types, I/O mechanism, and techniques for addressing memory.
On the other hand, the organisational attributes include those hardware details
between the computer models maintains software compatibility between them and
hence protects the software investments of the customers. The computer models were
designed to be software compatible with one another, meaning that all models in the
series shared a common instruction set. In other words, programs
written for one model could be run without modification on any other model.
However, the execution time, memory usage, and I/O usage for the program may
change from model to model. The organisational changes also try to maintain
hardware compatibility within the computer models. Many times it is not possible to
maintain hardware compatibility. In such cases a computer model may require
advanced I/O and memory interface modules; the I/O and memory interface modules
designed for the previous organisation may not work with the advanced organisation.
In microcomputers, the relationship between computer architecture and
organisation is very close: changes in technology not only influence
organisation but also result in the introduction of new, more powerful architectures. The RISC
(Reduced Instruction Set Computer) machine is a good example of this.
This text is about computer organisation. Its purpose is to provide a clear and
complete understanding of the nature and characteristics of modern-day computer
systems. We begin this text with the basic structures of computers.
Microcomputers
Minicomputers
Desktop computers
Personal computers
Workstations
Servers
Supercomputers
Personal Computers : Personal computers are the most common form of
desktop computers. They have found wide use in homes, schools and business offices.
Portable Notebook Computers : Portable notebook computers are the compact
version of personal computers. Laptop computers are a good example of
portable notebook computers.
Workstations : Workstations have higher computation power than personal
computers. They have high resolution graphics terminals and improved input/output
capabilities. Workstations are used in engineering applications and in interactive
graphics applications.
Servers : These computers have large storage units and faster communication links.
The large storage unit allows them to store sizable databases, and fast communication links
allow faster communication of data blocks with computers connected in the network.
These computers play a major role in internet communication.
Supercomputers : These computers are basically multiprocessor computers used for
the large-scale numerical calculations required in applications such as weather
forecasting, robotic engineering, aircraft design and simulation.
[Figure: functional units of a computer; recovered labels: Input unit, Output unit]
arithmetic and logic unit to perform the desired operations. The program stored in the
memory decides the processing steps, and the processed output is sent to the user with
the help of output devices or it is stored in the memory for later reference. All the
above mentioned activities are co-ordinated and controlled by the control unit. The
arithmetic and logic unit in conjunction with the control unit is commonly called the Central
Processing Unit (CPU). Let us discuss the functional units in detail.
[Figure: input devices; recovered labels: Keyboard, Mouse, Joystick, Scanner]
To access data from a particular word, each word in the main
memory has a distinct address. This allows any word in the main memory to be accessed
by specifying the corresponding address. The number of bits in each word is referred to
as the word length of the computer. Typically, the word length varies from 8 to
64 bits. The number of such words in the main memory decides the size
or capacity of the memory. This is one of the specifications of the computer. The size
of computer main memory varies from a few million words to tens of millions of words.
An important characteristic of a memory is its access time (the time required to
access one word). The access time for main memory should be as small as possible.
Typically, it is of the order of 10 to 100 nanoseconds. This access time also depends on
the type of memory. In random access memories (RAMs), a fixed time is required to
access any word in the memory. However, in sequential access memories this time is
not fixed.
The main memory consists of only random access memories. These memories
are fast but they are small in capacity and expensive. Therefore, the computer uses
secondary storage memories such as magnetic tapes and magnetic disks for the storage
of large amounts of data.
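As a rough illustration of how addresses, word length and capacity relate, the following Python sketch computes the number of address bits needed to give every word a distinct address, and the total capacity in bits. The function names and the example size (2^20 words of 32 bits) are assumptions chosen for illustration; they do not come from the text.

```python
def address_bits(num_words: int) -> int:
    """Smallest number of address bits that gives each word a distinct address."""
    if num_words < 1:
        raise ValueError("memory must contain at least one word")
    # (num_words - 1) is the largest address; its bit length is the width needed.
    return max(1, (num_words - 1).bit_length())


def capacity_bits(num_words: int, word_length: int) -> int:
    """Total capacity = number of words x word length (in bits)."""
    return num_words * word_length


if __name__ == "__main__":
    words = 1 << 20          # assumed example: about one million words
    print(address_bits(words))         # 20
    print(capacity_bits(words, 32))    # 33554432
```

A memory of about one million words therefore needs a 20-bit address, whatever the word length is.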
Stored program concept
Today's computers are built on two key principles :
1. Instructions are represented as numbers.
2. Programs can be stored in memory to be read or written just like numbers.
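The two principles above can be sketched with a toy machine in Python. This is only an illustrative model: the instruction encoding (opcode x 100 + address), the four opcodes and the memory layout are hypothetical and are not taken from any real processor or from the text.

```python
# Hypothetical opcodes for a toy accumulator machine.
HALT, LOAD, ADD, STORE = 0, 1, 2, 3


def run(memory):
    """Execute the program held in `memory`; instructions and data are both plain numbers."""
    pc = 0      # program counter: address of the next instruction
    acc = 0     # accumulator register
    while True:
        word = memory[pc]                    # fetch: an instruction is just a number
        pc += 1
        opcode, address = divmod(word, 100)  # decode: opcode * 100 + address
        if opcode == HALT:
            return acc
        if opcode == LOAD:
            acc = memory[address]
        elif opcode == ADD:
            acc += memory[address]
        elif opcode == STORE:
            memory[address] = acc            # programs can write memory like any data
        else:
            raise ValueError(f"unknown opcode {opcode}")


if __name__ == "__main__":
    # Addresses 0..3 hold the program, addresses 10..12 hold the data.
    memory = [110, 211, 312, 0, 0, 0, 0, 0, 0, 0, 7, 5, 0]
    print(run(memory))    # 12
    print(memory[12])     # 12
```

Because the program is stored as ordinary numbers, the same memory array could be read or modified by another program in exactly the way data is.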
The control unit uses control signals or timing signals to determine when a given
action is to take place. It controls input and output operations and data transfers between
the processor, memory and input/output devices using timing signals.
The control and the arithmetic and logic units of a computer are usually many
times faster than other devices connected to a computer system. This enables them to
control a number of external input/output devices.
[Figure: processor containing PC, IR, MAR, ALU, control unit and general purpose registers, connected to the main memory]
Fig. 1.3 Connections between the processor and the main memory
The instruction register (IR) is used to hold the instruction that is currently being
executed. The contents of IR are available to the control unit, which generates the
timing signals that control the various processing elements involved in executing the
instruction.
The two registers MAR and MDR are used to handle the data transfer between the
main memory and the processor. The MAR holds the address of the main memory location to
or from which data is to be transferred. The MDR contains the data to be written into
In polling, the processor's software simply checks each of the I/O devices every so
often. During this check, the processor tests to see if any device needs servicing. A
more desirable method would be one that allows the processor to execute its
main program and only stop to service I/O devices when it is told to do so by the
device itself. In effect, the method would provide an external asynchronous input that
would inform the processor that it should complete whatever instruction is
currently being executed and fetch a new routine that will service the requesting
device. Once this servicing is completed, the processor would resume exactly where it
left off. This method is called the interrupt method.
Providing services such as polling and interrupt processing is one of the major
functions of the processor. We know that many I/O devices are connected to the
computer system. It may happen that more than one input device requests I/O
service simultaneously. In such cases the I/O device having the highest priority is
serviced first. Therefore, to handle multiple interrupts (more than one interrupt) the
processor uses priority logic. Thus, handling multiple interrupts is also a function of
the processor. Fig. 1.4 (a) shows how the program execution flow gets modified when
an interrupt occurs. Fig. 1.4 (b) shows that an interrupt service routine itself can be
interrupted by a higher priority interrupt. Processing of such an interrupt is called nested
interrupt processing and the interrupt is called a nested interrupt.
[Figure: recovered labels: Main program, Interrupt service routine, Interrupt service routine 2]
Fig. 1.4
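The priority rule and the nesting behaviour described above can be sketched as a small tick-by-tick simulation in Python. This is a simplified model under stated assumptions: the routine names, the priority numbers (larger means more urgent) and the idea of measuring service time in "ticks" are all hypothetical.

```python
def simulate(arrivals, total_ticks):
    """Return the name of the routine running at each tick.

    arrivals maps a tick number to (name, priority, length_in_ticks).
    The stack holds interrupted contexts; its top entry is what is running now.
    """
    stack = [["main", 0, None]]   # the main program has the lowest priority
    pending = []                  # requests waiting for service
    trace = []
    for tick in range(total_ticks):
        if tick in arrivals:
            name, priority, length = arrivals[tick]
            pending.append([name, priority, length])
        # Priority logic: consider the most urgent pending request first.
        pending.sort(key=lambda request: request[1], reverse=True)
        if pending and pending[0][1] > stack[-1][1]:
            # Pre-empt: the interrupted routine stays saved underneath on the stack.
            stack.append(pending.pop(0))
        current = stack[-1]
        trace.append(current[0])
        if current[0] != "main":
            current[2] -= 1
            if current[2] == 0:
                stack.pop()       # routine finished: resume the interrupted context
    return trace


if __name__ == "__main__":
    requests = {2: ("isr1", 1, 3), 3: ("isr2", 2, 2)}
    print(simulate(requests, 9))
    # ['main', 'main', 'isr1', 'isr2', 'isr2', 'isr1', 'isr1', 'main', 'main']
```

In the example the second request has a higher priority, so it interrupts the first service routine, which then resumes and finishes before control returns to the main program.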
We have seen that the processor provides the requested service by executing an
appropriate interrupt service routine. However, due to the change in the program
sequence, the internal state of the processor may change and therefore, it is necessary
to save it in the memory before servicing the interrupt. Normally, the contents of PC,
the general registers, and some control information are stored in memory. When the
interrupt service routine is completed, the state of the processor is restored so that the
interrupted program may continue. Therefore, saving the state of the processor at the
time of interrupt is also one of the functions of the computer system.
Let us see a few examples which will make it easier to understand the basic
operations of a computer.
Example 1.1 : State the operations involved in the execution of the ADD R1, R0
instruction.
Solution : The instruction Add R1, R0 adds the operand in register R1 to the operand
in register R0 and stores the sum into register R0. Let us see the steps involved in the
execution of this instruction.
1. Fetch the instruction from the memory into the IR register of the processor.
2. Decode the instruction.
3. Add the contents of R1 and R0 and store the result in R0.
Example 1.2 : State the operations involved in the execution of Add LOCA, R0.
Solution : The instruction Add LOCA, R0 adds the operand at memory location
LOCA to the operand in register R0, and stores the result in register R0. The steps
involved in the execution of this instruction are :
1. Fetch the instruction from the memory into the IR register of the processor.
2. Decode the instruction.
3. Fetch the second operand from memory location LOCA and add the contents
of LOCA and the contents of register R0.
4. Store the result in R0.
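The steps listed in the two examples can be traced with a short Python sketch. It is only a model under assumptions: the instruction is represented as a hypothetical tuple ("ADD", source, destination), registers and memory are plain dictionaries, and the step descriptions are paraphrases of the lists above.

```python
def execute_add(instruction, registers, memory):
    """Execute Add source, destination and return the list of steps performed."""
    steps = []

    ir = instruction                    # step 1: fetch the instruction into IR
    steps.append("fetch instruction")

    operation, source, destination = ir  # step 2: decode the instruction
    if operation != "ADD":
        raise ValueError("this sketch only models the Add instruction")
    steps.append("decode")

    if source in registers:
        # Both operands are already in processor registers (Example 1.1).
        registers[destination] += registers[source]
        steps.append("add and store result")
    else:
        # The source operand must first be fetched from memory (Example 1.2).
        operand = memory[source]
        total = registers[destination] + operand
        steps.append("fetch operand and add")
        registers[destination] = total
        steps.append("store result")
    return steps


if __name__ == "__main__":
    regs = {"R0": 4, "R1": 6}
    mem = {"LOCA": 10}
    print(execute_add(("ADD", "R1", "R0"), regs, mem), regs["R0"])
    print(execute_add(("ADD", "LOCA", "R0"), regs, mem), regs["R0"])
```

The register-only form needs three steps, while the memory form needs an extra operand fetch, which is why instructions with memory operands take longer to execute.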
Data Bus
Address Bus
Control Bus
1) Data Bus : The data bus consists of 8, 16, 32 or more parallel signal lines. These
lines are used to send data to memory and output ports, and to receive data from
memory and input ports. Therefore, data bus lines are bi-directional. This means that
the CPU can read data on these lines from memory or from a port, as well as send data
out on these lines to a memory location or to a port. The data bus is connected in
parallel to all peripherals. The communication between a peripheral and the CPU is
activated by giving an output enable pulse to the peripheral. Outputs of peripherals are
floated when they are not in use.
2) Address Bus : It is a unidirectional bus. The address bus consists of 16, 20, 24
or more parallel signal lines. On these lines the CPU sends out the address of the
memory location or I/O port that is to be written to or read from. Here, the
communication is one way: the address is sent from the CPU to memory and I/O ports,
and hence these lines are unidirectional.
3) Control Bus : The control lines regulate the activity on the bus. The CPU sends
signals on the control bus to enable the outputs of addressed memory devices or port
devices.
Typical control bus signals are :
Clock (CLK)
Reset
Ready
Hold
[Figure: processor, memory and I/O units connected to the system bus (control bus, data bus, address bus)]
[Figure: single bus structure with processor, memory, input and output connected to the system bus]
In a single bus structure all units are connected to a common bus called the system bus.
However, with a single bus only two units can communicate with each other at a time.
The bus control lines are used to arbitrate multiple requests for use of the bus. The
main advantage of the single bus structure is its low cost and its flexibility for attaching
peripheral devices.
The complexity of the bus control logic depends on the amount of translation needed
between the system bus and the CPU, the timing requirements, whether or not interrupt
management is included, and the size of the overall system. For a small system, control
signals of the CPU could be used directly to reduce handshaking logic. Also, drivers
and receivers would not be needed for the data and address lines. But large systems
with several interfaces would need bus driver and receiver circuits connected to the
bus in order to maintain adequate signal quality. In most processors,
multiplexed address and data buses are used to reduce the number of pins. During the
first part of the bus cycle, the address is present on this bus. Afterwards, the same bus is used
for data transfer. So latches are required to hold the address sent by the CPU
initially. Interrupt priority management is optional in a system. It is not required in
systems which use software priority management. A complex system includes
hardware for managing the I/O interrupts to increase the efficiency of the system. Many
manufacturers have made priority management devices. The programmable interrupt
controller (PIC) is an IC designed for this task.
amongst these devices. The sharing mechanism co-ordinates the use of the bus among
different devices. This co-ordination requires a finite time called the propagation
delay. When control of the bus passes from one device to another frequently,
these propagation delays are noticeable and affect the performance of the computer
system.
2. When the aggregate data transfer demand approaches the capacity of the bus,
the bus may become a bottleneck. In such situations we have to increase the
data rate of the bus or we have to use a wider bus.
Nowadays the data transfer rates for video controllers and network interfaces are
growing rapidly. It is impractical to satisfy the need for a high speed shared bus with a
single bus. Thus, most computer systems use multiple buses. These buses have a
hierarchical structure.
Fig. 1.7 shows two bus configurations. The traditional bus connection uses three
buses : local bus, system bus and expansion bus. The high speed bus configuration
uses a high-speed bus along with the three buses used in the traditional bus connection.
Here, the cache controller is connected to the high-speed bus. This bus supports connection to
high-speed LANs, such as Fiber Distributed Data Interface (FDDI), video and graphics
workstation controllers, as well as interface controllers to local peripheral buses
including SCSI and P1394.
[Figure (a) Traditional bus configuration: local bus, main memory, local I/O controller, SCSI, modem, serial]
[Figure, second panel: cache, main memory, local I/O controller, SCSI, FAX, P1394, video, LAN, expansion bus interface]
1.6 Software
Microcomputer software is divided into two broad categories, system software and
user software.
System Software
The editor is a program which is used to create and modify source programs/text
(letters, numbers, punctuation marks, assembly language programs, higher level
language programs such as PASCAL, C, FORTRAN etc.). The editor has commands
to change, delete or insert lines or characters.
Assembler
The assembler translates an assembly language source file that was created using the
editor into machine language such as binary or object code. The assembler reads the
source file of your program from the disk where you saved it after editing. An
assembler usually reads your source file more than once.
The assembler generates two files on the floppy or hard disk during these two
passes. The first file is called the object file. The object file contains the binary codes
for the instructions and information about the addresses of the instructions. The
second file generated by the assembler is called assembler list file. This file contains
the assembly language statements, the binary code for each instruction, and the offset
for each instruction.
In the first pass, the assembler performs the following operations :
1. Reading the source program instructions.
2. Creating a symbol table in which all symbols used in the program, together
with their attributes, are stored.
3. Replacing all mnemonic codes by their binary codes.
4. Detecting any syntax error in the source program.
5. Assigning relative addresses to instructions and data.
On a second pass through the source program, the assembler extracts the symbol
from the operand field and searches for it in the symbol table. If the symbol does not
appear in the table, the corresponding statement is erroneous. If the symbol
does appear in the table, the symbol is replaced by its address or value.
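The two passes described above can be sketched in Python for a toy assembly language. Everything specific here is an assumption made for illustration: the mnemonics, the WORD data directive, the "label:" syntax and the numeric encoding (opcode x 100 + address) are hypothetical and are not the format of any real assembler.

```python
# Hypothetical mnemonic table for the toy language.
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2, "STORE": 3}


def first_pass(lines):
    """Build the symbol table, replace mnemonics by codes, assign relative addresses."""
    symbols = {}
    statements = []
    for line in lines:
        text = line.strip()
        if ":" in text:
            label, text = text.split(":", 1)
            symbols[label.strip()] = len(statements)  # relative address of the label
            text = text.strip()
        if not text:
            continue
        parts = text.split()
        mnemonic = parts[0]
        if mnemonic == "WORD":
            code = None                    # data word, not an instruction
        elif mnemonic in OPCODES:
            code = OPCODES[mnemonic]       # mnemonic replaced by its numeric code
        else:
            raise SyntaxError(f"unknown mnemonic: {mnemonic}")
        operand = parts[1] if len(parts) > 1 else None
        statements.append((code, operand))
    return symbols, statements


def second_pass(symbols, statements):
    """Replace each symbolic operand by its address or value from the symbol table."""
    machine_code = []
    for code, operand in statements:
        if operand is None:
            value = 0
        elif operand.isdigit():
            value = int(operand)
        elif operand in symbols:
            value = symbols[operand]
        else:
            raise NameError(f"undefined symbol: {operand}")
        machine_code.append(value if code is None else code * 100 + value)
    return machine_code


SOURCE = [
    "START: LOAD X",
    "       ADD Y",
    "       STORE SUM",
    "       HALT",
    "X:     WORD 7",
    "Y:     WORD 5",
    "SUM:   WORD 0",
]

if __name__ == "__main__":
    table, parsed = first_pass(SOURCE)
    print(table)                        # {'START': 0, 'X': 4, 'Y': 5, 'SUM': 6}
    print(second_pass(table, parsed))   # [104, 205, 306, 0, 7, 5, 0]
```

A second pass is needed because a statement such as LOAD X refers to a symbol that is only defined later in the source, so its address is unknown when the statement is first read.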
Macro Assembler
A very useful facility provided by many assemblers is the use of macros. A macro
is a sequence of instructions to which a name is assigned. When the macro is
referenced by specifying its name, the macro assembler replaces the macro call by the
sequence of instructions that define the macro. The macro assembler functions in a
similar manner to the assembler described earlier. However, it has to perform the
additional task of macro expansion before the assembly program is translated into an
equivalent machine language program.
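Macro expansion can be sketched as a simple text substitution performed before assembly. The Python below is a minimal illustration under assumptions: macros take no parameters, are not nested, and are supplied as a dictionary; the macro name SWAP_AB and the instructions in it are hypothetical.

```python
def expand_macros(lines, macros):
    """Replace every line that is exactly a macro name by the macro's instructions."""
    expanded = []
    for line in lines:
        name = line.strip()
        if name in macros:
            expanded.extend(macros[name])   # macro call replaced by its definition
        else:
            expanded.append(line)
    return expanded


if __name__ == "__main__":
    macros = {"SWAP_AB": ["LOAD A", "STORE TEMP", "LOAD B", "STORE A", "LOAD TEMP", "STORE B"]}
    program = ["LOAD X", "SWAP_AB", "HALT"]
    for statement in expand_macros(program, macros):
        print(statement)
```

The expanded program is then passed to the ordinary assembler, so the macro call costs no extra instructions at run time compared with writing the sequence out by hand.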
Cross Assembler
The distinguishing feature of a cross assembler is that it is not written in the same
language used by the microprocessor that will execute the machine code generated by the
assembler. A cross assembler is usually written in a high-level language such as
FORTRAN, PASCAL or C, which makes it machine independent. For example, a
Z80 assembler may be written in C and then the assembler may be executed on another
machine such as the Motorola 6800.
Meta Assembler
The most powerful assembler is the meta assembler because it supports many
different microprocessors.
Linker
A linker is a program used to join together several object files into one large object
file. When writing large programs, it is usually much more efficient to divide the large
program into smaller modules. Each module can be individually written, tested and
debugged. When all the modules work, they can be linked together to form a large
functioning program.
The linker produces a link file which contains the binary codes for all the
combined modules. The linker also produces a link map which contains the address
information about the link files. The linker, however, does not assign absolute
addresses to the program; it only assigns relative addresses starting from zero. This
form of the program is said to be relocatable as it can be put anywhere in memory for
execution.
Locator
A locator is a program used to assign the specific addresses, at which the object
code is to be loaded into memory. A locator program that comes with the IBM PC
Disk Operating System (DOS) is called EXE2BIN.
Interpreter and Compiler
An interpreter processes higher level language programs. At a time, an interpreter
executes one statement of the higher level language. Unlike an interpreter, a compiler
takes the source program written in a higher level language and translates the whole program
into machine language. Fig. 1.8 shows the operation of an interpreter. The interpreter reads
a high level language statement of the source program, translates the statement into
machine code and, if it doesn't need information from another instruction,
executes the code for that statement immediately. It then reads the next high level
language source statement, translates it, and executes it. BASIC programs are often
executed in this way.
[Figure: flowchart with recovered steps: create source program, read source program, translate statement to machine code]
Fig. 1.8 Operation of Interpreter
The advantage of using an interpreter is that if an error is found, you can just correct
the source program and immediately rerun it. The major disadvantage of the interpreter
approach is that an interpreted program runs 5 to 25 times slower than the same program will
run after being compiled. The reason is that with an interpreter each statement must be
translated to machine code every time the program is run.
[Figure: flowchart with recovered steps: create source program, compile to relocatable machine code]
Therefore, it will run much faster than it would if executed by an interpreter. The
major disadvantage of the compiler approach is that when an error is found, it usually
must be corrected in the source program and the entire compile-load sequence
repeated.
Debugger
A debugger is a program which allows you to load your object code program into
system memory, execute the program, and debug it.
How does a debugger help in debugging a program ?
1. The debugger allows you to look at the contents of registers and memory
locations after your program runs.
2. It allows you to change the contents of register and memory locations and
rerun the program.
3. Some debuggers allow you to stop execution after each instruction so you can
check or alter memory and register contents.
4. A debugger also allows you to set a breakpoint at any point in your program.
When you run a program, the system will execute instructions up to this
breakpoint and stop. You can then examine register and memory contents to
see if the results are correct at that point. If the results are correct, you can
move the breakpoint to a later point in your program. If the results are not
correct, you can check the program up to that point to find out why they are
not correct.
In short, debugger tools can help you to isolate problems in your program.
Operating System
An operating system performs resource management and provides an interface
between the user and the machine. A resource may be the microprocessor, memory, or
an I/O device. Basically, an operating system is a collection of system programs that
tells the machine what to do under a variety of conditions. Major operating system
functions include efficient sharing of memory, I/O peripherals, and the microprocessor
among several users. Along with DOS, UNIX and WINDOWS are the popular
operating systems used today.
1.7 Performance
When we say one computer is faster than another, we compare their speeds and
observe that the faster computer runs a program in less time than the other computer.
The computer center manager running a large server system may say a computer is
faster when it completes more jobs in an hour. The computer user is always interested
in reducing the time between the start and the completion of the program or event,
i.e. reducing the execution time. The execution time is also referred to as response
time. Reduction in response time increases the throughput (the total amount of work
done in a given time). The performance of the computer is directly related to
throughput and hence it is the reciprocal of execution time.
$$\text{Performance}_A = \frac{1}{\text{Execution time}_A}$$
This means that for two computers A and B, if the performance of A is greater
than the performance of B, we have
$$\text{Performance}_A > \text{Performance}_B$$
$$\frac{1}{\text{Execution time}_A} > \frac{1}{\text{Execution time}_B}$$
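If computer A ...
$$\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{25}{10} = 2.5$$
and A is therefore 2.5 times faster than B.
In the above example, we could also say that computer B is 2.5 times slower than
computer A, since
$$\frac{\text{Performance}_A}{\text{Performance}_B} = 2.5$$
means that
$$\frac{\text{Performance}_A}{2.5} = \text{Performance}_B$$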
For simplicity, we will normally use the terminology faster than when we try to
compare computers quantitatively. Because performance and execution time are
reciprocals, increasing performance requires decreasing execution time. To avoid the
potential confusion between the terms increasing and decreasing, we usually say
"improve performance" or "improve execution time" when we mean "increase
performance" and "decrease execution time".
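The reciprocal relationship between performance and execution time can be checked with a few lines of Python. The execution times 10 and 25 below are assumed values (in arbitrary time units) chosen to match the ratio 25/10 = 2.5 used in the comparison above; the function names are illustrative.

```python
def performance(execution_time: float) -> float:
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time


def times_faster(time_a: float, time_b: float) -> float:
    """How many times faster A is than B: Performance_A / Performance_B = time_B / time_A."""
    return time_b / time_a


if __name__ == "__main__":
    time_a, time_b = 10.0, 25.0            # assumed execution times
    print(times_faster(time_a, time_b))    # 2.5
    # The same ratio computed from the two performance values (up to rounding):
    print(performance(time_a) / performance(time_b))
```

A ratio greater than 1 means A is faster; swapping the arguments gives the reciprocal, which is the sense in which B is "2.5 times slower".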
The ideal performance of a computer system is achieved when we have a perfect
match between the machine capability and the program behaviour. The machine
capability can be enhanced with better hardware technology, innovative architectural
features, and efficient resource management. However, program behaviour is difficult
to predict since it heavily depends on the application and run-time conditions. The
program behaviour also depends on the algorithm design, data structures used,
language efficiency, programmer skill and compiler technology. Let us see the factors
for projecting the performance of a computer.
This formula makes it clear that the hardware designer can improve performance
by reducing either the length of the clock cycle or the number of clock cycles required
for a program.
1.7.3.1 Hardware Software Interface
The previous equation does not include any reference to the number of instructions
needed for the program. However, since the compiler clearly generated instructions to
execute and the computer had to execute the instructions to run the program, the
execution time must depend on the number of instructions in a program.
For the execution of a program, the processor has to execute a number of machine
language instructions. This number is denoted by N. The number N is the actual
number of instructions executed by the processor and is not necessarily equal to the
number of machine instructions in the machine language program. This is because
some instructions may be executed more than once in a loop and others may not be
executed at all. Each machine instruction takes one or more clock cycles for execution.
This time is required to perform the various steps needed to execute a machine instruction.
The average number of basic steps required to execute one machine instruction is
denoted by S, where each basic step is completed in one clock cycle. Thus, the
program execution time is given by
$$T = \frac{N \times S}{R} \qquad \ldots (1)$$
... (3)
where p is the number of processor cycles required for the instruction decode and
execute, m is the number of memory references needed, k is the ratio between
memory cycle and processor cycle, N is the machine instruction count, and R is the
clock rate. The above performance parameters, i.e. N, p, m, k, R, are affected by four
system attributes : instruction set architecture, compiler technology, CPU
implementation and control, and cache and memory hierarchy, as shown in Table 1.1.
Performance parameters : machine instruction count (N), processor cycles per instruction (p), memory references per instruction (m), memory access latency (k), clock rate (R)
System attribute                          N    p    m    k    R
Instruction set architecture              ✓    ✓    -    -    -
Compiler technology                       ✓    ✓    ✓    -    -
Processor implementation and control      -    ✓    -    -    ✓
Cache and memory hierarchy                -    -    -    ✓    ✓
Table 1.1
The instruction set architecture affects the machine instruction count (N), i.e. the
program length, and the average processor cycles required per instruction (p). The
compiler technology affects the values of N, p and the memory reference count (m).
The processor implementation and control determine the total processor time (p/R)
required. Finally, the memory technology and hierarchy design affect the memory
access latency (k/R).
Example 1.4 : Let us assume that two computers use the same instruction set architecture.
Computer A has a clock cycle time of 250 ps and a CPI of 2.0 for some program and
computer B has a clock cycle time of 500 ps and a CPI of 1.2 for the same program.
Which computer is faster for this program and by how much ?
Solution : We know that each computer executes the same number of instructions for
the program; let's call this number N. First, find the number of processor clock cycles
for each computer :
$$\text{CPU clock cycles}_A = N \times 2.0$$
$$\text{CPU clock cycles}_B = N \times 1.2$$
Next, multiply by the clock cycle time to get the execution time of each computer :
$$\text{Execution time}_A = N \times 2.0 \times 250\ \text{ps} = 500N\ \text{ps}$$
$$\text{Execution time}_B = N \times 1.2 \times 500\ \text{ps} = 600N\ \text{ps}$$
Thus we can say that computer A is faster. The amount faster is given by the ratio
of the execution times.
$$\frac{\text{CPU Performance}_A}{\text{CPU Performance}_B} = \frac{\text{Execution time}_B}{\text{Execution time}_A} = \frac{600N\ \text{ps}}{500N\ \text{ps}} = 1.2$$
We can conclude that computer A is 1.2 times faster than computer B for this
program.
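The arithmetic of Example 1.4 can be reproduced with a short Python sketch. The clock cycle times and CPI values are those given in the example; the instruction count of one million is an arbitrary assumption, since N cancels out of the final ratio, and the function name is illustrative.

```python
def execution_time_ps(instruction_count: float, cpi: float, cycle_time_ps: float) -> float:
    """Execution time = instruction count x cycles per instruction x clock cycle time."""
    return instruction_count * cpi * cycle_time_ps


if __name__ == "__main__":
    n = 1_000_000                                  # assumed value of N; it cancels in the ratio
    time_a = execution_time_ps(n, 2.0, 250.0)      # computer A: 500 N ps
    time_b = execution_time_ps(n, 1.2, 500.0)      # computer B: 600 N ps
    print(time_a / n, time_b / n)                  # per-instruction time in ps for A and B
    print(time_b / time_a)                         # how many times faster A is (about 1.2)
```

Computer B has the lower CPI, but its clock cycle is twice as long, so the product of the two factors decides which machine is faster.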
1.7.3.2 Other Performance Measures
MIPS is another way to measure the processor speed. The processor speed can
be measured in terms of million instructions per second (MIPS). It is given as
$$\text{MIPS rate} = \frac{1}{\text{Average time required for the execution of an instruction} \times 10^6} = \frac{R}{\text{CPI} \times 10^6} = \frac{N \times R}{N \times \text{CPI} \times 10^6} \qquad \ldots (4)$$
$$\text{MIPS rate} = \frac{N}{T \times 10^6}$$
Referring to equation (2) we can also write
$$\text{MIPS rate} = \frac{N \times R}{C \times 10^6} \qquad \ldots (5)$$
where C is the total number of clock cycles required to execute a given program
(N x CPI).
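The equivalent forms of the MIPS rate can be compared numerically. In the Python sketch below, the clock rate (500 MHz), CPI (2) and instruction count (one million) are assumed example values, and the function names are illustrative; the three functions correspond to the forms R / (CPI x 10^6), N / (T x 10^6) and N x R / (C x 10^6).

```python
def mips_from_cpi(clock_rate_hz: float, cpi: float) -> float:
    """MIPS rate = R / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


def mips_from_time(instruction_count: float, execution_time_s: float) -> float:
    """MIPS rate = N / (T x 10^6)."""
    return instruction_count / (execution_time_s * 1e6)


def mips_from_cycles(instruction_count: float, clock_rate_hz: float, total_cycles: float) -> float:
    """MIPS rate = N x R / (C x 10^6), where C = N x CPI."""
    return instruction_count * clock_rate_hz / (total_cycles * 1e6)


if __name__ == "__main__":
    rate_hz, cpi, n = 500e6, 2.0, 1_000_000    # assumed example values
    cycles = n * cpi                           # C = N x CPI
    time_s = cycles / rate_hz                  # T = C / R
    print(mips_from_cpi(rate_hz, cpi))             # 250.0
    print(mips_from_time(n, time_s))               # also about 250
    print(mips_from_cycles(n, rate_hz, cycles))    # also about 250
```

All three forms give the same figure because they are rearrangements of one relationship between N, CPI, R and T.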
Throughput Rate
Another important measure of performance is known as throughput rate. It
indicates the number of programs a system can execute per unit time. It is often
specified as programs/second. Throughput can be further measured separately for the
system (Ws) and for the processor (Wp). The processor throughput is given as
$$W_p = \frac{\text{Number of machine instructions executed per second}}{\text{Number of machine instructions per program}} = \frac{\text{MIPS rate} \times 10^6}{N} \qquad \ldots (6)$$
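The processor throughput relation can be evaluated directly. The following Python sketch assumes example values (a 250 MIPS processor and programs of one million executed instructions each) purely for illustration; the function name is hypothetical.

```python
def processor_throughput(mips_rate: float, instructions_per_program: float) -> float:
    """Wp = (MIPS rate x 10^6) / N, in programs per second."""
    return mips_rate * 1e6 / instructions_per_program


if __name__ == "__main__":
    # Assumed example: 250 MIPS, one million executed instructions per program.
    print(processor_throughput(250.0, 1_000_000))   # 250.0
```

Doubling the instruction count of each program halves the number of programs completed per second at the same MIPS rate.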
Load the operands from the main memory if they are not in the CPU
registers.
Store the results in the main memory unless they are to be retained in CPU
registers.
Not all instructions need to perform all the steps listed above. When an instruction
has all its operands in CPU registers, it will run faster, whereas an instruction which
requires multiple memory accesses takes more time to execute. Let us consider two
programs P1 and P2, with instructions having all operands in the CPU and with
instructions having all operands in the memory, respectively. Also consider two
computers C1 and C2. The clock speed of C1 is greater than the clock speed of C2 ;
however the memory access time in C1 is greater than the memory access time in C2.
With these conditions we can easily understand that C1 will execute the
program P1 faster than C2, and C2 will execute the program P2 faster than C1. In such a
situation it is difficult to decide which computer is faster. Therefore, measures of
instruction execution performance are based on average figures, which are usually
determined experimentally by measuring the run times of representative programs called
benchmark programs. In recent years, it has become popular to put together collections
of benchmarks to try to measure the performance of processors with a variety of
applications. The benchmark programs are different for checking the performance of a
processor for different applications. According to applications the benchmark programs
are classified as :
Desktop Benchmark
Server Benchmark and
Embedded Benchmark
Desktop Benchmarks
We know that servers have to perform many functions, so there are multiple types
of benchmark programs for servers.
CPU throughput oriented benchmark : This benchmark program can be used to
Automotive/ industrial
Consumer
Networking
Office automation
Telecommunications
The selected benchmark programs are compiled for the computer under test, and
the running time on a real computer is measured. The same benchmark program is
also compiled and run on the reference computer. A nonprofit organisation called
System Performance Evaluation Corporation (SPEC) specified the benchmark programs
and reference computers in 1995 and again in 2000. For SPEC95, the reference
computer is the SUN SPARCstation 10/40 and for SPEC2000, the reference computer
is an UltraSPARC10 workstation with a 300 MHz UltraSPARC-IIi processor.
The running time of a benchmark program is compared for computer under test
and the reference computer to decide the SPEC rating of the computer under test. The
SPEC rating is given by
$$\text{SPEC rating} = \frac{\text{Running time on the reference computer}}{\text{Running time on the computer under test}}$$
The SPEC rating for all selected programs is individually calculated and then the
geometric mean of the results is computed to determine the overall SPEC rating for
the computer under test. It is given by
$$\text{SPEC rating} = \left( \prod_{i=1}^{n} \text{SPEC}_i \right)^{1/n}$$
where n is the number of benchmark programs used for determining the SPEC rating.
Computers providing higher performance have a higher SPEC rating.
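The two steps described above (a per-program ratio, then a geometric mean) can be sketched in Python as follows. The function names and the running times are hypothetical; they are not taken from any published SPEC result.

```python
import math


def spec_rating(reference_time, test_time):
    """SPEC rating of one benchmark: reference time / time on computer under test."""
    return reference_time / test_time


def overall_spec_rating(ratings):
    """Overall rating: geometric mean (n-th root of the product) of n ratings."""
    n = len(ratings)
    return math.prod(ratings) ** (1.0 / n)


# Hypothetical running times in seconds: (reference computer, computer under test).
times = [(100.0, 50.0), (80.0, 10.0), (90.0, 22.5)]
ratings = [spec_rating(ref, test) for ref, test in times]
overall = overall_spec_rating(ratings)
print("Individual ratings:", ratings)
print("Overall SPEC rating:", round(overall, 3))
```

The geometric mean is used so that one very large ratio does not dominate the overall figure the way it would in an arithmetic mean.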
Addresses
Numbers
Characters
Logical Data
Addresses : The addresses are in fact a form of data. In many situations, some
calculation must be performed on the operand reference in an instruction to determine
physical address.
Numbers : All computers support numeric data types. The common numeric data
types are :
Integer or Fixed Point
Floating-point
Decimal
different characters can be represented. However, the ASCII encoded characters are
always stored and transmitted using 8-bits per character. The eighth bit may be set to
0 or used as a parity bit for error detection.
Another code used to encode characters is the Extended Binary Coded Decimal
Interchange Code (EBCDIC).
Logical Data : Most processors interpret data as a bit, byte, word, or double
word. These are referred to as units of data. When data is viewed as n 1-bit items of
data, each item having the value 0 or 1, it is considered as logical data. Logical
data is used to store an array of Boolean or binary data items, and with logical data
we can manipulate the individual bits of data items.
A computer has a set of instructions that allows the user to formulate any
data-processing task. To carry out tasks, regardless of whether a computer has 100
instructions or 300 instructions, its instructions must be capable of performing
following basic operations :
I/O control.
Data Transfer Instructions : Data transfer instructions include the instructions for
data transfer between the memory and the processor register. The instructions may
include the byte transfer or word transfer instructions.
Arithmetic or Logical Instructions : These instructions are also known as data
processing instructions. The arithmetic instructions provide computational capabilities
for processing numeric data, whereas logic instructions provide capabilities of
performing logical operations on the bits of a word.
Program Sequencing and Control Instructions : This instruction type mainly
includes test and branch instructions. Test instructions are used to test the value of a
data word or the status of a computation. Branch instructions are used to branch to a
different set of instructions depending on the decision made.
I/O Control : The I/O control instructions include the instructions for data transfer
between the processor and input/output devices. The instructions may include byte
transfer or word transfer instructions.
Before discussing these instructions, let us first understand some notations used to
represent these instructions.
Processor registers are represented by notations R0, R1, R2, ... and so on.
The contents of register or memory location are denoted by placing square
brackets around the name of the register or memory location.
Let us see the following examples for clear understanding.
Example : R2 ← [LOC]
This expression states that the contents of memory location LOC are transferred
into the processor register R2.
Example : R3 ← [R1] + [R2]
This expression states that the contents of processor registers R1 and R2 are added
and the result is stored into the processor register R3.
Example : LOC ← [R1] - [R2]
This expression states that the contents of the processor register R2 are subtracted
from the contents of processor register R1 and the result is stored into the memory location LOC.
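The three transfers in the examples above can be mimicked with two Python dictionaries, one standing for the register file and one for memory. This is only a sketch: the initial values 7, 5 and 42 are assumptions made up for the illustration.

```python
# Hypothetical initial state of the processor registers and of memory.
registers = {"R1": 7, "R2": 5, "R3": 0}
memory = {"LOC": 42}

# First example: the contents of memory location LOC are transferred into R2.
registers["R2"] = memory["LOC"]

# Second example: the contents of R1 and R2 are added, result stored in R3.
registers["R3"] = registers["R1"] + registers["R2"]

# Third example: the contents of R2 are subtracted from the contents of R1,
# result stored in memory location LOC.
memory["LOC"] = registers["R1"] - registers["R2"]

print(registers, memory)
```

In each statement the right-hand side reads contents (the square brackets of the notation) and the left-hand side names the location that is overwritten.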
The notations explained above are commonly known as register transfer notations
(RTN). In these notations, the data represented by the right-hand side of the
expression is transferred to the location specified by the left-hand side of the
expression.
MOVE R2, R1
This expression states that the contents of processor register R2 are transferred to
processor register R1. Thus the contents of register R2 remain unchanged but the contents of
register R1 are overwritten.
This expression states that the contents of processor registers R1 and R2 are added
and the result is stored in the register R3.
It is important to note that the above expressions written in the assembly language
notation have three fields : operation, source and destination, having their positions
from left to right. This order is followed by many computers. But there are many
computers in which the order of source and destination operands is reversed.
ADD A, B, C
where A, B, C are the variables. These variable names are assigned to distinct
locations in the memory. In this instruction operands A and B are called source
operands, operand C is called the destination operand and ADD is the operation to be
performed on the operands. Thus the general instruction format for a three address
instruction is
Operation Source 1, Source 2, Destination
The number of bits required to represent such an instruction includes :
1. Bits required to specify the three memory addresses of the three operands. If
n bits are required to specify one memory address, 3n bits are required to
specify three memory addresses.
Fewer bits are required to represent this instruction than a three
address instruction. The number of bits required to represent a two address instruction
includes :
1. Bits required to specify the two memory addresses of the two operands, i.e. 2n
bits.
2. Bits required to specify the operation.
1.8.3.3 One Address Instruction
The one address instruction can be represented symbolically as
ADD B
This instruction adds the contents of variable B into the processor register called
accumulator and stores the sum back into the accumulator, destroying the previous
contents of the accumulator. In this instruction the second operand is assumed
implicitly in a unique location, the accumulator. The general instruction format for a one
address instruction is
Operation Source
STORE B :
In a one address instruction, it is important to note that the operand specified in the
instruction can be either a source operand or a destination operand depending on the
instruction. For example, in the LOAD A instruction, the operand specified in the
instruction is a source operand, whereas the operand specified in the STORE B
instruction is a destination operand. Similarly, in a one address instruction the implied
operand (accumulator) can be either source or destination depending on the
instruction.
From the above discussion we can easily understand that an instruction with only one
address requires fewer bits to represent it, and an instruction with three
addresses requires more bits to represent it. Therefore, to fetch the entire
instruction from the memory, an instruction with three addresses requires more
memory accesses while an instruction with one address requires fewer memory accesses.
The speed of instruction execution mainly depends on how many memory accesses it
requires for execution. If more memory accesses are needed, more time is required to
execute the instruction. Therefore, the execution time for three address instructions is
more than the execution time for one address instructions.
To reduce execution time we have to use instructions with minimum memory
accesses. For this, instead of referring to operands in memory, it is advisable to refer to
operands in processor registers. When machine level language programs are
generated by compilers from high-level languages, intelligent compilers see that
most references to operands lie in the processor registers.
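The bit counts discussed for three, two and one address instructions can be compared with a small Python sketch. The 16-bit address width and 8-bit operation field used below are assumptions for illustration only; the text itself leaves both widths symbolic.

```python
def instruction_bits(num_addresses, address_bits, opcode_bits):
    """Bits needed for an instruction: one address field per operand plus the opcode."""
    return num_addresses * address_bits + opcode_bits


# Assumed widths: n = 16 bits per memory address, 8 bits for the operation.
ADDRESS_BITS = 16
OPCODE_BITS = 8

sizes = {k: instruction_bits(k, ADDRESS_BITS, OPCODE_BITS) for k in (3, 2, 1)}
for k, bits in sizes.items():
    print(f"{k}-address instruction: {bits} bits")
```

With these assumed widths, a three address instruction is more than twice as long as a one address instruction, which is why it needs more memory accesses to fetch.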
[Figure: instruction cycle consisting of the fetch cycle, the decode cycle (decode instruction) and the execute cycle (execute instruction)]
execution.
new instruction cycle from where it has been interrupted. The Fig. 1.11 shows
instruction cycle with interrupt cycle.
[Fig. 1.11 Instruction cycle with interrupt cycle: fetch cycle, decode instruction, execute instruction (execute cycle), interrupt check, interrupt cycle]
Instruction Fetch Cycle :
In this cycle, the instruction is fetched from the memory location whose address is
in the PC. This instruction is placed in the instruction register (IR) in the processor.
Instruction Decode Cycle :
In this cycle, the opcode of the instruction stored in the instruction register is
decoded/examined to determine which operation is to be performed.
Instruction Execution Cycle :
In this cycle, the specified operation is performed by the processor. This often
involves fetching operands from the memory or from processor registers, performing
an arithmetic or logical operation, and storing the result in the destination location.
During the instruction execution, PC contents are incremented to point to the next
instruction. After completion of execution of the current instruction, the PC contains
the address of the next instruction, and a new instruction fetch cycle can begin.
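The fetch, decode and execute cycles described above can be sketched as a small loop in Python. The three-instruction machine below (MOV, ADD, HALT, with the destination written first) is hypothetical and exists only for this illustration; the sketch also increments the PC right after the fetch, which is one common arrangement.

```python
def run(program):
    """Run a list of instruction tuples through a fetch-decode-execute loop."""
    registers = {"R0": 0, "R1": 0}
    pc = 0                                  # program counter
    while pc < len(program):
        ir = program[pc]                    # fetch: instruction goes into IR
        pc += 1                             # PC now holds the next address
        opcode, *operands = ir              # decode: split opcode and operands
        if opcode == "MOV":                 # execute: load a constant
            dst, value = operands
            registers[dst] = value
        elif opcode == "ADD":               # execute: dst <- dst + src
            dst, src = operands
            registers[dst] += registers[src]
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return registers


# Hypothetical straight-line program: R0 <- 5, R1 <- 7, R0 <- R0 + R1.
final = run([("MOV", "R0", 5), ("MOV", "R1", 7), ("ADD", "R0", "R1"), ("HALT",)])
print(final)
```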
1.8.5 Branching
It is not always possible to store a program in consecutive memory
locations. After execution of a decision making instruction we have to follow one of the
two program sequences. In such cases we cannot use straight-line sequencing. Here,
we have to use branch instructions to transfer the program control from one
straight-line sequence to another straight-line sequence of instructions, as shown in the
following program.
For example, see the
have to check whether A
A - B or B - A.
MOV NUM1,
MOV NUM2,
CMP R0,
JB NEXT
SUB R0, R1
MOV R2, R1
SUB R1, R0 ; NUM2 ← NUM2 - NUM1
MOV R2, R0 ; Store the result in R2
In the above program we have used the JB NEXT instruction to transfer the program
control to the instruction SUB R1, R0 if NUM1 is less than NUM2. Thus we have
decided to branch the program control after checking the condition. Such branch
instructions are called conditional branch instructions. We discuss more about them in
section 1.8.6. In branch instructions the new address, called the target address or branch
target, is loaded into the PC and the instruction is fetched from the new address, instead of
the instruction at the location that follows the branch instruction in sequential address
order.
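To make the effect of a conditional branch on the PC concrete, here is a minimal Python sketch of a toy machine that computes the difference between two numbers, always subtracting the smaller from the larger. The instruction set, the destination-first operand order, the numeric branch addresses and the extra JMP used to skip the second sequence are all assumptions of this sketch, not a reconstruction of the program listed above.

```python
def run(program):
    """Execute tuples such as ("SUB", "R0", "R1") on a toy three-register machine."""
    regs = {"R0": 0, "R1": 0, "R2": 0}
    below = False                      # set by CMP when first operand < second
    pc = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1                        # default: next instruction in sequence
        if op == "MOV":                # dst <- register contents or constant
            dst, src = args
            regs[dst] = regs[src] if isinstance(src, str) else src
        elif op == "SUB":              # dst <- dst - src
            dst, src = args
            regs[dst] -= regs[src]
        elif op == "CMP":
            a, b = args
            below = regs[a] < regs[b]
        elif op == "JB":               # conditional branch: load target into PC
            if below:
                pc = args[0]
        elif op == "JMP":              # unconditional branch
            pc = args[0]
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return regs


def difference_program(num1, num2):
    return [
        ("MOV", "R0", num1),   # 0
        ("MOV", "R1", num2),   # 1
        ("CMP", "R0", "R1"),   # 2
        ("JB", 7),             # 3: if NUM1 < NUM2 go to address 7
        ("SUB", "R0", "R1"),   # 4: R0 <- NUM1 - NUM2
        ("MOV", "R2", "R0"),   # 5: store the result in R2
        ("JMP", 9),            # 6: skip the other sequence
        ("SUB", "R1", "R0"),   # 7: R1 <- NUM2 - NUM1
        ("MOV", "R2", "R1"),   # 8: store the result in R2
    ]


print(run(difference_program(3, 10))["R2"], run(difference_program(10, 3))["R2"])
```

When the branch is taken, the PC is overwritten with the target address, so instructions 4 to 6 are never fetched.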
NEXT:
The conditional branch instructions are used for program looping. In looping, the
program is instructed to execute a certain set of instructions repeatedly to perform a
particular task a number of times. For example, to add ten numbers stored in
consecutive memory locations we have to perform addition ten times.
The program loop is the basic structure which forces the processor to repeat a
sequence of instructions. Loops have four sections.
1. Initialization section.
2. Processing section.
3. Loop control section.
4. Result section.
... values of other variables.
2. The actual data manipulation occurs in the processing section. This is the
[Flowchart 1 and Flowchart 2: program loop with initialization section, processing section, loop control section and result section]
Note : The processor executes initialization section and result section only once,
while it may execute processing section and loop control section many times. Thus,
the execution time of the loop will be mainly dependent on the execution time of the
processing section and loop control section. The flowchart 1 shows typical program
loop. The processing section in this flowchart is always executed at least once. If you
interchange the position of the processing and loop control section then it is possible
that the processing section may not be executed at all, if necessary. Refer flowchart 2.
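The difference between the two flowcharts can be sketched in Python. Both functions below add `count` numbers from a list; the function names and the sample data are assumptions made for this illustration.

```python
def sum_process_first(data, count):
    """Flowchart 1 style: processing section runs before the loop control test."""
    total, index = 0, 0               # initialization section
    while True:
        total += data[index]          # processing section (runs at least once)
        index += 1
        count -= 1                    # loop control section
        if count == 0:
            break
    return total                      # result section


def sum_test_first(data, count):
    """Flowchart 2 style: loop control test comes before the processing section."""
    total, index = 0, 0               # initialization section
    while count != 0:                 # loop control section
        total += data[index]          # processing section (may never run)
        index += 1
        count -= 1
    return total                      # result section


numbers = list(range(1, 11))          # ten numbers: 1, 2, ..., 10
print(sum_process_first(numbers, 10), sum_test_first(numbers, 10))
print(sum_test_first(numbers, 0))
```

Only the test-first form handles a count of zero correctly; the process-first form assumes there is at least one element to process.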
bits in the status register. Status bits lead to a new set of microprocessor instructions.
These instructions permit the execution of a program to change flow on the basis of
the condition of bits in the status register. So the condition bits in the status register
can be used to take logical decisions within the program. Some of the common
condition code flags are :
1) Carry/Borrow : The carry bit is set when the sum of two 8-bit numbers is
greater than 1111 1111 (FFH). A borrow is generated when a larger number is
subtracted from a smaller number.
2) Zero : The zero bit is set when the contents of a register are zero after any
operation. This happens not only when you decrement the register, but also when any
arithmetic or logical operation causes the contents of the register to be zero.
3) Negative or Sign :
bit. If this bit is logic 1, the number is a negative number, otherwise it is a
positive number.
The negative bit or sign bit is set when any arithmetic or logical operation gives a
negative result.
Fig. 1.12 Signed number format: sign bit (S) followed by the magnitude bits
5) Overflow Flag : In 2's complement arithmetic, the most significant bit is used to
represent the sign and the remaining bits are used to represent the magnitude of a number (see
Fig. 1.12). This flag is set if the result of a signed operation is too large to fit in the
number of bits available (7 bits for an 8-bit number) to represent it.
For example, suppose you add the 8-bit signed number 01110110 (+118 decimal) and the
8-bit signed number 00110110 (+54 decimal). The result will be 10101100 (+172
decimal), which is the correct binary result, but in this case it is too large to fit in the
7 bits allowed for the magnitude in an 8-bit signed number. The overflow flag will be
set after this operation to indicate that the result of the addition has overflowed into
the sign bit.
6) Parity : When the result of an operation leaves the indicated register with an even
number of 1's, the parity bit is set.
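The flag definitions above can be checked with a short Python sketch of an 8-bit addition. The function name and the dictionary layout are assumptions of this sketch; the overflow test used here (both operands differ in sign from the result) is one standard way to detect 2's complement overflow.

```python
def add8_flags(a, b):
    """Add two 8-bit values and return (result, flags) for the flags described above."""
    raw = a + b
    result = raw & 0xFF                                   # keep the low 8 bits
    flags = {
        "carry": raw > 0xFF,                              # sum exceeded 1111 1111
        "zero": result == 0,
        "sign": (result & 0x80) != 0,                     # most significant bit
        # Overflow: the sign of the result differs from the sign of both operands.
        "overflow": ((a ^ result) & (b ^ result) & 0x80) != 0,
        "parity": bin(result).count("1") % 2 == 0,        # even number of 1's
    }
    return result, flags


# The example from the text: 01110110 (+118) + 00110110 (+54).
result, flags = add8_flags(0b01110110, 0b00110110)
print(format(result, "08b"), flags)
```

For the example operands the carry flag stays clear while the overflow and sign flags are set, which shows that carry and overflow are independent conditions.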
      ; Initialize the counter
      MOV R1, 0      ; Result = 0
BACK: ADD R1, (R2)   ; Result ← Result + array element
      INC R2
      DEC R0
      JNZ BACK       ; if count ≠ 0, repeat
Operations in 3 architectures :
Stack        Accumulator    GPR
PUSH A       LOAD A         LOAD R1, A
PUSH B       ADD B          ADD R1, B
ADD          STORE C        STORE R1, C
POP C
Not all processors can be neatly tagged into one of the above categories. The Intel
8086 has many instructions that use implicit operands although it has a general
register set. The Intel 8051 is another example : it has 4 banks of GPRs but most
instructions must have the A register as one of their operands.
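The stack and accumulator sequences for C = A + B can be traced with the following Python sketch. The two interpreter functions and the values chosen for A and B are hypothetical; they only model the handful of instructions that appear in the comparison.

```python
def run_stack(program, memory):
    """Stack machine: operands are implicit on top of the stack."""
    stack = []
    for op, *arg in program:
        if op == "PUSH":
            stack.append(memory[arg[0]])
        elif op == "ADD":                       # pops two values, pushes the sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "POP":
            memory[arg[0]] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory


def run_accumulator(program, memory):
    """Accumulator machine: one operand is always the implicit accumulator."""
    acc = 0
    for op, name in program:
        if op == "LOAD":
            acc = memory[name]
        elif op == "ADD":
            acc += memory[name]
        elif op == "STORE":
            memory[name] = acc
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return memory


stack_result = run_stack(
    [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")], {"A": 3, "B": 4, "C": 0})
acc_result = run_accumulator(
    [("LOAD", "A"), ("ADD", "B"), ("STORE", "C")], {"A": 3, "B": 4, "C": 0})
print(stack_result["C"], acc_result["C"])
```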
Let us see the advantages and disadvantages of above instruction set architecture.
Stack
Advantages :
Makes code generation easy. Data can be stored for long periods in registers.
1.10 RISC-CISC
As we mentioned before, most modern CPUs are of the GPR (General Purpose
Register) type. A few examples of such CPUs are the IBM 360, DEC VAX, Intel 80x86
and Motorola 68xxx. But while these CPUs were clearly better than previous stack and
accumulator based CPUs, they were still lacking in several areas :
1. Instructions were of varying length, from 1 byte to 6-8 bytes. This causes
problems with the pre-fetching and pipelining of instructions.
2. ALU (Arithmetic Logical Unit) instructions could have operands that were
memory locations. Because the number of cycles it takes to access memory
varies, so does the execution time of the whole instruction. This isn't good for
compiler writers, pipelining and multiple issue.
3. Most ALU instructions had only 2 operands, where one of the operands is also
the destination. This means this operand is destroyed during the operation or
it must be saved somewhere beforehand.
Thus in the early 80's the idea of RISC was introduced. The RISC project (the
forerunner of SPARC) was started at Berkeley and the MIPS project at Stanford. RISC stands for Reduced
Instruction Set Computer. The ISA is composed of instructions that all have exactly
the same size, usually 32 bits. Thus they can be pre-fetched and pipelined successfully.
All ALU instructions have 3 operands which are only registers. The only memory
access is through explicit LOAD/STORE instructions.
Thus C = A + B will be assembled as :
LOAD R1, A
LOAD R2, B
ADD R3, R1, R2
STORE C, R3
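[Fig. 1.13: RISC organization with hardwired control and separate instruction and data caches, and CISC organization with microprogrammed control memory and a unified cache, each connected to the main memory]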
As shown in Fig. 1.13, RISC architecture uses separate instruction and data caches.
Their access paths are also different (Note : exceptions do exist). In a CISC processor,
there is a unified cache for holding both instructions and data. Therefore they have to
share the same path for data and instructions.
Hardwired control is found in most RISC processors, while traditional CISC
processors use microprogrammed control. Thus a control memory is needed in these
processors. This may significantly slow down instruction execution. However,
modern CISC processors may also use hardwired control. So split caches and
hardwired control are not exclusive to RISC machines.
Let us compare the characteristics of RISC and CISC processors. Table 1.2 shows
the comparison between them.
No.    RISC                 CISC
       Few instructions     Many instructions
       Highly pipelined
MOV R1, R2
Example:
MOVE A, 2000
The above instruction copies the contents of memory location 2000 into the A
register. As shown in the instruction, here, address of operand is given explicitly in
the instruction.
The constants for address and data can be represented by immediate addressing
mode in assembly language programming. Let us see immediate addressing mode.
3. Immediate Mode :
Example :
MOVE 20 (R1), R2
The above instruction loads the contents of register R2 into the memory location
whose address is calculated by addition of the contents of register R1 and the constant
value (offset or displacement) 20.
Arrays are the most common way to structure data. The array data structure is
easy to handle and moreover it is supported by almost all programming languages
such as 'C', Pascal, Fortran etc. Many programming problems can be solved using
arrays.
For example : We need the information about the marks obtained by the students
in a particular subject, or we want to know the salary of every employee in some
company. Such information can be collectively stored in an array data structure.
We can visualize an array as :
Index : 0   1   2   3   4    5   6   7   8   9
Value : 10  20  40  70  101  30  45  78  75  100
Here a (2) = 40, a (8) = 75 and so on.
Fig. 1.15 Two-dimensional array
The two-dimensional array elements can be accessed by index addressing mode.
Using index addressing mode we can specify the row address in the register and the
offset gives the column address. For example : MOV R2, 10(R1). In this instruction R1
specifies the row address and offset 10 gives the column address.
6. Relative Mode :
This addressing mode is commonly used to specify the target address in branch
instructions. For example,
JNZ BACK
This instruction causes program execution to go to the branch target location
identified by the name BACK, if the branch condition is satisfied. The branch target
location can be determined by specifying it as an offset from the current value of the
program counter. Since the branch target location may be either before or after the branch
instruction, the offset is specified as a signed number.
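The offset arithmetic of relative mode can be sketched in Python as follows. The 8-bit width of the offset field and the sample addresses are assumptions made for this illustration; real instruction sets differ in field width and in which PC value the offset is measured from.

```python
def encode_offset(pc, target):
    """Return the signed offset from the current PC, stored as an 8-bit 2's complement byte."""
    offset = target - pc
    if not -128 <= offset <= 127:
        raise ValueError("target out of range for an 8-bit offset")
    return offset & 0xFF


def branch_target(pc, offset_byte):
    """Sign-extend the 8-bit offset and add it to the current PC."""
    offset = offset_byte - 256 if offset_byte & 0x80 else offset_byte
    return pc + offset


# Hypothetical addresses: the PC holds 0x1010 and the label BACK is at 0x1004.
encoded = encode_offset(0x1010, 0x1004)        # a backward branch of 12 bytes
print(hex(encoded), hex(branch_target(0x1010, encoded)))
```

Because only the distance is stored, the same encoded branch works wherever the program is loaded in memory.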
This instruction initially decrements the contents of register R0 and then the
decremented contents of register R0 are used to address the memory location. Finally,
the contents of the addressed memory location are copied into the register R1.
Register addressing
specified in the source register fields and the result is stored in the destination register
field. For example : ADD R1, R2, R3 ; R3 ← R1 + R2. In case of memory access, one
source register specifies the memory address and the second source register specifies the
offset. For example, LD (R1) R2 ; R3 ← M[R1 + R2].
Immediate Operand Addressing : In immediate operand addressing mode, the
second source is an immediate operand. The operation is performed with the data
specified in the source register field and the immediate operand, and the result is
stored in the destination register field. For example : ADD R1, #100, R2 ;
R2 ← R1 + 100
Relative to PC Addressing : In relative to PC addressing, the instruction usually
consists of three fields : opcode field, condition field and address field. The opcode field
specifies the operation. The condition field specifies one of many possible branch
conditions, and the address field specifies the signed offset which is to be added to the
contents of PC to calculate the new address when the branch condition is satisfied. For
example, JMP COND, R1 (R2) ; PC ← R1 + R2 if condition is satisfied.
Review Questions
1. Explain the various types of computers and their applications.
2. Draw and explain the block diagram of a simple computer with five functional units.
(CSE Nov./Dec.-2004, CSE April/May-2004)
3. Summarize the operation of computer.
4. Define computer memory and computer program.
5. Explain the function of each functional unit in the computer system.
6. Explain the stored program concept.
7. Explain the use of program counter and instruction register.
8. What is the role of program counter in addressing ?
(CSE Nov./Dec.-2003)
9. Draw and explain the connections between the processor and the main memory.
10. What do you mean by interrupt ?
JI. Wltat Is na!M lnttm1pt ?
12. Explain the single line bus structure.
13. Explain the multiple bus structure.
14. Define :
a) Execution time
b) Response time
15. Define processor clock and clock rate.
16. Explain the relation of throughput with execution time and response time.
17. Define MIPS rate and throughput rate.
18. What is MFLOPS ? What is its significance ?
Q.1 Which data structures can be best supported using (a) indirect addressing mode
(b) indexed addressing mode ?
(May/June-2006, CSE/IT, 2 Marks)
Ans. : Indirect addressing mode supports pointers and indexed addressing mode
supports arrays.
Q.2 Explain in detail the different types of instructions that are supported in a typical
processor.
(May/June-2006, CSE/IT, 10 Marks)
Q.3 Registers R1 and R2 of a computer contain the decimal values 1200 and 2400
respectively. What is the effective address of the memory operand in each of the
following instructions ?
1) Load 20 (R1), R5
2) Add - (R2), R5
3) Move # 3000, R5
4) Sub (R1) +, R5
Ans. : R1 = 1200, R2 = 2400
Sr. No.  Instruction         Effective address of memory operand
1        Load 20(R1), R5     1200 + 20 = 1220
2        Add -(R2), R5       2400 - 1 = 2399
3        Move #3000, R5      3000
4        Sub (R1)+, R5       1200
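The register-based answers above (index, autodecrement and autoincrement modes) can be reproduced with a small Python sketch. The function name, the mode names and the step size of 1 are assumptions of this sketch; the step of 1 simply matches the answer 2400 - 1 given above.

```python
def effective_address(mode, registers, reg, offset=0, step=1):
    """Return the effective address for a register-based addressing mode.

    Autoincrement and autodecrement also update the register as a side effect.
    """
    if mode == "index":                 # X(Rn): EA = offset + [Rn]
        return registers[reg] + offset
    if mode == "autodecrement":         # -(Rn): decrement first, then use [Rn]
        registers[reg] -= step
        return registers[reg]
    if mode == "autoincrement":         # (Rn)+: use [Rn], then increment
        ea = registers[reg]
        registers[reg] += step
        return ea
    raise ValueError(f"unknown mode {mode!r}")


regs = {"R1": 1200, "R2": 2400}
ea_load = effective_address("index", regs, "R1", offset=20)   # Load 20(R1), R5
ea_add = effective_address("autodecrement", regs, "R2")       # Add -(R2), R5
ea_sub = effective_address("autoincrement", regs, "R1")       # Sub (R1)+, R5
print(ea_load, ea_add, ea_sub, regs)
```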
Ans. :
Q.6
Ans. :
Q.7
What are the four basic types of operations that need to be supported by an
instruction set ?
[Nov./Dec.-2006, CSE/IT, 2 Marks]
Ans. : Refer section 1.8.
Define addressing mode. Classify addressing modes and explain each type with
examples.
[Nov./Dec.-2006, CSE/IT, 10 Marks]
Q.8
Ans. :
Q.9
Ans. :
Ans. :
Q.11
Ans. :
= 3 bits
$2^3 = 8 > 7$
= 32 - 20 - 3 - 6
$2^6 = 64 > 60$
= 3 bits
Q.14 What are the various types of Instruction Set Architectures (ISAs) possible ? Discuss.
[May/June-2007, ECE, 8 Marks]
Q.16 How many 128 x 8 RAM chips are needed to provide a memory capacity of
2048 bytes ?
(May/June-2007, ECE, 2 Marks)
Ans. : 16
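The chip count can be verified with a short calculation. The Python sketch below assumes that a 128 x 8 chip means 128 words of 8 bits each, that is 128 bytes per chip; the function name is hypothetical.

```python
def chips_needed(total_bytes, words_per_chip, bits_per_word):
    """Number of RAM chips needed to reach the requested capacity."""
    chip_bytes = words_per_chip * bits_per_word // 8
    return -(-total_bytes // chip_bytes)        # ceiling division


# 2048 bytes built from 128 x 8 chips (128 bytes each).
print(chips_needed(2048, 128, 8))
```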
Q.17
Ans. : An instruction format defines the layout of the bits of an instruction. It must
include opcode, zero or more operands and addressing mode for each operand. The
Q.19
in the instruction.
with an
Draw the single bus and three bus organization of the data path inside a processor.
[Nov./Dec.-2007, ECE, 4 Marks]
Why is the data bus in most microprocessors bidirectional while the address bus is
unidirectional ?
[May/June-2007, CSE, 2 Marks]
Q.27
Ans. : The idea of having a computer wired for general computations with the program
stored in memory was introduced by John Von Neumann when he was working as a
consultant at the Moore School. He and the originators of ENIAC designed the first
stored program computer named EDVAC (Electronic Discrete Variable Automatic Computer). The
stored program concept in EDVAC enabled users to enter and alter various
programs and do a variety of computations.
The EDVAC project was further developed by Von Neumann with his
collaborators at the Institute for Advanced Study (IAS) in Princeton. They came up
with a new machine referred to as the IAS or Von Neumann machine. It has now
become the usual frame of reference for many modern computers.
Fig. 1.18 shows the general structure of a Von Neumann machine. It consists of five
basic units whose functions can be summarized as follows :
The control unit fetches and interprets the instructions in memory and causes
them to be executed.
The output unit transmits final results and messages to the outside world.
In the original IAS machine (Von Neumann machine), the memory unit consists of
4096 storage locations (2^12 = 4096) of 40 bits each, referred to as words. These
memory locations are used to store data as well as instructions.
Q.29 What are the major instruction design issues ?
Number of operands
Number of registers
Q.30 Registers R1 and R2 of a computer contain the decimal values 1200 and 4600. What
is the effective address of the memory operand in each of the following instructions ?
a) Load 20 (R1), R5
Ans. : a) EA : 1200 + 20 = 1220
b) EA : 4599.
viewed critically ?
What are the softwares used in a computer to operate all the functional units ?
Discuss briefly on the bus structures.
[Nov./Dec.-2008, ECE, 6 Marks]
Q.41
Q.42 What are the major functions of system software in typical computer ?
[May/June-2009, ECll;, 1 Mub)
Ans. : Refer section 1.6.