mcs-012 Study Materials
1.0 Introduction
1.1 Objectives
1.2 The von Neumann Architecture
1.3 Instruction Execution: An Example
1.4 Instruction Cycle
1.4.1 Interrupts
1.4.2 Interrupts and Instruction Cycle
1.5 Computers: Then and Now
1.5.1 The Beginning
1.5.2 First Generation Computers
1.5.3 Second Generation Computers
1.5.4 Third Generation Computers
1.5.5 Later Generations
1.6 Summary
1.7 Solutions/Answers
1.0 INTRODUCTION
The use of Information Technology (IT) is well recognised. With growing information
technology trends, IT has become a must for the survival of all business houses.
The computer is the main component of an Information Technology network. Today,
computer technology has permeated every sphere of modern life. From
railway reservations to medical diagnosis, from TV programmes to satellite launching,
from matchmaking to catching criminals: everywhere we witness the elegance,
sophistication and efficiency possible only with the help of computers.
In this unit, you will be introduced to one of the important computer system
structures: the von Neumann Architecture. In addition, you will be introduced to
a simple model of instruction execution. This model will be enhanced in
the later blocks of this course. More details on these terms can be obtained from
further reading. We also discuss the main developments during the
various periods of computer history. Finally, we discuss the basic
components of microprocessors and their uses.
1.1 OBJECTIVES
After going through this unit you will be able to:
Introduction to Digital Circuits
architecture, let us first define the term ‘computer’, as this will help us in discussing
the von Neumann architecture in logical detail.
Computer is defined in the Oxford dictionary as “An automatic electronic apparatus
for making calculations or controlling operations that are expressible in numerical or
logical terms”.
The definition clearly categorises the computer as an electronic apparatus, although the
first computers were mechanical and electro-mechanical apparatuses. The definition
also points towards the two major areas of computer applications, viz., data
processing and computer-assisted control/operations. Another important aspect of
the definition is the fact that the computer can perform only those operations/
calculations which can be expressed in logical or numerical terms.
Some of the basic questions that arise from the above definition are:
How are the data processing and control operations performed by an electronic device
like the computer?
Well, electronic components are used for creating basic logic circuits that are used to
perform calculations. These components are further discussed in the later units.
However, for the present discussion, it would be sufficient to say that there must be a
certain unit that will perform the task of data processing and control.
What is the basic function performed by a computer? The basic function performed by
a computer is the execution of the program. A program is a sequence of instructions,
which operates on data, to perform certain tasks such as finding a prime number. The
computer controls the execution of the program.
What is data in computers? In modern digital computers data is represented in binary
form by using two symbols, 0 and 1. These are called binary digits or bits. But the data
which we deal with consists of numeric data and characters such as decimal digits 0 to
9, letters A to Z, arithmetic operators (e.g. +, -, etc.), relational operators (e.g. =, >,
etc.), and many other special characters (e.g. ;, @, {, ], etc.). Therefore, there has to be a
mechanism for data representation. Old computers used eight bits to represent a
character. This allows up to 2^8 = 256 different items to be represented uniquely. This
collection of eight bits is called a byte. Thus, one byte is used to represent one
character internally. Most computers use two bytes or four bytes to represent numbers
(positive and negative) internally. The data also includes operational data such as
integers, decimal numbers, etc. We will discuss data representation further in the
next unit.
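As a small illustration of the byte-based representation described above, the following Python sketch shows how many distinct items eight bits can encode and how one character maps to a bit pattern. The helper name to_bits is hypothetical, and the use of the ASCII code is an assumption made for illustration; the text does not name a specific character code.

```python
def to_bits(value, width=8):
    """Return the binary pattern of a non-negative integer, padded to `width` bits."""
    return format(value, "0{}b".format(width))

# Eight bits give 2**8 distinct patterns, so one byte can distinguish 256 items.
distinct_items = 2 ** 8

# Assumption: characters are coded in ASCII, where 'A' has the code 65.
code_of_a = ord("A")
pattern_of_a = to_bits(code_of_a)

print(distinct_items)   # 256
print(pattern_of_a)     # 01000001
```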
Thus, the prime task of a computer is to perform instruction execution. The key
questions, which can be asked in this respect, are: (a) how are the instructions
supplied to the computer? and (b) how are the instructions interpreted and executed?
Let us answer the second question first. All computers have a unit that performs the
arithmetic and logical functions. This unit is referred to as the Arithmetic Logic Unit
(ALU). But how will the computer determine what operation is to be performed by the
ALU or, in other words, what will interpret the operation that is to be performed by the
ALU?
This interpretation is done by the Control Unit of the computer. The control unit
accepts the binary form of instruction and interprets the instruction to generate control
signals. These control signals then direct the ALU to perform a specified arithmetic or
logic function on the data. Therefore, by changing the control signal the desired
function can be performed on data. Or conversely, the operations that need to be
performed on the data can be obtained by providing a set of control signals. Thus, for
a new operation one only needs to change the set of control signals.
The unit that interprets a code (a machine instruction) to generate respective control
signals is termed as Control Unit (CU). A program now consists of a sequence of
codes. Each code is, in effect, an instruction, for the computer. The hardware
The Basic Computer
interprets each of these instructions and generates respective control signals such that
the desired operation is performed on the data.
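The idea that a code selects the operation performed on the data can be sketched as follows. This is a toy model under stated assumptions, not the design of any real control unit: the two-bit codes, the signal names and the function names control_unit and alu are all hypothetical.

```python
# Hypothetical 2-bit operation codes, each mapped to a control signal name.
CONTROL_SIGNALS = {
    0b00: "ADD",
    0b01: "SUB",
    0b10: "AND",
    0b11: "OR",
}

def control_unit(code):
    """Interpret a binary code and return the control signal for the ALU."""
    return CONTROL_SIGNALS[code]

def alu(signal, a, b):
    """Perform the arithmetic or logic function selected by the control signal."""
    if signal == "ADD":
        return a + b
    if signal == "SUB":
        return a - b
    if signal == "AND":
        return a & b
    if signal == "OR":
        return a | b
    raise ValueError("unknown control signal: " + signal)

# The same data with different codes gives different operations.
print(alu(control_unit(0b00), 6, 3))  # 9
print(alu(control_unit(0b10), 6, 3))  # 2
```

Changing only the code passed to control_unit changes the function applied to the same operands, which mirrors the point that a new operation needs only a new set of control signals.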
The Arithmetic Logic Unit (ALU) and the Control Unit (CU) together are termed
the Central Processing Unit (CPU). The CPU is the most important component of a
computer’s hardware.
All these arithmetic and logical operations are performed in the CPU in special
storage areas called registers. The size of the register is one of the important
considerations in determining the processing capabilities of the CPU. Register size
refers to the amount of information that can be held in a register at a time for
processing. The larger the register size, the faster the processing may be.
But how can the instructions and data be put into the computer? The instructions and
data are supplied to a computer by the external environment, which implies that input
devices are needed in the computer. The main responsibility of input devices is to put the
data in the form of signals that can be recognised by the system. Similarly, we need
another component, which will report the results in a proper format. This component is
called an output device. These components together are referred to as input/output (I/O)
devices.
In addition, to transfer the information, the computer system internally needs
system interconnections. At present we will not discuss input/output devices
and system interconnections in detail, except to note that the most common
input/output devices are the keyboard, monitor and printer, and the most common
interconnection structure is the bus structure. These concepts are detailed in the later
blocks.
Input devices can bring instructions or data only sequentially; however, a program
may not be executed sequentially, as jump, looping and decision-making instructions are
normally encountered in programming. In addition, more than one data element may
be required at a time. Therefore, a temporary storage area is needed in a computer to
store the instructions and the data temporarily. This component is referred to as
memory.
The memory unit stores all the information in groups of memory cells, such as a
group of 8 binary digits (that is, a byte) or 16 bits or 32 bits, etc. These groups of
memory cells or bits are called memory locations. Each memory location has a unique
address and can be addressed independently. The contents of the desired memory
locations are provided to the CPU by referring to the address of the memory location.
The amount of information that can be held in the main memory is known as memory
capacity. The capacity of the main memory is measured in megabytes (MB) or gigabytes
(GB). One kilobyte stands for 2^10 bytes, which is 1024 bytes (or
approximately 1000 bytes). A megabyte stands for 2^20 bytes, which is
a little over one million bytes, and a gigabyte is 2^30 bytes.
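The capacity units above can be checked with a few lines of Python. The sketch also shows, as an illustration that goes slightly beyond the text, how many address bits are needed so that every memory location has a unique address; the helper name address_bits is hypothetical.

```python
KB = 2 ** 10   # one kilobyte = 1024 bytes
MB = 2 ** 20   # one megabyte, a little over one million bytes
GB = 2 ** 30   # one gigabyte

print(KB, MB, GB)  # 1024 1048576 1073741824

def address_bits(locations):
    """Number of bits needed to give each of `locations` cells a unique address."""
    return (locations - 1).bit_length()

print(address_bits(16))         # 4
print(address_bits(256 * MB))   # 28
```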
Let us now define the key features of von Neumann Architecture:
• The control unit (CU) interprets each of these instructions and generates respective
control signals.
• The Arithmetic Logic Unit (ALU) performs the arithmetic and logical
operations in special storage areas called registers as per the instructions of the
control unit. The size of the register is one of the important considerations in
determining the processing capabilities of the CPU. Register size refers to the
amount of information that can be held in a register at a time for processing.
The larger the register size, the faster may be the speed of processing.
• An Input/Output system involving I/O devices allows data input and reporting
of the results in proper form and format. For transfer of information a computer
system internally needs the system interconnections. One such interconnection
structure is BUS interconnection.
• Main Memory is needed in a computer to store instructions and the data at the
time of program execution. Memory to CPU is an important data transfer path.
The amount of information which can be transferred between CPU and
memory depends on the size of the BUS connecting the two.
• It was pointed out by von Neumann that the same memory can be used for
storing data and instructions. In such a case the data can be treated as data on
which processing can be performed, while instructions can be treated as data
which can be used for the generation of control signals.
• The von Neumann machine uses the stored program concept, i.e., the program
and data are stored in the same memory unit for execution. The computers prior
to this idea used to store programs and data on separate memories. Entering and
modifying these programs was very difficult as they were entered manually by
setting switches, plugging, and unplugging.
• Execution of instructions in a von Neumann machine is carried out in a sequential
fashion (unless explicitly altered by the program itself) from one instruction to
the next.
Figure 1 shows the basic structure of a conventional von Neumann machine.
A von Neumann machine has only a single path between the main memory and
control unit (CU). This feature/constraint is referred to as the von Neumann bottleneck.
Several other architectures have been suggested for modern computers. You can learn
about non-von Neumann architectures in the further readings.
Check Your Progress 1
1) State True or False T/F
Instruction execution is performed in the CPU registers. But before we define the
process of instruction execution, let us first give details on registers, the temporary
storage locations in the CPU for program execution. Let us define the minimum set of
registers required for von Neumann machines:
Accumulator Register (AC): This register is used to store data temporarily for
computation by ALU. AC is considered to contain one of the operands. The result of
computation by ALU is also stored back to AC. It implies that the operand value is
over-written by the result.
Memory Address Register (MAR): It specifies the address of memory location from
which data or instruction is to be accessed (read operation) or to which the data is to
be stored (write operation). Refer to figure 3.
Memory Buffer Register (MBR): It is a register, which contains the data to be written
in the memory (write operation) or it receives the data from the memory (read
operation).
Program Counter (PC): It keeps track of the instruction that is to be executed next,
that is, after the execution of an on-going instruction.
Instruction Register (IR): Here the instructions are loaded prior to execution.
Comments on figure 3 are as follows:
• All representations are in decimal. (In actual machines the representations are in
binary.)
• The number of memory locations = 16.
• Size of each memory location = 16 bits = 2 bytes. (Compare with contemporary
machines' word sizes of 16, 32 or 64 bits.)
• Thus, size of this sample memory = 16 words. (Compare it with actual memory
size, which is 128 MB, 256 MB, 512 MB, or more.)
• In the diagram MAR is pointing to location 10.
• The last operation performed was “read memory location 10”, which contains 65 in
this example. Thus, the content of MBR is also 65.
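The read operation described in these comments can be sketched in Python as follows. This is a minimal model of the sample memory (16 locations of 16 bits each), not a description of real hardware; the class name Memory and its method names are hypothetical.

```python
class Memory:
    """A sample memory of 16 locations, each holding one 16-bit word."""

    def __init__(self, size=16):
        self.cells = [0] * size

    def read(self, mar):
        """Read operation: return the word at the address held in MAR."""
        return self.cells[mar]

    def write(self, mar, mbr):
        """Write operation: store the word held in MBR at the address in MAR."""
        self.cells[mar] = mbr & 0xFFFF  # keep only the low 16 bits


memory = Memory()
memory.write(10, 65)      # place 65 in location 10, as in the sample memory

mar = 10                  # MAR points to location 10
mbr = memory.read(mar)    # the read operation brings the content into MBR
print(mar, mbr)           # 10 65
```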
Figure 3: CPU registers and their functions
The role of PC and IR will be explained later.
Now let us define several operation codes required for this machine, so that we can
translate the High level language instructions to assembly/machine instructions.
Figure 4: Memory and Registers Content on execution of the three given Consecutive
Instructions (All notations in Decimals)
S.No.  Step to be performed        How is it done                  Who does it
1.     Calculate the address of    The Program Counter (PC)        Control Unit (CU)
       next instruction to be      register stores the address
       executed                    of next instruction.
2.     Get the instruction in      The memory is accessed and      Memory Read operation is
       the CPU register            the desired instruction is      done. Size of instruction
                                   brought to register (IR) in     is important. In addition,
                                   CPU                             PC is incremented to point
                                                                   to next instruction in
                                                                   sequence.
3.     Decode the instruction      The Control Unit issues         CU
                                   necessary control signals
4.     Evaluate the operand        CPU evaluates the address       CPU under the control of
       address                     based on the addressing         CU
                                   mode specified.
5.     Fetch the operand           The memory is accessed and      Memory Read
                                   the desired operands brought
                                   into the CPU registers
       Repeat steps 4 and 5 if the instruction has more than one operand.
6.     Perform the operation as    The ALU does evaluation of      ALU/CU
       decoded in step 3.          arithmetic or logic
                                   instruction or the transfer
                                   of control operations.
7.     Store the results in        The value is written to         Memory write
       memory                      desired memory location
• First the address of the next instruction is calculated, based on the size of the
instruction and the memory organisation. For example, if in a computer an
instruction is of 16 bits and the memory is organised as 16-bit words, then the
address of the next instruction is evaluated by adding one to the address of
the current instruction. In case the memory is organised as bytes, which can
be addressed individually, then we need to add two to the current instruction
address to get the address of the next instruction to be executed in sequence.
• Now, the next instruction is fetched from a memory location to the CPU
registers such as the Instruction Register.
• The next step decodes the instruction to determine the type of operation
desired and the operands to be used.
• In case the operands need to be fetched from memory or via input devices, then
the address of the memory location or input device is calculated.
• Next, the operand is fetched (or operands are fetched one by one) from the
memory or read from the input devices.
• The operation is then performed on the operands, as decoded.
• Finally, the results are written back to memory or output devices, wherever
desired, by first calculating the address of the operand and then transferring the
values to the desired destination.
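The address calculation in the first step can be written as a short Python sketch. The function name and parameters are hypothetical, and the sketch assumes that the instruction size is a whole multiple of the addressable unit.

```python
def next_instruction_address(current, instruction_bits, addressable_unit_bits):
    """Address of the next sequential instruction.

    Assumes instruction_bits is a whole multiple of addressable_unit_bits.
    """
    step = instruction_bits // addressable_unit_bits
    return current + step

# 16-bit instruction, memory organised as 16-bit words: add one.
print(next_instruction_address(100, 16, 16))  # 101

# 16-bit instruction, byte-addressable memory: add two.
print(next_instruction_address(100, 16, 8))   # 102
```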
Please note that multiple operands and multiple results are allowed in many
computers. An example of such a case may be an instruction ADD A, B. This
instruction requires the operands A and B to be fetched.
In certain machines a single instruction can trigger an operation to be performed on an
array of numbers or a string of characters. Such an operation involves repeated fetch
for the operands without fetching the instruction again, that is, the instruction cycle
loops at operand fetch.
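The instruction cycle can be sketched as a small simulation. This is an illustrative toy machine, not the machine of the figures: the operation codes LOAD, ADD, STORE and HALT, their numeric values, and the encoding of an instruction as opcode * 100 + address (in decimal) are all assumptions made for this example. Program and data share one memory, as in the stored program concept.

```python
LOAD, ADD, STORE, HALT = 1, 2, 3, 0   # hypothetical operation codes

def run(memory):
    """Execute the program stored in `memory`, starting at address 0."""
    pc = 0   # Program Counter: address of the next instruction
    ac = 0   # Accumulator
    while True:
        # Fetch: MAR <- PC, MBR <- memory[MAR], IR <- MBR, then PC is incremented.
        mar = pc
        mbr = memory[mar]
        ir = mbr
        pc += 1
        # Decode: split the instruction into operation code and operand address.
        opcode, address = divmod(ir, 100)
        if opcode == HALT:
            return memory
        # Operand address goes to MAR; then fetch the operand, execute, or store.
        mar = address
        if opcode == LOAD:
            mbr = memory[mar]
            ac = mbr
        elif opcode == ADD:
            mbr = memory[mar]
            ac = ac + mbr
        elif opcode == STORE:
            mbr = ac
            memory[mar] = mbr
        else:
            raise ValueError("unknown operation code: {}".format(opcode))

# Instructions and data are held in the same 16-location memory.
memory = [0] * 16
memory[0] = LOAD * 100 + 10    # AC <- memory[10]
memory[1] = ADD * 100 + 11     # AC <- AC + memory[11]
memory[2] = STORE * 100 + 12   # memory[12] <- AC
memory[3] = HALT * 100         # stop
memory[10] = 65
memory[11] = 7

run(memory)
print(memory[12])  # 72
```

Each pass of the loop follows the order of the cycle: fetch, increment PC, decode, evaluate the operand address, fetch the operand, perform the operation and, for STORE, write the result back.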
Thus, a Program is executed as per the instruction cycle of figure 5. But what happens
when you want the program to terminate in between? At what point of time is an
interruption to a program execution allowed? To answer these questions, let us discuss
the process used in computer that is called interrupt handling.
1.4.1 Interrupts
An interrupt is an exceptional event that causes the CPU to temporarily transfer its
control from the currently executing program to a different program which provides
service to the exceptional event. It is like asking a question in a class. When you
ask a question in a class by raising your hand, the teacher who is explaining some point
may respond to your request only after completing his/her point. Similarly, an
interrupt is acknowledged by the CPU when it has completed the currently executing
instruction. An interrupt may be generated by a number of sources, which may be
either internal or external to the CPU.
Some of the basic issues of interrupt are:
Figure 6 gives the list of some common interrupts and the events that cause the
occurrence of those interrupts.
Interrupts are a useful mechanism for improving the efficiency of
processing. How? Almost all external devices are slower
than the processor. Therefore, in a system without interrupts, the processor has to
continually test whether an input value has arrived or a printout has been completed,
in turn wasting a lot of CPU time. With the interrupt facility the CPU is freed from the
task of testing the status of input/output devices and can do useful processing during
this time, thus increasing the processing efficiency.
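The difference between continual testing and interrupts can be illustrated with a rough cycle count. The numbers and function names below are hypothetical and chosen only to show the trend; they do not model any real processor or device.

```python
def useful_cycles_with_polling(total_cycles, device_ready_at):
    """The CPU tests the device status in every cycle until the device is ready."""
    useful = 0
    for cycle in range(total_cycles):
        if cycle < device_ready_at:
            continue          # cycle wasted on testing the device status
        useful += 1
    return useful

def useful_cycles_with_interrupt(total_cycles, isr_cycles):
    """The CPU does useful work except while servicing the interrupt."""
    return total_cycles - isr_cycles

# Assumed figures: 1000 cycles in total, device ready after 900 cycles,
# and an interrupt service routine that takes 20 cycles.
print(useful_cycles_with_polling(1000, 900))    # 100
print(useful_cycles_with_interrupt(1000, 20))   # 980
```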
How does the CPU know that an interrupt has occurred?
There needs to be a line or a register or status word in CPU that can be raised on
occurrence of interrupt condition.
Once a CPU knows that an interrupt has occurred then what?
First, the condition is to be checked as to why the interrupt has occurred. That includes
not only the device but also why that device has raised the interrupt. Once the
interrupt condition is determined, the necessary program, called an ISR (Interrupt
Servicing Routine), must be executed such that the CPU can resume further operations.
For example, assume that the interrupt occurs due to an attempt by an executing
program to execute an illegal or privileged instruction; then the ISR for such an
interrupt may terminate the execution of the program that has caused this condition.
Thus, on occurrence of an interrupt the related ISR is executed by the CPU. The ISRs
are pre-defined programs written for specific interrupt conditions.
Considering these requirements let us work out the steps, which CPU must perform on
the occurrence of an interrupt.
• The CPU must find out the source of the interrupt, as this will determine which
interrupt service routine is to be executed.
• The CPU then acquires the address of the interrupt service routine, which is
stored in the memory (in general).
• What happens to the program the CPU was executing before the interrupt? This
program needs to be interrupted till the CPU executes the interrupt service
program. Do we need to do something for this program? Well, the context of this
program is to be saved. We will discuss this a bit later.
• Finally, the CPU executes the interrupt service routine till the completion of the
routine. A RETURN statement marks the end of this routine. After that, the
control is passed back to the interrupted program.
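The steps above can be sketched as a small simulation. It is a simplified model under stated assumptions: the interrupt sources "io" and "timer", the table ISR_TABLE, and the representation of the context as a dictionary holding PC and AC are all hypothetical. The interrupt is acknowledged only after the current instruction has completed.

```python
def isr_io(context, log):
    context["ac"] = 0              # the routine uses the accumulator for its own work
    log.append("I/O serviced")

def isr_timer(context, log):
    context["ac"] = 0
    log.append("timer serviced")

# Hypothetical table mapping an interrupt source to its service routine.
ISR_TABLE = {"io": isr_io, "timer": isr_timer}

def run_with_interrupts(program, pending):
    """Run `program` (a list of functions acting on AC), checking for an
    interrupt after each instruction. `pending` maps the index of an
    instruction to the source that raises an interrupt during it."""
    log = []
    context = {"pc": 0, "ac": 0}
    while context["pc"] < len(program):
        index = context["pc"]
        context["ac"] = program[index](context["ac"])   # execute one instruction
        context["pc"] += 1
        source = pending.get(index)        # find out the source of the interrupt
        if source is not None:
            saved = dict(context)          # save the context of the program
            ISR_TABLE[source](context, log)  # locate and execute the ISR
            context = saved                # RETURN: restore the saved context
    return context["ac"], log

program = [lambda ac: ac + 1, lambda ac: ac * 10, lambda ac: ac + 5]
result, log = run_with_interrupts(program, {1: "io"})
print(result, log)   # 15 ['I/O serviced']
```

Because the context is saved before the routine runs and restored afterwards, the interrupted program produces the same result as it would without the interrupt, even though the routine overwrote the accumulator.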
ii) MAR and MBR both are needed to fetch the data /instruction from the
memory.
v) In case multiple interrupts occur at the same time, then only one of
the interrupts will be acknowledged and the rest will be lost.
2) What is an interrupt?
The ancestors of the modern computer were mechanical and electromechanical
devices. This ancestry can be traced as far back as the 17th century, when the first
machine capable of performing four mathematical operations, viz. addition,
subtraction, division and multiplication, appeared. In the subsequent subsection we
present a very brief account of mechanical computers.
1.5.1 The Beginning
Blaise Pascal made the very first attempt towards automatic computing. He invented a
device, which consisted of lots of gears and chains which used to perform repeated
additions and subtractions. This device was called Pascaline. Later many attempts
were made in this direction.
Charles Babbage, the grandfather of the modern computer, designed two
computers:
The Difference Engine: It was based on the mathematical principle of finite
differences and was used to solve calculations on large numbers using a formula. It
was also used for evaluating polynomial and trigonometric functions.
The Analytical Engine by Babbage: It was a general-purpose computing device,
which could be used for performing any mathematical operation automatically. The
basic features of this analytical engine were:
• It was a general-purpose programmable machine.
• It had the provision of automatic sequence control, thus enabling programs to
alter its sequence of operations.
• The provision of sign checking of result existed.
• A mechanism for advancing or reversing of control card was permitted, thus
enabling execution of any desired instruction. In other words, Babbage had
devised the conditional and branching instructions. Babbage’s machine was
fundamentally the same as the modern computer. Unfortunately, Babbage’s work
could not be completed. But as a tribute to Charles Babbage, his Difference
Engine No. 2 was built from his designs in the last decade of the 20th century and is
now on display at the Science Museum in London.
The next notable attempts towards computers were electromechanical. Zuse used
electromechanical relays that could be either opened or closed automatically. Thus,
the use of binary digits, rather than decimal numbers, started in computers.
The basic drawbacks of these mechanical and electromechanical computers were:
• Friction/inertia of moving components limited the speed.
• The data movement using gears and liners was quite difficult and unreliable.
The change was to have a switching and storing mechanism with no moving
parts. The electronic switching device, the “triode” vacuum tube, was used and
hence the first electronic computer was born.
The trends, which were encountered during the era of first generation computers were:
But how do we characterise the future generation of computers?
The generations of computers are basically differentiated by the fundamental
hardware technology. The advancement in technology led to greater speed, larger
memory capacity and smaller size in successive generations. Thus, second generation
computers were more advanced in terms of arithmetic and logic unit and control unit
than their counterparts of the first generation and thus computationally more
powerful. On the software front, at the same time, the use of high level languages started
and developments were made towards better operating systems and other
system software.
One of the main computer series during this time was the IBM 700 series. Each
successive member of this series showed increased performance and capacity and
reduced cost. In this series two main concepts, I/O channels (an independent
processor for input/output) and the multiplexer (a useful routing device), were used.
These two concepts are defined in the later units.
Figure 8: Silicon Wafer, Chip and Gates
An integrated circuit is constructed on a thin wafer of silicon, which is divided into a
matrix of small areas (each of the order of a few square millimetres). An identical
circuit pattern is fabricated in a dust-free environment on each of these areas and the
wafer is converted into chips (refer to figure 8). A chip consists of several gates, which
are made using transistors. A chip also has a number of input and output connection
points. A chip is then packaged separately in a housing to protect it. This housing
provides a number of pins for connecting this chip with other devices or circuits. For
example, if you see a microprocessor, what you are looking at and touching is its
housing and a large number of pins.
Different circuits can be constructed on different wafers. All these packaged circuit
chips then can be interconnected on a Printed-circuit board (for example, a
motherboard of computer) to produce several complex electronic circuits such as in a
computer.
The Integration Levels:
Initially, only a few gates were integrated reliably on a chip. This initial integration
was referred to as small-scale integration (SSI).
With the advances in microelectronics technologies, SSI gave way to Medium
Scale Integration (MSI), where hundreds of gates were fabricated on a chip.
The next stages were Large Scale Integration (LSI, 1,000 gates) and Very Large Scale
Integration (VLSI, 1,000,000 gates on a single chip). Presently, we are in the era of Ultra
Large Scale Integration (ULSI), where 100,000,000 or even more components may be
fabricated on a single chip.
What are the advantages of having densely packed Integrated Circuits? These are:
• Reliability: The integrated circuit interconnections are much more reliable than
soldered connections. In addition, densely packed integrated circuits enable
fewer inter-chip connections. Thus, the computers are more reliable. In fact, the
two unreliable extremes are when the chips are in low-level integration or
extremely high level of integration almost closer to maximum limits of
integration.
• Low cost: The cost of a chip has remained almost constant while the chip
density (number of gates per chip) is ever increasing. It implies that the cost of
computer logic and memory circuitry has been reducing rapidly.
• Greater Operating Speed: The higher the density, the closer are the logic or
memory elements, which implies shorter electrical paths and hence higher
operating speed.
• Smaller computers provide better portability.
• Reduction in power and cooling requirements.
The third generation computers mainly used SSI chips. One of the key concepts which
was brought forward during this time was the concept of the family of compatible
computers. IBM mainly started this concept with its System/360 family.
A family of computers consists of several models. Each model is assigned a model
number; for example, the IBM System/360 family has Models 30, 40, 50, 65 and 75.
The memory capacity, processing speed and cost increase as we go up the ladder.
However, a lower model is compatible with a higher model, that is, a program written on a
lower model can be executed on a higher model without any change. Only the time of
execution is reduced as we go towards a higher model; a higher model also has a larger
number of instructions. The biggest advantage of this family system was the flexibility
in selection of model.
For example, if you have a limited budget and limited processing requirements, you can
start with a relatively moderate model. As your business grows and your
processing requirements increase, you can upgrade your computer to subsequent
models depending on your need. Since you have chosen
computers of the same family, you will not be sacrificing the investment in already
developed software, as it can still be used on the newer machines.
Let us summarise the main characteristics of a computer family. These are:
But how was the family concept implemented? Well, there were three main features
of implementation. These were:
• Increased complexity of arithmetic logic unit;
• Increase in memory - CPU data paths; and
• Simultaneous access of data in higher end members.
The major developments which took place in the third generation, can be summarized
as:
• Application of integrated circuits in the computer hardware, replacing the discrete
transistor component circuits. Thus, computers became small in physical size
and less expensive.
• Use of semiconductor (integrated circuit) memories as main memory, replacing
earlier technologies.
• The CPU design was made simple and the CPU was made more flexible using a
technique called microprogramming (to be discussed in later Blocks).
• Certain new techniques were introduced to increase the effective speed of
program execution. These techniques were pipelining and multiprocessing. The
details on these concepts can be found in the further readings.
• The operating system of computers was provided with efficient methods
of sharing the facilities or resources, such as processor and memory space,
automatically. This concept is called multiprogramming and will be
discussed in the course on operating systems.
1.5.5 Later Generations
One of the major milestones in IC technology was very large scale integration
(VLSI), where thousands of transistors can be integrated on a single chip. The main
impact of VLSI was that it became possible to produce a complete CPU or main memory
or other similar devices on a single IC chip. This implied that mass production of
CPU, memory, etc. could be done at a very low cost. The VLSI-based computer
architecture is sometimes referred to as fourth generation computers.
The fourth generation is also coupled with parallel computer architectures. These
computers had shared or distributed memory and specialized hardware units for
floating point computation. In this era, multiprocessing operating systems, compilers
and special languages and tools were developed for parallel processing and distributed
computing. VAX 9000, CRAY X-MP and IBM/3090 were some of the systems
developed during this era.
Fifth generation computers are also available presently. These computers mainly
emphasise Massively Parallel Processing. These computers use high-density
packaging and optical technologies. Discussions on such technologies are beyond the
scope of this course.
However, let us discuss some of the important breakthroughs of VLSI technologies in
this subsection:
Semiconductor Memories
Initially the IC technology was used for constructing processors, but soon it was
realised that the same technology can be used for the construction of memory. The first
memory chip was constructed in 1970 and could hold 256 bits. The cost of this first
chip was high. The cost of semiconductor memory has gone down gradually and
presently IC RAMs are quite cheap. While the cost has gone down, the
memory capacity per chip has increased. At present, we have reached 1 Gbit on a
single memory chip. Many new RAM technologies are available presently. We will
give more details on these technologies later in Block 2.
Microprocessors
As more and more components were fabricated on a
single chip, fewer chips were needed to construct a single processor. Intel in 1971
achieved the breakthrough of putting all the components on a single chip. The single
chip processor is known as a microprocessor. The Intel 4004 was the first
microprocessor. It was a primitive microprocessor designed for a specific application.
Intel 8080, which came in 1974, was the first general-purpose microprocessor. This
microprocessor was meant to run programs for
general purpose computing. It was an 8-bit microprocessor. Motorola is another
manufacturer in this area. At present 32-bit and 64-bit general-purpose microprocessors
are already in the market. Let us look into the development of the two most important
series of microprocessors.
Hyper-threading:
The instructions of a non-threaded program are executed one at a time, in a single
sequence, until the program completes. Suppose a program has 4 tasks, namely A, B, C
and D. Assume that each task consists of 10 instructions, including a few I/O
instructions. A simple sequential execution would follow the sequence A → B → C → D.
The other architecture that has gained popularity over the last decade is the PowerPC
family. These machines are based on reduced instruction set computer (RISC)
technology. RISC technologies are finding application because of the
simplicity of their instructions. You will learn more about RISC in Block 3 of this course.
IBM formed an alliance with Motorola and Apple, which had used Motorola 68000
chips in its Macintosh computers, to create the PowerPC architecture. Some of the
processors in this family are:
S.No. Processor Year Bus Width Comment
VLSI technology is still evolving. More and more powerful microprocessors and
more storage space are now being put on a single chip. One question that we have still
not answered is: is there any classification of computers? Well, for quite some time
computers have been classified under the following categories:
• Micro-controllers
• Micro-computers
• Engineering workstations
• Mini computers
• Mainframes
• Super computers
• Network computers.
Microcomputers
A microcomputer’s CPU is a microprocessor. Microcomputers are typically used as
single-user computers, although present-day microcomputers are very powerful. They
support highly interactive environments, especially graphical user interfaces such as
Windows. These computers are popular for home and business applications. The
microcomputer originated in the late 1970s. The first microcomputers were built around
8-bit microprocessor chips. What do we mean by an 8-bit chip? It means that the chip can
retrieve instructions/data from storage and manipulate and process 8 bits of data at a
time; in other words, the chip has a built-in 8-bit data transfer path.
An improvement on 8-bit chip technology was seen by the early 1980s, with a series of
16-bit chips, namely the 8086 and 8088, introduced by Intel Corporation, each with
some advancement over the other.
The 8088 was an 8/16-bit chip, i.e. an 8-bit path was used to move data between the
chip and primary storage (external path), but processing within the chip was done
using a 16-bit path (internal path). The 8086 was a 16/16-bit chip, i.e. the internal and
external paths were both 16 bits wide. Both these chips could support a primary
memory of 1 megabyte (MB).
Alongside Intel’s chip series, there is another popular chip series from Motorola. The
first 16-bit microprocessor of this series was the MC 68000. It was a 16/32-bit chip and
could support up to 16 MB of primary storage. An advancement over the 16/32-bit chips
was the 32/32-bit chips. Some of the popular 32-bit chips were Intel’s 80486 and
Motorola’s MC 68020.
Most of the popular microcomputers were developed around Intel’s chips, while most
of the minis and super minis were built around Motorola’s 68000 series chips. With
the advancement of display and VLSI technology, microcomputers became available in
very small sizes, such as laptops and notebook computers. Most of these are
the size of a small notebook but have a capacity equivalent to that of an older mainframe.
Workstations
Workstations are used for engineering applications such as CAD/CAM, or any
other type of application that requires moderate computing power and relatively
high-quality graphics capabilities. Workstations generally come with a high-resolution
graphics screen, large RAM, network support, a graphical user interface
and a mass storage device. Some special types of workstation come without a disk;
these are called diskless terminals/workstations. Workstations are typically linked
together to form a network. The most common operating systems for workstations are
UNIX, Windows Server 2003 and Solaris.
Please note that a networking workstation means any computer connected to a local
area network, whether it is a workstation or a personal computer.
Workstations may be clients of server computers. A server is a computer that is
optimised to provide services to other connected computers through a network.
Servers usually have powerful processors, large memory and large secondary storage
space.
Minicomputer
The term minicomputer originated in the 1960s, when it was realised that many computing
tasks do not require an expensive contemporary mainframe computer but can be
solved by a small, inexpensive computer.
Minicomputers support a multi-user environment, with CPU time being shared
among multiple users. The main emphasis in such computers is on processing
power and less on interaction. Most present-day minicomputers have a
proprietary CPU and operating system. Some common examples of minicomputers
are the IBM AS/400 and the Digital VAX. The major use of a minicomputer is in data
processing applications pertaining to departments/companies.
Mainframes
Mainframe computers are generally 32-bit machines or higher. They are suited to big
organisations for managing high-volume applications. Some of the popular mainframe
manufacturers were DEC, IBM, HP and ICL. Mainframes are also used as central host
computers in distributed systems. Libraries of application programs developed for
mainframe computers are much larger than those of micro or minicomputers
because of their evolution over several decades as families of computers. All these
factors, and many more, make mainframe computers indispensable even with the
popularity of microcomputers.
Supercomputers
At the upper end of state-of-the-art mainframe machines are the supercomputers.
These are amongst the fastest machines in terms of processing speed and use
multiprocessing techniques, where a number of processors are used to solve a
problem. A number of manufacturers dominate the supercomputer market; CRAY,
IBM 3090 (with vector facility), NEC, Fujitsu and PARAM by C-DAC
are some of them. Lately, a range of parallel computing products, which are
multiprocessors sharing common buses, have been in use in combination with
mainframe supercomputers. Supercomputers are reaching speeds well over
25,000 million arithmetic operations per second. India has also announced its
indigenous supercomputer. Supercomputers are suited to number-crunching problems
and are mainly used for weather forecasting, computational fluid
dynamics, remote sensing, image processing, biomedical applications, etc. In India,
we have one such mainframe supercomputer system, the CRAY XMP-14, which is at
present being used by the Meteorological Department.
Let us discuss the PARAM supercomputer in more detail.
PARAM is a high-performance, scalable, industry-standard computer. It has evolved
from the concept of distributed scalable computers supporting massively parallel
processing on a cluster of networked computers. PARAM’s main advantage is
its scalability. PARAM can be configured to perform tera floating-point operations
per second. It is a cost-effective computer and supports a number of application
software packages.
PARAM is made using standard, readily available components. It supports Sun’s
UltraSPARC series servers and the Solaris operating system. It is based on open
environments and standard protocols. It can execute any standard application
available on a Sun Solaris system.
Some of the applications that have been designed to run in parallel computational
mode on PARAM include numerical weather forecasting, seismic data processing,
molecular modelling, finite element analysis and quantum chemistry.
It also supports many languages and software development platforms, such as:
Solaris 2.5.1 operating system on I/O and server nodes; FORTRAN 77, FORTRAN
90, C and C++ language compilers; tools for parallel program debugging,
visualisation and parallel libraries; Distributed Computing Environment; data
warehousing tools, etc.
Check Your Progress 3
1) What is a general purpose machine?
.....................................................................................................................................
.....................................................................................................................................
………………………………………………………………………………………..
1.6 SUMMARY
This completes our discussion on the introductory concepts of computer architecture.
The von Neumann architecture discussed in this unit is not the only architecture;
many new architectures have come up, which you will find in further readings.
The information given on various topics such as interrupts, classification and history of
computers, although extensive, can be supplemented with additional reading. In
fact, a course in any area of computing must be supplemented by further reading to keep
your knowledge up to date, as the computer world is changing by leaps and bounds.
In addition to further readings, the student is advised to study several Indian journals
on computers to enhance his or her knowledge.
Check Your Progress 2
1.
i) False
ii) True
iii) True
iv) False
v) False, they may be acknowledged as per priority.
UNIT 2 DATA REPRESENTATION
Structure
2.0 Introduction
2.1 Objectives
2.2 Data Representation
2.3 Number Systems: A Look Back
2.4 Decimal Representation in Computers
2.5 Alphanumeric Representation
2.6 Data Representation For Computation
2.6.1 Fixed Point Representation
2.6.2 Decimal Fixed Point Representation
2.6.3 Floating Point Representation
2.6.4 Error Detection And Correction Codes
2.7 Summary
2.8 Solutions/ Answers
2.0 INTRODUCTION
In the previous Unit, you were introduced to the basic configuration of the
computer system, its components and its working. The concept of instructions and their
execution was also explained. In this Unit, we will describe various types of binary
notations that are used in contemporary computers for storage and processing of data.
Instructions and their execution will be discussed in detail
in the later blocks.
The computer system is based on the binary system; therefore, we will devote
this complete unit to the concepts of binary data representation in the computer
system. This unit will re-introduce you to number system concepts. The number
systems defined in this Unit include the binary, octal and hexadecimal notations. In
addition, various number representations such as floating-point
representation, BCD representation and character-based representations are
described in this Unit. Finally, error detection and correction codes are
described.
2.1 OBJECTIVES
At the end of the unit you will be able to:
How is information represented in a computer?
Well, it is in the form of binary digits, popularly called bits.
How are input and output presented in a form that is understood by us?
One of the minimum requirements in this case is a representation for
characters. Thus, a mechanism that fulfils this requirement is needed. In computers,
information is represented in digital form; therefore, to represent characters in a
computer we need codes. Some common character codes are ASCII, EBCDIC, ISCII,
etc. These character codes are discussed in the subsequent sections.
How are arithmetic calculations performed through these bits?
We need to represent numbers in binary and should be able to perform operations on
these numbers.
Let us try to answer these questions in the following sections. Let us first recapitulate
some of the age-old concepts of the number system.
1×2^5 + 0×2^4 + 1×2^3 + 0×2^2 + 1×2^1 + 0×2^0
= 1×32 + 0×16 + 1×8 + 0×4 + 1×2 + 0×1
= 32 + 8 + 2
= 42 in decimal.
Octal Numbers: The octal system has eight digits, represented as 0, 1, 2, 3, 4, 5, 6, 7. To
find the decimal equivalent of an octal number, one has to find the quantity of
the octal number, which is again calculated by place value. For example, consider the
octal number (23.4)_8.
(Please note that the subscript 8 indicates an octal number; similarly, a subscript 2
indicates binary, 10 indicates decimal and H indicates a hexadecimal number. If
no subscript is specified, the number should be treated as a decimal number, or else
as a number in whatever number system is specified before it.)
Decimal equivalent of the octal number:
(23.4)_8
= 2×8^1 + 3×8^0 + 4×8^-1
= 2×8 + 3×1 + 4×1/8
= 16 + 3 + 0.5
= (19.5)_10
Hexadecimal Numbers: The hexadecimal system has 16 digits, which are represented
as 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. A number (F2)_H is equivalent to
F×16^1 + 2×16^0
= 15×16 + 2×1
= 240 + 2
= (242)_10
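The three worked conversions above all follow the same place-value rule, which can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `to_decimal` and the digit table are assumptions made for this example and are not part of the course text.

```python
def to_decimal(digits: str, base: int) -> float:
    """Evaluate a positional numeral such as '23.4' written in the given base."""
    symbols = "0123456789ABCDEF"
    integer_part, _, fraction_part = digits.upper().partition(".")
    value = 0.0
    # Integer digits: multiply the running value by the base, then add the digit.
    for ch in integer_part:
        value = value * base + symbols.index(ch)
    # Fraction digits carry the weights base^-1, base^-2, ...
    weight = 1.0 / base
    for ch in fraction_part:
        value += symbols.index(ch) * weight
        weight /= base
    return value


print(to_decimal("101010", 2))   # binary example from the text
print(to_decimal("23.4", 8))     # octal example from the text
print(to_decimal("F2", 16))      # hexadecimal example from the text
```

The sketch does not check that every digit is valid for the chosen base; a fuller version would reject, say, the digit 9 in an octal number.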
• You will get the integer part of the number if you read the remainders in the
direction of the arrow.
Fraction    On multiplication by 2    Integer part after multiplication
0.125       0.250                     0
0.250       0.500                     0
0.500       1.000                     1
(Read the integer parts from top to bottom.)
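The two procedures, repeated division by 2 for the integer part and repeated multiplication by 2 for the fraction, can be sketched as follows. The function names and the `max_bits` cut-off are assumptions introduced for this illustration; the cut-off is needed because many decimal fractions have no finite binary form.

```python
def integer_to_binary(n: int) -> str:
    """Repeated division by 2; the remainders are read from last to first."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)
        remainders.append(str(remainder))
    return "".join(reversed(remainders))


def fraction_to_binary(fraction: float, max_bits: int = 16) -> str:
    """Repeated multiplication by 2; the integer parts are read from first to last."""
    bits = []
    while fraction > 0 and len(bits) < max_bits:
        fraction *= 2
        integer_part = int(fraction)   # 0 or 1
        bits.append(str(integer_part))
        fraction -= integer_part       # keep only the fractional part
    return "".join(bits) or "0"


print(integer_to_binary(43), ".", fraction_to_binary(0.125))
```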
One easy, direct method for decimal to binary conversion of the integer part is to first
write the place values as:
2^6  2^5  2^4  2^3  2^2  2^1  2^0
64   32   16   8    4    2    1
Step 1: Take the integer part, e.g. 43, and find the next lower or equal binary place
value; in this example it is 32. Place a 1 at 32.
Step 2: Subtract the place value from the number; in this case subtract 32 from 43,
which gives 11.
Step 3: Repeat the two steps above till you get 0 at step 2.
Step 4: On getting a 0, put 0 at all other place values.
32   16   8    4    2    1
1    -    -    -    -    -      43 - 32 = 11
1    -    1    -    -    -      11 - 8 = 3
1    -    1    -    1    -      3 - 2 = 1
1    -    1    -    1    1      1 - 1 = 0
1    0    1    0    1    1      is the required number.
You can extend this logic to fractional part also but in reverse order. Try this method
with several numbers. It is fast and you will soon be accustomed to it and can do the
whole operation in single iteration.
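Steps 1 to 4 of the place-value method can be sketched as below. The function name `place_value_binary` and its `width` parameter are hypothetical names chosen for this example.

```python
def place_value_binary(n: int, width: int = 7) -> str:
    """Convert a non-negative integer to binary by subtracting place values."""
    place_values = [2 ** k for k in range(width - 1, -1, -1)]  # e.g. 64, 32, ..., 1
    bits = []
    for place in place_values:
        if place <= n:
            bits.append("1")   # Step 1: place a 1 at this place value
            n -= place         # Step 2: subtract it from the number
        else:
            bits.append("0")   # Step 4: every other place value gets a 0
    if n != 0:
        raise ValueError("number does not fit in the given width")
    return "".join(bits)


print(place_value_binary(43, width=6))
```

Scanning the place values from the largest to the smallest performs Step 3 (the repetition) implicitly, since each place value can be used at most once.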
Conversion of Binary to Octal and Hexadecimal: The rules for these conversions
are straightforward. To convert binary to octal, the binary number is divided into
groups of three bits, and each group is converted by place value to its equivalent octal
digit. For example, the binary number 1101011.00101 can be converted to octal as:
001 101 011 . 001 010
 1   5   3  .  1   2
(Please note that the number is unchanged even though we have added 0s to complete the
grouping. Also note the style of grouping before and after the binary point: we count
three digits from right to left before the point, and from left to right after it.)
Thus, the octal number equivalent to the binary number 1101011.00101 is (153.12)_8.
Similarly, the hexadecimal conversion can be made by grouping four binary digits and
finding the equivalent hexadecimal digit for each group. For example, the same number
is equivalent to (6B.28)_H.
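The grouping rule can be sketched in Python as follows. The function name `group_convert` is an assumption for this illustration; a group size of 3 gives octal and a group size of 4 gives hexadecimal.

```python
def group_convert(binary: str, group: int) -> str:
    """Convert a binary string to octal (group=3) or hexadecimal (group=4)."""
    digits = "0123456789ABCDEF"
    integer_part, _, fraction_part = binary.partition(".")
    # Pad the integer part on the left and the fraction on the right
    # so that each side splits into complete groups.
    integer_width = -(-len(integer_part) // group) * group    # round up
    fraction_width = -(-len(fraction_part) // group) * group
    integer_part = integer_part.zfill(integer_width)
    fraction_part = fraction_part.ljust(fraction_width, "0")

    def convert(bits: str) -> str:
        return "".join(
            digits[int(bits[i:i + group], 2)] for i in range(0, len(bits), group)
        )

    result = convert(integer_part)
    if fraction_part:
        result += "." + convert(fraction_part)
    return result


print(group_convert("1101011.00101", 3))
print(group_convert("1101011.00101", 4))
```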
Conversely, we can conclude that a hexadecimal digit can be broken down into a
string of binary having 4 places and an octal can be broken down into string of binary
having 3 place values. Figure 1 gives the binary equivalents of octal and hexadecimal
numbers.
Octal Number   Binary-coded Octal   Hexadecimal Number   Decimal   Binary-coded Hexadecimal
0              000                  0                    0         0000
1              001                  1                    1         0001
2              010                  2                    2         0010
3              011                  3                    3         0011
4              100                  4                    4         0100
5              101                  5                    5         0101
6              110                  6                    6         0110
7              111                  7                    7         0111
                                    8                    8         1000
                                    9                    9         1001
                                    A                    10        1010
                                    B                    11        1011
                                    C                    12        1100
                                    D                    13        1101
                                    E                    14        1110
                                    F                    15        1111
.....................................................................................................................................
.....................................................................................................................................
0 0000
1 0001
2 0010
3 0011
4 0100
5 0101
6 0110
7 0111
8 1000
9 1001
10 0001 0000
11 0001 0001
12 0001 0010
13 0001 0011
.. ……….
20 0010 0000
.. ………..
30 0011 0000
4 3 . 1 2 5
0100 0011 . 0001 0010 0101
Compare the BCD equivalent with the equivalent binary value; the two are different.
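A short sketch may help to see the difference between BCD and pure binary. The function name `to_bcd` is a hypothetical name used only for this example; each decimal digit is simply replaced by its 4-bit code.

```python
def to_bcd(decimal_text: str) -> str:
    """Encode each decimal digit separately as a 4-bit group."""
    groups = []
    for ch in decimal_text:
        if ch == ".":
            groups.append(".")                     # the decimal point is kept as a marker
        else:
            groups.append(format(int(ch), "04b"))  # one decimal digit -> 4 bits
    return " ".join(groups)


print(to_bcd("43.125"))   # BCD form of the number used in the text
print(format(43, "b"))    # pure binary form of the integer part, for comparison
```

The BCD form needs 8 bits for the integer part 43, whereas pure binary needs only 6, which is why BCD is said to waste storage space.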
ASCII
One standard code that is popularly used for language encoding is
ASCII (American Standard Code for Information Interchange). This code uses 7 bits
to represent 128 characters, which include 32 non-printing control characters,
alphabets in lower and upper case, decimal digits, and other printable characters that
are available on your keyboard. Later, as the need arose to represent additional
characters such as graphics characters and additional special characters, ASCII was
extended to 8 bits to represent 256 characters (called Extended ASCII codes). There
are many variants of extended ASCII; they follow different code pages for language
encoding but share the same format. You can refer to the complete set of ASCII
characters on the web. The extended ASCII codes are the codes used in most
microcomputers.
The major strength of ASCII is that it is quite elegant in the way it represents
characters. Because of the way characters are arranged, it is easy to write code that
manipulates upper/lowercase ASCII characters and checks for valid data ranges.
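As a small illustration of this point, the sketch below relies on two properties of ASCII: letters and digits occupy contiguous ranges, and each lowercase letter differs from its uppercase form only in one bit (value 0x20). The function names are assumptions chosen for this example.

```python
def is_upper(ch: str) -> bool:
    return "A" <= ch <= "Z"        # codes 65 to 90 are contiguous


def is_digit(ch: str) -> bool:
    return "0" <= ch <= "9"        # codes 48 to 57 are contiguous


def to_lower(ch: str) -> str:
    # Setting bit 0x20 turns an uppercase letter into its lowercase form.
    return chr(ord(ch) | 0x20) if is_upper(ch) else ch


def to_upper(ch: str) -> str:
    # Clearing bit 0x20 turns a lowercase letter into its uppercase form.
    return chr(ord(ch) & ~0x20) if "a" <= ch <= "z" else ch


print(to_lower("A"), to_upper("z"), is_digit("7"))
```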
In the original ASCII the 8th bit (the most significant bit) was used for the purpose of
error checking as a check bit. We will discuss more about the check bits later in the
Unit.
EBCDIC
Extended Binary Coded Decimal Interchange Code (EBCDIC) is a character-encoding
format used by IBM mainframes. It is an 8-bit code and is NOT compatible with ASCII.
It was designed primarily for ease of use with punched cards, and was mainly
used on IBM mainframes and midrange systems such as the AS/400. Another strength
of EBCDIC was the availability of a wider range of control characters than in ASCII. The
character coding in this set is based on binary coded decimal, that is, the contiguous
characters in the alphanumeric range are represented in blocks of 10, starting from
0000 binary to 1001 binary. Other characters fill in the rest of the range. There are
four main blocks in the EBCDIC code:
There are several different variants of EBCDIC. Most of these differ in the
punctuation coding. More details on EBCDIC codes can be obtained from further
reading and web pages on EBCDIC.
Comparison of ASCII and EBCDIC
EBCDIC is an easier code to use on punched cards because of its BCD compatibility.
However, ASCII has some major advantages over EBCDIC. These are:
While writing code, since EBCDIC is not contiguous on alphabets, comparing data
against continuous character blocks is not easy. For example, if you want to check
whether a character is an uppercase letter, in ASCII you only need to test whether it
lies in the range A to Z, as these are contiguous, whereas in EBCDIC, where they are
not contiguous, it may have to be compared against the ranges A to I, J to R, and S to
Z, which are the contiguous blocks in EBCDIC.
Some characters, such as [ ] \ { } ^ ~ |, are missing in EBCDIC. In addition, missing
control characters may cause some incompatibility problems.
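The difference in the uppercase test can be sketched as below. The example assumes the EBCDIC variant known as code page 037, which Python provides as the `cp037` codec; other EBCDIC variants may differ in punctuation but share these letter blocks. The function names are hypothetical.

```python
def is_upper_ascii(code: int) -> bool:
    # One contiguous range: A (0x41) to Z (0x5A).
    return 0x41 <= code <= 0x5A


def is_upper_ebcdic(code: int) -> bool:
    # Three separate blocks: A-I, J-R and S-Z (code page 037 assumed).
    return (0xC1 <= code <= 0xC9) or (0xD1 <= code <= 0xD9) or (0xE2 <= code <= 0xE9)


# Show the code of the first and last letter of each EBCDIC block.
for letter in "AIJRSZ":
    print(letter, hex(letter.encode("ascii")[0]), hex(letter.encode("cp037")[0]))
```

The ASCII test needs a single comparison pair, while the EBCDIC test needs three, because codes such as 0xCA lie between the letter blocks and are not letters.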
UNICODE
This is a newer international standard for character representation. Unicode provides
a unique code for every character, irrespective of the platform, program and
language. The Unicode Standard has been adopted by the industry. The key players that
have adopted Unicode include Apple, HP, IBM, Microsoft, Oracle, SAP, Sun, Sybase,
Unisys and many other companies. Unicode has been implemented in most of the
latest client-server software. Unicode is required by modern standards such as XML,
Java, JavaScript, CORBA 3.0, etc. It is supported in many operating systems and
almost all modern web browsers. Unicode includes the Devanagari character set. The
emergence of the Unicode Standard, and the availability of tools supporting it, are
among the most significant recent global software technology trends.
One of the major advantages of Unicode in client-server or multi-tiered
applications and websites is the cost saving over the use of legacy character sets: a
website or software product can be targeted across multiple platforms,
languages and countries without re-engineering. Thus, it helps in transferring data
through many different systems without compatibility problems. In India, the
suitability of Unicode for implementing Indian languages is still being worked out.
Indian Standard Code for information interchange (ISCII)
ISCII is an eight-bit code that contains the standard ASCII values up to 127; from
128 to 225 it contains the characters required in the ten Brahmi-based Indian scripts. It
is defined in the BIS standard IS 13194:1991. It supports the INSCRIPT keyboard, which
provides a logical arrangement of vowels and consonants based on the phonetic
properties and usage frequencies of the letters of the Brahmi scripts, thus allowing use
of the existing English keyboard for Indian language input. Any software that uses ISCII
codes can be used in any Indian script, enhancing its commercial viability. It also
allows transliteration between different Indian scripts through a change of display
mode.
How are these codes actually used to represent data for scientific calculations?
The computer is a discrete digital device and stores information in binary form in
flip-flops (see Units 3 and 4 of this Block for more details), which are two-state
devices. The basic requirements of computational data representation in binary form are:
• Representation of sign
• Representation of magnitude
• If the number is fractional, the position of the binary or decimal point, and
• Exponent
The solution to sign representation is easy: since the sign can be either positive or
negative, one bit can be used to represent it. By default it should be the
leftmost bit (in most machines it is the Most Significant Bit).
Thus, a number of n bits can be represented as an (n+1)-bit number, where the (n+1)th
bit is the sign bit and the remaining n bits represent its magnitude (please refer to
Figure 3).
Figure 3: A (n + 1) bit number
The position of the binary or decimal point can be represented by a position between
the flip-flops (storage cells in a computer). But how can one determine this position?
To simplify the representation, two methods were suggested: (1) fixed-point
representation, where the binary or decimal point is assumed to be either at the
beginning or at the end of a number; and (2) floating-point representation, where a
second register is used to keep the value of the exponent that determines the position
of the binary or decimal point in the number.
But before discussing these two representations, let us first discuss the term
“complement” of a number. Complements may be used to represent negative
numbers in digital computers.
Complement: There are two types of complements for a number of base (also called
radix) r. These are called the r’s complement and the (r-1)’s complement. For example,
for decimal numbers the base is 10; therefore, the complements will be the 10’s
complement and the (10-1) = 9’s complement. For binary numbers we talk about 2’s and
1’s complements. But how do we obtain complements, and what do these complements
mean? Let us discuss these issues with the help of the following example:
Example 2: Find the 9’s complement and 10’s complement for the decimal number
256.
Solution:
9’s complement: The 9’s complement is obtained by subtracting each digit of the
number from 9 (the highest digit value). Let us assume that we want to represent a
maximum of four decimal digit number range. 9’s complement can be used for BCD
numbers.
                          9   9   9   9
9’s complement of 256    -0  -2  -5  -6
                          9   7   4   3
Similarly, for obtaining 1’s complement for a binary number we have to subtract each
binary digit of the number from the digit 1.
10’s complement: Adding 1 to the 9’s complement produces the 10’s complement.
10’s complement of 0256 = 9743 + 1 = 9744
Please note that on adding the number and its 9’s complement we get 9999 (the maximum
possible number that can be represented in the four decimal digit range), while
on adding the number and its 10’s complement we get 10000 (the number just higher
than the range; this number cannot be represented in a four-digit representation).
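Example 2 can be checked with a short sketch. The function names and the `digits` parameter (the assumed width of the number range, four in the example) are introduced here for illustration only.

```python
def nines_complement(number: int, digits: int = 4) -> int:
    """Subtract every digit from 9, i.e. subtract the number from 99...9."""
    return (10 ** digits - 1) - number


def tens_complement(number: int, digits: int = 4) -> int:
    """Add 1 to the 9's complement, staying within the given number of digits."""
    return (nines_complement(number, digits) + 1) % (10 ** digits)


print(nines_complement(256), tens_complement(256))
```

The modulo operation keeps the result within four digits; it only matters for the number 0, whose 10's complement would otherwise be 10000.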
Example 3: Find the 1’s and 2’s complements of 1010 using only a four-digit
representation.
Solution:
1’s complement: The 1’s complement of 1010 is
                  1   1   1   1
The number is    -1  -0  -1  -0
                  0   1   0   1
Please note that wherever there is a digit 1 in the number, the complement contains 0
for that digit, and vice versa. In other words, to obtain the 1’s complement of a binary
number, we only have to change all the 1’s of the number to 0’s and all the 0’s to 1’s.
This can be done by complementing each bit of the binary number.
2’s complement: Adding 1 to the 1’s complement generates the 2’s complement.
The number is                                  1 0 1 0
The 1’s complement is                          0 1 0 1
For 2’s complement add 1 to 1’s complement     - - - 1
(Please note that 1+1 = 10 in binary)          0 1 1 0
Thus, for the number 1 0 1 0, the 2’s complement is 0 1 1 0.
The 2’s complement can also be obtained by leaving the least significant
zeros uncomplemented up to the first 1. This 1 is also left uncomplemented. All the
remaining bits to the left of this 1 are complemented.
Therefore, the 2’s complements of the following numbers (using this method) should be
as shown (you can check them by finding the 2’s complement as we have done in the
example):
The number is 0 0 1 0 0 1 0 0; its 2’s complement is 1 1 0 1 1 1 0 0 (no change in the last three bits).
The number is 1 0 0 0 0 0 0 0; its 2’s complement is 1 0 0 0 0 0 0 0 (no change in any bit).
The number is 0 0 1 0 1 0 0 1; its 2’s complement is 1 1 0 1 0 1 1 1 (no change in the last bit only).
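Both ways of finding the 2's complement, adding 1 to the 1's complement and the shortcut of copying the bits up to and including the rightmost 1, can be compared with the sketch below. The function names are assumptions for this example, and the bit strings are the ones used in the text.

```python
def ones_complement(bits: str) -> str:
    """Complement every bit."""
    return "".join("1" if b == "0" else "0" for b in bits)


def twos_complement(bits: str) -> str:
    """Add 1 to the 1's complement, discarding any carry out of the word."""
    width = len(bits)
    value = (int(ones_complement(bits), 2) + 1) % (1 << width)
    return format(value, f"0{width}b")


def twos_complement_shortcut(bits: str) -> str:
    """Keep the rightmost 1 and the zeros after it; complement everything to its left."""
    last_one = bits.rfind("1")
    if last_one == -1:
        return bits                      # all zeros: the 2's complement is zero itself
    return ones_complement(bits[:last_one]) + bits[last_one:]


for number in ("1010", "00100100", "10000000", "00101001"):
    print(number, twos_complement(number), twos_complement_shortcut(number))
```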
2.6.1 Fixed Point Representation
Fixed-point numbers in binary use a sign bit. A positive number has a sign bit of 0,
while a negative number has a sign bit of 1. In fixed-point numbers we assume that
the position of the binary point is at the end, that is, after the least significant bit.
This implies that all the represented numbers will be integers. A negative number can be
represented in one of the following ways:
Arithmetic addition
The complexity of arithmetic addition depends on the representation that has
been followed. Let us discuss this with the help of the following example.
Solution:
Please note how easy it is to add two numbers using signed 2’s complement. This
procedure requires only one control decision and only one circuit for adding the two
numbers. But it imposes an additional condition: negative numbers should be
stored in signed 2’s complement notation in the registers. This can be achieved by
complementing the positive number bit by bit and then incrementing the result by 1
to get the signed 2’s complement.
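A minimal sketch of signed 2's complement addition in an 8-bit register is given below. The helper names `encode`, `decode` and `add`, and the operand values 30 and 5, are assumptions chosen for illustration; they are not taken from the worked example of this unit.

```python
BITS = 8
MASK = (1 << BITS) - 1          # 1111 1111


def encode(value: int) -> int:
    """Bit pattern of value in signed 2's complement (8-bit register)."""
    return value & MASK


def decode(pattern: int) -> int:
    """Signed value of an 8-bit pattern; a sign bit of 1 means negative."""
    return pattern - (1 << BITS) if pattern & (1 << (BITS - 1)) else pattern


def add(a: int, b: int) -> int:
    """Add the two patterns, sign bits included, and discard the carry out."""
    return (encode(a) + encode(b)) & MASK


for a, b in ((30, 5), (30, -5), (-30, 5), (-30, -5)):
    result = add(a, b)
    print(f"{a:+d} + {b:+d} = {format(result, '08b')} ({decode(result):+d})")
```

The same single addition handles all four sign combinations, which is the "one control decision, one circuit" property described above.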
Signed 1’s complement representation
Another possibility, which is also simple, is the use of signed 1’s complement. Signed
1’s complement addition has a rule: add the two numbers, including the sign bit. If the
carry out of the most significant bit (the sign bit) is one, then increment the result by
1 and discard the carry. Let us repeat all the operations with 1’s complement.
Add carry
to sum and 1
discard it
- 1 100 1000
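The signed 1's complement rule, in which a carry out of the sign bit is added back to the sum (often called an end-around carry), can be sketched as follows. The helper names and the operand values are assumptions made for this illustration.

```python
BITS = 8
MASK = (1 << BITS) - 1


def encode_ones(value: int) -> int:
    """Signed 1's complement pattern: negative numbers are the bitwise complement."""
    return value if value >= 0 else (~(-value)) & MASK


def decode_ones(pattern: int) -> int:
    """Signed value of an 8-bit 1's complement pattern."""
    return pattern if pattern < (1 << (BITS - 1)) else -((~pattern) & MASK)


def add_ones(x: int, y: int) -> int:
    """Add two patterns; if there is a carry out, add it to the sum and discard it."""
    total = x + y
    if total > MASK:                     # carry out of the sign bit
        total = (total & MASK) + 1       # end-around carry
    return total & MASK


result = add_ones(encode_ones(30), encode_ones(-5))
print(format(result, "08b"), decode_ones(result))
```

Adding +5 and -5 in this notation gives the pattern 1111 1111, that is, negative zero.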
Representation            +0             -0
Signed magnitude          0 000 0000     1 000 0000
Signed 1’s complement     0 000 0000     1 111 1111
But in signed 2’s complement there is just one zero; there is no positive or
negative zero.
+0 in 2’s complement notation:      0 000 0000
-0 in 1’s complement notation:      1 111 1111
Add 1 for 2’s complement:                    1
Discard the carry out:            1 0 000 0000
Thus, -0 in 2’s complement notation is same as +0 and is equal to 0 000 0000. Thus,
both +0 and -0 are same in 2’s complement notation. This is an added advantage in
favour of 2’s complement notation.
The highest number that can be accommodated in a register also depends on the type
of representation. In general, in an 8-bit register 1 bit is used as the sign; therefore,
the remaining 7 bits can be used for representing the value. The highest and the lowest
numbers that can be represented are:
For signed magnitude representation: (2^7 – 1) to –(2^7 – 1)
= (128 – 1) to –(128 – 1)
= 127 to –127
For signed 1’s complement: 127 to –127
But for signed 2’s complement we can represent +127 to –128. The number –128 is
represented in signed 2’s complement notation as 10000000.
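The ranges above generalise to any register size. The sketch below, with the hypothetical function name `representable_range`, computes them for a register of a given number of bits.

```python
def representable_range(bits: int = 8) -> dict:
    """Lowest and highest values for each signed representation."""
    top = 2 ** (bits - 1) - 1            # one bit is the sign, the rest is magnitude
    return {
        "signed magnitude": (-top, top),
        "signed 1's complement": (-top, top),
        "signed 2's complement": (-top - 1, top),   # one extra negative value
    }


for name, (lowest, highest) in representable_range(8).items():
    print(f"{name}: {lowest} to {highest}")
```

Signed 2's complement gains one extra negative value because it has only one pattern for zero.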
Arithmetic Subtraction: The subtraction can be easily done using the 2’s
complement by taking the 2’s complement of the value that is to be subtracted
(inclusive of sign bit) and then adding the two numbers.
Signed 2’s complement provides a very simple way for adding and subtracting two
numbers. Thus, many computers (including IBM PC) adopt signed 2’s complement
notation. The reason why signed 2’s complement is preferred over signed 1’s
complement is because it has only one representation for zero.
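Subtraction by adding the 2's complement can be sketched as below. The function name `subtract` and the operand values are assumptions for this illustration, and the sketch does not check for overflow.

```python
def subtract(a: int, b: int, bits: int = 8) -> int:
    """Compute a - b by adding the 2's complement of b in a fixed-size register."""
    mask = (1 << bits) - 1
    minuend = a & mask
    subtrahend = b & mask
    negated = ((~subtrahend) + 1) & mask        # 2's complement, sign bit included
    result = (minuend + negated) & mask         # add and discard the carry out
    # Interpret the resulting pattern as a signed value.
    return result - (1 << bits) if result & (1 << (bits - 1)) else result


print(subtract(30, 5), subtract(5, 30))
```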
Overflow: An overflow is said to have occurred when the sum of two n-digit numbers
occupies n+1 digits. This definition is valid for both binary and decimal digits.
What is the significance of overflow for binary numbers?
Well, overflow results in errors during binary arithmetic, as numbers are
represented using a fixed number of digits, also called the size of the number. Any
value that results from a computation must not exceed the maximum
value allowed by the size of the number. If the result of a computation exceeds this
maximum, the computer will not be able to represent the number correctly; in
other words, the number has overflowed. Every computer employs a limit for
representing numbers, e.g. in our examples we are using 8-bit registers for calculating
the sum. But what will happen if the sum of the two numbers needs
9 bits? Where are we going to store the 9th bit? The problem will be better understood
through the following example.
Example: Add the numbers 65 and 75 in an 8-bit register in signed 2’s complement
notation.
65 0 100 0001
75 0 100 1011
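The addition of 65 and 75 can be tried out with the sketch below. The overflow test used here, that two operands with the same sign produce a result with a different sign, is one commonly used rule and is an assumption of this sketch rather than something stated in the unit; the function name is also hypothetical.

```python
def add_with_overflow(a: int, b: int, bits: int = 8):
    """Add in signed 2's complement; return (bit pattern, signed value, overflow flag)."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    x, y = a & mask, b & mask
    total = (x + y) & mask
    # Overflow: the operands share a sign bit but the result's sign bit differs.
    overflow = (x & sign) == (y & sign) and (total & sign) != (x & sign)
    value = total - (1 << bits) if total & sign else total
    return format(total, f"0{bits}b"), value, overflow


print(add_with_overflow(65, 75))
```

The true sum, 140, needs 8 magnitude bits, so in an 8-bit register the result spills into the sign bit and reads as a negative number.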
Although this scheme wastes a considerable amount of storage space, it does not
require conversion of a decimal number to binary. Thus, it can be used where
the amount of computer arithmetic is small compared with the amount of input/output
of data, e.g. in calculators or business data processing. Arithmetic in decimal
can also be performed as in binary, except that signed 9’s complement is used instead
of signed 1’s complement, and signed 10’s complement is used instead of signed 2’s
complement. More details on decimal arithmetic are available in further
readings.
Check Your Progress 2
1) Write the BCD equivalent for the three numbers given below:
i) 23
ii) 49.25
iii) 892
.....................................................................................................................................
.....................................................................................................................................
.....................................................................................................................................
.....................................................................................................................................
2) Find the 1’s and 2’s complement of the following fixed-point numbers.
i) 10100010
ii) 00000000
iii) 11001100
.....................................................................................................................................
.....................................................................................................................................
.....................................................................................................................................
……………………………………………………………………………………….
3) Add the following numbers in 8-bit register using signed 2’s complement
notation
i) +50 and – 5
ii) +45 and – 65
iii) +75 and +85
Also indicate the overflow if any.
.....................................................................................................................................
………………..............................................................................................................
…….. ...........................................................................................................................
…………………………………………………………………………………………
This number in any of the above forms (if represented in BCD) requires 17 bits for the
mantissa (1 for sign and 4 for each decimal digit in BCD) and 9 bits for the exponent
(1 for sign and 4 for each decimal digit in BCD). Please note that the exponent
indicates the correct location of the decimal point. In the first case, the exponent +2
indicates that the actual position of the decimal point is two places to the right of the
assumed position, while an exponent of –2 indicates that the actual position of the
point is two places to the left of the assumed position. The assumed position of the
point is normally the same throughout a computer, resulting in a consistent
computational environment.
0 1100 0100
Sign Normalised Mantissa Exponent (assuming fractional Mantissa
Sign bit
0 101000100 0 00100
Mantissa (Integer) Exponent
A zero cannot be normalised as all the digits in the mantissa in this case have to be zero.
Arithmetic operations on floating point numbers are more complex in
nature, take longer to execute and require complex hardware. Yet the
floating-point representation is a must as it is useful in scientific calculations. Real numbers
are normally represented as floating point numbers.
The following figure shows a format of a 32-bit floating-point number.
Bit positions:  0      1 to 8                     9 to 31
Field:          Sign   Biased Exponent = 8 bits   Significand = 23 bits
Figure 4: Floating Point Number Representation
Now, let us define the range that a normalised mantissa can represent. Let us assume
that our present representation uses a normalised mantissa; thus, the leftmost bit
cannot be zero and therefore has to be 1. It is thus not necessary to store this first bit;
it is assumed implicitly for the number. Therefore, a 23-bit mantissa can
represent a 23 + 1 = 24 bit mantissa in our representation.
Thus, the smallest mantissa value may be:
The implicit first bit as 1 followed by 23 zero’s, that is,
Figure 5: Binary floating-point number range for given 32 bit format
In floating point numbers, the basic trade-off is between the range of the numbers and
their accuracy, also called the precision of numbers. If we increase the exponent bits in the
32-bit format, the range can be increased; however, the accuracy of numbers will go
down, as the size of the mantissa will become smaller. Let us take an example, which will
clarify the term precision. Suppose we have a one-bit binary mantissa; then we can
represent only 0.10 and 0.11 in the normalised form as given in the above example
(having an implicit 1). Values such as 0.101, 0.1011 and so on cannot be
represented exactly. They have to be either rounded or truncated
and will be represented as either 0.10 or 0.11. Thus, a truncation or round-off
error is created. The higher the number of bits in the mantissa, the better the precision.
In floating point numbers, increasing both precision and range requires more
bits. This can be achieved by using double precision numbers. A double
precision format is normally of 64 bits.
The Institute of Electrical and Electronics Engineers (IEEE), a society which has created
many standards regarding various aspects of computers, has created IEEE standard 754
for floating-point representation and arithmetic. The basic objective of developing this
standard was to facilitate the portability of programs from one computer to another.
This standard has resulted in the development of standard numerical capabilities in
various microprocessors. This representation is shown in figure 6.
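Single precision (32 bits): bit 0 = sign, bits 1 to 8 = biased exponent, bits 9 to 31 = significand
Double precision (64 bits): bit 0 = sign, bits 1 to 11 = biased exponent, bits 12 to 63 = significand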
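The single precision layout can be inspected on a real machine. The following is a minimal sketch, assuming the host stores Python floats packed with `struct` format `f` as IEEE 754 single precision; the function name `decode_float32` is chosen here only for illustration.

```python
import struct

def decode_float32(value):
    """Return (sign, biased exponent, stored significand) of a single precision number."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                      # 1 bit
    biased_exponent = (bits >> 23) & 0xFF  # 8 bits
    significand = bits & 0x7FFFFF          # 23 stored bits; the leading 1 is implicit
    return sign, biased_exponent, significand

sign, exponent, significand = decode_float32(10.0625)  # 10.0625 = 1010.0001 in binary
print(sign, format(exponent, "08b"), format(significand, "023b"))
```

Only 23 significand bits come back from the stored word; the 24th (leading) bit is the implicit 1 discussed earlier.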
Figure 7 gives the floating-point numbers specified by the IEEE Standard 754.
Please note that IEEE standard 754 specifies plus zero and minus zero, and plus
infinity and minus infinity. Floating point arithmetic is more involved than fixed point
arithmetic. For floating point addition and subtraction we have to follow the following
steps:
Here, the assumption is that the exponent of x (Ex) is greater than the exponent of y (Ey); Nx
and Ny represent the significands of x and y respectively.
For multiplication and division, the significands need to be multiplied
or divided respectively, while the exponents are to be added or subtracted
respectively. In case we are using a bias of 128 or any other bias for exponents, then on
addition of exponents, since both the exponents carry the bias, the bias gets doubled.
Therefore, we must subtract the bias from the sum of the exponents.
Similarly, the bias cancels out when we subtract the exponents, so it has to be added back.
The division and multiplication operations can be represented as:
x × y = (Nx × Ny) × 2^(Ex+Ey)
x ÷ y = (Nx ÷ Ny) × 2^(Ex-Ey)
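The handling of the bias can be sketched in a few lines. This is only an illustration under stated assumptions: the bias is taken as 127 (the rule is the same for 128 or any other bias), the significands are ordinary Python floats, and the result is not renormalised. The names `fp_multiply`, `fp_divide` and `value` are hypothetical.

```python
import math

BIAS = 127  # assumed bias; any other bias works the same way

def fp_multiply(nx, ex, ny, ey):
    """Multiply significands; add biased exponents and remove the extra bias."""
    return nx * ny, ex + ey - BIAS

def fp_divide(nx, ex, ny, ey):
    """Divide significands; subtract biased exponents and add the bias back."""
    return nx / ny, ex - ey + BIAS

def value(n, e):
    """Real value of significand n with biased exponent e."""
    return n * 2.0 ** (e - BIAS)

# x = 1.5 x 2^3 = 12 and y = 1.25 x 2^2 = 5, stored with biased exponents 130 and 129
print(value(*fp_multiply(1.5, 130, 1.25, 129)))
print(value(*fp_divide(1.5, 130, 1.25, 129)))
```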
For more details on floating point arithmetic you can refer to the further readings.
The Objective: Data should be transmitted between a source-destination pair reliably,
indicating an error, or even correcting it, if possible.
The Process:
• An error detection function is applied on the data available at the source end, and an
error detection code is generated.
• The data and the error detection or correction code are stored together at the source.
• On receiving the data transmission request, the stored data along with the stored
error detection or correction code are transmitted to the unit requesting data
(Destination).
• On receiving the data and error detection/correction code from the source, the
destination once again applies the same error detection/correction function as was
applied at the source on the data received (but not on the error detection/
correction code received from the source) and generates the destination error
detection/correction code.
• Source and destination error codes are compared to flag or correct an error, as
the case may be.
The parity bit is only an error detection code. The concept of error detection and
correction codes has been developed using more than one parity bit. One such code is
the Hamming error correcting code.
Hamming Error-Correcting Code: Richard Hamming at Bell Laboratories devised
this code. We will just introduce this code with the help of an example for 4 bit data.
Let us assume a four bit number b4, b3, b2, b1. In order to build a simple error
detection code that detects error in one bit only, we may just add an odd parity bit.
However, if we want to find which bit is in error then we may have to use parity bits
for various combinations of these 4 bits such that a bit error can be identified
uniquely. For example, we may create four parity sets as
Data bits         Source Parity   Destination Parity
b1, b2, b3        P1              D1
b2, b3, b4        P2              D2
b3, b4, b1        P3              D3
b1, b2, b3, b4    P4              D4
Now, a very interesting phenomenon can be noticed in the above parity pairs. Suppose
data bit b1 is in error on transmission; then it will cause a change in destination parity
bits D1, D3 and D4.
ERROR IN          Causes change in Destination Parity
(one bit only)
b1                D1, D3, D4
b2                D1, D2, D4
b3                D1, D2, D3, D4
b4                D2, D3, D4
Figure 9 : The error detection parity code mismatch
Thus, by simply comparing the parity bits of source and destination we can identify
which of the four bits is in error. This bit can then be complemented to remove the error.
Please note that even a source parity bit can be in error on transmission; however,
under the assumption that only one bit (data or parity) is in error, this will
be detected, as only one destination parity bit will differ.
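The comparison of source and destination parity bits can be tried out with a short program. This is a minimal sketch of the four parity sets listed above, using odd parity and assuming at most one bit in error; the helper names (`bits`, `odd_parity_bits`, `locate_error`) are illustrative only.

```python
PARITY_SETS = {                      # data bits covered by P1..P4 (and D1..D4)
    1: ("b1", "b2", "b3"),
    2: ("b2", "b3", "b4"),
    3: ("b3", "b4", "b1"),
    4: ("b1", "b2", "b3", "b4"),
}

def bits(word):
    """Turn a string 'b4 b3 b2 b1', e.g. '1010', into a dictionary of bits."""
    return {"b4": int(word[0]), "b3": int(word[1]),
            "b2": int(word[2]), "b1": int(word[3])}

def odd_parity_bits(data):
    """Return [P1, P2, P3, P4]; each makes the number of 1's in its set odd."""
    return [1 - sum(data[b] for b in PARITY_SETS[k]) % 2 for k in (1, 2, 3, 4)]

def locate_error(source_parity, received):
    """Return the data bit in error, 'P<k>' for a parity bit error, or None."""
    dest_parity = odd_parity_bits(received)
    mismatch = {k for k in (1, 2, 3, 4) if source_parity[k - 1] != dest_parity[k - 1]}
    if not mismatch:
        return None
    for bit in ("b1", "b2", "b3", "b4"):
        if mismatch == {k for k, members in PARITY_SETS.items() if bit in members}:
            return bit
    # Only one pair differs: the transmitted parity bit itself was hit.
    return "P%d" % mismatch.pop() if len(mismatch) == 1 else "unknown"

source_parity = odd_parity_bits(bits("1010"))
print(source_parity, locate_error(source_parity, bits("1011")))
```

The mismatch pattern is compared with the rows of Figure 9, so each single data bit error maps to exactly one data bit.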
What should be the length of the error detection code that detects error in one bit?
Before answering this question we have to look into the comparison logic of error
detection. The error detection is done by comparing the two ‘i’ bit error detection and
correction codes fed to the comparison logic bit by bit (refer to figure 8). Let us have
comparison logic which produces a zero if the compared bits are the same, or else
produces a one.
Therefore, if the bits at a given position are the same then we get a zero at that bit position, but if
they are different, that is, this bit position may point to some error, then this particular
bit position will be marked as one. This way a matching word is constructed. This
matching word is ‘i’ bits long and, therefore, can represent 2^i values or combinations.
For example, a 4-bit matching word can represent 2^4 = 16 values, which range from 0
to 15 as:
0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111
1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111
The value 0000 or 0 represents no error, while the other 2^i - 1 values
(for 4 bits 2^4 - 1 = 15, that is, from 1 to 15) represent an error condition. Each of these
2^i - 1 (or 15 for 4 bits) values can be used to represent an error of a particular bit.
Since the error can occur during the transmission of ‘N’ bit data plus ‘i’ bit error
correction code, we need to have at least ‘N+i’ error values to represent
them. Therefore, the number of error correction bits should be found from the
following equation:
2^i - 1 >= N + i
If we are assuming an 8-bit word then we need to have
2^i - 1 >= 8 + i
Say at i = 3: LHS = 2^3 - 1 = 7; RHS = 8 + 3 = 11
       i = 4: LHS = 2^4 - 1 = 15; RHS = 8 + 4 = 12
Therefore, for an eight-bit word we need to have at least a four-bit error correction code
for detecting and correcting errors in a single bit during transmission.
Similarly, for a 16-bit word we need to have i = 5:
2^5 - 1 = 31 and 16 + i = 16 + 5 = 21
For a 16-bit word we need to have five error correcting bits.
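The search for the smallest ‘i’ can be written as a small loop. The sketch below simply tries i = 1, 2, 3, ... until the inequality 2^i - 1 >= N + i holds; the function name `sec_check_bits` is an assumption made for this illustration.

```python
def sec_check_bits(n):
    """Smallest i such that 2^i - 1 >= n + i, for an n-bit data word."""
    i = 1
    while 2 ** i - 1 < n + i:
        i += 1
    return i

for n in (8, 16):
    print(n, "data bits need", sec_check_bits(n), "check bits")
```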
Let us explain this with the help of an example:
Let us assume 4 bit data as 1010
The logic is shown in the following table:
Source:
Source Data Odd parity bits at source
b4 b3 b2 b1 P1 P2 P3 P4
(b1, b2, b3) (b2, b3, b4) (b3, b4, b1) (b1, b2, b3,b4 )
1 0 1 0 0 1 0 1
1 0 1 0 0 1 0 1
Thus, the P1 - D1, P3 - D3 and P4 - D4 pairs differ; thus, as per Figure 9, b1 is in error, so
correct it by complementing b1 to get the correct data 1010.
Case 2: Data Received as 1000 (Error in b2)
b4 b3 b2 b1 D1 D2 D3 D4
(b1, b2, b3) (b2, b3, b4) (b3, b4, b1) (b1, b2, b3,b4 )
1 0 0 0 0 1 0 0
Normally, Single Error Correction (SEC) code is used in semiconductor memories for
correction of single bit errors; however, it is supplemented with an added feature for
detection of errors in two bits. This is called a SEC-DED (Single Error Correction-Double
Error Detection) code. This code requires an additional check bit in
comparison to SEC code. We will only illustrate the working principle of SEC-DED
code with the help of an example for a 4-bit data word. Basically, the SEC-DED code
guards against errors of two bits in SEC codes.
Case: 4
Let us assume now that two bit errors occur in data.
Data received:
b4 b3 b2 b1
1 1 0 0
b4 b3 b2 b1 D1 D2 D3 D4
(b1, b2, b3) (b2, b3, b4) (b3, b4, b1) (b1, b2, b3,b4 )
1 0 0 0 0 1 0 0
.....................................................................................................................................
.....................................................................................................................................
3) Find the length of SEC code and SEC-DED code for a 16-bit word data transfer.
.....................................................................................................................................
.....................................................................................................................................
.....................................................................................................................................
2.7 SUMMARY
This unit provides an in-depth coverage of data representation in a computer
system. We have also covered aspects relating to error detection mechanisms. The
unit covers number systems and the conversion of numbers from one number system to
another. It introduces the concept of computer arithmetic using 2’s
complement notation and provides an introduction to information representation codes
like ASCII, EBCDIC, etc. The concept of floating point numbers has also been
covered with the help of a design example and the IEEE-754 standard. Finally, the error
detection and correction mechanism is detailed along with an example of SEC and
SEC-DED code.
The information given on various topics such as data representation, error detection
codes etc., although exhaustive, can be supplemented with additional reading. In
fact, a course in any area of computers must be supplemented by further reading to keep
your knowledge up to date, as the computer world is changing by leaps and
bounds. In addition to further reading, the student is advised to study several Indian
journals on computers to enhance their knowledge.
2.8 SOLUTIONS/ANSWERS
thus; Integer = (1 × 2^3 + 1 × 2^2 + 0 × 2^1 + 0 × 2^0) = (2^3 + 2^2) = (8 + 4) = 12
Fraction = (1 × 2^-1 + 1 × 2^-2 + 0 × 2^-3 + 1 × 2^-4) = 2^-1 + 2^-2 + 2^-4 = 0.5 + 0.25 + 0.0625 = 0.8125
ii) 10101010
Power of 2:  2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
Bit:         1    0    1    0    1    0    1    0
Weight:      128  64   32   16   8    4    2    1
= 128 + 32 + 8 + 2 = 170
2.
i) 16 8 4 2 1
1 0 1 1 1
ii) Integer is 49.
32 16 8 4 2 1
1 1 0 0 0 1
Fraction is 0.25
1/2 1/4 1/8 1/16
0 1 0 0
The decimal number 49.25 is 110001.010
iii)
512 256 128 64 32 16 8 4 2 1
1 1 0 1 1 1 1 1 0 0
The decimal number 892 in binary is 1101111100
3)
i) Decimal to Hexadecimal
16) 23 (1
-16
7
Hexadecimal is 17
Binary to Hexadecimal (hex)
= 1 0111 (from answer of 2 (i))
0001 0111
1 7
ii)
49.25 or 110001.010
Decimal to hex
Integer part = 49
16 ) 49 ( 3
-48
1
Integer part = 31
Fraction part = .25 × 16
= 4.000 So fraction part = 4
Hex number is 31.4
Binary to hex 11 0001 . 010
= 0011 0001 . 0100
= 3 1 . 4
= 31.4
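The repeated division (for the integer part) and repeated multiplication (for the fraction part) used in these answers can be checked with a short routine. This is a sketch for non-negative numbers only; the function name `to_base` and the limit of four fraction digits are assumptions made for the illustration.

```python
DIGITS = "0123456789ABCDEF"

def to_base(value, base, frac_digits=4):
    """Convert a non-negative decimal number to a string in the given base."""
    integer = int(value)
    fraction = value - integer
    int_part = ""
    while integer > 0:                       # repeated division, remainders read upwards
        integer, remainder = divmod(integer, base)
        int_part = DIGITS[remainder] + int_part
    int_part = int_part or "0"
    frac_part = ""
    while fraction > 0 and len(frac_part) < frac_digits:
        fraction *= base                     # repeated multiplication, integer parts read downwards
        digit = int(fraction)
        frac_part += DIGITS[digit]
        fraction -= digit
    return int_part + ("." + frac_part if frac_part else "")

print(to_base(23, 16), to_base(49.25, 2), to_base(49.25, 16))
```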
No carry into sign bit, no carry out of sign bit. Therefore, no overflow.
+20 is 0 0010100
Therefore, -20 is 1 1101100
which is the given sum
(iii) +75 is 0 1001011
+85 is 0 1010101
       1 0100000
Carry into sign bit = 1
Carry out of sign bit = 0
Overflow.
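The overflow rule used in these answers (overflow occurs when the carry into the sign bit differs from the carry out of the sign bit) can be sketched as follows. The function name `add_8bit` is hypothetical; the inputs are ordinary signed integers that fit in 8 bits.

```python
def add_8bit(a, b):
    """Add two numbers in 8-bit signed 2's complement; return (bit pattern, overflow)."""
    x, y = a & 0xFF, b & 0xFF                       # 8-bit 2's complement patterns
    carry_into_sign = ((x & 0x7F) + (y & 0x7F)) >> 7
    total = x + y
    carry_out_of_sign = total >> 8
    overflow = carry_into_sign != carry_out_of_sign
    return format(total & 0xFF, "08b"), overflow

for a, b in ((50, -5), (45, -65), (75, 85)):
    print(a, b, add_8bit(a, b))
```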
Check Your Progress 3
1.
i) 1010.0001
= 1.0100001 × 2^3
So, the single precision number is:
Significand = 010 0001 0000 0000 0000 0000
Exponent = 3 + 127 = 130 = 10000010
Sign = 0
So the number is = 0 1000 0010 010 0001 0000 0000 0000 0000
ii) -0.0000111
= -1.11 × 2^-5
Significand = 110 0000 0000 0000 0000 0000
UNIT 3 PRINCIPLES OF LOGIC CIRCUITS I
Structure
3.0 Introduction
3.1 Objectives
3.2 Logic Gates
3.3 Logic Circuits
3.4 Combinational Circuits
3.4.1 Canonical and Standard Forms
3.4.2 Minimization of Gates
3.5 Design of Combinational Circuits
3.6 Examples of Logic Combinational Circuits
3.6.1 Adders
3.6.2 Decoders
3.6.3 Multiplexer
3.6.4 Encoder
3.6.5 Programmable Logic Array
3.6.6 Read Only Memory ROM
3.7 Summary
3.8 Solutions/ Answers
3.0 INTRODUCTION
In the previous units, we have discussed the basic configuration of a computer system,
the von Neumann architecture, data representation and a simple instruction execution
paradigm. But how does a computer actually perform computations? Now, we will
attempt to find the answer to this basic query. In this unit, you will be exposed to some
of the basic components that form the most essential parts of a computer. You will
come across terms like logic gates, binary adders, logic circuits and combinational
circuits. These circuits are the backbone of any computer system and knowing
them is quite essential. The characteristics of integrated digital circuits are also
discussed in this unit.
3.1 OBJECTIVES
• define and describe some of the useful circuits of a computer system such as
multiplexer, decoders, ROM etc.
In general we can represent each gate through a distinct graphic symbol, and its
operation can be given by means of an algebraic expression. To represent the
input-output relationship of binary variables in each gate, truth tables are used. The
notations and truth tables for different logic gates are given in Figure 3.1.
The truth tables of NAND and NOR can be made from NOT (A AND B) and NOT
(A OR B) respectively. Exclusive OR (XOR) is a special gate whose output is one
only if the two inputs are not equal. The inverse of exclusive OR, called the XNOR
gate, can act as a comparator which produces a 1 output if the two inputs are equal.
Digital circuits use only one or two types of gates for simplicity of
fabrication. Therefore, one must think in terms of a functionally complete set of gates.
What does functionally complete set imply? A set of gates by which any Boolean
function can be implemented is called a functionally complete set. The functionally
complete sets are: [AND, NOT], [NOR], [NAND], [OR, NOT].
3.3 LOGIC CIRCUITS
A Boolean function can be implemented as a logic circuit using the basic gates
AND, OR and NOT. Consider, for example, the Boolean function:
F (A,B,C) = A'B + C
where A' denotes the complement of A. The relationship between this function and its
binary variables A, B, C can be represented in a truth table as shown in figure 3.2(a);
figure 3.2(b) shows the corresponding logic circuit.
Inputs Output
A B C F
0 0 0 0
0 0 1 1
0 1 0 1
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 0
1 1 1 1
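A truth table such as this one can be generated mechanically. The following minimal sketch takes the function as F = (NOT A AND B) OR C, which is the reading that agrees with every row of the table above; the function name `f` is used only for illustration.

```python
from itertools import product

def f(a, b, c):
    """F(A, B, C) = A'B + C, with A' the complement of A."""
    return ((1 - a) & b) | c

print("A B C F")
for a, b, c in product((0, 1), repeat=3):   # rows 000, 001, ..., 111
    print(a, b, c, f(a, b, c))
```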
.....................................................................................................................................
The basic design issue related to combinational circuits is the minimization of the
number of gates. The normal circuit constraints for combinational circuit design are:
• The depth of the circuit should not exceed a specific level.
• The number of input lines to a gate (fan in) and the number of gates its output can be
fed to (fan out) are constrained by the circuit power constraints.
A similar type of term used in POS form is called a Maxterm or Standard Sum.
A maxterm is a term of a POS expression which contains all the variables of the function
in true or complemented form. For example, F (A, B, C) = (A + B + C). ( A + B + C)
has two maxterms. A maxterm has a value 0 for only one combination of input
values.
The maxterm A + B + C has the value 0 only for A = 0, B = 0 and C = 0; for all other
combinations of values of A, B, C it has the value 1.
Figure 3.4 indicates the 2^n different minterms and maxterms, where n is the number of
variables.
We can represent any Boolean function algebraically in minterm and maxterm
form directly from the truth table. For minterms, consider each combination of variables that
produces a 1 output in the function and then take the OR of all those terms. For example,
the function F in figure 3.5 is represented in minterm form by ORing the terms where
the output F is 1, i.e. a'b'c, a'bc', a'bc, abc' and abc (where a' denotes the complement of a).
a b c F
0 0 0 0 m0
0 0 1 1 m1
0 1 0 1 m2
0 1 1 1 m3
1 0 0 0 m4
1 0 1 0 m5
1 1 0 1 m6
1 1 1 1 m7
Figure 3.5: Function of three variables
Thus, F (a,b,c) = a'b'c + a'bc' + a'bc + abc' + abc
= m1 + m2 + m3 + m6 + m7
= Σ (1,2,3,6,7)
The complement of function F can be obtained by ORing the minterms
corresponding to the combinations that produce a 0 output in the function. Thus,
F' (a, b, c) = a'b'c' + ab'c' + ab'c
If we take the complement of F', we get the function F in maxterm form.
F (a, b, c) = (F')' = (a'b'c' + ab'c' + ab'c)' = (a'b'c')' . (ab'c')' . (ab'c)'
= (a + b + c) (a' + b + c) (a' + b + c') [De Morgan’s law]
= M0 . M4 . M5
= Π (0, 4, 5)
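That the minterm (sum of products) form and the maxterm (product of sums) form describe the same function can be confirmed by evaluating both on every row. The sketch below takes the output column of figure 3.5 as its only input; the helper names are illustrative, not part of the unit.

```python
from itertools import product

F_COLUMN = [0, 1, 1, 1, 0, 0, 1, 1]      # output F of figure 3.5 for rows 000..111

minterms = [i for i, v in enumerate(F_COLUMN) if v == 1]
maxterms = [i for i, v in enumerate(F_COLUMN) if v == 0]

def minterm(m, row):
    """AND of literals: 1 only on the row whose binary value is m."""
    return int(all(bit == (m >> (2 - k)) & 1 for k, bit in enumerate(row)))

def maxterm(m, row):
    """OR of literals: 0 only on the row whose binary value is m."""
    return int(any(bit != (m >> (2 - k)) & 1 for k, bit in enumerate(row)))

def sop(row):
    return int(any(minterm(m, row) for m in minterms))

def pos(row):
    return int(all(maxterm(m, row) for m in maxterms))

rows = list(product((0, 1), repeat=3))
print(minterms, maxterms, [sop(r) for r in rows] == [pos(r) for r in rows])
```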
Algebraic Simplification
We have already discussed algebraic simplification of logic circuit. An algebraic
expression can exist in POS or SOP forms. Let us examine the following example to
understand how it helps in implementing any logic circuit.
Example : Consider the function F (a,b,c) = a b c + a b c + a b . The logic circuit
implementation of this function is shown in fig 3.6(a).
(a) F = a b c , a b c , a b
(b) F = a b c , a b c , a b
Figure 3.6 : Two logic diagrams for same boolean expression
Please note:
1) Decimal equivalents of column are given for help in understanding where the
position of the respective set lies. It is not the value filled in the square. A
square can contain one or nothing.
2) The 00, 01, 11 etc written on the top implies the value of the respective
variables.
3) Wherever the value of a variable is 0 it is said to represent its complement form.
4) The value of only one variable changes when we move from one row to the next
row or one column to the next column.
Step 2: The next step in Karnaugh map is to map the truth table into the map. The
mapping is done by putting a 1 in the respective square belonging to the 1
value in the truth table. This mapped map is used to arrive at simplified
Boolean expression which then can be used for drawing up the optimal
logical circuit. Step 2 will be more clear in the example.
Step 3: Now, create a simple algebraic expression from the K-Map. These
expressions are created by using adjacency: if we have two adjacent 1’s then
the expression for those can be simplified together, since they differ only in
1 variable. Similarly, we search for adjacent groups of 4, 8 and so on. A 1
can appear in more than one adjacent group. We should search for octets
first, then quads and then pairs. The following example will clarify
step 3.
Example: Now, let us see how to use K-Map simplification for finding the
Boolean function for the case whose truth table is given in figure 3.8(a);
figure 3.8(b) shows the K-Map for this.
Decimal  A B C D  Output F
0        0 0 0 0  1
1        0 0 0 1  1
2        0 0 1 0  1
3        0 0 1 1  0
4        0 1 0 0  0
5        0 1 0 1  0
6        0 1 1 0  1
7        0 1 1 1  0
8        1 0 0 0  1
9        1 0 0 1  1
10       1 0 1 0  1
11       1 0 1 1  0
12       1 1 0 0  0
13       1 1 0 1  0
14       1 1 1 0  0
15       1 1 1 1  0
Or F = Σ (0, 1, 2, 6, 8, 9, 10)
Let us see which groups of 1’s can be considered adjacent in the Karnaugh map here.
The groups are:
1) The four corners
2) The four 1’s at the top and bottom of columns 00 and 01
3) The two 1’s in the top two rows of the last column.
These groups can be represented by the following expressions:
1) Four corners
= (A'B'C'D' + A'B'CD') + (AB'C'D' + AB'CD')
= A'B'D' (C' + C) + AB'D' (C' + C) [as C + C' = 1]
= A'B'D' + AB'D'
= B'D' (A' + A)
= B'D'
2) The four 1’s in columns 00 and 01 give the following terms
= (A'B'C'D' + A'B'C'D) + (AB'C'D' + AB'C'D)
= A'B'C' (D' + D) + AB'C' (D' + D)
= A'B'C' + AB'C'
= B'C'
3) The two 1’s in the top two rows of the last column give
= A'B'CD' + A'BCD'
= A'CD' (B' + B)
= A'CD'
Thus, F = B'D' + B'C' + A'CD'
[Note : This expression can be directly obtained from the K-Map after making
quads and pairs. Try to find how?]
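A simplified expression should always be checked against the original truth table. The sketch below evaluates the expression obtained from the three groups (read here as F = B'D' + B'C' + A'CD', with ' marking a complemented variable) on all 16 input combinations and compares the result with the list Σ (0, 1, 2, 6, 8, 9, 10).

```python
from itertools import product

MINTERMS = {0, 1, 2, 6, 8, 9, 10}        # rows of figure 3.8(a) where F = 1

def simplified(a, b, c, d):
    """F = B'D' + B'C' + A'CD'."""
    na, nb, nc, nd = 1 - a, 1 - b, 1 - c, 1 - d
    return (nb & nd) | (nb & nc) | (na & c & nd)

# product() yields rows in the order 0000, 0001, ..., 1111, i.e. decimal 0 to 15
ones = {i for i, row in enumerate(product((0, 1), repeat=4)) if simplified(*row)}
print(sorted(ones), ones == MINTERMS)
```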
The expressions obtained through K-Maps are in the sum of
products form, i.e. they are expressed as the sum of the products of the variables. Such an
expression can also be expressed in product of sums form, but for this special methods are
required [already discussed in the last section].
Let us see how we can modify K-Map simplification to obtain the POS form. Suppose in
the previous example, instead of using the 1’s, we combine the adjacent 0 squares; then we
will obtain the inverse function, and on taking the complement of this function we will get
the POS form.
Another important aspect of this simple method of digital circuit design is
don't-care conditions. These conditions further simplify the algebraic function.
These conditions imply that it does not matter whether the output produced is 0 or 1
for the specific input. These conditions can occur when the number of possible input
combinations is more than needed. For example, in calculation through BCD,
4 bits are used to represent a decimal digit, which implies we can represent 2^4 = 16 digits; but
since we have only 10 decimal digits, 6 of those input combination values do
not matter and are candidates for don't-care conditions.
For the purpose of exercises you can do the exercises from the references [1], [2], [3]
given in the Block introduction.
What will happen if we have more than 4 to 6 variables? As the number of variables
increases, K-Maps become more and more cumbersome, as the number of possible
combinations of inputs keeps on increasing.
Quine-McCluskey Method
A tabular method, known as the Quine-McCluskey method, was suggested to deal with
the increasing number of variables. This method is suitable for programming and
hence provides a tool for automating design in the form of minimizing Boolean
expressions.
The basic principle behind the Quine-McCluskey method is to remove the terms
which are redundant and can be obtained from other terms.
To understand the Quine-McCluskey method, let us see the following example:
Term/var A B C D E Checked/Unchecked
ABCDE 1 1 1 1 1 "
ABC D E 1 1 1 0 1 "
A B C DE 1 0 0 1 1 "
A BCD E 0 1 1 1 0 "
A B CD E 1 0 1 1 0 "
A B C DE 0 0 0 1 1 "
AB C DE 1 0 0 0 1 "
A B C DE 0 0 0 0 0 "!
!
Step II : Forming the pairs which differ in only one variable, also put check (v)
against the terms selected and finding resultant terms as follows :-
AB C D E
ABCE
AB C D E
A B CD E "
BCDE
AB CD E
AB C DE
ACDE
A BC DE
AB C DE "
BCDE
A B C DE
In the new terms, again find all the terms which differ only in one variable and put a
check (") across those terms i.e.
BC DE
BCE
B C DE
A CD E " "
Thus all columns have mark ‘X’. Thus the final expression is:
F (A,B,C,D,E) = A B C E + A C DE + B CE
The process can be summarised as follows:-
Step I : Build a table in which each term of the expression is represented in a row
(the expression should be in SOP form). The terms can be represented in the
0 (complemented) or 1 (normal) form.
Step II : Check all the terms that differ in only one variable and then combine the
pairs by removing the variable that differs in those terms. Thus a new
table is formed.
This process is repeated, if necessary, in the new table also, until only
terms that cannot be combined are left, i.e. no matches are left in the table.
Step III :
a) Finally, a two dimensional table is formed: all terms which are not
eliminated in the table form the rows and all original terms form the columns.
b) At each intersection of row and column where the row term is a subset of the column
term, an ‘X’ is placed.
Step IV :
a) Put a square around each ‘X’ which is alone in a column.
b) Put a circle around each ‘X’ in any row which contains a squared
‘X’.
c) If every column has a squared or circled ‘X’ then the process is complete
and the corresponding minimal expression is formed by all row terms which
have marked Xs.
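Steps I and II lend themselves to programming. The following is a minimal sketch of those two steps only (the covering table of Steps III and IV is not included); terms are written as strings of 0, 1 and '-' for a removed variable, and the function names are chosen for this illustration. It is tried on the four-variable function Σ (0, 1, 2, 6, 8, 9, 10) used in the K-Map example.

```python
from itertools import combinations

def combine(a, b):
    """Merge two terms that differ in exactly one variable, else return None."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None

def prime_implicants(minterms, n_vars):
    """Repeat Step II until no pair of terms can be combined."""
    terms = {format(m, "0%db" % n_vars) for m in minterms}   # Step I: one row per term
    primes = set()
    while terms:
        checked, merged_terms = set(), set()
        for a, b in combinations(sorted(terms), 2):
            merged = combine(a, b)
            if merged is not None:
                merged_terms.add(merged)
                checked.update((a, b))          # both terms get a check mark
        primes |= terms - checked               # unchecked terms cannot be reduced further
        terms = merged_terms
    return primes

print(sorted(prime_implicants([0, 1, 2, 6, 8, 9, 10], 4)))
```

With variables in the order A B C D, the three terms found correspond to B'D', B'C' and A'CD'.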
Check Your Progress 2
1) Prepare the truth table for the following boolean expressions:
(i) A B C+A BC
(ii) (A+B) . ( A + B )
2) Simplify the following functions using algebraic simplification procedures and
draw the logic diagram for the simplified function.
(i) F = ( ( A .B) + B )
(ii) F = ( ( A. B) . ( A B ) )
.....................................................................................................................................
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
3) Simplify the following boolean functions in SOP and POS forms by means of
K-Maps.
Also draw the logic diagram.
F (A,B,C,D) = Σ (0,2,8,9,10,11,14,15)
. ....................................................................................................................................
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
3.5 DESIGN OF COMBINATIONAL CIRCUITS
The digital circuits which we use nowadays are constructed with NAND or NOR
gates instead of AND–OR–NOT gates. NAND and NOR gates are called Universal
Gates as we can implement any digital system with these gates. To prove this point
we need only show that the basic gates AND, OR and NOT can be implemented
with either only NAND or only NOR gates. This is shown in figure 3.9 below:
Figure 3.9 : Basic Logic Operations with NAND and NOR gates
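The constructions can also be checked exhaustively in software. The sketch below builds NOT, AND and OR first from a two-input NAND and then from a two-input NOR, using the standard constructions (a gate with both inputs tied together acts as NOT); the function names are illustrative only.

```python
def nand(a, b):
    return 1 - (a & b)

def nor(a, b):
    return 1 - (a | b)

# Basic gates built from NAND only
def not_nand(a):
    return nand(a, a)

def and_nand(a, b):
    return not_nand(nand(a, b))

def or_nand(a, b):
    return nand(not_nand(a), not_nand(b))

# Basic gates built from NOR only
def not_nor(a):
    return nor(a, a)

def or_nor(a, b):
    return not_nor(nor(a, b))

def and_nor(a, b):
    return nor(not_nor(a), not_nor(b))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, and_nand(a, b), or_nand(a, b), and_nor(a, b), or_nor(a, b))
```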
Any Boolean expression can be implemented with NAND gates, by expressing the
function in sum of product form.
Example: Consider the function F (A, B, C) = Σ (1,2,3,4,5,7). Firstly bring it into
SOP form. Thus, from the K-Map shown in figure 3.10(a), we find
F (A, B, C) = C + AB' + A'B
= ((C + AB' + A'B)')'
= (C' . (AB')' . (A'B)')'
Figure 3.10: K-Map & Logic circuit for function F (A, B, C) = Σ (1,2,3,4,5,7).
Similarly, any Boolean expression can be implemented with only NOR gates by
expressing it in POS form. Let us take the same example, F (A, B, C) = Σ (1,2,3,4,5,7).
As discussed in section 3.4.1, the above function F can be represented in POS form as
F (A, B, C) = Π (0,6)
= (A + B + C) . (A' + B' + C)
= (((A + B + C) . (A' + B' + C))')'
= ((A + B + C)' + (A' + B' + C)')'
Figure 3.11: Logic circuit for function F (A, B, C) = Σ (1,2,3,4,5,7) using NOR gates
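Both realisations of F (A, B, C) = Σ (1,2,3,4,5,7) can be compared by simulation. The following sketch assumes multi-input NAND and NOR gates and uses a single-input gate as an inverter; the NAND version follows the SOP form C + AB' + A'B and the NOR version follows the POS form (A + B + C)(A' + B' + C). The function names are hypothetical.

```python
from itertools import product

def nand(*xs):
    return 1 - int(all(xs))

def nor(*xs):
    return 1 - int(any(xs))

def f_nand(a, b, c):
    """F using NAND gates only: (C' . (AB')' . (A'B)')'."""
    na, nb, nc = nand(a), nand(b), nand(c)      # inverters made from NAND
    return nand(nc, nand(a, nb), nand(na, b))

def f_nor(a, b, c):
    """F using NOR gates only: ((A + B + C)' + (A' + B' + C)')'."""
    na, nb = nor(a), nor(b)                     # inverters made from NOR
    return nor(nor(a, b, c), nor(na, nb, c))

rows = list(product((0, 1), repeat=3))
print([i for i, r in enumerate(rows) if f_nand(*r)])
print([i for i, r in enumerate(rows) if f_nor(*r)])
```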
After discussing so much about the design let us discuss some important
combinational circuits. We will not go into the details of their design in this unit.
3.6.1 Adders
Adders play one of the most important roles in binary arithmetic. In fact, fixed point
addition is often used as a simple measure to express a processor’s speed. Addition and
subtraction circuits can be used as the basis for implementation of multiplication and
division (we are not giving details of these; you can find them in the Suggested Reading).
Thus, considerable effort has been put into designing high speed addition and
subtraction circuits. This has been considered an important task since the time of
Babbage. Number codes also add to the complexity of
arithmetic circuits. The 2’s complement notation is one of the most widely used codes
for fixed-point binary numbers because of the ease of performing addition and subtraction
through it.
A combinational circuit which performs addition of two bits is called a half adder,
while the combinational circuit which performs arithmetic addition of three bits (the
third bit is the previous carry bit) is called a full adder.
In a half adder the inputs are:
The augend, let us say ‘x’, and the addend ‘y’ bits.
The outputs are the sum ‘S’ and carry ‘C’ bits.
The logical relationship between these is given by the truth table shown in figure
3.12(a). Carry ‘C’ can be obtained by applying an AND gate to the ‘x’ and ‘y’ inputs;
therefore, C = x.y, while S can be found from the Karnaugh Map shown in figure
3.12(b). The corresponding logic diagram is shown in figure 3.12(c).
Thus, the sum and carry equations of the half adder are:
S = x.y' + x'.y
C = x.y
Let us take the full adder. For this, another variable, the carry from the previous bit addition, is
added; let us call it ‘p’. The truth table and K-Map for this are shown in figure 3.13.
a) xy'p + xyp
= xp (y + y')
= xp
b) xyp + xyp'
= xy
c) xyp + x'yp
= yp
Thus, C = xp + xy + yp
In the case of the K-Map for ‘S’, there are no adjacencies. Therefore,
S = x'y'p + x'yp' + xy'p' + xyp
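The half adder and full adder equations can be verified against ordinary arithmetic. In the sketch below the sum and carry are computed from the sum-of-products expressions (with ' read as complement), and the check is that S + 2C always equals the number of 1 inputs; the function names are illustrative.

```python
from itertools import product

def half_adder(x, y):
    """S = x.y' + x'.y ; C = x.y"""
    s = (x & (1 - y)) | ((1 - x) & y)
    c = x & y
    return s, c

def full_adder(x, y, p):
    """S = x'y'p + x'yp' + xy'p' + xyp ; C = xp + xy + yp"""
    nx, ny, np_ = 1 - x, 1 - y, 1 - p
    s = (nx & ny & p) | (nx & y & np_) | (x & ny & np_) | (x & y & p)
    c = (x & p) | (x & y) | (y & p)
    return s, c

for x, y, p in product((0, 1), repeat=3):
    print(x, y, p, full_adder(x, y, p))
```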
Till now we have discussed the addition of single bits only, but what will happen if we are
actually adding two numbers? A number in a computer can be 4 bytes, i.e. 32 bits long, or
even more. Even for these cases the basic unit is the full adder. Let us see (for
example) how we can construct an adder which adds two 4-bit numbers. Let us
assume that the numbers are x3 x2 x1 x0 and y3 y2 y1 y0; here xi and yi (i = 0 to 3)
represent a bit. The 4-bit adder is shown in figure 3.14.
The control input ‘x’ controls the operation, i.e. if x = 0 then the circuit behaves like
an adder and if x = 1 then the circuit behaves like a subtractor. The operation is
summarized as:
a) When x = 0, C0 = 0 and the output of all XOR gates will be the same as the
corresponding input Bi, where i = 0 to 3. Thus, Ai and Bi are added through the full
adders, giving sum Si and carry Ci.
b) When x = 1, the output of all XOR gates will be the complement of input Bi, where i
= 0 to 3, to which carry C0 = 1 is added. Thus, the circuit finds A plus the 2’s
complement of B, which is equal to A - B.
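The behaviour described in (a) and (b) can be simulated with four full adders in a chain. This is a sketch under the assumption that the control input feeds both the XOR gates and the first carry C0; the argument name `mode` stands for the control input ‘x’, and the helper names are hypothetical.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry out)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)

def to_bits(n):
    """4-bit pattern of n, least significant bit first: [bit0, bit1, bit2, bit3]."""
    return [(n >> k) & 1 for k in range(4)]

def from_bits(bits):
    return sum(b << k for k, b in enumerate(bits))

def add_sub_4bit(a_bits, b_bits, mode):
    """mode = 0 adds A and B; mode = 1 computes A plus the 2's complement of B."""
    carry = mode                                  # C0 is the control input itself
    out = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b ^ mode, carry)  # XOR gate passes or complements Bi
        out.append(s)
    return out, carry

print(from_bits(add_sub_4bit(to_bits(9), to_bits(5), 0)[0]))   # addition
print(from_bits(add_sub_4bit(to_bits(9), to_bits(5), 1)[0]))   # subtraction
```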
3.6.2 Decoders
A decoder converts one type of coded information to another form. A decoder has ‘n’
inputs, an enable line (a sort of selection line) and $2^n$ output lines. Let us see an
example of a 3 × 8 decoder, which decodes 3 bits of information such that only one
output line gets the value 1; in other words, out of $2^3$ = 8 lines only 1 output
line is selected. Thus, depending on the selected output line, the information of the
3 bits can be recognized or decoded.
While constructing the logic diagram, please make sure that wherever an input appears
as zero in the truth table for an output of one, that input is fed in complemented
form. For example, the first 4 entries of the truth table contain 0 in the I0 position;
hence I0 is passed through a NOT gate and fed to AND gates ‘a’, ‘b’, ‘c’ and
‘d’, which implies that these gates will be activated/selected only if I0 is 0. If I0 is
1 then none of the top 4 AND gates can be activated. Similar logic is valid for
I1. Please note that the selected output line is named 000 or 010 or 111 etc. The output
value of only one of the lines will be 1. The labels 000, 010 etc. indicate that
if the inputs I0 I1 I2 have these values, the labeled line will be selected for the
output. The enable line is a good resource for combining two 3 × 8 decoders to make
one 4 × 16 decoder.
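The decoder logic, including the use of the enable line to build a 4 × 16 decoder from
two 3 × 8 decoders, can be sketched in Python. This is only an illustrative model: the
function names are hypothetical, I0 is taken as the most significant bit (as the
truth-table description suggests), and the fourth input of the larger decoder is
assumed to drive the two enable lines.

```python
def decoder_3x8(i0, i1, i2, enable=1):
    """Return 8 output lines; line k is labelled with I0 I1 I2 read as binary k."""
    outputs = []
    for k in range(8):
        label = ((k >> 2) & 1, (k >> 1) & 1, k & 1)
        line = enable
        for value, wanted in zip((i0, i1, i2), label):
            # Feed the input directly where the label has 1, complemented where it has 0.
            line &= value if wanted else 1 - value
        outputs.append(line)
    return outputs

def decoder_4x16(i0, i1, i2, i3):
    """Two 3 x 8 decoders; the extra most significant input selects one via enable."""
    lower = decoder_3x8(i1, i2, i3, enable=1 - i0)   # lines 0 to 7
    upper = decoder_3x8(i1, i2, i3, enable=i0)       # lines 8 to 15
    return lower + upper

print(decoder_3x8(0, 1, 0))          # only the line labelled 010 is 1
print(decoder_4x16(1, 0, 1, 0).index(1))
```

With enable = 0 every output of a 3 × 8 decoder is 0, which is what allows exactly one
of the two decoders to respond at a time.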
3.6.3 Multiplexer
A multiplexer is one of the basic building units of a computer system, which in
principle allows sharing of a common line by more than one input line. It connects
multiple input lines to a single output line. At a specific time one of the input lines
is selected and the selected input is passed on to the output line. The diagram of a
4 × 1 multiplexer (MUX) is given in figure 3.16.
(c) Logic diagram
But how does the multiplexer know which line to select? This is controlled by the
select lines. The select lines provide the communication among the various
components of a computer. Now let us see how the multiplexer, also known as MUX,
works. For simplicity we will take the example of a 4 × 1 MUX, i.e. there are 4
input lines connected to 1 output line. For the sake of consistency we will call an
input line I, the output line O and a control line either selection line S or enable E.
Please notice the way in which S0 and S1 are connected in the circuit. To AND gate
‘a’, S0 and S1 are fed in complemented form, which means that gate ‘a’ will output I0
when both selection lines have the value 0, i.e. when $\bar{S_0}$ = 1 and
$\bar{S_1}$ = 1 (S0 = 0 and S1 = 0); hence the first entry in the truth table. Please
note that at S0 = 0 and S1 = 0, AND gates ‘b’, ‘c’ and ‘d’ yield 0 output, so when all
these outputs pass through OR gate ‘e’ they yield I0 as the output for this case. That
is, for S0 = 0 and S1 = 0 the output becomes I0, which in other words can be stated as
“for S0 = 0 and S1 = 0, input line I0 is selected by the MUX”. Similarly, the other
entries in the truth table follow from the logic of the diagram. Therefore, with two
control lines we can have a 4 × 1 MUX. To have an 8 × 1 MUX we must have 3 control
lines, since with 3 control lines we can select among $2^3$ = 8 inputs. Similarly, with
‘n’ control lines we can have a $2^n$ × 1 MUX. Another important parameter in MUX design
is the number of inputs to each AND gate. This number is limited by the voltage
characteristics of the gate, which normally supports a maximum of 8 inputs.
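The gate-level behaviour of the 4 × 1 MUX can be written out directly. The following
Python sketch is illustrative only; the function name mux_4x1 is hypothetical and S1 is
assumed to be the more significant select line.

```python
def mux_4x1(inputs, s1, s0):
    """Select one of four input lines using AND gates a-d and OR gate e."""
    i0, i1, i2, i3 = inputs
    ns1, ns0 = 1 - s1, 1 - s0      # complemented select lines
    a = i0 & ns1 & ns0             # active only for S1 S0 = 00
    b = i1 & ns1 & s0              # active only for S1 S0 = 01
    c = i2 & s1 & ns0              # active only for S1 S0 = 10
    d = i3 & s1 & s0               # active only for S1 S0 = 11
    return a | b | c | d           # OR gate e

for s1 in (0, 1):
    for s0 in (0, 1):
        print(s1, s0, mux_4x1([1, 0, 1, 0], s1, s0))
```

At most one AND gate is enabled for any select value, so the OR gate simply passes the
selected input through.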
Where can these devices be used in the computer? Multiplexers are used in digital
circuits for data and control signal routing.
We have seen a concept where, out of ‘n’ input lines, 1 can be selected. Can we have
the reverse concept, i.e. one input line whose data is transmitted to one of
$2^n$ possible output lines, where ‘n’ represents the number of selection lines? This
operation is called demultiplexing.
3.6.4 Encoders
An encoder performs the reverse function of the decoder. An encoder has $2^n$ input
lines and ‘n’ output lines. Let us see the 8 × 3 encoder, which encodes 8 bits of
information and produces 3 outputs corresponding to binary numbers. This type of
encoder is also called an octal-to-binary encoder. The truth table of the encoder is
shown in figure 3.17.
I0 I1 I2 I3 I4 I5 I6 I7 | Active input | O2 O1 O0
1  0  0  0  0  0  0  0  |      D0      | 0  0  0
0  1  0  0  0  0  0  0  |      D1      | 0  0  1
0  0  1  0  0  0  0  0  |      D2      | 0  1  0
0  0  0  1  0  0  0  0  |      D3      | 0  1  1
0  0  0  0  1  0  0  0  |      D4      | 1  0  0
0  0  0  0  0  1  0  0  |      D5      | 1  0  1
0  0  0  0  0  0  1  0  |      D6      | 1  1  0
0  0  0  0  0  0  0  1  |      D7      | 1  1  1
From the encoder table, it is evident that at any given time only one input is assumed
to have the value 1. This is a major limitation of the encoder. What will happen when
two inputs are active together? The obvious answer is that, since the output is not
defined, ambiguity exists. To avoid this ambiguity the encoder circuit has input
priority so that only one input is encoded. The input with the higher subscript can be
given the higher priority. For example, if both D2 and D6 are 1 at the same time, then
the output will be 110 because D6 has higher priority than D2.
The encoder can be implemented with 3 OR gates whose inputs can be determined
from the truth table. The outputs can be expressed as:
O0 = I1 + I3 + I5 + I7
O1 = I2 + I3 + I6 + I7
O2 = I4 + I5 + I6 + I7
You can draw the K-Maps to determine the above functions and draw the related
combinational circuit.
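The three OR-gate equations, and the effect of input priority, can be illustrated with
a short Python sketch. The names encoder_8x3 and priority_encoder_8x3 are hypothetical,
and the priority stage is modelled simply as "keep only the highest active input",
which is an assumption about behaviour rather than a gate-level design.

```python
def encoder_8x3(inputs):
    """Plain octal-to-binary encoder built from three OR gates; returns (O2, O1, O0)."""
    i = inputs
    o0 = i[1] | i[3] | i[5] | i[7]
    o1 = i[2] | i[3] | i[6] | i[7]
    o2 = i[4] | i[5] | i[6] | i[7]
    return o2, o1, o0

def priority_encoder_8x3(inputs):
    """Encode only the active input with the highest subscript."""
    for k in range(7, -1, -1):
        if inputs[k]:
            one_hot = [0] * 8
            one_hot[k] = 1
            return encoder_8x3(one_hot)
    return 0, 0, 0   # no input active

both = [0, 0, 0, 1, 1, 0, 0, 0]        # inputs 3 and 4 active together
print(encoder_8x3(both))               # ambiguous: matches neither 011 nor 100
print(priority_encoder_8x3(both))      # input 4 wins
```

With inputs 3 and 4 active at once, the plain OR-gate encoder produces a code that
belongs to neither input, which is exactly the ambiguity that priority removes.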
3.6.5 Programmable Logic Array
Till now individual gates have been treated as the basic building blocks from which
various logic functions can be derived. We have also learned about strategies for
minimizing the number of gates. But with the advancement of integrated circuit
technology, the level of integration has increased, resulting in the production of
one to ten gates on a single chip (in small scale integration, SSI). Gate-level designs
are constructed at the gate level only, but if a design is to be built using
these SSI chips the design considerations need to change, as a number of such SSI
chips may be used for developing a logic circuit. With MSI and VLSI we can put even
more gates on a chip and can also make gate interconnections on the chip. This
integration and interconnection brings the advantages of decreased cost and size and
increased speed. But the basic drawback of such VLSI and MSI chips is that for
each logic function the layout of gates and interconnections needs to be designed. The
cost involved in making such custom designs is quite high. Thus came the concept
of the Programmable Logic Array (PLA), a general purpose chip which can be readily
adapted for any specific purpose.
PLAs are designed for the SOP form of Boolean functions and consist of a regular
arrangement of NOT, AND and OR gates on a chip. Each input to the chip is passed
through a NOT gate, so that the input and its complement are available to each AND
gate. The output of each AND gate is made available to each OR gate, and the output
of each OR gate is treated as a chip output. By making appropriate connections any
logic function can be implemented in a Programmable Logic Array.
Figure 3.18(a) shows a PLA with 3 inputs and 2 outputs. Please note the
connectivity points; all these points can be connected if desired. Figure 3.18(b) shows
an implementation of a logic function:
Figure 3.19 shows the block diagram of a ROM. It consists of ‘k’ input address lines
and ‘n’ output data lines. An m × n ROM is an array of binary cells organised into m
($2^k$ = m) words of ‘n’ bits each. The ROM does not have any data input because the
write operation is not defined for a ROM. A ROM is classified as a combinational
circuit and is constructed internally with a decoder and a set of OR gates.
In general, an m × n ROM (where m = $2^k$, k = number of address lines) will have an
internal k × $2^k$ decoder and ‘n’ OR gates. Each OR gate has $2^k$ inputs, which are
connected to the outputs of the decoder.
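The decoder-plus-OR-gates view of a ROM can be modelled in a few lines of Python. This
is a hedged sketch only: make_rom is a hypothetical helper, and "programming" is
represented by keeping a decoder-to-OR connection wherever the stored bit is 1.

```python
def make_rom(words, n):
    """Build an m x n ROM from a list of m = 2**k stored n-bit words (as integers)."""
    m = len(words)
    k = m.bit_length() - 1
    if m != 1 << k:
        raise ValueError("number of words must be a power of two")

    def read(address):
        # k x 2^k decoder: exactly one of the m lines is 1.
        decoder = [1 if line == address else 0 for line in range(m)]
        data = 0
        for bit in range(n):
            # One OR gate per output bit, connected to the decoder lines
            # whose stored word has a 1 in this bit position.
            or_gate = 0
            for line in range(m):
                or_gate |= decoder[line] & ((words[line] >> bit) & 1)
            data |= or_gate << bit
        return data

    return read

rom = make_rom([0b1010, 0b0001, 0b1111, 0b0000], n=4)   # a 4 x 4 ROM, k = 2
print([format(rom(a), "04b") for a in range(4)])
```

Since the output depends only on the current address, the model has no state, which is
why the ROM counts as a combinational circuit.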
Check Your Progress 3
1) Draw a Karnaugh Map for 5 variables.
………………………………………………………………………………..
……………………………………………………………………………….
……………………………………………………………………………….
2) Map the function having 4 variables in a K-Map and draw the truth table. The
function is
F (A, B, C, D) = Σ (2, 6, 10, 14).
………………………………………………………………………………..
………………………………………………………………………………..
………………………………………………………………………………..
3) Find the optimal logic expression for the above function. Draw the resultant
logic diagram.
………………………………………………………………………………..
……………………………………………………………………………….
……………………………………………………………………………….
4) What are the advantages of PLA?
……………………………………………………………………………….
……………………………………………………………………………….
……………………………………………………………………………….
5) Can a full adder be constructed using 2 half adders?
………………………………………………………………………………
………………………………………………………………………………
………………………………………………………………………………
3.7 SUMMARY
This unit provides information on the basic building blocks of a computer system. The
key elements in the design of combinational circuits, such as adders, are discussed in
this unit. With the advent of PLAs the design of circuits is changing, and the
scenario is now moving towards microprocessors. With this developing scenario in the
forefront and the expectation of Ultra-Large-Scale Integration (ULSI) in view, the time
is not far off when the design of logic circuits will be confined to single microchip
components. You can refer to the latest trends in design and development, including
VHDL (a hardware description language), in the further readings.
3.8 SOLUTIONS/ANSWERS
4 54
= A,B. A,B 5
= $(\bar{A} + B) . (A + B)$
= $(\bar{A} + B).A + (\bar{A} + B).B$
= $\bar{A}.A + A.B + \bar{A}.B + B.B$
= $0 + AB + \bar{A}B + B$
= $0 + B(A + \bar{A}) + B$
= $0 + B + B = B$
3.
4.
5.
A B C | F = $\bar{A}B\bar{C} + A\bar{B}\bar{C}$
0 0 0 | 0
0 0 1 | 0
0 1 0 | 1
0 1 1 | 0
1 0 0 | 1
1 0 1 | 0
1 1 0 | 0
1 1 1 | 0
(ii)
A B | F = $(A + B).(\bar{A} + \bar{B})$
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
2 (i)
F = $\overline{(A.B)} + B$
= $\bar{A} + \bar{B} + B$
= $\bar{A} + 1$ ($\bar{B} + B$ is always 1)
= 1
(ii)
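F = $\overline{(A.B)} . (\bar{A}\bar{B})$
= $(\bar{A} + \bar{B}) . (\bar{A}\bar{B})$
= $\bar{A}\bar{A}\bar{B} + \bar{A}\bar{B}\bar{B}$
= $\bar{A}\bar{B} + \bar{A}\bar{B}$
= $\bar{A}\bar{B}$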
3
F2 A B, BC , A D
4
F2 A B, BC , A D 5
F = 4A B5. 4B C 5. 4A D 5
F = 4A , B5 . 4B , C 5. 4A , D 5
2 K-Map
Truth table
A B C D | F
0 0 0 0 | 0
0 0 0 1 | 0
0 0 1 0 | 1
0 0 1 1 | 0
0 1 0 0 | 0
0 1 0 1 | 0
0 1 1 0 | 1
0 1 1 1 | 0
1 0 0 0 | 0
1 0 0 1 | 0
1 0 1 0 | 1
1 0 1 1 | 0
1 1 0 0 | 0
1 1 0 1 | 0
1 1 1 0 | 1
1 1 1 1 | 0
5.
UNIT 4 PRINCIPLES OF LOGIC CIRCUITS II
4.0 INTRODUCTION
By now you are aware of the basic configuration of computer systems, how data is
represented in computer systems, logic gates and combinational circuits. In this unit
you will learn how computations are performed inside the system. You will
come across terms like flip-flops, registers, counters, sequential circuits etc. Here,
you will also learn how to build circuits using combinational and sequential circuits.
These circuit designs will help you in performing practicals in the MCSL-017 lab course.
4.1 OBJECTIVES
After going through this unit you will be able to:
These sequential circuits, unlike combinational circuits, are time dependent.
Sequential circuits are broadly classified according to the time at which they are
observed and at which their internal state changes. The two broad classifications of
sequential circuits are:
• Synchronous
• Asynchronous
Synchronous circuits use flip-flops, and their state can change only at discrete
intervals (doesn’t this seem a good choice for discrete digital devices such as
computers?). Asynchronous sequential circuits may be regarded as combinational
circuits with a feedback path. Since the propagation delays from output to input are
small, they may tend to become unstable at times. Thus, complex asynchronous circuits
are difficult to design.
The synchronization in a sequential circuit is achieved by a clock pulse generator,
which gives a continuous clock pulse. Figure 4.2 shows the form of a clock pulse.
A clock pulse can have two states: 0 or 1, i.e. the disabled or the active state. The
storage elements can change their state only when a clock pulse occurs. Sequential
circuits that have clock pulses as input to flip-flops are called clocked sequential
circuits.
4.3 FLIP-FLOPS
Let us see flip-flops in detail. A flip-flop is a binary cell which stores 1 bit of
information. It is itself a sequential circuit. By now we know that a flip-flop can
change its state when a clock pulse occurs, but when exactly? Generally, a flip-flop
can change its state when the clock transitions from 0 to 1 (rising edge) or from 1 to
0 (falling edge), and not while the clock is at 1. If the storage element changes its
state while the clock is at level 1, then it is called a latch. In simple words, a
flip-flop is edge-triggered and a latch is level-triggered.
4.3.1 Basic Flip-flops
Let us first see a basic latch. A latch or a flip-flop can be constructed using two NOR
or NAND gates. Figure 4.3 (a) shows the logic diagram of an S-R latch using NOR gates.
The latch has two inputs, S and R, for set and reset respectively. When the outputs are
Q = 1 and $\bar{Q}$ = 0, the latch is said to be in the set state. When Q = 0 and
$\bar{Q}$ = 1, it is in the reset state. Normally, the outputs Q and $\bar{Q}$ are
complements of each other. When both inputs are equal to 1 at the same time, an
undefined state results, as both outputs are equal to 0.
Figure 4.3 (b) shows the truth table of the S-R latch. Let us examine the latch more
closely.
iii) When both S and R go to 1 simultaneously, the two outputs go to 0. This gives an
undefined state.
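The behaviour of the NOR-gate S-R latch can be reproduced by evaluating the two
cross-coupled gates repeatedly until the outputs settle. The Python sketch below is
illustrative; sr_latch is a hypothetical name and the fixed iteration limit stands in
for real gate delays.

```python
def nor(a, b):
    return int(not (a or b))

def sr_latch(s, r, q, q_bar):
    """Apply inputs S, R to a NOR latch currently holding (Q, Q-bar); return the new pair."""
    for _ in range(4):                    # a few passes are enough for the outputs to settle
        new_q = nor(r, q_bar)             # upper NOR gate
        new_q_bar = nor(s, new_q)         # lower NOR gate
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q, q_bar

state = (0, 1)                            # start in the reset state
for s, r in [(1, 0), (0, 0), (0, 1), (0, 0), (1, 1)]:
    state = sr_latch(s, r, *state)
    print(f"S={s} R={r} -> Q={state[0]} Q-bar={state[1]}")
```

The run shows set, hold, reset, hold and finally the S = R = 1 case, in which both
outputs are 0 and are therefore no longer complements of each other.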
Let us try to construct the most common flip-flops from this basic latch.
R-S flip-flop: The graphic symbol of the S-R flip-flop is shown in Fig 4.4. It has three
inputs, S (set), R (reset) and C (clock). Q(t+1) is the next state of the flip-flop
after the occurrence of a clock pulse. Q(t) is the present state, that is, the present
Q value (Set = 1 or Reset = 0).
In figure 4.4 (a), the arrowhead symbol in front of the clock pulse C indicates that the
flip-flop responds to the leading edge (from 0 to 1) of the input clock signal.
The operation of the R-S flip-flop can be summarised as:
1) If there is no clock signal, i.e. C = 0, then the output cannot change irrespective
of the R and S values.
2) When the clock signal changes from 0 to 1 and S = 1, R = 0, then the output is
Q = 1 and $\bar{Q}$ = 0 (Set).
3) If R = 1, S = 0 and the clock signal C changes from 0 to 1, then the output is
Q = 0 and $\bar{Q}$ = 1 (Reset).
4) During a positive clock transition, if both S and R become 1 then the output is not
defined, as it may become 0 or 1 depending upon internal timing delays
occurring in the circuit.
D Flip -Flop
The D (data) flip-flop is a modification of the RS flip-flop. The problem of undefined
output in the SR flip-flop when both R and S become 1 is avoided in the D flip-flop.
The simple solution is to provide just a single input. Thus, the
non-clocked inputs to the AND gates (S and R of fig 4.4 (b)) are guaranteed to be
opposite to each other by inserting an inverter between them. The logic diagram and
characteristic table of the D flip-flop are shown in figure 4.5.
T flip-flop
The T (Toggle) flip-flop is obtained from the JK flip-flop by joining inputs J and K
together. The implementation of the T flip-flop is shown in figure 4.7. When T = 0, the
clock pulse transition does not change the state. When T = 1, the clock pulse transition
complements the state of the flip-flop.
4.3.2 Excitation Tables
The characteristic tables of flip-flops provide the next state when the inputs and the
present state are known. These tables are useful for the analysis of sequential
circuits. But during the design process we know the required transition from present
state to next state and wish to find the required flip-flop inputs. Thus comes the need
for a table that lists the required input values for a given change of state. Such a
table is called an excitation table. Fig 4.8 shows the excitation tables for all
flip-flops.
Q(t) and Q(t+1) indicate the present and next state of a flip-flop, respectively. The
symbol X in the table means a don’t care condition, i.e. it doesn’t matter whether the
input is 0 or 1.
Let us discuss in more depth how these excitation tables are formed. For this, we take
the example of the J-K flip-flop.
1) The state transition from present state 0 to next state 0 (Figure 4.8 (a)) can be
achieved when
(a) J=0, K=0, then there is no change in the state of the flip-flop
(b) J=0, K=1, then the flip-flop resets, i.e. 0
(remember the J-K characteristic table from figure 4.6)
Thus in either case J=0 but K can be 0 or 1, which is represented by the don’t care
condition X.
2) The state transition from present state 0 to next state 1 can be achieved when
(a) J=1, K=0, then the flip-flop is set, i.e. 1
(b) J=1, K=1, then the flip-flop is complemented, i.e. changes from 0 to 1
Here also, in either case J=1 but K can be 0 or 1, which means K is again
represented as a don’t care case.
3) Similarly, the state transition from present state 1 to next state 0 can be achieved
when
(a) J=0, K=1, the flip-flop is reset, i.e. 0
(b) J=1, K=1, the flip-flop is complemented, i.e. changes from 1 to 0
This indicates that in either case K=1 but J can be either 0 or 1, thus a don’t care
case.
4) The state transition from present state 1 to next state 1 can be achieved when
(a) J=0, K=0, no change in the flip-flop
(b) J=1, K=0, the flip-flop is set, i.e. 1
Thus J is a don’t care case but K=0.
This whole process can be summarized in the table below:
Present State   Next State   Can be achieved by
      0              0       J = 0, K = X
      0              1       J = 1, K = X
      1              0       J = X, K = 1
      1              1       J = X, K = 0
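The four cases worked out above can also be derived mechanically from the J-K
characteristic behaviour (no change, reset, set, complement). The following Python
sketch is an illustration only; jk_next and jk_excitation are hypothetical names, and
an input is marked X when both 0 and 1 occur among the input pairs that produce the
required transition.

```python
from itertools import product

def jk_next(q, j, k):
    """Next state of a J-K flip-flop for present state q."""
    if (j, k) == (0, 0):
        return q          # no change
    if (j, k) == (0, 1):
        return 0          # reset
    if (j, k) == (1, 0):
        return 1          # set
    return 1 - q          # complement

def jk_excitation():
    """Map (present state, next state) to the required (J, K), using 'X' for don't care."""
    table = {}
    for q, q_next in product((0, 1), repeat=2):
        working = [(j, k) for j, k in product((0, 1), repeat=2)
                   if jk_next(q, j, k) == q_next]
        entry = []
        for position in range(2):
            values = {inputs[position] for inputs in working}
            entry.append("X" if values == {0, 1} else str(min(values)))
        table[(q, q_next)] = tuple(entry)
    return table

for (q, q_next), (j, k) in jk_excitation().items():
    print(f"Q(t)={q} Q(t+1)={q_next}  J={j} K={k}")
```

The same enumeration idea works for the other flip-flops once their characteristic
behaviour is written as a next-state function.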
…………………………………………………………………………………………
.....................................................................................................................................
…………………………………………………………………………………………
(ii) When inputs are applied at J and K and the clock pulse becomes 1, only the master
is activated, resulting in the intermediate output Y going to state 0 or 1 depending on
the input and the previous state. Remember that during this time the slave is still
maintaining its previous state. As the clock pulse becomes 0, the master
becomes inactive and the slave acquires the same state as the master, as explained in
conditions (a) and (b) above.
But why do we require this master-slave combination? To understand this, consider a
situation where the output of one flip-flop is the input of another flip-flop. Here,
the assumption is that the clock pulse inputs of all flip-flops are synchronized and
occur at the same time. The change of state of the master occurs when the clock pulse
goes to 1, but during that time the output of the slave has still not changed. Thus the
states of the flip-flops in the system can be changed simultaneously during the same
clock pulse, even though outputs of flip-flops are connected to the inputs of other
flip-flops.
[Figure: (a) Pulse in positive edge-triggered flip-flop: the output cannot change
except at the positive transition. (b) Pulse in negative edge-triggered flip-flop: the
output cannot change except at the negative transition.]
The effective positive clock transition includes a minimum time, called the setup time,
for which the D input must be maintained at a constant value before the occurrence of
the clock transition. Similarly, there is a minimum time, called the hold time, for
which the D input must not change after the application of the positive transition of
the pulse.
Check Your Progress 2
1. What are the advantages of master- slave flip-flop?
.....................................................................................................................................
.....................................................................................................................................
1) Draw the state table or state diagram from the problem statement (if a state diagram
is available, draw the state table also).
2) Give binary codes to states.
3) From state table, make input equation in simplified form. i.e. generating
Boolean functions which describes signals for the inputs of flip-flops.
4) From state table, derive output equation in simplified form.
5) Draw logic diagram with required flip-flops and combinational circuits.
Let us take an example to illustrate the above procedure. Suppose we want to design a
2-bit binary counter using D flip-flops. The circuit goes through the repeated binary
states 00, 01, 10 and 11 when the external input X = 1 is applied. The state of the
circuit does not change when X = 0. The state table and state diagram for this are shown
in figure 4.12.
But how do we make this state diagram? Please note that the number of flip-flops is 2 in
our example, as we are designing a 2-bit counter. The possible states of the two bits
are 00, 01, 10 and 11. These are shown in circles. The arrows indicate the transitions
on an input value X. For example, when the counter is in state 00 and the input value
X = 0 occurs, the counter remains in state 00; hence the loop back on X = 0. However,
on encountering X = 1 the counter moves to state 01. Similar transitions occur in all
the other states. For making the state table, remember the excitation table of the D
flip-flop given in figure 4.8 (c).
The present state of the two flip-flops and the next states of the flip-flops are put
into the table along with the input value. For example, if the present state of the
flip-flops is 01 and the input value is 1, then the counter will move to state 10.
Notice these values in the fourth row of values in the state table (figure 4.12 (a)).
Or we can write this as
A  B  X  A (Next)  B (Next)
0  1  1     1         0
This implies that flip-flop A has moved from the clear state to the set state. As we are
making the counter using D flip-flops, the question is: what input value DA of flip-flop
A makes the transition from Q(t) = 0 to Q(t+1) = 1 possible?
On checking the excitation table for the D flip-flop, we find that the value of the D
input of flip-flop A (called DA in this example) must be 1. Similarly, flip-flop B has
the transition Q(t) = 1 to Q(t+1) = 0; thus DB must be 0. Hence notice the values of the
flip-flop inputs DA and DB (Row 3).
The next step is the simplification of the input equations of the flip-flops, which is
done using K-Maps as shown in fig 4.13. But why did we make K-maps for DA and DB, which
happen to be flip-flop input values? Please note that in sequential circuit design we
are designing the combinational logic that controls the state transitions of the
flip-flops. Thus, each input to a flip-flop is one output of this combinational logic,
and the present state of the flip-flops and any other input values form the inputs to
this combinational logic.
$D_A = A\bar{B} + A\bar{X} + \bar{A}BX$
$D_B = B\bar{X} + \bar{B}X$
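The input equations can be validated by simulating the counter: on each clock pulse
the D flip-flops simply take the values DA and DB. The Python sketch below assumes the
complemented forms $D_A = A\bar{B} + A\bar{X} + \bar{A}BX$ and
$D_B = B\bar{X} + \bar{B}X$; the names next_state and run are illustrative.

```python
def next_state(a, b, x):
    """Return (A, B) after one clock pulse, i.e. the values of DA and DB."""
    na, nb, nx = 1 - a, 1 - b, 1 - x
    d_a = (a & nb) | (a & nx) | (na & b & x)
    d_b = (b & nx) | (nb & x)
    return d_a, d_b

def run(xs, state=(0, 0)):
    """Apply a sequence of X values, one per clock pulse, and record every state."""
    trace = [state]
    for x in xs:
        state = next_state(*state, x)
        trace.append(state)
    return trace

print(run([1, 1, 0, 1, 1]))
```

With X = 1 the states advance through 00, 01, 10, 11 and back to 00, and with X = 0
the state is held, as the state diagram requires.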
Note: Similarly, sequential circuits can be designed using any number of flip-flops
using state diagrams and combinational circuit design methods.
4.5.1 Registers
A register is a group of flip-flops, which store binary information, and gates, which
control when and how information is transferred to the register. An n-bit register has
n flip-flops and stores n bits of binary information. Two basic types of registers are
parallel registers and shift registers.
A parallel register is one of the simplest registers, consisting of a set of flip-flops
that can be read or written simultaneously. Fig. 4.15 shows a 4-bit register with
parallel input-output. The signal lines I0 to I3 are the inputs to the flip-flops; they
may be outputs of other arithmetic circuits like multipliers, so that data from
different sources can be loaded into the register. It has one additional line, called
the clear line, which clears the register completely. This register is called a
parallel register because all the bits of the register can be loaded in a single clock
pulse.
A shift register is used for shifting data to the left or right. A shift register
operates in serial input-output mode, i.e. data is entered in the register one bit at a
time from one end of the register and can be read from the other end one bit at a time.
Fig. 4.16 shows a 4-bit right shift register using D flip-flops.
Please note that in this register a shift enable signal is used instead of the clock
pulse. Why? Because we do not necessarily want the register to perform a shift on each
clock pulse.
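The serial behaviour of a right shift register with a shift enable signal can be
sketched in Python. The class below is an illustrative model, not the circuit of the
figure: the name RightShiftRegister is hypothetical, and the list position 0 is assumed
to be the flip-flop at the serial input end.

```python
class RightShiftRegister:
    """n D flip-flops in a chain; data enters on the left and leaves on the right."""

    def __init__(self, n=4):
        self.bits = [0] * n

    def pulse(self, serial_in, shift_enable=1):
        """Return the bit at the serial output, then shift once if shifting is enabled."""
        serial_out = self.bits[-1]
        if shift_enable:
            # Each flip-flop loads the value of its left neighbour.
            self.bits = [serial_in] + self.bits[:-1]
        return serial_out

reg = RightShiftRegister()
for bit in [1, 0, 1, 1]:
    reg.pulse(bit)
print(reg.bits)
reg.pulse(0, shift_enable=0)      # no shift takes place on this pulse
print(reg.bits)
```

After four enabled pulses the register holds the four bits that were entered, and a
pulse with shift enable at 0 leaves the contents unchanged.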
A register which shifts data in only one direction is called a uni-directional shift
register, and a register which can shift data in both directions is called a
bi-directional shift register. A shift register can be constructed for bi-directional
shift with parallel input-output. A general shift register structure may have parallel
data transfer to or from the register along with the added facility of left or right
shift. This structure requires additional control lines for indicating whether parallel
or serial output is desired and whether left or right shift is required. A general
symbolic diagram of this register is shown in Fig. 4.17.
There are 3 main control lines shown in the above figure. If parallel load enable is
active, a parallel input-output operation is done; otherwise serial input-output is
done. The shift select line selects right or left shift: if it has value 0 then a right
shift is performed, and for value 1 a left shift is done. The shift enable signal
indicates when to start the shift.
The input lines to J and K of all flip-flops are kept high, i.e. at logic 1. Each time
a clock pulse occurs the value of the flip-flop is complemented (refer to the
characteristic table of the J-K flip-flop in Figure 4.6 (c)). Please note that the clock
pulse is given only to the first flip-flop; from the second flip-flop onwards, the
output of the previous flip-flop is fed as the clock signal. This implies that these
flip-flops will be complemented if the previous flip-flop has a value 1. Thus, the
effect of the complement will ripple through these flip-flops.
You can understand the working of this counter by analyzing the sequence of states
(O2, O1, O0) given in Figure 4.20.
O2 O1 O0
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
0 0 0
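The state sequence in the table can be reproduced with a small simulation of the
ripple effect. The Python sketch below is illustrative and rests on one stated
assumption: each flip-flop toggles when its clock input falls from 1 to 0, so a stage
is complemented only when the previous stage changes from 1 to 0. The function name
ripple_counter is hypothetical.

```python
def ripple_counter(pulses, n=3):
    """Return the states (O2, O1, O0) of an n-stage ripple counter, starting from 0."""
    q = [0] * n                       # q[0] is O0, the flip-flop that receives the clock
    states = [tuple(reversed(q))]
    for _ in range(pulses):
        stage = 0
        clocked = True                # the external clock pulse reaches stage 0
        while clocked and stage < n:
            q[stage] ^= 1             # J = K = 1, so the stage complements
            # Assumption: the next stage is clocked only by a 1 -> 0 change.
            clocked = q[stage] == 0
            stage += 1
        states.append(tuple(reversed(q)))
    return states

for state in ripple_counter(8):
    print(*state)
```

Eight pulses take the counter through 000 to 111 and back to 000, matching the
sequence listed above.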
RAMs are organized (logically) as words of fixed length. The memory communicates
with other devices through data input and output lines, address selection lines and
control lines that specify the direction of transfer.
Now, let us try to understand how data is stored in memory. The internal construction
of a RAM of ‘m’ words and ‘n’ bits per word consists of m × n binary cells and
associated circuits for selecting individual words. Figure 4.21 shows the logic diagram
and block diagram of a binary cell.
The input is fed to AND gate ‘a’ in complemented form. The read operation is
indicated by 1 on the read/write signal. Therefore, during the read operation only
AND gate ‘c’ becomes active. If the cell has been selected, then the output will
become equal to the state of the flip-flop, i.e. the data value stored in the flip-flop
is read. In the write operation gates ‘a’ and ‘b’ become active and they set or clear
the J-K flip-flop depending upon the input value. Please note that if the input is 0,
the flip-flop will go to the clear state, and if the input is 1, the flip-flop will go
to the set state. In effect, the input data is reflected in the state of the flip-flop.
Thus, we say that the input data has been stored in the flip-flop or binary cell.
Fig 4.22 is the extension of this binary cell to an IC RAM circuit, where a 2 × 4
decoder is used to select one of the four words (for 4 words we need 2 address lines).
Please note that each decoder output is connected to a 4-bit word and the read/write
signal is given to each binary cell. Once the decoder selects the word, the read/write
input determines the operation. The output is derived using an OR gate, since all the
non-selected cells will produce a zero output. When the memory select input to the
decoder is 0, none of the words is selected and the contents of the cells are unchanged
irrespective of the read/write input.
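The organisation described above (decoder, binary cells, one OR gate per output bit
and a memory select input) can be modelled behaviourally in Python. This is a hedged
sketch: the class name Ram4x4 and its method are hypothetical, and read/write = 1 is
taken to mean read, following the description of the binary cell.

```python
class Ram4x4:
    """Behavioural model of a RAM with 4 words of 4 bits each."""

    def __init__(self):
        self.words = [[0] * 4 for _ in range(4)]

    def access(self, address, read_write, data_in=(0, 0, 0, 0), memory_select=1):
        """read_write = 1 reads the addressed word, read_write = 0 writes data_in."""
        # 2 x 4 decoder with memory select acting as its enable input.
        select = [memory_select & int(word == address) for word in range(4)]
        output = [0] * 4
        for word in range(4):
            for bit in range(4):
                if select[word] and read_write == 0:
                    self.words[word][bit] = data_in[bit]      # write into the cell
                cell_out = select[word] & read_write & self.words[word][bit]
                output[bit] |= cell_out                       # OR gate for this bit
        return tuple(output)

ram = Ram4x4()
ram.access(2, 0, data_in=(1, 0, 1, 1))       # write word 2
print(ram.access(2, 1))                      # read it back
print(ram.access(2, 1, memory_select=0))     # nothing selected, output is all 0
```

Non-selected cells contribute 0 to every OR gate, so only the addressed word reaches
the output, and with memory select at 0 a write has no effect.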
After discussing so much about combinational circuits and sequential circuits, let us
discuss in the next section an example having a combination of both kinds of circuit.
[NOTE: Remember the excitation table for the J-K flip-flop given in fig 4.8]
There are 4 flip-flops in the decade counter, i.e. A, B, C and D. The next state of each
flip-flop is given in the table. JA and KA indicate the flip-flop inputs corresponding
to flip-flop A. Please note that this counter requires 4 flip-flops.
From this the flip-flop input equations are simplified using K-Maps, as shown in figure
4.24. The unused minterms from 1010 through 1111 are taken as don’t care
conditions.
Check Your Progress 3
1) Differentiate between synchronous & asynchronous counters?
.....................................................................................................................................
.....................................................................................................................................
.....................................................................................................................................
2) Can ripple counter be constructed from a shift register?
.....................................................................................................................................
.....................................................................................................................................
.....................................................................................................................................
3) Can we design a counter with the following repeated binary sequence:
0,1,2,3,4,5,6. If yes, design it using J K flip flop.
.....................................................................................................................................
.....................................................................................................................................
.....................................................................................................................................
4.7 SUMMARY
As mentioned earlier, this unit provides information on sequential circuits,
which are the foundation of digital design. Flip-flops, the basic storage units in
sequential circuits, are derived from latches. A sequential circuit can be formed using
combinational circuits (discussed in the last unit) and flip-flops. The behavior of a
sequential circuit can be analyzed using tables and state diagrams.
Registers, counters etc. are structured sequential blocks. This unit has outlined the
construction of registers, counters, RAM etc. Lastly, we discussed how a circuit can
be designed using both sequential and combinational circuits. For more details,
students can refer to the further readings.
2) The flip-flop is the basic storage element for synchronous sequential circuits,
whereas latches are bistable devices whose state normally depends upon
asynchronous inputs and which are not suitable for use in synchronous sequential
circuits using a single clock.
3) An excitation table indicates what the inputs must be if the present and next states
are known, whereas a characteristic table indicates just the opposite, i.e. the inputs
are known and the next state has to be found.
Check Your Progress 3
1) The main difference is the time at which the counter flip-flops change their states.
In a synchronous counter all the flip-flops that need to change do so
simultaneously. In an asynchronous counter the complement, if it is to be done, may
ripple through a series of flip-flops.
2) Yes, but this circuit will generate a sequence of states where only 1 bit changes
at a time, i.e. 0000, 1000, 1100, 1110, 1111, 0111, 0011, 0001.
3) Yes. The sequence 0, 1, 2, 3, 4, 5, 6 has 7 states and $2^3$ = 8 ≥ 7, so we require
three flip-flops.
Present state   Next state   Flip-flop inputs
A B C           A B C        JA KA JB KB JC KC
0 0 0           0 0 1        0  X  0  X  1  X
0 0 1           0 1 0        0  X  1  X  X  1
0 1 0           0 1 1        0  X  X  0  1  X
0 1 1           1 0 0        1  X  X  1  X  1
1 0 0           1 0 1        X  0  0  X  1  X
1 0 1           1 1 0        X  0  1  X  X  1
1 1 0           0 0 0        X  1  X  1  0  X
The state 111 is a don’t care condition. Make the suitable K-maps. The following are
the flip-flop input values:
$J_A = BC$                      $K_A = B$
$J_B = C$                       $K_B = C + A$
$J_C = \bar{A} + \bar{B}$       $K_C = 1$
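The flip-flop input equations can be checked by stepping the counter through its
states. The Python sketch below is an illustration: it uses the standard J-K
characteristic equation $Q(t+1) = J\bar{Q} + \bar{K}Q$ and takes
$J_C = \bar{A} + \bar{B}$, the form required by the state table; the names jk_next and
step are hypothetical.

```python
def jk_next(q, j, k):
    """J-K characteristic equation: Q(t+1) = J.Q' + K'.Q"""
    return (j & (1 - q)) | ((1 - k) & q)

def step(a, b, c):
    """Advance the three flip-flops A, B, C by one clock pulse."""
    ja, ka = b & c, b
    jb, kb = c, c | a
    jc, kc = (1 - a) | (1 - b), 1
    return jk_next(a, ja, ka), jk_next(b, jb, kb), jk_next(c, jc, kc)

state = (0, 0, 0)
sequence = []
for _ in range(8):
    sequence.append(4 * state[0] + 2 * state[1] + state[2])
    state = step(*state)
print(sequence)
```

Starting from 000, the counter visits 0 to 6 and then returns to 0, so the unused
state 111 is never entered during normal counting.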
UNIT 1 THE MEMORY SYSTEM
Structure
1.0 Introduction
1.1 Objectives
1.2 The Memory Hierarchy
1.3 RAM, ROM, DRAM, Flash Memory
1.4 Secondary Memory and Characteristics
1.4.1 Hard Disk Drives
1.4.2 Optical Memories
1.4.3 CCDs, Bubble Memories
1.5 RAID and its Levels
1.6 The Concepts of High Speed Memories
1.6.1 Cache Memory
1.6.2 Cache Organisation
1.6.3 Memory Interleaving
1.6.4 Associative Memory
1.7 Virtual Memory
1.8 The Memory System of Micro-Computer
1.8.1 SIMM, DIMM, etc., Memory Chips
1.8.2 SDRAM, RDRAM, Cache RAM Types of Memory
1.9 Summary
1.10 Solutions/Answers
1.0 INTRODUCTION
In the previous Block, we touched upon the basic foundations of computers,
including the concepts of the von Neumann machine, instruction execution, digital
data representation and logic circuits. In this Block we will define some of the most
important component units of a computer, namely the memory unit and the
input-output units. In this Unit we will discuss the various components of the memory
system of a computer. Computer memory is organised into a hierarchy so as to minimise
cost without compromising the overall speed of access. The memory hierarchy
includes cache memory, main memory and other secondary storage technologies. In
this Unit, we will discuss the main memory, the secondary memory, high-speed
memories such as cache memory, and the memory system of a microcomputer.
1.1 OBJECTIVES
After going through this Unit, you will be able to:
“The storage devices, along with the algorithm or information on how to control and
manage these storage devices, constitute the memory system of a computer.”
A memory system is a very simple system, yet it exhibits a wide range of technology
and types. The basic objective of a computer system is to increase the speed of
computation. Likewise the basic objective of a memory system is to provide fast,
uninterrupted access by the processor to the memory such that the processor can
operate at the speed it is expected to work.
But does a memory technology exist that leaves no speed gap between the processor
and the memory? The answer is yes, it does. Unfortunately, as the access time
(the time taken by the CPU to access a location in memory) becomes smaller, the cost
per bit of memory becomes higher. In addition, these memories normally require a power
supply for as long as the information needs to be stored. Neither property is very
convenient, but on the other hand the cheaper memories have very high access times,
which result in slower operation of the CPU. Thus, the trade-off between cost and access
time has led to a hierarchy of memories in which we supplement fast memories with larger,
cheaper, slower memories. These memory units may have very different physical and
operational characteristics; therefore, the memory system is very diverse in type, cost,
organisation, technology and performance. This memory hierarchy will work only if
the frequency of access to the slower memories is significantly lower than that to the
faster memories. The memory hierarchy consists of all the storage devices employed in a
computer system, from the slow but high-capacity auxiliary memory, to a relatively
faster main memory, to an even smaller and faster cache memory accessible to the
high-speed registers and processing logic. Figure 1 illustrates the components of a
typical memory system.
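The condition that the slower memories must be accessed much less often than the faster ones can be illustrated with a small calculation. The sketch below assumes a simplified model in which each access is served by exactly one level; the access times and fractions are illustrative values, not figures from this Unit.

```python
def average_access_time(levels):
    """Average access time for a list of (fraction_served, access_time_ns).

    Each access is assumed to be served by exactly one level, so the
    fractions must add up to 1.
    """
    total = sum(fraction for fraction, _ in levels)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must add up to 1")
    return sum(fraction * time_ns for fraction, time_ns in levels)


if __name__ == "__main__":
    # Hypothetical figures: a 10 ns fast memory and a 100 ns slow memory.
    mostly_fast = [(0.95, 10), (0.05, 100)]
    half_and_half = [(0.50, 10), (0.50, 100)]
    print(average_access_time(mostly_fast))
    print(average_access_time(half_and_half))
```

When most accesses are served by the fast level, the average stays close to the fast access time; when accesses are split evenly, the slow level dominates and the benefit of the hierarchy is largely lost.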
[Figure 1: Components of a typical memory system (surviving label: Magnetic Disks)]
A typical storage hierarchy is shown in Figure 1 above. Although Figure 1 is drawn as a
block diagram, it includes the following levels of the storage hierarchy:
Register
Cache memory
Main memory
Secondary storage
Mass storage
As we move up the hierarchy, we encounter storage elements that have faster access
time and higher cost per bit stored; as we move down the hierarchy, access time becomes
slower and the cost per bit falls. Thus, cache memory generally has the fastest access
time, the smallest storage capacity, and the highest cost per bit stored. The primary
memory (main memory) falls next in the storage hierarchy list. On-line, direct-access
secondary storage devices such as magnetic hard disks make up the level of hierarchy
just below the main memory. Off-line, direct-access and sequential-access secondary
storage devices such as magnetic tape, floppy disk, zip disk, WORM disk, etc. fall next
in the storage hierarchy. Mass storage devices, often referred to as archival storage, are
at the bottom of the storage hierarchy. They are cost-effective for the storage of very
large quantities of data when fast access time is not necessary.
Let us now discuss various forms of memories in the memory hierarchy in more
detail.
The construction shown in Figure 2(a) is made up of one JK flip-flop and 3 AND
gates. The two inputs to the system are one input bit and the read/write signal. The input
is fed in complemented form to AND gate ‘a’. The read/write signal has a value of 1 for a
read operation. Therefore, during the read operation AND gate ‘c’ has the
read/write input as 1, while AND gates ‘a’ and ‘b’ have 0 as their read/write input. If the
chip is selected, i.e. this cell is currently being selected, then the output becomes equal
to the state of the flip-flop. In other words, the data value stored in the flip-flop has been read.
In a write operation only gates ‘a’ and ‘b’ get a read/write value of 1, and they set or
clear the JK flip-flop depending on the data input value. If the data input is 0, the
flip-flop will go to the clear state, and if the data input is 1, the flip-flop will go to the
set state. In effect, the input data is reflected in the state of the flip-flop. Thus, we say
that the input data has been stored in the flip-flop or binary cell.
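The behaviour of the binary cell described above can be sketched in software. This is a rough model under stated assumptions, since Figure 2(a) is not reproduced here: gate ‘a’ is assumed to drive the K (clear) input, gate ‘b’ the J (set) input, and all three gates are assumed to receive the select signal. The class and method names are illustrative.

```python
class BinaryCell:
    """One-bit memory cell: a JK flip-flop gated by three AND gates."""

    def __init__(self):
        self.q = 0  # state of the JK flip-flop

    def clock(self, select, read_write, data_in):
        """Apply one clock pulse; read_write is 1 for read, 0 for write.

        Returns the cell output (0 unless a selected read is in progress).
        """
        write = 1 - read_write                 # complemented read/write signal
        k = select & write & (1 - data_in)     # gate 'a': complemented input, clears
        j = select & write & data_in           # gate 'b': sets
        out = select & read_write & self.q     # gate 'c': drives the output
        if j:                                  # j and k are never both 1 here
            self.q = 1
        elif k:
            self.q = 0
        return out


if __name__ == "__main__":
    cell = BinaryCell()
    cell.clock(select=1, read_write=0, data_in=1)          # write a 1
    print(cell.clock(select=1, read_write=1, data_in=0))   # read it back
```

Because the set and clear gates see complementary versions of the data input, the flip-flop never receives J = K = 1, so a write always forces the cell to the value of the input bit.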
A 32 × 4 RAM means that this RAM has 32 words, 5 address lines (2^5 = 32), and a 4-bit
data word size. Please note that we can represent a RAM as 2^A × D, where A is the
number of address lines and D is the number of data lines. Figure 2(b) is the
extension of the binary cell to an integrated 32 × 4 RAM circuit in which a 5 × 32
decoder is used. The 4-bit data inputs come through an input buffer and the 4-bit data
output is stored in the output buffer.
A chip select (CS) control signal is used as a memory enable input. When the
complemented chip select input is 0, that is CS = 1, the entire chip is enabled for read or
write operations. An R/W signal selects between the read and write operations. The word
that is selected will determine the overall output. Since every word is stored in a logic
circuit of equal length that can be accessed in equal time, the memory is called random
access memory (RAM).
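The 2^A × D description and the roles of the chip select and R/W signals can be put together in a small behavioural model. This is only a sketch: the class name and method signature are assumptions made for illustration, and the decoder and buffers of Figure 2(b) are reduced to list indexing and bit masking.

```python
class Ram:
    """Behavioural model of a 2^A x D RAM with chip select and R/W inputs."""

    def __init__(self, address_lines, data_lines):
        self.words = 2 ** address_lines          # number of addressable words
        self.data_lines = data_lines             # bits per word
        self.cells = [0] * self.words

    def access(self, cs, read_write, address, data_in=0):
        """cs = 1 enables the chip; read_write = 1 reads, 0 writes."""
        if not cs:
            return None                          # chip not selected: no effect
        if not 0 <= address < self.words:
            raise ValueError("address out of range")
        if read_write == 1:
            return self.cells[address]           # read the selected word
        mask = (1 << self.data_lines) - 1        # keep only D bits
        self.cells[address] = data_in & mask
        return None


if __name__ == "__main__":
    ram = Ram(address_lines=5, data_lines=4)     # the 32 x 4 RAM of the text
    ram.access(cs=1, read_write=0, address=31, data_in=0b1010)
    print(ram.words, ram.access(cs=1, read_write=1, address=31))
```

Every word is reached by the same indexing step regardless of its address, which mirrors the equal access time that gives random access memory its name.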
For the write operation (please refer to Figure 3(a)), a voltage signal is applied to the
bit line; a high voltage represents 1, and a low voltage represents 0. A signal is then
applied to the address line, allowing a charge to be transferred to the capacitor.
For the read operation, when the address line is selected, the transistor turns on and
the charge stored on the capacitor is fed out onto a bit line and to the sense amplifier.
The sense amplifier compares the capacitor voltage to a reference value and
determines if the cell contains logic 1 or logic 0. The read out from the cell
discharges the capacitor, which must be restored to complete the operation.
Although the DRAM cell is used to store a single bit (0 or 1), it is essentially an
analog device. The capacitor can store any charge value within a range; a threshold
value determines whether the charge is interpreted as 1 or 0.
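The analog nature of the cell, the threshold decision and the destructive read can be sketched as follows. The voltage levels, the threshold and the leak model are illustrative assumptions and are not taken from the text.

```python
class DramCell:
    """Toy model of a one-transistor DRAM cell storing charge on a capacitor."""

    FULL_CHARGE = 1.0   # assumed charge written for logic 1 (arbitrary units)
    THRESHOLD = 0.5     # assumed reference value used by the sense amplifier

    def __init__(self):
        self.charge = 0.0

    def write(self, bit):
        """Charge or discharge the capacitor according to the bit line."""
        self.charge = self.FULL_CHARGE if bit else 0.0

    def leak(self, fraction):
        """Lose the given fraction of the stored charge over time."""
        self.charge *= (1.0 - fraction)

    def read(self):
        """Sense the charge, which discharges the cell, then restore it."""
        bit = 1 if self.charge > self.THRESHOLD else 0
        self.charge = 0.0      # the read-out discharges the capacitor
        self.write(bit)        # the value is written back to complete the read
        return bit


if __name__ == "__main__":
    cell = DramCell()
    cell.write(1)
    cell.leak(0.3)             # some charge has leaked away
    print(cell.read(), cell.charge)
```

In this model a stored 1 survives a moderate leak because the charge is still above the threshold, and the read restores it to full charge. If too much charge leaks before the next read or refresh, the cell is sensed as 0 and the data is lost.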
Organisation of DRAM Chip
Figure 3(b) shows a typical organisation of a 16-megabit DRAM chip, arranged as
2048 × 2048 × 4 bits. The memory array in this
organisation is a square array of (2048 × 2048) elements of 4 bits each.
Each element, which consists of 4 bits of the array, is connected by horizontal row lines
and vertical column lines. The horizontal lines are connected to the select inputs of the
cells in a row, whereas the vertical lines are connected to the output signal through a
sense amplifier, or to the data-in signal through a data bit line driver. Please note that
selecting data on this chip involves the following:
• The row address is specified on the address lines A0 to A10 (11
address lines only). It is stored in the row address buffer and passed to the row
decoder.
• The row decoder selects the required row.
• The column address buffer is loaded with the column address values, which
are also applied through the lines A0 to A10 only. Please note that these lines
should then contain the values for the column.
• The row address is loaded through a change in the external signal RAS (Row Address
Strobe), because this signal is high at the rising edge of the clock.
• CAS (Column Address Strobe) causes the column address buffer to be loaded with
the column address values.
• Each column is of 4 bits, so 4 data lines are required from the input/output
buffer. On a memory write operation the data-in bit lines are activated, while on
a read the sense lines are activated.
• This chip requires 11 address lines (instead of 22), 4 data in and out lines, and
other control lines.
• As there are 11 row address lines and 11 column address lines and each
column is of 4 bits, the size of the chip is 2^11 × 2^11 × 4 = 2048 ×
2048 × 4 = 16 megabits. On increasing the address lines from 11 to 12 we have
2^12 × 2^12 × 4 = 64 megabits, an increase by a factor of 4. Thus, possible sizes
of such chips may be 16K, 64K, 256K, 1M, 4M, 16M, and so on.
• Refreshing of the chip is done periodically using a refresh counter. One simple
technique of refreshing may be to disable read-write for some time and refresh
all the rows one by one.
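The address multiplexing and the chip size arithmetic in the points above can be sketched as follows. The function names are illustrative, and placing the row address in the upper 11 bits of a 22-bit element address is an assumption; the text does not say which half of the address selects the row.

```python
ADDRESS_LINES = 11   # A0 to A10, shared by the row and the column address


def split_address(element_address, address_lines=ADDRESS_LINES):
    """Split a full element address into the (row, column) pair that is
    sent over the same address lines in two steps (RAS, then CAS)."""
    row = element_address >> address_lines
    column = element_address & ((1 << address_lines) - 1)
    return row, column


def chip_size_bits(address_lines, bits_per_element=4):
    """Capacity of a square array with multiplexed row and column addresses."""
    rows = columns = 2 ** address_lines
    return rows * columns * bits_per_element


if __name__ == "__main__":
    mega = 2 ** 20
    print(split_address(2 ** 22 - 1))          # last element of the array
    print(chip_size_bits(11) // mega, "megabits")
    print(chip_size_bits(12) // mega, "megabits")
```

Adding one address line doubles both the number of rows and the number of columns, which is why the capacity grows by a factor of 4 rather than 2.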
Figure 4: ROM
(a) ROM block diagram: n address lines feed a 2^n × m ROM, which produces m output bits
(b) Truth table:
    Input      Output
    I1  I2     O1  O2
    0   0      0   1
    0   1      1   0
    1   0      1   1
    1   1      0   0
(c) A sample ROM
A ROM is characterised by the number of words (2^n) and the number of bits (m) per
word. For example, a 32 × 8 ROM, which can be written as 2^5 × 8, consists of 32 words
of 8 bits each, which means there are 8 output lines and 32 distinct words stored in the
unit. There are only 5 input lines because 32 = 2^5, and with 5 binary variables we can
specify 32 addresses.
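A ROM behaves like a fixed lookup table indexed by its address inputs. The sketch below stores the four words of the truth table of Figure 4(b) and also computes the number of address lines needed for a given number of words; the names are illustrative.

```python
# Contents of the sample ROM from the truth table of Figure 4(b):
# the address is (I1, I2) and each stored word is (O1, O2).
SAMPLE_ROM = (0b01, 0b10, 0b11, 0b00)


def rom_read(rom, *address_bits):
    """Return the word stored at the address formed by the input bits
    (most significant bit first)."""
    address = 0
    for bit in address_bits:
        address = (address << 1) | bit
    return rom[address]


def address_lines_needed(words):
    """Number of input lines n such that 2^n covers the given word count."""
    return (words - 1).bit_length()


if __name__ == "__main__":
    print(format(rom_read(SAMPLE_ROM, 1, 0), "02b"))   # word at I1=1, I2=0
    print(address_lines_needed(32))                     # lines for a 32 x 8 ROM
```

The contents are held in a tuple, which cannot be modified after creation, loosely mirroring the read-only nature of the device.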
ROMs are memories on which it is not possible to write data while they are on-line
to the computer. They can only be read, which is why they are called read-only
memory (ROM). Since ROM chips are non-volatile, the data stored inside a
ROM are not lost when the power supply is switched off, unlike the case of a volatile
RAM chip. ROMs are also known as permanent stores.
ROMs can be used for storing micro-programs, system programs and subroutines.
ROMs are non-volatile in nature, so their contents need not be loaded from a secondary
storage device. ROMs are fabricated in large numbers, in a way that leaves no room for
even a single error. This is an inflexible process and requires mass production.
Therefore, a new kind of ROM called PROM was designed, which is also non-volatile
and can be written only once; hence the name Programmable ROM (PROM). The
supplier or the customer can perform the writing process in a PROM electrically.
Special equipment is needed to perform this writing operation. Therefore, PROMs are
more flexible and convenient than ROMs.
ROMs and PROMs can be written just once, and in both cases whatever is
written cannot be changed. But what about a case where you read mostly but
write only a few times? This led to the concept of read-mostly memories, the
best examples of which are EPROMs (Erasable PROMs) and EEPROMs (Electrically
Erasable PROMs).
EPROMs can be read and written electrically. But the write operation is not
simple. It requires erasure of all the storage cells by exposing the chip to ultraviolet
light, thus bringing them to the same initial state. Once all the cells have been brought
to the same initial state, the EPROM can be written electrically. EEPROMs are
becoming increasingly popular, as they do not require prior erasure of previous
contents. However, in EEPROMs the writing time is considerably higher than the
reading time. The biggest advantage of EEPROM is that it is a non-volatile memory that
can be updated easily, while the disadvantages are the high cost, the fact that at present
they are not completely non-volatile, and the considerable time taken by the write
operation. But all these disadvantages are disappearing with growth in technology. In
general, ROMs are made of cheaper and slower technology than RAMs.
Flash Memory
This memory is another form of semiconductor memory, which was first introduced in
the mid-1980s. These memories can be reprogrammed at high speed, hence the
name flash. This is a type of non-volatile, electronic random access memory.
Basically, this memory falls in between EPROM and EEPROM. In flash memory the
entire memory can be erased in a few seconds by using electric erasing technology.
Flash memory is used in many I/O and storage devices. Flash memory is also used to
store data and programming algorithms in cell phones, digital cameras and MP3
music players.
Flash memory serves as a hard drive for consumer devices. Music, phone lists,
applications, operating systems and other data are generally stored on flash
chips. Unlike the main computer memory, the data are not erased when the device is
turned off.
Data storage flash is made by SanDisk, Toshiba, etc. It stores data and is used in
digital cameras and MP3 players.
It is desirable that the operating speed of the primary storage of a computer system be
as fast as possible because most of the data transfer to and from the processing unit is
via the main memory. For this reason, storage devices with fast access times, such as
semiconductor memories, are generally used for the design of primary storage. These
high-speed storage devices are expensive and hence the cost per bit of storage is also
high for primary storage. In addition, the primary memory has the following limitations:
a) Limited capacity: The storage capacity of the primary storage of today’s
computers is not sufficient to store the large volume of data handled by most of
the data processing organisations.
b) Volatile: The primary storage is volatile and the data stored in it is lost when the
electric power is turned off. However, the computer systems need to store data on
a permanent basis for several days, months or even several years.
The result is that an additional memory called secondary storage is used with most of
the computer systems. Some popular memories are described in this section.
Magnetic Disk
A disk is a circular platter constructed of nonmagnetic material, called the substrate,
coated with a magnetisable material. It is used for storing large amounts of data.
Traditionally, the substrate has been an aluminium or aluminium alloy material; more
recently, glass substrates have been introduced. The glass substrate has a number of
benefits, including the following:
The write mechanism is based on the fact that electricity flowing through a coil
produces a magnetic field. Pulses are sent to the write head, and magnetic patterns are
recorded on the surface below, with different patterns for positive and negative
currents. The write head itself is made of easily magnetisable material and is in the
shape of a rectangular doughnut with a gap along one side and a few turns of
conducting wire along the opposite side (Figure 6). An electric current in the wire
induces a magnetic field across the gap, which in turn magnetizes a small area of the
recording medium. Reversing the direction of the current reverses the direction of the
magnetization on the recording medium.
The traditional read mechanism is based on the fact that a magnetic field moving
relative to a coil produces an electrical current in the coil. When the surface of the
disk passes under the head, it generates a current of the same polarity as the one
already recorded. The structure of the head for reading is in this case essentially the
same as for writing and therefore the same head can be used for both. Such single
heads are used in floppy disk systems and in older rigid disk systems.
Figure 7 depicts this data layout. Adjacent tracks are separated by gaps. This prevents,
or at least minimizes, errors due to misalignment of the head. Data are transferred to
and from the disk in sectors.
Figure 7: Layout of Magnetic Disk
To identify the sector position, there is normally a
starting point of a track and a starting and end point of each sector. But the question is,
how is a sector of a track recognised? A disk is formatted so that some extra
control data are recorded on it for identification purposes. This control data is
accessible only to the disk drive and not to the user. Please note that in Figure 7, as we
move away from the centre of the disk, the physical size of the track increases. Does
it mean we store more data on the outside tracks? No. A disk rotates at a constant
angular velocity, but as we move away from the centre the linear velocity is higher than
the linear velocity nearer to the centre. Thus, the density of storage of information
decreases as we move away from the centre of the disk. This results in a larger physical
sector size. Thus, all the sectors on the disk store the same amount of data.
An example of disk formatting is shown in Figure 8. In this case, each track contains
30 fixed–length sectors of 600 bytes each. Each sector holds 512 bytes of data plus
control information useful to the disk controller. The ID field is a unique identifier or
address used to locate a particular sector. The SYNC byte is a special bit pattern that
delimits the beginning of the field. The track number identifies a track on a surface.
The head number identifies a head, because this disk has multiple surfaces. The ID
and data fields each contain an error-detecting code.
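The figures in this formatting example imply a fixed overhead per sector, which can be worked out directly. The sketch below uses only the numbers given above (30 sectors per track, 600 bytes per sector, 512 data bytes per sector); the function name is illustrative.

```python
def track_format(sectors_per_track, sector_bytes, data_bytes):
    """Summarise how much of a formatted track holds user data."""
    return {
        "track_bytes": sectors_per_track * sector_bytes,
        "data_bytes_per_track": sectors_per_track * data_bytes,
        "overhead_per_sector": sector_bytes - data_bytes,
        "efficiency": data_bytes / sector_bytes,
    }


if __name__ == "__main__":
    summary = track_format(sectors_per_track=30, sector_bytes=600, data_bytes=512)
    for name, value in summary.items():
        print(name, value)
```

Each sector therefore carries 88 bytes of control information (ID field, SYNC bytes, error-detecting codes and gaps), so roughly 85% of the formatted track is available for user data.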
Physical Characteristics
Figure 9 lists the major characteristics that differentiate among the various types of
magnetic disks. First, the head may be either fixed or movable with respect to the
radial direction of the platter. In a fixed-head disk, there is one read-write head per
track. All of the heads are mounted on a rigid arm that extends across all tracks; such
systems are rare today. In a movable-head disk, there is only one read-write head.
Again, the head is mounted on an arm. Because the head must be able to be positioned
above any track, the arm can be extended or retracted for this purpose.
Head Motion                        Platters
Fixed head (one per track)         Single platter
Movable head (one per surface)     Multiple platter
The disk itself is mounted in a disk drive, which consists of the arm, a shaft that
rotates the disk, and the electronics needed for the input and output of binary data. A
non-removable disk is permanently mounted in the disk drive; the hard disk in a personal
computer is a non-removable disk. A removable disk can be removed and replaced
with another disk. The advantage of the latter type is that unlimited amounts of data
are available with a limited number of disk systems. ZIP cartridge disks
are examples of removable disks. Figure 10 shows other components of the disks.
[Figure 10: Components of a disk (labels: platter, surface, read/write head, spindle, head arm)]
The head mechanism provides a classification of disks into three types. Traditionally,
the read-write head has been positioned at a fixed distance above the platter, allowing
an air gap. At the other extreme is a head mechanism that actually comes into physical
contact with the medium during a read or write operation. This mechanism is used
with the floppy disk, which is a small, flexible platter and the least expensive type of
disk.
To understand the third type of disk, we need to comment on the relationship between
data density and the distance of head from the surface. The head generates or senses
an electromagnetic field of sufficient magnitude to write and read properly. The
narrower the head is, the closer it must be to the platter surface to function. A
narrower head means narrower tracks and therefore greater data density, which is
desirable. However, the closer the head is to the disk, the greater are the risks of errors
from impurities or imperfections.
To push the technology further, the Winchester disk was developed. Winchester
heads are used in sealed drive assemblies that are almost free of contaminants. They
are designed to operate closer to the disk’s surface than conventional rigid disk heads,
thus allowing greater data density. The head is actually an aerodynamic foil that rests
lightly on the platter’s surface when the disk is motionless. The air pressure generated
by a spinning disk is enough to make the foil rise above the surface. The resulting
non-contact system can be engineered to use narrower heads that operate closer to the
platter’s surface than conventional rigid disk heads.
• Sync: The sync field identifies the beginning of a block. It consists of a byte of
all 0s, 10 bytes of all 1s, and a byte of all 0s.
• Header: The header contains the block address and the mode byte. Mode 0
specifies a blank data field; mode 1 specifies the use of an error-correcting code
and 2048 bytes of data; mode 2 specifies 2336 bytes of user data with no
error-correcting code.
• Data: User data.
• Auxiliary: Additional user data in mode 2. In mode 1, this is a 288-byte
error-correcting code.
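The field sizes above can be combined to check the block layout. In the sketch below the sync field is taken as 12 bytes (one byte of 0s, ten bytes of 1s, one byte of 0s), and the header is assumed to be 4 bytes (three address bytes plus the mode byte); the header size is not stated in the text and is an assumption here.

```python
SYNC = bytes([0x00]) + bytes([0xFF]) * 10 + bytes([0x00])
HEADER_BYTES = 4        # assumption: 3 bytes of block address + 1 mode byte
DATA_BYTES = 2048       # data field
AUXILIARY_BYTES = 288   # error-correcting code in mode 1, extra data in mode 2


def block_size():
    """Total size in bytes of one CD-ROM block."""
    return len(SYNC) + HEADER_BYTES + DATA_BYTES + AUXILIARY_BYTES


def user_bytes(mode):
    """User data carried by one block in the given mode."""
    if mode == 0:
        return 0                              # blank data field
    if mode == 1:
        return DATA_BYTES                     # auxiliary field holds the ECC
    if mode == 2:
        return DATA_BYTES + AUXILIARY_BYTES   # no error-correcting code
    raise ValueError("mode must be 0, 1 or 2")


if __name__ == "__main__":
    print(len(SYNC), block_size(), user_bytes(1), user_bytes(2))
```

The mode 2 figure of 2336 bytes is simply the 2048-byte data field plus the 288 bytes that mode 1 spends on error correction.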
CD-ROM is appropriate for the distribution of large amounts of data to a large
number of users. CD-ROMs are a common medium these days for distributing
information. Compared with traditional hard disks, the CD-ROM has three
advantages:
1. Large data/information storage capability.
2. The optical disk together with information stored on it can be mass replicated
inexpensively, unlike a magnetic disk. The database on a magnetic disk has to be
reproduced by copying data from one disk to second disk, using two disk drives.
3. The optical disk is removable, allowing the disk itself to be used for archival
storage. Most magnetic disks are non-removable. The information on non-
removable magnetic disks must first be copied on tape before the disk drive / disk
can be used to store new information.
The CD-R medium is similar to but not identical to that of a CD or CD-ROM. For
CDs and CD-ROMs, information is recorded by the pitting of the surface of the
medium, which changes reflectivity. For a CD-R, the medium includes a dye layer.
The resulting disk can be read on a CD-R drive or a CD-ROM drive.
The CD-R optical disk is attractive for archival storage of documents and files. It
provides a permanent record of large volumes of user data.
With the capacious digital versatile disk (DVD), the electronics industry has at last
found an acceptable replacement for the videotape used in videocassette recorders
(VCRs) and, more important for this discussion, for the CD-ROM in personal
computers and servers. The DVD has taken video into the digital age. It delivers
movies with impressive picture quality, and it can be randomly accessed like audio
CDs, which DVD machines can also play. Vast volumes of data can be crammed onto
the disk, several times as much as on a CD-ROM. With DVD’s huge storage capacity and
vivid quality, PC games will become more realistic and educational software will
incorporate more video.
In these devices, a soft magnetic material called Permalloy is deposited along a
predetermined path, thus making a track. Bubbles are forced to move continuously in
a fixed direction on these tracks. In these memories the presence of a bubble
represents a 1 while its absence represents a 0. For writing data into a cell, a bubble
generator to introduce a bubble, or a bubble annihilator to remove a bubble, is
required. A bubble detector performs the read operation. Magnetic bubble memories
having capacities of 1M or more bits per chip have been manufactured. The cost and
performance of these memories fall between those of semiconductor RAMs and magnetic
disks.
These memories are non-volatile, in contrast to semiconductor RAMs. In addition,
since there are no moving parts, they are more reliable than a magnetic disk. But these
memories are difficult to manufacture and difficult to interface with conventional
processors. These memories at present are used in specialized applications, e.g., as a
secondary memory of air or space borne computers, where extremely high reliability
is required.
Check Your Progress 1
1. State True or False:
a) Bubble memories are non-volatile. T/F
b) The disadvantage of DRAM over static RAM is the need to refresh the
capacitor charge every few milliseconds. T/F
c) Flash memory is a volatile RAM. T/F
2. Fill in the blanks:
a) The EPROM is ____________ erasable and _______________
programmable.
b) __________ memory requires a recharging cycle in order to retain its
information.
c) Memory elements employed specifically in computer memories are generally
___________ circuits.
3. Differentiate among RAM, ROM, PROM and EPROM.
……………………………………………………………………………………
4. What is a flash memory? Give a few of its typical uses.
……………………………………………………………………………………
5. A memory has a capacity of 4K × 8
(a) How many data input and data output lines does it have?
(b) How many address lines does it have?
(c) What is the capacity in bytes?
……………………………………………………………………………….......…
………………………………………………………………………………..….
6. Describe the internal architecture of a DRAM chip that stores 4K bytes and
uses a square register array. How many address lines will be needed? Suppose the
same configuration exists for an old RAM; how many address lines will then be
needed?
………………………………………………………………………………………
………………………………………………………………………………………
7. How many RAM chips of size 256K × 1 bit are required to build 1M Byte
memory?
……………………………………………………………………………………...
……………………………………………………………………………………...
RAID has been proposed at various levels, which are basically aimed at catering for the
widening gap between the processor and on-line secondary storage technology.
The basic strategy used in RAID is to replace a large-capacity disk drive with
multiple smaller-capacity disks. The data on these disks is distributed to allow
simultaneous access, thus improving the overall input/output performance. It also
allows an easy way of increasing the capacity of the disk. Please note that one of
the main features of the design is to compensate for the increase in probability of
failure of multiple disks through the use of parity information. The seven levels of
RAID are given in Figure 13. Please note that levels 2 and 4 are not
commercially offered.
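The way parity information compensates for a disk failure can be sketched with the exclusive-or operation. The example below is an illustration only: the strip contents are made-up bytes and the function names are hypothetical. It computes a parity strip over three data strips and then rebuilds one lost strip from the survivors.

```python
from functools import reduce


def parity_strip(strips):
    """Byte-wise exclusive-or of equally sized strips."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*strips))


def rebuild_lost_strip(surviving_strips, parity):
    """Recover the single missing strip from the survivors and the parity."""
    return parity_strip(list(surviving_strips) + [parity])


if __name__ == "__main__":
    # Hypothetical data strips, one per disk.
    strips = [b"\x0f\xf0", b"\x33\x55", b"\xaa\x01"]
    parity = parity_strip(strips)
    # Suppose the disk holding strips[1] fails.
    recovered = rebuild_lost_strip([strips[0], strips[2]], parity)
    print(recovered == strips[1])
```

Because XOR-ing a value with itself gives zero, combining the parity with all surviving strips cancels them out and leaves exactly the missing strip. This works for the loss of any one strip, but not for two simultaneous losses.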
Figure 13: The levels of RAID

RAID Level 0 (Category: Striping)
   Features: a) The disk is divided into strips, which may be a block, a sector or another unit.
             b) Non-redundant.
   I/O request rate (read/write): Large strips: Excellent
   Data transfer rate (read/write): Small strips: Excellent
   Typical application: Applications requiring high performance for non-critical data

RAID Level 1 (Category: Mirroring)
   Features: a) Every disk in the array has a mirror disk that contains the same data.
             b) Recovery from a failure is simple. When a drive fails, the data may still
                be recovered from the second drive.
   I/O request rate (read/write): Good / fair
   Data transfer rate (read/write): Fair / fair
   Typical application: System drives; critical files

RAID Level 2 (Category: Parallel access)
   Features: a) All member disks participate in the execution of every I/O request by
                synchronising the spindles of all the disks to the same position at a time.
             b) The strips are very small, often a single byte or word.
             c) Redundancy via Hamming code, which is able to correct single-bit
                errors and detect double-bit errors.
   I/O request rate (read/write): Poor
   Data transfer rate (read/write): Excellent
   Typical application: Commercially not useful

RAID Level 3 (Category: Parallel access)
   Features: a) Employs parallel access as in level 2, with small data strips.
             b) A simple parity bit is computed for the set of data, instead of an
                error-correcting code, in case a disk fails.
   I/O request rate (read/write): Poor
   Data transfer rate (read/write): Excellent
   Typical application: Large I/O request size applications, such as imaging and CAD

RAID Level 4 (Category: Independent access)
   Features: a) Each member disk operates independently, thus enabling
                fulfilment of separate input/output requests in parallel.
             b) The data strip is large and a bit-by-bit parity strip is created for bits of
                strips of each disk.
             c) The parity strip is stored on a separate disk.
   I/O request rate (read/write): Excellent / fair
   Data transfer rate (read/write): Fair / poor
   Typical application: Commercially not useful

RAID Level 5 (Category: Independent access)
   Features: a) Employs independent access as in level 4 and distributes the parity
                strips across all disks.
             b) The distribution of parity strips across all drives avoids the potential
                input/output bottleneck found in level 4.
   I/O request rate (read/write): Excellent / fair
   Data transfer rate (read/write): Fair / poor
   Typical application: High request rate, read-intensive, data lookup

RAID Level 6 (Category: Independent access)
   Features: a) Also called the P+Q redundancy scheme; it is much like level 5, but
                stores extra redundant information to guard against multiple disk failures.
             b) P and Q are two different data check algorithms. One of the two is
                the exclusive-or calculation used in levels 4 and 5. The other one is an
                independent data check algorithm.
   I/O request rate (read/write): Excellent / poor
   Data transfer rate (read/write): Fair / poor
   Typical application: Applications requiring extremely high availability

1.6 THE CONCEPTS OF HIGH SPEED MEMORIES
Why are high-speed memories needed? Is the main memory not a high-speed
memory? The answer to the second question is definitely “No”, but why so? For this,
we have to go to the fundamentals of semiconductor technology, which is beyond the
scope of this Unit. If the memories are slower, then how slow are they? On
average, it has been found that main memories are slower than processors (such as the
CPU or Input/Output Processors) by a factor of 5 to 10.
In addition, each instruction requires several memory accesses (ranging from 2 to
7, or sometimes even more). Even if an instruction requires only 2 memory accesses,
the processor waits for memory access for almost 80% of the time taken to execute
the instruction.
Hardware researchers are taking care of the first point. Let us discuss some high-speed
memories that are in existence at present.
The obvious question that arises is how the system can know in advance which data
and instructions are needed in the present processing, so as to make them available
beforehand in the cache. The answer to this question comes from a principle known as
locality of reference. According to this principle, during the course of execution of most
programs, memory references by the processor, for both instructions and data, tend to
cluster. That is, if an instruction is executed, there is a likelihood of the nearby
instructions being executed soon. Locality of reference holds not only for references to
program instructions but also for references to data. As shown in Figure 14, the cache
memory acts as a small, high-speed buffer between the processor and main memory.
Many computer systems are designed to have two separate cache memories called
instruction cache and data cache. The instruction cache is used for storing program
instruction and the data cache is used for storing data. This allows faster identification
of availability of accessed word in the cache memory and helps in further improving
the processor speed. Many computer systems are also designed to have multiple
levels of caches (such as level one and level two caches, often referred to as L1 and
L2 caches). L1 cache is smaller than L2 cache and is used to store more frequently
accessed instruction/data as compared to those in the L2 cache.
The use of cache memory requires several design issues to be addressed. Some key
design issues are briefly summarised below:
1. Cache Size: Cache memory is very expensive as compared to the main memory
and hence its size is normally kept very small. It has been found through
statistical studies that reasonably small caches can have a significant impact on
processor performance. As a typical example of cache size, a system having 1
GB of main memory may have about 1 MB of cache memory. Many of today’s
personal computers have 64KB, 128KB, 256KB, 512KB, or 1 MB of cache
memory.
2. Block Size: Block size refers to the unit of data (few memory words) exchanged
between cache and main memory. As the block size increases from very small to
larger size, the hit ratio (fraction of times that referenced instruction/data is found
in cache) will at first increase because of the principle of locality since more and
more useful words are brought into the cache. However, the hit ratio will begin to
decrease as the block size further increases because the probability of using the
newly fetched words becomes less than the probability of reusing the words that
must be moved out of the cache to make room for the new block. Based on this
fact, the block size is suitably chosen to maximise the hit ratio.
3. Replacement Policy: When a new block is to be fetched into the cache, another
may have to be replaced to make room for the new block. The replacement policy
decides which block to replace in such a situation. Obviously, it will be best to
replace a block that is least likely to be needed again in the near future.
4. Write Policy: If the contents of a block in the cache are altered, then it is
necessary to write it back to main memory before replacing it. The write policy
decides when the altered words of a block are written back to main memory. At
one extreme, an updated word of a block is written to the main memory as soon
as such updates occur in the block. At the other extreme, all updated words of the
block are written to the main memory only when the block is replaced from the
cache. The latter policy minimises overheads of memory write operations but
temporarily leaves main memory in an inconsistent (obsolete) state.
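The two extremes of the write policy can be compared with a small simulation. The sketch below is illustrative only: the class `ToyCacheBlock` and the function `count_memory_writes` are hypothetical names, and the model assumes a single cached block and counts only how many word writes reach main memory.

```python
class ToyCacheBlock:
    """One cached block; counts the word writes that reach main memory."""

    def __init__(self, write_through):
        self.write_through = write_through
        self.updated_words = {}   # word offset -> latest value not yet in memory
        self.memory_writes = 0

    def write(self, offset, value):
        if self.write_through:
            self.memory_writes += 1             # update main memory immediately
        else:
            self.updated_words[offset] = value  # defer until the block is replaced

    def replace(self):
        # Write back every updated word, then empty the block.
        self.memory_writes += len(self.updated_words)
        self.updated_words.clear()


def count_memory_writes(write_through, updates):
    block = ToyCacheBlock(write_through)
    for offset, value in updates:
        block.write(offset, value)
    block.replace()
    return block.memory_writes


# Ten updates that keep hitting the same two words of one block.
updates = [(i % 2, i) for i in range(10)]
write_through_total = count_memory_writes(True, updates)
write_back_total = count_memory_writes(False, updates)
```

Under these assumptions, the first policy writes to main memory on every update, while the second writes each altered word only once, at replacement time.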
The fundamental idea of cache organisation is that, by keeping the most frequently
accessed instructions and data in the fast cache memory, the average memory
access time will approach the access time of the cache.
The basic operation of the cache is as follows. When the CPU needs to access
memory, the cache is examined. If the word is found in the cache, it is read from
the fast cache memory. If the word addressed by the CPU is not found in the
cache, the main memory is accessed to read the word. A block of words containing
the one just accessed is then transferred from main memory to cache memory.
The average memory access time of a computer system can be improved considerably
by use of a cache. For example, if a memory read cycle takes 100 ns and a cache read
cycle takes 20 ns, then for four consecutive references, the first one brings the main
memory contents into the cache and the next three are served from the cache.
Thus, the closer together the references are, the better the performance of the cache.
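The arithmetic behind this example can be worked out as follows. This is a minimal sketch that assumes a miss costs exactly one main memory read cycle and a hit costs exactly one cache read cycle; the function name `total_access_time` is hypothetical.

```python
def total_access_time(references, hits, cache_ns, memory_ns):
    """Total time when each hit costs cache_ns and each miss costs memory_ns."""
    misses = references - hits
    return hits * cache_ns + misses * memory_ns


# Four references: the first misses, the next three hit in the cache.
with_cache = total_access_time(references=4, hits=3, cache_ns=20, memory_ns=100)
without_cache = 4 * 100          # every reference goes to main memory
average_ns = with_cache / 4      # average time per reference with the cache
```

With these figures the four references take 160 ns with the cache, against 400 ns without it, so the average access time drops from 100 ns to 40 ns.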
The basic characteristic of cache memory is its fast access time. Therefore, very little
or no time must be wasted when searching for words in the cache. The transformation
of data from main memory to cache memory is referred to as a mapping process. The
mapping procedure for the cache organization is of three types:
1. Associative mapping
2. Direct mapping
3. Set-associative mapping
[Figure: CPU, cache memory and main memory (32K × 12)]
Size of main memory (given word size of 12 bits) = 32K words = 2^15 words
⇒ 15 bits are needed for the address
Block size of cache = 2 main memory words
For every word stored in cache, there is a duplicate copy in the main memory. The
CPU communicates with both memories. It first sends a 15-bit (32K = 2^5 × 2^10 = 2^15)
address to the cache. If there is a hit, the CPU uses the relevant 12 bits of data from the
24-bit cache data. If there is a miss, the CPU reads the block containing the relevant word
from the main memory. So the key here is that a cache must store the address and data
portions of the main memory to ascertain whether the given information is available in
the cache or not. However, let us assume a block size of 1 memory word for the
following discussions.
Associative Mapping
The most flexible and fastest cache organization uses an associative memory which is
shown in Figure 16. The associative memory stores both the address and data of the
memory word. This permits any location in cache to store any word from the main
memory. The address value of 15 bits is shown as a five-digit octal number and its
corresponding 12-bit word is shown as a four-digit octal number. A CPU address of
15 bits is placed in the argument register and the associative memory is searched for a
matching address. If the address is found, the corresponding 12 bits data is read and
sent to the CPU. If no matches are found, the main memory is accessed for the word.
The address-data pair is then transferred to the associative cache memory. This
address checking is done simultaneously for the complete cache in an associative way.
Argument register (CPU address)

Address    Data
01001      3450
03767      7613
23245      1234
24250      2205
(All numbers are in octal)
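The lookup described above can be sketched in a few lines. This is only an illustration: a Python dictionary stands in for the parallel address search of the associative memory, cache capacity and replacement are ignored, and the contents of `main_memory` are hypothetical.

```python
# Cache contents from the table above (octal address -> octal data).
cache = {0o01001: 0o3450, 0o03767: 0o7613, 0o23245: 0o1234, 0o24250: 0o2205}

# Hypothetical main-memory contents for one address that is not cached.
main_memory = {0o12345: 0o0777}


def read(address):
    """Return (word, hit) for a CPU read of a 15-bit address."""
    if address in cache:             # stands in for the parallel address search
        return cache[address], True
    word = main_memory[address]      # miss: fetch the word from main memory
    cache[address] = word            # store the address-data pair in the cache
    return word, False
```

A read of address 03767 is a hit, while the first read of address 12345 is a miss that loads the address-data pair, so a second read of the same address hits.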
Direct Mapping
The direct mapping cache organization uses the n-bit address to access the main
memory and a k-bit index to access the cache. The remaining n - k bits of the address
form the tag. The internal organization of the words in
the cache memory is as shown in Figure 17. Each word in cache consists of the data
word and its associated tag. When a new word is first brought into the cache, the tag
bits are stored alongside the data bits. When the CPU generates a memory request, the
index field is used as the address to access the cache.
The tag field of the CPU address is compared with the tag in the word read from the
cache. If the two tags match, there is a hit and the desired data word is in cache. If
there is no match, there is a miss and the required word is read from the main
memory.
Let us consider a numerical example shown in Figure 18. The word at address zero is
at present stored in the cache (index = 000, tag = 00, data = 1456). Suppose that the
CPU wants to access the word at address 02000. The index address is 000, so it is
used to access the cache. The two tags are then compared. The cache tag is 00 but the
address tag is 02, which does not produce a match. Therefore, the main memory is
accessed and the data word 4254 is transferred to the CPU. The cache word at index
address 000 is then replaced with a tag of 02 and data of 4254.
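The numerical example can be traced with a short sketch. It assumes the 15-bit address is split into a 6-bit tag and a 9-bit index, which is what the two-digit octal tag and three-digit octal index in the example imply; the names `split` and `read` are hypothetical.

```python
INDEX_BITS = 9                      # low-order bits of the 15-bit address
INDEX_MASK = (1 << INDEX_BITS) - 1  # 0o777


def split(address):
    """Split a 15-bit address into (tag, index)."""
    return address >> INDEX_BITS, address & INDEX_MASK


# Cache maps index -> (tag, data); main memory maps address -> data.
cache = {0o000: (0o00, 0o1456)}
main_memory = {0o00000: 0o1456, 0o02000: 0o4254}


def read(address):
    """Return (word, hit); on a miss the cache word at that index is replaced."""
    tag, index = split(address)
    entry = cache.get(index)
    if entry is not None and entry[0] == tag:
        return entry[1], True
    word = main_memory[address]
    cache[index] = (tag, word)
    return word, False
```

Reading address 02000 gives tag 02 and index 000, misses against the stored tag 00, and leaves the cache word at index 000 holding tag 02 and data 4254.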
Set-Associative Mapping
A third type of cache organization called set-associative mapping is an improvement
on the direct mapping organization in that each word of cache can store two or more
words of memory under the same index address. Each data word is stored together
with its tag, and the tag-data items stored in one word of cache are said to form a
set.
Let us consider an example of a set-associative cache organization for a set size of two
as shown in the Figure 19. Each index address refers to two data words and their
associated tags. Each tag requires six bits and each data word has 12 bits, so the word
length of cache is 2(6+12) = 36 bits. An index address of nine bits can accommodate
512 words. Thus, the size of cache memory is 512 × 36. In general, a set-associative
cache of set size K will accommodate K words of main memory in each word of
cache.
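The size calculation and the lookup within a set can be sketched as follows. The tag and index widths come from the example above; the cache contents and the function name `lookup` are hypothetical and chosen only for illustration.

```python
SET_SIZE = 2      # tag-data pairs per cache word
TAG_BITS = 6
DATA_BITS = 12
INDEX_BITS = 9

word_length = SET_SIZE * (TAG_BITS + DATA_BITS)   # bits per cache word
num_words = 2 ** INDEX_BITS                       # cache words addressed by the index

# Hypothetical contents: index -> list of (tag, data) pairs forming one set.
cache = {0o000: [(0o01, 0o3450), (0o02, 0o5670)]}


def lookup(address):
    """Return the data word on a hit, or None on a miss."""
    tag, index = address >> INDEX_BITS, address & (num_words - 1)
    for stored_tag, data in cache.get(index, []):
        if stored_tag == tag:      # every tag in the set is compared
            return data
    return None
```

Here addresses 01000 and 02000 share index 000 but can both stay in the cache, which is the improvement over direct mapping.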
[Figure 19: Set-associative cache with a set size of two; columns: Index, Tag, Data, Tag, Data]
(a) Write through: Write the data in cache as well as main memory. The other
CPU-cache combinations have to watch the traffic to the main memory and
make suitable amendments in the contents of their caches. The disadvantage of this
technique is that a bottleneck is created due to the large number of accesses to the
main memory by various CPUs.
(b) Write back: In this method updates are made only in the cache, setting a bit
called the update bit. Only those blocks whose update bit is set are written back to the
main memory when they are replaced. But here all the accesses to the main memory,
whether from other CPUs or input/output modules, need to go through the cache,
resulting in complex circuitry.
(c) Instruction Cache: An instruction cache is one which is employed for accessing
only the instructions and nothing else. The advantage of such a cache is that as
the instructions do not change we need not write the instruction cache back to
memory, unlike data storage cache.
Figure 20 illustrates the memory interleaving architecture. The figure shows a 4-way
(n=4) interleaved memory system.
Hardware Organization
The block diagram of an associative memory is shown in Figure 21. It consists of a
memory array and logic for m words with n bits per word. The argument register A
and key register K each have n bits, one for each bit of a word. The match register M
has m bits, one for each memory word. Each word in memory is compared in parallel
with the content of the argument register; the words that match the bits of the
argument register set a corresponding bit in the match register. After the matching
process, those bits in the match register that have been set indicate the fact that their
corresponding words have been matched. Reading is accomplished by a sequential
access to memory for those words whose corresponding bits in the match register have
been set.
The key register provides a mask for choosing a particular field or key in the
argument word. The entire argument is compared with each memory word if the key
register contains all 1s. Otherwise, only those bits in the argument that have 1s in their
corresponding positions of the key register are compared. Thus the key provides a
mask or identifying information, which specifies how reference to memory is made.
To illustrate with a numerical example, suppose that the argument register A and the
key register K have the bit configuration shown below. Only the three leftmost bits of
A are compared with memory words because K has 1s in these positions.

A        101 111100
K        111 000000
Word 1   100 111100   no match
Word 2   101 000001   match
Word 2 matches the unmasked argument field because the three leftmost bits of the
argument and the word are equal.
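The masked comparison can be expressed with bitwise operations. The sketch below uses the values of A, K and the two words from the example; the function name `matches` and the list `match_register` are illustrative names, and the loop stands in for a comparison that the hardware performs on all words in parallel.

```python
def matches(argument, key, word):
    """True when word equals argument in every bit position where key has a 1."""
    return ((argument ^ word) & key) == 0


A = 0b101111100
K = 0b111000000
words = [0b100111100, 0b101000001]   # Word 1, Word 2

# One match bit per memory word, as in the match register M.
match_register = [1 if matches(A, K, w) else 0 for w in words]
```

With K set to all 1s, the whole argument would be compared and neither word would match.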
3. How can the Cache memory and interleaved memory mechanisms be used to
improve the overall processing speed of a Computer system?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
4. Assume a Computer having 64 word RAM (assume 1 word = 16 bits) and cache
memory of 8 blocks (block size = 32 bits). Where can we find Main Memory
Location 25 in cache if (a) Associative Mapping (b) Direct mapping and (c) 2
way set associative (2 blocks per set) mapping is used.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
Consider a computer with a main-memory capacity of 64K words (K = 1024). 16 bits
are needed to specify a physical address in memory since 64K = 2^16. Suppose that the
computer has auxiliary memory for storing information equivalent to the capacity of
16 main memories. If we denote the address space by N and the memory space by M,
we then have for this example N = 16 × 64K = 1024K and M = 64K.
In a multiprogramming computer system, programs and data are transferred to and
from auxiliary memory and main memory based on demands imposed by the CPU.
Suppose that program 1 is currently being executed in the CPU. Program 1 and a
portion of its associated data are moved from secondary memory into the main
memory as shown in Figure 22. Portions of programs and data need not be in
contiguous locations in memory since information is being moved in and out, and
empty spaces may be available in scattered locations in memory.
In our example, the address field of an instruction code will consist of 20 bits but
physical memory addresses must be specified with only 16 bits. Thus the CPU will
reference instructions and data with a 20-bit address, but the information at this
address must be taken from physical memory because access to auxiliary storage for
individual words would be prohibitively slow. A mapping table is then needed, as shown
in Figure 23, to map a virtual address of 20 bits to a physical address of 16 bits. The
mapping is a dynamic operation, which means that every address is translated
immediately as a word is referenced by the CPU.
[Figure 23: Virtual address register (20 bits) → memory mapping table → main memory address register (16 bits) → main memory]
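One common way to build such a mapping table is to divide both address spaces into fixed-size pages. The sketch below assumes a page size of 1K words (a 10-bit offset), which is not stated in the text; the table contents and the function name `translate` are likewise hypothetical.

```python
VIRTUAL_BITS = 20
PHYSICAL_BITS = 16
OFFSET_BITS = 10                         # assumed page size of 1K words
OFFSET_MASK = (1 << OFFSET_BITS) - 1

# Hypothetical mapping table: virtual page number -> main-memory block number.
mapping_table = {5: 3, 6: 0}


def translate(virtual_address):
    """Map a 20-bit virtual address to a 16-bit physical address."""
    page = virtual_address >> OFFSET_BITS
    offset = virtual_address & OFFSET_MASK
    if page not in mapping_table:
        raise LookupError("page is not in main memory")
    return (mapping_table[page] << OFFSET_BITS) | offset


physical = translate((5 << OFFSET_BITS) | 17)
```

Only the page number is translated; the offset within the page is carried over unchanged. A reference to a page that is absent from the table would have to be brought in from auxiliary memory first.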
A group of chips, typically 8 to 16, is mounted on a tiny printed circuit board and sold
as a unit. This unit is called a SIMM or DIMM depending on whether it has a row of
connectors on one side or both sides of the board.
A typical SIMM configuration might have 8 chips with 32 megabits (4MB) each on
the SIMM. The entire module then holds 32MB. Many computers have room for four
modules, giving a total capacity of 128MB when using 32MB SIMMs. The first
SIMMs had 30 connectors and delivered 8 bits at a time. The other connectors were
addressing and control. A later SIMM had 72 connectors and delivered 32 bits at a
time. For a machine like the Pentium, which expected 64 bits at once, 72-connector
SIMMs were paired, each one delivering half the bits needed.
A DIMM is capable of delivering 64 data bits at once. Typical DIMM capacities are
64MB and up. Each DIMM has 84 gold-plated connectors on each side for a total of
168 connectors. SIMM and DIMM are shown in Figure 24 (a) and (b) respectively.
How they are put on a motherboard is shown in Figure 24 (c).
[Figure 24: (a) SIMM, (b) DIMM, (c) SIMMs and DIMMs on a motherboard]
In a typical DRAM, the processor presents addresses and control levels to the
memory, indicating that a set of data at a particular location in memory should be
either read from or written into the DRAM. After a delay, the access time, the DRAM
either writes or reads the data. During the access-time delay, the DRAM performs
various internal functions, such as activating the high capacitance of the row and
column lines, sensing the data and routing the data out through the output buffers. The
processor must simply wait through this delay, slowing system performance.
With synchronous access, the DRAM moves data in and out under control of the
system clock. The processor or other master issues the instruction and address
information, which is latched on to by the DRAM. The DRAM then responds after a
set number of clock cycles. Meanwhile, the master can safely do other tasks while the
SDRAM is processing the request.
The SDRAM employs a burst mode to eliminate the address setup time. In burst
mode, a series of data bits can be clocked out rapidly after the first bit has been
accessed. The mode is useful when all the bits to be accessed are in sequence and in
the same row of the array as the initial access. In addition, the SDRAM has a
multiple-bank internal architecture that improves opportunities for on-chip
parallelism.
The mode register and associated control logic is another key feature differentiating
SDRAMs from conventional DRAMs. It provides a mechanism to customize the
SDRAM to suit specific system needs. The mode register specifies the burst length,
which is the number of separate units of data synchronously fed onto the bus. The
register also allows the programmer to adjust the latency between receipt of a read
request and the beginning of data transfer.
The SDRAM performs best when it is transferring large blocks of data serially, such
as for applications like word processing, spreadsheets, and multimedia.
The special RDRAM bus delivers address and control information using an
asynchronous block-oriented protocol. After an initial 480 ns access time, this
produces the 1.6 GBps data rate. The speed of RDRAM is due to its high-speed bus.
Rather than being controlled by the explicit RAS, CAS, R/W, and CE signals used in
conventional DRAMs, an RDRAM gets a memory request over the high-speed bus. This
request contains the desired address, the type of operation and the number of bytes in
the operation.
The SRAM on the CDRAM can also be used as a buffer to support the serial access of
a block of data. For example, to refresh a bit-mapped screen, the CDRAM can
prefetch the data from the DRAM into the SRAM buffer. Subsequent accesses to chip
result in accesses solely to the SRAM.
1.9 SUMMARY
In this unit, we have discussed the details of the memory system of the computer.
First we discussed the concept and the need of the memory hierarchy. Memory
hierarchy is essential in computers as it provides an optimised low-cost memory
system. The unit also covers details on the basic characteristics of RAMs and different
kinds of ROMs. These details include the logic diagrams of RAMs and ROMs giving
basic functioning through various control signals. We have also discussed the latest
secondary storage technologies such as CD-ROM, DVD-ROM, CD-R, CD-RW etc.
giving details about their data formats and access mechanisms.
6. 4K bytes is actually 4 × 1024 = 4096 bytes and the DRAM holds 4096 eight-bit
words. Each word can be thought of as being stored in an 8-bit register and there
are 4096 registers connected to a common data bus internal to the chip. Since
4096 = 64^2, the registers are arranged in a 64 × 64 array, that is, there are 64 = 2^6
rows and 64 = 2^6 columns. This requires a 6 × 64 decoder to decode six address
inputs for the row select and a second 6 × 64 decoder to decode six other address
inputs for the column select. Using the structure as shown in Figure 3 (b), it
requires only a 6-bit address input.
While in the case of an old RAM, the chip requires 12 address lines (please refer
to Figure 2(b)), since 4096 = 2^12 and there are 4096 different addresses.
2. a) True
b) True
c) False
d) True
e) False
3. The Cache memory is a very fast, small memory placed between CPU and main
memory whose access time is closer to the processing speed of the CPU. It acts
as a high-speed buffer between CPU and main memory and is used to
temporarily store data and instruction needed during current processing. In
memory interleaving, the main memory is divided into n equal-size
modules. When a program is loaded into the main memory, its successive
instructions are also available to the CPU; this avoids a memory access after each
instruction execution and reduces the total execution time.
The following figure gives the details of the above schemes.

Memory Address:  0 1 1 0 0 1      ⇒ Memory Address = 25
Block Address:   0 1 1 0 0 | 1    ⇒ Block Address = 12 and Block offset = 1
Cache Address:   0 1 | 1 0 0 | 1  ⇒ Tag = 1; Index = 4 and Block offset = 1
Please note that a main memory address 24 would have a block offset as 0.
The Tag is used here to check whether a given address is in a specified set. This
cache has 2 blocks per set, thus, the name two way set associative cache. The
total number of sets here is 8 / 2 = 4.
For associative mapping, the block address is checked directly in all locations of
cache memory.
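The address breakdown used in this solution can be reproduced with integer division. The sketch below takes the parameters of the exercise (2 words per block, 8 cache blocks, 2 blocks per set); the function name `locate` is hypothetical, and the tag and set values for the two-way case are computed by the usual scheme rather than quoted from the solution.

```python
WORDS_PER_BLOCK = 2      # block size 32 bits, word size 16 bits
CACHE_BLOCKS = 8
WAYS = 2                 # blocks per set
SETS = CACHE_BLOCKS // WAYS


def locate(address):
    """Break a main-memory word address into its cache address fields."""
    block, offset = divmod(address, WORDS_PER_BLOCK)
    direct_tag, direct_index = divmod(block, CACHE_BLOCKS)
    set_tag, set_index = divmod(block, SETS)
    return {
        "block": block,
        "offset": offset,
        "direct": (direct_tag, direct_index),        # (tag, cache block index)
        "set_associative": (set_tag, set_index),     # (tag, set number)
    }


fields = locate(25)
```

For location 25 this gives block address 12 with offset 1, tag 1 and index 4 under direct mapping, and tag 3 in set 0 under two-way set-associative mapping.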
The latest SIMMs and DIMMs are capable of delivering 64 data bits at once.
Each DIMM has 84 gold-plated connectors on each side for a total of 168
connectors, while each SIMM has 72 connectors.
2. The virtual address space is 1 GB = 2^30, thus a 30-bit virtual address, which will be
translated to a physical memory address of 26 bits (64 mega words = 2^26).
UNIT 2 THE INPUT/OUTPUT SYSTEM
2.0 INTRODUCTION
In the previous unit, we discussed the memory system of a computer,
covering primary memory, secondary memory, high-speed memory and their
technologies, as well as the memory system of microcomputers, i.e., their chips and
types of memory. Another important component of a computer, in addition to the
memory system, is the input/output system. In this unit we will discuss Input/Output
controllers, device drivers, the structure of the I/O interface and the I/O techniques.
We will also discuss the Input/Output processors, which were quite common in
mainframe computers.
2.1 OBJECTIVES
At the end of this unit you should be able to:
[Figure: Components of a microcomputer system: processor with registers, video processor and VRAM driving the display device, keyboard, mouse, EISA bus, FDD, USB and other I/O buses (scanner, mouse, digital camera), SCSI, additional RAM/ROM, primary HDD, LAN/network]
The microcomputer has a single microprocessor, a number of RAM and ROM chips
and interface units that communicate with various external devices through the I/O
bus.
The Input/Output subsystem of a computer, referred to as I/O, provides an efficient
mode of communication between the central system and the outside environment.
External devices that are under the direct control of the computers are said to be
connected on-line. These devices are designed to read information into or out of the
memory unit upon command from the CPU and are considered to be part of the
computer system. Input / Output devices attached to the computer are also called
peripherals. We can broadly classify peripherals or external devices into 3 categories:
• Human readable: suitable for communicating with the computer user, e.g., video
display terminals (VDTs) and printers.
• Machine-readable: suitable for communicating with equipment, e.g., magnetic
disks and tape systems.
• Communication: suitable for communicating with remote devices, e.g., a terminal or
a machine-readable device.
communication link is to resolve the differences that exist between the central
computer and each peripheral. The major differences are:
1. The processor enquires from the I/O interface to check the status of the attached
device. The status can be busy, ready or out of order.
2. The I/O interface returns the device status.
3. If the device is operational and ready to transmit, the processor requests the
transfer of data by means of a command, which is a binary signal, to the I/O
interface.
4. The I/O interface obtains a unit of data (e.g., 8 or 16 bits) from the external
device.
5. The data is transferred from the I/O interface to the processor.
1. Commands such as READ SECTOR, WRITE SECTOR, SEEK track number
and SCAN record-id sent over the control bus.
2. Data that are exchanged between the processor and I/O interface sent over the
data bus.
3. Status: As peripherals are so slow, it is important to know the status of the I/O
interface. The status signals from the I/O interface are BUSY, READY or an error
condition.
4. Address recognition as each word of memory has an address, so does each I/O
device. Thus an I/O interface must recognize one unique address for each
peripheral it controls.
• Each I/O device is linked through a hardware interface called an I/O port.
• Single-port and multi-port device controllers control a single device or multiple
devices, respectively.
• The communication between the I/O controller and memory is through the bus only in
case of Direct Memory Access (DMA), whereas the path passes through the CPU
for such communication in case of non-DMA.
[Figure: CPU and memory connected to a multi-port device controller and a single-port device controller]
Using device controllers for connecting I/O devices to a computer system instead of
connecting them directly to the system bus has the following advantages:
• A device controller can be shared among multiple I/O devices, allowing many I/O
devices to be connected to the system.
• I/O devices can be easily upgraded or changed without any change in the
computer system.
• I/O devices of manufacturers other than the computer manufacturer can be easily
plugged in to the computer system. This provides more flexibility to the users in
buying I/O devices of their choice.
• There is a need for I/O logic, which should interpret and execute dialogue
between the processor and I/O interface. Therefore, there need to be control lines
between processors and I/O interface.
• The data lines connecting the I/O interface to the system bus must exist. These lines
serve the purpose of data transfer.
• Data registers may act as a buffer between processor and I/O interface.
• The I/O interface contains logic specific to the interface with each device that it
controls.
Figure 3 above is a typical diagram of an I/O interface which in addition to all the
registers as defined above has status/control registers which are used to pass on the
status information or the control information.
In UNIX the device drivers are usually linked onto the object code of the kernel (the
core of the operating system). This means that when a new device is to be used, which
was not included in the original construction of the operating system, the UNIX kernel
has to be re-linked with the new device driver object code. This technique has the
advantages of run-time efficiency and simplicity, but the disadvantage is that the
addition of a new device requires regeneration of the kernel. In UNIX, each entry in
the /dev directory is associated with a device driver which manages the
communication with the related device. A list of some device names is as shown
below:
Device name      Description
/dev/console     system console
/dev/tty01       user terminal 1
/dev/tty02       user terminal 2
/dev/lp          line printer
/dev/dsk/f03h    1.44 MB floppy drive
In MS-DOS, device drivers are installed and loaded dynamically, i.e., they are loaded
into memory when the computer is started or re-booted and accessed by the operating
system as required. The technique has the advantage that it makes addition of a new
driver much simpler, so that it could be done by relatively unskilled users. The
additional merit is that only those drivers which are actually required need to be
loaded into the main memory. The device drivers to be loaded are defined in a special
file called CONFIG.SYS, which must reside in the root directory. This file is
automatically read by MS-DOS at start-up of the system, and its contents acted upon.
A list of some device names is shown below:
Device name      Description
con:             keyboard/screen
com1:            serial port 1
com2:            serial port 2
lpt1:            printer port 1
A:               first disk drive
C:               hard disk drive
In the Windows system, device drivers are implemented as dynamic link libraries
(DLLs). This technique has the advantage that DLLs contain shareable code, which
means that only one copy of the code needs to be loaded into memory. Secondly, a
driver for a new device can be implemented by a software or hardware vendor without
the need to modify or affect the Windows code, and lastly a range of optional drivers
can be made available and configured for particular devices.
In the Windows system, Plug and Play device installation is used to
add a new device such as a CD drive. The objective is to make this process largely
automatic: the device is attached and the driver software is loaded. Thereafter,
the installation is automatic, and the settings are chosen to suit the host
computer configuration.
3. What is a device driver? Differentiate between device controller and device
drivers.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
• Programmed input/output
• Interrupt driven input/output
• Direct memory access
In programmed I/O, the I/O operations are completely controlled by the processor.
The processor executes a program that initiates, directs and terminates an I/O
operation. It requires little special I/O hardware, but is quite time consuming for the
processor since the processor has to wait for slower I/O operations to complete.
With interrupt driven I/O, when the interface determines that the device is ready for
data transfer, it generates an interrupt request to the computer. Upon detecting the
external interrupt signal, the processor stops the task it is processing, branches to a
service program to process the I/O transfer, and then returns to the task it was
originally performing. This reduces the time the processor spends waiting.
With both programmed and interrupt-driven I/O, the processor is responsible for
extracting data from the main memory for output and storing data in the main memory
during input. What about having an alternative where I/O device may directly store
data or retrieve data from memory? This alternative is known as direct memory access
(DMA). In this mode, the I/O interface and main memory exchange data directly,
without the involvement of processor.
[Figure 5: Flowcharts for reading a block of data using (a) programmed I/O, (b) interrupt-driven I/O and (c) DMA. Each shows the CPU issuing a read command, reading and checking the status of the I/O or DMA interface (not ready, ready or error condition) and repeating until the transfer is completed.]
With the programmed I/O method, the responsibility of the processor is to constantly
check the status of the I/O device to check whether it is free or it has finished
inputting the data. Thus, this method is very time consuming where the processor
wastes a lot of time in checking and verifying the status of an I/O device. Figure 5(a)
gives an example of the use of programmed I/O to read in a block of data from a
peripheral device into memory.
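The busy-wait loop of programmed I/O can be sketched with a simulated device. Everything here is an assumption made for illustration: the class `SimulatedDevice`, its `delay` parameter and the function `programmed_read` do not correspond to any real interface.

```python
class SimulatedDevice:
    """A slow device that reports BUSY for `delay` status checks per word."""

    def __init__(self, data, delay):
        self._data = list(data)
        self._delay = delay
        self._countdown = delay

    def status(self):
        if self._countdown > 0:
            self._countdown -= 1
            return "BUSY"
        return "READY"

    def read_word(self):
        self._countdown = self._delay   # the next word needs time again
        return self._data.pop(0)


def programmed_read(device, count):
    """Read `count` words, polling the status before each transfer."""
    memory, wasted_polls = [], 0
    for _ in range(count):
        while device.status() != "READY":   # the processor does nothing useful here
            wasted_polls += 1
        memory.append(device.read_word())
    return memory, wasted_polls


block, wasted = programmed_read(SimulatedDevice([10, 20, 30], delay=3), 3)
```

The `wasted` counter makes the cost visible: the processor spends three status checks per word doing no other work.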
I/O Commands
There are four types of I/O commands that an I/O interface may receive when it is
addressed by a processor:
• Control: These commands are device specific and are used to provide specific
instructions to the device, e.g., a magnetic tape requiring rewinding and moving
forward by a block.
• Test: This command checks the status, such as whether a device is ready or is in
an error condition.
• Read: This command is used for input of data from an input device.
• Write: This command is used for output of data to an output device.
I/O Instructions:
An I/O instruction is stored in the memory of the computer and is fetched and
executed by the processor producing an I/O-related command for the I/O interface.
With programmed I/O, there is a close correspondence between the I/O-related
instructions and the I/O commands that the processor issues to an I/O interface to
execute the instructions.
In systems with programmed I/O, the I/O interface, the main memory and the
processors normally share the system bus. Thus, each I/O interface should interpret
the address lines to determine if the command is for itself. There are two methods for
doing so. These are called memory-mapped I/O and isolated I/O.
With memory-mapped I/O, there is a single address space for memory locations and
I/O devices. The processor treats the status and data registers of I/O interface as
memory locations and uses the same machine instructions to access both memory and
I/O devices. For a memory-mapped I/O only a single read and a single write line are
needed for memory or I/O interface read or write operations. These lines are activated
by the processor for either memory access or I/O device access. Figure 6 shows the
memory-mapped I/O system structure.
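The idea of a single address space can be sketched as follows. The register addresses `STATUS_REG` and `DATA_REG`, their initial values and the class `AddressSpace` are all hypothetical; the point is only that the same `read` and `write` operations reach both memory and the I/O interface.

```python
STATUS_REG = 0xFF00   # hypothetical address of the interface status register
DATA_REG = 0xFF01     # hypothetical address of the interface data register


class AddressSpace:
    """Single address space shared by RAM and I/O interface registers."""

    def __init__(self, ram_words):
        self.ram = [0] * ram_words
        self.io_registers = {STATUS_REG: 1, DATA_REG: 0x41}   # 1 = ready

    def read(self, address):
        if address in self.io_registers:      # the address selects an I/O register
            return self.io_registers[address]
        return self.ram[address]

    def write(self, address, value):
        if address in self.io_registers:
            self.io_registers[address] = value
        else:
            self.ram[address] = value


bus = AddressSpace(ram_words=1024)
bus.write(0x0010, 99)                          # ordinary memory write
if bus.read(STATUS_REG) == 1:                  # same read operation, I/O register
    bus.write(0x0011, bus.read(DATA_REG))      # copy the device data into memory
```

With isolated I/O, by contrast, the processor would need separate instructions and control lines to reach the I/O registers.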
The interrupt-driven I/O mechanism for transferring a block of data is shown in Figure
5(b). Please note that after issuing a read command (for input) the CPU goes off to do
other useful work while I/O interface proceeds to read data from the associated
device. On the completion of an instruction cycle, the CPU checks for interrupts
(which will occur when data is in data register of I/O interface and it now needs
CPU’s attention). The CPU then saves the important registers and processor status of the
executing program in a stack and requests the I/O device to provide its data, which is
placed on the data bus by the I/O device. After taking the required action with the
data, the CPU can go back to the program it was executing before the interrupt.
2.6.3 Interrupt-Processing
The occurrence of an interrupt triggers a number of events, both in the processor
hardware and software. Figure 8 shows a sequence.
When an I/O device completes an I/O operation, the following sequence of hardware
events occurs:
Figure 9: Interrupt Handling
Thus, interrupt handling involves interruption of the currently executing program,
execution of interrupt servicing program and restart of interrupted program from the
point of interruption.
1) How does the processor determine which device issued the interrupt?
2) If multiple interrupts have occurred, how does the processor decide which one to
process first?
To solve these problems, four general categories of techniques are in common use:
An example of interrupt vectors can be found in a personal computer, where there are
several IRQs (interrupt requests), each for a specific type of interrupt.
• Which operation (read or write) is to be performed, using the read or write control
lines.
• The address of the I/O device which is to be used, communicated on the data lines.
• The starting location in memory where the information will be read from or
written to, communicated on the data lines and stored by the DMA interface
in its address register.
• The number of words to be read or written, communicated on the data lines and
stored in the data count register.
The DMA interface transfers the entire block of data, one word at a time, directly to or
from memory, without going through the processor. When the transfer is complete,
the DMA interface sends an interrupt signal to the processor. Thus, in DMA the
processor involvement can be restricted at the beginning and end of the transfer,
which can be shown as in the figure above. But the question is when should the DMA
take control of the bus?
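The role of the address register and the data count register can be illustrated with a toy model. The sketch below is not a description of any real controller: the class name, its methods and the use of a Python list as "memory" are all assumptions made for illustration.

```python
class DMAInterface:
    """Toy DMA interface: the processor is involved only in setup and at the final interrupt."""

    def __init__(self, memory):
        self.memory = memory            # shared main memory (a list of words)
        self.address_register = 0       # next memory location to access
        self.data_count_register = 0    # words still to be transferred
        self.interrupt = False          # raised when the block is complete

    def setup(self, start_address, word_count):
        # Done by the processor once, before the transfer starts.
        self.address_register = start_address
        self.data_count_register = word_count
        self.interrupt = False

    def transfer_in(self, device_words):
        # Moves words from the device to memory, one word at a time,
        # without passing them through the processor.
        for word in device_words:
            if self.data_count_register == 0:
                break
            self.memory[self.address_register] = word
            self.address_register += 1
            self.data_count_register -= 1
        if self.data_count_register == 0:
            self.interrupt = True       # signal completion to the processor


memory = [0] * 8
dma = DMAInterface(memory)
dma.setup(start_address=2, word_count=3)
dma.transfer_in(iter([7, 8, 9, 10]))
print(memory, dma.interrupt)
```

Only three words are copied, because the data count register reaches zero before the fourth word is taken from the device.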
The DMA mechanism can be configured in a variety of ways. Some possibilities are
shown below in Figure 12(a), in which all interfaces share the same system bus. The
DMA acts as the supportive processor and can use programmed I/O for exchanging
data between memory and I/O interface through DMA interface. But this spoils the
basic advantage of DMA, because extra bus cycles are used to transfer information
between memory and the DMA interface and again between the DMA interface and the
I/O interface.
Figure 12: DMA Configuration
The Figure 12(b) configuration suggests advantages over the one shown above. In
these systems a path is provided between I/O interface and DMA interface, which
does not include the system bus. The DMA logic may become part of an I/O interface
and can control one or more I/O interfaces. In an extended concept an I/O bus can be
connected to this DMA interface. Such a configuration (shown in Figure 12 (c)) is
quite flexible and can be extended very easily. In both these configurations, the added
advantage is that the data between I/O interface and DMA interface is transferred off
the system bus, thus eliminating the disadvantage we have witnessed for the first
configuration.
1. Which of the I/O techniques does not require an Interrupt Signal? Is this
technique useful in Multiprogramming Operating Systems? Give reason.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2. What are the techniques of identifying the device that has caused the Interrupt?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
3. What are the functions of I/O interface? What is DMA?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
4. State True or False:
a) Daisy chain provides software poll. T/F
b) I/O mapped I/O scheme requires no additional lines from CPU to I/O device
except for the system bus. T/F
c) Most of the I/O processors have their own memory while a DMA module
does not have its own memory except for a register or a simple buffer area.
T/F
d) The advantage of interrupt driven I/O over programmed I/O is that in the
first the interrupt mechanisms free I/O devices quickly. T/F
With the last two steps (4 and 5), a major change occurs with the introduction of the
concept of an I/O interface capable of executing a program. For step 5, the I/O
interface is often referred to as an I/O channel or I/O processor.
A multiplexer channel can handle I/O with multiple devices at the same time. If the
devices are slow then byte multiplexer is used. Let us explain this with an example. If
we have three slow devices which need to send individual bytes as:
X1 X2 X3 X4 X5 ……
Y1 Y2 Y3 Y4 Y5……
Z1 Z2 Z3 Z4 Z5……
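One way to picture how a byte multiplexer can serve such devices is round-robin interleaving, taking one byte from each device in turn. The sketch below assumes exactly that policy; the function name is hypothetical and real channels may schedule devices differently.

```python
from itertools import chain


def byte_multiplex(*streams):
    """Interleave the byte streams of several slow devices, one byte per device per round."""
    # zip() groups the i-th byte of every stream; chain flattens the groups.
    return list(chain.from_iterable(zip(*streams)))


x = ["X1", "X2", "X3"]
y = ["Y1", "Y2", "Y3"]
z = ["Z1", "Z2", "Z3"]
print(byte_multiplex(x, y, z))
```

With the three streams above, the channel would carry X1 Y1 Z1 X2 Y2 Z2 X3 Y3 Z3.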
In a serial interface only one line is used to transmit data, therefore only one bit is
transferred at a time. Serial interfaces are used for serial printers and terminals. With a
new generation of high-speed serial interfaces, parallel interfaces are becoming less
common.
In both cases, the I/O interface must engage in a dialogue with the peripheral. The
dialogue for a read or write operation is as follows:
The connection between an I/O interface in a computer system and external devices
can be either point-to-point or multipoint. A point-to-point interface provides a
dedicated line between the I/O interface and the external device. For example
keyboard, printer and external modems are point-to-point links. The most common
serial interfaces are RS-232C and EIA-232.
A multipoint external interface is used to support external mass storage devices (such
as disk and tape drives) and multimedia devices (such as CD-ROM, video, audio).
Check Your Progress 3
1. What is the need of I/O channels?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2. What is the need of external Communication Interfaces?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2.9 SUMMARY
This unit is devoted to the I/O of a computer system. In this unit we have
discussed the identification of I/O interfaces and the description of I/O techniques
such as programmed I/O, interrupt-driven I/O and direct memory access. These
techniques are useful for increasing the efficiency of the input-output transfer process.
The concepts of device drivers for all types of operating systems and device
controllers are also discussed in this unit. We have also defined an input/output
processor, the external communication interfaces such as serial and parallel interfaces,
and interrupt processing. The I/O processors are the most powerful I/O interfaces and
can execute complete I/O instructions. You can always refer to further reading for
detailed design.
2. (a) False (b) True (c) True (d) True (e) True (f) False (g) True (h) False
Check Your Progress 2
1. Programmed I/O does not require an interrupt. It is very inefficient in a
multiprogramming environment, as the processor is busy waiting for the I/O
to complete, while this time could have been used for executing the instructions
of other programs.
• Multiple Interrupt Lines: Having a separate line for each device, thus allowing
direct recognition.
• Software Poll: A software-driven roll call to find out from each device whether it
has made an interrupt request.
• Daisy Chain: A hardware-driven, pass-the-buck type signal that moves
through the devices connected serially. A device that has raised an interrupt
responds with its address when the signal reaches it.
• Bus Arbitration: In this scheme, the I/O interface requests control of the
bus. This is a common process when I/O processors are used.
DMA is an I/O technique that restricts CPU intervention to the beginning
and end of a time-consuming I/O operation. One common place where DMA is used
is when I/O is required from a hard disk, since a single I/O request requires a
block of data to be transferred, which on average may take a few milliseconds.
Thus, DMA frees the CPU to do other useful tasks while the I/O is going on.
4. a) False
b) False
c) True
d) False
1. The I/O channels were popular in older mainframes, which included many I/O
devices and I/O requests from many users. The I/O channel takes control of all
I/O instructions from the main processor and controls the I/O requests. It is
mainly needed in situations having many I/O devices, which may be shared
among multiple users.
2. The external interfaces are the standard interfaces that are used to connect third
party or other external devices. The standardization in this area is a must.
UNIT 3 SECONDARY STORAGE TECHNIQUES
Structure
3.0 Introduction
3.1 Objectives
3.2 Secondary Storage Systems
3.3 Hard Drives
3.3.1 Characteristics: Drive Speed, Access Time, Rotation Speed
3.3.2 Partitioning & Formatting: FAT, Inode
3.3.3 Drive Cache
3.3.4 Hard Drive Interface: IDE, SCSI, EIDE, Ultra DMA & ATA/66
3.4 Removable Drives
3.4.1 Floppy Drives
3.4.2 CD-ROM & DVD-ROM
3.5 Removable Storage Options
3.5.1 Zip, Jaz & Other Cartridge Drives
3.5.2 Recordable CDs & DVDs
3.5.3 CD-R vs CD-RW
3.5.4 Tape Backup
3.6 Summary
3.7 Solutions/Answers
3.0 INTRODUCTION
In the previous units of this block, we have discussed the primary memory system,
high speed memories, the memory system of microcomputer, and the input/output
interfaces and techniques for a computer. In this unit we will discuss the secondary
storage devices such as magnetic tapes, magnetic disks and optical disks, also known
as backing storage devices. The main purpose of such a device is that it provides a
means of retaining information on a permanent basis. The main discussion provides
the characteristics of hard-drives, formatting, drive cache, interfaces, etc. The detailed
discussion on storage devices is presented in this unit. Storage technologies
have moved from very small storage devices to huge gigabyte
memories. Let us also discuss some of the technological achievements that made such
a technology possible.
3.1 OBJECTIVES
Storage is the collection of places where long-term information is kept. At the end of
the unit you will be able to:
• describe the characteristics of the different secondary storage drives, i.e., their
drive speed, access time, rotation speed, density etc.;
• describe the low-level and high-level formatting of a blank disk and also the use
of disk partitioning;
• distinguish among the various types of drives, i.e., hard drives, optical drives,
removable drives and cartridge drives; and
• define different types of disk formats.
3.2 SECONDARY STORAGE SYSTEMS
As discussed in Block 2 Unit 1, there are several limitations of primary memory such
as limited capacity, that is, it is not sufficient to store a very large volume of data; and
volatility, that is, when the power is turned off the data stored is lost. Thus, the
secondary storage system must offer large storage capacities, low cost per bit and
medium access times. Magnetic media have been used for such purposes for a long
time. Current magnetic data storage devices take the form of floppy disks and hard
disks and are used as secondary storage devices. But audio and video media, either in
compressed form or uncompressed form, require higher storage capacity than the
other media forms and the storage cost for such media is significantly higher.
Optical storage devices offer a higher storage density at a lower cost. CD-ROM can be
used as an optical storage device. Many software companies offer both operating
system and application software on CD-ROM today. This technology has been the
main catalyst for the development of multimedia in computing because it is used in
the multimedia external devices such as video recorders and digital recorders (Digital
Audio Tape) which can be used for the multimedia systems.
Removable disks and tape cartridges are other forms of secondary storage devices; they
are used for back-up purposes and offer higher storage density and higher transfer rates.
• the disk and read/write heads are enclosed in a sealed airtight unit;
• the disk(s) spin at a high speed, one such speed may be 7200 revolutions per
minute;
• the read/write heads do not actually touch the disk surface;
• the disk surface contains a magnetic coating;
• the data on the disk surface (platter) are arranged in a series of concentric rings.
Each ring, called a track, is subdivided into a number of sectors, each sector
holding a specific number of data elements called bytes or characters.
• The smallest unit that can be written to or read from the disk is a sector. The
storage capacity of the disk can be determined as the product of the number of
tracks, the number of sectors per track, the bytes per sector and the number of
read/write heads.
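The capacity calculation in the last point can be written out as a short sketch. The function and parameter names are hypothetical, and the geometry used in the example (2 heads, 80 tracks, 18 sectors of 512 bytes) is the familiar 1.44MB floppy layout, chosen only because the numbers are easy to check.

```python
def disk_capacity_bytes(heads, tracks_per_surface, sectors_per_track, bytes_per_sector):
    """Capacity as the product of the four geometry parameters."""
    return heads * tracks_per_surface * sectors_per_track * bytes_per_sector


capacity = disk_capacity_bytes(heads=2, tracks_per_surface=80,
                               sectors_per_track=18, bytes_per_sector=512)
print(capacity)             # 1474560 bytes
print(capacity / 1024)      # 1440.0 KB, sold as "1.44MB"
```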
Bad Blocks: The drive maintains an internal table which holds the sectors or tracks
which cannot be read or written to because of surface imperfections. This table is
called the bad block table and is created when the disk surface is initially scanned
during a low-level format.
Sector Interleave: This refers to the numbering of the sectors located in a track. A
one to one interleave has sectors numbered sequentially 0,1,2,3,4 etc. The disk drive
rotates at a fixed speed 7200 RPM, which means that there is a fixed time interval
between sectors. A slow computer can issue a command to read sector 0, storing it in
an internal buffer. While it is doing this, the drive makes available sector 1 but the
computer is still busy storing sector 0. Thus the computer will now have to wait one
full revolution till sector 1 becomes available again. Renumbering the sectors like
0,8,1,9,2,10,3,11 etc., gives a 2:1 interleave. This means that the sectors are alternated,
giving the computer slightly more time to store sectors internally than previously.
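The renumbering can be generated mechanically. The following sketch assumes a track of 16 sectors (the text does not state the track size, but the sequence 0,8,1,9,... implies it) and uses a hypothetical function name; it places each logical sector a fixed number of physical slots after the previous one, skipping slots that are already taken.

```python
def interleave_layout(sector_count, factor):
    """Return the logical sector number stored in each physical slot of a track."""
    layout = [None] * sector_count
    position = 0
    for logical in range(sector_count):
        # Skip forward past slots already filled on an earlier pass round the track.
        while layout[position] is not None:
            position = (position + 1) % sector_count
        layout[position] = logical
        position = (position + factor) % sector_count
    return layout


print(interleave_layout(16, 1))  # 1:1 interleave, sectors in sequence
print(interleave_layout(16, 2))  # 2:1 interleave: 0, 8, 1, 9, 2, 10, 3, 11, ...
```

With a 2:1 interleave, logical sector 1 sits two physical slots after sector 0, so the computer has one sector time to finish storing sector 0 before sector 1 arrives under the head.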
Drive Speed: The amount of information that can be transferred in or out of the
memory in a second is termed as disk drive speed or data transfer rate. The speed of
the disk drive depends on two aspects, bandwidth and latency.
• Bandwidth: The bandwidth can be measured in bytes per second. The sustained
bandwidth is the average data rate during a large transfer, i.e., the number of
bytes divided by the transfer time. The effective bandwidth is the overall data
rate provided by the drive. The disk drive bandwidth ranges from less than 0.25
megabytes per second to more than 30 megabytes per second.
• Access latency: A disk access simply moves the arm to the selected cylinder and
waits for the rotational latency, which may take less than 36ms. The latency
depends upon the rotation speed of the disk, which may be anywhere from 300
RPM to 7200 RPM. The average latency of a disk system is equal to half the time
taken by the disk to rotate once. Hence, the average latency of a disk system
whose rotation speed is 7200 RPM will be 0.5 / 7200 minutes, i.e., approximately
4.17 ms.
Rotation Speed: This refers to the speed of rotation of the disk. Most hard disks
rotate at 7200 RPM (Revolution per Minute). To increase data transfer rates, higher
rotation speeds, or multiple read/write heads arranged in parallel or disk arrays are
required.
Access Time: The access time is the time required between the requests made for a
read or write operation till the time the data are made available or written at the
requested location. Normally it is measured for read operation. The access time
depends on physical characteristics and access mode used for that device.
• Seek Time: The seek time is the time for the disk arm to move the heads to the
cylinder containing the desired sector.
• Latency Time: The latency time is the additional time waiting for the disk to
rotate the desired sector to the disk head.
The sum of the average seek time and the average latency time is known as the average
access time.
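As a worked illustration of these definitions, the sketch below computes the average latency from the rotation speed and adds an average seek time to it. The function names are hypothetical and the 9 ms seek time is an assumed figure chosen only for the example.

```python
def average_latency_ms(rpm):
    """Average rotational latency: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000 / rpm
    return 0.5 * ms_per_revolution


def average_access_time_ms(average_seek_ms, rpm):
    """Average access time = average seek time + average latency."""
    return average_seek_ms + average_latency_ms(rpm)


print(round(average_latency_ms(7200), 2))           # 4.17
print(round(average_access_time_ms(9.0, 7200), 2))  # 13.17
```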
For example, we can run both Windows and Linux operating systems from the same
storage of the PC.
A new magnetic disk is just platters of a magnetic recording material. Before a disk
can store data, it must be divided into sectors that the disk controller can read and
write. This is called low level formatting. Low level formatting fills the disk with a
special data structure for each sector, which consists of a header, a data area, and a
trailer. The low level formatting is placing track and sector information plus bad block
tables and other timing information on the disks. Sector interleave can also be
specified at this time.
In any disk system, space at some time in use will become unwanted and hence will
be ‘free’ for another application. The operating system allocates disk space on demand
by user programs. Generally, space is allocated in units of fixed size called an
allocation unit or a cluster, which is a simple multiple of the disk physical sector size,
usually 512 bytes. The DOS operating system forms a cluster by combining two or
more sectors so that the smallest unit of data access from a disk becomes a cluster, not
a sector. Normally, the size of the cluster can range from 2 to 64 sectors per cluster.
High level formatting involves writing directory structures and a map of free and
allocated space (FAT or INODE) to the disk. Often this also means transferring the
boot file for the operating system onto the hard disks.
FAT and Inode
The FAT maps the usage of data space of the disk. It contains information about the
space used by each individual file, the unused disk space and the space that is
unusable due to defects in the disk. Since FAT contains vital information, two copies
of FAT are stored on the disk, so that in case one gets destroyed, the other can be
used. A FAT entry can contain any of the following:
• unused cluster
• reserved cluster
• bad cluster
• last cluster in file
• next cluster number in the file.
The DOS file system maintains a table of pointers called FAT (File allocation table)
which consists of an array of 16-bit values. There is one entry in the FAT for each
cluster in the file area, i.e., each entry of the FAT (except the first two) corresponds to one
cluster of disk space. If the value in the FAT entry doesn’t mark an unused, reserved
or defective cluster, then the cluster corresponding to the FAT entry is part of a file
and the value in the FAT entry would indicate the next cluster in the file.
The first two entries (0 & 1) in FAT are reserved for use by the operating system.
Therefore, the cluster number 2 corresponds to the first cluster in the data space of the
disk. Prior to any data being written on to the disk, the FAT entries are all set to zero,
indicating a ‘free’ cluster. The FAT chain for a file ends with the hexadecimal value
FFFF. The FAT structure can be shown as in Figure 2 below.
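Following a file through the FAT amounts to walking a linked list. The sketch below uses a plain Python list as a stand-in for the table; the function name, the sample table and the starting cluster are made up for illustration, and the reserved entries 0 and 1 are simply filled with zeros.

```python
END_OF_CHAIN = 0xFFFF  # marks the last cluster of a file, as described above


def file_clusters(fat, first_cluster):
    """Return the list of clusters occupied by a file, in order."""
    clusters = []
    cluster = first_cluster
    while True:
        if len(clusters) > len(fat):
            raise ValueError("FAT chain contains a loop")
        clusters.append(cluster)
        next_entry = fat[cluster]
        if next_entry == END_OF_CHAIN:
            return clusters
        cluster = next_entry


#      index: 0  1  2  3  4  5
fat = [0, 0, 3, 5, 0, END_OF_CHAIN]   # cluster 4 is free (entry is zero)
print(file_clusters(fat, 2))          # [2, 3, 5]
```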
Limitation of FAT16: The DOS designers decided to use clusters with at least four
sectors in them (thus a cluster size of at least 2KB) for all FAT16 hard disks. That size
suffices for any hard disk with less than a 128MB total capacity. The largest logical
disk drives that DOS can handle comfortably have capacities up to 2GB. For such a
large volume, the cluster size is 32KB. This means that even if a file contains only a
single byte of data, writing it to the disk uses one entire 32KB region of the disk,
making that area unavailable for any other file’s data storage.
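The link between volume size and cluster size follows from the 16-bit entry width. The sketch below is a simplification: it assumes all 2^16 entries are usable (ignoring the reserved entries and special values), that cluster sizes double from the four-sector minimum, and that MB and GB are binary units. The function name is hypothetical.

```python
FAT16_ENTRIES = 2 ** 16  # a 16-bit entry can number at most 65,536 clusters


def fat16_cluster_bytes(volume_bytes, sector_bytes=512, min_sectors_per_cluster=4):
    """Smallest cluster size (doubling from the minimum) that covers the volume."""
    cluster_bytes = sector_bytes * min_sectors_per_cluster
    while volume_bytes > cluster_bytes * FAT16_ENTRIES:
        cluster_bytes *= 2
    return cluster_bytes


MB = 1024 * 1024
print(fat16_cluster_bytes(128 * MB))    # 2048  (2KB clusters suffice)
print(fat16_cluster_bytes(2048 * MB))   # 32768 (a 2GB volume needs 32KB clusters)
```

Under these assumptions a one-byte file on the 2GB volume still occupies a full 32768-byte cluster, which is the wastage described above.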
The most recent solution to these large-disk problems was introduced by Microsoft in
its OSR2 release of Windows 95 and it was named FAT32. The cluster entry for
FAT32 uses 32-bit numbers. The minimum size for a FAT32 volume is 512MB.
Microsoft has reserved the top four bits of every cluster number in a FAT32 file
allocation table. That means there are only 28 bits for the cluster number, so the
maximum cluster number possible is 268,435,456.
In the UNIX system, the information related to all these fields is stored in an Inode
table on the disk. For each file, there is an inode entry in the table. Each entry is made
up of 64 bytes and contains the relevant details for that file. These details are:
The disk caching technique can be used to speed up the performance of the disk drive
system. A set (cache) of buffers is allocated to hold a number of disk blocks which
have been recently accessed. In effect, the cached blocks are in memory copies of the
disk blocks. If the data in a cache buffer memory is modified, only the local copy is
updated at that time. Hence processing of the data takes place using the cached data
avoiding the need to frequently access the disk itself.
The main disadvantage of the system using disk caching is risking loss of updated
information in the event of machine failures such as loss of power. For this reason, the
system may periodically flush the cache buffer in order to minimize the amount of
loss.
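The behaviour described above, where updates go to the cached copy first and reach the disk only on a flush, can be sketched as follows. The class and its methods are hypothetical, and a dictionary stands in for the disk.

```python
class DiskCache:
    """Toy write-back cache: modified blocks reach the disk only when flushed."""

    def __init__(self, disk):
        self.disk = disk      # dict: block number -> data (stand-in for the drive)
        self.blocks = {}      # in-memory copies of recently accessed blocks
        self.dirty = set()    # blocks modified since the last flush

    def read(self, block_number):
        if block_number not in self.blocks:          # cache miss: go to the disk
            self.blocks[block_number] = self.disk[block_number]
        return self.blocks[block_number]

    def write(self, block_number, data):
        self.blocks[block_number] = data             # only the local copy changes
        self.dirty.add(block_number)

    def flush(self):
        for block_number in sorted(self.dirty):      # write updates back to disk
            self.disk[block_number] = self.blocks[block_number]
        self.dirty.clear()


disk = {0: "old"}
cache = DiskCache(disk)
cache.write(0, "new")
print(disk[0], cache.read(0))   # old new  (a power failure now would lose "new")
cache.flush()
print(disk[0])                  # new
```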
The disk drive cache is essentially two-dimensional-all the bits are out in the open.
3.3.4 Hard Drive Interface: IDE, SCSI, EIDE, Ultra DMA and ATA/66
Secondary storage devices need a controller to act as an intermediary between the
device and the rest of the computer system. On some computers, the controller is an
integral part of the computer’s main motherboard. On others, the controller is an
expansion board that connects to the system bus by plugging into one of the
computer’s expansion slots. In order that devices manufactured by independent
vendors can be used with different computer manufacturers, it is important that the
controllers follow some drive interfacing standard. Following are the commonly used
drive interface standards:
As shown in Figure 3, a SCSI controller connects directly to the computer bus on one
side and controls another bus (called SCSI bus) on the other side. Since the SCSI
controller is connected to the computer’s bus on one side and to the SCSI bus on the
other side, it can communicate with the processor and memory and can also control
the devices connected to the SCSI bus. The SCSI bus is a bus designed for connecting
devices to a computer in a uniform way.
These drives have fast access times and high data rates but are expensive. One
advantage of these drives is that a single SCSI controller can communicate
simultaneously with up to seven 8-bit (narrow) SCSI devices or up to 15 Wide or
Ultra-Wide devices. Each device must be assigned a unique SCSI identification
between 0 and 7 (or 15).
• The SCSI-1 calls for a cable with 8 data wires plus one for parity.
• The SCSI-2 enables the use of multiple cables to support 16- or even 32-bit data
transfers in parallel.
• The SCSI-3 enables the use of multiple cables to support 32- or even 64-bit data
transfers in parallel.
• With fast SCSI, it is possible to transfer 40MB of data per second on a single
SCSI cable.
Modern EIDE interfaces enable much faster communication. The speed increases
due to improvements in the protocol that describes how the clock cycles will be
used to address devices and transfer data. The modern EIDE hard drives are Ultra
DMA and ATA/66.
• Ultra DMA or ATA/33 (AT Attachment): The ATA standard is the formal
specification for how IDE and EIDE interfaces are supposed to work with hard
drives. The ATA33 enables up to 33.3 million bytes of data to be transferred each
second, hence the name ATA33.
1. The seek time of a disk is 30ms. It rotates at the rate of 30 rotations per sec. Each
track has 300 sectors. What is the access time of the disk?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2. Calculate the number of entries required in the FAT table using the following
parameters for an MS-DOS system:
Disk capacity 30MB
Block size 512 bytes
Blocks/cluster 4
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
3. What are the purposes of using SCSI, EISA, ATA, IDE?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
A floppy is about 0.64 mm thick and is available in diameters 5.25 inch and 3.5 inch.
The data are organized in the form of tracks and sectors. The tracks are numbered
sequentially inwards, with the outermost being 0. The utility of index hole is that
when it comes under a photosensor, the system comes to know that the read/write
head is now positioned on the first sector of the current track. The write-protect notch
is used to protect the floppy against deletion of recorded data by mistake.
The data in a sector are stored as a series of bits. Once the required sector is found, the
average data transfer rate in bytes per second can be computed by the formula:
Size (inch)  Capacity  Tracks  Sectors
5.25         360KB     40      9
5.25         1.2MB     80      15
3.5          720KB     40      18
3.5          1.44MB    80      18
1. CD-ROM (Compact Disk Read Only Memory): This technology has evolved
out of the entertainment electronics market where cassette tapes and long playing
records are being replaced by CDs. The term CD used for audio records stands
for Compact Disk. The disks used for data storage in digital computers are
known as CD-ROM, whose diameter is 12 cm (about 4.7 inches). It can store around 650MB.
Information in CD-ROM is written by creating pits on the disk surface by shining
a laser beam. As the disk rotates the laser beam traces out a continuous spiral.
The focused beam creates a circular pit of around 0.8-micrometer diameter
wherever a 1 is to be written and no pits (also called a land) if a 0 is to be written.
Figure 5 shows the CD-ROM & DVD-ROM.
The CD-ROM with pre-recorded information is read by a CD-ROM reader which
uses a laser beam for reading. It is rotated by a motor at a speed of 360 RPM. A
laser head moves in and out to the specified position. As the disk rotates the head
senses pits and land. This is converted to 1s and 0s by the electronic interface and
sent to the computer. The disk speed of CD-ROM is indicated by the notation nx,
where n is an integer indicating the factor by which the original speed of
150KB/s is to be multiplied. It is connected to a computer by SCSI and IDE
interfaces. The major application of CD-ROM is in distributing large text, audio
and video. For example, the entire Encyclopedia could be stored in one CD-ROM.
A 640MB CD-ROM can store 74 min. of music.
2. DVD-ROM (Digital Versatile Disk Read Only Memory): DVD-ROM uses the
same principle as a CD-ROM for reading and writing. However, a smaller
wavelength laser beam is used. The total capacity of DVD-ROM is 8.5GB. In
double-sided DVD-ROM two such disks are stuck back to back which allows
recording on both sides. This requires the disk to be reversed to read the reverse
side. With both side recording and with each side storing 8.5GB the total
capacity is 17GB.
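The nx speed notation for CD-ROM drives mentioned above is simple arithmetic on the 150KB/s base rate. The sketch below uses hypothetical function names and assumes 1MB = 1024KB when estimating how long a full disk takes to read.

```python
BASE_RATE_KB_PER_S = 150  # the 1x CD-ROM speed given in the text


def cd_rom_rate_kb_per_s(n):
    """Data rate of an 'nx' CD-ROM drive in KB/s."""
    return n * BASE_RATE_KB_PER_S


def read_time_minutes(size_mb, n):
    """Time to read size_mb megabytes at nx speed (assumes 1MB = 1024KB)."""
    size_kb = size_mb * 1024
    return size_kb / cd_rom_rate_kb_per_s(n) / 60


print(cd_rom_rate_kb_per_s(48))             # 7200 KB/s for a 48x drive
print(round(read_time_minutes(650, 1), 1))  # about 74 minutes for 650MB at 1x
```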
In both CD-ROMs and DVD-ROMs, the density of data stored is constant throughout
the spiral track. In order to obtain a constant readout rate the disk must rotate faster
near the center and slower at the outer tracks, to maintain a constant linear velocity
(CLV) between the head and the CD-ROM/DVD-ROM platter. Thus CLV disks are
rotated at variable speed. Compare this with the mechanism of constant angular
velocity (CAV), in which the disk is rotated at a constant speed. Thus, in CAV the
density of information storage on outside sectors is low.
The main advantage of having CAV is that individual blocks of data can be accessed
at semi-random mode. Thus the head can be moved from its current location to a
desired track and one waits for the specific sector to spin under it.
The main disadvantage of CAV disk is that a lot of storage space is wasted, since the
longer outer tracks are storing the data only equal to that of the shorter innermost
track. Because of this disadvantage, the CAV method is not recommended for use on
CD ROMs and DVD-ROMs.
Comparison of CD-ROM and DVD-ROM
Characteristic  CD-ROM   DVD-ROM
Speed (1x)      150KB/s  1.38MB/s
Jaz Drive: The Jaz drive is a popular removable drive with a capacity of 2GB. It is
aimed at professionals in graphic design and publishing, software development, 3D
CAD/CAM, enterprise management systems and entertainment authoring, who need
large amounts of space for dynamic digital content. It has a sustained transfer rate
of 8.0 MB/s, fast enough to run applications or deliver full-screen, full-motion
video. It is compatible with both Windows (95/98/NT 4.0 & 2000) and Mac OS 8.1
through 9.x.
Disk Cartridges: Removable disk cartridges are an alternative to hard disk units as a
form of secondary storage. The cartridge normally contains one or two platters
enclosed in a hard plastic case that is inserted into the disk drive much like a music
tape cassette. The capacity of these cartridges ranges from 5MB to more than 60MB,
somewhat lower than hard disk units but still substantially superior to diskettes. They
are handy because they give microcomputer users access to amounts of data limited
only by the number of cartridges used.
Quarter Inch Cartridge Tapes (QIC Standard): These tape cartridges record
information serially in a track with one head. When the end of the tape is reached the
tape is rewound and data is recorded on the next track. There are 9 to 30 tracks. Data
bits are serial on a track and blocks of around 6000 bytes are written followed by
error-correction code to enable correction of data on reading if any error occurs. The
density of data is around 16000 bits per inch in modern tapes. The tapes store around
500 MB. The cassette size is 5.25 inch just like a floppy and mounted in a slot
provided on the front panel of a computer. The tape read/write speed is around 120
inch/second and data are transferred at the rate of 240KB/s.
These tapes are normally interfaced to a computer using the SCSI standard. The data
formats used in these tapes are called QIC standard.
3.5.3 CD-R vs CD-RW
A CD-R disc looks like a CD. Although all pressed CDs are silver, CD-R discs are
gold or silver on their label side and a deep green or cyan on the recordable side. The
silver/cyan CD-Rs were created because the green dye used in the original CD-R does
not reflect the shorter-wavelength red lasers used in new DVD drives. The cyan dye
used in the CD-R will allow complete compatibility with DVD drives. The CD-R disc
has four layers instead of three for a CD. At the lowest level, the laser light suffices to
detect the presence or absence of pits or marks on the recording surface to read the
disc. At the higher level, it can actually burn marks into the surface.
CD-RW is relatively new technology, but it has been gaining market share quite
rapidly. The drives cost little more than CD-R drives because they can be used to play
audio CDs and CD-ROMs as well as playing and recording CD-RW discs. A CD-RW
disc contains two more layers than a CD-R. The difference is that the recordable layer
is made of a special material, an alloy of several metals.
Iomega Corporation has announced a CD-RW drive, the Iomega 48*24*48 USB 2.0
external CD-RW drive. This drive features buffer under-run protection, which lets
users record safely, even while multitasking. It offers plug-and-play capability with
Microsoft Windows and Mac OS operating systems, and its digital audio extraction
(DAE) rate of 48x allows users to rip or burn a 60-min CD in under 3 min., although
maximum drive speed is attainable only with hi-speed USB 2.0 connections.
CD-ROM   650 MB  Laser Disk  500 msec  Non-volatile  Direct  Store large text, pictures and audio; software distribution  1/10000
DVD-ROM  8.5 GB  Laser Disk  500 msec  Non-volatile  Direct  Video files  1/100000
Digital Audio Tape (DAT): The most appropriate tape for backing up data from a
disk today is Digital Audio Tape (DAT). It uses a 4mm tape enclosed in a cartridge. It
uses a helical scan, read after write recording technique, which provides reliable data
recording. The head spins at a high speed while the tape moves. Very high recording
densities are obtained. It uses a recording format called Digital Data Storage (DDS),
which provides three levels of error correcting code to ensure excellent data integrity.
The capacity is up to 4GB with a data transfer speed of 366KB/sec. This tape uses
SCSI interface.
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
2. State True or False:
(a) Zip drive can be used for storing 10 MB data. T/F
(b) QIC standard Cartridges have 40 tracks T/F
(c) DAT is a useful backing store technology T/F
(d) DVD-ROMs are preferred for data storage over CD-ROMs T/F
(e) Magnetic tape is faster than CD-ROM. T/F
3.6 SUMMARY
In this unit, we have discussed the characteristics of different secondary storage
drives: their drive speed, access time, rotation speed, density, etc. We also described the
low-level and high-level formatting of a blank disk and the use of disk
partitioning. We have learnt to distinguish among the various types of drives, i.e.,
hard drives, optical drives, removable drives and cartridge drives, as well as the hard drive
interfaces and removable and non-removable drives. This unit also described the
different types of disk formats. The advanced technologies of optical memories such
as CD-ROM, DVD-ROM, CD-R, CD-RW, etc., and backups for storage such as DAT
were also discussed in this unit.
1. A CD-ROM is a non-erasable disk used for storing computer data. The standard
uses 12 cm disk and can hold more than 650 MB.
A DVD-ROM is used for providing digitized compressed representation of video
as well as the large volume of digital data. Both 8 and 12 cm diameters are used
with a double sided capacity of up to 17GB.
2. The advantages of CD-ROM are:
• Large storage capacity.
• Mass replication is inexpensive and fast.
• These are removable disks, thus they are suitable for archival storage.
1. A CD-R is similar to a CD-ROM but the user can write to the disk only once. A
CD-RW is also similar to a CD-ROM but the user can erase and rewrite to the
disk multiple times.
2. (a) False (b) False (c) True (d) False (e) False.
UNIT 4 I/O TECHNOLOGY
Structure
4.0 Introduction
4.1 Objectives
4.2 Keyboard
4.2.1 Keyboard Layout
4.2.2 Keyboard Touch
4.2.3 Keyboard Technology
4.3 Mouse
4.4 Video Cards
4.4.1 Resolution
4.4.2 Colour Depth
4.4.3 Video Memory
4.4.4 Refresh Rates
4.4.5 Graphic Accelerators and 3-D Accelerators
4.4.6 Video Card Interfaces
4.5 Monitors
4.5.1 Cathode Ray Tubes
4.5.2 Shadow Mask
4.5.3 Dot Pitch
4.5.4 Monitor Resolutions
4.5.5 DPI
4.5.6 Interlacing
4.5.7 Bandwidth
4.6 Liquid Crystal Displays (LCD)
4.7 Digital Camera
4.8 Sound Cards
4.9 Printers
4.9.1 Classification of Printers
4.9.2 Print Resolutions
4.9.3 Print Speed
4.9.4 Print Quality
4.9.5 Colour Management
4.10 Modems
4.11 Scanners
4.11.1 Resolution
4.11.2 Dynamic Range/Colour Depth
4.11.3 Size and Speed
4.11.4 Scanning Tips
4.12 Power Supply
SMPS (Switched Mode Power Supply)
4.13 Summary
4.14 Solutions/Answers
References
4.0 INTRODUCTION
In the previous units you have been exposed to Input/Output interfaces, control and
techniques etc. This unit covers Input/Output devices and technologies related to
them. The basic aspects covered include:
• The characteristics of the device.
• How does it function?
• How does it relate with the main computing unit?
4.1 OBJECTIVES
After going through this unit you will be able to:
4.2 KEYBOARD
The keyboard is the main input device for your computer. It is a fast and accurate
device. The multiple character keys allow you to send data to your computer as a
stream of characters in a serial manner. The keyboard is one device which can be used
in public spaces or offices where privacy is not ensured. The keyboard is efficient in
jobs like data entry. The keyboard is one device which shall stay on for years to come,
probably even after powerful voice-based input devices have been developed.
The precursor of the keyboard was the mechanical typewriter, hence it has inherited
many of the properties of the typewriter.
The Keys
A full-size keyboard has a distance of 19 mm (0.75 in) between the centres of the
keycaps (keys). The keycaps have a top of about 12.5 mm (0.5 in), which is shaped as a
sort of dish to help you place your finger. Most designs have the keys curved in a
concave cylindrical shape on the top.
QWERTY
Q, W, E, R, T and Y are the first six letters of the top row of alphabet keys in the QWERTY
layout. The QWERTY arrangement was given by Sholes, the inventor of the
typewriter. The first typewriter that Sholes created had an alphabetic layout of keys.
However, Sholes very soon designed QWERTY as a superior arrangement, though he
left no record of how he came upon this arrangement.
QWERTY-based keyboards
Besides the standard alphabet keys having the QWERTY arrangement, a computer
keyboard also has the control keys (Alt, Del, Ctrl, etc.), the function keys (F1,
F2, etc.), the numerical keypad, etc.
correspond to the function keys shown by many software on the monitor. However,
this has also been criticised at times for having a small Enter key and function keys on
the top.
Dvorak-Dealey keyboard
This was one keyboard layout designed to be a challenger to the QWERTY layout.
This was designed by August Dvorak and William Dealey after much scientific
research in 1936. This layout tries to make typing faster. The basic strategy it tries to
incorporate is called hand alternation. Hand alternation implies that if you press one
key with the left hand, the next key is likely to be pressed by the right hand, thus
speeding up typing (assuming you type with both hands).
However, the Dvorak has not been able to compete with QWERTY and almost all
systems now come with QWERTY 101-key or 104-key based keyboards. Still, there
may be a possibility of designing new keyboards for specific areas, say, for Indian
scripts.
Linear travel or linear touch keyboards increase resistance linearly with the travel of
the key. Therefore, you have to press harder as the key goes lower. There can be
audible feedback as a click and visual feedback as the appearance of a character on
screen letting you know when a key gets activated. Better keyboards provide tactile
feedback (to your fingers) by suddenly reducing resistance when the key gets
actuated. This is called an over-center feel. Such keyboards are best for quick touch
typing. These were implemented by using springs earlier but now they are usually
elastic rubber domes. Keyboards also differ in whether they ‘click’ or not (soundless),
on the force required and the key travel distance to actuate a key. The choice is
usually an issue of personal liking. Laptops usually have short travel keys to save
space which is at a premium in laptops.
well but have the drawback that they follow an indirect approach, though they have a
longer life than contact-based keyboards. These keyboards were introduced by IBM.
Contact-Based Keyboards
Contact-based keyboards use switches directly. Though they have a comparatively
shorter life, they are the most preferred kind nowadays due to their lower cost. Three
such kinds of keyboards have been used in PCs:
1. Mechanical Switches: These keyboards use traditional switches with the metal
contacts directly touching each other. Springs and other parts are used to control
positioning of the keycaps and give the right feel. Overall, this design is not
suited to PC keyboards.
2. Rubber Dome: In rubber dome keyboards, both contact and positioning are
controlled by a puckered sheet of elastomer, which is a stretchy, rubber-like
synthetic material. This sheet is moulded to have a dimple or dome in each
keycap. The dome houses a tab of carbon or other conductive material which
serves as a contact. When a key is pressed, the dome presses down to touch
another contact and complete the circuit. The elastomer then pushes the key back.
This is the most popular PC keyboard design since the domes are inexpensive
and proper design can give the keyboards an excellent feel.
3. Membrane: These are similar to rubber domes except that they use thin plastic
sheets (membranes) with conductive traces on them. The contacts are in the form
of dimples which are pushed together when a key is pressed. This design is often
used in calculators and printer keyboards due to its low cost and trouble-free
life. However, since its contacts require only a slight travel to actuate, it makes
for a poor computer keyboard.
Scan Codes
A scan code is the code generated by a microprocessor in the keyboard when a key is
pressed and is unique to the key struck. When this code is received by the computer, it
issues an interrupt, looks up the scan code table in the BIOS and finds out which
keys have been pressed and in what combination. Special memory locations called
status bytes tell the status of the locking and toggle keys, e.g., Caps Lock. Each
keypress generates two different scan codes: one when the key is pushed down, called the Make
code, and another when it pops back, called the Break code. This two-code technique allows
the computer to tell when a key is held pressed down, e.g., the ALT key while
pressing another key, say, CTRL-ALT-DEL.
There are three standards for scan codes: Mode1 (83-key keyboard PC, PC-XT),
Mode2 (84-key AT keyboard), Mode3 (101-key keyboard onwards). In Mode1 Make
and Break codes are both single bytes but different for the same key. In Mode2 and
Mode3, Make code is a single byte and Break code is two bytes (byte F0(Hex) + the
make code).
Interfacing
The keyboard uses a special I/O port that is like a serial port but does not explicitly
follow the RS-232 serial port standard. Instead of multiple data and handshaking
signals as in RS-232, the keyboard uses only two signals, through which it manages a
bi-directional interface with its own set of commands.
Using its elaborate handshaking mechanism, the keyboard and the PC send commands
and data to each other. The USB keyboards work differently by using the USB
coding and protocol.
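Table 1: Some Scan Codes
Key | Key Number | Mode1 Make | Mode1 Break | Mode3 Make | Mode3 Break
A | 31 | 1E | 9E | 1C | F0 1C
0 | 11 | 0B | 8B | 45 | F0 45
Enter | 43 | 1C | 9C | 5A | F0 5A
Left Shift | 44 | 2A | AA | 12 | F0 12
F1 | 112 | 3B | BB | 07 | F0 07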
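To see how Make and Break codes combine when one key is held while another is pressed, consider the following minimal Python sketch. It reads the hexadecimal columns of Table 1 as Mode1 Make/Break followed by Mode3 Make/Break, which is an interpretation consistent with the rules stated earlier. The names MAKE_CODES, break_code and chord are illustrative only and are not part of any keyboard standard.

```python
# Make codes (hex) copied from Table 1; the column labels are an assumption.
MAKE_CODES = {
    "A":          {"mode1": 0x1E, "mode3": 0x1C},
    "0":          {"mode1": 0x0B, "mode3": 0x45},
    "Enter":      {"mode1": 0x1C, "mode3": 0x5A},
    "Left Shift": {"mode1": 0x2A, "mode3": 0x12},
    "F1":         {"mode1": 0x3B, "mode3": 0x07},
}


def make_code(key, mode):
    """Bytes sent when the key is pushed down."""
    return [MAKE_CODES[key][mode]]


def break_code(key, mode):
    """Bytes sent when the key pops back."""
    make = MAKE_CODES[key][mode]
    if mode == "mode1":
        # In Table 1 every Mode1 break code is the make code with the top bit set.
        return [make | 0x80]
    # Mode2 and Mode3: byte F0 (hex) followed by the make code.
    return [0xF0, make]


def chord(modifier, key, mode):
    """Hold the modifier, press and release the key, then release the modifier."""
    stream = (make_code(modifier, mode) + make_code(key, mode)
              + break_code(key, mode) + break_code(modifier, mode))
    return " ".join(f"{byte:02X}" for byte in stream)


print(chord("Left Shift", "A", "mode3"))  # 12 1C F0 1C F0 12
print(chord("Left Shift", "A", "mode1"))  # 2A 1E 9E AA
```

Because the Break code of the modifier arrives last, the computer knows that Left Shift was still held down while the A key was pressed and released.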
Connections
5-pin DIN connector: This is the connector of the conventional keyboard having 5
pins (2 IN, 2 OUT and one ground pin), used for synchronization and transfer.
PS/2 connector (PS/2 keyboards): These were introduced with IBM’s PS/2
computers and hence are called PS/2 connectors. They have 6 pins, but in fact their
wiring is simply a rearrangement of the 5-pin DIN connector. This connector is
smaller in size and quite popular nowadays. Due to the similar wiring, a 5-pin DIN
can easily be connected to a PS/2 connector via a simple adapter.
Ergonomic Keyboards
Ergonomics is the study of the environment, conditions and efficiency of workers [1].
Ergonomics suggests that the keyboard was not designed with human beings in mind.
Indeed, continuous typing can be hazardous to health. This can lead to pain or some
ailments like the Carpal Tunnel Syndrome.
For normal typing on a keyboard, you have to place your hands apart, bending them at
the wrists and hold this position for a long time. You also have to bend your wrist
vertically especially if you elevate your keyboard using the little feet behind the
keyboards. This stresses the wrist ligaments and squeezes the nerves running into the
hand through the Carpal tunnel, through the wrist bones.
To reduce the stress, keyboards called ergonomic keyboards have been designed.
These split the keyboard into two and angle the two halves so as to keep the wrists
straight. To reduce vertical stress, many keyboards also provide extended wrist rests.
For those who indulge in heavy, regular typing, it is recommended that they use more
ergonomics based keyboards and follow ergonomic advice in all aspects of their
workplace.
4.3 MOUSE
The idea of the Mouse was developed by Douglas C. Engelbart of the Stanford Research
Institute, and the first Mouse was developed by Xerox Corporation. The Mouse is a
device which gives you a pointer on screen and a method of selecting commands
through buttons on the top. A single button is usually sufficient (as in the Mouse supplied with
Apple Macintosh machines), but Mice come with up to 3 buttons.
Types of Mice
Mice can be classified on the basis of the numbers of buttons, position sensing
technology or the type of Interface:
[1] Oxford Advanced Learner’s Dictionary
Sensing Technology
Mice can be Mechanical or Optical.
Mechanical Mice have a ball made from rough rubbery material, the rotation of
which affects sensors that are perpendicular to each other. Thus, the motion of the
ball along the two axes is detected and reflected as the motion of the pointer on the
screen.
Optical Mice can detect movement without any moving parts like a ball. The typical
optical Mouse used to have a pair of LEDs (Light Emitting Diodes) and photo-detectors
in each axis and its own Mousepad on which it was slid. However, due to
the maintenance needs of the Mousepad, this was not very successful. Recently,
optical Mice have made a comeback since they can now operate without a Mousepad.
Interface
The Mouse is usually a serial device connected to a serial port (RS-232), but these
connections can themselves take various forms:
Serial Mouse
Mice that use the standard serial port are called “serial”. Since serial ports 1 and 3
(COM1, COM3 under DOS; /dev/ttyS0 and /dev/ttyS2 under Unix/GNU-Linux
systems) and ports 2 and 4 (COM2, COM4 or /dev/ttyS1, /dev/ttyS3) share the same
interrupts respectively, one should be careful not to attach the mouse so that it shares
the interrupt with another device in operation, like a modem.
Bus Mouse
These Mice have a dedicated Mouse card and port to connect to. Recently, USB
mice have become popular.
Proprietary
Mouse ports specific to some PCs e.g., IBM’s PS/2 and some Compaq computers.
Mouse Protocols
The mouse protocol is the digital code to which the signal from the mouse gets
converted. There are four major protocols: Microsoft, Mouse Systems
Corporation (MSC), Logitech and IBM. Most mice available support at least the
Microsoft protocol or its emulation.
2. Why is keyboard touch important? What kind of touch would you prefer and
which kind of keyboard will give that touch?
4. You enter ‘a’ as left-shift + ‘A’. What will be the scan-code generated in
Mode-3 by the keyboard?
a) 2A1E9EAA b) 1CF01C
c) 121CF01CF012 d) 1CF01C5AF05A
Figure 3: Raster Display
The greater the number of dots, i.e., the higher the resolution of the image, the sharper
the picture is. The richness of the image is also dependent on the number of colours
(or gray levels for a monochrome display) displayed by the system. The higher the
number of colours, the more information is required for each dot. Hence, the
amount of memory (framebuffer) required by a system is directly dependent on the
resolution and colour depth required.
4.4.1 Resolution
Resolution is the parameter that defines the possible sharpness or clarity of a video
image. Resolution is defined as the number of pixels that make up an image. These
pixels are then spread across the width and height of the monitor. Resolution is
independent of the physical characteristics of the monitor. The image is generated
without considering the ultimate screen it is to be displayed upon. Hence, the unit of
resolution is the number of pixels, not the number of pixels per inch. For example, a
standard VGA native graphic display mode has a resolution of 640 pixels horizontally
by 480 pixels vertically. Higher resolutions mean the image can be sharper because it
contains more pixels.
The actual on-screen sharpness is given as dots-per-inch, and this depends on both the
resolution and the size of the image. For the same resolution, an image will be
sharper on a smaller screen, i.e., an image which may look sharp on a 15" monitor
may be a little jagged on a 17” display.
4.4.2 Colour Depth
Colour Depth (or the number of Colour Planes) is the number of bits assigned to
each pixel to code colour information in it. These are also called Colour Planes
because each bit of a pixel represents a specific colour and the bit at the same position
on every pixel represents the same colour. Hence, the bits at the same position can be
thought of as forming a plane of a particular colour shade, and these planes piled on
top of each other give the final colour at each point. Thus, if each pixel is described
by 3 bits, one each for red, green and blue, then there are 3 Colour Planes (one
each for red, green and blue), and there are 6 Colour Planes if there are 6 bits (see Figure 4).
Figure 4: Colour Planes
What Colour depths are practically used?
Practically, the number of colours is a power of 2, since for Colour
Depth n, 2^n colours can be displayed. The most popular colour modes are given in
Table 2.
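The relation between Colour Depth and the number of displayable colours can be checked with a short Python sketch. It assumes only that each additional bit per pixel doubles the number of combinations; the function name colours_for_depth and the depths listed are illustrative choices, not the contents of Table 2.

```python
def colours_for_depth(bits_per_pixel):
    """Number of distinct colours that n bits per pixel can encode (2 to the power n)."""
    return 2 ** bits_per_pixel


# A few commonly quoted depths, chosen here only as examples.
for depth in (1, 4, 8, 16, 24):
    print(f"{depth:2d}-bit colour depth: {colours_for_depth(depth):,} colours")
```

For instance, 8 bits per pixel give 256 colours, while 24 bits per pixel give 16,777,216 colours.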
This also implies that 24-bit colour bit-depth is the practical upper limit. Hence, this
depth is also called true colour because with this depth the system stores more colours
than can ever be seen by the human eye and, hence, it is a true colour representation of
the image. Though, 24-bit colour or true colour systems have more colour than
possibly useful, they are convenient for designers because they assign 1 byte of
storage for each of the three additive primary colours (red, green and blue). Some
new systems even have 32 bits per pixel. Why? Actually, the additional bits are not
used to hold colours but something called an Alpha Channel. This 8-bit Alpha
Channel stores special effect information for the image.
Why are all resolutions in the ratio of 4:3? The answer you’ll find in a later section.
4.4.3 Video Memory
The amount of video memory required is dependent on the resolution and colour depth
required of the system. Let us see how to calculate the amount of video
memory required. The video memory required is simply the resolution (i.e., the total
number of pixels) multiplied by the Colour Depth. Let us do the calculations for a
standard VGA graphics screen (640 × 480) using 16 colours. Sixteen colours need 4 bits
per pixel, so the memory required is 640 × 480 × 4 = 1,228,800 bits, i.e., 153,600 bytes
or 150 KB.
If you can’t wait any longer, here is the answer: 1152 × 864 is nearly one million
pixels. At 8-bit colour depth, this means nearly 8 million bits, or about 1 MB. This is the highest
resolution you can get in 1 MB of video memory at 8-bit colour depth, and it still
leaves you square pixels (in the ratio 4:3) to allow easy programming.
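The rule that video memory equals the total number of pixels multiplied by the Colour Depth can be tried out with the following Python sketch. The function names are hypothetical, and the sketch assumes a simple two-dimensional framebuffer with no extra buffers.

```python
def bits_per_pixel(colours):
    """Smallest colour depth (in bits) that can encode the given number of colours."""
    return max(1, (colours - 1).bit_length())


def video_memory_bytes(width, height, colours):
    """Framebuffer size in bytes: total pixels multiplied by the colour depth."""
    total_bits = width * height * bits_per_pixel(colours)
    return (total_bits + 7) // 8  # round up to whole bytes


vga = video_memory_bytes(640, 480, 16)
print(vga, "bytes =", vga / 1024, "KB")    # 153600 bytes = 150.0 KB

big = video_memory_bytes(1152, 864, 256)
print(big, "bytes =", big / 1024, "KB")    # 995328 bytes = 972.0 KB
print(big <= 1024 * 1024)                  # True: it fits in 1 MB
```

The second result confirms that 1152 × 864 at 8-bit colour depth just fits within 1 MB of video memory.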
The above calculations hold good only for two-dimensional display systems. This is
because 3-D systems require much more memory owing to techniques such as
“Double Buffering” and “Z-Buffering”.
The most important thing is maintaining the same frequencies between the Video
system and monitor. The monitor must support these refresh rates, hence the
supported refresh rates are given with the manual of the monitor. More about this
topic will be discussed in the section on Monitors.
The graphic accelerator determines whether your system can show 3-D graphics, how
quickly your system displays a drop-down menu, how good is your video playback,
etc. It determines the amount and kind of memory in the framebuffer and also the
resolution your PC can display.
The first major graphic accelerators were made by the S3 corporation. Modern
Graphic accelerators have internal registers at least 64-bit wide to work on at least 2
pixels at a time. They can use the standard Dynamic RAM (DRAM) or the more
expensive but faster dual-ported Video RAM (VRAM). They support at least the
standard resolutions up to 1024 × 768 pixels. They often use RAMDACs for colour
support, giving full 24-bit or 32-bit colour support. A RAMDAC (Random Access
Memory Digital-to-Analog Converter) is a microchip that converts digital image data
into the analog data needed by a computer display. However, the higher the
resolution required, the higher is the speed at which the chip has to function. So, for a
resolution of 1280 × 1024, the chip operates at 100 MHz. At the cutting edge of
technology, chips now run even as fast as 180 or 200 MHz.
AGP
AGP stands for Advanced (or Accelerated) Graphics Port. It is a connector standard
describing a high speed bus connection between the PC video system, the
microprocessor and the main memory. It is an advancement of the PCI interface.
AGP uses concepts such as pipelining to allow powerful 3-D graphic accelerators to
function when used in conjunction with fast processors. AGP uses three powerful
innovations to achieve its performance:
• Pipelined Memory: The use of pipelining eliminates wait states, allowing faster
operation.
• Separate Address and Data Lines.
• High speeds through a special 2X mode that allows running AGP at 133 MHz
instead of the default 66 MHz.
Through AGP, the video board has a direct connection to the microprocessor as a
dedicated high speed interface for video. The system uses DMA (Direct Memory
Access) to move data between main memory and framebuffer. The accelerator chip
uses the main memory for execution of high level functions like those used in 3-D
rendering.
UMA
UMA stands for Unified Memory Architecture. It is an architecture which reduces the
cost of PC construction. In this, a part of the main memory is actually used as
framebuffer. Hence, it eliminates the use of a bus for video processing. Therefore, it
is less costly. Though it is not supposed to perform as well as AGP etc., in some
cases it may give a better performance than the bus-based systems. It is the interface
used nowadays in low-cost motherboards.
4.5 MONITORS
A Monitor is the television-like box connected to your computer, giving you a
vision into the mind of your PC. It shows what your computer is thinking. It has a
display, which is technically defined as the image-producing device, i.e., the screen
one sees, and circuitry that converts the signals from your computer (or similar
devices) into the proper form for display.
Monitors are or were just like television sets except that television sets have a tuner or
demodulator circuit to convert the signals. However, now monitors have branched
beyond television. They have greater sharpness and colour purity and operate at
higher frequencies.
Generally, when you go to purchase a monitor from the market, you see the following
specifications: The maximum Resolution, the Horizontal and Vertical Frequencies
supported, the tube size and the connectors to the monitor. There are many vendors on
the market like Samsung, LG, Sony etc. Home users generally go in for monitors of
size 17”, 15” or 14” . Monitors are also available as the traditional curved screens,
flat screens or LCD. The technology behind Monitors and the above specifications
are discussed ahead.
1. The Phosphor coating: This affects the colour and the persistence (the period
the effect of a single hit on a dot lasts).
2. The Cathode (Electron Gun): The sharpness of the image depends on the good
functioning of this gun.
3. Shadow Mask/Aperture Grill: This determines the resolution of the screen in
colour monitors.
4. The Screen, glare and lighting of the monitor.
Horizontal Frequency: The time to scan one line connecting the right edge to the left
edge of the screen horizontally is called the Horizontal cycle, and the inverse
of the Horizontal cycle is called the Horizontal Frequency. The unit is kHz (kilohertz).
Vertical Frequency: Like a fluorescent lamp, the screen has to repeat the same
image many times per second to display an image to the user. The frequency of this
repetition is called Vertical Frequency or Refresh Rate.
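The two definitions above can be related with a small Python sketch. The 32 microsecond line time and the 60 Hz refresh rate are hypothetical values chosen only for illustration, and the function names are not taken from any standard. The number of lines per refresh computed here includes any lines scanned while the beam returns to the top of the screen, so it is an upper bound on the visible lines.

```python
def horizontal_frequency_khz(line_time_us):
    """Horizontal frequency is the inverse of the time taken to scan one line."""
    return 1000.0 / line_time_us  # 1 / microseconds gives MHz, so scale to kHz


def lines_per_refresh(horizontal_khz, vertical_hz):
    """Lines scanned per second divided by screen refreshes per second."""
    return horizontal_khz * 1000.0 / vertical_hz


h = horizontal_frequency_khz(32.0)   # hypothetical 32 microsecond Horizontal cycle
print(h)                             # 31.25
print(lines_per_refresh(h, 60.0))    # about 520.8 lines in each refresh
```

A monitor must therefore support both the Horizontal and the Vertical Frequency that a given display mode demands.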
If the resolution generated by the video card and the monitor resolution are properly
matched, you get a good quality display. However, the actual resolution achieved is a
physical quality of the monitor. In colour systems, the resolution is limited by
Convergence (do the beams of the 3 colours converge exactly on the same dot?) and
the Dot Pitch. In monochrome monitors, the resolution is limited only by the highest
frequency signals the monitor can handle.
4.5.5 DPI
DPI (Dots Per Inch) is a measure for the actual sharpness of the onscreen image. This
depends on both the resolution and the size of the image. Practical experience shows
that a smaller screen has a sharper image at the same resolution than does a larger
screen. This is because it will require more dots per inch to display the same number
of pixels. A 15-inch monitor is 12 inches horizontally. A 10-inch monitor is 8 inches
horizontally. To display a VGA image (640 × 480), the 15-inch monitor will require
53 DPI and the 10-inch monitor 80 DPI.
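The DPI figures quoted above follow from dividing the horizontal pixel count by the horizontal width of the screen. A minimal Python sketch, with a function name chosen here only for illustration:

```python
def dots_per_inch(horizontal_pixels, horizontal_inches):
    """On-screen sharpness: pixels across divided by the screen width in inches."""
    return horizontal_pixels / horizontal_inches


print(round(dots_per_inch(640, 12)))  # 53 (15-inch monitor, 12 inches wide)
print(round(dots_per_inch(640, 8)))   # 80 (10-inch monitor, 8 inches wide)
```

The same 640 pixels packed into a narrower screen give a higher DPI, which is why the smaller screen looks sharper.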
4.5.6 Interlacing
Interlacing is a technique in which, instead of scanning the image one line at a time, it
is scanned alternately, i.e., alternate lines are scanned at each pass. This achieves a
doubling of the frame rate with the same amount of signal input. Interlacing is used to
keep bandwidth (amount of signal) down. Presently, only the 8514/A display adapters
use interlacing. Since interlaced displays have been reported to flicker more, and
better technology is available, most monitors are non-interlaced now.
4.5.7 Bandwidth
Bandwidth is the amount of signal the monitor can handle and it is rated in
MegaHertz. This is the most commonly quoted specification of a monitor. The
Bandwidth should be enough to address each pixel plus synchronizing signals.
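A rough feel for the Bandwidth a display mode needs can be obtained from the sketch below. It assumes that every pixel must be addressed once per refresh and that synchronizing signals add a fixed overhead. The overhead factor of 1.3 is purely an assumed illustrative value and not a figure from this unit; real monitors and display modes differ.

```python
def pixel_rate_mhz(width, height, refresh_hz):
    """Pixels that must be addressed every second, in millions (MHz)."""
    return width * height * refresh_hz / 1e6


def required_bandwidth_mhz(width, height, refresh_hz, sync_overhead=1.3):
    """Pixel rate scaled by an assumed allowance for synchronizing signals."""
    return pixel_rate_mhz(width, height, refresh_hz) * sync_overhead


print(pixel_rate_mhz(640, 480, 60))           # 18.432
print(required_bandwidth_mhz(640, 480, 60))   # somewhat higher, because of the overhead
```

Raising either the resolution or the refresh rate raises the Bandwidth the monitor must handle.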
3. What is the difference between Shadow Mask and Dot Pitch for Trinitron and
non-Trinitron monitors?
4. How much Video-RAM would you require for a high-colour (16-bit) Colour
Depth at 1024 × 768 resolution? What would be the size of the corresponding
single memory chip you would get from the market?
a) 900 KB, 1 MB b) 1.6 MB, 4 MB
c) 12.6 MB, 16 MB d) 7.6 MB, 8 MB
LCD Technology
The technology behind LCD is called Nematic Technology because the molecules of
the liquid crystals used are nematic i.e. rod-shaped. This liquid is sandwiched
between two thin plastic membranes. These crystals have the special property that
they can change the polarity and the bend of the light and this can be controlled by
grooves in the plastic and by applying electric current.
Passive Matrix
In a passive matrix arrangement, the LCD panel has a grid of horizontal and vertical
conductors and each pixel is located at an intersection. When a current is received by
the pixel, it becomes dark. This is the technology which is more commonly used.
Active Matrix
This is called TFT (Thin Film Transistor) technology. In this, there is a transistor at
every pixel acting as a relay, receiving a small current and amplifying it to
activate the pixel. Since the current is smaller, it can travel faster and hence response
times are much faster. However, TFTs are much more difficult to fabricate and are
costlier.
4.7 DIGITAL CAMERA
A Digital camera is a camera that captures and stores still images and video (Digital
Video Cameras) as digital data instead of on photographic film. The first digital
cameras became available in the early 1990s. Since the images are in digital form
they can be later fed to a computer or printed on a printer.
Like a conventional camera, a digital camera has a series of lenses that focus light to
create an image of a scene. But instead of this light hitting a piece of film, the camera
focuses it on to a semiconductor device that records light electronically. An in-built
computer then breaks this electronic information down into digital data.
This semiconductor device is called an Image sensor and converts light into electrical
charges. There are two main kinds of Image sensors: CCD and CMOS. CCD stands
for Charge-Coupled Device and is the more popular and more powerful kind of
sensor. CMOS stands for Complementary Metal-Oxide Semiconductor, and this kind
of technology is now used only in some lower-end cameras. While CMOS sensors
may improve and become more popular in the future, they probably won’t replace
CCD sensors in higher-end digital cameras.
In brief, the CCD is a collection of tiny light-sensitive diodes called photosites, which
convert photons (light) into electrons (electrical charge). Each photosite is
proportionally sensitive to light – the brighter the light that hits a single photosite, the
greater the electrical charge that will accumulate at that site.
A digital Camera is also characterised by its resolution (like monitors and printers)
which is measured in pixels. The higher the resolution, the more detail is available in
an image.
Mobile Cameras
Mobile cameras are typically low-resolution Digital cameras integrated into the
mobile set. The photographs are typically only good enough to show on the low-resolution
mobile screen. They have become quite popular devices now, and the
photographs taken can be used for MMS messages or uploaded to a computer.
4.8 SOUND CARDS
As you must have read in your high school physics, sound is a longitudinal wave
travelling in a medium, usually air in the case of music. Sound can be encoded into
electrical form using electrical signals which encode sound strengths. This is called
analog audio. This analog audio is converted to digital audio, i.e., those signals are converted
into bits and bytes, through the process called Sampling. In Sampling,
analog ‘samples’ are taken at regular intervals and the amplitude (voltage) of these
samples is encoded to bits. These sounds are manipulated by your PC’s
microprocessor etc. To play back these digital audio sounds, the data are sent to the
Sound card, which converts them to analog audio, which is played back through
speakers.
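The Sampling process described above can be imitated in software. The following Python sketch takes samples of a signal at regular intervals and encodes each amplitude as an integer code. The 440 Hz tone, the 8000 samples-per-second rate and the 8-bit codes are illustrative assumptions, as are the function names; a real Sound card performs this step in hardware with an analog-to-digital converter.

```python
import math


def sample_and_quantise(signal, sample_rate_hz, num_samples, bits=8):
    """Take samples at regular intervals and encode each amplitude as an integer.

    `signal` is a function of time (seconds) returning an amplitude that is
    assumed to lie between -1.0 and +1.0; values outside are clipped.
    """
    levels = 2 ** bits
    codes = []
    for i in range(num_samples):
        t = i / sample_rate_hz                      # time of the i-th sample
        amplitude = max(-1.0, min(1.0, signal(t)))
        # Map the range -1.0..+1.0 onto the integer codes 0..levels-1.
        codes.append(round((amplitude + 1.0) / 2.0 * (levels - 1)))
    return codes


def tone(t):
    """A hypothetical 440 Hz sine wave standing in for the analog audio."""
    return math.sin(2 * math.pi * 440.0 * t)


codes = sample_and_quantise(tone, 8000, 8)
print(codes)  # eight integer codes, each between 0 and 255
```

Taking more samples per second or using more bits per sample gives a closer digital copy of the analog audio, at the cost of more data.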
The Sound card (often built directly into motherboards nowadays) is a
board that has a digital-to-analog sound converter, amplifier and other circuitry to play
sound and to connect the PC to various audio sources.
A sound card may support the following functions:
1. Convert digital sound to analog form using digital-to-analog converter to play
back the sound.
2. May record sound to play back later with analog-to-digital converter.
3. May have built-in Synthesizers to create new sounds.
4. May use various input sources (Microphone, CD, etc.) and mixer circuits to play
these sounds together.
5. Amplifiers to amplify the sound signals to nicely audible levels.
Compatibility: Sound cards must be compatible at both hardware and software levels
with industry standards. Most software, especially games, requires sound cards to be
compatible with the two main industry standards: AdLib (a basic standard) and
Sound Blaster (an advanced standard developed by Creative Labs).
Connections: Sound cards should have connections to allow various functions. One
of the most important is the MIDI port (MIDI stands for Musical Instrument Digital
Interface). The MIDI port allows you to create music directly with your PC using the
Sound Card’s synthesizer circuit and even attach a Piano keyboard to your PC.
Quality: Sound Cards vary widely in terms of the quality they give. This depends
on the frequency range supported, digital quality and noise control.
4.9 PRINTERS
Printers are devices that put ink on paper in a controlled manner. They normally
produce readable text or photographic images. Printers have gone through a large
transition in technology. They are still available in a wide range of technologies and
prices, from Dot Matrix printers to Inkjet printers to Laser Printers.
There are many specifications one has to keep in mind while purchasing a
printer. Some of these are compatibility with other hardware, in-built memory,
maximum supported memory, actual technology, printer resolution (Colour, BW),
PostScript support, output type, printer speed, media capacity, and the weight, height and
width of the printer.
4.9.2 Print Resolution
Print Resolution is the detail that a printer can give, determined by how many dots the
printer can put per inch of paper. Thus, the unit of resolution is Dots per inch (dpi). This is
applicable to both impact and non-impact printers, though the actual quality will depend
on the technology of the printer.
The required resolution to a great extent determines the quality of the output and the
time taken to print it. There is a tradeoff between quality and time. Lower resolution
means faster printing at lower quality. Higher resolution means slower printing at
higher quality. There are three readymade resolution modes: draft, near letter quality
(NLQ) and letter quality. Draft gives the lowest resolution print and letter quality
the highest. In Inkjet and Laser Printers, the highest mode is often called ‘best’
quality print.
A page printer rasterizes the full image of the page in its memory and then prints it
one line of dots at a time. For a line printer, the speed is measured in characters per
second (cps) whereas for page printing, it is pages per minute (ppm). Hence, Dot
Matrix printers usually have speeds given in cps whereas Lasers have speeds in ppm. The
actual speed may vary from the rated speed given by the manufacturer because, as
expected, the manufacturer quotes the more favourable values.
DotMatrix/InkJet Printers
Three main issues determine the quality of characters produced by DotMatrix/InkJet
Printers: the number of dots in the matrix of each character, the size of the dots and the
addressability of the Printer. A denser matrix and smaller dots make better characters.
Addressability is the accuracy with which a dot can be placed (e.g., 1/120 inch
means the printer can put a dot within 1/120 inch of the required position). The minimum dot matrix
used by general dot matrix printers is 9 × 9 dots; 18-pin and 24-pin printers use
12 × 24 to 24 × 24 matrices. Inkjets may even give up to 72 × 120 dots. Quality of
output also depends on the paper used. If the ink of an Inkjet printer gets absorbed by
the paper, it spreads and spoils the resolution.
Laser Printer
Laser Printers are page printers. For print quality, they also face the same
addressability issues as DMP/InkJet Printers. However, other techniques can be
used here for better quality.
One of these is ReT (Resolution Enhancement Technology) introduced by Hewlett-Packard.
It prints better at the same resolution by changing the size of the dots at
character edges and diagonal lines, reducing jagged edges.
A very important requirement for Laser Printers to print at high quality is Memory.
The memory needed increases as the square of the resolution, i.e., of the dot density in dpi,
because doubling the dpi doubles the number of dots both across and down the page.
Therefore, if 3.5 MB is required for a 600 dpi page, approximately 14 MB is required
for 1200 dpi. You need even more memory for colour.
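The square-law growth of page memory can be checked with a short calculation. The sketch below assumes a hypothetical printable area of 8 × 10.5 inches and one bit per dot (monochrome); both figures are illustrative assumptions, not printer specifications.

```python
def raster_bytes(width_in, height_in, dpi, bits_per_dot=1):
    """Memory needed to hold a full-page raster image at the given dpi."""
    dots = (width_in * dpi) * (height_in * dpi)   # dots across x dots down
    return dots * bits_per_dot / 8                # 8 bits per byte

low = raster_bytes(8.0, 10.5, 600)     # 3,780,000 bytes, roughly 3.6 MB
high = raster_bytes(8.0, 10.5, 1200)   # four times as much
ratio = high / low
```

Doubling the resolution multiplies the memory by four, and a colour page multiplies it again by the number of bits stored per dot.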
For efficient text printing, the Laser printer stores the page image as ASCII characters
and fonts and prints them with low memory usage. At higher resolutions, the quality
of print toner also becomes important since the resolution is limited by the size of
toner particles.
Physical Mixing: Physically mix colours to make a new colour. This is difficult for
printers because their colours are quick drying and so colours to be mixed must be
applied simultaneously.
Optical Mixing: Mixing to give the illusion of a new colour. This can be done in
two ways:
• Applying colours one upon another. This is done using inks which are somewhat
transparent, as modern inks are.
• Applying dots of different colours so close to one another that the human eye
cannot distinguish the difference. This is the theory behind Dithering.
3 or 4 colour Printing?
For good printing, printers do not use RBY; instead they use CMYK (Cyan instead of
Blue, Magenta instead of Red, Yellow, and a separate Black). A separate Black is
required since the black produced by mixing the 3 colours (which is called Composite
Black) is often not satisfactory.
What is Dithering?
CMYK gives only 8 colours (C, M, Y, K, Violet = C + M, Orange = M + Y,
Green = C + Y, and the colour of the paper itself!). What about other colours? For
these, the technique of Dithering is used. Dithering is a method in which each pixel, instead of
being a single colour dot, is a small matrix of a number of different colour dots.
Such pixels are called Super-pixels. The number of dots of a given colour in a Super-pixel
decides the intensity of that colour. The problem with dithering is that it reduces the
resolution of the image since more dots are taken by a single pixel now.
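A minimal sketch of a super-pixel for one ink colour is given below. It assumes a 2 × 2 super-pixel and a fixed fill order (a small ordered-dither pattern); the names `FILL_ORDER`, `super_pixel` and `effective_resolution` are hypothetical and chosen only for this illustration.

```python
# Order in which the four dots of a 2 x 2 super-pixel are switched on.
FILL_ORDER = [[0, 2],
              [3, 1]]

def super_pixel(level):
    """Return a 2 x 2 matrix of dots (1 = ink, 0 = paper) for an
    intensity level from 0 (no ink) to 4 (all dots inked)."""
    if not 0 <= level <= 4:
        raise ValueError("level must be between 0 and 4")
    return [[1 if FILL_ORDER[r][c] < level else 0 for c in range(2)]
            for r in range(2)]

def effective_resolution(printer_dpi, n=2):
    """Each super-pixel uses n dots per side, so image resolution drops."""
    return printer_dpi / n

half_tone = super_pixel(2)   # two of the four dots carry ink
```

With a 2 × 2 super-pixel a single ink can show five intensities instead of two, but a 600 dpi printer then resolves only 300 super-pixels per inch, which is the loss of resolution mentioned above.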
4.10 MODEMS
A Modem is one device that most computer users who have surfed the Internet are
aware of. A modem is required because though most of the telecommunications have
become digital, most telephone connections at the user end are still the analog POTS
(Plain Old Telephone Systems/Sets/Service). However, the computer is a digital
device and hence another device is needed which can convert the digital signals to
analog signals and vice-versa. Such a device is the Modem.
Modem stands for Modulator/Demodulator. Modulation is the process which puts
digital information on to the analog circuit by modifying a constant wave (signal)
called the Carrier. This is what happens when you press a button to connect to the
Internet or to a web site. Demodulation is the reverse process, which derives the
digital signal from the modulated wave. This is what happens when you receive data
from a website which then gets displayed by your browser.
Discussion of modulation techniques is out of scope here (you can refer to your course
on Computer Networks).
1. Internal Modems: Internal Modems plug into expansion slots in your PC.
Internal Modems are cheap and efficient. Internal Modems are bus-specific and
hence may not fit universally.
2. External Modems: These Modems are connected externally to the PC through a serial or
parallel port, and to a telephone line at the other end. They can usually connect
to any computer with the right port and have a range of indicators for
troubleshooting.
3. Pocket Modems: Small external Modems used with notebook PCs.
4. PC-Card Modems: These Modems are used with the PCMCIA slots found in
notebooks. They are like external Modems which fit into an internal slot. Thus,
they give the advantages of both external and internal modems but are more
expensive.
Modems conform to CCITT/ITU standards, e.g., V.32, V.32bis, V.42 etc.
Modem Language
Modems understand a set of instructions called the Hayes Command Set or the AT
Command Set. These commands are used to communicate with the Modem.
Sometimes, when you have trouble setting up your Modem, it is useful to know some
basic commands, e.g., ATDT 17776 will dial the number 17776 on a Tone phone
and ATDP 17776 will dial the number 17776 if it is a Pulse phone.
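The two dial commands above differ only in their last letter, which a small helper makes explicit. The function name `dial_command` is hypothetical; only the ATDT and ATDP prefixes come from the Hayes command set, and the sketch builds the command string without sending it to any modem.

```python
def dial_command(number, tone=True):
    """Build a Hayes dial command: ATDT for a Tone line, ATDP for Pulse."""
    digits = str(number)
    if not digits.isdigit():
        raise ValueError("this sketch accepts plain digits only")
    prefix = "ATDT" if tone else "ATDP"
    return prefix + " " + digits

tone_cmd = dial_command("17776")               # 'ATDT 17776'
pulse_cmd = dial_command("17776", tone=False)  # 'ATDP 17776'
```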
4.11 SCANNERS
A Scanner is a device that allows you to capture drawings or photographs or text from
tangible sources (paper, slides etc.) into electronic form. Scanners work by detecting
differences in brightness of reflections from an image or object using light sensors.
These light sensors are arranged in an array across the whole width that is scannable.
How densely the sensors are packed determines the resolution and detail that can be scanned.
Scanners come in various types: Drum Scanners, Flatbed Scanners, Hand Scanners
and Video Scanners. Drum Scanners use a rotating drum to scan loose paper sheets.
Flatbed scanners have movable sensors to scan images placed on a flat glass tray.
These are the most expensive kind. Hand held Scanners are the cheapest and most
portable.
They are useful for many applications but are small in size and need good hand
control for high quality scanning. Video Scanners use Video technology and Video
cameras instead of Scanning technology. Potentially, they can give high resolutions,
but scanners in the economical range give poor resolutions.
Figure 7: Scanners
When you buy a scanner, there are many factors that can be looked at: Compatibility
of the Scanner with your Computer, The Technology (Depth, Resolution), the media
types supported for scanning, How media can be loaded, Media size supported,
Interfaces supported, physical dimensions, style and ease of use of the scanner.
One exciting application of Scanners is Optical Character Recognition (OCR).
OCR software tries to recognise characters from their shapes and write out the
scanned text as a text file. Though this technology is steadily improving, it is still
not completely reliable, especially with respect to Indian scripts. However, it can be very
useful for digitizing ancient texts written in Indian scripts.
4.11.1 Resolution
Optical Resolution
Optical resolution or hardware resolution is the mechanical limit on resolution of the
Scanner. For scanning, the sensor has to advance after each line it scans. The
smallness of this advancement step gives the resolution of the Scanner. Typically,
Scanners may be available with mechanical resolutions of 300, 600, 1200 or 2400 dpi.
Some special scanners even scan at 10,000 dpi.
Interpolated Resolution
Each Scanner is accompanied by software. This software can increase the apparent
resolution of the scan by a technique called Interpolation. By this technique,
additional dots are interpolated (added) between existing dots. This gives a higher
resolution and a smoother picture, but without adding any additional information. The
added dots will, however, lead to larger file sizes.
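Interpolation can be illustrated on a single row of scanned brightness values. The sketch below uses simple linear interpolation (averaging neighbouring dots); real scanner software may use other methods, and the function name `interpolate_row` is an assumption made for this example.

```python
def interpolate_row(row):
    """Insert one averaged dot between each pair of scanned dots."""
    out = []
    for left, right in zip(row, row[1:]):
        out.append(left)
        out.append((left + right) / 2)   # the added (interpolated) dot
    out.append(row[-1])
    return out

scanned = [10, 20, 40]                   # brightness values actually measured
smoother = interpolate_row(scanned)      # [10, 15.0, 20, 30.0, 40]
```

The output has almost twice as many dots, yet every new value is computed from its neighbours, so no new information about the original image has been gained.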
4.11.2 Dynamic Range/Colour Depth
Dynamic Range is the number of colours a colour scanner, or the number of grays a
monochrome scanner, can differentiate. The dynamic range is usually given as bit-depth
or colour depth. This is simply the number of bits used to distinguish the colours.
Most scanners can do 256 (8-bit), 1024 (10-bit) or 4096 (12-bit) levels for each primary
colour. This adds up to, and is advertised as, 24-bit, 30-bit and 36-bit colour scanners.
However, to utilise the Colour Depth, the image under scanning must be
properly focused upon and properly illuminated by the scanner.
Since 24-bit colour already covers the range useful for human vision, more bits may
seem unnecessary. However, extra bits of scanning give you finer control for filtering the
image colour to your requirements.
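The relation between bits per primary colour and the advertised colour depth is simple arithmetic, sketched here. The function name `colour_depth` is hypothetical.

```python
def colour_depth(bits_per_channel, channels=3):
    """Return (levels per primary colour, total bits per pixel,
    total number of distinguishable colours)."""
    levels = 2 ** bits_per_channel
    return levels, bits_per_channel * channels, levels ** channels

eight_bit = colour_depth(8)     # (256, 24, 16777216)
twelve_bit = colour_depth(12)   # 4096 levels per primary, 36 bits per pixel
```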
High resolution scans of large images result in large file sizes. These can slow down
processing since they need Hard Disk I/O for virtual memory. Hence, for large scans,
it is necessary to have more RAM in your PC.
• Do not scan at a higher resolution than required. This saves both time and Disk
Space.
• Usually, it is not useful to scan at more than the optical resolution since it adds no
new information. Interpolation can be done later with Image processing
software.
• If scanning photographs for Printers, it is enough to scan at one-third the
resolution of printing, since Printers usually use Super-Pixels (Dithering) for
printing. Only for other kinds of Printers, like continuous tone Printers, do you
need to scan at the Printer resolution for best quality.
• For images to be seen only on the Computer Monitor, you need only scan
so that the image size in pixels is the same as the display resolution. That is, Scan
resolution = Height of image in pixels divided by the screen size in inches. This
may be surprisingly small.
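The tips on printing and file size can be turned into a small calculator. This is a sketch under stated assumptions: the function names are hypothetical, the one-third rule is taken from the tip above, and the file size is for an uncompressed image.

```python
def recommended_scan_dpi(printer_dpi, optical_dpi, continuous_tone=False):
    """Scan at one-third of the print resolution for dithering printers,
    at the full print resolution for continuous tone printers, and
    never above the scanner's optical resolution."""
    target = printer_dpi if continuous_tone else printer_dpi / 3
    return min(target, optical_dpi)

def scan_size_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Uncompressed size of a scan of the given physical dimensions."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

dpi = recommended_scan_dpi(printer_dpi=600, optical_dpi=1200)   # 200.0
size = scan_size_bytes(6, 4, dpi)                               # 6 x 4 inch photo
```

Scanning the same photograph at 600 dpi instead of 200 dpi would produce a file nine times larger without improving the printed result on a dithering printer.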
SMPS (Switched Mode Power Supply)
SMPS is the unit through which the electric supply from the mains is connected to your PC;
it supplies DC to the internal circuits. It is more efficient, less expensive and
more complex than linear supplies.
SMPS works in the following way: The electric supply received is sent to a
component called triac which shifts it from 50 Hz to a much higher frequency (almost
20,000 Hz). At the same time, using a technique called Pulse Width Modulation, the
width of the pulse is varied to suit the needs of the computer circuit. Shorter pulses give lower output
voltage. A transformer then reduces the voltage to the correct levels, and
rectifiers and filters generate the pure DC current.
SMPS has two main advantages: it generates less heat since it wastes less power,
and it uses less expensive transformers and circuits since it operates at higher
frequencies.
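The statement that shorter pulses give a lower output voltage can be seen from an idealised calculation of the average of a rectangular pulse train. The sketch ignores the transformer, rectifier and filter stages, and the voltage and timing values are illustrative assumptions only.

```python
def average_output(peak_voltage, pulse_width_s, period_s):
    """Average voltage of an ideal rectangular pulse train."""
    duty_cycle = pulse_width_s / period_s          # fraction of time 'on'
    if not 0 <= duty_cycle <= 1:
        raise ValueError("pulse width must lie between 0 and the period")
    return peak_voltage * duty_cycle

period = 50e-6                                     # one cycle at 20,000 Hz
half = average_output(12.0, 25e-6, period)         # pulse on half the time
short = average_output(12.0, 10e-6, period)        # shorter pulse
```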
2. Explain the term Resolution and how it applies to Monitors, Cameras, Printers,
Scanners etc.
...................................................................................................................................
...................................................................................................................................
..................................................................................................................................
..................................................................................................................................
4. Compare Laptops using passive matrix and TFT technology. Which are cheaper
in price?
...................................................................................................................................
...................................................................................................................................
...................................................................................................................................
...................................................................................................................................
...................................................................................................................................
5. To connect my modem to an ISP, I have to dial the Pulse phone number
26176661. What Hayes Command Set command would I give?
a) ATDT 1777626176661 b) ATDT 26176661
c) 8MB d) 16MB
c) MP3 d) OGG
4.13 SUMMARY
In this unit, we discussed various Input/Output devices. We have covered the input
devices Keyboard, Mouse and Scanner. Various types of Keyboards, Keyboard
layouts (QWERTY, Dvorak) and technologies have been discussed. Various types of
mice and their operation have been discussed. Different types of Scanners, the
underlying technology and use in applications like OCR have been discussed.
The output devices discussed are Monitor, LCD and Printer. The technologies and
specifications behind Monitors, LCD and Printers have been discussed. Colour
management has also been discussed. Video cards, which control the display on
monitors from the CPU and their system of display have been discussed with their
characteristics like depth, resolution and memory. Modem is a communication device
and thereby an I/O device. Its functioning has been discussed. The Power supply, and
especially, the SMPS, which is actually input of electric power for the computing unit,
has also been discussed.
1. The main merit of the Dvorak-Dealey keyboard is its scientific design using hand
alternation. However, since it came much later than QWERTY, it did not become
popular, as QWERTY was already well established.
2. Keyboard touch gives you a feedback mechanism. This tells you when you
have pressed a key enough and involuntarily allows faster typing. The
preferred touch is an individual choice but the best feedback is provided with an
overcenter feel with a `click' sound. The most suitable touch is given by
Rubber Dome keyboards. (refer text for details).
3. Besides the standard precautions while attaching Hardware, one has to take
precautions regarding interrupt conflicts for serial devices, since Serial ports
share their interrupts. (refer text for details).
Check Your Progress 2
1. A true-colour system has a depth of 24 bits per pixel. This means that 8 bits
each are assigned to R,G and B i.e. there are 8 Colour Planes. Hence, in figure-
4 replace ` n' by 8 to draw the new figure.
2. Framebuffer is another name for the Display Memory. This is like a time-slice
of what you see on your monitor. Discuss how framebuffer is handled
differently in early display systems, PCI, AGP and UMA. (refer text for
details).
3. Shadow Mask: Trinitron uses Aperture Grills instead of a Shadow Mask, for the
same purpose.
Dot Pitch: Similarly, instead of Dot Pitch, there is Slot Pitch.
Explain the terms Shadow Mask, Aperture Grill, Dot Pitch and Slot Pitch (refer
text).
4. Ans. (b) 1024 × 768 × 2 Bytes = 1,572,864 Bytes, i.e., about 1.6MB. RAM is/was available as 1MB, 4MB,
16MB etc.
5. Ans. (a) Total screen size = 12 × 9 = 108 square inches. Image size
= 1024 × 768 = 786432 pixels. Divide 108 inches by 786432.
1. In a digital camera, photos are stored in digital format. Instead of film, these
cameras use Semiconductor devices, called image sensors. There are many
other differences regarding quality, resolution etc.
2. Resolution is a generic term for the parameter that defines the possible sharpness or
clarity of something, i.e., how clearly that thing can be resolved. This applies
especially to images. See in what different ways it is used for Monitors,
Cameras, Printers, Scanners and even Mice.
3. It tells about physical mixing, optical mixing and RGB and CMYK schemes.
The technique of dithering is used for rich colour quality. Colours also differ on
monitors and printers. Maintaining similarity between them is also an important issue.
4. Compare Laptops made using passive matrix and TFT technology. Which are
cheaper in price?
In a Passive matrix arrangement, the LCD has a grid of horizontal and vertical
conductors. Each pixel is located at an intersection. When a current is received
by the pixel, it becomes dark. In Active Matrix, also called TFT (Thin
Film Transistor) technology, each pixel is active, working as a relay. Hence, it
needs less power and gives a better quality display. Passive matrix LCDs are
cheaper but now, TFT LCDs are also economically available. (find out the
latest from the market).
5. Ans. (d) ATDP 26176661.
UNIT 1 INSTRUCTION SET
ARCHITECTURE
Structure
1.0 Introduction
1.1 Objectives
1.2 Instruction Set Characteristics
1.3 Instruction Set Design Considerations
1.3.1 Operand Data Types
1.3.2 Types of Instructions
1.3.3 Number of Addresses in an Instruction
1.4 Addressing Schemes
1.4.1 Immediate Addressing
1.4.2 Direct Addressing
1.4.3 Indirect Addressing
1.4.4 Register Addressing
1.4.5 Register Indirect Addressing
1.4.6 Indexed Addressing Scheme
1.4.7 Base Register Addressing
1.4.8 Relative Addressing Scheme
1.4.9 Stack Addressing
1.5 Instruction Set and Format Design Issues
1.5.1 Instruction Length
1.5.2 Allocation of Bits Among Opcode and Operand
1.5.3 Variable Length of Instructions
1.6 Example of Instruction Format
1.7 Summary
1.8 Solutions/ Answers
1.0 INTRODUCTION
The Instruction Set Architecture (ISA) is the part of the processor that is visible to the
programmer or compiler designer. They are the parts of a processor design that need
to be understood in order to write assembly language, such as the machine language
instructions and registers. Parts of the architecture that are left to the implementation
are not part of ISA. The ISA serves as the boundary between software and hardware.
The term instruction will be used in this unit more often. What is an instruction?
What are its components? What are different types of instructions? What are the
various addressing schemes and their importance? This unit is an attempt to answer
these questions. In addition, the unit also discusses the design issues relating to
instruction format. We have presented here the instruction set of MIPS
(Microprocessor without Interlocked Pipeline Stages) processor (very briefly) as an
example.
The instruction sets of other related microprocessors can be studied from further readings. We
will also discuss the complete instruction set of the 8086 microprocessor in Unit 1,
Block 4 of this course.
1.1 OBJECTIVES
After going through this unit you should be able to:
The common goal of computer designers is to build the hardware for implementing
the machine’s instructions for CPU. From the programmer’s point of view, the user
must understand machine or assembly language for low-level programming.
Moreover, the user must be aware of the register set, instruction types and the function
that each instruction performs.
This unit covers both the viewpoints. However, our prime focus is the programmer’s
viewpoint with the design of instruction set. Now, let us define the instructions, parts
of instruction and so on.
Thus, each instruction consists of several fields. The most common fields found in
instruction formats are:
Opcode: (What operation to perform?)
Figure 1: A Hypothetical Instruction Format of 32 bits (field boundaries at bits 0-5, 6-7 and 8-31 of the instruction length)
In case of an immediate operand, the maximum size of the unsigned operand would be
2^24 (that is, values from 0 to 2^24 - 1).
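The field boundaries of Figure 1 can be illustrated by splitting a 32-bit word with shifts and masks. The sketch assumes that bit 0 is the leftmost (most significant) bit, that bits 0-5 hold the opcode, bits 6-7 the addressing mode and bits 8-31 the operand or address; these field meanings are assumptions inferred from the surrounding text, since the labels of the figure are not reproduced here.

```python
def decode(word):
    """Split a 32-bit instruction word into (opcode, mode, operand)."""
    opcode = (word >> 26) & 0x3F        # bits 0-5: 6 bits, up to 64 operations
    mode = (word >> 24) & 0x3           # bits 6-7: 2 bits, up to 4 modes
    operand = word & 0xFFFFFF           # bits 8-31: 24 bits
    return opcode, mode, operand

def encode(opcode, mode, operand):
    """Inverse of decode, for building test words."""
    return (opcode << 26) | (mode << 24) | operand

word = encode(opcode=5, mode=1, operand=1000)
fields = decode(word)                   # (5, 1, 1000)
largest_operand = 2 ** 24 - 1
```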
For this machine there may be two more possible addressing modes in addition to the
immediate and direct. However, let us not discuss addressing modes right now. They
will be discussed in detail in section 1.4 of this unit.
The opcode field of an instruction is a group of bits that define various processor
operations such as LOAD, STORE, ADD, and SHIFT to be performed on some data
stored in registers or memory.
The operand address field can be data, or can refer to data (that is, the address of data), or
can be a label, which may be the address of the instruction you want to execute next.
Such labels are commonly used in Subroutine call instructions. An operand address
can be:
However, later studies of program style found that many complex
instructions found in CISC are not used by programs. This led to the idea of making
a simple but faster computer, which could execute simple instructions much faster.
These computers have simple instructions, register addressing and more registers.
These are called Reduced Instruction Set Computers (RISC). We will study more
about RISC in Unit 5 of this Block.
3. The opcode field of an instruction specifies the address field of operand on which
data processing is to be performed.
4. The operands placed in processor registers are fetched faster than
operands placed in memory.
• A set of data types (e.g. integers, long integers, doubles, character strings etc.).
• A set of operations on those data types.
• A set of instruction formats. This includes issues like number of addresses,
instruction length etc.
• A set of techniques for addressing data in memory or in registers.
• The number of registers which can be referenced by an instruction and how
they are used.
We will discuss the above concepts in more detail in the subsequent sections.
• Numbers: All machine languages include numeric data types. Numeric data
usually use one of three representations:
  • Floating-point numbers: single precision (1 sign bit, 8 exponent bits, 23
  mantissa bits) and double precision (1 sign bit, 11 exponent bits, 52 mantissa
  bits).
  • Fixed point numbers (signed or unsigned).
  • Binary Coded Decimal Numbers.
• Logical data: Each word or byte is treated as a single unit of data. When an n-bit
data unit is considered as consisting of n 1-bit items of data with each item
having the value 0 or 1, then they are viewed as logical data. Such bit-oriented
data can be used to store an array of Boolean or binary data variables where each
variable can take on only the values 1 (true) and 0 (false). One simple application
of such data may be the cases where we manipulate bits of a data item. For
example, in floating-point addition we need to shift mantissa bits.
Types of Instructions
Logical: AND, OR, NOT, XOR operate on binary data stored in registers. For
example, if two registers contain the data:
R1 = 1011 0111
R2 = 1111 0000
Then,
R1 AND R2 = 1011 0000. Thus, the AND operation can be used as a mask that selects
certain bits in a word and zeros out the remaining bits. The XOR operation inverts
those bits of the R1 register where R2 contains 1; with one register set to all
1's, XOR inverts every bit of the other register.
R1 XOR R2 = 0100 0111
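The register example above can be reproduced with Python's bitwise operators. The variable names mirror the registers in the text; the 8-bit register width is an assumption taken from the 8-bit values shown.

```python
R1 = 0b1011_0111
R2 = 0b1111_0000
ALL_ONES = 0b1111_1111            # an 8-bit register set to all 1's

and_result = R1 & R2              # mask: keeps the upper four bits of R1
xor_result = R1 ^ R2              # inverts R1 wherever R2 holds a 1
not_r1 = R1 ^ ALL_ONES            # XOR with all 1's inverts every bit

print(format(and_result, "08b"))  # 10110000
print(format(xor_result, "08b"))  # 01000111
print(format(not_r1, "08b"))      # 01001000
```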
Shift: A shift operation moves the bits of a word either to the left or to the right. It can
be used to realize simple arithmetic operations, data communication/recognition, etc.
Shift operations are of three types:
1. Logical shifts: LOGICAL SHIFT LEFT and LOGICAL SHIFT RIGHT insert
a zero at the end bit position while the other bits of the word are shifted left or right
respectively. The end bit position is the leftmost bit for a shift right and the
rightmost bit for a shift left. The bit shifted out is lost.
The arithmetic left shift and the logical left shift, when performed on numbers
represented in two’s complement notation, cause multiplication by 2 when there is
no overflow. Arithmetic shift right corresponds to a division by 2 provided there
is no underflow.
3. Circular shifts: ROTATE LEFT and ROTATE RIGHT. Bits shifted out at one
end of the word are not lost as in a logical shift but are circulated back into
the other end.
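The shift types can be compared on an 8-bit word with the sketch below. The function names and the 8-bit width are assumptions made for illustration; each function shifts by exactly one position.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1           # 0xFF keeps results within 8 bits
SIGN = 1 << (WIDTH - 1)           # leftmost (sign) bit

def logical_left(x):
    return (x << 1) & MASK                        # zero enters on the right

def logical_right(x):
    return (x & MASK) >> 1                        # zero enters on the left

def arithmetic_right(x):
    return (x & SIGN) | ((x & MASK) >> 1)         # sign bit is preserved

def rotate_left(x):
    return ((x << 1) & MASK) | ((x >> (WIDTH - 1)) & 1)

def rotate_right(x):
    return ((x & MASK) >> 1) | ((x & 1) << (WIDTH - 1))

x = 0b1011_0111
results = [logical_left(x), logical_right(x), arithmetic_right(x),
           rotate_left(x), rotate_right(x)]
```

For this value the logical right shift gives 0101 1011 while the arithmetic right shift gives 1101 1011, because the latter keeps the sign bit of the two's complement number.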
These instructions specify conditions for altering the sequence of program execution
or in other words the content of PC (program counter) register. PC points to memory
location that holds the next instruction to be executed. The change in value of PC as a
result of execution of control instruction like BRANCH or JUMP causes a break in
the sequential execution of instructions. The most common control instructions are:
0FFF    MBR ← 0
1000    X ← 2001
1001    READ X
1002    BRZ 1007        (Conditional Branch)
1003    ADD MBR
1004    TRAS MBR
1005    INC X
1006    JUMP 1001       (Unconditional Branch)
1007    :
        :
2001    10
2002    20
2003    30
2004    0
1st Cycle:
1001 (with location X = 2001 which is value 10) → 1002 → 1003 →
1004 → 1005 (X is incremented to 2002) → 1006
2nd Cycle:
1001 (with X = 2002 which is 20) → 1002 → 1003 → 1004 → 1005 (X
is 2003) → 1006
3rd Cycle:
1001 (with X = 2003 which is 30) → 1002 → 1003 → 1004 → 1005 (X is
2004) → 1006
4th Cycle:
1001 (with X = 2004 which is 0) → 1002 [AC contains zero so take a
branch to 1007]
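The four cycles traced above can be reproduced by a small simulation. The meaning of each instruction is an assumption made for this sketch, since the listing does not define them: READ X is taken to load AC from the location held in X, ADD MBR to add MBR to AC, and TRAS MBR to copy AC into MBR. The addresses are treated as hexadecimal.

```python
def run_loop(memory, start_x):
    """Trace the loop at 1001-1006 and return (MBR, list of PC values)."""
    ac, mbr, x = 0, 0, start_x       # 0FFF: MBR <- 0 and 1000: X <- start_x
    pc, trace = 0x1001, []
    while True:
        trace.append(pc)
        if pc == 0x1001:             # READ X (assumed: AC <- M[X])
            ac = memory[x]
            pc += 1
        elif pc == 0x1002:           # BRZ 1007: branch only if AC is zero
            pc = 0x1007 if ac == 0 else pc + 1
        elif pc == 0x1003:           # ADD MBR (assumed: AC <- AC + MBR)
            ac += mbr
            pc += 1
        elif pc == 0x1004:           # TRAS MBR (assumed: MBR <- AC)
            mbr = ac
            pc += 1
        elif pc == 0x1005:           # INC X
            x += 1
            pc += 1
        elif pc == 0x1006:           # JUMP 1001: unconditional branch
            pc = 0x1001
        else:                        # 1007: the loop has been left
            return mbr, trace

data = {0x2001: 10, 0x2002: 20, 0x2003: 30, 0x2004: 0}
total, trace = run_loop(data, 0x2001)
```

Under these assumptions the loop runs three full cycles, accumulates 10 + 20 + 30 in MBR, and leaves through the conditional branch in the fourth cycle when the zero terminator is read.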
The SKIP instruction is a zero-address instruction and skips the next instruction
to be executed in sequence. In other words, it increments the value of PC by one
instruction length. The SKIP can also be conditional. For example, the instruction
ISZ skips the next instruction only if the result of the most recent operation is
zero.
CALL and RETN are used for CALLing subprograms and RETurning from
them. Assume that a memory stack has been built such that the stack pointer points to
a non-empty stack location and the stack expands towards address zero.
CALL:
CALL X Procedure Call to function/procedure named X
The CALL instruction causes the following to happen:
1. Decrement the stack pointer so that we will not overwrite the last thing put on
the stack,
(SP ← SP - 1)
2. The contents of PC, which is pointing to the NEXT instruction (the one just after the
CALL), are pushed onto the stack, M[SP] ← PC.
3. JMP to X: the address of the start of the subprogram is put in the PC register; this
is all a jump does. Thus, we go off to the subprogram, but we have to remember
where we were in the calling program, i.e. we must remember where we came
from, so that we can get back there again.
PC ← X
RETN:
RETN Return from procedure.
The RETN instruction causes the following to happen:
1. Pop the stack, to yield an address/label; if correctly used, the top of the
stack will contain the address of the next instruction after the call from
which we are returning; it is this instruction with which we want to resume
in the calling program;
2. Jump to the popped address, i.e., put the address into the PC register.
PC ← top of stack value; Increment SP.
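The register transfers for CALL and RETN can be followed step by step in a small model. The class and method names are hypothetical, memory is a plain Python list, and, as in the text, the stack grows towards address zero and PC is assumed to already point at the instruction after the CALL.

```python
class Machine:
    def __init__(self, memory_size=16):
        self.memory = [0] * memory_size
        self.sp = memory_size        # empty stack just past the top of memory
        self.pc = 0

    def call(self, target):
        self.sp -= 1                     # SP <- SP - 1
        self.memory[self.sp] = self.pc   # M[SP] <- PC (return address)
        self.pc = target                 # PC <- X

    def retn(self):
        self.pc = self.memory[self.sp]   # PC <- top of stack value
        self.sp += 1                     # increment SP

m = Machine()
m.pc = 0x101          # address of the instruction following the CALL
m.call(0x200)         # enter the subprogram at 0x200
saved = m.memory[m.sp]
m.retn()              # resume in the calling program
```

Because each CALL pushes a new return address, the same mechanism supports nested calls: the most recent return address is always the one on top of the stack.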
Computer instructions are commonly divided into two categories: privileged and non-privileged.
A process running in privileged mode can execute all instructions from the
instruction set while a process running in user mode can only execute a sub-set of the
instructions. I/O instructions are one example of privileged instructions; instructions
dealing with clock interrupts are another.
• Operand Storage in the CPU - Where are the operands kept other than the
memory?
• Number of explicitly named operands - How many operands are named in an
instruction?
• Operand location - Can any ALU instruction operand be located in memory? Or
must all operands be kept internally in the CPU registers?
• Operations - What operations are provided in the ISA?
• Type and size of operands - What is the type and size of each operand and how
is it specified?
As far as operations and types of operands are concerned, we have already discussed
these in the previous subsection. In this section let us look into some of the
architectures that are common in contemporary computers. But before we discuss the
architectures, let us look into some basic instruction set characteristics:
The three most common types of ISAs are:
1. Evaluation Stack: The operands are implicitly on top of the stack.
2. Accumulator: One operand is implicitly the accumulator.
3. General Purpose Register (GPR): All operands are explicit, either registers or
memory locations.
ADD // operator POP operand(s) and PUSH result(s) (implicit on top of stack)
POP C
• Small instructions (do not need many bits to specify the operation).
• Compiler is easy to write.
• Lots of memory accesses required - everything that is not on the stack is in
memory. Thus, the machine performance is poor.
• Registers can be used to store variables, which reduces memory traffic and speeds
up execution. It also improves code density, as register names are shorter than
memory addresses.
• Instructions must include bits to specify which register to operate on, hence
a larger instruction size than in accumulator type machines.
• Memory access can be minimized (registers can hold lots of intermediate
values).
• Implementation is complicated, as the compiler writer has to attempt to maximize
register usage.
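The evaluation-stack style, in which the operator finds its operands implicitly on top of the stack, can be sketched with a tiny interpreter. The program for C = A + B shown here (PUSH A, PUSH B, ADD, POP C) and the function name are assumptions made for illustration.

```python
def run_stack_program(program, memory):
    """Execute PUSH/ADD/POP instructions on an evaluation stack."""
    stack = []
    for op, *args in program:
        if op == "PUSH":                 # memory -> top of stack
            stack.append(memory[args[0]])
        elif op == "ADD":                # operands are implicit: pop two, push sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "POP":                # top of stack -> memory
            memory[args[0]] = stack.pop()
        else:
            raise ValueError("unknown instruction: " + op)
    return memory

program = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C")]
memory = run_stack_program(program, {"A": 2, "B": 3, "C": 0})
```

Note that ADD carries no address at all, which keeps the instruction small, while three of the four instructions access memory; this is the trade-off listed above for stack machines.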
While most early machines used stack or accumulator architectures, in the last 15
years all CPUs made have been GPR processors. The three major reasons are: registers
are faster than memory; the more data that can be kept internally in the CPU, the faster
the program will run; and registers are easier for a compiler to use.
But while CPUs with GPRs were clearly better than the previous stack and accumulator
based CPUs, they were still lacking in several areas. Instructions
were of varying length, from 1 byte to 6-8 bytes. This causes problems with the pre-fetching
and pipelining of instructions. ALU instructions could have operands that
were memory locations; because memory access is slow, the
whole instruction becomes slow.
Thus, in the early 1980s the idea of RISC was introduced. RISC stands for Reduced
Instruction Set Computer. Unlike CISC, this ISA uses fewer instructions with simple
constructs so they can be executed much faster within the CPU without having to use
memory as often. An early commercial RISC CPU, the MIPS R2000, has 32 GPRs. MIPS is
a load/store architecture, which means that only load and store instructions access
memory. All other computational instructions operate only on values stored in
registers.
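The contrast between a stack machine and a load/store GPR machine can be sketched
with two toy interpreters. This is a minimal illustration in Python, not a model of
any real ISA: the mnemonics, the tuple encoding of programs and the variable names
A, B, C and E are assumptions made for the example. Both programs evaluate
C = A + B followed by E = C + A, and each interpreter counts its data memory accesses.

```python
# Toy machines for illustration only; mnemonics and encodings are hypothetical.

def run_stack(program, memory):
    """Run a zero-address (stack) program; return the number of data memory accesses."""
    stack, accesses = [], 0
    for op, *args in program:
        if op == "PUSH":                      # memory -> top of stack
            stack.append(memory[args[0]])
            accesses += 1
        elif op == "POP":                     # top of stack -> memory
            memory[args[0]] = stack.pop()
            accesses += 1
        elif op == "ADD":                     # operands are implicit on top of the stack
            stack.append(stack.pop() + stack.pop())
        else:
            raise ValueError(f"unknown operation {op}")
    return accesses


def run_load_store(program, memory):
    """Run a load/store (GPR) program; return the number of data memory accesses."""
    regs, accesses = {}, 0
    for op, *args in program:
        if op == "LOAD":                      # LOAD Rd, address
            regs[args[0]] = memory[args[1]]
            accesses += 1
        elif op == "STORE":                   # STORE Rs, address
            memory[args[1]] = regs[args[0]]
            accesses += 1
        elif op == "ADD":                     # ADD Rd, Rs1, Rs2 touches registers only
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        else:
            raise ValueError(f"unknown operation {op}")
    return accesses


STACK_PROGRAM = [("PUSH", "A"), ("PUSH", "B"), ("ADD",), ("POP", "C"),
                 ("PUSH", "C"), ("PUSH", "A"), ("ADD",), ("POP", "E")]

LOAD_STORE_PROGRAM = [("LOAD", "R1", "A"), ("LOAD", "R2", "B"),
                      ("ADD", "R3", "R1", "R2"), ("STORE", "R3", "C"),
                      ("ADD", "R4", "R3", "R1"), ("STORE", "R4", "E")]
```

With A = 2 and B = 3, both programs leave C = 5 and E = 7. In this sketch the stack
version makes 6 data memory accesses and the load/store version makes 4, because
the values of A and C stay in registers between the two additions.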
Check Your Progress 2
1. Match the following pairs:
(a) Zero address instruction (i) Accumulator machines
(b) One address instruction (ii) General Purpose Register machine
(c) Three address instruction (iii) Evaluation-Stack machine
But, why addressing schemes? The question of addressing is concerned with how
operands are interpreted. In other words, the term ‘addressing schemes’ refers to the
mechanism employed for specifying operands. There are a multitude of addressing
schemes and instruction formats. The choice of available schemes affects not only
how easy it is to write the compiler, but also how efficient the architecture can
be.
All computers employ more than one addressing scheme to give programming
flexibility to the user by providing facilities such as pointers to memory, loop control,
indexing of data and program relocation, and to reduce the number of bits in the
operand field of the instruction. Offering a variety of addressing modes can help
reduce instruction counts, but having more modes also increases the complexity of the
machine and in turn may increase the average Cycles per Instruction (CPI). Before we
discuss the addressing modes let us discuss the notations being used in this section.
In the description that follows the symbols A, A1, A2, ... etc. denote the content of
an operand field. Thus, Ai may refer to data or to a memory address. In case the
operand field is a register address, then the symbols R, R1, R2, ... etc. are used. If C
denotes the contents (either of an operand field or a register or of a memory location),
then (C) denotes the content of the memory location whose address is C.
What is a virtual address? von Neumann had suggested that the execution of a
program is possible only if the program and data are residing in memory. In such a
situation the program length along with data and other space needed for execution
cannot exceed the total memory. However, it was found that at the time of execution,
the complete program and data are not needed, as most of the time only a few
areas of the program are being referenced. Keeping this in mind a new idea was put
forward where only the required portion is kept in the memory while the rest of the
program and data reside in secondary storage. The data or program portions which are
stored on secondary storage are brought to memory whenever needed and the portion
of memory which is not needed is returned to the secondary storage. Thus, a program
bigger than the actual physical memory can be executed on that machine. This is
called virtual memory. Virtual memory is discussed in greater detail as part of
the operating system.
• They are longer than the physical addresses, as the total addressed memory in virtual
memory is more than the actual physical memory.
• If a virtual addressed operand is not in the memory then the operating system
brings that operand to the memory.
The symbols D, D1, D2,..., etc. refer to actual operands to be used by instructions for
their execution.
Most of the machines employ a set of addressing modes. In this unit, we will describe
some very common addressing modes employed in most of the machines. A specific
addressing mode example, however, is given in Unit 1 of Block 4.
Addressing Modes
In general not all of the above modes are used for all applications. However, some of
the common areas where compilers of high-level languages use them are:
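[Figure: Immediate addressing. Instruction LOAD (I) 07, with fields Opcode (LOAD),
Addressing mode (I, immediate) and Operand value (07).]
[Figure: Direct addressing. Instruction LOAD D 500; in main memory, location 500
holds ……0111.]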
• This scheme provides a limited address space because if the address field has n
bits then the memory space would contain 2^n memory words or locations. For
example, for the example machine of Figure 1, the directly addressable memory
space would be 2^10.
• The effective address in this scheme is defined as the address of the operand,
that is,
EA = A (EA in the above example will be 500) and
D = (EA) (D in the above example will be 7)
The second statement implies that the data is stored in the memory location
specified by the effective address.
• In this addressing scheme only one memory reference is required to fetch the
operand.
[Figure: Indirect addressing. Instruction LOAD I 500; memory location 500 holds 50A
and memory location 50A holds …..0111.]
• In this addressing scheme the effective address EA and the contents of the
operand field are related as:
EA = (A) (content of location 500, that is, 50A above) and
D = (EA) (content of location 50A, that is, 7)
• The drawback of this scheme is that it requires two memory references to fetch
the actual operand. The first memory reference is to fetch the actual address of
the operand from the memory and the second to fetch the actual operand using
that address.
• In this scheme the word length determines the size of the addressable space, as the
actual address is stored in a word. For example, a memory having a word size
of 32 bits can have 2^32 indirect addresses.
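The two relations, EA = A for direct addressing and EA = (A) for indirect addressing,
can be sketched in a few lines of Python. This is only an illustration: memory is
modelled as a dictionary, the function names are hypothetical, and the addresses 500
and 50A from the figures are assumed here to be hexadecimal.

```python
def load_direct(memory, a):
    """Direct addressing: EA = A, D = (EA). One memory reference."""
    ea = a
    return ea, memory[ea]


def load_indirect(memory, a):
    """Indirect addressing: EA = (A), D = (EA). Two memory references."""
    ea = memory[a]          # first reference fetches the address of the operand
    return ea, memory[ea]   # second reference fetches the operand itself


# Memory contents taken from the two examples above (addresses assumed hexadecimal).
direct_memory = {0x500: 0b0111}
indirect_memory = {0x500: 0x50A, 0x50A: 0b0111}
```

Calling load_direct(direct_memory, 0x500) gives EA = 500 and D = 7, while
load_indirect(indirect_memory, 0x500) gives EA = 50A and D = 7.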
• Register access is faster than memory access and hence register addressing
results in faster instruction execution. However, registers obtain operands only
from memory; therefore, the operands that should be kept in registers must be
selected carefully and efficiently. For example, if an operand is moved into a
register, processed only once and then returned to memory, then no saving
occurs. However, if an operand is used repeatedly after being brought into a register
then we have saved a few memory references. Thus, the task of using registers
efficiently is the task of finding which operand values should be kept in
registers such that memory references are minimised. Normally, this task is
done by a compiler of a high level language while translating the program to
machine language. As a rule of thumb, the frequently used local variables are kept
in the registers.
• The size of a register address is smaller than a memory address. It reduces the
instruction size. For example, for a machine having 32 general purpose registers
only 5 bits are needed to address a register.
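The bit count quoted above follows from the fact that k bits can distinguish 2^k
items. A small helper makes the arithmetic explicit; the function name is
hypothetical and the 1 MB memory is an assumed comparison point, not part of the
example machine.

```python
def address_bits(n_locations):
    """Smallest number of bits that can give each of n_locations a distinct address."""
    if n_locations < 1:
        raise ValueError("need at least one location")
    return max(1, (n_locations - 1).bit_length())


register_field = address_bits(32)        # 32 general purpose registers -> 5 bits
memory_field = address_bits(2 ** 20)     # 1 MB of byte locations (assumed) -> 20 bits
```

The difference between the two fields is the saving in instruction size each time
an operand is named by register rather than by memory address.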
The address capability of register indirect addressing scheme is determined by the size
of the register.
For example, to address an element B[i] of an array B[1], B[2], ..., B[n], with each
element of the array stored in two consecutive locations and the starting address of
the array assumed to be 101, the operand field A in the instruction shall contain the
number 101 and the index register R will contain the value of the expression
(i - 1) × 2.
Thus, for the first element of the array the index register will contain 0. For addressing
the 5th element of the array, A = 101 whereas the index register will contain:
(5 - 1) × 2 = 8
Therefore, the address of the 5th element of array B is 101 + 8 = 109. The element
B[5] itself will be stored in locations 109 and 110. To address any other element of
the array, changing the content of the index register will suffice.
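The address calculation in this example can be written out as a short sketch. The
function name and parameter names are hypothetical; the numbers are those of the
array B above.

```python
def indexed_address(a, i, element_size=2):
    """Effective address of the i-th element (1-based): EA = A + (R)."""
    index_register = (i - 1) * element_size   # content of the index register R
    return a + index_register


first = indexed_address(101, 1)   # index register holds 0
fifth = indexed_address(101, 5)   # index register holds (5 - 1) * 2 = 8
```

Here first evaluates to 101 and fifth to 109, so B[5] occupies locations 109 and 110.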
As the index register is used for iterative applications, its value is
incremented or decremented after each reference to it. In several systems
this operation is performed automatically during the course of an instruction cycle.
This feature is known as auto-indexing. Auto-indexing can be auto-incrementing or
auto-decrementing. The choice of register to be used as an index register differs from
machine to machine. Some machines employ general-purpose registers for this
purpose while other machines may specify special purpose registers referred to as
index registers.
The contents of the base register may be changed in the privileged mode only. No user
is allowed to change the contents of the base register. The base-addressing scheme
provides protection of users from one another.
Example 1: What would be the effective address and operand value for the following
LOAD instructions:
The values are shown in the following table:
(iii) Indirect addressing requires fewer memory accesses than that of direct
addressing.
1.5 INSTRUCTION SET AND FORMAT DESIGN
ISSUES
Some of the basic issues of concern for instruction set design are:
Completeness: For an initial design, the primary concern is that the instruction set
should be complete, which means there is no missing functionality; that is, it should
include instructions for the basic operations that can be used for creating any possible
execution and control operation.
Orthogonality: The secondary concern is that the instructions be orthogonal, that is,
not unnecessarily redundant. For example, integer operations and floating point
operations are usually not considered redundant, but different addressing modes may
be. When there are more instructions than necessary, the CPU takes longer to
decode them.
An instruction format is used to define the layout of the bits allocated to these
elements of instructions. In addition, the instruction format explicitly or implicitly
indicates the addressing modes used for each operand in that instruction.
Basic Trade-off: Smaller instructions (less space) versus the desire for a more
powerful instruction repertoire.
However, although a 32-bit instruction occupies double the space of a 16-bit
instruction and so can be fetched at only half the rate, it is not doubly useful.
Some of the factors that are considered for selection of addressing bits:
• Number of addressing modes: The more explicit addressing modes there are, the
more bits are needed for mode selection. However, some machines have implicit
modes of addressing.
• Granularity of addressing: As far as memory references are concerned, granularity
implies whether an address is referencing a byte or a word at a time. This is more
relevant for machines which have words of 16 bits, 32 bits or more. Byte addressing
may be better for character manipulation, but it requires more bits in
an address. For example, if a memory of 4K words (1 word = 16 bits) is to be
addressed directly then it requires:
WORD Addressing = 4K words
= 2^12 words
⇒ 12 bits are required for word addressing.
An important aspect of these variable-length instructions is: “The CPU is not
aware of the length of the next instruction which is to be fetched”. This problem can
be handled if each instruction fetch is made equal to the size of the longest instruction.
Thus, sometimes multiple instructions can be fetched in a single fetch.
MIPS 2000
Let’s consider the instruction format of a MIPS computer. MIPS is an acronym for
Microprocessor without Interlocked Pipeline Stages. It is a microprocessor
architecture developed by MIPS Computer Systems Inc. The MIPS CPU family was one
of the most successful and flexible CPU designs throughout the 1990s. The MIPS CPU
has a five-stage pipeline to execute multiple instructions at the same time. This
brings in a new term: pipelining. Let us just introduce it here. A 5-stage pipeline
defines the 5 steps of execution of instructions that may be performed in an
overlapped fashion. The following diagram will elaborate this concept:
                stage
Instruction 1   1  2  3  4  5
Instruction 2      1  2  3  4  5
Instruction 3         1  2  3  4  5
Figure 15: Pipeline
• All the stages are independent and distinct, that is, the second stage execution of
Instruction 1 should not hinder Instruction 2.
• The overall efficiency of the system becomes better.
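The overlap shown in Figure 15 can be reproduced with a short Python sketch. It
assumes an ideal pipeline in which every stage takes exactly one clock cycle and no
instruction ever waits for another; the function names are hypothetical.

```python
def pipeline_schedule(n_instructions, n_stages=5):
    """Row i lists the clock cycle in which instruction i+1 occupies stages 1..n_stages."""
    return [[i + s + 1 for s in range(n_stages)] for i in range(n_instructions)]


def total_cycles(n_instructions, n_stages=5, pipelined=True):
    """Clock cycles needed to finish all instructions under the ideal assumptions."""
    if pipelined:
        return n_stages + n_instructions - 1   # fill the pipeline once, then one per cycle
    return n_instructions * n_stages           # each instruction runs start to finish alone


schedule = pipeline_schedule(3)
```

For three instructions, schedule is [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]],
so the last instruction completes in cycle 7 instead of cycle 15 without overlap.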
The early MIPS architectures were 32-bit implementations; later versions are 64-bit
implementations.
The first commercial MIPS CPU model, the R2000, whose instruction format is
discussed below, has thirty-two 32-bit registers and its instructions are 32 bits long.
op       rs       rt       rd       shamt    funct
6 bits   5 bits   5 bits   5 bits   5 bits   6 bits
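Packing these fields into one 32-bit word can be sketched as follows. The sketch
assumes a 6-bit funct field, so that the six field widths add up to 32 bits. The
function name is hypothetical, and the sample instruction add $t0, $s1, $s2
(op 0, rs 17, rt 18, rd 8, shamt 0, funct 32) is an example chosen here, not one
taken from this unit.

```python
R_TYPE_WIDTHS = (6, 5, 5, 5, 5, 6)   # op, rs, rt, rd, shamt, funct


def encode_r_type(op, rs, rt, rd, shamt, funct):
    """Pack the six R-format fields into a 32-bit instruction word."""
    word = 0
    for value, width in zip((op, rs, rt, rd, shamt, funct), R_TYPE_WIDTHS):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value   # append the field to the right of the word
    return word


add_word = encode_r_type(op=0, rs=17, rt=18, rd=8, shamt=0, funct=32)
```

Here add_word equals 0x02324020, which in binary is the field sequence
000000 10001 10010 01000 00000 100000.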
All MIPS instructions are of the same length, requiring different kinds of instruction
formats for different types of instructions.
Instruction Format
All MIPS instructions are of the same size and are 32 bits long. MIPS designers chose
to keep all instructions of the same length, thereby requiring different kinds of
instruction formats for different kinds of instructions. For example, R-type (register)
or R-format is used for arithmetic instructions (Figure 16). A second type of
instruction format is called I-type or I-format and is used by the data transfer
instructions.
Instruction format of I-type instructions is given below:
op rs rt address
6 bits 5 bits 5 bits 16 bits
The 16-bit address means a load word instruction can load any word within a region
of ±2^15 bytes of the address in the base register rs. Consider a load word
instruction given below:
The rt field specifies the destination register, which receives the result of the load.
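The I-format fields and the ±2^15 range can be illustrated with a small sketch. The
function names are hypothetical, and the sample instruction lw $t0, 32($s3)
(op 35, rs 19, rt 8, offset 32) is an assumed example, since the instruction
originally shown at this point is not reproduced in the text.

```python
def encode_i_type(op, rs, rt, offset):
    """Pack op (6 bits), rs (5), rt (5) and a signed 16-bit offset into 32 bits."""
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("offset must fit in a signed 16-bit field")
    # Negative offsets are stored in 16-bit two's complement form.
    return (op << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)


def effective_address(base_register_value, offset):
    """Address accessed by the load: content of rs plus the signed offset."""
    return base_register_value + offset


lw_word = encode_i_type(op=35, rs=19, rt=8, offset=32)
```

Here lw_word equals 0x8E680020. If the base register holds 1000, the word is loaded
from address 1032.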
(i) Instruction length should normally be equal to data bus length or multiple
of it.
(vi) Large number of operations can be provided in the instruction set, which
have variable-lengths of instructions.
1.7 SUMMARY
In this unit, we have explained various concepts relating to instructions. We have
discussed the significance of instruction set, various elements of an instruction,
instruction set design issues, different types of ISAs, various types of instructions and
various operations performed by the instructions, various addressing schemes. We
have also provided you the instruction format of MIPS machine. Block 4 Unit 1
contains a detailed instruction set of 8086 machine. You can refer to further reading
for instruction set of various machines.
3. (i) True.
(ii) False.
(iii) False.
(iv) False
Registers, Micro-
operations and
Introduction
Execution
UNIT 2 REGISTERS, MICRO-OPERATIONS
AND INSTRUCTION EXECUTION
Structure
2.0 Introduction
2.1 Objectives
2.2 Basic CPU Structure
2.3 Register Organization
2.3.1 Programmer Visible Registers
2.3.2 Status and Control Registers
2.4 General Registers in a Processor
2.5 Micro-operation Concepts
2.5.1 Register Transfer Micro-operations
2.5.2 Arithmetic Micro-operations
2.5.3 Logic Micro-operations
2.5.4 Shift Micro-operations
2.6 Instruction Execution and Micro-operations
2.7 Instruction Pipelining
2.8 Summary
2.9 Solutions/ Answers
2.0 INTRODUCTION
The main task performed by the CPU is the execution of instructions. In the previous
unit, we discussed the instruction set of a computer system. But one question
remained unanswered: how will these instructions be executed by the CPU?
The above question can be broken down into two simpler questions. These are:
What are the steps required for the execution of an instruction? How are these steps
performed by the CPU?
The answer to the first question lies in the fact that each instruction execution
consists of several steps. Together they constitute an instruction cycle. A micro-
operation is the smallest operation performed by the CPU. These operations put
together execute an instruction.
For answering the second question, we must have an understanding of the basic
structure of a computer. As discussed earlier, the CPU consists of an Arithmetic
Logic Unit, the control unit and operational registers. We will be discussing the
register organisation in this unit, whereas the arithmetic-logic unit and control unit
organisation are discussed in subsequent units.
In this unit we will first discuss the basic CPU structure and the register organisation
in general. This is followed by a discussion on micro-operations and their
implementation. The discussion on micro-operations will gradually lead us towards
the discussion of a very simple ALU structure. The detail of ALU structure is the
topic of the next unit.
2.1 OBJECTIVES
After going through this unit, you should be able to:
2. An arithmetic and logic unit (ALU) for performing data manipulation, and
3. A control unit that coordinates and controls the various operations and initiates
the appropriate sequence of micro-operations for each task.
Computer instructions are normally stored in consecutive memory locations and are
executed in sequence one by one. The control unit reads an instruction
from a specific address in memory and executes it with the help of the ALU and
the registers.
2. It is decoded by the control unit and converted into a set of lower level control
signals, which cause the functions specified by that instruction to be executed.
3. After the completion of execution of the current instruction, the next instruction
fetched is the next instruction in sequence.
This process is repeated for every instruction except for program control instructions,
like branch, jump or exception instructions. In this case the next instruction to be
fetched from memory is taken from the part of memory specified by the instruction,
rather than being the next instruction in sequence.
But how do the registers help in instruction execution? We will discuss this with the
help of Figure 1.
Step 1:
The first step of instruction execution is to fetch the instruction that is to be executed.
To do so we require:
In Step 2:
• Get the data of memory location B into the data buffer register (DR) using
the address buffer register (MAR) by issuing a memory read operation.
• This data may be stored in a general purpose register, if so needed, let us say
R2.
• Now, the ALU will perform the addition of R1 and R2 under the command of
the control unit and the result will be put back in R1. The status of the ALU
operation, for example result zero/non-zero, overflow/no overflow etc., is
recorded in the status register.
• Similarly, the other instructions are fetched and executed using the ALU and
registers under the control of the Control Unit.
Thus, for describing instruction execution, we must describe the register layout,
micro-operations, ALU design and finally the control unit organization. We will
discuss registers and micro-operations in this unit. The ALU and Control Unit are
described in Unit 3 and Unit 4 of this Block.
• All von Neumann machines have a program counter (PC) (or instruction counter
IC), which is a register that contains the address of the next instruction to be
executed.
• Most computers use special registers to hold the instruction(s) currently being
executed. They are called instruction registers (IR).
• There are a number of general-purpose registers. With these three kinds of
registers, a computer would be able to execute programs.
• Other types of registers:
A few factors to consider when choosing the number of registers in a CPU are:
• The CPU can access registers faster than it can access main memory.
• For addressing a register, only a few address bits are needed in an instruction,
depending on the number of addressable registers. These address bits are definitely
far fewer than those of a memory address. For example, for addressing 256
registers you need just 8 bits, whereas a common memory size of 1 MB
requires 20 address bits, a difference of 60%.
• Compilers tend to use a small number of registers because large numbers of
registers are very difficult to use effectively. A generally good number of registers
is 32 in a general machine.
• Registers are more expensive than memory but far fewer in number.
From a user’s point of view the register set can be classified under two basic
categories.
Status and Control Registers: These registers cannot be used by the programmers
but are used to control the CPU or the execution of a program.
Different vendors have used some of these registers interchangeably; therefore, you
should not stick to these definitions rigidly. Yet this categorization will help in better
understanding of the register sets of a machine. Therefore, let us discuss more about
these categories.
The general-purpose registers as the name suggests can be used for various functions.
For example, they may contain operands or can be used for calculation of address of
operand etc. However, in order to simplify the task of programmers and computers
dedicated registers can be used. For example, registers may be dedicated to floating
point operations. One such common dedication may be the data and address registers.
The data registers are used only for storing intermediate results or data and not for
operand address calculation.
One of the basic issues with register design is the number of general-purpose registers
or data and address registers to be provided in a microprocessor. The number of
registers also affects the instruction design, as the number of registers determines the
number of bits needed in an instruction to specify a register reference. In general, it
has been found that the optimum number of registers in a CPU is in the range 16 to
32. If the number of registers falls below this range then more memory references per
instruction will be needed on average, as some of the intermediate results then have
to be stored in the memory. On the other hand, if the number of registers goes above
32, then there is no appreciable reduction in memory references. However, in some
computers hundreds of registers are used. These systems have special characteristics
and are called Reduced Instruction Set Computers (RISC). RISC computers are
discussed later in this unit.
Why is it important to have fewer memory references? As the time required for a
memory reference is more than that for a register reference, an increased
number of memory references results in slower execution of a program.
For control of various operations several registers are used. These registers cannot be
used in data manipulation; however, the content of some of these registers can be
used by the programmer. One of the control registers for a von-Neumann machine is
the Program Counter (PC).
Almost all the CPUs, as discussed earlier, have a status register, a part of which may
be programmer visible. A register which may be formed by condition codes is called
condition code register. Some of the commonly used flags or condition codes in such
a register may be:
Flag                  Comments
Sign flag             This indicates whether the sign of the previous arithmetic
                      operation was positive (0) or negative (1).
Zero flag             This flag bit will be set if the result of the last arithmetic
                      operation was zero.
Carry flag            This flag is set if a carry results from the addition of the
                      highest order bits or a borrow is taken on subtraction of the
                      highest order bit.
Equal flag            This flag bit will be set if a logic comparison operation finds
                      out that both of its operands are equal.
Overflow flag         This flag is used to indicate the condition of arithmetic
                      overflow.
Interrupt enable/     This flag is used for enabling or disabling interrupts.
disable flag
Supervisor flag       This flag is used in certain computers to determine whether
                      the CPU is executing in supervisor or user mode. In case the
                      CPU is in supervisor mode it will be allowed to execute certain
                      privileged instructions.
These flags are set by the CPU hardware while performing an operation. For
example, an addition operation may set the overflow flag or on a division by 0 the
overflow flag can be set etc. These codes may be tested by a program for a typical
conditional branch operation. The condition codes are collected in one or more
registers. RISC machines have several sets of conditional code bits. In these
machines an instruction specifies the set of condition codes which is to be used.
Independent sets of condition code enable the provisions of having parallelism within
the instruction execution unit.
The flag register is often known as Program Status Word (PSW). It contains
condition code plus other status information. There can be several other status and
control registers such as interrupt vector register in the machines using vectored
interrupt, stack pointer if a stack is used to implement subroutine calls, etc.
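How the hardware might derive some of these condition codes from an addition can be
sketched in Python. This is an illustration only: the 8-bit word size, the function
name and the dictionary of flags are assumptions, and only the sign, zero, carry and
overflow flags are modelled.

```python
def add_with_flags(a, b, bits=8):
    """Add two unsigned bit patterns and return (result, flags) as hardware would see them."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    flags = {
        "sign": 1 if result & sign_bit else 0,    # highest order bit of the result
        "zero": 1 if result == 0 else 0,
        "carry": 1 if total > mask else 0,        # carry out of the highest order bit
        # Overflow: both operands have the same sign but the result's sign differs.
        "overflow": 1 if (a ^ result) & (b ^ result) & sign_bit else 0,
    }
    return result, flags
```

For example, adding 0x7F and 0x01 gives 0x80 with the sign and overflow flags set,
while adding 0xFF and 0x01 gives 0x00 with the zero and carry flags set. A program
could then test these flags in a conditional branch.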
2. A machine has 20 general-purpose registers. How many bits will be needed for
register address of this machine?
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
3. What is the advantage of having independent set of conditional codes?
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
4. Can we store status and control information in the memory?
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
Let us now look into an example register set of MIPS processor.
MIPS register names begin with a $. There are two naming conventions:
• By number:
$0 $1 $2 … $31
Not all of these are general-purpose registers. The following table describes how each
general register is treated, and the actions you can take with each register.
Register number   Name   Description   Specify in Expression
You will also study another 8086 based register organization in Block 4 of this
course. So, all the computers have a number of registers. But, how exactly is the
instruction execution related to registers? To explore this concept, let us first discuss
the concept of Micro-operations.
• Move data from memory location “sum” to register R1 (LOAD R1, sum)
• Add an immediate operand to register (R1) and store the results in R1
(ADD R1, 7)
• Store data from register R1 to memory location “sum” (STORE sum, R1).
Thus, several machine instructions may be needed (this will vary from machine to
machine) to execute a simple C statement. But, how will each of these machine
statements be executed with the help of micro-operations? Let us try to elaborate the
execution steps:
Thus, we may have to execute the instruction in several steps. For the subsequent
discussion, for simplicity, let us assume that each micro-operation can be completed
in one clock period, although some micro-operations require memory read/write that
may take more time.
Let us first discuss the type of micro-operations. The most common micro-operations
performed in a digital computer can be classified into four categories:
1) Register transfer micro-operations: simply transfer binary information from one
register to another.
2) Arithmetic micro-operations: perform simple arithmetic operations on numeric
data stored in registers.
3) Logic micro-operations: perform bit manipulation (logic) operations on non-
numeric data stored in registers.
4) Shift micro-operations: perform shift operations on data stored in
registers.
• For a register transfer micro-operation there must be a path for data transfer from
the output of the source register to the input of the destination register.
• In addition, the destination register should have a parallel load capability, as we
expect the register transfer to occur in a predetermined control condition. We
will discuss more about the control unit in Unit 4 of this block.
• A common path for connecting various registers is through a common internal
data bus of the processor. In general the size of this data bus should be equal to
the number of bits in a general register.
2. The individual bits within a register are numbered from 0 (rightmost bit) to n-1
(leftmost bit) as shown in Figure 2b). Common ways of drawing the block
diagram of a computer register are shown below. The name of the 16-bit register
is IR (Instruction Register) which is partitioned into two subfields in Figure 2d).
Bits 0 through 7 are assigned the symbol L (for Low byte) and bits 8 through 15
are assigned the symbol H (for high byte). The symbol IR (L) refers to the low-
order byte and IR (H) refers to high-order byte.
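[Figure 2: Block diagrams of a register. a) Register R0; b) Individual bits, numbered
15 14 13 … 2 1 0; c) Register R1 with bits numbered 15 to 0; d) Register IR divided
into IR (H), bits 15 to 8, and IR (L), bits 7 to 0.]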
denotes a transfer of all bits from the source register R1 to the destination
register R2 during one clock pulse and the destination register has a parallel load
capacity. However, the contents of register R1 remain unchanged after the
register transfer micro-operation. More than one transfer can be shown using a
comma operator.
4. If the transfer is to occur only under a predetermined control condition, then this
condition can be specified as a control function. For example, if P is a control
function then P is a Boolean variable that can have a value of 0 or 1. It is
terminated by a colon (:) and placed in front of the actual transfer statement. The
operation specified in the statement takes place only when P = 1. Consider the
statements:
If (P = 1) then (R2 ← R1)
or,
P: R2 ← R1
5. All micro-operations written on a single line are to be executed at the same time
provided the statements or a group of statements to be implemented together are
free of conflict. A conflict occurs if two different contents are being transferred
to a single register at the same time. For example, the statement
X: R1 ← R2, R1 ← R3
represents a conflict because both R2 and R3 are trying to
transfer their contents to R1 at the same time.
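The rules for control functions and conflicts can be captured in a small simulation.
This is a sketch under stated assumptions: registers are held in a Python
dictionary, each transfer is written as a (destination, source) pair, and the
function name is hypothetical.

```python
def execute_transfers(registers, transfers, control=1):
    """Apply all transfers of one statement at the same time if the control function is 1.

    transfers is a list of (destination, source) pairs, each meaning destination <- source.
    """
    if control != 1:
        return registers                      # P = 0: the statement does nothing
    destinations = [dest for dest, _ in transfers]
    if len(destinations) != len(set(destinations)):
        raise ValueError("conflict: two transfers target the same register")
    # Read every source before writing, so all transfers see the old register contents.
    new_values = {dest: registers[src] for dest, src in transfers}
    registers.update(new_values)
    return registers
```

With registers R1 = 1, R2 = 2 and R3 = 3, the statement P: R2 ← R1 leaves R2 unchanged
when P = 0 and copies 1 into R2 when P = 1, while X: R1 ← R2, R1 ← R3 is rejected as a
conflict.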
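[Figure: Timing diagram for the register transfer, showing the Clock signal with
rising edges at t and t+1, and the Load signal applied to R2.]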
It is assumed that the control variable is synchronized with the same clock as the one
applied to the register. The control function T is activated by the rising edge of the
clock pulse at time t. Even though the control variable T becomes active just after
time t, the actual transfer does not occur until the register is triggered by the next
positive transition of the clock at time t+1. At time t+1, the load input is still active
and the data inputs of R2 are then loaded into the register in parallel. The transfer
occurs with every clock pulse transition while T remains active.
(consists of a group of wires) one for each bit of a register, over which information is
transferred, from any of several sources to any of several destinations.
R1 ← BUS,
The content of the selected register is placed on the BUS, and the content of the bus
is loaded into register R1 by activating its load control input.
Memory Transfer
The transfer of information from memory to outside world i.e., I/O Interface is called
a read operation. The transfer of new information to be stored in memory is called a
write operation. These kinds of transfers are achieved via a system bus. It is
necessary to supply the address of the memory location for memory transfer
operations.
Memory Read
The memory unit receives the address from a register, called the memory address
register designated by MAR. The data is transferred to another register, called the
data register designated by DR. The read operation can be stated as:
Read: DR ← [MAR]
Memory Write
The memory write operation transfers the content of a data register to a memory word
M selected by the address. Assume that the data of register R1 is to be written to the
memory at the address provided in MAR. The write operation can be stated as:
Write: [MAR] ← R1
Please note that this writes to the memory location pointed to by MAR, and not to MAR itself.
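The memory read and memory write transfers described above can be sketched in a few lines of Python. This is only an illustrative model: the class name MemoryUnit, the list called words and the memory size of 16 are assumptions made for the example, not part of the unit.

```python
# Hypothetical model: memory is a list of words; MAR and DR are plain integers.
class MemoryUnit:
    def __init__(self, size):
        self.words = [0] * size   # memory words, all initially 0
        self.MAR = 0              # memory address register
        self.DR = 0               # data register

    def read(self):
        # Memory read: DR receives the word addressed by MAR
        self.DR = self.words[self.MAR]

    def write(self, r1):
        # Memory write: the word addressed by MAR receives R1 (MAR itself is unchanged)
        self.words[self.MAR] = r1


mem = MemoryUnit(16)
mem.MAR = 5
mem.write(42)      # memory word 5 now holds 42
mem.read()         # DR now holds the word at address 5
print(mem.MAR, mem.DR)
```

After the write, MAR still holds the address 5; only the addressed memory word has changed.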
[Figure: memory unit with Read and Write control inputs, addressed by MAR and exchanging data with DR]
It means that the contents of register R1 are added to the contents of register R2 and
the sum is transferred to register R3. This operation requires three registers to hold
data along with the Binary Adder circuit in the ALU. Binary adder is a digital circuit
that generates the arithmetic sum of two binary numbers of any lengths and is
constructed with full-adder circuits connected in cascade. An n-bit binary adder
requires n full-adders. Add micro-operation, in accumulator machine, can be
performed as:
AC ← AC + DR
R3 ← R1 – R2
R3 ← R1 + (2’s complement of R2)
R3 ← R1 + (1’s complement of R2 + 1)
R3 ← R1 + R̄2 + 1 (The bar on top of R2 implies the 1’s complement of R2, which
is its bitwise complement)
Adding 1 to the 1’s complement produces the 2’s complement. Adding the contents
of R1 to the 2’s complement of R2 is equivalent to subtracting the contents of R2
from R1 and storing the result in R3. We will describe the basic circuit required for
these micro-operations in the next unit.
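As a quick check of this equivalence, the following Python sketch performs subtraction using only a bitwise complement and an addition. The function name subtract and the 4-bit register width are assumptions chosen for illustration.

```python
def subtract(r1, r2, n_bits=4):
    """Return (r1 - r2) modulo 2**n_bits using only complement and add."""
    mask = (1 << n_bits) - 1
    ones_complement = ~r2 & mask               # bitwise complement of R2
    return (r1 + ones_complement + 1) & mask   # carry out of the top bit is dropped


r3 = subtract(0b0111, 0b0110)
print(format(r3, "04b"))
```

Masking with n_bits plays the role of the fixed register width: the carry out of the most significant bit is simply discarded.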
What about the multiply and divide operations? Are they not micro-operations? In
most of the older computers, multiplication and division were implemented using
add/subtract and shift micro-operations. If a digital system implements division
and multiplication by means of combinational circuits, then we can call these
micro-operations for that system.
Some of the common logic micro-operations are AND, OR, NOT or Complement,
Exclusive OR, NOR, and NAND. In many computers only four: AND, OR, XOR
(exclusive OR) and complement micro-operations are implemented.
Let us now discuss how these four micro-operations can be used in implementing
some of the important applications of manipulation of bits of a word, such as,
changing some bit values or deleting a group of bits. We are assuming that the result
of logic micro-operations goes back to register R1 and that R2 contains the second
operand.
We will play a trick with the manipulations we are performing. Let us select 1010 as
4 bit data for register R1, and 1100 data for register R2. Why? Because if you see the
bit combinations of R2, and R1, they represent the truth table entries (read from right
to left and bottom to top) 00, 01, 10 and 11. Thus, the resultant of the logical
operation on them will indicate which logic micro-operation is needed to be
performed for that data manipulation. The following table gives details on some of
these operations:
R1 1 0 1 0
R2 1 1 0 0
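The trick with the values 1010 and 1100 can be tried out directly. The sketch below is an illustration only (the dictionary name results and the 4-bit mask are assumptions); it applies the four commonly implemented logic micro-operations to these two values, so that each result reads as the truth table of the corresponding operation.

```python
R1, R2 = 0b1010, 0b1100
MASK = 0b1111   # keep results to 4 bits

results = {
    "AND": R1 & R2,
    "OR": R1 | R2,
    "XOR": R1 ^ R2,
    "NOT R1": ~R1 & MASK,
}
for name, value in results.items():
    print(f"{name:7s}{value:04b}")
```

Each bit position pairs one bit of R1 with one bit of R2, and the four positions together cover all four input combinations.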
Step 2: Insert the bits using the OR micro-operation with the bits which are to be inserted.
    Replacing the leftmost 0011:
        0011 1011   (R1 before)
        0000 1111   (R2 for masking)
        Perform AND operation (mask)
        0000 1011   (R1 after)
    Now insert:
        0110 0000   (R2 for insertion)
        Perform OR operation
        0110 1011   (R1 after insert)
Clear: Clear all the bits.
        R1 = 1101
        R2 = 1101
        Result = 0000
    Implemented by taking exclusive OR with the same number. The exclusive OR, thus,
    can also be used for checking whether two numbers are equal or not.
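The insert and clear operations above can be reproduced with Python's bitwise operators. This is a minimal sketch; the function name insert_bits is an assumption made for the example.

```python
def insert_bits(r1, mask, new_bits):
    """Mask out the old field with AND, then OR in the new bits."""
    masked = r1 & mask        # Step 1: clear the field to be replaced
    return masked | new_bits  # Step 2: insert the new bits


r1 = insert_bits(0b00111011, 0b00001111, 0b01100000)
print(format(r1, "08b"))      # R1 after insert

cleared = 0b1101 ^ 0b1101     # XOR of a value with itself clears every bit
print(format(cleared, "04b"))
```

The same XOR gives an equality test: the result is zero exactly when the two operands are equal.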
• logical
• arithmetic and
• circular.
In logical shift the data entering by serial input to left most or right most flip-flop
(depending on right or left shift operations respectively) is a 0.
If we connect the serial output of a shift register to its serial input then we encounter a
circular shift. In circular shift left or circular shift right information is not lost, but is
circulated.
In an arithmetic shift, a signed binary number is shifted to the left or to the right. An
arithmetic shift-left multiplies the number by 2, while an arithmetic shift-right divides
it by 2. Since multiplication or division by 2 should not change the sign of a number,
an arithmetic shift must leave the sign bit
unchanged. Shift operations have already been discussed in Unit 1.
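The three kinds of shift can be compared on a 4-bit register with the Python sketch below. The register width N, the function names and the restriction to values between 0 and 15 are assumptions of this illustration; only the right-hand arithmetic shift is shown for the arithmetic case.

```python
N = 4                      # assumed register width in bits
MASK = (1 << N) - 1


def logical_shift_left(x):
    return (x << 1) & MASK                      # a 0 enters on the right


def logical_shift_right(x):
    return x >> 1                               # a 0 enters on the left


def circular_shift_left(x):
    return ((x << 1) | (x >> (N - 1))) & MASK   # leftmost bit re-enters on the right


def circular_shift_right(x):
    return (x >> 1) | ((x & 1) << (N - 1))      # rightmost bit re-enters on the left


def arithmetic_shift_right(x):
    sign = x & (1 << (N - 1))                   # the sign bit is left unchanged
    return (x >> 1) | sign


print(format(logical_shift_left(0b1011), "04b"))
print(format(circular_shift_left(0b1011), "04b"))
print(format(arithmetic_shift_right(0b1100), "04b"))
```

In 2's complement, 1100 is -4 and the arithmetic shift-right gives 1110, which is -2, so the sign is preserved while the value is halved.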
• Logical
• Arithmetic
• Circular
4. What are the differences between circular and logical shift micro-operations?
……………………………………………………………………………………
……………………………………………………………………………………
……………………………………………………………………………………
………
Instruction fetch: In this phase the instruction is brought from the address pointed
by PC to instruction register. The steps required are:
It may take more than one clock pulse, depending on
tcpu and tmem.)
The PC is incremented by one memory word length to point to the next instruction in sequence.
This micro-operation can be carried out in parallel with
the micro-operation above.
The instruction so obtained is transferred from the data register to the
Instruction register for further processing. (Register Transfer)
IR ← DR
Instruction Decode: This phase is performed under the control of the Control Unit of
the computer. The Control Unit determines the operation that is to be performed and
the addressing mode of the data. In our example, the addressing modes can be direct
or indirect.
Thus, the address portion of IR now contains the effective address, which is the direct
address of the operand.
Execution: Now the instruction is ready for execution. A different opcode will
require different sequence of steps for the execution. Therefore, let us discuss a few
examples of execution of some simple instructions for the purpose of identifying
some of the steps needed during instruction execution. Let us start the discussions
with a simple case of addition instruction. Suppose, we have an instruction: Add R1,
A which adds the content of memory location A to R1 register storing the result in
R1. This instruction will be executed in the following steps:
Now, let us try a complex instruction - a conditional jump instruction. Suppose an
instruction:
INCSKIP A
increments A and skips the next instruction if the content of A has become zero. This
is a complex instruction and requires intermediate decision-making. The micro
operations required for this instruction execution are:
operation. (Memory write)
Increment the PC, as it contains the first location of the subroutine, which is used to
store the return address. The first instruction of the subroutine starts from the next
location. (Increment)
PC ← PC + 1
Thus, the number of steps required in execution may differ from instruction to
instruction.
After completing the above interrupt processing, the CPU will fetch the next instruction,
which may be an instruction of the interrupt service program. Thus, during this time the CPU
might be doing the interrupt processing or executing the user program. Please note that each
instruction of the interrupt service program is executed as an instruction in an instruction
cycle.
Please note for a complex machine the instruction cycle will not be as easy as this.
You can refer to further readings for more complex instruction cycles.
Time Slot ->    1   2   3   4   5   6   7   8   9   10  11
Instruction 1   IF  ID  OF  EX  SR
Instruction 2       IF  ID  OF  EX  SR
Instruction 3           IF  ID  OF  EX  SR
Instruction 4               IF  ID  OF  EX  SR
Instruction 5                   IF  ID  OF  EX  SR
Instruction 6                       IF  ID  OF  EX  SR
Instruction 7                           IF  ID  OF  EX  SR
• The pipeline stages are like steps. Thus, a step of the pipeline is to be completed
in a time slot. The size of the time slot will be governed by the stage taking the
maximum time. Thus, if the time taken in the various stages is almost the same, we
get the best results.
• The execution of the first instruction is completed at the end of the 5th time slot;
afterwards, in each time slot the next instruction is completed. So, in ideal
conditions one instruction is executed in the pipeline in each time slot.
• Please note that from the 5th time slot onwards the pipe is full. In the 5th
time slot the stages of execution of five instructions are:
SR (instruction 1) (Requires memory reference)
EX (instruction 2) (No memory reference)
OF (instruction 3) (Requires memory reference)
ID (instruction 4) (No memory reference)
IF (instruction 5) (Requires memory reference)
• From the 5th time slot onwards, there may be a register or memory conflict among the
instructions that are performing memory and register references; that is, various
stages may refer to the same registers or memory locations. This will result in slower
execution of the instruction pipeline: one of the higher-numbered instructions has to
wait till the lower-numbered instructions are completed, effectively pushing the whole
pipeline back by one time slot.
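The ideal behaviour described in the points above (no conflicts and no branch penalty) can be tabulated with a short Python sketch. The names pipeline_schedule and total_time_slots are assumptions made for this illustration.

```python
STAGES = ["IF", "ID", "OF", "EX", "SR"]


def pipeline_schedule(n_instructions):
    """Return {instruction: {time_slot: stage}} for an ideal pipeline with no stalls."""
    schedule = {}
    for i in range(1, n_instructions + 1):
        # Instruction i enters the pipeline in time slot i
        schedule[i] = {i + s: stage for s, stage in enumerate(STAGES)}
    return schedule


def total_time_slots(n_instructions):
    # The first instruction needs one slot per stage; each later one adds one slot
    return len(STAGES) + n_instructions - 1


sched = pipeline_schedule(7)
in_slot_5 = {i: stages[5] for i, stages in sched.items() if 5 in stages}
print(in_slot_5)
print(total_time_slots(7))
```

For seven instructions and five stages this gives 11 time slots, against 35 if each instruction had to finish before the next one started.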
How can we minimize the problems occurring due to the branch instructions?
We can use many mechanisms that may minimize the effect of branch penalty.
3) Interrupt cycle results only in jumping to an interrupt service routine. The actual
processing of the instructions of this routine is performed in instruction cycle.
2.8 SUMMARY
In this unit, we have discussed in detail the register organisation and a simple
structure of the CPU. After this we have discussed in detail the micro-operations and
their implementation in hardware using simple logical circuits. While discussing
micro-operations our main emphasis was on simple arithmetic, logic and shift
micro-operations, in addition to register transfer and memory transfer. The knowledge you
have acquired about register sets and conditional codes gives an idea of how
conditional micro-operations can be implemented by simply checking flags and
conditional codes. This idea will be clearer after we go through Unit 3 and Unit 4.
We have completed the discussion in this unit by providing a simple approach to
instruction execution with micro-operations. We have also defined the concept of the
Instruction Pipeline. We will be using this approach for discussing control unit details
in Unit 3 and Unit 4. The following table gives the details of various terms used in
this unit.
General purpose registers These registers are used for any address
or data computation / storage
Status and control register Stores the various condition codes
You will also get the details on 8086 microprocessor register sets, conditional codes,
instructions etc. in Unit 1 of Block 4.
You can refer to further readings for more register organisation examples and for
more details on micro-operations and instruction execution.
1. Registers, which are used only for the calculation of operand addresses, are
called address registers.
2. 5 bits
3. It helps in implementing parallelism in the instruction execution unit.
4. Yes. Normally, the first few hundreds of words of memory are allocated for
storing control information.
4. The bits circulate and after a complete cycle the data is still intact in circular
shift. Not so in logical shift.
ALU Organisation
3.0 INTRODUCTION
By now we have discussed the instruction sets and register organisation followed by a
discussion on micro-operations and instruction execution. In this unit, we will first
discuss the ALU organisation. Then we will discuss the floating point ALU and
arithmetic co-processors, which are commonly used for floating point computations.
3.1 OBJECTIVES
After going through this unit, you will be able to:
[Figure: ALU structure showing the bus, the parallel adder and other logic circuits, the control unit, the flags and the control signals]
The above structure has three registers AC, MQ and DR for data storage. Let us
assume that they are equal to one word each. Please note that the Parallel adders and
other logic circuits (these are the arithmetic, logic circuits) have two inputs and only
one output in this diagram. It implies that any ALU operation at most can have two
input values and will generate single output along with the other status bits. In the
present case the two inputs are AC and DR registers, while output is AC register. AC
and MQ registers are generally used as a single AC.MQ register. This register is
capable of left or right shift operations. Some of the micro-operations that can be
defined on this ALU are:
Addition : AC ← AC + DR
Subtraction : AC ← AC – DR
AND : AC ← AC ∧ DR
OR : AC ← AC ∨ DR
Exclusive OR : AC ← AC ⊕ DR
NOT : AC ← A̅C̅ (the bar denotes the complement of AC)
In this ALU organisation multiplication and division were implemented using shift-
add/subtract operations. The MQ (Multiplier-Quotient register) is a special register
used for implementation of multiplication and division. We are not giving the details
of how this register can be used for implementing multiplication and division
algorithms. For more details on these algorithms please refer to further readings. One
such algorithm is Booth’s algorithm and you must refer to it in further readings.
DR is another important register, which is used for storing second operand. In fact it
acts as a buffer register, which stores the data brought from the memory for an
instruction. In machines where we have general purpose registers any of the registers
can be utilized as AC, MQ and DR.
The basic advantage of such ALUs is that these ALUs can be constructed for a
desired word size. More details on bit-slice ALUs can be obtained from further
readings.
A digital computer has many registers, and rather than connecting wires between all
registers to transfer information between them, a common bus is used. Bus is a path
(consists of a group of wires) one for each bit of a register, over which information is
transferred, from any of several sources to any of several destinations. In general the
size of this data bus should be equal to the number of bits in a general purpose
register.
A register is selected for the transfer of data through bus with the help of control
signals. The common data transfer path, that is the bus, is made using the
multiplexers. The select lines are connected to the control inputs of the multiplexers
and the bits of one register are chosen thus allowing multiplexers to select a specific
source register for data transfer.
The construction of a bus system for four registers using 4×1 multiplexers is shown
below. Each register has four bits, numbered 0 through 3. Each multiplexer has 4 data
inputs, numbered 0 through 3, and two control or selection lines, C0 and C1. The data
inputs of the 0th MUX are connected to the 0th bit of each of the four registers, and the
outputs of the four multiplexers form the four lines of the bus. The 0th multiplexer
multiplexes the four 0th bits of the registers, and similarly for the three other multiplexers.
Since the same selection lines C0 and C1 are connected to all the multiplexers,
they choose the four bits of one register and transfer them onto the four-line common
bus.
[Figure: bus system for four registers using four 4×1 multiplexers (MUX 0 to MUX 3), each with data inputs 0 to 3 and the common selection lines C0 and C1]
When C1 C0 = 00, the 0th data inputs of all the multiplexers are selected. Since the
outputs of register A are connected to the 0th data inputs of the multiplexers, the bus
lines receive the content of register A. Similarly, when C1 C0 = 01, register B is
selected, and so on. The
following table shows the register that is selected for each of the four possible values
of the selection lines:
C1 C0 Register Selected
0 0 A
0 1 B
1 0 C
1 1 D
To construct a bus for 8 registers of 16 bits each, you would require 16 multiplexers,
one for each line in the bus. The number of multiplexers needed to construct the bus
is equal to the number of bits in each register. Each multiplexer must have eight data
input lines and three selection lines (2^3 = 8) to multiplex one bit in the eight registers.
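The multiplexer-based bus for four 4-bit registers can be modelled in Python as follows. This is a sketch under simple assumptions: registers are lists of bits, and the function names mux and bus_value are invented for the example.

```python
def mux(inputs, select):
    """A multiplexer: pass the data input chosen by the select value."""
    return inputs[select]


def bus_value(registers, c1, c0):
    """registers is a list of bit-lists [A, B, C, D]; returns the bits placed on the bus."""
    select = (c1 << 1) | c0
    n_bits = len(registers[0])
    # Multiplexer i receives bit i of every register; all share the same select lines
    return [mux([reg[i] for reg in registers], select) for i in range(n_bits)]


A, B, C, D = [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0], [1, 0, 0, 0]
print(bus_value([A, B, C, D], 0, 1))   # C1 C0 = 01 selects register B
```

The list comprehension creates one multiplexer per bit, so the number of multiplexers equals the register width, while the number of registers determines how many data inputs and select lines each multiplexer needs.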
The diagram of a 4-bit arithmetic circuit has four 4×1 multiplexers and four full
adders (FA). Please note that the FULL ADDER is a circuit that can add two input
bits and a carry-in bit to produce one sum-bit and a carry-out-bit.
So what does the adder do? It just adds three bits. What does the multiplexer do? It
controls one of the input bits. Thus, such combination produces a series of micro-
operations.
Let us find out how the multiplexer control lines will change one of the inputs to the
adder circuit. Please refer to the following table. (Please note that the convention, VALID
ONLY FOR THE TABLE, is that an uppercase letter indicates a data word,
whereas a lowercase letter indicates a bit.)
S1  S0  MUX(a)  MUX(b)  MUX(c)  MUX(d)  Y    Adder
0   0   b0      b1      b2      b3      B    The data word B is input to the Full Adders
0   1   b̄0      b̄1      b̄2      b̄3      B̄    1’s complement of B is input to the Full Adders
1   0   0       0       0       0       0    Data word 0 is input to the Full Adders
1   1   1       1       1       1       FH   Data word 1111 = FH is input to the Full Adders
Now let us discuss how by coupling carry bit (Cin) with these input bits we can obtain
various micro-operations.
Input to Circuits
• Register A bits a0, a1, a2 and a3 are applied to the corresponding X inputs of the Full
Adders (FA).
• Please note that each bit of register A and register B is fed to a different full adder
unit.
• Please also note that each of the four inputs from A is applied to the X inputs
of the binary adder and each of the four inputs from B is connected to the data
inputs of the multiplexers. It means that the A input goes directly to the adder, but
the B input can be manipulated through the multiplexer to create a number of
different input values as given in the figure above. The B inputs through the
multiplexers are controlled by two selection lines S1 and S0. Thus, using various
combinations of S1 and S0 we can select the data bits of B, the complement of B, the 0
word, or the word having all 1’s.
• The input carry Cin, which can be equal to 0 or 1, goes to the carry input of the
full adder in the least significant position. The other carries are cascaded from
one stage to the next. Logically it is the same as the addition performed by
us: we pass the carry of the lower digits’ addition to the higher digits. The output of
the binary adder is determined from the following arithmetic sum:
D = X + Y + Cin
OR
D = A + Y + Cin
By controlling the value of Y with the two selection lines S1 and S0 and making Cin
equal to 0 or 1, it is possible to implement the eight arithmetic micro-operations listed
in the truth table.
When S1S0 = 00, input line B is enabled and its value is applied to the Y inputs of the
full adder. Now,
If input carry Cin = 0, the output will be D = A + B
If input carry Cin = 1, the output will be D = A + B + 1.
When S1S0 = 01, the complement of B is applied to the Y inputs of the full adder. So
If Cin = 1, then output D = A + B̄ + 1. This is called the subtract micro-operation. (Why?)
Reason: Please observe the following example, where A = 0111 and B = 0110, so that
B̄ = 1001. The sum will be calculated as:
    0111   (Value of A)
  + 1001   (Complement of B)
  1 0000   + (Carry in = 1) = 0001
Ignore the carry out bit. Thus, we get the simple subtract operation.
If Cin = 0, then output D = A + B̄. For the same example:
    0111   (Value of A)
  + 1001   (Complement of B)
  1 0000   + (Carry in = 0) = 0000
D = A + B̄
=> D = (A – 1) + (B̄ + 1)
=> D = (A – 1) + 2’s complement of B
=> D = (A – 1) – B   Thus the name subtract with borrow.
When S1S0 = 10, input value 0 is applied to the Y inputs of the full adder.
If Cin = 0, then output D = A + 0 + Cin => D = A
If Cin = 1, then D = A + 0 + 1 => D = A + 1
The first is a simple data transfer micro-operation, while the second is an increment
micro-operation.
When S1S0 = 11, the input word of all 1’s is applied to the Y inputs of the full adder.
If Cin = 0, then output D = A + All (1s) + Cin => D = A – 1 (How? Let us
explain with the help of the following example).
Example: Let us assume that the Register A is of 4 bits and contains the value 0101,
and that it is added to the all 1’s value as:
    0101
  + 1111
  1 0100
The leading 1 is the carry out and is discarded. Thus, on addition with all 1’s the number has
actually been decremented by one.
If Cin = 1, then D = A + All(1s) + 1 => D = A
The first is the decrement micro-operation, while the second is a data transfer
micro-operation.
Please note that the micro-operation D = A is generated twice, so there are only seven
distinct micro-operations possible through the proposed arithmetic circuit.
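The behaviour of the arithmetic circuit for all eight combinations of S1, S0 and Cin can be checked with the Python sketch below. It models the adder arithmetically rather than gate by gate; the function name arithmetic_circuit and the sample values of A and B are assumptions of this illustration.

```python
def arithmetic_circuit(a, b, s1, s0, cin, n_bits=4):
    """Compute D = A + Y + Cin, where the multiplexers choose Y from B."""
    mask = (1 << n_bits) - 1
    y_options = {
        (0, 0): b,            # the data word B
        (0, 1): ~b & mask,    # 1's complement of B
        (1, 0): 0,            # the all 0's word
        (1, 1): mask,         # the all 1's word
    }
    y = y_options[(s1, s0)]
    return (a + y + cin) & mask   # the carry out is discarded


A, B = 0b0111, 0b0110
for s1 in (0, 1):
    for s0 in (0, 1):
        for cin in (0, 1):
            d = arithmetic_circuit(A, B, s1, s0, cin)
            print(s1, s0, cin, format(d, "04b"))
```

With A = 0111 and B = 0110 the rows for S1S0 = 10, Cin = 0 and S1S0 = 11, Cin = 1 both print 0111, which shows the duplicated transfer micro-operation.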
Please note that in the figure above the micro-operations are derived by replacing the
x and y of Boolean function with registers R1 and R2 on each corresponding bit of
the registers R1 and R2. Each of these bits will be treated as binary variables.
In many computers only four: AND, OR, XOR (exclusive OR) and complement
micro-operations are implemented. The other 12 micro-operations can be derived
from these four micro-operations. Figure 8 shows one bit, which is the ith bit stage of
the four logic operations. Please note that the circuit consists of 4 gates and a 4 × 1
MUX. The ith bits of Register R1 and R2 are passed through the circuit. On the basis
of selection inputs S0 and S1 the desired micro-operation is obtained.
Please note that in this figure we have given reference to two previous figures for
arithmetic and logic circuits. This stage of the ALU has two data inputs: the ith bits of the
registers to be manipulated. However, the (i – 1)th or (i+1)th bit is also fed in for the case
of a shift micro-operation on only one register. There are four selection lines, which
determine which micro-operation (arithmetic, logic or shift) is performed on the input. Fi is the
resultant bit after the desired micro-operation. Let us see how the value of Fi changes on
the basis of the four select inputs. This is shown in Figure 10:
Please note that in Figure 10 arithmetic micro-operations have both S3 and S2 bits as
zero. Input Ci is important for only arithmetic micro-operations. For logic micro-
operations S3, S2 values are 01. The values 10 and 11 cause shift micro-operations.
For this shift micro-operation S1 and S0 values and Ci values do not play any role.
S3  S2  S1  S0  Ci   F                  Micro-operation    Name
0   0   0   0   0    F = x              R ← R1             Transfer
0   0   0   0   1    F = x + 1          R ← R1 + 1         Increment
0   0   0   1   0    F = x + y          R ← R1 + R2        Addition
0   0   0   1   1    F = x + y + 1      R ← R1 + R2 + 1    Addition with carry
0   0   1   0   0    F = x + ȳ          R ← R1 + R̄2        Subtract with borrow
0   0   1   0   1    F = x + (ȳ + 1)    R ← R1 – R2        Subtract
0   0   1   1   0    F = x – 1          R ← R1 – 1         Decrement
0   0   1   1   1    F = x              R ← R1             Transfer
(All the rows above are arithmetic micro-operations.)
The questions in this regard are: “What is an arithmetic processor?” and, “What is the
need for arithmetic processors?”
A typical CPU needs most of the control and data processing hardware for
implementing non-arithmetic functions. As the hardware costs are directly related to
chip area, a floating point circuit being complex in nature is costly to implement.
They need not be included in the instruction set of a CPU. In such systems, floating-
point operations were implemented by using software routines.
Two mechanisms are used for connecting the arithmetic processor to the CPU.
On the other hand if the arithmetic processor has a register and instruction set which
can be considered an extension of the CPU registers and instruction set, then it is
called a tightly coupled processor. Here the CPU reserves a special subset of code for
arithmetic processor. In such a system the instructions meant for arithmetic processor
are fetched by CPU and decoded jointly by CPU and the arithmetic processor, and
finally executed by arithmetic processor. Thus, these processors can be considered a
logical extension of the CPU. Such attached arithmetic processors are termed as co-
processors.
The concept of co-processor existed in the 8086 machine till Intel 486 machines
where co-processor was separate. However, Pentium at present does not have a
separate co-processor. Similarly, peripheral processors are not found as arithmetic
processors in general. However, many chips are used for specialized I/O architecture.
These can be found in further readings.
3.4 SUMMARY
In this unit, we have discussed in detail the hardware implementation of
micro-operations. The unit starts with an implementation of the bus, which is the backbone of
any register transfer operation. This is followed by a discussion on the arithmetic circuit
and the micro-operations implemented on it using full adder circuits. The implementation of
logic micro-operations has also been discussed, leading to a logical construction of a
simple arithmetic-logic-shift unit. The unit revolves around the basic ALU, built with the
help of the units that are constructed for the implementation of micro-operations.
In the later part of the unit, we discussed the arithmetic processors. Finally, we have
presented a few chipsets that support the working of a processor for input/output
functions from the keyboard, printer etc.
1. The diagram is the same as that of Figure 9.
2. Arithmetic processor performs arithmetic computation. These are support
processors to a computer.
UNIT 4 THE CONTROL UNIT
Structure
4.0 Introduction
4.1 Objectives
4.2 The Control Unit
4.3 The Hardwired Control
4.4 Wilkes Control
4.5 The Micro-Programmed Control
4.6 The Micro-Instructions
4.6.1 Types of Micro-Instructions
4.6.2 Control Memory Organisation
4.6.3 Micro-Instruction Formats
4.7 The Execution of Micro-Program
4.8 Summary
4.9 Solutions/ Answers
4.0 INTRODUCTION
By now we have discussed instruction sets and register organisation followed by a
discussion on micro-operations and a simple arithmetic logic unit circuit. We have
also discussed the floating point ALU and arithmetic processors, which are commonly
used for floating point computations.
In this unit we are going to discuss the functions of a control unit and its structure,
followed by the hardwired type of control unit. We will discuss the
micro-programmed control unit, which is quite popular in modern computers because of its
flexibility in designing. We will start the discussion with several definitions about the
unit, followed by the Wilkes control unit. Finally, we will discuss the concepts involved in
micro-instruction execution.
4.1 OBJECTIVES
After going through this unit you will be able to:
• making the ALU perform a particular operation on the data
• regulating other internal operations.
But how does a control unit control the above operations? What are the functional
requirements of the control unit? What is its structure? Let us explore the answers to these
questions in the next sections.
• The Arithmetic Logic Unit (ALU), which performs the basic arithmetic and
logical operations.
• Registers, which are used for information storage within the CPU.
• Internal Data Paths: These paths are useful for moving the data between two
registers or between a register and the ALU.
• External Data Paths: The role of these data paths is normally to link the CPU
registers with the memory or I/O interfaces. This role is normally fulfilled by the
system bus.
• The Control Unit: This causes all the operations to happen in the CPU.
The basic responsibility of the control unit lies in the fact that the control unit must be
able to guide the various components of CPU to perform a specific sequence of micro-
operations to achieve the execution of an instruction.
What are the functions, which a control unit performs to make an instruction
execution feasible? The instruction execution is achieved by executing micro-
operations in a specific sequence. For different instructions this sequence may be
different. Thus the control unit must perform two basic functions:
But how are these two tasks achieved? The control unit generates control signals,
which in turn are responsible for achieving the above two tasks. But, how are these
control signals generated? We will answer this question in later sections. First let us
discuss a simple structure of control unit.
Structure of Control Unit
A control unit has a set of input values on the basis of which it produces an output
control signal, which in turn performs micro-operations. These output signals control
the execution of a program. A general model of control unit is shown in Figure 1.
In the model given above the control unit is a black box, which has certain inputs and
outputs.
• Flags: Flags are used by the control unit for determining the status of the CPU and
the outcomes of a previous ALU operation. For example, a zero flag, if set,
conveys to the control unit that for the instruction ISZ (skip the next instruction if the zero
flag is set) the next instruction is to be skipped. In such a case the control unit causes the
PC to be incremented by the program instruction length, thus skipping the next instruction.
• Control Signals from Control Bus: Some of the control signals are provided to
the control unit through the control bus. These signals are issued from outside the
CPU. Some of these signals are interrupt signals and acknowledgement signals.
On the basis of the input signals the control unit activates certain output control
signals, which in turn are responsible for the execution of an instruction. These output
control signals are:
• Control signals, which are required within the CPU: These control signals
cause two types of micro-operations, viz., data transfer from one register to
another, and an arithmetic, logic or shift operation performed using the ALU.
• Control signals to control bus: These control signals transfer data between a
CPU register and the memory or an I/O interface. These control signals are
issued on the control bus to activate a data path on the data / address bus etc.
Now, let us discuss the requirements from such a unit. A prime requirement for
control unit is that it must know how all the instructions will be executed. It should
also know about the nature of the results and the indication of possible errors. All this
is achieved with the help of flags, op-codes, clock and some control signals to itself.
A control unit contains a clock portion that provides clock pulses. This clock signal is
used for measuring the timing of the micro-operations. In general, the timing signals
from the control unit are kept sufficiently long to accommodate the propagation delays of
signals within the CPU along various data paths. Since within the same instruction
cycle different control signals are generated at different times for performing different
micro-operations, a counter can be utilised with the clock to keep the count.
However, at the end of each instruction cycle the counter should be reset to the initial
condition. Thus, the clock to the control unit must provide counted timing signals.
Examples of the functionality of control units, along with timing diagrams, are given
in further readings.
How are these control signals applied to achieve the particular operation? The
control signals are applied directly as the binary inputs to the logic gates of the logic
circuits. All these inputs are the control signals, which are applied to select a circuit
(for example, select or enable input) or a path (for example, multiplexers) or any other
operation in the logic circuits.
Let us revisit the micro-operations described in Unit 2 to discuss how the events of
any instruction cycle can be described as a sequence of such micro-operations.
The fetch cycle consists of four micro-operations that are executed in three timing
steps. The fetch cycle can be written as:
T1 : MAR ← PC
T2 : MBR ← [MAR]
     PC ← PC + I
T3 : IR ← MBR
where I is the instruction length. We assume that a clock is available for timing
purposes and that it emits regularly spaced clock pulses. Each clock pulse defines a
time unit. Thus, all the units are of equal duration. Each micro-operation can be
performed within the time of a single time unit. The notation (T1, T2, T3) represents
successive time units. What is done in these time units?
• In the second time unit the content of the memory location specified by MAR is
moved to MBR and the content of the PC is incremented by I.
• In the third time unit the content of MBR is moved to IR.
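The three timing steps of the fetch cycle can be traced with a small Python sketch. The dictionary-based CPU and memory, the function name fetch and the sample addresses are assumptions made for this illustration; real hardware performs the two micro-operations of T2 simultaneously, while the sketch simply lists them one after the other.

```python
def fetch(cpu, memory, instruction_length=1):
    # T1: MAR receives the content of PC
    cpu["MAR"] = cpu["PC"]
    # T2: MBR receives the addressed memory word, and PC is incremented by I
    cpu["MBR"] = memory[cpu["MAR"]]
    cpu["PC"] = cpu["PC"] + instruction_length
    # T3: IR receives the content of MBR
    cpu["IR"] = cpu["MBR"]
    return cpu


memory = {100: "ADD R1, X", 101: "ISZ X"}
cpu = {"PC": 100, "MAR": 0, "MBR": 0, "IR": None}
fetch(cpu, memory)
print(cpu["IR"], cpu["PC"])
```

After one fetch, IR holds the instruction that was stored at address 100 and PC points to the next instruction.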
Once an instruction is fetched, the next step is to fetch the operands. Considering the
same example as in Unit 2, the instruction may use direct or indirect addressing.
An indirect address is handled using the indirect cycle. The following
micro-operations are required in the indirect cycle:
T1 : MAR ← IR (address)
T2 : MBR ← [MAR]
T3 : IR (address) ← MBR (address)
The MAR is loaded with the address field of the IR. Then the memory is read to
fetch the address of the operand, which is received in MBR during the read operation
and transferred from there to the address field of IR.
Thus, the IR is now in the same state as for a direct address, that is, as if indirect
addressing had not been used. IR is now ready for the execute cycle.
The fetch and indirect cycles each involve a small, fixed sequence of micro-operations.
This is not true of the execute cycle. For a machine with N different opcodes, there are
N different sequences of micro-operations that can occur. Let us consider some
hypothetical instructions:
An add instruction that adds the contents of memory location X to register R1, with
R1 storing the result:
ADD R1, X
At the beginning of the execute cycle IR contains the ADD instruction and its direct
operand address (memory location X). At time T1, the address portion of the IR is
transferred to the MAR. At T2 the referenced memory location is read into MBR.
Finally, at T3 the contents of R1 and MBR are added by the ALU.
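The execute cycle of ADD R1, X described above can be sketched in the same register-transfer style. The representation of IR as a dictionary with an address field, and the sample values (X = 940 holding 7, R1 holding 5), are assumptions made only for this illustration.

```python
# Sketch of the execute cycle for ADD R1, X (three time units).
# Assumed for illustration: IR modelled as a dict, memory as a dict,
# and made-up operand values.

def execute_add(regs, memory):
    # T1: MAR <- IR(address)
    regs["MAR"] = regs["IR"]["address"]
    # T2: MBR <- [MAR]
    regs["MBR"] = memory[regs["MAR"]]
    # T3: R1 <- R1 + MBR (addition performed by the ALU)
    regs["R1"] = regs["R1"] + regs["MBR"]
    return regs

memory = {940: 7}
regs = {"IR": {"opcode": "ADD", "address": 940}, "MAR": None, "MBR": None, "R1": 5}
execute_add(regs, memory)
print(regs["R1"])  # 12
```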
Please note that for this machine we have assumed that MBR can be incremented by
ALU directly.
The PC is incremented if MBR contains 0. This test and action can be implemented as
one micro-operation. Note also that this micro-operation can be performed during the
same time unit during which the updated value in MBR is stored back to memory.
Such instructions are useful in implementing looping.
On completion of the execute cycle the execution of the current instruction is complete.
At this point a test is made to determine whether any enabled interrupts have occurred.
If so, the interrupt cycle is performed. This cycle does not itself service the interrupt but
causes the start of execution of the Interrupt Service Routine (ISR). Please note that the ISR is
executed as just another program, through ordinary instruction cycles. The nature of the
interrupt cycle varies greatly from one machine to another. A typical sequence of
micro-operations of the interrupt cycle is:
T1 : MBR ← PC
T2 : MAR ← Save-Address
     PC ← ISR-Address
T3 : [MAR] ← MBR
At time T1, the contents of the PC are transferred to the MBR, so that they can be
saved for return from the interrupt. At time T2 the MAR is loaded with the address at
which the contents of the PC are to be saved, and PC is loaded with the address of the
start of the interrupt-servicing routine. At time T3 MBR, which contains the old value
of the PC, is stored in the memory. The processor is now ready to begin the next
instruction cycle.
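The interrupt cycle can be traced in the same way. In this sketch the save location (address 0) and the ISR start address (500) are hypothetical constants chosen only so that the example runs; a real machine defines its own values.

```python
# Sketch of the interrupt cycle as register transfers.
SAVE_ADDRESS = 0    # hypothetical location where the old PC is saved
ISR_ADDRESS = 500   # hypothetical start address of the interrupt service routine

def interrupt_cycle(regs, memory):
    # T1: MBR <- PC (keep the return address)
    regs["MBR"] = regs["PC"]
    # T2: MAR <- Save-Address and PC <- ISR-Address
    regs["MAR"] = SAVE_ADDRESS
    regs["PC"] = ISR_ADDRESS
    # T3: [MAR] <- MBR (store the old PC in memory)
    memory[regs["MAR"]] = regs["MBR"]
    return regs

memory = {}
regs = {"PC": 301, "MAR": None, "MBR": None}
interrupt_cycle(regs, memory)
print(memory[SAVE_ADDRESS], regs["PC"])  # 301 500
```

After the cycle the next fetch starts at the ISR address, and the saved value lets the program resume at 301 on return.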
The instruction cycle for this given machine consists of four cycles. Assume a 2-bit
instruction cycle code (ICC). The ICC can represent the state of the processor in terms
of cycle. For example, we can use:
00 : Fetch
01 : Indirect
10 : Execute
11 : Interrupt
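One way to picture how the ICC could steer the processor from cycle to cycle is the small state-transition sketch below. It assumes the usual rules: fetch is followed by the indirect cycle only when the instruction uses indirect addressing, and execute is followed by the interrupt cycle only when an enabled interrupt is pending. The function and flag names are hypothetical.

```python
# 2-bit instruction cycle codes as listed above.
FETCH, INDIRECT, EXECUTE, INTERRUPT = 0b00, 0b01, 0b10, 0b11

def next_icc(icc, indirect_addressing=False, interrupt_pending=False):
    """Return the ICC value to be set at the end of the current cycle."""
    if icc == FETCH:
        return INDIRECT if indirect_addressing else EXECUTE
    if icc == INDIRECT:
        return EXECUTE          # indirect is always followed by execute
    if icc == EXECUTE:
        return INTERRUPT if interrupt_pending else FETCH
    return FETCH                # interrupt is always followed by fetch

# One instruction with indirect addressing, followed by a pending interrupt.
icc = FETCH
trace = [icc]
for kwargs in ({"indirect_addressing": True}, {}, {"interrupt_pending": True}, {}):
    icc = next_icc(icc, **kwargs)
    trace.append(icc)
print([format(c, "02b") for c in trace])  # ['00', '01', '10', '11', '00']
```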
At the end of each of the four cycles, the ICC is set appropriately. Please note that an
indirect cycle is always followed by the execute cycle and the interrupt cycle is
always followed by the fetch cycle. For both the execute and fetch cycles, the next
cycle depends on the state of the system. Let us show an instruction execution using
timing diagram and instruction cycles:
Please note that the address lines determine the memory location. The read/write signal
controls whether data is being input or output. For example, at time T2 in M2 the
read control signal becomes active, the A9 – A0 inputs carry the value of MAR, which is kept
enabled on the address bits, and the data lines are enabled to accept data from RAM, thus
enabling a typical RAM data output on the data bus.
For reading, no data input is applied by the CPU; the data is put on the data bus by memory
after the read control signal to memory is activated. A write operation is activated along with
the data bus carrying the output value.
This diagram is used for illustration of timing and control. More
information on these topics can be obtained from further readings.
A decoder will have n binary inputs and 2^n binary outputs. Each of the 2^n different
input patterns will activate a single unique output line.
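The behaviour of such a decoder can be modelled in a few lines of Python. This is a behavioural sketch only (a real decoder is a combinational circuit), and the bit ordering, most significant bit first, is an assumption.

```python
def decode(bits):
    """n-to-2^n decoder: 'bits' is a list of 0/1 values, most significant first.
    Exactly one of the 2^n output lines is set to 1."""
    index = 0
    for b in bits:
        index = (index << 1) | b      # build the binary value of the input pattern
    outputs = [0] * (2 ** len(bits))
    outputs[index] = 1                # activate the single matching output line
    return outputs

print(decode([1, 0]))  # [0, 0, 1, 0]
```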
The clock portion of the control unit issues a repetitive sequence of pulses for the
duration of micro-operation(s). These timing signals control the sequence of execution
of an instruction and determine which control signal needs to be applied at what time during
instruction execution.
The control memory in Wilkes control is organized as a PLA-like matrix made of
diodes. This is a partial matrix and consists of two components: the control signals and
the address of the next micro-instruction. Register I contains the address of the
next micro-instruction, that is, one step of instruction execution, for example T1 in M1
or T2 in M2, as in Figure 2. On decoding, the control signals are generated that
cause execution of the micro-operation(s) of that step. In addition, the control unit
indicates the address of the next micro-instruction, which gets loaded through register II
to register I. Register I can also be loaded by register II and the “enable IR input” control
signal. This will pass the address of the first micro-instruction of the execute cycle. During a
machine cycle one row of the matrix is activated. The first part of the row generates
the control signals that control the operations of the processor. The second part
generates the address of the row to be selected in the next machine cycle.
At the beginning of the cycle, the address of the row to be selected is contained in
register I. This address is the input to the decoder, which is activated by a clock pulse.
This activates the row of the control matrix. The two-register arrangement is needed
because the decoder is a combinational circuit; with only one register, the output would
become the input during a cycle. This may lead to an unstable condition due to the
repetitive loop.
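The row-by-row operation of the Wilkes scheme can be imitated in software. In the sketch below each row of the matrix holds its control signals plus two next-row addresses (one per value of a condition flag); the matrix contents and the signal names are invented for illustration and do not come from Figure 4.

```python
# Each row: (control signals, next row if flag is false, next row if flag is true).
# Rows and signal names are hypothetical.
MATRIX = {
    0: (["PC_to_MAR"], 1, 1),
    1: (["read_memory", "increment_PC"], 2, 2),
    2: (["MBR_to_IR"], 0, 3),          # conditional row: two possible successors
    3: (["handle_condition"], 0, 0),
}

def run_wilkes(matrix, start, steps, flag=False):
    """Step through the control matrix, one row per machine cycle."""
    register_1 = start                 # address of the row to be selected
    trace = []
    for _ in range(steps):
        signals, next_false, next_true = matrix[register_1]   # decoder activates one row
        trace.append((register_1, signals))                   # first part: control signals
        register_2 = next_true if flag else next_false        # second part: next address
        register_1 = register_2                               # transferred for the next cycle
    return trace

print([row for row, _ in run_wilkes(MATRIX, 0, 4)])             # [0, 1, 2, 0]
print([row for row, _ in run_wilkes(MATRIX, 0, 4, flag=True)])  # [0, 1, 2, 3]
```

Keeping register_2 as a separate variable mirrors the two-register arrangement: the next address is latched first and only then becomes the decoder input.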
4.5 THE MICRO-PROGRAMMED CONTROL
An alternative to a hardwired control unit is a micro-programmed control unit, in
which the logic of the control unit is specified by a micro-program. A micro-program
is also called firmware (midway between the hardware and the software). It consists
of:
(a) One or more micro-operations to be executed; and
(b) The information about the micro-instruction to be executed next.
The micro-instructions are stored in the control memory. The address register for the
control memory contains the address of the next instruction that is to be read. The
control memory Buffer Register receives the micro-instruction that has been read. A
micro-instruction execution primarily involves the generation of desired control
signals and signals used to determine the next micro-instruction to be executed. The
sequencing logic section loads the control memory address register. It also issues a
read command to control memory. The following functions are performed by the
micro-programmed control unit:
1. The sequencing logic unit specifies the address of the control memory word that is
to be read, in the Address Register of the Control Memory. It also issues the
READ signal.
2. The desired control memory word is read into the control memory Buffer Register.
3. The content of the control memory Buffer Register is decoded to create control
signals and next-address information for the sequencing logic unit.
4. The sequencing logic unit finds the address of the next control word on the basis
of the next-address information from the decoder and the ALU flags.
As we have discussed earlier, the micro-operation steps of the execute cycle are different
for each instruction; in addition, the addressing mode may be different. All such
information generally depends on the opcode held in the Instruction Register (IR).
Thus, an input from IR to the Address Register for Control Memory is desirable, and there
exists a decoder from IR to the Address Register for Control Memory (refer to Figure 5). This
decoder translates the opcode of the IR into a control memory address.
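The read, decode and next-address loop of a micro-programmed control unit can be sketched as follows. The control memory contents, the signal names, the "MAP" marker for the opcode decoder and the address 10 chosen for the ADD routine are all hypothetical, and the influence of ALU flags on sequencing is left out to keep the example short.

```python
# Hypothetical control memory: address -> control word.
CONTROL_MEMORY = {
    0:  {"signals": ["PC_to_MAR"], "next": 1},
    1:  {"signals": ["read_memory", "increment_PC"], "next": 2},
    2:  {"signals": ["MBR_to_IR"], "next": "MAP"},         # go through the opcode decoder
    10: {"signals": ["IR_address_to_MAR"], "next": 11},    # start of the ADD routine
    11: {"signals": ["read_memory"], "next": 12},
    12: {"signals": ["ALU_add"], "next": 0},               # back to fetch
}
OPCODE_MAP = {"ADD": 10}   # decoder from IR opcode to control memory address

def run_instruction(opcode, max_steps=10):
    """Issue control signals for one instruction cycle and return them in order."""
    car = 0                      # control memory address register
    issued = []
    for _ in range(max_steps):
        cbr = CONTROL_MEMORY[car]            # read control word into the buffer register
        issued.extend(cbr["signals"])        # decode: generate control signals
        nxt = cbr["next"]                    # next-address information
        car = OPCODE_MAP[opcode] if nxt == "MAP" else nxt
        if car == 0:                         # routine finished, next fetch would begin
            break
    return issued

print(run_instruction("ADD"))
```

Adding another machine instruction in this model means adding rows to CONTROL_MEMORY and one entry to OPCODE_MAP, which is the flexibility usually claimed for micro-programmed control.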
3. What will be the control signals and address of the next micro-instruction in the
Wilkes control example of Figure 4, if the entry address for a machine instruction
selects the last but one (branching control line) and the conditional bit value for
branch is true?
..................................................................................
..................................................................................
..................................................................................
NON-ZERO: (Microcode which may set flags, if desired, indicating that the branch has
not taken place.)
Branch to interrupt or fetch cycle (for the next instruction cycle).
In a vertical micro-instruction many similar control signals can be encoded into a few
micro-instruction bits. For example, 16 ALU operations, which may require 16
individual control bits in a horizontal micro-instruction, need only 4 encoded bits
in a vertical micro-instruction. Similarly, in a vertical micro-instruction only 3 bits are
needed to select one of eight registers. However, these encoded bits need to be
passed through the respective decoders to get the individual control signals. This is
shown in Figure 7(b).
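The bit counts quoted above (16 versus 4 bits for the ALU operations, 3 bits for eight registers) can be checked with a short sketch. The function names are hypothetical; the point is only that the encoded field plus a decoder reproduces the same control lines as the unencoded field.

```python
def horizontal_field(op_index, n_ops=16):
    """Unencoded field: one control bit per operation."""
    return [1 if i == op_index else 0 for i in range(n_ops)]

def vertical_field(op_index, n_ops=16):
    """Encoded field: just enough bits to number n_ops operations (MSB first)."""
    width = (n_ops - 1).bit_length()
    return [(op_index >> (width - 1 - i)) & 1 for i in range(width)]

def decode_field(bits):
    """Decoder that expands an encoded field back into individual control lines."""
    index = int("".join(str(b) for b in bits), 2)
    return [1 if i == index else 0 for i in range(2 ** len(bits))]

encoded = vertical_field(5)
print(len(horizontal_field(5)), len(encoded))    # 16 4
print(len(vertical_field(6, n_ops=8)))           # 3
print(decode_field(encoded) == horizontal_field(5))  # True
```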
In general, a horizontal control unit is faster, yet requires wider micro-instruction words,
whereas vertical micro-instructions, although they require a decoder, are shorter in length.
Most systems use neither purely horizontal nor purely vertical micro-instructions
(Figure 7(c)).
Since we are dealing with binary control signals, an N-bit micro-instruction
can represent 2^N combinations of control signals.
Unencoded micro-instructions
• One bit is needed for each control signal; therefore, the number of bits required
in a micro-instruction is high.
• It presents a detailed hardware view, as the control signals needed can be determined directly.
• Since each of the control signals can be controlled individually, these
micro-instructions are difficult to program. However, concurrency can be
exploited easily.
• Almost no control logic is needed to decode the instruction, as there is a one-to-one
mapping of control signals to bits of the micro-instruction. Thus, execution of
a micro-instruction, and hence of the micro-program, is faster.
• The unencoded micro-instruction aims at optimising the performance of a
machine.
In most of the cases, the design is kept between the two extremes. The LSI 11 (highly
encoded) and IBM 3033 (unencoded) control units are close examples of these two
approaches.
• Organize the format into independent fields. That is, each field depicts a set of
actions such that actions from different fields can occur simultaneously.
• Define each field such that the alternative actions that can be specified by the
field are mutually exclusive. That is, only one of the actions specified for a
given field could occur at a time.
Another aspect of encoding is whether it is direct or indirect (Figure 8). With indirect
encoding, one field is used to determine the interpretation of another field.
A detailed discussion on these topics is beyond this unit. You must refer to further
readings for more detailed information on Micro-programmed Control Unit Design.
Figure 8: Micro-instruction Encoding, panels (a) and (b)
f) Status bits supplied from ALU to sequencing logic have no role to play
with the sequencing of micro-instruction.
4. Compare and contrast unencoded and highly encoded micro-instructions.
..................................................................................
..................................................................................
..................................................................................
4.8 SUMMARY
In this unit we have discussed the organization of control units, including hardwired, Wilkes
and micro-programmed control units. The key to such control units
is the micro-instruction, whose types and formats have been briefly described in this
unit. The function of a micro-programmed unit, that is, micro-programmed execution,
has also been discussed. The control unit is the key to the optimised performance of a
computer. The information given in this unit can be supplemented by going
through further readings.
3. Wilkes control typically has one address field. However, for a conditional
branching micro-instruction, it contains two addresses. The Wilkes control, in
fact, is a hardware representation of a micro-programmed control unit.
4.
   Unencoded micro-instructions        Highly encoded micro-instructions
   • Large number of bits              • Relatively fewer bits
   • Difficult to program              • Easy to program
   • No decoding logic                 • Needs decoding logic
   • Optimizes machine performance     • Optimizes programming effort
   • Detailed hardware view            • Aggregated view
UNIT 5 REDUCED INSTRUCTION SET COMPUTER ARCHITECTURE
Structure
5.0 Introduction
5.1 Objectives
5.2 Introduction to RISC
5.2.1 Importance of RISC Processors
5.2.2 Reasons for Increased Complexity
5.2.3 High Level Language Program Characteristics
5.3 RISC Architecture
5.4 The Use of Large Register File
5.5 Comments on RISC
5.6 RISC Pipelining
5.7 Summary
5.8 Solutions/Answers
5.0 INTRODUCTION
In the previous units, we have discussed the instruction set, register organization and
pipelining, and control unit organization. The trend of those years was to have a large
instruction set, a large number of addressing modes and about 16–32 registers.
However, there existed a school of thought which was in favour of simplicity in the
instruction set. This reasoning was mainly based on the type of programs that were
being written for various machines. This led to the development of a new type of
computer called the Reduced Instruction Set Computer (RISC). In this unit, we will
discuss RISC machines. Our emphasis will be on the basic
principles of RISC and its pipeline. We will also discuss the arithmetic and logic units
here.
5.1 OBJECTIVES
After going through this unit you should be able to:
If we review the history of computer families, we find that the most common
architectural change is the trend towards even more complex machines.
This further enhances the processing capabilities of the RISC processor. It also
necessitates that the memory to register “LOAD” and “STORE” are independent
instructions.
SPARC Processors
Sun 4/100 series, Sun 4/310 SPARCserver 310, Sun 4/330 SPARCserver 330, Sun
4/350 SPARCserver 350, Sun 4/360 SPARCserver 360, Sun 4/370 SPARCserver 370,
Sun 4/20 SPARCstation SLC, Sun 4/40 SPARCstation IPC, Sun 4/75 SPARCstation
2.
PowerPC Processors
MPC603, MPC740, MPC750, MPC755, MPC7400/7410, MPC745x, MPC7450,
MPC8240, MPC8245.
However, this assumption is not very valid in the present era where the Main memory
is supported with Cache technology. Cache memories have reduced the difference
between the CPU and the memory speed and, therefore, an instruction execution
through a subroutine step may not be that difficult.
Let us explain it with the help of an example:
Suppose the floating point operation ADD A, B requires the following steps
(assuming the machine does not have floating point registers) and the registers being
used for exponent are E1, E2, and EO (output); for mantissa M1, M2 and MO
(output):
If all these steps are coded as separate machine instructions, then this simple operation will
require many instruction execution cycles. If instead the operation is made part of the
machine instruction set as ADDF A,B (add floating point numbers A and B and store the
result in A), then it will be just a single machine instruction. All the above steps
will then be coded as micro-operations in the Control
Unit micro-program. Thus, just one instruction cycle (although a long one) may be
needed. This cycle will require just one instruction fetch, whereas in the program
version many instructions will have to be fetched from memory.
However, faster cache memory for Instruction and data stored in registers can create
an almost similar instruction execution environment. Pipelining can further enhance
such speed. Thus, creating an instruction as above may not result in faster execution.
The control unit of a computer can be constructed in two ways:
create a micro-program that executes micro-instructions, or build circuits for each
instruction execution. Micro-programmed control allows the implementation of
complex architectures more cost-effectively than hardwired control, as the cost of expanding
an instruction set is very small: only a few more micro-instructions for the control
store. Thus, it may be reasoned that moving subroutines like string editing, integer to
floating point number conversion and mathematical evaluations such as polynomial
evaluation to the control unit micro-program is more cost-effective.
Smaller programs are advantageous because they require less RAM space.
However, memory is very inexpensive today, so this potential advantage is no longer
compelling. More importantly, small programs should improve performance. How?
Fewer instructions mean fewer instruction bytes to be fetched.
However, the problem with this reasoning is that it is not certain that a CISC program
will be smaller than the corresponding RISC program. In many cases a CISC program
expressed in symbolic machine language may be smaller, but the number of bits of
machine code may not be noticeably smaller. This may result from the
fact that RISC uses register addressing and a smaller instruction set, both of which require
fewer bits in general. In addition, the compilers on CISCs often favour simpler
instructions, so that the conciseness of complex instructions seldom comes into play.
• The complex instruction is Add C, A, B, having 16-bit addresses and 8-bit data
operands.
• All the operands are direct memory reference operands.
• The machine has 16 registers. Since 2^4 = 16, the size of a register address is 4
bits.
• The machine uses an 8-bit opcode.
Memory-to-Memory                       Register-to-Register

  8    16   16   16                       8      4    16
 Add   C    A    B                       Load    rA   A
                                         Load    rB   B
                                         Add     rC   rA   rB
                                         Store   rC   C

Instruction size (I) = 56 bits         I = 104 bits
Data size (D) = 24 bits                D = 24 bits
Total memory load (M) = 80 bits        M = 128 bits
Memory-to-Memory                       Register-to-Register

  8    16   16   16                       8      4    16
 Add   C    A    B                       Load    rA   A
 Add   A    C    D                       Load    rB   B
 Sub   D    D    B                       Add     rC   rB   rA
                                         Load    rD   D
                                         Add     rA   rC   rD
                                         Sub     rD   rD   rB
                                         Store   rD   D

Instruction size (I) = 168 bits        I = 172 bits
Data size (D) = 72 bits                D = 32 bits
Total memory load (M) = 240 bits       M = 204 bits
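The figures in the two comparisons can be reproduced from the stated field widths (8-bit opcode, 16-bit memory address, 4-bit register address, 8-bit data operand). The sketch below is only a check of that arithmetic; the function names are made up, and it assumes that a register-to-register ALU instruction consists of an opcode and three register fields.

```python
OPCODE, ADDRESS, REGISTER, DATA = 8, 16, 4, 8   # field widths in bits

def memory_to_memory(n_instructions):
    """Each instruction: opcode + three memory addresses, moving three data operands."""
    i = n_instructions * (OPCODE + 3 * ADDRESS)
    d = n_instructions * 3 * DATA
    return i, d, i + d

def register_to_register(n_load_store, n_alu):
    """Load/Store: opcode + register + address; ALU operation: opcode + three registers."""
    i = n_load_store * (OPCODE + REGISTER + ADDRESS) + n_alu * (OPCODE + 3 * REGISTER)
    d = n_load_store * DATA          # only loads and stores move data to or from memory
    return i, d, i + d

print(memory_to_memory(1))           # (56, 24, 80)
print(register_to_register(3, 1))    # (104, 24, 128)
print(memory_to_memory(3))           # (168, 72, 240)
print(register_to_register(4, 3))    # (172, 32, 204)
```

For the single instruction the memory-to-memory form moves fewer bits, but for the three-instruction sequence the register form wins because operands kept in registers are not fetched again.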
However, even though instructions that were closer to high level languages
were implemented in Complex Instruction Set Computers (CISCs), it was still hard to
exploit these instructions, since compilers had to find the cases that
exactly fit those constructs. In addition, the task of optimising the generated code to
minimise code size, reduce instruction execution count, and enhance pipelining is
much more difficult with such a complex instruction set.
Another motivation for increasingly complex instruction sets was that a complex
HLL operation would execute more quickly as a single machine instruction than
as a series of more primitive instructions. However, because of the bias of
programmers towards the use of simpler instructions, it may turn out otherwise. A CISC
needs a more complex control unit, with a larger micro-program control store, to
accommodate a richer instruction set. This increases the execution time of simpler
instructions.
Thus, it is far from clear that the trend to complex instruction sets is appropriate. This
has led a number of groups to pursue the opposite path.
GOTO FEW
Others 1-5%
Observations
b) CISC yields smaller programs than RISC, which improves its performance;
therefore, it is very superior to RISC.
5. Compilers should simplify instructions rather than generate complex instructions.
RISC compilers try to remove as much run-time work as possible at compile time so
that simple instructions can be used. For example, RISC compilers try to keep
operands in registers so that simple register-to-register instructions can be used.
RISC compilers keep operands that will be reused in registers, rather than
repeating a memory access or a calculation. They, therefore, use LOADs and
STOREs to access memory so that operands are not implicitly discarded after
being fetched. (Refer to Figure 1(b)).
• One instruction per cycle: A machine cycle is the time taken to fetch two
operands from registers, perform the ALU operation on them and store the
result in a register. Thus, RISC instruction execution takes about the same time
as the micro-instructions on CISC machines. With such simple instruction
execution rather than micro-instructions, it can use fast logic circuits for the control
unit, thus increasing the execution efficiency further.
Thus, RISC is potentially a very strong architecture. It has high performance potential
and can support VLSI implementation. Let us discuss these points in more detail.
• VLSI Implementation of Control Unit: A major potential benefit of RISC is
the VLSI implementation of the microprocessor. VLSI technology has
reduced the delays in transferring information among CPU components, which
made the single-chip microprocessor possible. The delays across chips are higher than the
delays within a chip; thus, it may be a good idea to have the rare functions built on a
separate chip. RISC chips are designed with this consideration. In general, a
typical microprocessor dedicates about half of its area to the control store in a
micro-programmed control unit. The RISC chip devotes only about 6% of its
area to the control unit. Another related issue is the time taken to design and
implement a processor. A VLSI processor is difficult to develop, as the designer
must perform circuit design, layout, and modeling at the device level. With a
reduced instruction set architecture, such a processor is far easier to build.
In general, register storage is faster than the main memory and the cache. Also,
register addressing uses much shorter addresses than the addresses for main memory
and the cache. However, the number of registers in a machine is small, as generally
the same chip contains the ALU and control unit. Thus, a strategy is needed that will
optimize register use and allow the most frequently accessed operands to be
kept in registers in order to minimize register-memory operations.
On the face of it, the use of a large set of registers should lead to fewer memory
accesses; however, in general about 32 registers were considered optimum. So how
does a large register file further optimize program execution?
Since most operand references are to local variables of a function in C, they are the
obvious choice for storing in registers. Some registers can also be used for global
variables. However, the problem here is that the program follows a function call-return
pattern, so the local variables in use are those of the most recently called function. In
addition, a call-return requires saving the context of the calling program and the return
address. It also requires parameter passing on call. On return from a call, the variables of
the calling program must be restored and the results must be passed back to the calling program.
The RISC register file provides support for such call-returns with the help of register
windows. The register file is broken into multiple small sets of registers, each assigned to
a different function. A function call automatically switches the processor
from one fixed-size window of registers to another, rather than saving registers in
memory as done in CISC. Windows for adjacent procedures are overlapped. This
feature allows parameter passing without moving the variables at all. The following
figure tries to explain this concept:
Assumptions:
Register file contains 138 registers. Let them be called by register number 0 – 137.
The diagram shows the use of registers: when there is call to function A (fA) which
calls function B (fB) and function B calls function C (fC).
Register Nos.             Used for                                          Function A           Function B           Function C
0 – 9                     Global variables required by fA, fB and fC
10 – 83                   Unused
84 – 89 (6 registers)     Parameters of fC that may be passed to next call                                            Temporary variables
90 – 99 (10 registers)    Local variables of fC                                                                       Local variables
100 – 105 (6 registers)   Parameters passed from fB to fC                                        Temporary variables  Parameters
106 – 115 (10 registers)  Local variables of fB                                                  Local variables
116 – 121 (6 registers)   Parameters passed from fA to fB                   Temporary variables  Parameters
122 – 131 (10 registers)  Local variables of fA                             Local variables
132 – 137 (6 registers)   Parameters passed to fA                           Parameters
Figure 3: Use of three Overlapped Register Windows
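The layout of Figure 3 can be expressed as a mapping from the register number a function sees to the physical register in the 138-register file. The logical numbering used below (0–9 global, 10–15 parameters, 16–25 locals, 26–31 temporaries) is an assumption made for this sketch; the figure itself fixes only the physical ranges.

```python
def physical_register(logical, window):
    """Map a logical register (0-31) of the function in 'window'
    (0 = fA, 1 = fB, 2 = fC) to a physical register of Figure 3."""
    if not 0 <= logical < 32:
        raise ValueError("logical register must be in 0..31")
    if logical < 10:
        return logical                              # globals are shared by all windows
    params_start = 132 - 16 * window                # each call moves down by 16 registers
    if logical < 16:
        return params_start + (logical - 10)        # 6 parameter registers
    if logical < 26:
        return params_start - 10 + (logical - 16)   # 10 local registers
    return params_start - 16 + (logical - 26)       # 6 temporary registers

# fA's first temporary and fB's first parameter are the same physical register.
print(physical_register(26, 0), physical_register(10, 1))  # 116 116
```

The overlap shown in the last line is what lets parameters pass from caller to callee without any data movement.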
Please note the functioning of the registers: at any point of time the global registers
and only one window of registers are visible and addressable, as if they were the only set
of registers. Thus, for programming purposes there are only 32 registers, although the
register file in the above example has a total of 138 registers. The visible set consists of
10 global registers, 6 parameter registers, 10 local registers and 6 temporary registers.
But what is the maximum nesting of function calls that can be allowed in RISC? Let us
describe it with the help of a circular buffer diagram; technically, the registers described
above have to be organized as a circular buffer over the call-return hierarchy.
This organization is shown in the following figure. The register buffer is filled as
function A called function B, function B called function C, and function C called function
D. Function D is the current function. The current window pointer (CWP) points
to the register window of the most recent function (function D in this case). Any
register reference by a machine instruction is offset by the contents of this pointer
to determine the actual physical register. On the other hand, the saved window
pointer identifies the window most recently saved in memory. This action will be
needed if a further call is made and there is no space for that call. If function D now
calls function E, the arguments for function E are placed in D’s temporary registers,
indicated by D temp, and the CWP is advanced by one window.
If function E now makes a call to function F, the call cannot be made with the current
status of the buffer unless we free space equivalent to exactly one window. This
condition can easily be detected, as the current window pointer, on being incremented, will be
equal to the saved window pointer. Now we need to create space; how can we do it? The
simplest way is to swap function A’s registers to memory and use that space. Thus, an
N-window register file can support N – 1 levels of function calls.
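The overflow test on the circular buffer can be sketched as a small Python class. The class name, the initial position of the saved window pointer and the six-window size are assumptions for illustration; returns (window underflow) are not modelled.

```python
class WindowFile:
    """Circular buffer of register windows with current and saved window pointers."""

    def __init__(self, n_windows):
        self.n = n_windows
        self.cwp = 0                  # window of the current function (function A)
        self.swp = n_windows - 1      # assumed initial saved-window position
        self.spills = 0               # windows written out to memory so far

    def call(self):
        nxt = (self.cwp + 1) % self.n
        if nxt == self.swp:
            # Overflow: incremented CWP equals SWP, so the oldest resident
            # window is saved to memory to free exactly one window.
            self.swp = (self.swp + 1) % self.n
            self.spills += 1
        self.cwp = nxt

wf = WindowFile(6)
for _ in range(4):       # A calls B, B calls C, C calls D, D calls E
    wf.call()
print(wf.spills)         # 0
wf.call()                # E calls F: no free window left
print(wf.spills)         # 1
```

With six windows, five functions (A to E) are resident before the first spill, which matches the N – 1 figure stated above.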
Thus, the register file, organized in the form as above, is a small fast register buffer
that holds most of the variables that are likely to be used heavily. From this point of
view the register file acts almost like a cache memory.
All but one point above basically show comparative equality. The basic difference is
due to addressing overhead of the two approaches.
The following figure shows the difference. The small register (R) address is added to the
current window pointer W#. This generates the address in the register file, which is
decoded by a decoder for register access. On the other hand, a cache reference is
generated from a long memory address, which first goes through comparison logic to
ascertain the presence of the data, and if the data is present it goes through the select
circuit. Thus, for access to simple variables the register file is superior to cache memory.
However, even in a RISC computer, performance can be enhanced by the addition of an
instruction cache.
a. RISC has a large register file so that more variables can be stored in registers
for longer periods of time.
d. Cache is superior to a large register file as it stores most recently used local
scalars.
CISCs provide better support for high-level languages as they include high-level
language constructs such as CASE, CALL etc.
Yes CISC architecture tries to narrow the gap between assembly and High Level
Language (HLL); however, this support comes at a cost. In fact the support can be
measured as the inverse of the costs of using typical HLL constructs on a particular
machine. If the architect provides a feature that looks like the HLL construct but runs
slowly, or has many options, the compiler writer may omit the feature, or even, the
HLL programmer may avoid the construct, as it is slow and cumbersome. Thus, the
comment above does not hold.
The studies have shown that it is not so due to the following reasons:
If an instruction can be executed in more ways than one, then more cases must be
considered. The compiler writer then needs to balance the speed of the compiler against the
quality of the generated code. In CISCs, compilers need to analyze the potential usage of all
available instructions, which is time consuming. Thus, it is recommended that there be at
least one good way to do something. In RISC, there are few choices; for example, if an
operand is in memory it must first be loaded into a register. Thus, RISC requires
simple case analysis, which means a simpler compiler, although more machine
instructions will be generated in each case.
RISC is tailored for C language and will not work well with other high level
languages.
But the studies of other high level languages found that the most frequently executed
operations in other languages are also the same as simple HLL constructs found in C,
for which RISC has been optimized. Unless a HLL changes the paradigm of
programming we will get similar result.
The good performance is due to the overlapped register windows; the reduced
instruction set has nothing to do with it.
Certainly, a major portion of the speed is due to the overlapped register windows of
the RISC that provide support for function calls. However, please note that these register
windows are possible because of the reduction in control unit size from 50 to 6 per cent of
the chip area. In addition, the control is simpler in RISC than in CISC, thus further helping
the simple instructions to execute faster.
In general, the memory access in RISC is performed through LOAD and STORE
operations. For such instructions the following steps may be needed:
F: Instruction Fetch to get the instruction
E: Effective address calculation for the desired memory operand
D: Memory to register or register to memory data transfer through bus.
Let us explain pipelining in RISC with an example program execution sample. Take
the following program (R indicates register).
Time                  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
Load RA ← M(A)        F  E  D
Load RB ← M(B)                 F  E  D
Add RC ← RA + RB                        F  E
Sub RD ← RA - RB                              F  E
Mul RE ← RC × RD                                    F  E
Stor RE → M(C)                                            F  E  D
Return                                                             F  E
Total time = 17 units
Figure 6: Sequential Execution of Instructions
Figure 7 shows a simple pipelining scheme, in which F and E phases of two different
instructions are performed simultaneously. This scheme speeds up the execution rate
of the sequential scheme.
Time                  1  2  3  4  5  6  7  8  9 10 11
Load RA ← M(A)        F  E  D
Load RB ← M(B)           F     E  D
Add RC ← RA + RB               F     E
Sub RD ← RA - RB                     F  E
Mul RE ← RC × RD                        F  E
Stor RE → M(C)                             F  E  D
Return                                        F     E
Total time = 11 units
Please note that the pipeline above is not running at its full capacity. This is because
of the following problems:
• We are assuming a single port memory, thus only one memory access is allowed at
  a time. Fetch and Data transfer operations therefore cannot occur at the same
  time, which is why you may notice blanks in time slots 3, 5 etc.
• The last instruction is an unconditional jump. Please note that after this instruction
  the next instruction of the calling program will be executed. Although not visible
  in this example, a branch instruction interrupts the sequential flow of instruction
  execution, thus causing inefficiencies in the pipelined execution.
This pipeline can simply be improved by allowing two memory accesses at a time.
Instruction            Phases
Load RA ← M(A)         F E D
Load RB ← M(B)         F E D
Add  RC ← RA + RB      F E
Sub  RD ← RA - RB      F E
Mul  RE ← RC × RD      F E
Stor RE → M(C)         F E D
Return                 F E
Total time = 8 units
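The totals quoted for the sequential scheme (17 units) and for the scheme with two memory accesses at a time (8 units) can be checked with a small Python sketch. It assumes an idealised model that is not spelled out in the text: in the overlapped case one new instruction is fetched in every time unit, and data dependencies are ignored. The function names are hypothetical.

```python
# Phases of each instruction, taken from the example program above.
PROGRAM = [
    ("Load RA <- M(A)", "FED"),
    ("Load RB <- M(B)", "FED"),
    ("Add  RC <- RA + RB", "FE"),
    ("Sub  RD <- RA - RB", "FE"),
    ("Mul  RE <- RC x RD", "FE"),
    ("Stor RE -> M(C)", "FED"),
    ("Return", "FE"),
]


def sequential_time(program):
    """Each instruction starts only after the previous one has finished."""
    return sum(len(phases) for _, phases in program)


def overlapped_time(program):
    """Assumed ideal case: instruction i is fetched in time unit i + 1 and
    memory can serve a fetch and a data transfer in the same time unit."""
    return max(start + len(phases)
               for start, (_, phases) in enumerate(program))


print("Sequential:", sequential_time(PROGRAM), "units")   # 17
print("Overlapped:", overlapped_time(PROGRAM), "units")   # 8
```

The single port memory case (11 units) is deliberately left out, since its exact slot assignment depends on how fetches are delayed behind data transfers.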
Optimization of Pipelining
RISC machines can employ a very efficient pipeline scheme because of the simple
and regular instructions. Like all other instruction pipelines, the RISC pipeline
suffers from the problems of data dependencies and branching instructions. RISC
addresses the branching problem by using a technique called delayed branching.
One of the common techniques used to avoid branch penalty is to pre-fetch the branch
destination also. RISC follows a branch optimization technique called delayed jump
as shown in the example given below:
Load RA ← M(A)         F E D
Load RB ← M(B)         F E D
Add  RC ← RA + RB      F E
Sub  RD ← RA - RB      F E
If RD < 0 Return       F E
Stor RC → M(C)         F E D
Return                 F E
(a) The instruction “If RD < 0 Return” may cause pipeline to empty
Load RA ← M(A)         F E D
Load RB ← M(B)         F E D
Add  RC ← RA + RB      F E
Sub  RD ← RA - RB      F E
If RD < 0 Return       F E
NO Operation           F E
Stor RC → M(C)         F E D
  or Return, as the case may be
Return                 F E
(b) The No Operation instruction allows the decision of the If instruction to be
known, thus the correct instruction can be fetched.
Load RA ← M(A)         F E D
Load RB ← M(B)         F E D
Sub  RD ← RA - RB      F E
If RD < 0 Return       F E
Add  RC ← RA + RB      F E
Stor RC → M(C)         F E D
Return                 F E
(c) The branch is calculated before, thus the pipeline need not be emptied. This is
delayed branch.
Finally, let us summarize the basic differences between CISC and RISC architecture.
The following table lists these differences:
   CISC                                     RISC
1. Large number of instructions, from       Relatively fewer instructions, less
   120 to 350.                              than 100.
2. Employs a variety of data types and a    Relatively fewer addressing modes.
   large number of addressing modes.
3. Variable-length instruction formats.     Fixed-length instructions, usually 32
                                            bits; easy to decode instruction format.
4. Instructions manipulate operands         Mostly register-register operations.
   residing in memory.                      The only memory access is through
                                            explicit LOAD/STORE instructions.
5. Number of Cycles Per Instruction         Number of CPI is one as it uses
   (CPI) varies from 1-20 depending upon    pipelining. Pipeline in RISC is
   the instruction.                         optimised because of simple
                                            instructions and instruction formats.
6. Number of GPRs varies from 8-32. But     Large number of GPRs are available
   no support is available for the          that are primarily used as Global
   parameter passing and function calls.    registers and as a register based
                                            procedural call and parameter passing
                                            stack, thus, optimised for structured
                                            programming.
7. Microprogrammed Control Unit.            Hardwired Control Unit.
3. What are the problems of RISC architecture? How are these problems
compensated such that there is no reduction in performance?
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
5.7 SUMMARY
RISC represents new styles of computers that take less time to build yet provide a
higher performance. While traditional machines support HLLs with instructions that
look like HLL constructs, this machine supports the use of HLLs with instructions that
HLL compilers can use efficiently. The loss of complexity has not reduced RISC’s
functionality; the chosen subset, especially when combined with the register window
scheme, emulates more complex machines. It also appears that we can build such a
single chip computer much sooner and with much less effort than traditional
architectures.
Thus, we see that because of all the features discussed above, the RISC architecture
should prove to be far superior to even the most complex CISC architecture.
In this unit we have also covered the details of the pipelined features of the RISC
architecture, which further strengthen our arguments for the support of this
architecture.
2.
a) False
b) False
c) False
Incoming Parameter   Outgoing Parameter   No. of Local
Registers            Registers            Registers
1                    1                    22
2                    2                    20
3                    3                    18
4                    4                    16
5                    5                    14
6                    6                    12
7                    7                    10
8                    8                    8
9                    9                    6
10                   10                   4
11                   11                   2
12                   12                   0
• It has a single port memory reducing the access to one device at a time
• Branch instruction
• The data dependencies between the instructions
UNIT 1 MICROPROCESSOR
ARCHITECTURE
Structure Page No.
1.0 Introduction 5
1.1 Objectives 5
1.2 Microcomputer Architecture 5
1.3 Structure of 8086 CPU 7
1.3.1 The Bus Interface Unit
1.3.2 Execution Unit (EU)
1.4 Register Set of 8086 11
1.5 Instruction Set of 8086 13
1.5.1 Data Transfer Instructions
1.5.2 Arithmetic Instructions
1.5.3 Bit Manipulation Instructions
1.5.4 Program Execution Transfer Instructions
1.5.5 String Instructions
1.5.6 Processor Control Instructions
1.6 Addressing Modes 29
1.6.1 Register Addressing Mode
1.6.2 Immediate Addressing Mode
1.6.3 Direct Addressing Mode
1.6.4 Indirect Addressing Mode
1.7 Summary 33
1.8 Solutions/Answers 33
1.0 INTRODUCTION
In the previous blocks of this course, we have discussed concepts relating to CPU
organization, register set, instruction set and addressing modes with a few examples.
Let us look at one microprocessor architecture with regard to all the above concepts.
We have selected one of the simplest processors, the 8086, for this purpose. Although
the processor technology is old, all the concepts are valid for higher end Intel
processors. Therefore, in this unit, we will discuss the 8086 microprocessor in some
detail.
1.1 OBJECTIVES
After going through this unit, you should be able to:
• describe the features of the 8086 microprocessor;
• list various components of the 8086 microprocessor; and
• identify the instruction set and the addressing modes of the 8086 microprocessor.
Bus Sizes
1. The address bus: the 8085 microprocessor has 16 address lines. Thus, it can access
up to 2^16 = 64K Bytes. The 8086 microprocessor has a 20-bit address bus. Thus, it
can access up to 2^20 = 1M Byte of RAM directly.
2. The data bus width is the number of bits that can be transferred simultaneously.
It is 16 bits in 8086.
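The relation between the number of address lines and the directly addressable memory can be verified with a short Python sketch. The function name is hypothetical and used only for illustration.

```python
def addressable_bytes(address_lines):
    """Number of distinct byte addresses reachable with the given lines."""
    return 2 ** address_lines


for name, lines in (("8085", 16), ("8086", 20)):
    size = addressable_bytes(lines)
    print(f"{name}: {lines} address lines -> {size} bytes = {size // 1024}K")
```

For the 8086 this gives 1048576 bytes, that is 1024K or 1M Byte.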
Microprocessors
The microprocessor is a complete CPU on a single chip. The main advantages of the
microprocessor are:
• More throughput
• More addressing capability
• Powerful addressing modes
• Powerful instruction set
• Faster operation through pipelining
• Virtual memory management.
However, RISC machines do not agree with the above principles.
The assembly language for more advanced chips subsumes the simplest 8086/8088
assembly language. Therefore, we will confine our discussions to Intel 8086/8088
assembly language. You must refer to the further readings for more details on
assembly language of Pentium, G4 and other processors.
while (1)
{
    fetch (instruction);
    execute (using data);
}
The word independent implies that these two units can function parallel to each other.
In other words they may be considered as two stages of the instruction pipeline.
The BIU (Bus Interface Unit) primarily interacts with the system bus. It performs
almost all the activities relating to fetch cycle such as:
• Reading or writing data from or to memory or an I/O port.
The instruction/ data is then passed to the execution unit. This BIU consists of:
The instruction queue is used to store the instruction “bytes” fetched. Please
note two points here: it is (1) byte-oriented and (2) a queue. That is, it stores
information in byte form, using the underlying queue data structure. The
queue is advantageous only if the next expected instructions are fetched in
advance, thus allowing a pipeline of fetch and execute cycles.
These are very important registers of the CPU. Why? We will answer this later.
In the 8086 microprocessor, the memory is byte organized, that is, a memory
address is a byte address. However, the number of bits fetched is 16 at a time. The
segment registers are used to calculate the address of memory location along
with other registers. A segment register is 16 bits long.
The BIU contains four sixteen-bit registers, viz., the CS: Code Segment, the DS:
Data Segment, the SS: Stack Segment, and the ES: Extra Segment. But what is
the need of the segments? Segments logically divide a program into logical
entities of Code, Data and Stack, each having a size of up to 64 K. The
segment register holds the upper 16 bits of the starting address of a logical
group of memory, called the segment. But what are the advantages of using
segments? The main advantages of using segments are:
• The addresses that need to be used in programs are relocatable as they are
  the offsets. Thus, the segmentation supports relocatability.
• Although the size of an address is 20 bits, only the offset within the
  segment, that is 16 bits, needs to be kept in the instruction, thus reducing
  instruction length.
Although the size of each segment can be 64K, as they are overlapping segments we
can create variable size of segments, with maximum as 64K. Each segment has a
specific function. 8086 supports the following segments:
As per model of assembly program, it can have more than one of any type of
segments. However, at a time only four segments one of each type, can be active.
The 8086 supports 20 address lines, thus supports 20 bit addresses. However, all the
registers including segment registers are of only 16 bits. So how may this mapping of
20 bits to 16 bits be performed?
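The 16-bit value in a segment register is treated as the upper 16 bits of the 20-bit
segment base address, with an implied zero as the lowest hex digit. You can add an
offset of 16 bits (4 Hex digits) from 0000h to FFFFh to it. Thus, a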
typical segment which starts at a physical address 10000h will range from 10000h to
1FFFFh. The segment register for this segment will contain 1000H and offset will
range from 0000h to FFFFh. But, how will the segment address and offset be added to
calculate physical address? Let us explain using the following examples:
Example 1 (In the Figure above)
The value of the stack segment register (SS) = 6000h
The value of the stack pointer (SP) which is Offset = 0010h
SS                    6 0 0 0 0   (implied zero)
SP                  +   0 0 1 0
Physical Address      6 0 0 1 0
Example 2
The offset of the data byte = 0020h
The value of the data segment register (DS) = 3000h
Physical address of the data byte
DS                    3 0 0 0 0   (implied zero)
Offset              +   0 0 2 0
Physical Address      3 0 0 2 0
This calculation can be expressed as physical address = DS (Hex) × 16 + Data byte
offset (hex).
Example 3
The value of the Instruction Pointer, holding address of the instruction = 1234h
The value of the code segment register (CS) = 448Ah
Physical address of the instruction
CS                    4 4 8 A 0   (implied zero)
IP                  +   1 2 3 4
Physical Address      4 5 A D 4
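The three examples above follow the same rule: physical address = segment × 16 + offset. A minimal Python sketch of this rule is given below. The function name is hypothetical, and the masking to 20 bits is an added assumption that models the 20 address lines of the 8086.

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment value and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # Shifting left by 4 bits appends the implied zero hex digit (x 16).
    return ((segment << 4) + offset) & 0xFFFFF


examples = [("SS:SP", 0x6000, 0x0010),
            ("DS:offset", 0x3000, 0x0020),
            ("CS:IP", 0x448A, 0x1234)]
for name, seg, off in examples:
    print(f"{name}  {seg:04X}h:{off:04X}h -> {physical_address(seg, off):05X}h")
```

Running it prints 60010h, 30020h and 45AD4h for the three examples.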
1.3.2 Execution Unit (EU)
Execution unit performs all the ALU operations. The execution unit of 8086 is of 16
bits. It also contains the control unit, which instructs bus interface unit about which
memory location to access, and what to do with the data. Control unit also performs
decoding and execution of the instructions. The EU consists of the following:
(a) Control Circuitry, Instruction Decoder and ALU
The 8086 control unit is primarily micro-programmed control. In addition it has an
instruction decoder, which translates an instruction into sequence of micro operations.
The ALU performs the required operations under the control of CU which issues the
necessary timing and control sequences.
(b) Registers
All CPUs have a defined number of operational registers. 8086 has several general
purpose and special purpose registers. We will discuss these registers in the following
sections.
AX register is also known as accumulator. Some of the instructions like divide, rotate,
shift etc. require one of the operands to be available in the accumulator. Thus, in such
instructions, the value of AX should be suitably set prior to the instruction.
BX register is mainly used as a base register. It contains the starting base location of a
memory region within a data segment.
You will experience their usage in various assembly programs discussed later.
Segment Registers
Segment Registers are used for calculating the physical address of the instruction or
memory. Segment registers cannot be used as byte registers.
Flags Register
A flag represents a condition code that is 0 or 1. Thus, it can be represented using a
flip-flop. 8086 employs a 16-bit flag register containing nine flags. The following
table shows the flags of 8086.
(c) The Source Index (SI) and Destination Index (DI) registers in 8086 can also be
used as general registers.
Please note that NEXT is the label field. It is giving an identity to the statement. It is
an optional field, and is used when an instruction is to be executed again through a
LOOP or GO TO. ADD is symbolic op-code, for addition operation. AL and BL are
the two operands of the instructions. Please note that the number of operands is
dependent upon the instructions. 8086 instructions can have zero, one or two
operands. An operand in 8086 can be:
1. A register
2. A memory location
3. A constant called literal
4. A label.
Comments in 8086 assembly start with a semicolon, and end with a new line. A long
comment can be extended to more than one line by putting a semicolon at the
beginning of each line. Comments are purely optional; however, they are recommended
as they provide program documentation. In the next few sections we look at the
instruction set
of the 8086 microprocessor. These instructions are grouped according to their
functionality.
masking off the upper nibble of each byte. Then AAD instruction is used to
convert the unpacked BCD digits in AL and AH registers to adjust them to
equivalent binary prior to division. Such division will result in unpacked
BCD quotient and remainder. The PF, SF, ZF flags are updated, while the
AF, CF, and the OF flags are left undefined.
Example:
            ; and 7 CH = 09h
    AAD     ; adjust to binary before division
            ; AX = 0043h = 67 Decimal
    DIV CH  ; Divide AX by unpacked BCD in CH
            ; AL = 07 unpacked BCD
            ; AH = 04 unpacked BCD
            ; PF = SF = ZF = 0
CBW   Fill upper byte of word with copies of sign bit of lower byte. This is called
      sign extension of byte to word. This instruction does not change any
      flags. This operation is done with AL register, the result being stored in
      AX.
Example:
            ; AL = 10011011 = -101 decimal, AH = 00000000
    CBW     ; convert signed byte in AL to signed word in
            ; AX = 11111111 10011011 = -101 decimal
CWD   Fill upper word of double word with sign bit of lower word. This
      instruction is an extension of the previous instruction. This instruction
      results in sign extension of AX register to DX:AX double word.
Example:
            ; DX : 0000 0000 0000 0000
            ; AX : 1111 0000 0101 0001
    CWD
            ; DX:AX = 1111 1111 1111 1111 : 1111 0000 0101 0001
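Sign extension as performed by CBW and CWD can be illustrated with a short Python sketch. The function names simply mirror the mnemonics and are otherwise hypothetical; registers are modelled as plain integers.

```python
def cbw(al):
    """Sign-extend the byte in AL to a word: returns the new AX value."""
    ah = 0xFF if al & 0x80 else 0x00    # copy the sign bit of AL into AH
    return (ah << 8) | al


def cwd(ax):
    """Sign-extend the word in AX to a double word: returns (DX, AX)."""
    dx = 0xFFFF if ax & 0x8000 else 0x0000
    return dx, ax


def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a two's complement number."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


ax = cbw(0b10011011)
print(f"AX = {ax:016b} = {to_signed(ax, 16)} decimal")
dx, ax = cwd(0b1111000001010001)
print(f"DX:AX = {dx:016b}:{ax:016b}")
```

The byte 10011011 is -101 as a signed value, and the extended word 11111111 10011011 represents the same number.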
(d) TEST instruction performs an OR operation, but does not change the value
of operands.
(e) Suppose AL contains 0110 0101 and CF is set, then instructions ROL AL
and RCL AL will produce the same results.
1.5.4 Program Execution Transfer Instructions
These instructions are the ones that cause a change in the sequence of execution of
instructions. This change can be through a condition or sometimes may be
unconditional. The conditions are represented by flags. For example, an instruction
may jump to an address if the zero flag is set, that is, the last ALU operation has
resulted in a zero value. These instructions are often used after a compare instruction,
or some arithmetic instructions that are used to set the flags, for example, ADD or SUB.
LOOP is also a conditional branch instruction: it decrements the CX register and the
branch is taken as long as CX has not reached zero.
Please note that a "/" is used to separate two mnemonics which represent the same
instruction.
decremented to zero, which means all the elements of the array are equal to 0FFh, or
an element in the array is found which is not equal to 0FFh. In this case, the CX
register may still be greater than zero, when the control comes out. This can be coded
as follows: (Please note here that you might not understand everything at this place,
that is because you are still not familiar with the various addressing modes. Just
concentrate on the LOOPE instruction):
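The two exit conditions described above can be traced with a small Python simulation. This is a sketch and not 8086 code: the function name is hypothetical, the array is assumed to be non-empty, and CX, BX and ZF are modelled as ordinary variables. LOOPE is modelled as "decrement CX, then repeat only if CX is not zero and ZF is set".

```python
def scan_while_equal(array, value=0xFF):
    """Compare elements with `value` until a mismatch occurs or CX reaches 0.

    Returns (bx, cx, zf): index of the last element compared, remaining
    count, and the zero flag left by the last comparison.
    """
    cx = len(array)          # loop counter, as loaded into CX
    bx = -1                  # index register, incremented before each compare
    while True:
        bx += 1
        zf = (array[bx] == value)    # CMP sets ZF when the operands are equal
        cx -= 1                      # LOOPE first decrements CX ...
        if not (cx != 0 and zf):     # ... and loops only if CX != 0 and ZF = 1
            break
    return bx, cx, zf


print(scan_while_equal([0xFF, 0xFF, 0x10, 0xFF]))  # mismatch found, CX > 0
print(scan_while_equal([0xFF, 0xFF, 0xFF]))        # all equal, CX = 0
```

In the first call the loop stops at index 2 with CX still 1; in the second CX reaches zero with ZF set.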
In addition to these instructions, there are other interrupt handling instructions also,
which too transfer the control of the program to some specified location. We will
discuss these instructions in later units.
The 8086 only allows you to control certain control flags that steer the processing
in a certain direction, and to synchronise processors, if more than one processor is
attached, through the LOCK instruction for buses etc.
Note: Please note that these instructions may not be very clear to you right now. Thus,
some of these instructions have been discussed in more detail in later units. You must
refer to further readings for more details on these instructions.
operation occurs in the reverse direction.
Example:
                    ; Clear the direction flag so that the string
                    ; pointers auto-increment.
    MOV AX, 1000h
    MOV DS, AX      ; Initialize data segment
    MOV ES, AX      ; and extra segment
    MOV SI, 20h     ; Load offset of start of source string to SI
    MOV DI, 30h     ; Load offset of start of destination string to DI
    MOV CX, 10      ; Load length of string to CX as counter
    REP MOVSB       ; Decrement CX and increment SI and DI to point to
                    ; next byte, then MOVSB until CX = 0
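The effect of REP MOVSB with the register values used above can be traced in Python. This is only a simulation sketch under stated assumptions: memory is modelled as a 1M Byte bytearray, the direction flag is assumed to be clear (auto-increment), and the function names and the sample string are hypothetical.

```python
def physical_address(segment, offset):
    """20-bit physical address = segment x 16 + offset."""
    return ((segment << 4) + offset) & 0xFFFFF


def rep_movsb(memory, ds, es, si, di, cx):
    """Copy CX bytes from DS:SI to ES:DI, assuming the direction flag is clear."""
    while cx != 0:
        memory[physical_address(es, di)] = memory[physical_address(ds, si)]
        si = (si + 1) & 0xFFFF   # auto-increment source pointer
        di = (di + 1) & 0xFFFF   # auto-increment destination pointer
        cx -= 1                  # REP decrements CX after each byte
    return si, di, cx


memory = bytearray(1 << 20)                      # 1M Byte address space
source = physical_address(0x1000, 0x20)
memory[source:source + 10] = b"0123456789"       # sample source string
si, di, cx = rep_movsb(memory, ds=0x1000, es=0x1000, si=0x20, di=0x30, cx=10)
destination = physical_address(0x1000, 0x30)
print(memory[destination:destination + 10], hex(si), hex(di), cx)
```

After the copy SI and DI point just past the two strings and CX is zero.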
There are many processor control instructions other than these; you may please refer
to further reading for such instructions. These include instructions for setting
and clearing the interrupt flag, halting the computer, LOCK (locking the bus), NOP etc.
default will be assumed to be in data segment, while LABEL 1, will be assumed to be
in code segment. If we specify, as a direct operand then the address is
non-relocatable. Please note the value of segment register will be known only at the
run time.
    ; The offsets of these variables are calculated with respect to the
    ; segment name (register) specified in the instruction.
2. Conditional jump instructions require one of the flags to be tested.
4. In the instruction MOV BX, DX register addressing mode has been used.
6. In the instruction ADD CX, [DI] [BX] the second operand is a based index
operand, whose effective address is obtained by adding the contents of DI and BX
registers.
1.7 SUMMARY
In this unit, we have studied one of the most popular series of microprocessors, viz.,
Intel 8086. It serves as a base to all its successors: 8088, 80186, 80286, 80486 and
Pentium. Programs written for the 8086 can be run directly on any of its successors.
Therefore, though the 8086 has become obsolete from the market point of view, it is
still needed to understand advanced microprocessors.
You can refer to further readings for obtaining more details on INTEL and Motorola
series of microprocessors.
1.8 SOLUTIONS/ANSWERS
Check Your Progress 1
1. It improves execution efficiency by storing the next instruction in the register
queue.
2. (a) False
(b) False
(c) True
(d) False
(e) False
UNIT 2 INTRODUCTION TO ASSEMBLY
LANGUAGE PROGRAMMING
Structure Page No.
2.0 Introduction 35
2.1 Objectives 35
2.2 The Need and Use of the Assembly Language 35
2.3 Assembly Program Execution 36
2.4 An Assembly Program and its Components 41
2.4.1 The Program Annotation
2.4.2 Directives
2.5 Input Output in Assembly Program 45
2.5.1 Interrupts
2.5.2 DOS Function Calls (Using INT 21H)
2.6 The Types of Assembly Programs 51
2.6.1 COM Programs
2.6.2 EXE Programs
2.7 How to Write Good Assembly Programs 53
2.8 Summary 55
2.9 Solutions/Answers 56
2.10 Further Readings 56
2.0 INTRODUCTION
In the previous unit, we have discussed the 8086 microprocessor. We have discussed
the register set, instruction set and addressing modes for this microprocessor. In this
and two later units we will discuss the assembly language for 8086/8088
microprocessor. Unit 1 is the basic building block, which will help in better
understanding of the assembly language. In this unit, we will discuss the importance
of assembly language, basic components of an assembly program followed by
discussions on the program developmental tools available. We will then discuss what
are COM programs and EXE programs. Finally we will present a complete example.
For all our discussions, we have used Microsoft Assembler (MASM). However, for
different assemblers the assembly language directives may change. Therefore, before
running an assembly program you must consult the reference manuals of the
assembler you are using.
2.1 OBJECTIVES
After going through this unit you should be able to:
• It greatly depends on machine and is difficult for most people to write in 0-1
  forms.
• DEBUGGING is difficult.
• Deciphering the machine code is very difficult. Thus program logic will be
  difficult to understand.
• Assembly Language provides more control over handling particular hardware and
  software, as it allows you to study the instructions set, addressing modes,
  interrupts etc.
• Assembly Programming generates smaller, more compact executable modules: as
  the programs are closer to machine, you may be able to write highly optimised
  programs. This results in faster execution of programs.
Assembly language programs are at least 30% denser than the same programs written
in high-level language. The reason for this is that as of today the compilers produce a
long list of code for every instruction as compared to assembly language, which
produces single line of code for a single instruction. This will be true especially in
case of string related programs.
On the other hand assembly language is machine dependent. Each microprocessor has
its own set of instructions. Thus, assembly programs are not portable.
Assembly language has very few restrictions or rules; nearly everything is left to the
discretion of the programmer. This gives lots of freedom to programmers in
construction of their system.
1) Manual assembly
2) By using an assembler.
Manual Assembly
It was an old method that required the programmer to translate each opcode into its
numerical machine language representation by looking up a table of the
microprocessor instructions set, which contains both assembly and machine language
instructions. Manual assembly is acceptable for short programs but becomes very
inconvenient for large programs. The Intel SDK-85 and most of the earlier university
kits were programmed using manual assembly.
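Manual assembly is essentially a table lookup, which can be sketched in Python as follows. The table holds a few single-byte 8086 instructions without operands; the function name is hypothetical, and real manual assembly must also encode operands and addresses, which this sketch leaves out.

```python
# Lookup table: mnemonic -> one-byte 8086 opcode (operand-less instructions).
OPCODES = {
    "NOP": 0x90, "HLT": 0xF4,
    "CLC": 0xF8, "STC": 0xF9,
    "CLI": 0xFA, "STI": 0xFB,
    "CLD": 0xFC, "STD": 0xFD,
}


def hand_assemble(mnemonics):
    """Translate each mnemonic to its machine code byte by table lookup."""
    return bytes(OPCODES[m.upper()] for m in mnemonics)


code = hand_assemble(["CLD", "NOP", "HLT"])
print(" ".join(f"{byte:02X}" for byte in code))   # FC 90 F4
```

Even for this tiny subset the lookup is mechanical, which is why it is delegated to an assembler for larger programs.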
Using an Assembler
The symbolic instructions that you code in assembly language are known as the source
program.
An assembler program translates the source program into machine code, which is
known as the object program.
Mnemonic Program → Assembler → Machine Instructions
Step 2: The link step involves converting the .OBJ module to an .EXE machine code
module. The linker’s tasks include completing any address left open by the
assembler and combining separately assembled programs into one executable
module.
The linker:
Step 3: The last step is to load the program for execution. Because the loader knows
where the program is going to load in memory, it is now able to resolve any
remaining address still left incomplete in the header. The loader drops the
header and creates a program segment prefix (PSP) immediately before the
program is loaded in memory.
Pass 1: Assembler reads the entire source program and constructs a symbol table of
names and labels used in the program, that is, name of data fields and programs labels
and their relative location (offset) within the segment.
Pass 1 determines the amount of code to be generated for each instruction.
Pass 2: The assembler uses the symbol table that it constructed in Pass 1. Now that
it knows the length and relative position of each data field and instruction, it can
complete the object code for each instruction. It produces .OBJ (Object file), .LST
(list file) and cross reference (.CRF) files.
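The division of work between the two passes can be sketched in Python with a toy assembler. Everything in this sketch is hypothetical: the three-instruction set, its opcodes and lengths are invented for illustration and are not 8086 encodings. The point is only that Pass 1 fixes the offset of every label, so that Pass 2 can resolve a forward reference such as JMP END.

```python
# Hypothetical toy instruction set: mnemonic -> (opcode, length in bytes).
TOY_ISA = {"NOP": (0x01, 1), "JMP": (0x02, 2), "HALT": (0x03, 1)}


def pass1(lines):
    """Build the symbol table: label -> offset within the segment."""
    symbols, offset = {}, 0
    for line in lines:
        label, _, rest = line.rpartition(":")
        if label:
            symbols[label.strip()] = offset
        if rest.strip():
            offset += TOY_ISA[rest.split()[0]][1]   # length of this instruction
    return symbols


def pass2(lines, symbols):
    """Generate object code, replacing label operands by their offsets."""
    code = bytearray()
    for line in lines:
        fields = line.rpartition(":")[2].split()
        if not fields:
            continue
        opcode, length = TOY_ISA[fields[0]]
        code.append(opcode)
        if length == 2:                   # instruction carries a label operand
            code.append(symbols[fields[1]])
    return bytes(code)


source = ["START: NOP", "JMP END", "NOP", "END: HALT"]
symbols = pass1(source)
object_code = pass2(source, symbols)
print(symbols, object_code.hex())
```

Here END is used before it is defined, which a single pass could not resolve without going back to patch the earlier instruction.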
Editor
The editor is a program that allows the user to enter, modify, and store a group of
instructions or text under a file name. The editor programs can be classified in 2
groups.
! Line editors
! Full screen editors.
Line editors, such as EDLIN in MS-DOS, work with and manage one line at a time. Full
screen editors, such as Notepad, Wordpad etc. manage the full screen or a paragraph
at a time. To write text, the user must call the editor under the control of the operating
system. As soon as the editor program is transferred from the disk to the system
memory, the program control is transferred from the operating system to the editor
program. The editor has its own command and the user can enter and modify text by
using those commands. Some editor programs such as WordPerfect are very easy to
use. At the completion of writing a program, the exit command of the editor program
will save the program on the disk under the file name and will transfer the control to
the operating system. If the source file is intended to be a program in the 8086
assembly language the user should follow the syntax of the assembly language and the
rules of the assembler.
Assembler
An assembler is used to translate assembly language mnemonics to the binary
code for each instruction. After the complete program has been written with the help
of an editor, it is assembled with the help of an assembler.
An assembler works in 2 passes, i.e., it reads your source code two times. In the first
pass the assembler collects all the symbols defined in the program, along with their
offsets, in a symbol table. On the second pass through the source program, it produces
binary code for each instruction of the program, and gives all the symbols an offset
with respect to the segment from the symbol table.
The assembler generates three files. The object file, the list file and cross reference
file. The object file contains the binary code for each instruction in the program. It is
created only when your program has been successfully assembled with no errors. The
errors that are detected by the assembler are called the symbol errors. For example,
List file is optional and contains the source code, the binary equivalent of each
instruction, and the offsets of the symbols in the program. This file is purely for
documentation purposes. Some of the assemblers available on PC are MASM,
TURBO etc.
Linker
For modularity of your programs, it is better to break your program into several
subroutines. It is even better to put the common routines, like reading a hexadecimal
number, writing a hexadecimal number, etc., which could be used by a lot of your other
programs, into a separate file. These files are assembled separately. After each file
has been successfully assembled, they can be linked together to form a large file,
which constitutes your complete program. The file containing the common routines
can be linked to your other programs also. The program that links your program is
called the linker.
The linker produces a link file, which contains the binary code for all the combined
modules. The linker also produces link maps, which contain the address information
about the linked files. The linker, however, does not assign absolute addresses to your
program. It only assigns continuous relative addresses to all the modules linked,
starting from zero. This form of a program is said to be relocatable because it can be
put anywhere in memory to be run.
Loader
Loader is a program which assigns absolute addresses to the program. These
addresses are generated by adding the address from where the program is loaded into
the memory to all the offsets. Loader comes into action when you want to execute
your program. This program is brought from the secondary memory like disk. The
file name extension for loading is .exe or .com, which after loading can be executed
by the CPU.
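The step performed by the loader, as described above, amounts to adding the load address to every relative offset. A minimal Python sketch follows; the module names, their offsets and the load address are hypothetical values chosen only for illustration.

```python
def load(link_map, load_address):
    """Turn the linker's relative offsets into absolute addresses."""
    return {name: load_address + offset for name, offset in link_map.items()}


# Hypothetical link map: relative addresses assigned by the linker from zero.
link_map = {"MAIN": 0x0000, "READ_HEX": 0x0040, "WRITE_HEX": 0x0090}

for name, address in load(link_map, 0x2000).items():
    print(f"{name:10s} {address:05X}h")
```

Loading the same link map at a different address changes every absolute address by the same amount, which is what makes the linked program relocatable.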
Debugger
The debugger is a program that allows the user to test and debug the object file. The
user can employ this program to perform the following functions.
Errors
Two possible kinds of errors can occur in assembly programs:
a. Programming errors: They are the familiar errors you can encounter in the course
of executing a program written in any language.
b. System errors: These are unique to assembly language that permit low-level
operations. A system error is one that corrupts or destroys the system under
which the program is running. In assembly language there is no supervising
interpreter or compiler to prevent a program from erasing itself or even from
erasing the computer operating system.
• The assembler assigns line numbers to the statements in the source file
  sequentially. If the assembler issues an error message, the message will contain a
  reference to one of these line numbers.
• The second column from the left contains offsets. Each offset indicates the
  address of an instruction or a datum as an offset from the base of its logical
  segment, e.g., the statement at line 0010 produces machine language at offset
  0000H of the CODE SEGMENT and the statement at line number 0002 produces
  machine language at offset 0000H of the DATA SEGMENT.
• The third column in the annotation displays the machine language produced by
  each instruction in the program.
Segment numbers: There is a good reason for not leaving the determination of
segment numbers up to the assembler. It allows programs written in 8086 assembly
language to be almost entirely relocatable. They can be loaded practically anywhere
in memory and run just as well. Program1 has to store the message “Have a nice
day$” somewhere in memory. It is located in the DATA SEGMENT. Since the
characters are stored in ASCII, it will occupy 15 bytes (please note each
blank is also a character) in the DATA SEGMENT.
Missing offset: The xxxx in the machine language for the instruction at line 0010 is
there because the assembler does not know the DATA segment location that will be
determined at loading time. The loader must supply that value.
Keyword: A keyword is a word that defines the nature of a statement. If the
statement is a directive then the keyword will be the title of that directive; if the
statement is a data-allocation statement the keyword will be a data definition type.
Some examples of the keywords are: SEGMENT (directive), MOV (instruction) etc.
Identifiers: An identifier is a name that you apply to an item in your program that
you expect to reference. The two types of identifiers are name and label.
1. Name refers to the address of a data item such as counter, arr etc.
2. Label refers to the address of an instruction, procedure or segment. For example,
MAIN is the label for a procedure as:
MAIN PROC FAR
A20: BL,45 ; defines a label A20.
An identifier can use letters, digits or special characters, but it always starts
with a letter.
Parameters: A parameter extends and refines the meaning that the assembler
attributes to the keyword in a statement. The number of parameters is dependent on
the Statement.
2.4.2 Directives
Assembly languages support a number of statements that enable you to control the
way in which a source program assembles and lists. These statements, called
directives, act only when the assembly is in progress and generate no
machine-executable code. Let us discuss some common directives.
1. List: A list directive causes the assembler to produce an annotated listing on the
printer, the video screen, a disk drive or some combination of the three. An
annotated listing shows the text of the assembly language programs, numbers of
each statement in the program and the offset associated with each instruction and
each datum. The advantage of list directive is that it produces much more
informative output.
2. HEX: The HEX directive facilitates the coding of hexadecimal values in the
body of the program. That statement directs the assembler to treat tokens in the
source file that begin with a dollar sign as numeric constants in hexadecimal
notation.
3. PROC Directive: The code segment contains the executable code for a
program, which consists of one or more procedures defined initially with the
PROC directive and ended with the ENDP directive.
5. ASSUME Directive: An .EXE program uses the SS register to address the base
of stack, DS to address the base of data segment, CS to address base of the code
segment and ES register to address the base of Extra segment. This directive tells
the assembler to correlate segment register with a segment name. For example,
CODE SEGMENT
The logical program segment is named code segment. When the linker links a
program it makes a note in the header section of the program’s executable file
describing the location of the code segment. When DOS invokes the loader to
load an executable file into memory, the loader reads that note. As it loads the
program into memory, the loader also makes notes to itself of exactly where in
memory it actually places each of the program’s other logical segments. As the
loader hands execution over to the program it has just loaded, it sets the CS
register to address the base of the segment identified by the linker as the code
segment. This renders every instruction in the code segment addressable in
segment relative terms in the form CS:xxxx.
The linker also assumes by default that the first instruction in the code segment
is intended to be the first instruction to be executed. That instruction will appear
in memory at an offset of 0000H from the base of the code segment, so the linker
passes that value on to the loader by leaving another note in the header of the
program’s executable file.
The loader sets the IP (Instruction Pointer) register to that value. This sets CS:IP
to the segment relative address of the first instruction in the program.
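The segment relative form CS:xxxx can be turned into a physical address with a little arithmetic. The following is a minimal Python sketch of the standard 8086 rule (segment value shifted left by four bits, plus the offset); the function name physical_address and the sample CS value are only illustrative assumptions, not values from the text.

```python
def physical_address(segment: int, offset: int) -> int:
    """Return the 20-bit physical address for a 16-bit segment:offset pair."""
    # The 8086 multiplies the segment value by 16 (shift left by 4 bits),
    # adds the 16-bit offset and keeps only 20 address bits.
    return ((segment << 4) + offset) & 0xFFFFF


# Hypothetical example: the loader has set CS to 1234H and IP to 0000H.
cs, ip = 0x1234, 0x0000
print(hex(physical_address(cs, ip)))
```

Because only the segment value changes when a program is loaded elsewhere, the offsets produced by the assembler stay valid wherever the loader places the segment.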
STACK SEGMENT
8086 Microprocessor supports the Word stack. The stack segment parameters
tell the assembler to alert the linker that this segment statement defines the
program stack area.
A program must have a stack area because the computer is continuously carrying
on several background operations that are completely transparent, even to an
assembly language programmer, for example, a real time clock. Every 55
milliseconds the real time clock interrupts the CPU. The CPU records the state
of its registers and then goes about updating the system clock. When it finishes
servicing the system clock, it has to restore the registers and go back to doing
whatever it was doing when the interruption occurred. All such information gets
recorded in the stack. If your program has no stack and the real time clock were
to pulse while the CPU is running your program, there would be no way for the
CPU to find the way back to your program when it was through updating the
clock. The default size of the stack allocation is 0400H bytes. Please note that
if you have not specified the stack segment, it is automatically created.
DATA SEGMENT
It contains the data allocation statements for a program. This segment is very
useful as it shows the data organization.
Directive   Meaning               Size (bytes)
DB          Define byte           1
DW          Define word           2
DD          Define double word    4
DQ          Define quad word      8
DT          Define 10 bytes       10
The DUP directive is used to duplicate the basic data definition ‘n’ number of
times:
ARRAY DB 10 DUP (0)
In the above statement ARRAY is the name of the data item, which is of byte
type (DB). This array contains 10 duplicate zero values; that is 10 zero values.
The EQU directive is used to assign a name to a constant:
CONST EQU 20
The above statement defines the name CONST for the value 20. The types of numbers
used in data statements can be octal, binary, hexadecimal, decimal and ASCII.
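To check how much memory a data-allocation statement reserves, the sizes listed above can be combined with the DUP count. This is a small Python sketch; the names DIRECTIVE_SIZES and bytes_allocated are made up for illustration and are not part of any assembler.

```python
# Size in bytes reserved by each data definition directive.
DIRECTIVE_SIZES = {"DB": 1, "DW": 2, "DD": 4, "DQ": 8, "DT": 10}


def bytes_allocated(directive: str, count: int = 1) -> int:
    """Bytes reserved by 'name <directive> <count> DUP (value)'."""
    return DIRECTIVE_SIZES[directive.upper()] * count


# ARRAY DB 10 DUP (0) reserves ten bytes, each initialised to zero.
print(bytes_allocated("DB", 10))
```

Note that a name defined with EQU reserves no memory at all; the assembler simply substitutes the value wherever the name appears.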
(b) DUP directive is used to indicate whether the same memory location is used by
two different variable names.
(d) The maximum number of active segments at a time in 8086 can be four.
(e) ASSUME directive specifies the physical address for the data values of
instruction.
2.5.1 Interrupts
An interrupt causes interruption of an ongoing program. Some of the common
interrupts are: keyboard, printer, monitor, an error condition, trap etc.
Hardware interrupts are generated when a peripheral requests some service. A
software interrupt causes a call to the operating system. It is usually an
input-output routine.
Let us discuss the software interrupts in more detail. A software interrupt is initiated
using the following statement:
INT number
In 8086, this interrupt instruction is processed using the interrupt vector table
(IVT). The IVT is located in the first 1K bytes of memory, and has a total of 256
entries, each of 4 bytes. An entry in the interrupt vector table is identified by the
number given in the interrupt instruction. The entry stores the address of the operating
system subroutine that is used to process the interrupt. This address may be different
for different machines. Figure 1 shows the processing of an interrupt.
Step 2: The CPU locates the interrupt servicing routine (ISR) whose address is stored
at the IVT entry of the interrupt. For example, in the figure above the ISR of INT
10h is located at segment address F000h and offset F065h.
Step 3: The CPU loads the CS register and the IP register with this new address from
the IVT, and transfers the control to that address, just like a far CALL
(discussed in unit 4).
Step 4: IRET (interrupt return) causes the program to resume execution at the next
instruction in the calling program.
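The lookup in Step 2 can be sketched in Python. Each IVT entry occupies 4 bytes starting at address 4 × number, with the offset word stored before the segment word and each word stored low byte first. The helper names and the bytearray standing in for the first 1K of memory are assumptions made for this illustration; the F000h:F065h vector is the example value quoted above.

```python
def ivt_entry_address(int_number: int) -> int:
    """Address of the 4-byte IVT entry for the given interrupt number."""
    return 4 * int_number


def read_vector(memory: bytearray, int_number: int) -> tuple:
    """Return (segment, offset) of the ISR stored in the IVT entry."""
    base = ivt_entry_address(int_number)
    offset = memory[base] | (memory[base + 1] << 8)        # loaded into IP
    segment = memory[base + 2] | (memory[base + 3] << 8)   # loaded into CS
    return segment, offset


# Model of the first 1K of memory holding only the vector of INT 10h.
memory = bytearray(1024)
memory[0x40:0x44] = bytes([0x65, 0xF0, 0x00, 0xF0])

segment, offset = read_vector(memory, 0x10)
print(f"INT 10h -> {segment:04X}:{offset:04X}")
```

Changing where an interrupt is serviced therefore only means rewriting these four bytes; programs that issue INT 10h do not change.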
The advantage of this type of call is that it appears static to a programmer but flexible
to a system design engineer. For example, INT 00H is a special system level vector
that points to the “recovery from division by zero” subroutine. If a new designer
wants to move this subroutine to another location in memory, only the entry of
interrupt 00H in the IVT has to be adjusted to the new location. Thus, from the
system programmer's point of view, it is relatively easy to change the vectors under
program control.
One of the most commonly used interrupts for input/output is called the DOS function
call. Let us discuss more about it in the next subsection:
Example of AH = 09H:
CR EQU 0DH ; ASCII code of carriage return.
DATA SEGMENT
STRING DB 'HELLO WORLD', CR, '$'
DATA ENDS
CODE SEGMENT
:
MOV AX, DATA
MOV DS, AX
MOV AH, 09H
MOV DX, OFFSET STRING
; Store the offset of string in DX register.
INT 21H
AH = 0AH For input of a string of up to 255 characters. The string
is stored in a buffer. Look in the examples given.
AH = 4CH Return to DOS
; then type 9
MOV AH, 08H
INT 21H
MOV BL, AL ; If we have input 39 then BL will first have the character
; '3' (33h); we can convert it to 3 using the previous logic, that is 33h – 30h = 3.
SUB BL, '0'
MOV AL, 10 ; To get 30, multiply it by 10.
MUL BL ; MUL takes a single operand: AX = AL * BL = 30
MOV BL, AL ; Now BL stores 30
; Input another digit from keyboard
MOV AH, 08H
INT 21H
MOV DL, AL ; Store AL in DL
SUB DL, '0' ; (39h – 30h) = 9.
; Now BL contains the value 30 and DL has the value 9; add them to get the
; required number.
ADD BL, DL
; Now BL stores 39. We have the 2 digit value in BL.
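The arithmetic performed by the fragment above can be checked with a short Python sketch. The function name is hypothetical; the steps mirror the assembly: subtract the ASCII code of '0' from each character, multiply the first digit by 10 and add the second.

```python
def two_ascii_digits_to_number(first: str, second: str) -> int:
    """Convert two ASCII digit characters into the number they represent."""
    tens = ord(first) - ord("0")     # SUB BL, '0'  e.g. 33h - 30h = 3
    units = ord(second) - ord("0")   # SUB DL, '0'  e.g. 39h - 30h = 9
    return tens * 10 + units         # multiply by 10, then ADD BL, DL


print(two_ascii_digits_to_number("3", "9"))
```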
Note: Boilerplate code is the code that is present more or less in the same form in
every assembly language program.
Strings Input
CODE SEGMENT
…
MOV AH, 0AH ; Move 0AH to AH register
MOV DX, OFFSET BUFF ; BUFF must be defined in data segment.
INT 21H
…..
CODE ENDS
DATA SEGMENT
BUFF DB 50 ; max length of string,
; including CR, 50 characters
DB ? ; actual length of string not known at present
DB 50 DUP(0) ; buffer having 0 values
DATA ENDS
Explanation
The above DATA segment creates an input buffer BUFF of maximum 50 characters.
On input of the data ‘JAIN’ followed by enter, the data would be stored as:
50 4 J A I N #
Here # stands for the carriage return produced by the enter key.
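The layout of the buffer after the call can be modelled in Python. This is only a sketch of the convention used by DOS function 0AH (first byte: maximum length, second byte: number of characters actually read, then the characters followed by a carriage return); the function name dos_input_buffer is made up for the illustration, and truncating over-long input is a simplifying assumption.

```python
def dos_input_buffer(max_len: int, typed: str) -> bytes:
    """Model of the buffer contents after INT 21H, function 0AH."""
    kept = typed.encode("ascii")[: max_len - 1]  # leave room for the CR
    buf = bytearray(2 + max_len)                 # 2 header bytes + data area
    buf[0] = max_len                             # maximum length (set by us)
    buf[1] = len(kept)                           # actual length (set by DOS)
    buf[2:2 + len(kept)] = kept                  # the characters typed
    buf[2 + len(kept)] = 0x0D                    # carriage return from enter
    return bytes(buf)


buf = dos_input_buffer(50, "JAIN")
print(buf[0], buf[1], buf[2:6], hex(buf[6]))
```

The count in the second byte does not include the carriage return, which is why it is 4 for ‘JAIN’.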
Here the data from BL is moved to DL and then the display function is called,
which displays the contents of the DL register on the monitor.
Here the data in the input buffer stored in the data segment is going to be displayed
on the monitor.
A complete program:
Input a letter from keyboard and respond. “The letter you typed is ___”.
CODE SEGMENT
ASSUME CS:CODE, DS:DATA
; set the DS register
START: MOV AX, DATA
MOV DS, AX
; Read Keyboard
MOV AH, 08H
INT 21H
; Save input
MOV BL, AL
; Display first part of Message
MOV AH, 09H
MOV DX, OFFSET MESSAGE
INT 21H
; Display character of BL register
MOV AH, 02H
MOV DL, BL
INT 21H
; Exit to DOS
MOV AX, 4C00H
INT 21H
CODE ENDS
DATA SEGMENT
MESSAGE DB 'The letter you typed is $'
DATA ENDS
END START
A COM program keeps its code, data, and stack within the same segment.
Since the offsets in a physical segment are 16 bits wide, the size of a COM
program is limited to 2^16 = 64K, which includes code, data and stack. The following
program shows a COM program:
; Title add two numbers and store the result and carry in memory variables.
; name of the segment in this program is chosen to be CSEG
CSEG SEGMENT
ASSUME CS:CSEG, DS:CSEG, SS:CSEG
ORG 100h
START:MOV AX, CS ; Initialise data segment: a COM program cannot
MOV DS, AX ; contain a segment fixup such as MOV AX, CSEG
MOV AL, NUM1 ; Take the first number in AL
ADD AL, NUM2 ; Add the 2nd number to it
MOV RESULT, AL ; Store the result in location RESULT
RCL AL, 01 ; Rotate carry into LSB
AND AL, 00000001B ; Mask out all but LSB
MOV CARRY, AL ; Store the carry result
MOV AX,4C00h
INT 21h
NUM1 DB 15h ; First number stored here
NUM2 DB 20h ; Second number stored here
RESULT DB ? ; Put sum here
CARRY DB ? ; Put any carry here
CSEG ENDS
END START
These programs are stored on a disk with the extension .com. A COM program
requires less space on disk than the equivalent EXE program. At run-time a COM
program places the stack automatically at the end of the segment, so it uses at least
one complete segment.
The load module of an EXE program consists of up to 64K segments, although at
most only four segments may be active at any time. The segments may be of variable
size, with the maximum size being 64K.
RESULT DB ? ; Put sum here
CARRY DB ? ; Put any carry here
DATA ENDS
CODE SEGMENT
ASSUME CS:CODE, DS:DATA
START:MOV AX, DATA ; Initialise data segment
MOV DS, AX ; register using AX
MOV AL, NUM1 ; Bring the first number in AL
ADD AL, NUM2 ; Add the 2nd number to AL
MOV RESULT, AL ; Store the result
RCL AL, 01 ; Rotate carry into Least Significant Bit (LSB)
AND AL, 00000001B ; Mask out all but LSB
MOV CARRY, AL ; Store the carry
MOV AX, 4C00h ; Terminate to DOS
INT 21h
CODE ENDS
END START
1. Write an algorithm for your program closer to assembly language. For example,
the algorithm for preceding program would be:
get NUM1
add NUM2
put sum into memory at RESULT
position carry bit in LSB of byte
mask off upper seven bits
store the result in the CARRY location.
3. Study the instruction set carefully. This step helps in specifying the available
instructions and their format and constraints. For example, the segment registers
cannot be directly initialised with an immediate value such as a segment name.
Instead we have to first move the segment address into a general register, and then
move the contents of that register to the segment register.
You can exit to DOS by using interrupt routine 21h, with function 4Ch placed in the AH
register.
It is a nice practice to first code your program on paper, and use comments liberally.
This makes programming easier, and also helps you understand your program later.
Please note that the number of comments does not affect the size of the program.
After the program development, you may assemble it using an assembler and correct
it for errors, finally creating an exe file for execution.
7. String input and output can be achieved using INT 21H with
function number 09h and 0Ah respectively.
16. EXE program contains a header module, which is used by DOS for
calculating segment addresses.
18. EXE programs are more easily relocatable than COM programs.
2.8 SUMMARY
2.9 SOLUTIONS/ ANSWERS
Check Your Progress 1
1. (a) It helps in better understanding of computer architecture and work in
machine language.
(b) It results in smaller machine level code, and thus in efficient execution of
programs.
(c) Flexibility of use as very few restrictions exist.
3. (a) False
(b) False
(c) True
(d) True
(e) False
(f) True
UNIT 3 ASSEMBLY LANGUAGE PROGRAMMING (PART – I)
Structure
3.0 Introduction
3.1 Objectives
3.2 Simple Assembly Programs
3.2.1 Data Transfer
3.2.2 Simple Arithmetic Application
3.2.3 Application Using Shift Operations
3.2.4 Larger of the Two Numbers
3.3 Programming With Loops and Comparisons
3.3.1 Simple Program Loops
3.3.2 Find the Largest and the Smallest Array Values
3.3.3 Character Coded Data
3.3.4 Code Conversion
3.4 Programming for Arithmetic and String Operations
3.4.1 String Processing
3.4.2 Some More Arithmetic Problems
3.5 Summary
3.6 Solutions/ Answers
3.0 INTRODUCTION
After discussing a few essential directives, program developmental tools and simple
programs, let us discuss more about assembly language programs. In this unit, we will
start our discussions with simple assembly programs, which fulfil simple tasks such as
data transfer, arithmetic operations, and shift operations. A key example here will be
about finding the larger of two numbers. Thereafter, we will discuss more complex
programs showing how loops and various comparisons are used to implement tasks
like code conversion, coding characters, finding largest in array etc. Finally, we will
discuss more complex arithmetic and string operations. You must refer to further
readings for more discussions on these programming concepts.
3.1 OBJECTIVES
After going through this unit, you should be able to:
• write assembly programs with simple arithmetic, logical and shift operations;
• implement loops;
• use comparisons for implementing various comparison functions;
• write simple assembly programs for code conversion; and
• write simple assembly programs for implementing arrays.
; Program 1: This program shows the difference between the MOV and XCHG instructions:
DATA SEGMENT
VAL DW 5678H ; initialize variable VAL (a word, so DW is needed)
DATA ENDS
CODE SEGMENT
ASSUME CS: CODE, DS: DATA
MAINP: MOV AX, DATA ; initialise DS so that
MOV DS, AX ; VAL can be addressed
MOV AX, 1234H ; AH=12 & AL=34
XCHG AH, AL ; AH=34 & AL=12
MOV AX, 1234H ; AH=12 & AL=34
MOV BX, VAL ; BH=56 & BL=78
XCHG AX, BX ; AX=5678 & BX=1234
XCHG AH, BL ; AH=34, AL=78, BH=12, & BL=56
MOV AX, 4C00H ; Halt using INT 21h
INT 21H
CODE ENDS
END MAINP
Discussion:
Just keep on changing values as desired in the program.
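The register values given in the comments of Program 1 can be traced with a small Python model. The helper names split, join and trace_program1 are assumptions for this sketch; a 16-bit register is modelled as an integer and its two halves are obtained by shifting and masking.

```python
def split(word: int) -> tuple:
    """Return (high byte, low byte) of a 16-bit register value."""
    return (word >> 8) & 0xFF, word & 0xFF


def join(high: int, low: int) -> int:
    """Combine two bytes into a 16-bit register value."""
    return (high << 8) | low


def trace_program1(val: int = 0x5678) -> dict:
    ax = 0x1234                  # MOV AX, 1234H
    ah, al = split(ax)
    ah, al = al, ah              # XCHG AH, AL
    after_first_xchg = join(ah, al)
    ax = 0x1234                  # MOV AX, 1234H
    bx = val                     # MOV BX, VAL
    ax, bx = bx, ax              # XCHG AX, BX
    ah, al = split(ax)
    bh, bl = split(bx)
    ah, bl = bl, ah              # XCHG AH, BL
    return {"first": after_first_xchg, "AX": join(ah, al), "BX": join(bh, bl)}


for name, value in trace_program1().items():
    print(name, f"{value:04X}")
```

MOV copies a value and leaves the source unchanged, whereas XCHG changes both operands, which is what the swapped tuples model here.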
DATA SEGMENT
VALUE1 DB 0Ah ; Variables
VALUE2 DB 14h
DATA ENDS
CODE SEGMENT
ASSUME CS:CODE, DS:DATA
MOV AX, DATA ; Initialise data segments
MOV DS, AX ; using AX
MOV AL, VALUE1 ; Load Value1 into AL
XCHG VALUE2,AL ; exchange AL with Value2.
MOV VALUE1,AL ; Store AL in Value1
MOV AX, 4C00h ; Select DOS function 4Ch (terminate)
INT 21h ; Return to Operating system
CODE ENDS
END
Discussion:
The question is why cannot we simply use XCHG instruction with two memory
variables as operand? To answer the question let us look into some of constraints for
the MOV & XCHG instructions:
The MOV instruction has the following constraints and operands:
The statement MOV AL, VALUE1, copies the VALUE1 that is 0Ah in the AL
register:
AX : 00 0A        0A (VALUE1)
     AH AL        14 (VALUE2)
The instruction XCHG AL, VALUE2 exchanges the value of AL with VALUE2:
AX : 00 14        0A (VALUE1)
                  0A (VALUE2)
The instruction MOV VALUE1, AL then stores the value of AL in VALUE1:
AX : 00 14        14 (VALUE1)
                  0A (VALUE2)
Other statements in the above program have already been discussed in the preceding
units.
; Input : Two memory variables stored in memory locations FIRST and SECOND
; REGISTERS ; Uses DS, CS, AX, BL
; PORTS ; None used
DATA SEGMENT
FIRST DB 90h ; FIRST number, 90h is a sample value
SECOND DB 78h ; SECOND number, 78h is a sample value
AVGE DB ? ; Store average here
DATA ENDS
CODE SEGMENT
ASSUME CS:CODE, DS:DATA
START: MOV AX, DATA ; Initialise data segment, i.e. set
MOV DS, AX ; Register DS to point to Data Segment
MOV AL, FIRST ; Get first number
ADD AL, SECOND ; Add second to it
MOV AH, 00h ; Clear all of AH register
ADC AH, 00h ; Put carry in LSB of AH
MOV BL, 02h ; Load divisor in BL register
DIV BL ; Divide AX by BL. Quotient in AL,
; and remainder in AH
MOV AVGE, AL ; Copy result to memory
MOV AX, 4C00h ; Terminate to DOS
INT 21h
CODE ENDS
END START
Discussion:
An add instruction cannot add two memory locations directly, so we moved a single
value in AL first and added the second value to it.
Please note, on adding the two values, there is a possibility of a carry bit. (The values
here are being treated as unsigned binary numbers). Now the problem is how to put
the carry bit into the AH register such that AX (AH:AL) reflects the added value.
This is done using the ADC instruction.
The ADC AH,00h instruction will add the immediate number 00h to the contents of
the carry flag and the contents of the AH register. The result will be left in the AH
register. Since we had cleared AH to all zeros, before the add, we really are adding
00h + 00h + CF. The result of all this is that the carry flag bit is put in the AH register,
which was desired by us.
Finally, to get the average, we divide the sum given in AX by 2. A more general
program would also have to handle both positive and negative numbers. After the
division, the 8-bit quotient will be left in the AL register, which can then be copied
into the memory location named AVGE.
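The role of the carry can be verified with the sample values 90h and 78h. The following Python sketch models the ADD, ADC and DIV steps described above; the function name average_of_bytes is an assumption for this illustration.

```python
def average_of_bytes(first: int, second: int) -> tuple:
    """Model of ADD AL / ADC AH,00h / DIV BL for two unsigned bytes."""
    total = first + second
    al = total & 0xFF            # ADD AL, SECOND keeps only the low 8 bits
    carry = total >> 8           # the ninth bit goes to the carry flag
    ah = 0x00 + 0x00 + carry     # MOV AH, 00h followed by ADC AH, 00h
    ax = (ah << 8) | al          # AX now holds the full 9-bit sum
    quotient, remainder = divmod(ax, 2)   # DIV BL: AL = quotient, AH = remainder
    return quotient, remainder


quotient, remainder = average_of_bytes(0x90, 0x78)
print(f"average = {quotient:02X}h, remainder = {remainder}")
```

Without the ADC step, AX would hold 0008h instead of 0108h and the average would come out as 04h instead of 84h.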
; Program 4: Convert the ASCII code to its BCD equivalent. This can be done by
simply replacing the upper four bits of the byte by four zeros. For example,
the ASCII ‘2’ is 32h = 0011 0010B. By making the upper four bits 0 we get 0000
0010, which is 2 in BCD. The number obtained is called an unpacked BCD number. The
upper four bits of this byte are zero, so they can be used to store another
BCD digit. The byte thus obtained is called a packed BCD number. For example, the
unpacked BCD number 59 is 00000101 00001001, that is, 05 09. The packed BCD
will be 0101 1001, that is 59.
The algorithm to convert two ASCII digits to packed BCD can be stated as:
Convert the first ASCII digit to unpacked BCD.        0000 0101
Convert the second ASCII digit to unpacked BCD.       0000 1001
Move the first BCD digit to the upper nibble.         0101 0000
Pack the two digits using OR.                         0101 1001
The assembly language program for the above can be written in the following
manner.
; REGISTERS : Uses CS, AL, BL, CL
; PORTS : None used
CODE SEGMENT
ASSUME CS:CODE
START: MOV BL, '5' ; Load first ASCII digit in BL
MOV AL, '9' ; Load second ASCII digit in AL
AND BL, 0Fh ; Mask upper 4 bits of first digit
AND AL, 0Fh ; Mask upper 4 bits of second digit
MOV CL, 04h ; Load CL for 4 rotates
ROL BL, CL ; Rotate BL 4 bit positions
OR AL, BL ; Combine nibbles, result in AL contains 59
; as packed BCD
CODE ENDS
END START
Discussion:
8086 does not have any instruction to swap the upper and lower four bits of a byte,
therefore we need to use a rotate instruction with a count of 4. Out of the two
rotate instructions, ROL and RCL, we have chosen ROL, as it rotates the byte left by
one or more positions. RCL, on the other hand, moves the MSB into the carry flag and
brings the original carry flag into the LSB position, which is not what we want.
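The masking, rotating and OR steps of the packing program can be followed in this Python sketch. The helper rol8 models an 8-bit rotate left and pack_bcd mirrors the register usage of the program; both names are assumptions for the illustration.

```python
def rol8(value: int, count: int) -> int:
    """Rotate an 8-bit value left by count positions (like ROL)."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def pack_bcd(first_ascii: str, second_ascii: str) -> int:
    """Pack two ASCII digits into one packed BCD byte."""
    bl = ord(first_ascii) & 0x0F    # AND BL, 0Fh : unpacked BCD of first digit
    al = ord(second_ascii) & 0x0F   # AND AL, 0Fh : unpacked BCD of second digit
    bl = rol8(bl, 4)                # ROL BL, CL with CL = 4 : digit to upper nibble
    return al | bl                  # OR AL, BL : combine the nibbles


print(f"{pack_bcd('5', '9'):02X}")
```

Because the upper nibble is zero after masking, rotating by four positions simply swaps the two nibbles, so no bit is lost.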
Let us now look at a program that uses RCL instructions. This will make the
difference between the instructions clear.
; Program 5: Add a byte number from one memory location to a byte from the next
memory location and put the sum in the third memory location. Also, save the carry
flag in the least significant bit of the fourth memory location.
; ALGORITHM:
; get NUM1
; add NUM2 in it
; put sum into memory location RESULT
; rotate carry in LSB of byte
; mask off upper seven bits of byte
; store the result in the CARRY location.
;
; PORTS : None used
; PROCEDURES : None used
; REGISTERS : Uses CS, DS, AX
;
DATA SEGMENT
NUM1 DB 25h ; First number
NUM2 DB 80h ; Second number
RESULT DB ? ; Put sum here
CARRY DB ? ; Put any carry here
DATA ENDS
CODE SEGMENT
ASSUME CS:CODE, DS:DATA
START:MOV AX, DATA ; Initialise data segment
MOV DS, AX ; register using AX
MOV AL, NUM1 ; Load the first number in AL
ADD AL, NUM2 ; Add 2nd number in AL
MOV RESULT, AL ; Store the result
RCL AL, 01 ; Rotate carry into LSB
AND AL, 00000001B ; Mask out all but LSB
MOV CARRY, AL ; Store the carry result
MOV AH, 4CH
INT 21H
CODE ENDS
END START
Discussion:
The RCL instruction brings the carry into the least significant bit position of the AL
register. The AND instruction is then used to mask off the higher order bits, so that
only the carry remains in AL.
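How RCL captures the carry can be seen in a short Python model of Program 5. The carry flag is treated as a ninth bit that takes part in the rotation; the function names add8, rcl8 and sum_and_carry are assumptions made for this sketch.

```python
def add8(a: int, b: int) -> tuple:
    """8-bit addition: return (result, carry flag)."""
    total = a + b
    return total & 0xFF, total >> 8


def rcl8(value: int, carry: int) -> tuple:
    """Rotate left through carry by one position: return (result, new carry)."""
    wide = (value << 1) | carry          # old carry enters at the LSB
    return wide & 0xFF, (wide >> 8) & 1  # old MSB becomes the new carry


def sum_and_carry(num1: int, num2: int) -> tuple:
    result, cf = add8(num1, num2)        # ADD AL, NUM2 ; MOV RESULT, AL
    al, cf = rcl8(result, cf)            # RCL AL, 01
    carry = al & 0b00000001              # AND AL, 00000001B
    return result, carry


print(sum_and_carry(0x25, 0x80))   # sample values of Program 5
print(sum_and_carry(0x90, 0x80))   # a pair that produces a carry
```

With ROL instead of RCL, the LSB would receive the old MSB of the sum rather than the carry flag, so the stored value would not be the carry.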
In a similar manner we can also write applications using other shift instructions.
Let’s look at three examples that show how the flags are set when the numbers are
compared. In example 1 BL is less than 10, so the carry flag is set. In example 2, the
zero flag is set because both operands are equal. In example 3, the destination (BX) is
greater than the source, so both the zero and the carry flags are clear.
Example 1:
Example 2:
Example 3:
In the following section we will discuss an example that uses the flags set by CMP
instruction.
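The three cases described above can be reproduced with a small Python model of how CMP sets the zero and carry flags for unsigned operands. CMP performs a subtraction (destination minus source) and discards the result. The function name cmp_flags and the operand values are hypothetical, chosen only to match the three situations.

```python
def cmp_flags(destination: int, source: int, bits: int = 8) -> tuple:
    """Return (ZF, CF) as set by CMP destination, source."""
    mask = (1 << bits) - 1
    difference = (destination - source) & mask
    zf = int(difference == 0)                         # set when operands are equal
    cf = int((destination & mask) < (source & mask))  # set when a borrow is needed
    return zf, cf


print(cmp_flags(5, 10))                     # destination below source
print(cmp_flags(10, 10))                    # operands equal
print(cmp_flags(0x2000, 0x1000, bits=16))   # destination above source
```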
Check Your Progress 1
1. In a MOV instruction, the immediate operand value for 8-bit destination cannot
exceed F0h.
4. A single instruction cannot swap the upper and lower four bits of a byte
register.
In the example above the control of the program will directly transfer to the label
THERE if the value stored in the AX register is equal to that of the register BX. The same
example can be rewritten in the following manner, using different jumps.
Example 5:
CMP AX, BX ; compare instruction: sets flags
JNE FIX ; if not equal do addition
JMP THERE ; if equal skip next instruction
FIX: ADD AX, 02 ; add 02 to AX
THERE: MOV CL, 07
The above code is not efficient, but it suggests that there are many ways in which a
conditional jump can be implemented. Select the most efficient one.
Example 6:
CMP DX, 00 ; checks if DX is zero.
JE Label1 ; if yes, jump to Label1 i.e. if ZF=1
Example 7:
MOV AL, 10 ; moves 10 to AL
CMP AL, 20 ; checks if AL < 20 i.e. CF=1
JB Lab1 ; carry flag = 1 (unsigned below) then jump to Lab1
LOOPING
; Program 6: Assume a constant inflation factor that is added to a series of prices
; stored in the memory. The program copies the new price over the old price. It is
; assumed that price data is available in BCD form.
; The algorithm:
;Repeat
; Read a price from the array
; Add inflation factor
; Adjust result to correct BCD
; Put result back in array
; Until all prices are inflated
Let us demonstrate the use of LOOP instruction, with the help of following program:
CODE SEGMENT
ASSUME CS:CODE
MAINP: MOV CX, 1AH ; Counter: 26 in decimal = 1A in hexadecimal.
MOV DL, 41H ; Load DL with the ASCII code of 'A' (41 hexadecimal).
NEXTC: MOV AH, 02H ; display the character in DL
INT 21H ; DOS interrupt
INC DL ; Increment DL for next char
LOOP NEXTC ; Repeat until CX=0. (LOOP automatically decrements
; CX and checks whether it is zero or not)
MOV AX, 4C00H ; Exit to DOS
INT 21H ; DOS Call
CODE ENDS
END MAINP
DATA SEGMENT
XX DB ?
YY DB ?
DATA ENDS
CODE SEGMENT
ASSUME CS: CODE, DS: DATA
MAINP: MOV AX, DATA ; initialize data
MOV DS, AX ; segment using AX
MOV CX, 03H ; set counter to 3.
NEXTP: MOV AH, 01H ; Waiting for user to enter a char.
INT 21H
MOV XX, AL ; store the 1st input character in XX
MOV AH, 01H ; waiting for user to enter second
INT 21H ; character.
MOV YY, AL ; store the character to YY
MOV BH, XX ; load first character in BH
MOV BL, YY ; load second character in BL
CMP BH, BL ; compare the characters
JNE NOT_EQUAL ; if not equal, skip the next block
EQUAL: MOV AH, 02H ; if characters are equal then control
MOV DL, 'Y' ; will execute this block and
INT 21H ; display 'Y'
JMP CONTINUE ; Jump to continue loop.
Discussion:
This program will be executed, at least 3 times.
; Program 9: Initialise the smallest and the largest variables as the first number in
; the array. They are then compared with the other array values one by one. If the
; value happens to be smaller than the assumed smallest number or larger than the
; assumed largest value, the smallest and the largest variables are changed with the
; new values respectively. Let us use register DI to point the current array value and
; LOOP instruction for looping.
DATA SEGMENT
ARRAY DW -1, 2000, -4000, 32767, 500,0
LARGE DW ?
SMALL DW ?
DATA ENDS
CODE SEGMENT
MOV AX,DATA
MOV DS,AX ; Initialize DS
MOV DI, OFFSET ARRAY ; DI points to the array
MOV AX, [DI] ; AX contains the first element
MOV DX, AX ; initialize large in DX register
MOV BX, AX ; initialize small in BX register
MOV CX, 6 ; initialize loop counter
A1: MOV AX, [DI] ; get next array value
CMP AX, BX ; Is the new value smaller?
JGE A2 ; If greater then (not smaller) jump to
; A2, to check larger than large in DX
MOV BX, AX ; Otherwise it is smaller so move it to
; the smallest value (BX register)
JMP A3 ; as it is small, thus no need
; to compare it with the large so jump
; to A3 to continue or terminate loop.
A2: CMP AX, DX ; Is the new value larger than large?
JLE A3 ; if less than or equal, it is not larger, so
; jump to A3
; to continue or terminate
MOV DX, AX ; otherwise it is a larger value, so move
; it to DX, which stores the large value
A3: ADD DI, 2 ; DI now points to next number
LOOP A1 ; repeat the loop until CX = 0
MOV LARGE, DX
MOV SMALL, BX ; move the large and small in the
; memory locations
MOV AX, 4C00h
INT 21h ; halt, return to DOS
CODE ENDS
END
Discussion:
Since the data is word type that is equal to 2 bytes and memory organisation is byte
wise, to point to next array value DI is incremented by 2.
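The scanning logic of Program 9 can be expressed compactly in Python. This sketch follows the same order of comparisons as the assembly code (first against the smallest value, and only otherwise against the largest); the function name smallest_and_largest is an assumption. Python integers are signed, which corresponds to the signed jumps JGE and JLE used in the program.

```python
def smallest_and_largest(values: list) -> tuple:
    """Return (smallest, largest) using the comparison order of Program 9."""
    small = large = values[0]      # both start as the first array element
    for value in values:           # LOOP A1 runs once per element
        if value < small:          # CMP AX, BX / JGE A2 not taken
            small = value
        elif value > large:        # CMP AX, DX / JLE A3 not taken
            large = value
    return small, large


ARRAY = [-1, 2000, -4000, 32767, 500, 0]
print(smallest_and_largest(ARRAY))
```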
As each digit is input, we would store its ASCII code in a memory byte. After the
first number was input the number would be stored as follows:
Each of these numbers will be input as equivalent ASCII digits. Either the digit
string needs to be converted to a 16-bit binary value that can be used for
computation, or the ASCII digits themselves can be added, followed by an
instruction that adjusts the sum to binary. Let us use the conversion operation to
perform these calculations here.
Another important data format is packed decimal numbers (packed BCD). A packed
BCD contains two decimal digits per byte. Packed BCD format has the following
advantages:
• The BCD numbers allow accurate calculations for almost any number of
significant digits.
• Conversion of packed BCD numbers to ASCII (and vice versa) is relatively fast.
• An implicit decimal point may be used, keeping track of its position in a
separate variable.
The instructions DAA (decimal adjust after addition) and DAS (decimal adjust after
subtraction) are used for adjusting the result of an addition or subtraction operation on
packed decimal numbers. However, no such instruction exists for multiplication and
division. For multiplication and division the numbers must first be unpacked,
then multiplied or divided, and finally packed again. The instructions DAA and DAS have
already been explained in unit 1.
Program 10:
; This program converts an ASCII input to equivalent hex digit that it represents.
; Thus, valid ASCII digits are 0 to 9, A to F and the program assumes that the
; ASCII digit is read from a location in memory called ASCII. The hex result is
; left in the AL. Since the program converts only one digit number the AL is
; sufficient for the results. The result in AL is made FF if the character in ASCII
; is not the proper hex digit.
; ALGORITHM
; IF number <30h THEN error
; ELSE
; IF number <3Ah THEN Subtract 30h (it’s a number 0-9)
; ELSE (number is >39h)
; IF number <41h THEN error (number in range 3Ah-40h which is not a valid
; A-F character range)
; ELSE
; IF number <47h THEN Subtract 37h for letter A-F 41-46 (Please note
; that 41h – 37h = Ah)
; ELSE ERROR
;
; PORTS : None used
; PROCEDURES : None
; REGISTERS : Uses CS, DS, AX,
;
DATA SEGMENT
ASCII DB 39h ; Any experimental data
DATA ENDS
CODE SEGMENT
ASSUME CS:CODE, DS:DATA
START: MOV AX, DATA ; initialise data segment
MOV DS, AX ; Register using AX
MOV AL, ASCII ; Get the ASCII digits of the number
; start the conversion
CMP AL, 30h ; If the ASCII digit is below 30h then it is not
JB ERROR ; a proper Hex digit
CMP AL, 3Ah ; compare it to 3Ah
JB NUMBER ; If below 3Ah it is a digit in the range 0-9
CMP AL, 41h ; This step will be done if equal to or above
; 3Ah, so it is possibly a letter between A-F
JB ERROR ; Between 3Ah and 40h is error
CMP AL, 46h
JA ERROR ; The ASCII is out of 0-9 and A-F range
SUB AL, 37h ; It's a letter in the range A-F so convert
JMP CONVERTED
NUMBER: SUB AL, 30h ; it is a number in the range 0-9 so convert
JMP CONVERTED
ERROR: MOV AL, 0FFh ; You can also display some message here
CONVERTED: MOV AH, 4Ch ; Load only AH so that the result in AL is kept
INT 21h ; the hex result is in AL
CODE ENDS
END START
Discussions:
The above program demonstrates a single hex digit represented by an ASCII
character. The above programs can be extended to take more ASCII values and
convert them into a 16-bit binary number.
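The range checks of Program 10 can be tested quickly with a Python version of the same algorithm. The function name ascii_hex_digit is an assumption; the thresholds and the error value FFh are those of the program above.

```python
def ascii_hex_digit(code: int) -> int:
    """Convert the ASCII code of a hex digit (0-9, A-F) to its value, FFh on error."""
    if code < 0x30:
        return 0xFF            # below '0': not a hex digit
    if code < 0x3A:
        return code - 0x30     # '0' to '9'
    if code < 0x41:
        return 0xFF            # 3Ah to 40h: not a valid character
    if code < 0x47:
        return code - 0x37     # 'A' to 'F', e.g. 41h - 37h = 0Ah
    return 0xFF                # above 'F'


for ch in "9AFG":
    print(ch, f"{ascii_hex_digit(ord(ch)):02X}")
```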
The intermediate code in assembly language generated by a non-optimising compiler
for the above piece may look like:
MOV IND, 00 ; ind : = 0
L3: CMP IND, 08 ; ind < 9
JG L1 ; not so; skip
LEA AX, STR1 ; offset of str1 in AX register
MOV BX, IND ; it uses a register for indexing into
; the array
LEA CX, STR2 ; str2 in CX
MOV DL, BYTE PTR CX[BX]
CMP DL, BYTE PTR AX[BX] ; str1[ind] = str2[ind]
JNE L1 ; no, skip
MOV IND, BX
ADD IND, 01
L2: JMP L3 ; loop back
L1:
What we find in the above code is a large amount of code that could have been improved
further if the 8086 string instructions had been used.
; Program 11: Matching two strings of same length stored in memory locations.
; REGISTERS : Uses CS, DS, ES, AX, DX, CX, SI, DI
DATA SEGMENT
PASSWORD DB 'FAILSAFE' ; source string
DESTSTR DB 'FEELSAFE' ; destination string
MESSAGE DB 'Strings are equal $'
DATA ENDS
CODE SEGMENT
ASSUME CS:CODE, DS:DATA, ES:DATA
MOV AX, DATA
MOV DS, AX ; Initialise data segment register
MOV ES, AX ; Initialise extra segment register
; as destination string is considered to be in extra segment. Please note that ES is also
; initialised to the same segment as of DS.
LEA SI, PASSWORD ; Load source pointer
LEA DI, DESTSTR ; Load destination pointer
MOV CX, 08 ; Load counter with string length
CLD ; Clear direction flag so that comparison is
; done in forward direction.
Discussion:
In the above program the instruction CMPSB compares the two strings, pointed to by SI
in the data segment and by DI in the extra segment. The strings are compared byte
by byte, and after each comparison the pointers SI and DI are incremented to the next
byte. Please note that the last letter B in the instruction indicates a byte. If it is W,
that is, if the instruction is CMPSW, then the comparison is done word by word and SI
and DI are incremented by 2, that is, to the next word. The REPE prefix in front of the
instruction tells the 8086 to decrement the CX register by one and to continue executing
the CMPSB instruction until the counter (CX) becomes zero or a pair of bytes fails to
match (which clears the zero flag). Thus, the code size is substantially reduced.
Similarly, you can write efficient programs for moving one string to another, using
MOVS, and scanning a string for a character using SCAS.
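A minimal sketch of how the comparison described above might be completed is given here. The two strings and the set-up follow Program 11; the message names EQ_MSG and NE_MSG, the label SHOW and the way the result is reported are assumptions made for illustration.

```asm
; Sketch: comparing two 8-byte strings with REPE CMPSB.
; EQ_MSG, NE_MSG and SHOW are illustrative names.
DATA     SEGMENT
PASSWORD DB 'FAILSAFE'              ; source string
DESTSTR  DB 'FEELSAFE'              ; destination string
EQ_MSG   DB 'Strings are equal $'
NE_MSG   DB 'Strings are not equal $'
DATA     ENDS
CODE     SEGMENT
         ASSUME CS:CODE, DS:DATA, ES:DATA
START:   MOV AX, DATA
         MOV DS, AX                 ; initialise data segment register
         MOV ES, AX                 ; ES points to the same segment
         LEA SI, PASSWORD           ; DS:SI is the source pointer
         LEA DI, DESTSTR            ; ES:DI is the destination pointer
         MOV CX, 08                 ; string length
         CLD                        ; compare in the forward direction
         REPE CMPSB                 ; repeat while bytes match and CX is not 0
         LEA DX, EQ_MSG             ; LEA does not change the flags
         JE  SHOW                   ; ZF = 1: all eight bytes matched
         LEA DX, NE_MSG             ; ZF = 0: a mismatch stopped the loop
SHOW:    MOV AH, 09h                ; DOS: display the $-terminated string
         INT 21h
         MOV AX, 4C00h              ; return to DOS
         INT 21h
CODE     ENDS
         END START
```

With the two strings shown, the second bytes ('A' and 'E') differ, so REPE CMPSB stops early with the zero flag cleared and the "not equal" message is selected.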
A very useful application of assembly is to produce delay loops. Such loops are used
for waiting for some time prior to the execution of the next instruction.
But how do we find the time for the delay? The rate at which the instructions are executed
is determined by the clock frequency. Each instruction takes a certain number of clock
cycles to execute. This number, multiplied by the clock period (that is, divided by the
clock frequency of the microprocessor), gives the actual execution time of an instruction.
For example, the MOV instruction takes four clock cycles. This instruction, when run on a
microprocessor with a 4 MHz clock, takes 4/4, i.e. 1 microsecond. NOP is an instruction
that is used to produce the delay without affecting the actual running of the program.
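For a 5 MHz clock:
$$1 \text{ clock cycle} = \frac{1}{5\,\text{MHz}} = \frac{1}{5 \times 10^{6}} \text{ seconds}$$
Thus, a 1-millisecond delay will require:
$$\frac{1 \times 10^{-3}}{\left( \dfrac{1}{5 \times 10^{6}} \right)} = 5000 \text{ clock cycles}$$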
The following program segment can be used to produce the delay, with the counter
value correctly initialised.
LOOP instruction takes 17 clock cycles when the condition is true and 5 clock cycles
otherwise. The condition will be true, ‘N’ number of times and false only once, when
the control comes out of the loop.
To calculate ‘N’:
Total clock cycles = clock cycles for MOV + N(2*NOP clock
cycles + 17) – 12 (when CX = 0)
5000 = 4 + N(6 + 17) – 12
N = 5008/23 ≈ 218 = 0DAh
Therefore, the counter, CX, should be initialized by 0DAh, in order to get the delay of
1 millisecond.
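A delay loop matching the calculation above can be sketched as follows. The label KILL_TIME is an illustrative name, and the cycle counts in the comments are the 8086 figures used in the text (4 for MOV, 3 for each NOP, 17 or 5 for LOOP); other processors in the family have different timings.

```asm
; Sketch: a delay of about 1 ms on a 5 MHz 8086.
CODE    SEGMENT
        ASSUME CS:CODE
START:  MOV CX, 0DAh        ; N = 218 iterations (4 clock cycles)
KILL_TIME:
        NOP                 ; 3 clock cycles
        NOP                 ; 3 clock cycles
        LOOP KILL_TIME      ; 17 cycles while looping, 5 on the final pass
        MOV AX, 4C00h       ; return to DOS
        INT 21h
CODE    ENDS
        END START
```

Each pass through the loop costs 6 + 17 = 23 cycles, except the last pass, where LOOP falls through in 5 cycles; that difference is the 12 subtracted in the formula.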
Use of array in assembly
Let us write a program to add two 5-byte numbers stored in an array. For example,
two numbers in hex can be:
     20 11 01 10 FF
  +  FF 40 30 20 10
  -----------------
  1  1F 51 31 31 0F
(the leading 1 is the carry out)
Let us also assume that the numbers are stored least significant byte first
in memory in two arrays. The result is stored in the third array SUM. SUM
also holds the carry out, and is thus 1 byte longer than the
number arrays.
DATA SEGMENT
NUM1 DB 0FFh, 10h ,01h ,11h ,20h
NUM2 DB 10h, 20h, 30h, 40h ,0FFh
SUM DB 6 DUP(0)
DATA ENDS
LEN EQU 05h ; constant for length of the array
CODE SEGMENT
ASSUME CS:CODE, DS:DATA
START: MOV AX, DATA ; initialise data segment
MOV DS, AX ; using AX register
MOV SI, 00 ; load displacement of 1st number.
; SI is being used as index register
MOV CX, 0000 ; clear counter
MOV CL, LEN ; set up count to designed length
CLC ; clear carry. Ready for addition
AGAIN: MOV AL, NUM1[SI] ; get a byte from NUM1
ADC AL, NUM2[SI] ; add to byte from NUM2 with carry
MOV SUM[SI], AL ; store in SUM array
INC SI
LOOP AGAIN ; continue until no more bytes
RCL AL, 01h ; move carry into bit 0 of AL
AND AL, 01h ; mask all but the 0th bit of AL
MOV SUM[SI], AL ; put carry into 6th byte
FINISH: MOV AX, 4C00h
INT 21h
CODE ENDS
END START
CODE SEGMENT
ASSUME CS:CODE, DS:DATA
START: MOV AX, DATA ; initialise data segment
MOV DS, AX ; using AX register
MOV AX, BCD ; get the BCD number AX = 4567
MOV BX, AX ; copy number into BX; BX = 4567
MOV AL, AH ; place for upper 2 digits in AX = 4545
MOV BH, BL ; place for lower 2 digits in BX = 6767
; split up numbers so that we have one digit
; in each register
MOV CL, 04 ; bit count for rotate
ROR AH, CL ; digit 1 (MSB) in lower four bits of AH.
; AX = 54 45
ROR BH, CL ; digit 3 in lower four bits of BH.
; BX = 76 67
AND AX, 0F0FH ; mask upper four bits of each digit.
; AX = 04 05
AND BX, 0F0FH ; BX = 06 07
MOV CX, AX ; copy AX into CX so that can use AX for
; multiplication CX = 04 05
3.5 SUMMARY
In this unit, we have covered some basic aspects of assembly language programming.
We started with some elementary arithmetic problems, code conversion problems,
various types of loops and graduated on to do string processing and slightly complex
arithmetic. As part of good programming practice, we also noted some points that
should be kept in mind while coding. Some of them are:
In the next block, we take up more advanced assembly language programming, which
also includes accessing interrupts of the machine.
2. Assuming that each array element is a word variable.
MOV CX, COUNT ; put the number of elements of the array in
; CX register
MOV AX, 0000h ; zero SI and AX
MOV SI, AX
; add the elements of array in AX again and again
AGAIN: ADD AX, ARRAY[SI] ; another way of handling array
ADD SI, 2 ; select the next element of the array
LOOP AGAIN ; add all the elements of the array. It will
; terminate when CX becomes zero.
MOV TOTAL, AX ; store the results in TOTAL.
2. If the direction flag is clear, the string instruction prefixed by REPE works in the
forward direction. That is, in the given example the strings will be compared from the
first element to the last.
UNIT 4 ASSEMBLY LANGUAGE PROGRAMMING (PART-II)
Structure
4.0 Introduction
4.1 Objectives
4.2 Use of Arrays in Assembly
4.3 Modular Programming
4.3.1 The stack
4.3.2 FAR and NEAR Procedures
4.3.3 Parameter Passing in Procedures
4.3.4 External Procedures
4.4 Interfacing Assembly Language Routines to High Level Language Programs
4.4.1 Simple Interfacing
4.4.2 Interfacing Subroutines With Parameter Passing
4.5 Interrupts
4.6 Device Drivers in Assembly
4.7 Summary
4.8 Solutions/ Answers
4.0 INTRODUCTION
In the previous units, we have discussed the instruction set, addressing modes, and
other tools, which are needed to develop assembly language programs. We shall now
use this knowledge in developing more advanced tools. We have divided this unit
broadly into four sections. In the first section, we discuss the design of some simple
data structures using the basic data types. Once the programs become lengthier, it is
advisable to divide them into small modules, which can be easily written, tested and
debugged. This leads to the concept of modular programming, and that is the topic of
our second section in this unit. In the third section, we will discuss some techniques to
interface assembly language programs to high level languages. We have explained the
concepts using C and C++ as they are two of the most popular high-level languages.
In the fourth section we have designed some tools necessary for interfacing the
microprocessor with external hardware modules.
4.1 OBJECTIVES
After going through this unit, you should be able to:
An important application of arrays is tables, which are used to store related
information. For example, the names of all the students in the class, their CGPA, the
list of all the books in the library, or even the list of people residing in a particular area
can be stored in different tables. An important application of tables is character
translation. It can be used for data encryption, or for translation from one data type to
another. A critical factor for such applications is speed, which happens
to be a strength of assembly language. The instruction that is used for such
applications is XLAT.
Offset 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
Contents 30 31 32 33 34 35 36 37 38 39 41 42 43 44 45 46
The content of this entry is now moved to the AL register, that is, 41h is moved to AL.
In other words, XLAT sets AL to 41h because this value is located at HEXA table
offset 0Ah. Please note that 41h is the ASCII code for the hex digit A. The following
sequence of instructions would accomplish this:
MOV AL, 0Ah ; index value
MOV BX, OFFSET HEXA ; offset of the table HEXA
XLAT
The above tasks can be done without XLAT instruction but it will require a long series
of instructions such as:
MOV AL, 0Ah ; index value
MOV BX, OFFSET HEXA ; offset of the table HEXA
PUSH BX ; save the offset
ADD BL, AL ; add index value to table
; HEXA offset
MOV AL, [BX] ; retrieve the entry
POP BX ; restore BX
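To try the XLAT sequence as a complete program, a sketch along the following lines can be used. The table name HEXA comes from the text; the surrounding segment set-up and the use of DOS function 02h to display the translated character are assumptions added for illustration.

```asm
; Sketch: translating the value 0Ah to its ASCII hex digit with XLAT.
DATA    SEGMENT
HEXA    DB '0123456789ABCDEF'   ; offsets 00h-0Fh hold 30h-39h and 41h-46h
DATA    ENDS
CODE    SEGMENT
        ASSUME CS:CODE, DS:DATA
START:  MOV AX, DATA
        MOV DS, AX              ; XLAT reads the table through DS
        MOV AL, 0Ah             ; index value
        MOV BX, OFFSET HEXA     ; offset of the table HEXA
        XLAT                    ; AL = byte at DS:[BX + AL] = 41h ('A')
        MOV DL, AL              ; character to display
        MOV AH, 02h             ; DOS: write character in DL
        INT 21h
        MOV AX, 4C00h           ; return to DOS
        INT 21h
CODE    ENDS
        END START
```

Note that XLAT always uses BX as the table base and AL as the index, so a table can have at most 256 one-byte entries.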
Let us use the instruction XLAT for data encoding. When you want to transfer a
message through a telephone line, then such encoding may be a good way of
preventing other users from reading it. Let us show a sample program for encoding.
PROGRAM 1:
; A program for encoding ASCII Alpha numerics.
; ALGORITHM:
; create the code table
; read an input string character by character
; translate it using code table
; output the strings
DATA SEGMENT
CODETABLE DB 48 DUP (0) ; no translation of first
; 48 ASCII codes
DB '4590821367' ; ASCII codes 48 - 57
; (30h - 39h)
DB 7 DUP (0) ; no translation of
; these 7 characters
DB 'GVHZUSOBMIKPJCADLFTYEQNWXR'
DB 6 DUP (0) ; no translation
DB 'gvhzusobmikpjcadlftyeqnwxr'
DB 133 DUP (0) ; no translation of remaining
; characters
DATA ENDS
CODE SEGMENT
ASSUME CS:CODE, DS:DATA ; lets the assembler resolve CODETABLE via DS
MOV AX, DATA
MOV DS, AX ; initialize DS
MOV BX, OFFSET CODETABLE ; point to lookup table
GETCHAR:
MOV AH, 06 ; console input no wait
MOV DL, 0FFh ; specify input request
INT 21h ; call DOS
JZ QUIT ; quit if no input is waiting
MOV DL, AL ; save character in DL
XLAT CODETABLE ; translate the character
CMP AL, 0 ; translatable?
JE PUTCHAR ; no : write it as is.
MOV DL, AL ; yes : move new character
; to DL
PUTCHAR:
MOV AH, 02 ; write DL to output
INT 21h
JMP GETCHAR ; get another character
QUIT: MOV AX, 4C00h
INT 21h
CODE ENDS
END
Discussion:
The program above will code the data. For example, a line from an input file will be
encoded:
The program above reads from standard input and writes to standard output, so it can
be run on files by redirecting both. If the program file name is coding.asm, the
assembled program can be run using the following command line:
coding < infile > outfile
The infile is the input data file, and outfile is the output data file.
You can write more such applications using 8086 assembly tables.
[Figure: a modular program, with a Main Module and sub-modules such as Module D and Module E]
The advantages of modular programming are:
1. Smaller, easier modules to manage
2. Code repetition may be avoided by reusing modules.
You can divide a program into subroutines or procedures. You need to CALL the
procedure whenever needed. A subroutine call transfers the control to subroutine
instructions and brings the control back to calling program.
In the 8086 microprocessor a stack is created in the stack segment. The SS register stores
the base (segment) address of the stack segment and the SP register stores the offset of
the top of the stack. A value is pushed on to the top of the stack or taken out (popped)
from the top of the stack. The stack segment can be initialised as follows:
STACK_SEG SEGMENT STACK
DW 100 DUP (0)
TOS LABEL WORD
STACK_SEG ENDS
CODE SEGMENT
ASSUME CS:CODE, SS:STACK_SEG
MOV AX, STACK_SEG
MOV SS,AX ; initialise stack segment
LEA SP,TOS ; initialise stack pointer to the top of stack label
CODE ENDS
END
The directive STACK_SEG SEGMENT STACK declares the logical segment for the
stack segment. DW 100 DUP(0) assigns actual size of the stack to 100 words. All
locations of this stack are initialized to zero. The stacks are identified by the stack top
and that is why the Label Top of Stack (TOS) has been selected. Please note that the
stack in 8086 is a WORD stack. Stack facilities involve the use of indirect addressing
through a special register, the stack pointer (SP). SP is automatically decremented as
items are put on the stack and incremented as they are retrieved. Putting something on
to stack is called a PUSH and taking it off is called a POP. The address of the last
element pushed on to the stack is known as the top of the stack (TOS).
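The behaviour of SP during PUSH and POP can be traced with a small sketch such as the one below. It reuses the stack declaration given above; the register values 1234h and 5678h are arbitrary test data chosen for illustration.

```asm
; Sketch: how PUSH and POP move the stack pointer.
STACK_SEG SEGMENT STACK
        DW 100 DUP (0)      ; 100 words = 200 (0C8h) bytes
TOS     LABEL WORD          ; offset 00C8h, just past the last word
STACK_SEG ENDS
CODE    SEGMENT
        ASSUME CS:CODE, SS:STACK_SEG
START:  MOV AX, STACK_SEG
        MOV SS, AX          ; initialise stack segment
        LEA SP, TOS         ; SP = 00C8h: the stack is empty
        MOV AX, 1234h
        MOV BX, 5678h
        PUSH AX             ; SP = 00C6h, 1234h stored at SS:00C6h
        PUSH BX             ; SP = 00C4h, 5678h stored at SS:00C4h
        POP AX              ; AX = 5678h, SP = 00C6h
        POP BX              ; BX = 1234h, SP = 00C8h
        MOV AX, 4C00h       ; return to DOS
        INT 21h
CODE    ENDS
        END START
```

Because the last word pushed is the first one popped, the two registers end up exchanged, which is a simple way to observe the last-in, first-out order.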
join them together in such a way that they can communicate with each other. This
extra code is sometimes referred to as linkage overhead.
A procedure call involves:
1. Unlike other branch instructions, a procedure call must save the address of the
next instruction so that the return will be able to branch back to the proper
place in the calling program.
2. The registers used by the procedures need to be stored before their contents
are changed and then restored just before the procedure is finished.
3. A procedure must have a means of communicating or sharing data with the
procedures that call it, that is parameter passing.
Calls, Returns, and Procedures definitions in 8086
The 8086 microprocessor supports CALL and RET instructions for procedure calls.
The CALL instruction not only branches to the indicated address, but also pushes the
return address onto the stack. That is, after saving the return address it loads IP with
the address of the procedure. The RET instruction simply pops the return address from
the stack back into IP. The 8086 supports two kinds of procedure call. These are FAR and
NEAR calls.
The NEAR procedure call is also known as Intrasegment call as the called procedure
is in the same segment from which call has been made. Thus, only IP is stored as the
return address. The IP can be stored on the stack as:
[Figure: stack contents after a NEAR call, with the saved IP on the stack and the stack segment base (SS) at the low-address end]
Please note the growth of stack is towards stack segment base. So stack becomes full
on an offset 0000h. Also for push operation we decrement SP by 2 as stack is a word
stack (word size in 8086 = 16 bits) while memory is byte organised memory.
FAR procedure call, also known as intersegment call, is a call made to separate code
segment. Thus, the control will be transferred outside the current segment. Therefore,
both CS and IP need to be stored as the return address. These values on the stack after
the calls look like:
When the 8086 executes the FAR call, it first stores the contents of the code segment
register followed by the contents of IP on to the stack. A RET from the NEAR
procedure pops two bytes into IP. The RET from the FAR procedure pops four
bytes from the stack: two into IP and two into CS.
A procedure is defined within the source code by placing a directive of the form:
<Procedure name> PROC <Attribute>
A procedure is terminated using:
<Procedure name> ENDP
The <procedure name> is the identifier used for calling the procedure and the
<attribute> is either NEAR or FAR. A procedure can be defined in:
1. The same code segment as the statement that calls it.
2. A code segment that is different from the one containing the statement that calls
it, but in the same source module as the calling statement.
3. A different source module and segment from the calling statement.
In the first case the <attribute> code NEAR should be used as the procedure and code
are in the same segment. For the latter two cases the <attribute> must be FAR.
Let us describe an example of procedure call using NEAR procedure, which contains
a call to a procedure in the same segment.
PROGRAM 2:
Write a program that collects data samples from a port at 1 ms intervals. The upper 4
bits of the collected data are masked, and the result is stored in an array in successive
locations.
; REGISTERS :Uses CS, SS, DS, AX, BX, CX, DX, SI, SP
; PROCEDURES : Uses WAIT
DATA_SEG SEGMENT
PRESSURE DW 100 DUP(0) ; Set up array of 100 words
NBR_OF_SAMPLES EQU 100
PRESSURE_PORT EQU 0FFF8h ; hypothetical input port
DATA_SEG ENDS
CODE_SEG SEGMENT
ASSUME CS:CODE_SEG, DS:DATA_SEG, SS:STACK_SEG
START: MOV AX, DATA_SEG ; Initialise data segment register
MOV DS, AX
MOV AX, STACK_SEG ; Initialise stack segment register
MOV SS, AX
MOV SP, OFFSET STACK_TOP ; initialise stack pointer to top of
; stack
LEA SI, PRESSURE ; SI points to start of array
; PRESSURE
MOV BX, NBR_OF_SAMPLES ; Load BX with number
; of samples
MOV DX, PRESSURE_PORT ; Point DX at input port
; it can be any A/D converter or
; data port.
INC SI ; Increment SI by two as dealing with
INC SI ; 16 bit words and not bytes
DEC BX ; Decrement sample counter
JNZ READ_NEXT ; Repeat till 100
; samples are collected
STOP: NOP
WAIT PROC NEAR
MOV CX, 2000H ; Load delay value
; into CX
HERE: LOOP HERE ; Loop until CX = 0
RET
WAIT ENDP
CODE_SEG ENDS
END
Discussion:
Please note that the CALL to the procedure as above does not indicate whether the
call is to a NEAR procedure or a FAR procedure. This distinction is made at the time
of defining the procedure.
The procedure above can also be made a FAR procedure by changing the definition of
the procedure as:
WAIT PROC FAR
.
.
WAIT ENDP
The procedure can now be defined in another segment, if need be, in the same
assembly language file.
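A complete sampling loop built around a NEAR procedure might look like the sketch below. It is an assumption-laden illustration rather than the original Program 2: the delay procedure is named DELAY here because WAIT is also the mnemonic of an 8086 instruction, the stack declaration is added so that the listing stands alone, and the delay count 2000h is taken from the text without any claim that it gives exactly 1 ms.

```asm
; Sketch: reading 100 samples from a port, with a NEAR delay procedure.
DATA_SEG SEGMENT
PRESSURE DW 100 DUP (0)         ; array of 100 words
NBR_OF_SAMPLES EQU 100
PRESSURE_PORT  EQU 0FFF8h       ; hypothetical input port
DATA_SEG ENDS
STACK_SEG SEGMENT STACK
        DW 40 DUP (0)
STACK_TOP LABEL WORD
STACK_SEG ENDS
CODE_SEG SEGMENT
        ASSUME CS:CODE_SEG, DS:DATA_SEG, SS:STACK_SEG
START:  MOV AX, DATA_SEG
        MOV DS, AX
        MOV AX, STACK_SEG
        MOV SS, AX
        MOV SP, OFFSET STACK_TOP
        LEA SI, PRESSURE        ; SI points to start of array
        MOV BX, NBR_OF_SAMPLES  ; sample counter
        MOV DX, PRESSURE_PORT   ; DX holds the port address
READ_NEXT:
        IN  AX, DX              ; read a 16-bit sample
        AND AX, 0FFFh           ; mask (clear) the upper 4 bits
        MOV [SI], AX            ; store the sample
        CALL DELAY              ; NEAR call: only IP is pushed
        INC SI                  ; advance by two bytes
        INC SI                  ; (one word)
        DEC BX
        JNZ READ_NEXT           ; repeat for all samples
        MOV AX, 4C00h           ; return to DOS
        INT 21h
DELAY   PROC NEAR
        MOV CX, 2000h           ; delay value
HERE:   LOOP HERE               ; loop until CX = 0
        RET                     ; pops IP only
DELAY   ENDP
CODE_SEG ENDS
        END START
```

Changing the attribute to FAR would make the assembler generate an intersegment CALL and RET without any change to the CALL DELAY line itself.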
Let us discuss a program that uses a procedure for converting a BCD number to binary
number.
PROGRAM 3:
Conversion of BCD number to binary using a procedure.
Algorithm for conversion procedure:
Take a packed BCD digit and separate the two digits of BCD.
Multiply the upper digit by 10 (0Ah)
Add the lower digit to the result of multiplication
Program 3 (a): Use of registers for parameter passing: This program uses AH register
for passing the parameter.
We are assuming that the data is available in memory location BCD and the result is
stored in BIN.
DATA_SEG SEGMENT
BCD DB 25h ; storage for BCD value
BIN DB ? ; storage for binary value
DATA_SEG ENDS
STACK_SEG SEGMENT STACK
DW 200 DUP(0) ; stack of 200 words
TOP_STACK LABEL WORD
STACK_SEG ENDS
CODE_SEG SEGMENT
ASSUME CS:CODE_SEG, DS:DATA_SEG, SS:STACK_SEG
START: MOV AX, DATA_SEG ; Initialise data segment
MOV DS, AX ; Using AX register
MOV AX, STACK_SEG ; Initialise stack
MOV SS, AX ; Segment register. Why
; stack?
MOV SP, OFFSET TOP_STACK ; Initialise stack pointer
MOV AH, BCD
CALL BCD_BINARY ; Do the conversion
MOV BIN, AH ; Store the result in the
; memory
:
:
; Remaining program can be put here
;PROCEDURE : BCD_BINARY - Converts BCD numbers to binary.
;INPUT : AH with BCD value
;OUTPUT : AH with binary value
;DESTROYS : AX
RET ; and return to calling program
BCD_BINARY ENDP
CODE_SEG ENDS
END START
Discussion:
The above program is not an optimum program, as it does not use registers minimally.
By now you should be able to understand this module. The program copies the BCD
number from the memory to the AH register. The AH register is used as it is in the
procedure. Thus, the contents of AH register are used in calling program as well as
procedure; or in other words have been passed from main to procedure. The result of
the subroutine is also passed back to AH register as returned value. Thus, the calling
program can find the result in AH register.
The advantage of using the registers for passing the parameters is the ease with which
they can be handled. The disadvantage, however, is the limit of parameters that can be
passed. For example, one cannot pass an array of 100 elements to a procedure using
registers.
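A runnable sketch of register-based parameter passing, following the algorithm of Program 3, is given here. The body of BCD_BINARY is one possible way of writing the conversion and should be read as an assumption; only the interface (BCD value in AH on entry, binary value in AH on return) is taken from the text.

```asm
; Sketch: BCD to binary conversion, parameter and result passed in AH.
DATA_SEG SEGMENT
BCD     DB 25h              ; packed BCD value
BIN     DB ?                ; binary result
DATA_SEG ENDS
STACK_SEG SEGMENT STACK
        DW 200 DUP (0)
TOP_STACK LABEL WORD
STACK_SEG ENDS
CODE_SEG SEGMENT
        ASSUME CS:CODE_SEG, DS:DATA_SEG, SS:STACK_SEG
START:  MOV AX, DATA_SEG
        MOV DS, AX
        MOV AX, STACK_SEG
        MOV SS, AX
        MOV SP, OFFSET TOP_STACK
        MOV AH, BCD         ; parameter goes in AH
        CALL BCD_BINARY
        MOV BIN, AH         ; for 25h the result is 19h (25 decimal)
        MOV AX, 4C00h       ; return to DOS
        INT 21h
BCD_BINARY PROC NEAR
        PUSH BX             ; save the registers used here
        PUSH CX
        MOV BL, AH
        AND BL, 0Fh         ; BL = lower digit
        MOV AL, AH
        MOV CL, 04
        SHR AL, CL          ; AL = upper digit
        MOV AH, 0Ah
        MUL AH              ; AX = upper digit * 10
        ADD AL, BL          ; add the lower digit
        MOV AH, AL          ; return value in AH
        POP CX
        POP BX
        RET
BCD_BINARY ENDP
CODE_SEG ENDS
        END START
```

The largest packed BCD byte, 99h, converts to 63h, so the result always fits in one byte.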
CODE_SEG SEGMENT
ASSUME CS:CODE_SEG, DS:DATA_SEG, SS:STACK_SEG
START: MOV AX, DATA_SEG ; Initialize data
MOV DS, AX ; segment using AX register
MOV AX, STACK_SEG ; initialize stack
MOV SS, AX ; segment. Why stack?
MOV SP, OFFSET TOP_STACK ; initialize stack pointer
; Put pointer to BCD storage in SI and DI prior to procedure call.
MOV SI, OFFSET BCD ; SI now points to BCD_IN
MOV DI, OFFSET BIN ; DI points BIN_VAL
; (returned value)
CALL BCD_BINARY ; Call the conversion
; procedure
NOP ; Continue with program
; here
Discussion:
In the program above, SI points to the BCD and the DI points to the BIN. The
instruction MOV AL,[SI] copies the byte pointed by SI to the AL register. Likewise,
MOV [DI], AL transfers the result back to memory location pointed by DI.
This scheme allows you to pass the procedure pointers to data anywhere in memory.
You can pass pointer to individual data element or a group of data elements like arrays
and strings. This approach is used for parameters passing to BIOS procedures.
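The pointer-passing scheme discussed above can be sketched as a complete program as follows. The main part mirrors the listing in the text; the procedure body is an illustrative assumption that uses MOV AL,[SI] and MOV [DI],AL exactly as the discussion describes.

```asm
; Sketch: BCD to binary conversion, parameters passed as pointers in SI and DI.
DATA_SEG SEGMENT
BCD     DB 25h              ; packed BCD value
BIN     DB ?                ; binary result
DATA_SEG ENDS
STACK_SEG SEGMENT STACK
        DW 200 DUP (0)
TOP_STACK LABEL WORD
STACK_SEG ENDS
CODE_SEG SEGMENT
        ASSUME CS:CODE_SEG, DS:DATA_SEG, SS:STACK_SEG
START:  MOV AX, DATA_SEG
        MOV DS, AX
        MOV AX, STACK_SEG
        MOV SS, AX
        MOV SP, OFFSET TOP_STACK
        MOV SI, OFFSET BCD  ; SI points to the input
        MOV DI, OFFSET BIN  ; DI points to the result location
        CALL BCD_BINARY
        MOV AX, 4C00h       ; return to DOS
        INT 21h
BCD_BINARY PROC NEAR
        PUSHF               ; save flags and registers
        PUSH AX
        PUSH BX
        PUSH CX
        MOV AL, [SI]        ; fetch the BCD byte through the pointer
        MOV BL, AL
        AND BL, 0Fh         ; BL = lower digit
        MOV CL, 04
        SHR AL, CL          ; AL = upper digit
        MOV BH, 0Ah
        MUL BH              ; AX = upper digit * 10
        ADD AL, BL          ; add the lower digit
        MOV [DI], AL        ; store the result through the pointer
        POP CX              ; restore in reverse order
        POP BX
        POP AX
        POPF
        RET
BCD_BINARY ENDP
CODE_SEG ENDS
        END START
```

Since the procedure only sees SI and DI, the same code could convert any byte in the data segment simply by loading different offsets before the call.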
PROGRAM 3: Version 3
DATA_SEG SEGMENT
BCD DB 25h ; Storage for BCD test value
BIN DB ? ; Storage for binary value
DATA_SEG ENDS
CODE_SEG SEGMENT
ASSUME CS:CODE_SEG, DS:DATA_SEG, SS:STACK_SEG
START: MOV AX, DATA_SEG ; Initialise data segment
MOV DS, AX ; using AX register
MOV AX, STACK_SEG ; initialise stack segment
MOV SS, AX ; using AX register
MOV SP, OFFSET TOP_STACK ; initialise stack pointer
MOV AL, BCD ; Move BCD value into AL
PUSH AX ; and push it onto word stack
CALL BCD_BINARY ; Do the conversion
POP AX ; Get the binary value
MOV BIN, AL ; and save it
NOP ; Continue with program
; PROCEDURE : BCD_BINARY Converts BCD numbers to binary.
; INPUT : None - BCD value assumed to be on stack before call
; OUTPUT : None - Binary value on top of stack after return
; DESTROYS : Nothing
Discussion:
The parameter is pushed on the stack before the procedure call. The procedure call
causes the current instruction pointer to be pushed on to the stack. In the procedure,
the flags and the AX, BX, CX and BP registers are also pushed, in that order. Thus, the
stack holds, from the top downwards: BP, CX, BX, AX, the flags, the return IP and the
parameter.
The instruction MOV BP, SP transfers the contents of the SP to the BP register. Now
BP is used to access any location in the stack, by adding an appropriate offset to it. For
example, the MOV AX, [BP + 12] instruction transfers the word beginning at the 12th
byte from the top of the stack to the AX register. It does not change the contents of the BP
register or the top of the stack. It copies the pushed value of AH and AL at offset
008Eh into the AX register. This instruction is not equivalent to a POP instruction.
Stacks are useful for writing procedures for multi-user system programs or recursive
procedures. It is a good practice to make a stack diagram as above while using
procedure calls through stacks. This helps in reducing errors in programming.
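The stack-based scheme can be sketched as a complete program as follows. The push order (flags, AX, BX, CX, BP) and the offset [BP + 12] follow the discussion above; the conversion steps inside the procedure, and the stack declaration added so that the listing stands alone, are illustrative assumptions.

```asm
; Sketch: BCD to binary conversion, parameter passed on the stack.
DATA_SEG SEGMENT
BCD     DB 25h
BIN     DB ?
DATA_SEG ENDS
STACK_SEG SEGMENT STACK
        DW 100 DUP (0)
TOP_STACK LABEL WORD
STACK_SEG ENDS
CODE_SEG SEGMENT
        ASSUME CS:CODE_SEG, DS:DATA_SEG, SS:STACK_SEG
START:  MOV AX, DATA_SEG
        MOV DS, AX
        MOV AX, STACK_SEG
        MOV SS, AX
        MOV SP, OFFSET TOP_STACK
        MOV AL, BCD         ; BCD value in AL
        PUSH AX             ; parameter on the word stack
        CALL BCD_BINARY     ; pushes the return IP
        POP AX              ; the same stack slot now holds the result
        MOV BIN, AL         ; for 25h the result is 19h
        MOV AX, 4C00h       ; return to DOS
        INT 21h
BCD_BINARY PROC NEAR
        PUSHF               ; [BP+8]  flags
        PUSH AX             ; [BP+6]  AX
        PUSH BX             ; [BP+4]  BX
        PUSH CX             ; [BP+2]  CX
        PUSH BP             ; [BP+0]  old BP
        MOV BP, SP          ; [BP+10] return IP, [BP+12] parameter
        MOV AX, [BP+12]     ; copy the parameter without popping it
        MOV BL, AL
        AND BL, 0Fh         ; BL = lower digit
        MOV CL, 04
        SHR AL, CL          ; AL = upper digit
        MOV BH, 0Ah
        MUL BH              ; AX = upper digit * 10 (AH becomes 0)
        ADD AL, BL          ; add the lower digit
        MOV [BP+12], AX     ; overwrite the parameter with the result
        POP BP              ; restore in reverse order
        POP CX
        POP BX
        POP AX
        POPF
        RET
BCD_BINARY ENDP
CODE_SEG ENDS
        END START
```

Addressing through BP uses the stack segment by default, which is why no segment override is needed on [BP+12].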
Segment Combinations
The 8086 assembler provides a means for combining the segments declared in different
modules. Some typical combine types are:
1. PUBLIC: This combine directive combines all the segments having the same
names and class (in different modules) as a single combined segment.
2. COMMON: If the segments in different object modules have the same name and
the COMMON combine type then they have the same beginning address. During
execution these segments overlay each other.
3. STACK: If the segments in different object modules have the same name and the
combine type is STACK, then they become one segment, with the length the sum
of the lengths of individual segments.
These details will be more clear after you go through program 4 and further readings.
Identifiers
a) Access to External Identifiers: An external identifier is one that is referred to in
one module but defined in another. You can declare an identifier to be external
by including it in an EXTRN directive in the modules in which it is referred to. This
tells the assembler to leave the address of the variable unresolved. The linker
looks for the address of this variable in the module where it is defined to be
PUBLIC.
b) Public Identifiers: A public identifier is one that is defined within one module
of a program but potentially accessible by all of the other modules in that
program. You can declare an identifier to be public by including it in a
PUBLIC directive in the module in which it is defined.
Let us explain all the above with the help of the following example:
PROGRAM 4:
Write a procedure that divides a 32-bit number by a 16-bit number. The procedure
should be general, that is, it is defined in one module, and can be called from another
assembly module.
PUBLIC DIVISOR
MOV SS, AX ; using AX register
MOV SP, OFFSET TOP_STACK ; Initialize stack pointer
MOV AX, DIVIDEND ; Load low word of
; dividend
MOV DX, DIVIDEND + 2 ; Load high word of
; dividend
MOV CX, DIVISOR ; Load divisor
CALL SMART_DIV
; This procedure returns Quotient in the DX:AX pair and Remainder in CX register.
; Carry bit is set if result is invalid.
JNC SAVE_ALL ; IF carry = 0, result valid
JMP STOP ; ELSE carry set, don’t
; save result
ASSUME DS:MORE_DATA ; Change data segment
SAVE_ALL: PUSH DS ; Save old DS
MOV BX, MORE_DATA ; Load new data segment
MOV DS, BX ; register
MOV QUOTIENT, AX ; Store low word of
; quotient
MOV QUOTIENT + 2, DX ; Store high word of
; quotient
MOV REMAINDER, CX ; Store remainder
ASSUME DS:DATA_SEG
POP DS ; Restore initial DS
JMP ENDING
STOP: MOV DX, OFFSET MESSAGE ; DS:DX points to the message
MOV AH, 09H ; DOS function: display string
INT 21H
ENDING: NOP
CODE_SEG ENDS
END START
Discussion:
The linker appends all the segments having the same name and PUBLIC directive
with segment name into one segment. Their contents are pulled together into
consecutive memory locations.
The next statement to be noted is PUBLIC DIVISOR. It tells the assembler and the
linker that this variable can be legally accessed by other assembly modules. On the
other hand EXTRN SMART_DIV:FAR tells the assembler that this module will
access a label or a procedure of type FAR in some assembly module. Please also note
that the EXTRN definition is enclosed within the PROCEDURES SEGMENT
PUBLIC and PROCEDURES ENDS, to tell the assembler and linker that the
procedure SMART_DIV is located within the segment PROCEDURES and all such
PROCEDURES segments need to be combined in one. Let us now define the
PROCEDURE module:
; PROGRAM MODULE PROCEDURES
Discussion:
The procedure accesses the data item named DIVISOR, which is defined in the main
module; therefore the statement EXTRN DIVISOR:WORD is necessary to inform the
assembler that this data name is found in some other module. The data type is defined
to be of word type. Please note that the DIVISOR is enclosed in the same segment
name as that of main, that is, DATA_SEG, and the procedure is in a PUBLIC segment.
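A procedures module of the kind discussed above might be sketched as follows. This is an assumed illustration, not the original listing: it keeps the interface stated in Program 4 (dividend in DX:AX, divisor in CX, quotient returned in DX:AX, remainder in CX, carry set on error) and uses two successive 16-bit divisions so that the quotient cannot overflow. It is meant to be assembled separately and linked with the main module, and it expects DS to point to DATA_SEG when it is called.

```asm
; Sketch of a separately assembled procedures module.
DATA_SEG SEGMENT PUBLIC
        EXTRN DIVISOR:WORD      ; defined as PUBLIC in the main module
DATA_SEG ENDS
        PUBLIC SMART_DIV        ; make the procedure visible to the linker
PROCEDURES SEGMENT PUBLIC
        ASSUME CS:PROCEDURES, DS:DATA_SEG
SMART_DIV PROC FAR
        CMP DIVISOR, 0          ; division by zero requested?
        JE  DIV_ERROR
        PUSH BX                 ; save working registers
        PUSH BP
        MOV BX, AX              ; save low word of dividend
        MOV AX, DX              ; divide the high word first
        XOR DX, DX
        DIV CX                  ; AX = high quotient, DX = partial remainder
        MOV BP, AX              ; keep high word of quotient
        MOV AX, BX              ; DX:AX = partial remainder : low word
        DIV CX                  ; AX = low quotient, DX = remainder
        MOV CX, DX              ; remainder returned in CX
        MOV DX, BP              ; quotient returned in DX:AX
        POP BP
        POP BX
        CLC                     ; carry clear: result valid
        RET                     ; FAR return pops IP and CS
DIV_ERROR:
        STC                     ; carry set: result invalid
        RET
SMART_DIV ENDP
PROCEDURES ENDS
        END
```

The second DIV cannot overflow because the partial remainder left in DX by the first division is always smaller than the divisor in CX.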
(b) A FAR call uses one word in the stack for storing the return address.
(c) While making a call to a procedure, the nature of procedure that is NEAR
or FAR must be specified.
(d) Parameter passing through register is not suitable when large numbers of
parameters are to be passed.
(f) Parameter passing through stack is used whenever assembly language
programs are interfaced with any high level language programs.
(i) A segment if declared PUBLIC informs the linker to append all the
segments with same name into one.
What are the main considerations for interfacing assembly to HLL? To answer that we
need to answer the following questions:
The answers to the above questions depend on the high level language (HLL).
Let us take C as the language for interfacing. The C language is very
useful for writing user interface programs, but the code produced by a C compiler
may not execute fast enough for time-critical work such as telecommunications or
graphics applications. Therefore, system programs are often written with a combination
of C and assembly language functions. The main user interface may be written in C and
specialised high speed functions written in assembly language.
You must give a specific segment name to the code segment of your assembly
language subroutine. The name varies from compiler to compiler. Microsoft C
and Turbo C require the code segment name to be _TEXT or a segment name
with the suffix _TEXT. They also require the segment name _DATA for the data
segment.
(iii) The arguments from C to the assembly language are passed through the stack.
For example, a function call in C:
function_name (arg1, arg2, ..., argn) ;
would push the value of each argument on the stack in the reverse order. That
is, the argument argn is pushed first and arg1 is pushed last on the stack. A
value or a pointer to a variable can also be passed on the stack. Since the stack
in 8086 is a word stack, therefore, values and pointers are stored as words on
stack or multiples of the word size in case the value exceeds 16 bits.
(iv) You should remember to save any special purpose registers (such as CS, DS,
SS, ES, BP, SI or DI) that may be modified by the assembly language routine. If
you fail to save them, then you may have undesirable/ unexplainable
consequences, when control is returned to the C program. However, there is no
need to save AX, BX, CX or DX registers as they are considered volatile.
(v) Please note the compatibility of data types:
char Byte (DB)
int Word (DW)
long Double Word (DD)
(vi) Returned value: The called assembly routine uses the following registers for
returned values:
char AL
Near pointer/ int AX
Far pointer/ long DX : AX
Let us now look into some of the examples for interfacing.
refer to Assembler manuals on details on models of C program. The models primarily
differ in number of segments).
PROGRAM 5:
Write an assembly function that hides the cursor. Call it from a C program.
PUBLIC CUROFF
.MODEL small, C
.CODE
CUROFF PROC
MOV AH,3 ; get the current cursor position
XOR BX,BX ; empty BX register
INT 10h ; use INT 10h to do above
OR CH,20h ; force to OFF condition
MOV AH,01 ; set the new cursor values
INT 10h
RET
CUROFF ENDP
END
For details on various interrupt functions used in this program refer to further
readings.
You can write another procedure in assembly language program to put the cursor on.
This can be done by replacing OR CH,20h instruction by AND CH,1Fh. You can call
this new function from C program to put the cursor on after the curoff.
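The companion routine described above can be sketched as follows. The name CURON is an assumption; the only change from CUROFF is the AND CH,1Fh instruction mentioned in the text. Like CUROFF, this is a module to be assembled and linked with a C program, not a stand-alone program.

```asm
; Sketch: assembly function that makes the cursor visible again.
        .MODEL small, C
        .CODE
        PUBLIC CURON
CURON   PROC
        MOV AH, 3       ; get the current cursor position and shape
        XOR BX, BX      ; display page 0
        INT 10h         ; CH/CL return the cursor start and end lines
        AND CH, 1Fh     ; clear bit 5, which forces the ON condition
        MOV AH, 01      ; set the new cursor values
        INT 10h
        RET
CURON   ENDP
        END
```

From C, such a function would typically be declared with no arguments and no return value and called after the call that hides the cursor.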
Why is the parameter found at [BP+4]? Please look at the following stack layout for the
answer.
Parameter (0 or 1) BP + 4
Return Address BP + 2
Old BP value BP + 0
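The stack layout above corresponds to a routine that sets up BP by hand, roughly as in the sketch below. The routine name _CURSW, the labels and the NEAR attribute are assumptions for a small-model C caller; the leading underscore follows the usual naming convention of C compilers for DOS and may differ for a particular compiler.

```asm
; Sketch: reading a C argument at [BP+4] with an explicit stack frame.
        .MODEL small
        .CODE
        PUBLIC _CURSW
_CURSW  PROC NEAR
        PUSH BP                     ; old BP is now at [BP+0]
        MOV BP, SP                  ; return address at [BP+2]
        MOV AH, 3                   ; get the current cursor shape
        XOR BX, BX                  ; display page 0
        INT 10h
        CMP WORD PTR [BP+4], 0      ; parameter: 0 = off, 1 = on
        JE  C_OFF
        AND CH, 1Fh                 ; force the ON condition
        JMP SHORT C_SET
C_OFF:  OR  CH, 20h                 ; force the OFF condition
C_SET:  MOV AH, 01                  ; set the new cursor values
        INT 10h
        POP BP                      ; restore the caller's BP
        RET                         ; the C caller removes the argument
_CURSW  ENDP
        END
```

For a NEAR call only the 2-byte return IP lies between the saved BP and the argument, which gives the offset 4; a FAR call would place the argument at [BP+6].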
PROGRAM 7:
Write a subroutine, callable from C, that toggles the cursor. It takes one argument that
switches the cursor between on (1) and off (0), using simplified directives:
PUBLIC CURSW
.MODEL small, C
.CODE
CURSW PROC switch:word
; switch off or switch on the cursor
:
:
CURSW ENDP
END
In a similar manner the variables can be passed in C as pointers also. Values can be
returned to C either by changing the variable values in the C data segment or by
returning the value in the registers as given earlier.
4.5 INTERRUPTS
Interrupts are signals that cause the central processing unit to suspend the currently
executing program and transfer to a special program called an interrupt handler. The
interrupt handler determines the cause of the interrupt, services the interrupt, and
finally returns the control to the point of interrupt. Interrupts are caused by events
external or internal to the CPU that require immediate attention. Some external events
that cause interrupts are:
- Completion of an I/O process
- Detection of a hardware failure
How can we write an Interrupt Servicing Routine (ISR)? The following is the basic but
rigid sequence of steps:
1. Save the system context (registers, flags etc. that will be modified by the ISR).
2. Disable the interrupts that may cause interference if allowed to occur during this
ISR's processing
3. Enable those interrupts that may still be allowed to occur during this ISR
processing.
4. Determine the cause of the interrupt.
5. Take the appropriate action for the interrupt such as – receive and store data
from the serial port, set a flag to indicate the completion of the disk sector
transfer, etc.
6. Restore the system context.
7. Re-enable any interrupt levels that were blocked during this ISR execution.
8. Resume the execution of the process that was interrupted on occurrence of the
interrupt.
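The sequence of steps above can be sketched as a portable C model. This is not real 8086 or MS-DOS code: the Machine structure, the cause codes, the mask bits and service_interrupt are all hypothetical, and returning from the function stands in for the return from interrupt.

```c
#include <stdint.h>

/* Hypothetical machine state, for illustration only. */
typedef struct {
    uint16_t ax, bx;        /* registers the routine will modify        */
    uint8_t  irq_mask;      /* bit n set = interrupt level n is blocked */
    uint8_t  pending_cause; /* which device raised the interrupt        */
    int      disk_done;     /* set when a disk sector transfer is done  */
    uint8_t  serial_byte;   /* last byte received from the serial port  */
} Machine;

enum { CAUSE_SERIAL = 1, CAUSE_DISK = 2 };
enum { MASK_SERIAL = 0x01, MASK_DISK = 0x02 };

void service_interrupt(Machine *m, uint8_t incoming_byte) {
    /* Step 1: save the context that this routine modifies. */
    uint16_t saved_ax = m->ax, saved_bx = m->bx;
    uint8_t  saved_mask = m->irq_mask;

    /* Steps 2 and 3: block the level being serviced, leave others enabled. */
    uint8_t own = (m->pending_cause == CAUSE_SERIAL) ? MASK_SERIAL : MASK_DISK;
    m->irq_mask |= own;

    /* Step 4: determine the cause. Step 5: take the matching action. */
    m->ax = m->pending_cause;            /* scratch use of registers */
    m->bx = incoming_byte;
    if (m->ax == CAUSE_SERIAL) {
        m->serial_byte = (uint8_t)m->bx; /* receive and store data   */
    } else if (m->ax == CAUSE_DISK) {
        m->disk_done = 1;                /* flag transfer completion */
    }

    /* Step 6: restore the context. Step 7: re-enable blocked levels. */
    m->ax = saved_ax;
    m->bx = saved_bx;
    m->irq_mask = saved_mask;
    m->pending_cause = 0;
    /* Step 8: returning resumes the interrupted program. */
}
```

The point of the model is the ordering: everything the routine touches is saved before use and restored before returning, so the interrupted program sees no change other than the intended side effect.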
MS-DOS provides you facilities that enable you to install well-behaved interrupt
handlers such that they will not interfere with the operating system function or other
interrupt handlers. These functions are:
Function                  Action
Int 21h function 25h      Set interrupt vector
Int 21h function 35h      Get interrupt vector
Int 21h function 31h      Terminate and stay resident
Here are a few rules that must be kept in mind while writing your own Interrupt
Service Routines:
1. Use Int 21h, function 35h to get the required IVT entry from the IVT. Save this
entry, for later use.
2. Use Int 21h, function 25h to modify the IVT.
3. If your program is not going to stay resident, save the contents of the IVT, and
later restore them when your program exits.
4. If your program is going to stay resident, use one of the terminate and stay
resident functions, to reserve proper amount of memory for your handler.
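The first three rules can be illustrated with a small C model of the interrupt vector table. This is a sketch under stated assumptions: the array ivt stands in for the real IVT, get_vector and set_vector play the roles of Int 21h functions 35h and 25h, and zdiv_handler, install_zdiv, uninstall_zdiv and raise_interrupt are hypothetical names. Real DOS code would issue the Int 21h calls instead.

```c
#include <stddef.h>

typedef void (*Handler)(void);

enum { IVT_ENTRIES = 256 };
static Handler ivt[IVT_ENTRIES];   /* stand-in for the interrupt vector table */

/* Plays the role of Int 21h function 35h: read the current entry. */
Handler get_vector(unsigned char n) { return ivt[n]; }

/* Plays the role of Int 21h function 25h: replace the entry. */
void set_vector(unsigned char n, Handler h) { ivt[n] = h; }

static int zdiv_calls;
static void zdiv_handler(void) { zdiv_calls++; }  /* hypothetical handler */

static Handler saved_vector0;

void install_zdiv(void) {
    saved_vector0 = get_vector(0);  /* rule 1: get the old entry and save it */
    set_vector(0, zdiv_handler);    /* rule 2: modify the IVT                */
}

void uninstall_zdiv(void) {
    set_vector(0, saved_vector0);   /* rule 3: restore the entry on exit     */
}

/* Simulates the CPU raising interrupt n; returns 0 if no handler is set. */
int raise_interrupt(unsigned char n) {
    if (ivt[n] == NULL) return 0;
    ivt[n]();
    return 1;
}

int zdiv_call_count(void) { return zdiv_calls; }
```

A non-resident program would call install_zdiv at start-up and uninstall_zdiv before exiting. A resident program would skip the restore and instead terminate through a terminate and stay resident function, as rule 4 describes.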
Let us now write an interrupt routine to handle “division by zero”. This file can be
loaded like a COM file, but makes itself permanently resident for as long as the
system is running.
This ISR is divided into two major sections: the initialisation and the interrupt
handler. The initialisation procedure (INIT) is executed only once, when the program
is executed from the DOS level. INIT takes over the type zero interrupt vector, prints
a sign-on message, and then performs a “terminate and stay resident” exit to
MS-DOS. This special exit reserves the memory occupied by the program, so that it is
not overwritten by subsequent application programs. The interrupt handler (ZDIV)
receives control when a divide-by-zero interrupt occurs.
CR EQU 0Dh ; ASCII carriage return
LF EQU 0Ah ; ASCII line feed
BEEP EQU 07h ; ASCII beep code
BACKSP EQU 08h ; ASCII backspace code
1) The header
2) The strategy procedure
3) The interrupt procedure
The driver file has either a .sys or an .exe extension and originates at offset address 0000h.
The Header
The header contains information that allows DOS to identify the driver. It also
contains pointers that allow it to chain to other drivers loaded into the system.
The header section of a device driver is 18 bytes in length and contains pointers and
the name of the driver.
The first double word contains a -1 that informs DOS that this is the last driver in
the chain. If additional drivers are added, DOS inserts a chain address in this double
word as a segment and offset address. The chain address points to the next driver in
the chain. This allows additional drivers to be installed at any time.
The attribute word indicates the type of headers included for the driver and the type of
device the driver installs. It also indicates whether the driver controls a character
device or a block device.
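The header can be pictured as an 18-byte image. The sketch below builds one in portable C. The text above states only the leading double word, the attribute word and the total length; the remaining field order (strategy offset, interrupt offset, 8-byte name) and the use of bit 15 of the attribute word to mark a character device are assumptions based on the usual MS-DOS device header layout. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

enum { HEADER_BYTES = 18, ATTR_CHAR_DEVICE = 0x8000 };

/* Stores a 16-bit value in little-endian order, as the 8086 does. */
static void put16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

/* Builds an 18-byte header image for the last driver in the chain. */
void build_header(uint8_t out[HEADER_BYTES], uint16_t attribute,
                  uint16_t strategy_off, uint16_t interrupt_off,
                  const char *name) {
    size_t n = strlen(name);
    memset(out, 0xFF, 4);            /* double word of -1: end of chain       */
    put16(out + 4, attribute);       /* attribute word                        */
    put16(out + 6, strategy_off);    /* assumed: offset of strategy procedure */
    put16(out + 8, interrupt_off);   /* assumed: offset of interrupt procedure*/
    memset(out + 10, ' ', 8);        /* assumed: 8-byte name, blank padded    */
    memcpy(out + 10, name, n > 8 ? 8 : n);
}

/* True when the chain pointer still holds -1 (no following driver). */
int is_last_in_chain(const uint8_t h[HEADER_BYTES]) {
    return h[0] == 0xFF && h[1] == 0xFF && h[2] == 0xFF && h[3] == 0xFF;
}

/* True when the assumed character-device bit is set in the attribute word. */
int is_character_device(const uint8_t h[HEADER_BYTES]) {
    uint16_t attr = (uint16_t)(h[4] | (h[5] << 8));
    return (attr & ATTR_CHAR_DEVICE) != 0;
}
```

The field sizes add up to the stated length: 4 + 2 + 2 + 2 + 8 = 18 bytes.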
The request header contains the length of the request header as its first byte. This is
necessary because the length of the request header varies from command to command.
The return status word communicates information back to DOS from the device driver.
The initialise driver command (00H) is always executed when DOS initialises the
device driver. The initialisation command passes a message to the video display
indicating that the driver is loaded into the system, and returns to DOS the amount of
memory needed by the driver. You may only use DOS INT 21H functions 00H. You
can get more details on the strategy procedure from the further readings.
Check Your Progress 3
(e) Hardware interrupts can be invoked with the help of INT function.
4.7 SUMMARY
In this unit, we studied some programming techniques, ranging from arrays to
interrupts.
Arrays can be of byte type or word type, but the addressing of the arrays is always
done with respect to bytes. For a word array, the address will be incremented by two
for the next access.
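The byte-based addressing of arrays can be illustrated with a short C sketch. The function names element_offset and read_word are hypothetical; the byte image is assumed to be little-endian, as on the 8086.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of element `index`, given the element size in bytes
   (1 for a byte array, 2 for a word array). */
size_t element_offset(size_t index, size_t element_size) {
    return index * element_size;
}

/* Reads 16-bit element `index` from a little-endian byte image
   of a word array. */
uint16_t read_word(const uint8_t *bytes, size_t index) {
    size_t off = element_offset(index, 2);  /* two bytes per element */
    return (uint16_t)(bytes[off] | (bytes[off + 1] << 8));
}
```

For a word array, consecutive elements therefore sit at byte offsets 0, 2, 4 and so on, which is why an index register stepping through such an array is incremented by two.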
As programs become larger, it becomes necessary to divide them into smaller modules
called procedures. Procedures can be NEAR or FAR, depending upon where they are
defined and from where they are called. Parameters can be passed to procedures
through registers, through memory, or through the stack. Passing parameters in
registers is easier, but limits the total number of variables that can be passed. Passing
them in memory locations is straightforward, but limits the use of the procedure.
Passing parameters through the stack is the most complex of the three, but it is the
standard way to do it. Even when assembly language programs are interfaced to
high level languages, the parameters are passed on the stack.
Interrupt Service Routines are used to service interrupts that may arise because of
some exceptional condition. An interrupt service routine can be modified by
rewriting it and overwriting its entry in the interrupt vector table.
4.8 SOLUTIONS/ ANSWERS
Check Your Progress 1
1. We will give you an algorithm using XLAT instruction. Please code and run the
program yourself.
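2. [Figure: three stack diagrams labelled Original, after (a) and after (b), with SP
marking the top of the stack and the low address at the bottom. Bytes shown
after (a): 00, 50. Bytes shown after (b): 00, 50, 30, 00, 50, 55.]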
(c) The return for FIRST can occur only after return of SECOND. Therefore, the
stack will be back in original state.
2.
- Save the system context
- Block any interrupt which may cause interference
- Enable allowable interrupts
- Determine the cause of interrupt
- Take appropriate action
- Restore system context
- Enable interrupts which were blocked in Step 2