
ECE2015 - CA - All Slides

The document provides information about a computer architecture course taught by Dr. Ellison at VIT-AP. It includes the contact details of the professor, pre-requisites for the course covering digital logic and data representation, objectives and expected outcomes of the course, evaluation criteria, textbook and reference books. It also gives an overview of topics that will be covered like computer organization, structure and function, operations, history of computers from first to second generation.

Uploaded by

Nitya

Computer Architecture

ECE 2015

DR. M. S. ELLISON
ASSOCIATE PROFESSOR
SENSE

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 1


Contact Details
Email: [email protected]
Phone: +91-9491902516
Cabin: CB-206



Course Pre-requisites
Digital logic design
Chap. 1: Digital Logic Circuits
• Logic Gates, • Boolean Algebra
• K-Map Simplification, • Combinational Circuits
• Flip-Flops, • Sequential Circuits
Please refer to M. Morris Mano, Computer System Architecture, Pearson Education, Third Edition
Chap. 2: Digital Components
• Integrated Circuits, • Decoders, • Multiplexers
• Registers, • Shift Registers, • Binary Counters
• Memory Unit
Chap. 3: Data Representation
• Data Types • Complements • Fixed Point Representation
• Floating Point Representation
• Other Binary Codes



Objectives:
1. To familiarize students with hardware design, including logic design, the basic structure and behavior of the various functional modules of the computer, and how they interact to meet the processing needs of the user.
2. To obtain a working knowledge of assembly language.
Expected Outcome: On completion of the course, students will be able to
1. Understand the overview of basic computer architecture.
2. Learn basic computer logic: adders, multipliers, ALU, and memory.
3. Learn basic assembly language programming, basic Instruction Set Architecture (ISA), and the design of a single-cycle CPU.
4. Understand parallel processors and RISC and CISC architectures.
Mode of Evaluation: Practice/Quiz Tests 20%, Continuous Assessment Tests 60%, Practical Assessment 20%

Quiz Test: 20%
Continuous Assessment Test-1: 20%
Continuous Assessment Test-2: 20%
Continuous Assessment Test-3: 20%
Practical Assessment (Mini Project): 20%
Open Hours: Will be displayed



Text Book:

William Stallings, Computer Organization and Architecture: Designing for Performance, Pearson Education, Tenth Edition, 2013.

Reference Books:
1. M. Morris Mano, Rajib Mall, Computer System Architecture, Pearson Education, Third Edition, 2017.
2. Carl Hamacher, Zvonko Vranesic, Safwat Zaky, Computer Organization, McGraw Hill, Fifth Edition, 2011.

Architecture & Organization
Architecture is those attributes visible to the programmer
◦ Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques.
◦ e.g. Is there a multiply instruction?

Computer architecture is also referred to as the instruction set architecture (ISA): the interface through which a program controls the various components.

[Figure: the ISA covers the instruction format, instruction opcodes, registers, and instruction and data memory]

Organization is how features are implemented: the operational units and their interconnections that realize the architectural specification.
◦ Control signals, interfaces, memory technology.
◦ e.g. Is there a hardware multiply unit or is it done by repeated addition?

All members of the Intel x86 family share the same basic architecture
The IBM System/370 family shares the same basic architecture
This gives code compatibility
Organization differs between different versions
◦ e.g. pipelining



Structure & Function
Structure is the way in which components relate to each other
Function is the operation of individual components as part of the structure

[Figure: Function comprises data processing, data storage, data movement, and control. Structure comprises CPU, main memory, I/O, and system interconnection.]



Function
All computer functions are:
◦ Data processing
◦ Data storage
◦ Data movement
◦ Control



Operations (1) Data movement



Operations (2) Storage



Operation (3) Processing from/to storage



Operation (4) Processing from storage to I/O



Structure
The Computer
◦ CPU
  ◦ Controls the operation of the computer and performs its data processing functions.
◦ Main memory
  ◦ Stores data
  ◦ Fast Page Mode RAM
  ◦ Synchronous DRAM
  ◦ Extended data output (EDO) RAM
◦ I/O
  ◦ Moves data between the computer and its external environment
◦ System interconnection
  ◦ Provides for communication among CPU, main memory, and I/O

Structure - Top Level



Multicore computer



Review Questions
1. What, in general terms, is the distinction between computer organization and computer architecture?
2. What, in general terms, is the distinction between computer structure and computer function?
3. What are the four main functions of a computer?
4. List and briefly define the main structural components of a computer.
5. List and briefly define the main structural components of a processor.



History of computers
First Generation: Vacuum Tubes
Second Generation: Transistors
Third Generation: Integrated circuits
Later generations: LSI and VLSI



First Gen: Vacuum tubes
ENIAC (Electronic Numerical Integrator and Computer)
Project started in 1943 to meet WW-II needs; completed in 1946
Weighed 30 tons, occupied 1500 square feet of floor space, and contained more than 18,000 vacuum tubes
Consumed 140 kilowatts of power
Faster than any electromechanical computer, capable of 5000 additions per second



ENIAC
It was a decimal computer, rather than a binary one
Its memory consisted of 20 “accumulators,” each capable of holding a 10-digit decimal number
A ring of 10 vacuum tubes represented each digit
Its major drawback was that it had to be programmed manually by setting switches and plugging and unplugging cables



Von Neumann machine
A program could be represented in a form suitable for storing in memory alongside the data
A computer could then get its instructions by reading them from memory
A program could be set or altered by setting the values of a portion of memory
This idea, known as the stored-program concept, is usually attributed to the mathematician John von Neumann
In 1946, von Neumann and his colleagues began the design of a new stored-program computer, referred to as the IAS computer, at the Princeton Institute for Advanced Studies
It took about six years to build (completed in 1952) and is the prototype for all subsequent general-purpose computers



Second GEN: Transistors
The transistor was invented at Bell Labs in 1947; by the 1950s it had launched an electronic revolution.
By the late 1950s, fully transistorized computers were commercially available.
The transistor is smaller, cheaper, and dissipates less heat than a vacuum tube, but can be used in the same way to construct computers.
The second generation introduced more complex ALUs and control units (CUs).
It also saw the use of high-level programming languages and the provision of system software with the computer.

Third gen: Integrated circuits
A single, self-contained transistor is called a discrete component.
From the 1950s to the early 1960s, electronic equipment was composed largely of discrete components: transistors, resistors, capacitors, and so on.
Discrete components were manufactured separately, packaged in their own containers, and soldered or wired together onto masonite-like circuit boards, which were then installed in computers, oscilloscopes, and other electronic equipment.
Whenever an electronic device called for a transistor, a small piece of silicon in its own package had to be soldered to a circuit board, which made the manufacturing process expensive and cumbersome.
Early second-generation computers contained about 10,000 transistors. This figure grew to the hundreds of thousands, making the manufacture of newer, more powerful machines increasingly difficult.



Third gen: Integrated circuits
1958  invention of the Integrated circuit
The integrated circuit exploits the fact that  components as
transistors, resistors, and conductors can be fabricated from a
semiconductor such as silicon.
Fabrication of an entire circuit in a tiny piece of silicon; rather than
assemble discrete components using separate pieces of silicon
Many transistors can be produced at the same time on a single
wafer of silicon; and can be connected with a process of
metallization to form circuits



Integrated circuits
Initially, only a few gates or memory cells could be reliably manufactured and packaged together; this is known as small-scale integration (SSI).
As time went on, it became possible to pack more and more components on the same chip.
This growth in density reflects the famous Moore’s law, which was propounded by Gordon Moore, cofounder of Intel, in 1965.



Integrated circuits
Moore observed that the number of transistors that could be put on a single chip was doubling every year, and correctly predicted that this pace would continue
The pace continued year after year and decade after decade; it slowed to a doubling every 18 months in the 1970s but has sustained that rate ever since
Consequences of Moore’s law:
◦ The cost of a chip has remained virtually unchanged during this period of rapid growth in density. This means that the cost of computer logic and memory circuitry has fallen at a dramatic rate.
◦ Because logic and memory elements are placed closer together on more densely packed chips, the electrical path length is shortened, increasing operating speed.
◦ The computer becomes smaller, making it more convenient to place in a variety of environments.
◦ There is a reduction in power and cooling requirements.
◦ The interconnections on the integrated circuit are much more reliable than solder connections. With more circuitry on each chip, there are fewer interchip connections.
Later generations
With the introduction of large-scale integration (LSI), more than 1000 components can be placed on a single integrated circuit chip.
Very-large-scale integration (VLSI) achieved more than 10,000 components per chip, while current ultra-large-scale integration (ULSI) chips can contain more than one million components.
The first application of integrated circuit technology to computers was the construction of the processor.
It was also found that this same technology could be used to construct memories.



Later generations
In the 1950s and 1960s, most computer memory was constructed from tiny rings of
ferromagnetic materials
Magnetized one way, a ring (called a core) represented a one; magnetized the other way, it
stood for a zero.
Magnetic-core memory was rather fast; it took as little as a millionth of a second to read a bit
stored in memory.
But it was expensive, bulky, and used destructive readout: The simple act of reading a core
erased the data stored in it.
It was therefore necessary to install circuits to restore the data as soon as it had been
extracted.



Later generations
Then, in 1970, Fairchild produced the first semiconductor memory.
This chip, about the size of a single core, could hold 256 bits of memory.
It was nondestructive and much faster than core. It took only 70 billionths of a second to read a bit.
However, the cost per bit was higher than that of core.
In 1974, the price per bit of semiconductor memory dropped below the price per bit of core memory.
Following this, there has been a continuing and rapid decline in memory cost, accompanied by a corresponding increase in physical memory density.
This has led the way to smaller, faster machines with the memory sizes of larger and more expensive machines from just a few years earlier.
Developments in memory technology, together with developments in processor technology, changed the nature of computers in less than a decade.



Microprocessors
A breakthrough was achieved in 1971, when Intel developed its 4004.
The 4004 was the first chip to contain all of the components of a CPU on a single chip: The
microprocessor was born.
The 4004 can add two 4-bit numbers and can multiply only by repeated addition.
By today’s standards, the 4004 is hopelessly primitive, but it marked the beginning of a
continuing evolution of microprocessor capability and power.
The next major step in the evolution of the microprocessor was the introduction in 1972 of the
Intel 8008.
This was the first 8-bit microprocessor and was almost twice as complex as the 4004.



Microprocessors
Another major step was the introduction, in 1974, of the Intel 8080.
This was the first general-purpose microprocessor.
Whereas the 4004 and the 8008 had been designed for specific applications, the 8080 was designed to be the CPU of a general-purpose microcomputer.
Like the 8008, the 8080 is an 8-bit microprocessor.
It is faster, has a richer instruction set, and has a larger addressing capability.



Microprocessors
About the same time, 16-bit microprocessors began to be developed.
However, it was not until the end of the 1970s that powerful, general-purpose 16-bit
microprocessors appeared. One of these was the 8086.
The next step in this trend occurred in 1981, when both Bell Labs and Hewlett-Packard
developed 32-bit, single-chip microprocessors.
Intel introduced its own 32-bit microprocessor, the 80386, in 1985

Speeding it up
• Pipelining
• On board cache
• On board L1 & L2 cache
• Branch prediction
• Data flow analysis



Performance Balance
• Processor speed increased
• Memory capacity increased
• Memory speed lags behind processor speed

Logic and Memory Performance Gap



Solutions
• Increase number of bits retrieved at one time
• Make DRAM “wider” rather than “deeper”
• Change DRAM interface
• Cache
• Reduce frequency of memory access
• More complex cache and cache on chip
• Increase interconnection bandwidth
• High speed buses
• Hierarchy of buses



I/O Devices
• Peripherals with intensive I/O demands
• Large data throughput demands
• Processors can handle this
• Solutions:
• Caching
• Higher-speed interconnection buses
• More elaborate bus structures
• Multiple-processor configurations



Typical I/O Device Data Rates



Key is Balance
• Processor components
• Main memory
• I/O devices
• Interconnection structures



Improvements in Chip Organization and Architecture
• Increase hardware speed of processor
  • Fundamentally due to shrinking logic gate size
  • More gates, packed more tightly, increasing clock rate
  • Propagation time for signals reduced
• Increase size and speed of caches
  • Dedicating part of processor chip
  • Cache access times drop significantly
• Change processor organization and architecture
  • Increase effective speed of execution
  • Parallelism



Problems with Clock Speed and Logic Density
• Power
  • Power density increases with density of logic and clock speed
  • Dissipating heat
• RC delay
  • Speed at which electrons flow is limited by the resistance and capacitance of the metal wires connecting them
  • Delay increases as RC product increases
  • Wire interconnects thinner, increasing resistance
  • Wires closer together, increasing capacitance
• Memory latency
  • Memory speeds lag processor speeds
• Solution:
  • More emphasis on organizational and architectural approaches



Intel Microprocessor Performance



Increased Cache Capacity
• Typically two or three levels of cache between processor and main memory
• Chip density increased
  • More cache memory on chip
  • Faster cache access
• Pentium chip devoted about 10% of chip area to cache
• Pentium 4 devotes about 50%



More Complex Execution Logic
• Enable parallel execution of instructions
• Pipeline works like assembly line
• Different stages of execution of different instructions at same time along pipeline
• Superscalar allows multiple pipelines within single processor
• Instructions that do not depend on one another can be executed in parallel



Diminishing Returns
• Internal organization of processors complex
• Can get a great deal of parallelism
• Further significant increases likely to be relatively modest
• Benefits from cache are reaching limit
• Increasing clock rate runs into power dissipation problem
• Some fundamental physical limits are being reached



New Approach – Multiple Cores
• Multiple processors on single chip
• Large shared cache
• Within a processor, increase in performance is proportional to the square root of the increase in complexity
• If software can use multiple processors, doubling the number of processors almost doubles performance
• So, use two simpler processors on the chip rather than one more complex processor
• With two processors, larger caches are justified
• Power consumption of memory logic less than processing logic
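
The trade-off in the bullets above can be put into numbers with a small sketch. The square-root rule comes from the slide; the multicore estimate uses Amdahl's law, which is an assumption added here for illustration and is not stated on the slide. The function names and the parallel fraction of 0.95 are hypothetical.

```python
import math


def single_core_speedup(complexity_factor: float) -> float:
    """Square-root rule from the slide: performance grows as sqrt(complexity)."""
    return math.sqrt(complexity_factor)


def multicore_speedup(cores: int, parallel_fraction: float) -> float:
    """Amdahl's law (assumed model): only the parallel fraction scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Spend a doubled transistor budget on one core that is twice as complex...
    print(f"one complex core : {single_core_speedup(2):.2f}x")
    # ...or on two simple cores running software that is 95% parallel (hypothetical).
    print(f"two simple cores : {multicore_speedup(2, 0.95):.2f}x")
```

Under these assumptions the two simple cores come out ahead, which is the argument the slide makes for multicore chips; with poorly parallel software the comparison can go the other way.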



IAS Computer

A main memory, which stores both data and instructions
An arithmetic and logic unit (ALU) capable of operating on binary data
A control unit, which interprets the instructions in memory and causes them to be executed
Input and output (I/O) equipment operated by the control unit



Working of IAS
The memory of the IAS consists of 1000 storage locations, called words, of 40 binary digits
(bits) each.
Both data and instructions are stored there.
Numbers are represented in binary form, and each instruction is a binary code.
Each number is represented by a sign bit and a 39-bit value.
A word may also contain two 20-bit instructions, with each instruction consisting of an 8-bit
operation code (opcode) specifying the operation to be performed and a 12-bit address
designating one of the words in memory (numbered from 0 to 999).
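
The word layout described above can be sketched in a few lines. This is only an illustration under stated assumptions: bit 0 is taken to be the leftmost bit of the 40-bit word, the helper names are hypothetical, and the opcode values 0x01 (LOAD M(X)) and 0x05 (ADD M(X)) follow the commonly published IAS instruction table rather than anything shown on this slide.

```python
WORD_BITS = 40  # one IAS word


def pack_instruction_word(left_op: int, left_addr: int,
                          right_op: int, right_addr: int) -> int:
    """Build a 40-bit word from two 20-bit instructions (8-bit opcode + 12-bit address)."""
    left = (left_op << 12) | left_addr
    right = (right_op << 12) | right_addr
    return (left << 20) | right


def split_instruction_word(word: int) -> tuple:
    """Return (left_opcode, left_address, right_opcode, right_address)."""
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("an IAS word has exactly 40 bits")
    left = (word >> 20) & 0xFFFFF   # bits 0..19
    right = word & 0xFFFFF          # bits 20..39
    return (left >> 12, left & 0xFFF, right >> 12, right & 0xFFF)


def split_number_word(word: int) -> tuple:
    """Return (sign_bit, 39-bit value field) of a data word."""
    return (word >> 39) & 1, word & ((1 << 39) - 1)


if __name__ == "__main__":
    word = pack_instruction_word(0x01, 500, 0x05, 501)  # LOAD M(500) | ADD M(501)
    print(f"{word:040b}")
    print(split_instruction_word(word))
```

The same 40 bits are read either as one number or as an instruction pair; nothing in the word itself says which, so the control unit decides by context.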



EXAMPLE



Registers in IAS
Memory buffer register (MBR): Contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit.
Memory address register (MAR): Specifies the address in memory of the word to be written from or read into the MBR.
Instruction register (IR): Contains the 8-bit opcode of the instruction being executed.
Instruction buffer register (IBR): Employed to hold temporarily the right-hand instruction from a word in memory.
Program counter (PC): Contains the address of the next instruction pair to be fetched from memory.
Accumulator (AC) and multiplier quotient (MQ): Employed to hold temporarily operands and results of ALU operations.

Computer function
The basic function performed by a computer is execution of a program, which consists of a set of instructions stored in memory
Instruction processing consists of two steps
◦ Fetch: the processor reads an instruction from memory
◦ Execute: the processor executes the fetched instruction

The processing required for a single instruction is known as an instruction cycle



Instruction fetch and execute
At the beginning of each instruction cycle, the processor fetches an instruction from memory
A register called the program counter (PC) holds the address of the instruction to be fetched
next
Unless told otherwise, the processor always increments the PC after each instruction fetch so
that it will fetch the next instruction in sequence i.e. the instruction located in the next higher
memory address
The fetched instruction is loaded into a register in the processor known as the
instruction register (IR)
The instruction contains bits that specify the action the processor needs to take
The processor interprets the instruction and performs the required action



Example
Suppose a processor contains a single data register, called an accumulator (AC)
Both instructions and data are 16 bits long
The instruction format provides 4 bits for the opcode, so there can be as many as 2^4 = 16 different opcodes; the remaining 12 bits can directly address up to 2^12 = 4096 (4K) words of memory

Example
In this example, three instruction cycles, each consisting of a fetch cycle and an execute cycle, are needed to add the contents of location 940 to the contents of 941.
With a more complex set of instructions, fewer cycles would be needed.
Some older processors, for example, included instructions that contain more than one memory address.
For example, the PDP-11 instruction expressed symbolically as ADD B,A stores the sum of the contents of memory locations B and A into memory location A. A single instruction cycle then consists of the following steps:
• Fetch the ADD instruction.
• Read the contents of memory location A into the processor.
• Read the contents of memory location B into the processor. In order that the contents of A are not lost, the processor must have at least two registers for storing memory values, rather than a single accumulator.
• Add the two values.
• Write the result from the processor to memory location A.
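
The three fetch/execute cycles of the accumulator example can be traced with a minimal sketch. The figures for this example are missing from the slides, so the details below are assumptions: opcode 1 loads AC from memory, opcode 2 stores AC to memory, opcode 5 adds a memory word to AC, addresses are hexadecimal, and the program sits at location 300 with the data values 3 and 2 at 940 and 941.

```python
# Assumed opcode assignments for the hypothetical 16-bit machine.
LOAD, STORE, ADD = 0x1, 0x2, 0x5


def run(memory: dict, pc: int, cycles: int) -> int:
    """Run `cycles` instruction cycles and return the final accumulator value."""
    ac = 0
    for _ in range(cycles):
        # Fetch cycle: read the instruction at PC into IR, then increment PC.
        ir = memory[pc]
        pc = (pc + 1) & 0xFFF
        # Execute cycle: 4-bit opcode, 12-bit address.
        opcode, address = ir >> 12, ir & 0xFFF
        if opcode == LOAD:
            ac = memory[address]
        elif opcode == ADD:
            ac = (ac + memory[address]) & 0xFFFF
        elif opcode == STORE:
            memory[address] = ac
        else:
            raise ValueError(f"opcode {opcode:#x} is not part of this sketch")
    return ac


if __name__ == "__main__":
    memory = {
        0x300: 0x1940,  # load AC from location 940
        0x301: 0x5941,  # add contents of location 941 to AC
        0x302: 0x2941,  # store AC to location 941
        0x940: 0x0003,
        0x941: 0x0002,
    }
    run(memory, pc=0x300, cycles=3)
    print(f"location 941 now holds {memory[0x941]:04X}")
```

A two-address instruction such as ADD B,A would do the same work in one instruction cycle, at the cost of more memory accesses inside that cycle.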

States
Instruction address calculation (iac): Determine the address of the next instruction to be
executed.
Instruction fetch (if): Read instruction from its memory location into the processor.
Instruction operation decoding (iod): Analyze instruction to determine type of operation to be
performed and operand(s) to be used.
Operand address calculation (oac): If the operation involves reference to an operand in
memory or available via I/O, then determine the address of the operand.
Operand fetch (of): Fetch the operand from memory or read it in from I/O.
Data operation (do): Perform the operation indicated in the instruction.
Operand store (os): Write the result into memory or out to I/O.



States
The oac state appears twice in the state diagram, because an instruction may involve a read, a write, or both.
Example: the instruction ADD A,B results in the following sequence of states: iac, if, iod, oac, of, oac, of, do, oac, os.
A single instruction can also specify an operation to be performed on an array of numbers or a string of characters, which involves repetitive operand fetch and store operations.
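
The state sequence quoted above can be generated mechanically. The sketch below is a simplification under stated assumptions: it models one instruction cycle as a linear walk through the states and takes the number of operands to fetch and results to store as parameters; the function name is hypothetical.

```python
def instruction_cycle_states(operands_to_fetch: int, results_to_store: int) -> list:
    """List the states visited in one instruction cycle, in order."""
    states = ["iac", "if", "iod"]          # address calculation, fetch, decode
    for _ in range(operands_to_fetch):
        states += ["oac", "of"]            # one address calculation per operand fetched
    states.append("do")                    # perform the data operation
    for _ in range(results_to_store):
        states += ["oac", "os"]            # one address calculation per result stored
    return states


if __name__ == "__main__":
    # ADD A,B reads two memory operands and writes one result.
    print(", ".join(instruction_cycle_states(2, 1)))
```

An instruction that works only on registers would pass 0 and 0 and skip the oac, of, and os states entirely.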



IAS structure

Example of addition
1. LOAD M(X) 500, ADD M(X) 501 (PC=1)
MAR ← PC
MBR ← M[MAR]
IBR ← MBR[20:39]
IR ← MBR[0:7]
MAR ← MBR[8:19]
MBR ← M[MAR]
AC ← MBR
IR ← IBR[0:7]
MAR ← IBR[8:19]
PC ← PC+1
MBR ← M[MAR]
AC ← AC+MBR

2. STOR M(X) 500, OTHER INSTRUCTION (PC=2)
MAR ← PC
MBR ← M[MAR]
IBR ← MBR[20:39]
IR ← MBR[0:7]
MAR ← MBR[8:19]
MBR ← AC
M[MAR] ← MBR
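
The register transfers of this example can be replayed in a short simulation. This is a sketch under several assumptions: opcodes 00000001 (LOAD M(X)), 00000101 (ADD M(X)), and 00100001 (STOR M(X)) follow the commonly published IAS instruction table; data values are small non-negative integers so sign handling is ignored; the data values 3 and 4 are made up; and the unspecified "other instruction" is replaced by a second LOAD M(500) purely as a placeholder.

```python
LOAD, ADD, STOR = 0b00000001, 0b00000101, 0b00100001  # assumed IAS opcodes


def field(word: int, first: int, last: int, width: int) -> int:
    """Bits first..last (inclusive) of a word, where bit 0 is the leftmost bit."""
    return (word >> (width - 1 - last)) & ((1 << (last - first + 1)) - 1)


def pack_word(lo: int, la: int, ro: int, ra: int) -> int:
    """Two 20-bit instructions (opcode, address) in one 40-bit word."""
    return (((lo << 12) | la) << 20) | ((ro << 12) | ra)


def execute(regs: dict, memory: dict) -> None:
    """Execute the instruction whose opcode is in IR and whose address is in MAR."""
    if regs["IR"] == LOAD:
        regs["MBR"] = memory[regs["MAR"]]
        regs["AC"] = regs["MBR"]
    elif regs["IR"] == ADD:
        regs["MBR"] = memory[regs["MAR"]]
        regs["AC"] = regs["AC"] + regs["MBR"]
    elif regs["IR"] == STOR:
        regs["MBR"] = regs["AC"]
        memory[regs["MAR"]] = regs["MBR"]
    else:
        raise ValueError("opcode not covered by this sketch")


def step_word(regs: dict, memory: dict) -> None:
    """Fetch one word and run its left and then its right instruction."""
    regs["MAR"] = regs["PC"]
    regs["MBR"] = memory[regs["MAR"]]
    regs["IBR"] = field(regs["MBR"], 20, 39, 40)   # save the right-hand instruction
    regs["IR"] = field(regs["MBR"], 0, 7, 40)
    regs["MAR"] = field(regs["MBR"], 8, 19, 40)
    execute(regs, memory)                          # left-hand instruction
    regs["IR"] = field(regs["IBR"], 0, 7, 20)
    regs["MAR"] = field(regs["IBR"], 8, 19, 20)
    regs["PC"] += 1
    execute(regs, memory)                          # right-hand instruction


def run_addition_example() -> tuple:
    memory = {
        1: pack_word(LOAD, 500, ADD, 501),
        2: pack_word(STOR, 500, LOAD, 500),  # right half is a placeholder instruction
        500: 3,
        501: 4,
    }
    regs = {"PC": 1, "AC": 0, "MAR": 0, "MBR": 0, "IR": 0, "IBR": 0}
    step_word(regs, memory)
    step_word(regs, memory)
    return regs, memory


if __name__ == "__main__":
    regs, memory = run_addition_example()
    print("M[500] =", memory[500], "PC =", regs["PC"])
```

Note that only the left-hand instruction needs a memory access to be fetched; the right-hand one is taken from the IBR, which is why the PC is incremented once per word.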



P1



P2

P3



Answer
P1
This program stores the absolute value of the content of memory location 0FA into memory location 0FB.

P2
OPCODE: 00000001
OPERAND: 000000000010
First, the CPU must access memory to fetch the instruction. The instruction contains the address of the data to be loaded. During the execute phase, the CPU accesses memory again to load the data value located at that address, for a total of two trips to memory.



Answer

P3
The vectors A, B, and C are each stored in 1,000 contiguous locations in memory, beginning at locations 1001, 2001, and 3001, respectively.
The program begins with the left half of location 3.
A counting variable N is set to 999 and decremented after each step until it reaches -1.
Thus, the vectors are processed from high location to low location.

Example 3
main ()
{
    int a=15, b=5, c;
    if (a >= b)
        c = a - b;
    else
        c = a + b;
}



Example 3 (continued)
• Optimized



Example 3 (with a > b)

main () {
    int a=15, b=5, c;
    if (a > b)
        c = a - b;
    else
        c = a + b;
}

IAS-style program (one instruction per location):
0    15            a
1    5             b
2                  c
3    1
4    begin
5    . a > b
5    load M(0)
6    sub M(1)
7    sub M(3)
8    jump+ M(10)
9    jump M(14)
10   . True, c = a - b
10   load M(0)
11   sub M(1)
12   stor M(2)
13   jump M(17)
14   . False, c = a + b
14   load M(0)
15   add M(1)
16   stor M(2)
17   halt
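
The listing above can be checked by running it through a tiny interpreter. This is a sketch of the simplified one-instruction-per-location model used on the slide, not of the real IAS word format; it assumes that jump+ branches when the accumulator is non-negative, and the function and constant names are hypothetical.

```python
# The program from the slide, keyed by location: (operation, operand address).
EXAMPLE3_PROGRAM = {
    5: ("load", 0), 6: ("sub", 1), 7: ("sub", 3),   # AC = a - b - 1
    8: ("jump+", 10), 9: ("jump", 14),              # AC >= 0 means a > b
    10: ("load", 0), 11: ("sub", 1), 12: ("stor", 2), 13: ("jump", 17),
    14: ("load", 0), 15: ("add", 1), 16: ("stor", 2),
    17: ("halt", None),
}


def run_example3(a: int, b: int) -> int:
    """Execute the listing with the given a and b and return the value stored in c."""
    memory = {0: a, 1: b, 2: 0, 3: 1}   # locations 0..3 hold a, b, c and the constant 1
    ac, pc = 0, 5
    while True:
        op, addr = EXAMPLE3_PROGRAM[pc]
        pc += 1
        if op == "load":
            ac = memory[addr]
        elif op == "add":
            ac += memory[addr]
        elif op == "sub":
            ac -= memory[addr]
        elif op == "stor":
            memory[addr] = ac
        elif op == "jump":
            pc = addr
        elif op == "jump+":
            if ac >= 0:
                pc = addr
        elif op == "halt":
            return memory[2]


if __name__ == "__main__":
    print(run_example3(15, 5))   # takes the "true" branch: c = a - b
    print(run_example3(3, 5))    # takes the "false" branch: c = a + b
```

Subtracting the constant 1 at location 7 is what turns the "non-negative" test of jump+ into a strict a > b comparison.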
Example 6
Give it a try.
main () {
    int a=2, b=2, I;
    I = 1;
    while (I < 10) {
        a = a + b;
        I = I + 1;
    }
}



Example 6 (continued)
main () {
int a=2, b=2, I;
I = 1;
while (I < 10) {
a = a +b;
I = I +1;
}
}



Problem
Write a program to add 5 numbers located at consecutive memory locations starting at address 500. Add
another 5 numbers located at consecutive memory locations starting at address 600. Subtract the lower
value from the higher value and store it at location having address 700.
Sol:
800: LOAD M(500) [Left Instr]
800: ADD M(501) [Right Instr]
801: ADD M(502) [Left Instr]
801: ADD M(503) [Right Instr]
802: ADD M(504) [Left Instr]
802: STOR M(701) [Right Instr]
803: LOAD M(600) [Left Instr]
803: ADD M(601) [Right Instr]
804: ADD M(602) [Left Instr]
804: ADD M(603) [Right Instr]
805: ADD M(604) [Left Instr]
805: SUB M(701) [Right Instr]
806: JUMP+ M(807,20:39) [Left Instr]
806: STOR M(702) [Right Instr]
807: LOAD -M(702) [Left Instr]
807: STOR M(700) [Right Instr]

Performance Assessment
• Performance is one of the key parameters to consider, along with cost, size, security, reliability, and, in some cases, power consumption.
• Application performance depends not just on the raw speed of the processor, but also on the instruction set, the choice of implementation language, the efficiency of the compiler, and the skill of the programmer.
• Clock Speed
  ◦ The system clock: at the most fundamental level, the speed of a processor is dictated by the pulse frequency produced by the clock, measured in cycles per second, or hertz (Hz).
  ◦ Clock signals are typically generated by a quartz crystal.
  ◦ The rate of pulses is known as the clock rate, or clock speed.
  ◦ One increment, or pulse, of the clock is referred to as a clock cycle, or a clock tick.
  ◦ The time between pulses is the cycle time.
  ◦ For example, a 1-GHz processor receives 1 billion pulses per second, so its cycle time is 1 ns.



Performance Assessment
• Millions of instructions per second (MIPS), or MIPS rate

$$\text{MIPS rate} = \frac{f}{\text{CPI} \times 10^{6}}$$

CPI: average number of clock cycles per instruction
f: clock frequency (number of cycles per second)
Example: Consider a program having 500 million instructions, running on a 400 MHz processor. The program consists of three major types of instructions: ALU related, load/store, and branching. These instructions require 1, 2, and 4 cycles per instruction, with an instruction mix of 60%, 30%, and 10% respectively in the program. Estimate the MIPS rate of the processor.
Solution:
$$\text{CPI} = 0.6 \times 1 + 0.3 \times 2 + 0.1 \times 4 = 1.6$$
$$\text{MIPS rate} = \frac{400 \times 10^{6}}{1.6 \times 10^{6}} = 250$$
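
The worked example can be reproduced with a few lines of code. The function names are hypothetical, and the execution-time relation T = Ic × CPI / f is the standard companion formula, added here as an assumption since it does not appear on the slide.

```python
def average_cpi(mix: list) -> float:
    """Weighted average CPI from (fraction_of_instructions, cycles) pairs."""
    return sum(fraction * cycles for fraction, cycles in mix)


def mips_rate(clock_hz: float, cpi: float) -> float:
    """MIPS rate = f / (CPI x 10^6)."""
    return clock_hz / (cpi * 1e6)


def execution_time(instruction_count: float, cpi: float, clock_hz: float) -> float:
    """Assumed companion relation: T = Ic x CPI / f, in seconds."""
    return instruction_count * cpi / clock_hz


if __name__ == "__main__":
    mix = [(0.6, 1), (0.3, 2), (0.1, 4)]   # ALU, load/store, branch
    cpi = average_cpi(mix)
    print(f"CPI       = {cpi:.2f}")
    print(f"MIPS rate = {mips_rate(400e6, cpi):.0f}")
    print(f"run time  = {execution_time(500e6, cpi, 400e6):.2f} s")
```

The instruction count of 500 million does not affect the MIPS rate; it only matters once the total execution time is wanted.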
• Other performance parameters are defined in a similar way.
Integer representation
• In written binary notation, arbitrary numbers can be represented with just the digits zero and one, the minus sign, and the period (radix point).
• For computer storage and processing, minus signs and periods cannot be used.
• Only binary digits (0 and 1) may be used to represent numbers.
• If limited to non-negative integers, the representation is straightforward.
Sign-Magnitude representation
• There are several conventions used to represent negative as well as positive integers, all of which involve treating the most significant (leftmost) bit in the word as a sign bit.
• If the sign bit is 0, the number is positive; if the sign bit is 1, the number is negative.
• The simplest form of representation that employs a sign bit is the sign-magnitude representation.
• In an n-bit word, the rightmost n - 1 bits hold the magnitude of the integer.
General case

$A = \begin{cases} \sum_{i=0}^{n-2} 2^{i} a_i & \text{if } a_{n-1} = 0 \\ -\sum_{i=0}^{n-2} 2^{i} a_i & \text{if } a_{n-1} = 1 \end{cases}$

DRAWBACKS
• Addition and subtraction require consideration of both the signs of the numbers and their relative magnitudes to carry out the required operation.
• Another drawback is that there are two representations of 0 (+0 and -0).
2’s complement representation
2’s complement representation also uses the MSB as a sign bit
• Consider an n-bit integer, A, in 2’s complement representation.
• If A is positive, then the sign bit, $a_{n-1}$, is zero. The remaining bits represent the magnitude of the number in the same fashion as for sign-magnitude representation:

$A = \sum_{i=0}^{n-2} 2^{i} a_i \quad \text{for } A \ge 0$

• The number zero is identified as positive and therefore has a 0 sign bit and a magnitude of all 0s.
• The range of positive integers that may be represented is from 0 (all of the magnitude bits are 0) through $2^{n-1} - 1$ (all of the magnitude bits are 1).
• Any larger number would require more bits.
• Now, for a negative number, the sign bit, $a_{n-1}$, is one.
• The remaining n - 1 bits can take on any one of $2^{n-1}$ values.
• Therefore, the range of negative integers that can be represented is from -1 to $-2^{n-1}$.
• The weight of the most significant bit is $-2^{n-1}$.
• This is the convention used in 2’s complement representation, yielding the following expression for negative numbers:

$A = -2^{n-1} a_{n-1} + \sum_{i=0}^{n-2} 2^{i} a_i$

• For $a_{n-1} = 0$, the term $-2^{n-1} a_{n-1} = 0$ and the equation defines a non-negative integer.
• When $a_{n-1} = 1$, the term $2^{n-1}$ is subtracted from the summation term, yielding a negative integer.
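The weighted-sum interpretation can be sketched in Python. The function name and the choice of a bit string as input are assumptions made for illustration: the sign bit carries weight -2^(n-1) and the remaining bits carry their usual positive weights.

```python
def twos_complement_value(bits: str) -> int:
    """Interpret a string of '0'/'1' characters as an n-bit 2's complement integer."""
    n = len(bits)
    sign_bit = int(bits[0])
    # Value of the remaining n-1 bits, read as an unsigned number
    magnitude = int(bits[1:], 2) if n > 1 else 0
    # The most significant bit carries weight -2^(n-1)
    return -sign_bit * 2 ** (n - 1) + magnitude


for pattern in ["0111", "1000", "1101", "10000011"]:
    print(pattern, "->", twos_complement_value(pattern))
```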
Working
• Consider an n-bit sequence of binary digits $a_{n-1} a_{n-2} \ldots a_1 a_0$ interpreted as a 2’s complement integer A, so that its value is

$A = -2^{n-1} a_{n-1} + \sum_{i=0}^{n-2} 2^{i} a_i$

• If A is a positive number, the sign-extension rule (fill the new leftmost positions with copies of the sign bit) clearly works. Now, suppose A is negative and we want to construct an m-bit representation, with m > n. Then

$A = -2^{m-1} a_{m-1} + \sum_{i=0}^{m-2} 2^{i} a_i$
2’s complement representation using value box
• A useful illustration of the nature of 2’s complement representation is a value box, in which the value on the far right in the box is 1 ($2^{0}$).
• Each succeeding position to the left is double in value, until the leftmost position, which is negated.
• The most negative 2’s complement number that can be represented is $-2^{n-1}$; if any of the bits other than the sign bit is one, it adds a positive amount to the number.
Converting between different bit lengths
• It is sometimes desirable to take an n-bit integer and store it in m bits, where m > n.
• In sign-magnitude notation, this is easily accomplished: simply move the sign bit to the new leftmost position and fill in with zeros.
• This procedure does not work for 2’s complement negative numbers.
The rule for 2’s complement integers is to move the sign bit to the new leftmost position and
fill in with copies of the sign bit.
For positive numbers, fill in with zeros, and for negative numbers, fill in with ones. This is called
sign extension.
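A minimal Python sketch of sign extension on bit strings follows. The helper names are hypothetical; `to_int` is included only to check that the value is unchanged after extension.

```python
def sign_extend(bits: str, m: int) -> str:
    """Extend an n-bit 2's complement bit string to m bits (m >= n)."""
    if m < len(bits):
        raise ValueError("target width must be at least the source width")
    # Fill the new leftmost positions with copies of the sign bit
    return bits[0] * (m - len(bits)) + bits


def to_int(bits: str) -> int:
    """Value of a 2's complement bit string."""
    return int(bits, 2) - (int(bits[0]) << len(bits))


narrow = "11101110"              # 8-bit pattern
wide = sign_extend(narrow, 16)   # 16-bit pattern with the same value
print(narrow, to_int(narrow))
print(wide, to_int(wide))
```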
Integer arithmetic: negation
• In sign-magnitude representation, the rule for forming the negation of an integer is simple: invert the sign bit.
• In 2’s complement notation, the rule is:
1. Take the Boolean complement of each bit of the integer (including the sign bit).
2. Treating the result as an unsigned binary integer, add 1.
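The two-step rule for 2’s complement negation can be sketched as follows. The function name and the bit-string interface are assumptions for illustration; any carry out of the leftmost position is discarded.

```python
def negate(bits: str) -> str:
    """Negate an n-bit 2's complement bit string: complement every bit, then add 1."""
    n = len(bits)
    # Step 1: Boolean complement of each bit
    complemented = "".join("1" if b == "0" else "0" for b in bits)
    # Step 2: add 1 as an unsigned integer, dropping any carry out of n bits
    result = (int(complemented, 2) + 1) % (1 << n)
    return format(result, f"0{n}b")


print(negate("00010010"))  # +18 -> -18
print(negate("11101110"))  # -18 -> +18
print(negate("0000"))      # zero negates to itself
print(negate("1000"))      # most negative value has no positive counterpart
```

The last two calls show the special cases: zero maps to zero, and the most negative n-bit value maps back to itself because its positive counterpart is not representable in n bits.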
Integer arithmetic: Addition
Integer arithmetic: subtraction
Integer arithmetic: multiplication
• Compared with addition and subtraction, multiplication is a complex operation, whether performed in hardware or software.
• A wide variety of algorithms have been used in various computers.
• Multiplication of unsigned integers
◦ Multiplication involves the generation of partial products, one for each digit in the multiplier. These partial products are then summed to produce the final product.
◦ The partial products are easily defined. When the multiplier bit is 0, the partial product is 0. When the multiplier bit is 1, the partial product is the multiplicand.
◦ The total product is produced by summing the partial products. For this operation, each successive partial product is shifted one position to the left relative to the preceding partial product.
◦ The multiplication of two n-bit binary integers results in a product of up to 2n bits in length (e.g., in binary, 11 × 11 = 1001).
• Compared with the pencil-and-paper approach, there are several things we can do to make computerized multiplication more efficient.
• First, we can perform a running addition on the partial products rather than waiting until the end.
• For each 1 in the multiplier, an add and a shift operation are required; but for each 0, only a shift is required.
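The running-addition idea can be sketched as a shift-and-add loop. This is an illustrative software model with a hypothetical function name, not a description of any particular hardware multiplier.

```python
def multiply_unsigned(multiplicand: int, multiplier: int, n: int) -> int:
    """Shift-and-add multiplication of two unsigned n-bit integers."""
    product = 0
    for i in range(n):
        # Examine multiplier bit i; a 1 means "add the shifted multiplicand"
        if (multiplier >> i) & 1:
            product += multiplicand << i
        # A 0 bit contributes nothing: only the shift (i advancing) happens
    return product


print(format(multiply_unsigned(0b1011, 0b1101, 4), "08b"))
```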
Array multiplication
Let us consider the multiplicand to be M = (A3, A2, A1, A0) and the multiplier to be Q = (B3, B2,
B1, B0)
Hardware configuration of 4*4 Array multiplier
For an m-bit × n-bit multiplier we need m × n AND gates, n half adders, and (m - 2) × n full adders. For the 4 × 4 case this gives 16 AND gates, 4 half adders, and 8 full adders.
Signed multiplication using array multiplier
Multiplying Negative Numbers
• Straightforward unsigned multiplication does not work for 2’s complement operands. E.g., if we multiply 11 (1011) by 13 (1101) as unsigned integers, we get 143 (10001111).
• If we interpret these as 2’s complement numbers, we have -5 (1011) times -3 (1101), which should equal +15. The bit pattern 10001111 instead reads as -113, which is incorrect.
• It also does not work if only one of the multiplicand and the multiplier is negative.
• Consider 1001 × 0011. If these are treated as the unsigned integers 9 and 3, the multiplication proceeds simply (Figure a).
• But if 1001 is interpreted as the 2’s complement value -7, then each partial product must be a negative 2’s complement number of 2n (8) bits, as shown in Figure b.
• Note: this is accomplished by padding out each partial product to the left with binary 1s.
If the multiplier is negative
• If the multiplier is negative, straightforward multiplication also will not work.
• The reason is that the bits of the multiplier no longer correspond to the shifts or multiplications that must take place.
• For example, decimal -3 is written 1101 in 4-bit 2’s complement. If we simply took partial products based on each bit position, we would have

$1101 \leftrightarrow -(1 \times 2^{3} + 1 \times 2^{2} + 0 \times 2^{1} + 1 \times 2^{0}) = -(2^{3} + 2^{2} + 2^{0})$

instead of

$-3 = -(2^{1} + 2^{0})$
Solutions to the above issues
Solution 1
◦ Convert to positive if required
◦ Multiply as above
◦ If signs were different, negate answer

Solution 2
◦ Booth’s algorithm
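Solution 1 can be sketched in Python as follows. The function name and the convention of passing operands as raw n-bit patterns are assumptions for illustration; the product is returned as a 2n-bit 2’s complement pattern.

```python
def multiply_signed(a: int, b: int, n: int) -> int:
    """Multiply two n-bit 2's complement patterns; return the 2n-bit product pattern."""
    # Decode each n-bit pattern into a signed value
    signed_a = a - (1 << n) if a >> (n - 1) else a
    signed_b = b - (1 << n) if b >> (n - 1) else b
    # Step 1 and 2: convert to positive if required, then multiply as unsigned
    product = abs(signed_a) * abs(signed_b)
    # Step 3: if the signs were different, negate the answer
    if (signed_a < 0) != (signed_b < 0):
        product = -product
    # Encode the result as a 2n-bit 2's complement pattern
    return product & ((1 << (2 * n)) - 1)


print(format(multiply_signed(0b1011, 0b1101, 4), "08b"))  # (-5) x (-3)
print(format(multiply_signed(0b1001, 0b0011, 4), "08b"))  # (-7) x (+3)
```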
Booth Multiplication
Booth Multiplication: Example
Division
Division Restoring Algorithm
Non-restoring Algorithm
Signed Division process
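• To deal with negative numbers, the remainder is defined by

$D = Q \times V + R$

where D is the dividend, V the divisor, Q the quotient, and R the remainder.
• Consider the following examples of integer division with all possible combinations of signs of D and V.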
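The four sign combinations can be explored with a short sketch. It assumes the common convention that the quotient is truncated toward zero, so the remainder takes the sign of the dividend; the function name and the operand values 7 and 3 are chosen only for illustration.

```python
def signed_divide(dividend: int, divisor: int):
    """Return (Q, R) with D = Q * V + R, truncating Q toward zero."""
    # Divide the magnitudes as unsigned integers
    quotient = abs(dividend) // abs(divisor)
    # The quotient is negative when the operand signs differ
    if (dividend < 0) != (divisor < 0):
        quotient = -quotient
    # The remainder follows from the defining relation D = Q * V + R
    remainder = dividend - quotient * divisor
    return quotient, remainder


for d, v in [(7, 3), (7, -3), (-7, 3), (-7, -3)]:
    q, r = signed_divide(d, v)
    print(f"D = {d:2d}, V = {v:2d}  ->  Q = {q:2d}, R = {r:2d}")
```

Note that Python's built-in `//` and `%` floor toward negative infinity, which is why the sketch divides magnitudes instead of using them directly on signed operands.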
Floating Point Numbers
• We have now seen unsigned and signed integers. In real life we also need to be able to represent numbers with fractional parts (such as -12.5 and 45.39).
• These are called floating-point numbers.
• We will learn the IEEE 32-bit floating-point representation.
In the decimal system, a decimal point (radix point) separates the whole numbers from the
fractional part
Examples:
37.25 ( whole = 37, fraction = 25/100)
123.567
10.12345678
For example, 37.25 can be analyzed as:

10^1    10^0    10^-1    10^-2
Tens    Units   Tenths   Hundredths
3       7       2        5

37.25 = (3 × 10) + (7 × 1) + (2 × 1/10) + (5 × 1/100)
Binary Equivalence
• The binary equivalent of a floating-point number can be determined by computing the binary representation for each part separately.

1) For the whole part: use the subtraction or division method previously learned.
2) For the fractional part: use the subtraction or multiplication method (to be shown next).
Fractional Part – Multiplication Method
In the binary representation of a floating-point number, the column values are as follows:

… 2^5   2^4   2^3   2^2   2^1   2^0   .   2^-1   2^-2   2^-3   2^-4 …
… 32    16    8     4     2     1     .   1/2    1/4    1/8    1/16 …
… 32    16    8     4     2     1     .   .5     .25    .125   .0625 …
Ex 1. Find the binary equivalent of 0.25
Step 1: Multiply the fraction by 2 until the fractional part becomes 0.
  .25 × 2 = 0.5   (whole part 0)
  .5  × 2 = 1.0   (whole part 1)
Step 2: Collect the whole parts in forward order. Put them after the radix point.
  .   .5   .25   .125   .0625
  .   0    1
0.25 (base 10) = 0.01 (base 2)
Ex 2. Find the binary equivalent of 0.625
Step 1: Multiply the fraction by 2 until the fractional part becomes 0.
  .625 × 2 = 1.25   (whole part 1)
  .25  × 2 = 0.50   (whole part 0)
  .50  × 2 = 1.0    (whole part 1)
Step 2: Collect the whole parts in forward order. Put them after the radix point.
  .   .5   .25   .125   .0625
  .   1    0     1
0.625 (base 10) = 0.101 (base 2)
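The multiplication method can be written as a short loop. In this sketch the function name and the `max_bits` cut-off are assumptions; exact rational arithmetic (`fractions.Fraction`) is used so that decimal inputs such as 0.1 do not pick up binary rounding error before the conversion starts.

```python
from fractions import Fraction


def fraction_to_binary(fraction: str, max_bits: int = 23) -> str:
    """Convert a decimal fraction in [0, 1), given as text, to a binary string."""
    value = Fraction(fraction)
    bits = []
    # Step 1: multiply by 2 until the fractional part becomes 0 (or bits run out)
    while value and len(bits) < max_bits:
        value *= 2
        whole = int(value)        # whole part is the next binary digit
        bits.append(str(whole))
        value -= whole            # keep only the fractional part
    # Step 2: collect the whole parts in forward order after the radix point
    return "0." + ("".join(bits) or "0")


print(fraction_to_binary("0.25"))
print(fraction_to_binary("0.625"))
print(fraction_to_binary("0.1", max_bits=8))  # does not terminate; cut at 8 bits
```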
Fractional Part – Subtraction Method

Start with the column values again, as follows:

… 2^0   .   2^-1   2^-2   2^-3   2^-4    2^-5     2^-6 …
… 1     .   1/2    1/4    1/8    1/16    1/32     1/64 …
… 1     .   .5     .25    .125   .0625   .03125   .015625 …
• Starting with 0.5, subtract the column values from left to right.
• Insert a 0 in the column if the value cannot be subtracted, or a 1 if it can be. Continue until the fraction becomes .0

Ex 1. Convert .25
  Column values:   .5   .25   .125   .0625
  Bits:            0    1
  .25 - .25 = .0, so .25 (base 10) = .01 (base 2)
Binary Equivalent of FP number
Ex 2. Convert 37.25, using the subtraction method.
  64    32    16    8     4     2     1     .   .5     .25    .125   .0625
  2^6   2^5   2^4   2^3   2^2   2^1   2^0   .   2^-1   2^-2   2^-3   2^-4
        1     0     0     1     0     1     .   0      1
  Whole part: 37 - 32 = 5; 5 - 4 = 1; 1 - 1 = 0
  Fractional part: .25 - .25 = .0
  37.25₁₀ = 100101.01₂
Ex 3. Convert 18.625, using the subtraction method.
  64    32    16    8     4     2     1     .   .5     .25    .125   .0625
  2^6   2^5   2^4   2^3   2^2   2^1   2^0   .   2^-1   2^-2   2^-3   2^-4
              1     0     0     1     0     .   1      0      1
  Whole part: 18 - 16 = 2; 2 - 2 = 0
  Fractional part: .625 - .5 = .125; .125 - .125 = 0
  18.625₁₀ = 10010.101₂
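The subtraction method used in the examples can be sketched as a walk over the column values from left to right. The function name and the fixed column range (7 integer columns, 4 fractional columns, matching the tables above) are assumptions; the sketch handles non-negative inputs only.

```python
from fractions import Fraction


def to_binary_by_subtraction(number: str, int_bits: int = 7, frac_bits: int = 4) -> str:
    """Convert a non-negative decimal number (as text) using the subtraction method."""
    remaining = Fraction(number)
    digits = []
    # Visit the columns 2^(int_bits-1) ... 2^0 . 2^-1 ... 2^-frac_bits
    for exponent in range(int_bits - 1, -frac_bits - 1, -1):
        if exponent == -1:
            digits.append(".")            # radix point before the first fractional column
        column = Fraction(2) ** exponent
        if column <= remaining:
            digits.append("1")            # the column value can be subtracted
            remaining -= column
        else:
            digits.append("0")            # the column value cannot be subtracted
    return "".join(digits)


print(to_binary_by_subtraction("37.25"))
print(to_binary_by_subtraction("18.625"))
```

The output keeps every column, so leading and trailing zeros appear; stripping them gives the forms shown in the worked examples.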
Problem storing binary form

• We have no way to store the radix point!
• Large numbers would take too much space.
• The IEEE standards committee defined a way to store floating-point numbers (numbers that have a radix point).
IEEE Floating Point Representation

Floating-point numbers can be stored in 32 bits by dividing the bits into three parts: the sign, the exponent, and the mantissa.

Bit 1: sign | Bits 2-9: exponent | Bits 10-32: mantissa
• Sign
• Base = 2
• Exponent = value of the 8-bit exponent field - Bias (where Bias = 2^(k-1) - 1 and k is the number of bits in the exponent field; for k = 8, Bias = 127)
• Significand
Floating point representation
• Any floating-point number can be expressed in many ways.
• To simplify operations on floating-point numbers, it is typically required that they be normalized.
• A normalized number is one in which the most significant digit of the significand is nonzero.
• For base 2 representation, a normalized number is therefore one in which the most significant bit of the significand is one.
• The typical convention is that there is one bit to the left of the radix point, so a normalized nonzero number has the form

$\pm 1.bbb \ldots b \times 2^{\pm E}$

where b is a binary digit (either 0 or 1).
• Because the most significant bit is always one, it does not need to be stored. Thus, the 23-bit field is used to store a 24-bit significand with a value in the half-open interval [1, 2).
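The three fields of a 32-bit value can be inspected with Python's standard `struct` module. This is a sketch with a hypothetical function name; it relies on `struct.pack(">f", x)` producing the IEEE 754 single-precision encoding.

```python
import struct


def ieee754_fields(x: float):
    """Return (sign, biased_exponent, fraction) of x encoded as a 32-bit float."""
    # Pack as big-endian single precision, then reread the same 4 bytes as an integer
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF         # 23 bits, implied leading 1 not stored
    return sign, exponent, fraction


sign, exponent, fraction = ieee754_fields(37.25)
print(sign, format(exponent, "08b"), format(fraction, "023b"))
print("true exponent:", exponent - 127)
```

For 37.25 = 100101.01₂ = 1.0010101₂ × 2^5, the stored exponent is 5 + 127 = 132 and the fraction field begins 0010101, with the leading 1 left implicit.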
Sample problems
Convert the following decimal numbers into binary using the IEEE 754 floating-point representation.
(i) -3347.7991 × 2^21
(ii) 157.4773 × 2^-11
(iii) -1234.5997 × 2^17
(iv) -488.6791 × 2^-31
Floating point representation
• In the IBM base-16 format, the exponent base is 16 rather than 2. Since 2^20 = 16^5, a value with a base-2 exponent of 20 has its exponent stored to represent 5 rather than 20.
• The advantage of using a larger exponent base is that a greater range can be achieved for the same number of exponent bits.
• However, a larger exponent base gives a greater range at the expense of less precision.
ADDITION ALGORITHM
Floating point arithmetic
For addition and subtraction, it is necessary to ensure that both operands have the same
exponent value.
This may require shifting the radix point on one of the operands to achieve alignment.
Multiplication and division are more straightforward.
• Exponent overflow: A positive exponent exceeds the maximum possible exponent value. In some systems, this may be designated as +∞ or -∞.
• Exponent underflow: A negative exponent is less than the minimum possible exponent value (e.g., -200 is less than -127). This means that the number is too small to be represented, and it may be reported as 0.
• Significand underflow: In the process of aligning significands, digits may flow off the right end of the significand. As we shall discuss, some form of rounding is required.
• Significand overflow: The addition of two significands of the same sign may result in a carry out of the most significant bit. This can be fixed by realignment, as we shall explain.

Floating point arithmetic: Addition and subtraction
Phase 1: Zero check. Because addition and subtraction are identical except for a
sign change, the process begins by changing the sign of the subtrahend if it is a
subtract operation. Next, if either operand is 0, the other is reported as the result.
Phase 2: Significand alignment. The next phase is to manipulate the numbers so that the
two exponents are equal.
Phase 3: Addition. Next, the two significands are added together, taking into account their
signs. Because the signs may differ, the result may be 0. There is also the possibility of
significand overflow by 1 digit. If so, the significand of the result is shifted right and the
exponent is incremented. An exponent overflow could occur as a result; this would be
reported and the operation halted.
Phase 4: Normalization. The final phase normalizes the result. Normalization consists of
shifting significand digits left until the most significant digit (bit, or 4 bits for base-16
exponent) is nonzero. Each shift causes a decrement of the exponent and thus could cause
an exponent underflow. Finally, the result must be rounded off and then reported. We defer
a discussion of rounding until after a discussion of multiplication and division.
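The four phases can be illustrated on a toy format. Everything in this sketch is an assumption made for illustration: numbers are (significand, exponent) pairs with value sig × 2^exp, the significand is a signed integer of width `P` = 8 bits, bits shifted out are simply truncated, and exponent overflow and underflow are not checked. It is not the IEEE 754 encoding.

```python
P = 8  # significand width in bits (hypothetical toy format)


def shift_right(sig: int, count: int) -> int:
    """Shift the magnitude right, keeping the sign; shifted-out bits are lost."""
    magnitude = abs(sig) >> count
    return magnitude if sig >= 0 else -magnitude


def fp_add(a, b, subtract=False):
    """Add or subtract toy floats; normalized means 2**(P-1) <= |sig| < 2**P."""
    sig_a, exp_a = a
    sig_b, exp_b = b
    # Phase 1: change the sign of the subtrahend for a subtract, then zero check
    if subtract:
        sig_b = -sig_b
    if sig_a == 0:
        return (sig_b, exp_b)
    if sig_b == 0:
        return (sig_a, exp_a)
    # Phase 2: significand alignment (shift the operand with the smaller exponent)
    if exp_a < exp_b:
        sig_a, exp_a, sig_b, exp_b = sig_b, exp_b, sig_a, exp_a
    sig_b = shift_right(sig_b, exp_a - exp_b)
    # Phase 3: add the signed significands
    total, exp = sig_a + sig_b, exp_a
    if total == 0:
        return (0, 0)
    if abs(total) >= 1 << P:           # significand overflow by one digit
        total = shift_right(total, 1)
        exp += 1
    # Phase 4: normalize by shifting left and decrementing the exponent
    while abs(total) < 1 << (P - 1):
        total *= 2
        exp -= 1
    return (total, exp)


print(fp_add((192, 0), (128, -1)))                 # 192 + 64
print(fp_add((128, 0), (192, -1), subtract=True))  # 128 - 96
```

A fuller model would keep guard bits during alignment and round the result instead of truncating.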
ADDITION & SUBTRACTION
MULTIPLICATION
DIVISION
Precision considerations
The exponent and significand are stored in ALU registers.
The length of the register is almost always greater than the length of the significand plus an
implied bit.
The register contains additional bits, called guard bits, which are used to pad out the right end
of the significand with 0s.
Rounding
The result of any operation on the significands is generally stored in a longer register.
When the result is put back into the floating-point format, the extra bits must be disposed of.
Round to nearest: The result is rounded to the nearest representable number.
Round toward +∞: The result is rounded up toward plus infinity.
Round toward -∞: The result is rounded down toward negative infinity.
Round toward 0: The result is rounded toward zero.
Round to nearest is the default rounding mode listed in the standard and is defined as follows:
The representable value nearest to the infinitely precise result shall be delivered.
If the extra bits, beyond the 23 bits that can be stored, are 10010, then the extra bits amount to
more than one-half of the last representable bit position.
In this case, the correct answer is to add binary 1 to the last representable bit, rounding up to
the next representable number.
If the extra bits are 01111, they are less than one-half of the last representable bit position.
The correct way is simply to drop the extra bits (truncate).
If the result of a computation is exactly midway between two representable numbers, the value
is rounded up if the last representable bit is currently 1 and not rounded up if it is currently 0.
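The round-to-nearest rule, including the tie case, can be sketched on integers. The function name is hypothetical; `value` holds the kept bits followed by `extra` extra bits (with `extra` at least 1), and the function returns the rounded kept bits.

```python
def round_to_nearest_even(value: int, extra: int) -> int:
    """Drop the low `extra` bits of value, rounding to nearest with ties to even."""
    kept = value >> extra                     # bits that fit in the format
    dropped = value & ((1 << extra) - 1)      # the extra bits
    half = 1 << (extra - 1)                   # one-half of the last kept position
    if dropped > half:
        kept += 1                             # more than one-half: round up
    elif dropped == half and kept & 1:
        kept += 1                             # exact tie: round up only if last bit is 1
    return kept                               # otherwise truncate


# Kept bits 101 followed by the extra-bit patterns discussed above
print(format(round_to_nearest_even(0b101_10010, 5), "b"))
print(format(round_to_nearest_even(0b101_01111, 5), "b"))
print(format(round_to_nearest_even(0b101_10000, 5), "b"))
print(format(round_to_nearest_even(0b100_10000, 5), "b"))
```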
Rounding to plus and minus infinity provides an efficient method for monitoring and
controlling errors in floating-point computations by producing two values for each result.
The two values correspond to the lower and upper endpoints of an interval that contains the
true result.
If the range between the upper and lower bounds is sufficiently narrow, then a sufficiently
accurate result has been obtained.
Rounding toward zero is, in fact, simple truncation: The extra bits are ignored.
However, the result is that the magnitude of the truncated value is always less than or equal to
the more precise original value, and it affects every operation for which there are non-zero extra
bits.
Special BIT patterns
An exponent of zero together with a fraction of zero represents positive or negative zero,
depending on the sign bit. As was mentioned, it is useful to have an exact value of 0
represented.
An exponent of all ones together with a fraction of zero represents positive or negative infinity,
depending on the sign bit. It is also useful to have a representation of infinity. This leaves it up to
the user to decide whether to treat overflow as an error condition or to carry the value and
proceed with whatever program is being executed.
• An exponent of zero together with a nonzero fraction represents a denormalized number. In this case, the bit to the left of the binary point is zero and the true exponent is -126 (32-bit format) or -1022 (64-bit format).
• The number is positive or negative depending on the sign bit.
An exponent of all ones together with a nonzero fraction is given the value NaN,
which means Not a Number, and is used to signal various exception conditions.
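The special patterns can be recognized by inspecting the exponent and fraction fields of a 32-bit word. A minimal sketch follows; the function name and the returned labels are assumptions chosen for illustration.

```python
def classify_single(bits: int) -> str:
    """Classify a 32-bit IEEE 754 single-precision bit pattern."""
    sign = "-" if bits >> 31 else "+"
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        # All-zero exponent: zero or a denormalized number
        return sign + ("zero" if fraction == 0 else "denormalized")
    if exponent == 0xFF:
        # All-ones exponent: infinity or NaN
        return (sign + "infinity") if fraction == 0 else "NaN"
    return sign + "normalized"


for pattern in (0x00000000, 0x80000000, 0x7F800000, 0x7FC00000, 0x00000001, 0x3F800000):
    print(f"{pattern:#010x}", classify_single(pattern))
```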
8086 microprocessor
The 8086 Microprocessor has two units
1. Bus Interface Unit
2. Execution Unit
Bus interface Unit
• The Bus Interface Unit (BIU) consists of several blocks: the instruction queue, segment registers, instruction pointer, and address adder.
• It interfaces the processor to the outside world, performing all external bus operations such as instruction fetch, memory read and write, and input and output of data.
• The BIU uses an instruction queue, a 6-byte first-in-first-out (FIFO) register, to pipeline instructions.
• It provides a full 16-bit bidirectional data bus and a 20-bit address bus.
• Specifically, it performs instruction fetch, instruction queuing, operand fetch and storage, address relocation, and bus control.
• The BIU uses a mechanism known as an instruction stream queue to implement a pipeline architecture.
• This queue permits prefetch of up to six bytes of instruction code.
• Whenever the queue has room for at least two more bytes and the EU is not requesting a read or write of operands from memory, the BIU is free to look ahead in the program by prefetching the next sequential instruction.
• The prefetched instructions are held in its FIFO queue.
• With its 16-bit data bus, the BIU fetches two instruction bytes in a single memory cycle.
• After a byte is loaded at the input end of the queue, it automatically shifts up through the FIFO to the empty location nearest the output.
Execution Unit
• The Execution Unit (EU) consists of the control circuitry, instruction decoder, ALU, pointer and index registers, and flag register.
• The EU decodes and executes the instructions fetched by the BIU.
• It extracts instructions from the top of the queue in the BIU, decodes them, generates operand addresses if necessary, passes them to the BIU with a request to perform the read or write bus cycles to memory or I/O, and performs the operation specified by the instruction on the operands.
• If the BIU is already in the process of fetching an instruction when the EU requests it to read or write operands from memory or I/O, the BIU first completes the instruction fetch bus cycle before initiating the operand read/write cycle.
• It also tests the status and control flags and updates these flags based on the results of the instruction.
Software Model of the 8086 Microprocessors
Register organization
• The 8086 has a set of registers, each 16 bits wide.
• The registers are categorized into 4 groups:
1. General data registers
2. Segment registers
3. Pointer and index registers
4. Flag register
General Registers
8086 contains 4 general data registers AX, BX, CX, and DX.
They are used to hold data, variables, results etc., temporarily for faster operation.

AX - the Accumulator
BX - the Base Register
CX - the Count Register
DX - the Data Register
• AX is used as a 16-bit accumulator, with the lower 8 bits designated as AL and the higher 8 bits as AH for 8-bit operations.
• It serves as an operand in many arithmetic and logic operations and is also used to store their results.
• The BX register is used as a general-purpose register as well as to store the offset for forming the physical address in certain addressing modes.
• The CX register is used as the default counter for string and loop instructions.
• It also holds the count of the number of bits by which the contents of an operand must be shifted or rotated during the execution of the multibit shift or rotate instructions.
• The DX register is used in I/O operations to hold the address of the I/O port.
• DX also holds the remainder after a word division and the high-order 16 bits of the 32-bit result after a word multiplication.
Segment Registers
There are 4 segment registers
1. Code segment
2. Data segment
3. Stack segment
4. Extra segment
Code segment (CS) register is used to address a memory location in the code segment of
memory where the executable program or instructions are stored
Stack segment (SS) register is used for addressing stack segment of memory which is used to
store stack data.
The data segment (DS) register points to the data segment of the memory where the data is
stored
The extra segment (ES) register points to the extra segment of the memory. This is used as
another data segment for extra data storage.
Pointers and index Registers

• All are 16 bits wide; their low and high bytes are not separately accessible.
• Used as memory pointers.
◦ Example: MOV AH, [SI]
◦ Move the byte stored in the memory location whose address is contained in register SI to register AH.
• IP is not under direct control of the programmer.
• The function of IP is similar to that of a program counter, but it contains the offset address of the next instruction instead of its actual (physical) address.
• The offset in IP lies within the code segment and is combined with the CS register to generate the physical address of the next instruction to be executed.
• The stack pointer (SP) also contains an offset value, which is combined with the SS register to obtain the physical address of the top of the stack.
• The base pointer (BP) is similar to SP in that it also contains an offset value pointing into the stack segment.
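How a segment register and an offset combine can be sketched in Python. The 8086 forms its 20-bit physical address as segment × 16 + offset; the function name and the sample register values are assumptions chosen for illustration.

```python
def physical_address(segment: int, offset: int) -> int:
    """20-bit physical address from a 16-bit segment and a 16-bit offset."""
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and keep only 20 bits because the address bus is 20 bits wide
    return ((segment << 4) + offset) & 0xFFFFF


# Hypothetical register contents
cs, ip = 0x1000, 0x0100   # next instruction: CS:IP
ss, sp = 0x2000, 0xFFFE   # top of stack:     SS:SP
print(f"CS:IP -> {physical_address(cs, ip):05X}")
print(f"SS:SP -> {physical_address(ss, sp):05X}")
```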
• The index registers are used as general-purpose registers as well as for offset storage.
• The source index register (SI) is used to store the offset of the source data in the data segment.
• The destination index register (DI) is used to store the offset of the destination data in the data or extra segment.
FLAG Registers
• The overflow flag (OF) is set when the result of a signed operation is too large to fit in the destination.
• Overflow can occur when signed numbers are added or subtracted.
• For an 8-bit operation, OF is set if the carry from the D6 bit into the D7 bit differs from the carry out of the D7 bit.
• Similarly, for a 16-bit operation, OF is set if the carry from the D14 bit into the D15 bit differs from the carry out of the D15 bit.
• The trap flag (TF) is set to put the processor into single-step mode; otherwise it is reset.
• In single-step mode, the processor executes one instruction at a time, which is useful for debugging programs.
• The interrupt flag (IF) enables maskable interrupts: when IF = 1, a request on INTR is recognized by the processor; when IF = 0, it is ignored.
• The direction flag (DF) is used by string manipulation instructions: it selects the increment or decrement mode for the DI and SI registers.
• If DF = 1, the registers are automatically decremented; if DF = 0, the registers are incremented.
• The carry flag (CF) is set when an addition generates a carry out of the MSB. Otherwise, CF = 0.
• The auxiliary carry flag (AF) is set when there is a carry from the D3 bit to the D4 bit, i.e., out of the low nibble. If there is no such carry, then AF = 0.
• The zero flag (ZF) is set when the result of an ALU operation is 0. Otherwise, for any non-zero value, ZF = 0.
• The sign flag (SF) is set when the result of an ALU operation has 1 as its MSB. Otherwise, SF = 0.
• The parity flag (PF) is set when the low-order byte of the result of an ALU operation has an even number of 1s. Otherwise, PF = 0.
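The status flags can be modeled for an 8-bit addition. The function name and the dictionary layout in this sketch are assumptions; it models only ADD on byte operands, with OF computed as the carry into D7 XOR the carry out of D7.

```python
def add8_flags(a: int, b: int) -> dict:
    """Add two 8-bit values and report the resulting status flags."""
    total = a + b
    result = total & 0xFF
    carry_into_msb = ((a & 0x7F) + (b & 0x7F)) >> 7   # carry from D6 into D7
    carry_out = total >> 8                            # carry out of D7
    return {
        "result": result,
        "CF": carry_out,
        "AF": ((a & 0x0F) + (b & 0x0F)) >> 4,         # carry from D3 into D4
        "ZF": int(result == 0),
        "SF": result >> 7,                            # copy of the MSB
        "PF": int(bin(result).count("1") % 2 == 0),   # even number of 1s
        "OF": carry_into_msb ^ carry_out,             # signed overflow
    }


print(add8_flags(0x7F, 0x01))  # +127 + 1 overflows the signed range
print(add8_flags(0xFF, 0x01))  # -1 + 1 = 0 with a carry out
```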
Addressing modes
• An addressing mode is the way of locating the data or operands: it specifies the types of operands used and the way they are accessed for executing an instruction.
• Based on the flow of instructions, the instructions in the 8086 can be categorized as
1. Sequential control flow instructions
2. Control transfer instructions
• Sequential control flow instructions are those after whose execution control is transferred to the next instruction appearing immediately after them in the program, e.g., arithmetic, logical, data transfer, and processor control instructions.
• Control transfer instructions transfer control to some predefined address after their execution, e.g., INT, CALL, RET, and JUMP instructions.
Addressing modes
For sequential control flow instructions:
1. Register addressing mode
2. Immediate addressing mode
3. Direct addressing mode
4. Register indirect addressing mode
5. Register relative addressing mode
6. Indexed addressing mode
7. Based indexed addressing mode
8. Relative based indexed addressing mode
For control transfer instructions:
1. Intrasegment direct addressing mode
2. Intrasegment indirect addressing mode
3. Intersegment direct addressing mode
4. Intersegment indirect addressing mode



Sequential Control flow modes
1. Register addressing mode
2. Immediate addressing mode
3. Direct addressing mode
4. Register indirect addressing mode
5. Register relative addressing mode
6. Indexed addressing mode
7. Based indexed addressing mode
8. Relative based indexed addressing mode



Register addressing mode
In this mode, both the operands are specified by registers.
Eg. MOV AX, BX
All the registers except IP can be used in this mode.
Both the source and destination registers should be of the same size.
A direct move from one segment register to another segment register is not allowed.



Immediate addressing mode
In this mode, the source operand is specified as an immediate data byte or word and the
destination is either a register or a memory location.
Eg. MOV AL, 22H; MOV BX, 3456H
All the registers except the segment registers can be loaded in this mode.
Both the source and destination should be of the same size.



Direct addressing mode
In this mode, one operand is a memory location (the source in the examples below) and the other is a register.
Eg. MOV AL, [1234H]; MOV BX, [5000H]
Here, a 16-bit memory address, i.e. the offset address, is directly specified in the instruction as a
part of it.
The physical address is formed from the segment register (DS by default) and the offset address: physical address = (DS) x 10H + offset. The content of that location is the source data.
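
The address formation described above can be sketched in a few lines of Python. This is only a model for checking the arithmetic by hand; the function name and the DS value 2000H are assumptions chosen for illustration, not part of the slides.

```python
def physical_address(segment: int, offset: int) -> int:
    """20-bit physical address: segment shifted left by 4 bits plus the 16-bit offset."""
    return ((segment << 4) + offset) & 0xFFFFF  # the 8086 address bus is 20 bits wide


# Hypothetical register content: DS = 2000H, offset taken from MOV AL, [1234H]
print(f"{physical_address(0x2000, 0x1234):05X}H")
```

Shifting the segment left by 4 bits is the same as multiplying it by 10H (16 decimal).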



Register indirect addressing mode
Register indirect addressing mode allows data to be addressed at any memory location using
the offset registers: BP, BX, DI or SI
DS is the default segment when the registers BX, DI or SI are used.
SS is the default segment when the register BP is used.
Eg. MOV AX, [BX]



Register relative addressing mode
In this mode, the data in a segment of memory are addressed by adding an 8-bit or 16-bit
displacement to the contents of a base register (BX or BP) or an index register (SI or DI).
DS is the default segment when the registers BX, DI or SI are used.
SS is the default segment when the register BP is used.
Eg. MOV AX, 1000H [BX]



Indexed addressing mode
In this mode, the offset address of the operand is stored in one of the index registers like SI or
DI.
DS is the default segment for SI and DI (ES is paired with DI only in string instructions).
Eg. MOV AX, [SI]



Based-Indexed addressing mode
In this mode, one base register (BX or BP) and one index register (SI or DI) are used to indirectly
address memory.
The effective address is formed by adding contents of a base register to the contents of the
index register.
Eg. MOV AX, [BX] [DI]



Relative Based-Indexed addressing mode
It is similar to the based indexed mode, but it adds a displacement along with the base register
and index register to form the memory address
The effective address is formed by adding the 8-bit or 16-bit displacement with the addition
result of the base register and the index register.
Eg. MOV AX, 2000H [BX] [DI]
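
The memory addressing modes above differ only in which terms take part in the effective address (EA). The Python sketch below adds up base, index and displacement for the slide examples; the register contents (BX = 0100H, DI = 0020H) and the helper names are assumptions made for illustration.

```python
def effective_address(base: int = 0, index: int = 0, disp: int = 0) -> int:
    """16-bit offset: base + index + displacement, wrapping around at 64 KB."""
    return (base + index + disp) & 0xFFFF


def default_segment(uses_bp: bool) -> str:
    """SS is the default segment when BP is the base register, DS otherwise."""
    return "SS" if uses_bp else "DS"


BX, DI = 0x0100, 0x0020  # hypothetical register contents

examples = {
    "MOV AX, [BX]": effective_address(base=BX),                               # register indirect
    "MOV AX, 1000H[BX]": effective_address(base=BX, disp=0x1000),             # register relative
    "MOV AX, [BX][DI]": effective_address(base=BX, index=DI),                 # based indexed
    "MOV AX, 2000H[BX][DI]": effective_address(base=BX, index=DI, disp=0x2000),  # relative based indexed
}

for text, ea in examples.items():
    print(f"{text:24} EA = {ea:04X}H in {default_segment(uses_bp=False)}")
```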



Control transfer instruction modes
1. Intrasegment direct addressing mode
2. Intrasegment indirect addressing mode
3. Intersegment direct addressing mode
4. Intersegment indirect addressing mode

If the address location to which the control is to be transferred lies in a different segment
other than the current one, the mode is called intersegment mode.
If the destination lies in the same segment, the mode is called intrasegment mode



Intrasegment direct mode
In this mode, the effective branch address to which the control is to be transferred lies in the
same segment in which the control transfer instruction lies and it appears directly in the
instruction as an immediate displacement value.
The effective branch address is the sum of an 8-bit or 16-bit signed displacement and the current
contents of IP (which already points to the next instruction).
Eg. JMP disp8 with disp8 = 02H ( Eff. offset Addr = (IP) + 02H )



Intrasegment indirect mode
In this mode, the effective branch address is the contents of the register or memory location
that is accessed using any one of the data addressing modes.
The contents of the IP will be replaced by the effective branch address
Eg. JMP BX ( Eff. offset Addr = (BX) )
In this instruction, the control jumps to the address specified by the 16-bit register.
The value of BX is copied into IP with the CS value unchanged.
Then the physical address of the next instruction is obtained using the current content of CS
and new value of IP



Intersegment direct mode
This addressing mode is used to provide means of branching from one segment to another
segment.
Eg. JMP 2000H: 3000H
i.e. the JMP instruction loads CS with 2000H and loads IP with 3000H



Intersegment indirect mode
This addressing mode replaces the contents of the IP and CS with the contents of two
consecutive words in memory that are addressed using indirect addressing.
Eg. JMP DWORD PTR [5000H], or JMP DWORD PTR [BX], [SI] or [DI]
i.e. the contents of [5000H] & [5000H+1] in DS are loaded into IP and the contents of
[5000H+2] & [5000H+3] in DS are loaded into CS.
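
The four control transfer modes can be compared by looking at what each one writes into CS and IP. The following Python sketch is a simplified model; the function names, the dictionary used as memory and the register values are assumptions for illustration only.

```python
def jmp_intrasegment_direct(cs: int, ip_next: int, disp: int) -> tuple[int, int]:
    """IP <- IP + signed displacement; CS unchanged."""
    return cs, (ip_next + disp) & 0xFFFF


def jmp_intrasegment_indirect(cs: int, reg_value: int) -> tuple[int, int]:
    """IP <- contents of a register or memory word; CS unchanged."""
    return cs, reg_value & 0xFFFF


def jmp_intersegment_direct(new_cs: int, new_ip: int) -> tuple[int, int]:
    """CS and IP both come from the instruction, e.g. JMP 2000H:3000H."""
    return new_cs, new_ip


def jmp_intersegment_indirect(memory: dict, ds: int, offset: int) -> tuple[int, int]:
    """IP <- word at DS:offset, CS <- word at DS:offset+2 (low byte first)."""
    base = (ds << 4) + offset
    ip = memory[base] | (memory[base + 1] << 8)
    cs = memory[base + 2] | (memory[base + 3] << 8)
    return cs, ip


# Hypothetical memory image with DS = 1000H: IP = 3000H, CS = 2000H stored at offset 5000H
memory = {0x15000: 0x00, 0x15001: 0x30, 0x15002: 0x00, 0x15003: 0x20}
print(jmp_intersegment_indirect(memory, ds=0x1000, offset=0x5000))
```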



What is an Instruction Set?
The complete collection of instructions that are understood by a CPU
Machine Code
Binary
Usually represented by programmer as assembly codes



Elements of an Instruction
Operation code (Op code)
◦ Do this
Source Operand reference
◦ To this
Result Operand reference
◦ Put the answer here
Next Instruction Reference
◦ When you have done that, do this...



Instruction Representation
In machine code each instruction has a unique bit pattern
For human consumption (well, programmers anyway) a symbolic representation is used
◦ e.g. ADD, SUB, LOAD

Operands can also be represented in this way


◦ ADD A,B



Simple Instruction Format



Instruction Types
Data processing
Data storage (main memory)
Data movement (I/O)
Program flow control



Number of Addresses (a)
3 addresses
◦ Operand 1, Operand 2, Result
◦ a = b + c;
◦ May be a fourth - next instruction (usually implicit)
◦ Not common
◦ Needs very long words to hold everything



Number of Addresses (b)
2 addresses
◦ One address doubles as operand and result
◦ a=a+b
◦ Reduces length of instruction
◦ Requires some extra work
◦ Temporary storage to hold some results



Number of Addresses (c)
1 address
◦ Implicit second address
◦ Usually a register (accumulator)
◦ Common on early machines



Number of Addresses (d)
0 (zero) addresses
◦ All addresses implicit
◦ Uses a stack
◦ e.g. push a
◦ push b
◦ add
◦ pop c

◦ c=a+b
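
The zero-address sequence above can be traced with a tiny stack machine. This Python sketch is only an illustration of the idea; the `run` function and the tuple encoding of the program are assumptions, not a real instruction format.

```python
def run(program, variables):
    """Execute a zero-address program: operands are always taken from the stack."""
    stack = []
    for op, *arg in program:
        if op == "push":          # push a: copy variable a onto the stack
            stack.append(variables[arg[0]])
        elif op == "add":         # add: pop two operands, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "pop":         # pop c: store the top of stack in variable c
            variables[arg[0]] = stack.pop()
    return variables


program = [("push", "a"), ("push", "b"), ("add",), ("pop", "c")]
print(run(program, {"a": 2, "b": 3}))
```

Only push and pop name an operand; add finds both of its operands implicitly on the stack.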



Instruction Set
8086 supports 6 types of instructions.
1. Data Transfer Instructions
2. Arithmetic Instructions
3. Logical Instructions
4. String Manipulation Instructions
5. Processor Control Instructions
6. Control Transfer Instructions



Instruction Set
1. Data Transfer Instructions

Instructions that are used to transfer data/address into registers, memory locations and I/O ports.
Generally involve two operands: a source operand and a destination operand of the same size.
Source: register, memory location or immediate data
Destination: register or memory location
The size should be either a byte or a word.
8-bit data can only be moved to an 8-bit register/memory location and 16-bit data to a 16-bit register/memory location.



8086 Microprocessor

Instruction Set
1. Data Transfer Instructions

Mnemonics: MOV, XCHG, PUSH, POP, IN, OUT …


MOV reg2/mem, reg1/mem
    MOV reg2, reg1     (reg2) ← (reg1)
    MOV mem, reg1      (mem) ← (reg1)
    MOV reg2, mem      (reg2) ← (mem)
MOV reg/mem, data
    MOV reg, data      (reg) ← data
    MOV mem, data      (mem) ← data
XCHG reg2/mem, reg1
    XCHG reg2, reg1    (reg2) ↔ (reg1)
    XCHG mem, reg1     (mem) ↔ (reg1)



Stack operations
Data is placed on the stack using the PUSH instruction and removed from the stack using the
POP instruction.
When an item is pushed onto the stack, the processor decrements the SP register, then writes
the item at the new top of stack.
When an item is popped off the stack, the processor reads the item from the top of stack, then
increments the SP register.
Therefore, the stack grows down in memory (towards lesser addresses) when items are pushed
on the stack and shrinks up (towards greater addresses) when the items are popped from the
stack.



The Stack



PUSH/POP
These instructions are used to copy a word on top of the stack or remove the word from top of
the stack in the register specified.
The operand in both (PUSH and POP) instructions can be a 16-bit general purpose register, a segment
register (CS cannot be the destination of POP) or a memory location.



Instruction Set
1. Data Transfer Instructions

Mnemonics: MOV, XCHG, PUSH, POP, IN, OUT …


PUSH reg16/mem
    PUSH reg16    (SP) ← (SP) – 2
                  MA_S = (SS) x 16₁₀ + (SP)
                  (MA_S ; MA_S + 1) ← (reg16)
    PUSH mem      (SP) ← (SP) – 2
                  MA_S = (SS) x 16₁₀ + (SP)
                  (MA_S ; MA_S + 1) ← (mem)
POP reg16/mem
    POP reg16     MA_S = (SS) x 16₁₀ + (SP)
                  (reg16) ← (MA_S ; MA_S + 1)
                  (SP) ← (SP) + 2
    POP mem       MA_S = (SS) x 16₁₀ + (SP)
                  (mem) ← (MA_S ; MA_S + 1)
                  (SP) ← (SP) + 2
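
The register transfers for PUSH and POP can be followed step by step with a small Python model. It is a sketch under simplifying assumptions: memory is a dictionary of bytes, SP is assumed even, and the class name and the SS/SP values are hypothetical.

```python
class Stack8086:
    """Minimal model of the 8086 stack: SS:SP points to the current top of stack."""

    def __init__(self, ss: int, sp: int):
        self.ss, self.sp, self.mem = ss, sp, {}

    def _top(self) -> int:
        # MA_S = (SS) x 16 + (SP)
        return ((self.ss << 4) + self.sp) & 0xFFFFF

    def push(self, word: int) -> None:
        self.sp = (self.sp - 2) & 0xFFFF        # SP is decremented first
        addr = self._top()
        self.mem[addr] = word & 0xFF            # low byte at MA_S
        self.mem[addr + 1] = (word >> 8) & 0xFF  # high byte at MA_S + 1

    def pop(self) -> int:
        addr = self._top()
        word = self.mem[addr] | (self.mem[addr + 1] << 8)
        self.sp = (self.sp + 2) & 0xFFFF        # SP is incremented after the read
        return word


stack = Stack8086(ss=0x3000, sp=0x0100)
stack.push(0x1234)
print(f"SP after PUSH = {stack.sp:04X}H")
print(f"POP returns {stack.pop():04X}H, SP = {stack.sp:04X}H")
```

The model shows the stack growing towards lower addresses on PUSH and shrinking towards higher addresses on POP.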



Instruction Set
1. Data Transfer Instructions

Mnemonics: MOV, XCHG, PUSH, POP, IN, OUT …


IN A, DX (A = AL or AX; the port address is held in DX)
    IN AL, DX       PORTaddr = (DX); (AL) ← (PORT)
    IN AX, DX       PORTaddr = (DX); (AX) ← (PORT)
IN A, addr8
    IN AL, addr8    (AL) ← (addr8)
    IN AX, addr8    (AX) ← (addr8)
OUT DX, A
    OUT DX, AL      PORTaddr = (DX); (PORT) ← (AL)
    OUT DX, AX      PORTaddr = (DX); (PORT) ← (AX)
OUT addr8, A
    OUT addr8, AL   (addr8) ← (AL)
    OUT addr8, AX   (addr8) ← (AX)



Instruction Set
2. Arithmetic Instructions
Mnemonics: ADD, ADC, SUB, SBB, INC, DEC, MUL, DIV, CMP…
ADD reg2/mem, reg1/mem
    ADD reg2, reg1     (reg2) ← (reg1) + (reg2)
    ADD reg2, mem      (reg2) ← (reg2) + (mem)
    ADD mem, reg1      (mem) ← (mem) + (reg1)
ADD reg/mem, data
    ADD reg, data      (reg) ← (reg) + data
    ADD mem, data      (mem) ← (mem) + data
ADD A, data
    ADD AL, data8      (AL) ← (AL) + data8
    ADD AX, data16     (AX) ← (AX) + data16



Instruction Set
2. Arithmetic Instructions
Mnemonics: ADD, ADC, SUB, SBB, INC, DEC, MUL, DIV, CMP…
ADC reg2/mem, reg1/mem
    ADC reg2, reg1     (reg2) ← (reg1) + (reg2) + CF
    ADC reg2, mem      (reg2) ← (reg2) + (mem) + CF
    ADC mem, reg1      (mem) ← (mem) + (reg1) + CF
ADC reg/mem, data
    ADC reg, data      (reg) ← (reg) + data + CF
    ADC mem, data      (mem) ← (mem) + data + CF
ADC A, data
    ADC AL, data8      (AL) ← (AL) + data8 + CF
    ADC AX, data16     (AX) ← (AX) + data16 + CF
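
A typical use of ADC is multi-word addition: ADD handles the low words and ADC adds the high words together with the carry. The Python sketch below models a 32-bit addition done in two 16-bit steps; the helper names are assumptions chosen for this illustration.

```python
def add16(a: int, b: int, carry_in: int = 0) -> tuple[int, int]:
    """16-bit addition returning (result, CF); carry_in = 0 models ADD, CF models ADC."""
    total = a + b + carry_in
    return total & 0xFFFF, total >> 16


def add32(x: int, y: int) -> tuple[int, int]:
    """Add two 32-bit numbers the way an 8086 program would, 16 bits at a time."""
    low, cf = add16(x & 0xFFFF, y & 0xFFFF)      # ADD on the low words
    high, cf = add16(x >> 16, y >> 16, cf)       # ADC on the high words
    return (high << 16) | low, cf


result, carry = add32(0x0001FFFF, 0x00000001)
print(f"{result:08X}H, CF = {carry}")
```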



Instruction Set
2. Arithmetic Instructions
Mnemonics: ADD, ADC, SUB, SBB, INC, DEC, MUL, DIV, CMP…
SUB reg2/mem, reg1/mem
    SUB reg2, reg1     (reg2) ← (reg2) - (reg1)
    SUB reg2, mem      (reg2) ← (reg2) - (mem)
    SUB mem, reg1      (mem) ← (mem) - (reg1)
SUB reg/mem, data
    SUB reg, data      (reg) ← (reg) - data
    SUB mem, data      (mem) ← (mem) - data
SUB A, data
    SUB AL, data8      (AL) ← (AL) - data8
    SUB AX, data16     (AX) ← (AX) - data16



Instruction Set
2. Arithmetic Instructions
Mnemonics: ADD, ADC, SUB, SBB, INC, DEC, MUL, DIV, CMP…
SBB reg2/mem, reg1/mem
    SBB reg2, reg1     (reg2) ← (reg2) - (reg1) - CF
    SBB reg2, mem      (reg2) ← (reg2) - (mem) - CF
    SBB mem, reg1      (mem) ← (mem) - (reg1) - CF
SBB reg/mem, data
    SBB reg, data      (reg) ← (reg) - data - CF
    SBB mem, data      (mem) ← (mem) - data - CF
SBB A, data
    SBB AL, data8      (AL) ← (AL) - data8 - CF
    SBB AX, data16     (AX) ← (AX) - data16 - CF



Instruction Set
2. Arithmetic Instructions
Mnemonics: ADD, ADC, SUB, SBB, INC, DEC, MUL, DIV, CMP…
INC reg/mem
    INC reg8       (reg8) ← (reg8) + 1
    INC reg16      (reg16) ← (reg16) + 1
    INC mem        (mem) ← (mem) + 1
DEC reg/mem
    DEC reg8       (reg8) ← (reg8) - 1
    DEC reg16      (reg16) ← (reg16) - 1
    DEC mem        (mem) ← (mem) - 1
In a loop, DEC is usually followed by a conditional jump such as JNZ, because DEC updates ZF (INC and DEC leave CF unchanged).

Instruction Set
2. Arithmetic Instructions
Mnemonics: ADD, ADC, SUB, SBB, INC, DEC, MUL, DIV, CMP…
MUL reg/mem (unsigned multiplication)
    MUL reg      For byte : (AX) ← (AL) x (reg8)
                 For word : (DX)(AX) ← (AX) x (reg16)
    MUL mem      For byte : (AX) ← (AL) x (mem8)
                 For word : (DX)(AX) ← (AX) x (mem16)
IMUL reg/mem (signed multiplication)
    IMUL reg     For byte : (AX) ← (AL) x (reg8)
                 For word : (DX)(AX) ← (AX) x (reg16)
    IMUL mem     For byte : (AX) ← (AL) x (mem8)
                 For word : (DX)(AX) ← (AX) x (mem16)



Instruction Set
2. Arithmetic Instructions
Mnemonics: ADD, ADC, SUB, SBB, INC, DEC, MUL, DIV, CMP…

DIV reg/mem (unsigned division)
    DIV reg      For 16-bit ÷ 8-bit :
                 (AL) ← (AX) ÷ (reg8)   Quotient
                 (AH) ← Remainder
    DIV mem      For 16-bit ÷ 8-bit :
                 (AL) ← (AX) ÷ (mem8)   Quotient
                 (AH) ← Remainder



Instruction Set
2. Arithmetic Instructions
Mnemonics: ADD, ADC, SUB, SBB, INC, DEC, MUL, DIV, CMP…

IDIV reg/mem (signed division)
    IDIV reg     For 16-bit ÷ 8-bit :
                 (AL) ← (AX) ÷ (reg8)   Quotient
                 (AH) ← Remainder
    IDIV mem     For 16-bit ÷ 8-bit :
                 (AL) ← (AX) ÷ (mem8)   Quotient
                 (AH) ← Remainder
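
The difference between DIV and IDIV shows up as soon as an operand is negative. The Python sketch below models the 16-bit ÷ 8-bit forms; it assumes that the signed quotient is truncated toward zero and that the remainder takes the sign of the dividend, and its overflow check is deliberately simplified. All function names are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret an unsigned bit pattern as a two's complement number."""
    sign = 1 << (bits - 1)
    return ((value & ((1 << bits) - 1)) ^ sign) - sign


def div8(ax: int, divisor: int) -> tuple[int, int]:
    """DIV: unsigned AX / 8-bit divisor, returns (AL = quotient, AH = remainder)."""
    if divisor == 0 or ax // divisor > 0xFF:
        raise ArithmeticError("divide error")  # quotient does not fit in AL
    return ax // divisor, ax % divisor


def idiv8(ax: int, divisor: int) -> tuple[int, int]:
    """IDIV: signed AX / signed 8-bit divisor, returns (AL, AH) as bit patterns."""
    n, d = to_signed(ax, 16), to_signed(divisor, 8)
    if d == 0:
        raise ArithmeticError("divide error")
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q                                  # truncate toward zero
    r = n - q * d                               # remainder has the sign of the dividend
    if not -128 <= q <= 127:                    # simplified range check (assumption)
        raise ArithmeticError("divide error")
    return q & 0xFF, r & 0xFF


print(div8(0x0064, 0x07))    # 100 / 7
print(idiv8(0xFFF9, 0x02))   # -7 / 2
```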



Instruction Set
2. Arithmetic Instructions
Mnemonics: ADD, ADC, SUB, SBB, INC, DEC, MUL, DIV, CMP…
CMP reg2/mem, reg1/mem
    CMP reg2, reg1     Modify flags ← (reg2) – (reg1)
                       If (reg2) > (reg1) then CF=0, ZF=0, SF=0
                       If (reg2) < (reg1) then CF=1, ZF=0, SF=1
                       If (reg2) = (reg1) then CF=0, ZF=1, SF=0
    CMP reg2, mem      Modify flags ← (reg2) – (mem)
                       If (reg2) > (mem) then CF=0, ZF=0, SF=0
                       If (reg2) < (mem) then CF=1, ZF=0, SF=1
                       If (reg2) = (mem) then CF=0, ZF=1, SF=0
    CMP mem, reg1      Modify flags ← (mem) – (reg1)
                       If (mem) > (reg1) then CF=0, ZF=0, SF=0
                       If (mem) < (reg1) then CF=1, ZF=0, SF=1
                       If (mem) = (reg1) then CF=0, ZF=1, SF=0
The operands themselves are not changed. The CF and SF values listed assume small non-negative operands; in general, CF and ZF decide unsigned comparisons, while SF together with OF decides signed comparisons.



8086 Microprocessor

Instruction Set
2. Arithmetic Instructions
Mnemonics: ADD, ADC, SUB, SBB, INC, DEC, MUL, DIV, CMP…

CMP reg/mem, data
    CMP reg, data      Modify flags ← (reg) – data
                       If (reg) > data then CF=0, ZF=0, SF=0
                       If (reg) < data then CF=1, ZF=0, SF=1
                       If (reg) = data then CF=0, ZF=1, SF=0
    CMP mem, data      Modify flags ← (mem) – data
                       If (mem) > data then CF=0, ZF=0, SF=0
                       If (mem) < data then CF=1, ZF=0, SF=1
                       If (mem) = data then CF=0, ZF=1, SF=0



8086 Microprocessor

Instruction Set
2. Arithmetic Instructions
Mnemonics: ADD, ADC, SUB, SBB, INC, DEC, MUL, DIV, CMP…

CMP A, data
    CMP AL, data8      Modify flags ← (AL) – data8
                       If (AL) > data8 then CF=0, ZF=0, SF=0
                       If (AL) < data8 then CF=1, ZF=0, SF=1
                       If (AL) = data8 then CF=0, ZF=1, SF=0
    CMP AX, data16     Modify flags ← (AX) – data16
                       If (AX) > data16 then CF=0, ZF=0, SF=0
                       If (AX) < data16 then CF=1, ZF=0, SF=1
                       If (AX) = data16 then CF=0, ZF=1, SF=0
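
The flag settings of CMP can be checked with a short Python model that performs the subtraction and discards the result. This is a sketch that covers only CF, ZF and SF; the function name and the sample operands are assumptions for illustration.

```python
def cmp_flags(dest: int, src: int, bits: int = 16) -> dict:
    """Flags after CMP dest, src: the difference is computed but not stored."""
    mask = (1 << bits) - 1
    dest, src = dest & mask, src & mask
    result = (dest - src) & mask
    return {
        "CF": int(dest < src),          # borrow needed: unsigned dest below src
        "ZF": int(result == 0),         # operands equal
        "SF": result >> (bits - 1),     # MSB of the difference
    }


print(cmp_flags(5, 3))                  # destination greater
print(cmp_flags(3, 5))                  # destination smaller
print(cmp_flags(4, 4))                  # equal
print(cmp_flags(0xFF, 0x01, bits=8))    # large unsigned value: SF = 1 although dest > src
```

The last call illustrates why SF alone is not a reliable "less than" indicator for unsigned data.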



Eg: SUM OF ‘N’ CONSECUTIVE NUMBERS
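      MOV SI, 2000H   ; SI points to N
      MOV CL, [SI]    ; CL = N (loop counter)
      MOV AL, 00H     ; AL = running sum
      MOV BL, 01H     ; BL = current number, starting at 1
L1:   ADD AL, BL      ; sum = sum + current number
      INC BL
      DEC CL
      JNZ L1          ; repeat N times (LOOP is a mnemonic, so it is not used as a label)
      MOV DI, 2002H
      MOV [DI], AL    ; store the 8-bit sum
      HLT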



Instruction Set
3. Logical Instructions
Mnemonics: AND, OR, XOR, TEST, SHR, SHL, RCR, RCL …



Instruction Set
4. String Manipulation Instructions
◦ String : Sequence of bytes or words
◦ 8086 instruction set includes instructions for string movement, comparison, scan, load and store.
◦ REP instruction prefix : used to repeat execution of string instructions
◦ String instructions end with S or SB or SW. S represents string, SB string byte and SW string word.
◦ Offset or effective address of the source operand is stored in SI register and that of the destination operand is stored in DI register.
◦ Depending on the status of DF, SI and DI registers are automatically updated.
◦ DF = 0 → SI and DI are incremented by 1 for byte and 2 for word.
◦ DF = 1 → SI and DI are decremented by 1 for byte and 2 for word.



Instruction Set
4. String Manipulation Instructions
Mnemonics: REP, MOVS, CMPS, SCAS, LODS, STOS

MOVS
    MOVSB     MA = (DS) x 16₁₀ + (SI)
              MAE = (ES) x 16₁₀ + (DI)
              (MAE) ← (MA)
              If DF = 0, then (DI) ← (DI) + 1; (SI) ← (SI) + 1
              If DF = 1, then (DI) ← (DI) - 1; (SI) ← (SI) - 1
    MOVSW     MA = (DS) x 16₁₀ + (SI)
              MAE = (ES) x 16₁₀ + (DI)
              (MAE ; MAE + 1) ← (MA ; MA + 1)
              If DF = 0, then (DI) ← (DI) + 2; (SI) ← (SI) + 2
              If DF = 1, then (DI) ← (DI) - 2; (SI) ← (SI) - 2



Instruction Set
4. String Manipulation Instructions
Mnemonics: REP, MOVS, CMPS, SCAS, LODS, STOS
REP
    REPZ/REPE      While CX ≠ 0 and ZF = 1, repeat execution of string instruction and (CX) ← (CX) – 1
                   (Repeat string instruction until ZF = 0)
    REPNZ/REPNE    While CX ≠ 0 and ZF = 0, repeat execution of string instruction and (CX) ← (CX) – 1
                   (Repeat string instruction until ZF = 1)
Example:
    MOVSB
    REP MOVSB
    MOVSW
    REP MOVSW

In the example, the first form copies a single byte from the source string, at address DS:SI, to the destination string, at
address ES:DI, then increments (or decrements, if the Direction flag is set) both SI and DI.

The second form performs this operation and then decrements CX; if CX is not zero, the operation is repeated.
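
The behaviour of REP MOVSB described above can be modelled in Python as a loop over CX. This is a sketch under simplifying assumptions: memory is a 1 MB bytearray, and the segment values and the function name are chosen only for illustration.

```python
def rep_movsb(mem: bytearray, ds: int, si: int, es: int, di: int, cx: int, df: int = 0):
    """Copy CX bytes from DS:SI to ES:DI, updating SI and DI according to DF."""
    step = -1 if df else 1
    while cx != 0:
        source = ((ds << 4) + si) & 0xFFFFF
        destination = ((es << 4) + di) & 0xFFFFF
        mem[destination] = mem[source]   # one MOVSB
        si = (si + step) & 0xFFFF
        di = (di + step) & 0xFFFF
        cx -= 1                          # REP decrements CX after each repetition
    return si, di, cx


memory = bytearray(1 << 20)              # 1 MB address space
memory[0x10000:0x10003] = b"abc"         # source string at 1000H:0000H (hypothetical)
print(rep_movsb(memory, ds=0x1000, si=0x0000, es=0x2000, di=0x0010, cx=3))
print(bytes(memory[0x20010:0x20013]))
```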



Instruction Set
4. String Manipulation Instructions
Mnemonics: REP, MOVS, CMPS, LODS, STOS

Compare two string bytes or string words
CMPS
    CMPSB / CMPSW
              MA = (DS) x 16₁₀ + (SI)
              MAE = (ES) x 16₁₀ + (DI)
              Modify flags ← (MA) - (MAE)
              If (MA) > (MAE), then CF = 0; ZF = 0; SF = 0
              If (MA) < (MAE), then CF = 1; ZF = 0; SF = 1
              If (MA) = (MAE), then CF = 0; ZF = 1; SF = 0
    For byte operation
              If DF = 0, then (DI) ← (DI) + 1; (SI) ← (SI) + 1
              If DF = 1, then (DI) ← (DI) - 1; (SI) ← (SI) - 1
    For word operation
              If DF = 0, then (DI) ← (DI) + 2; (SI) ← (SI) + 2
              If DF = 1, then (DI) ← (DI) - 2; (SI) ← (SI) - 2



Instruction Set
4. String Manipulation Instructions
Mnemonics: REP, MOVS, CMPS, SCAS, LODS, STOS

Load string byte into AL or string word into AX
LODS
    LODSB     MA = (DS) x 16₁₀ + (SI)
              (AL) ← (MA)
              If DF = 0, then (SI) ← (SI) + 1
              If DF = 1, then (SI) ← (SI) – 1
    LODSW     MA = (DS) x 16₁₀ + (SI)
              (AX) ← (MA ; MA + 1)
              If DF = 0, then (SI) ← (SI) + 2
              If DF = 1, then (SI) ← (SI) – 2



Instruction Set
4. String Manipulation Instructions
Mnemonics: REP, MOVS, CMPS, SCAS, LODS, STOS

Store byte from AL or word from AX into string
STOS
    STOSB     MAE = (ES) x 16₁₀ + (DI)
              (MAE) ← (AL)
              If DF = 0, then (DI) ← (DI) + 1
              If DF = 1, then (DI) ← (DI) – 1
    STOSW     MAE = (ES) x 16₁₀ + (DI)
              (MAE ; MAE + 1) ← (AX)
              If DF = 0, then (DI) ← (DI) + 2
              If DF = 1, then (DI) ← (DI) – 2



Instruction Set
5. Processor Control Instructions

Mnemonics    Explanation
STC          Set carry flag, CF ← 1
CLC          Clear carry flag, CF ← 0
CMC          Complement carry flag, CF ← NOT CF
STD          Set direction flag, DF ← 1
CLD          Clear direction flag, DF ← 0
STI          Set interrupt enable flag, IF ← 1
CLI          Clear interrupt enable flag, IF ← 0
NOP          No operation
HLT          Halt; the processor stops executing until an interrupt or reset occurs



Instruction Set
6. Control Transfer Instructions

Transfer the control to a specific destination or target instruction
Do not affect flags
8086 unconditional transfers
Mnemonics                      Explanation
CALL reg/mem/disp16            Call subroutine
RET                            Return from subroutine
JMP reg/mem/disp8/disp16       Unconditional jump



Instruction Set
6. Control Transfer Instructions

Checks flags
If conditions are true, the program control is transferred to the new memory location in the same
segment by modifying the content of IP



Instruction Set
6. Control Transfer Instructions
For signed operands:
Name                                            Alternate name
JE disp8    Jump if equal                       JZ disp8     Jump if result is 0
JNE disp8   Jump if not equal                   JNZ disp8    Jump if not zero
JG disp8    Jump if greater                     JNLE disp8   Jump if not less or equal
JGE disp8   Jump if greater than or equal       JNL disp8    Jump if not less
JL disp8    Jump if less than                   JNGE disp8   Jump if not greater than or equal
JLE disp8   Jump if less than or equal          JNG disp8    Jump if not greater
For unsigned operands:
Name                                            Alternate name
JE disp8    Jump if equal                       JZ disp8     Jump if result is 0
JNE disp8   Jump if not equal                   JNZ disp8    Jump if not zero
JA disp8    Jump if above                       JNBE disp8   Jump if not below or equal
JAE disp8   Jump if above or equal              JNB disp8    Jump if not below
JB disp8    Jump if below                       JNAE disp8   Jump if not above or equal
JBE disp8   Jump if below or equal              JNA disp8    Jump if not above



8086 Microprocessor
Instruction Set
6. Control Transfer Instructions
8086 conditional branch instructions testing individual flags
Mnemonics    Explanation
JC disp8     Jump if CF = 1
JNC disp8    Jump if CF = 0
JP disp8     Jump if PF = 1
JNP disp8    Jump if PF = 0
JO disp8     Jump if OF = 1
JNO disp8    Jump if OF = 0
JS disp8     Jump if SF = 1
JNS disp8    Jump if SF = 0
JZ disp8     Jump if result is zero, i.e. ZF = 1
JNZ disp8    Jump if result is not zero, i.e. ZF = 0



Branch Instruction



Nested Procedure Calls



String Manipulation Instructions
A series of data bytes or words available in memory at consecutive locations is called a byte
string or a word string.
The length of the string is stored in the CX register.
To refer to a string, the starting/ending address of the string and the length of the string are
required.



LODS/STOS
Load or store a string byte or word
Eg: LODSB → Loads a byte into AL
        AL = DS : [SI]
    LODSW → Loads a word into AX
        AX = DS : [SI]
Eg: STOSB → Stores the byte in AL into memory
        ES : [DI] = AL
    STOSW → Stores the word in AX into memory
        ES : [DI] = AX



MOVS
Moves a string byte or word in memory to another memory location
Eg: MOVSB → Moves string byte
        ES : [DI] = DS : [SI]
    MOVSW → Moves string word
        ES : [DI] = DS : [SI]



CMPS
Compares two strings
Eg: CMPSB → Compares string bytes
        DS : [SI] – ES : [DI] (only the flags are modified)
    CMPSW → Compares string words
        DS : [SI] – ES : [DI] (only the flags are modified)



SCAS
It scans a string of bytes (SCASB) or words (SCASW) for an operand byte or word specified
in the register AL or AX
Eg: SCASB → Compares AL with a byte in memory
        AL – ES : [DI] (only the flags are modified)
    SCASW → Compares AX with a word in memory
        AX – ES : [DI] (only the flags are modified)



REP

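WAP to add a series of 8-bit numbers stored in memory locations
1200H to 1209H. Store the result in 120AH.
      MOV SI,1200H    ; SI points to the first number
      MOV AL,00H      ; clear the running sum
      MOV CL,0AH      ; 10 numbers to add
L1:   ADD AL,[SI]     ; add the current number (a carry out of AL is ignored)
      INC SI
      DEC CL
      JNZ L1
      MOV [SI],AL     ; SI is now 120AH: store the 8-bit sum there
      HLT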



Searching the existence of a certain data in a
given data array



FLOW CHART
START
Initialize SI to 1200 memory location
Data of SI moved to AX
Counter CX is assigned by 05
Increment SI by 02
Compare the content of AX with [SI]
If AX = [SI]?
    YES: contents of SI moved to 1400 memory location, then STOP
    NO: decrement CX and check if CX = 00?
        NO: go back to "Increment SI by 02"
        YES: 0000 is assigned to SI, then contents of SI moved to 1400 memory location
STOP

PROGRAM (starting at memory address 1000)
Label   Mnemonics
        MOV SI,1200H
        MOV AX,[SI]
        MOV CX,0005H
GG:     INC SI
        INC SI
        CMP AX,[SI]
        JE STORE         ; label renamed from SS, which is a segment register name
        DEC CX
        JNZ GG
        MOV SI,0000H     ; not found
STORE:  MOV [1400H], SI
        HLT

MEMORY ADDRESS   DATA IN
1200             AB96
1202             89CD
1204             AB96
1206             4EDF
1208             9197
120A             9600

MEMORY ADDRESS   DATA OUT
1400             1204



To separate odd and even numbers in a given data array (the odd numbers are copied to the destination)
Program                  Explanation
        MOV CL, 06       Set counter in CL register
        MOV SI, 1600H    Set source index as 1600
        MOV DI, 1500H    Set destination index as memory address 1500
Loop1:  MOV AL,[SI]      Load data from source memory
        ROR AL, 01       Rotate AL once to right (LSB goes to CF)
        JNC Next         If the bit is zero (even number), skip the store
        ROL AL, 01       Rotate AL once to left (restore the number)
        MOV [DI], AL     Move the odd number to destination
        INC DI           Increment destination index
Next:   INC SI           Increment source index
        DEC CL           Decrement the count
        JNZ Loop1        Jump if CL not 0 to Loop1
        HLT              Stop the program

Similarly, write a program to separate -ve numbers from +ve numbers in a given set of data.
WAP to count the number of odd and even numbers in a given data array.

WAP to find the greatest number in a given data array
Memory address (offset)   Data (Input)
1500                      05
1501                      25
1502                      35
1503                      20
1504                      30
1505                      15
Memory address (offset)   Data (Output)
1600                      35
MNEMONICS                 COMMENT
      MOV SI, 1500H       SI<-1500
      MOV CL, [SI]        CL<-[SI] (number of elements)
      MOV CH, 00H         CH<-00, because LOOP counts in CX
      INC SI              SI<-SI+1
      MOV AL, [SI]        AL<-[SI]
      DEC CL              CL<-CL-1
      INC SI              SI<-SI+1
L1:   CMP AL, [SI]        AL-[SI]
      JNC L2              JUMP TO L2 IF CY=0
      MOV AL, [SI]        AL<-[SI]
L2:   INC SI              SI<-SI+1
      LOOP L1             CX<-CX-1 & JUMP TO L1 IF CX NOT 0
      MOV [1600H], AL     AL->[1600]
      HLT                 END
How would the program be modified to find the smallest number?

Write a program to sort an 8-bit data array in ascending order. The array consists of 5
numbers starting from location 3000H:4000H.
      MOV AX,3000H
      MOV DS,AX
      MOV CH, 04H          ; outer loop: 4 passes
L3:   MOV CL,04H           ; inner loop: 4 comparisons per pass
      MOV SI, 4000H
L2:   MOV AL, [SI]
      MOV AH, [SI+01H]
      CMP AL, AH
      JB L1                ; already in order
      JZ L1                ; equal: no swap
      MOV [SI+1], AL       ; swap the pair
      MOV [SI], AH
L1:   INC SI
      DEC CL
      JNZ L2
      DEC CH
      JNZ L3
      HLT
Address offset   Initial data   Ext loop 1   Ext loop 2   Ext loop 3   Ext loop 4
4000             55             45           35           25           15
4001             45             35           25           15           25
4002             35             25           15           35           35
4003             25             15           45           45           45
4004             15             55           55           55           55
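
The pass-by-pass table of the sorting program can be reproduced with a short Python model of the same bubble sort. This is a sketch that mirrors the two loop counters of the listing; the function name is an assumption made for illustration.

```python
def bubble_sort_passes(data):
    """Return the array contents after each outer pass (the CH loop of the listing)."""
    data = list(data)
    snapshots = []
    for _ in range(len(data) - 1):          # CH: number of passes
        for i in range(len(data) - 1):      # CL: comparisons per pass, SI = 4000H + i
            if data[i] > data[i + 1]:       # neither JB nor JZ taken: swap the pair
                data[i], data[i + 1] = data[i + 1], data[i]
        snapshots.append(list(data))
    return snapshots


for number, snapshot in enumerate(bubble_sort_passes([0x55, 0x45, 0x35, 0x25, 0x15]), start=1):
    print(f"Ext loop {number}:", " ".join(f"{value:02X}" for value in snapshot))
```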

WAP to reverse a string of 10 bytes using stack. Check whether the string is
palindrome or not. If it is palindrome display FFH on a display unit whose
address is 52H else display 00H.
; Part 1: reverse the string through the stack
      MOV SI, 2300H
      MOV DI, 2500H
      MOV CL, 0AH
L1:   MOV AL, [SI]
      PUSH AX            ; the 8086 pushes words only; the byte travels in AL
      INC SI
      DEC CL
      JNZ L1
      MOV CL, 0AH
L2:   POP AX
      MOV [DI], AL
      INC DI
      DEC CL
      JNZ L2
; Part 2: compare the original string with the reversed copy
      MOV SI, 2300H
      MOV DI, 2500H
      MOV CX, 000AH      ; REPZ counts in CX
      CLD
      REPZ CMPSB         ; assumes ES = DS
      JNZ L3
      MOV AL, 0FFH
      OUT 52H, AL
      JMP Exit
L3:   MOV AL, 00H
      OUT 52H, AL
Exit: HLT
SI (source)   STACK      DI (reversed)
2300 11       FFF1 AA    2500 AA
2301 22       FFF2 99    2501 99
2302 33       FFF3 88    2502 88
2303 44       FFF4 77    2503 77
2304 55       FFF5 66    2504 66
2305 66       FFF6 55    2505 55
2306 77       FFF7 44    2506 44
2307 88       FFF8 33    2507 33
2308 99       FFF9 22    2508 22
2309 AA       FFFA 11    2509 11
The STACK column lists one byte per entry as in the original figure; with PUSH AX each entry actually occupies one word.



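ALP to find the Sum of cubes of an array of size
10 by using 8086.
The listing stores the cube of each element as a word starting at 0220H; the stored cubes still have to be added to obtain the sum.
      MOV SI,0200H     ; source array of 10 bytes
      MOV DI,0220H     ; destination: one word per cube
      MOV CL,0AH
Up:   MOV AL,[SI]
      MOV BL,AL
      MUL BL           ; AX = n x n
      MUL BL           ; AX = AL x n, correct while n x n fits in AL (n <= 15)
      MOV [DI],AX
      INC SI
      INC DI
      INC DI
      DEC CL
      JNZ Up
      HLT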

Write an ALP to convert binary number to gray code
Start
Initialize SI to 1200 memory location
Move the word at [SI] to AX and BX
Shift contents of BX one bit to the right
Perform XOR operation on AX and BX
Move contents of AX to 1400 memory location
Stop
Program
      MOV SI, 1200H
      MOV AX, [SI]
      MOV BX, [SI]
      SHR BX, 01
      XOR AX, BX
      MOV [1400H], AX
      HLT
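
The shift-and-XOR step of the program can be checked quickly in Python. This is a sketch of the same conversion on a 16-bit word; the function name is an assumption made for illustration.

```python
def binary_to_gray(value: int, bits: int = 16) -> int:
    """Gray code = binary XOR (binary shifted right by one bit), as in SHR BX,1 / XOR AX,BX."""
    value &= (1 << bits) - 1
    return value ^ (value >> 1)


for number in (0b0000, 0b0111, 0b1011):
    print(f"{number:04b} -> {binary_to_gray(number):04b}")
```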



Write an ALP to convert binary number to gray code (bit-level view)
Start
Initialize SI to 1200 memory location
Move the word at [SI] to AX and BX
Shift contents of BX one bit to the right
    AX:          A3      A2        A1        A0
    BX shifted:  0       A3        A2        A1
Perform XOR operation on AX and BX
    AX XOR BX:   A3XOR0  A2XORA3   A1XORA2   A0XORA1
Move contents of AX to 1400 memory location
Stop


Program to check whether a 16-bit number is a bitwise palindrome. If it is a
palindrome, place FFH at 2500H; otherwise place 00H at 2500H

MOV AX, [2300H] ;Load the number to be tested.
MOV CL, 10H ;Initialize the counter to 16 (10H).
UP: ROR AX, 1 ;Rotate right once; bit 0 is copied into CF.
RCL DX, 1 ;Rotate left through carry; CF enters bit 0 of DX.
DEC CL
JNZ UP ;Repeat for all 16 bits.
CMP AX, DX ;Compare the number with its bit-reversed copy.
JNZ DOWN ;If not equal, go to DOWN label.
MOV BYTE PTR [2500H], 0FFH ;Declare as a PALINDROME.
JMP EXIT ;Jump to EXIT label.
DOWN: MOV BYTE PTR [2500H], 00H ;Declare as not a PALINDROME.
EXIT: HLT
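
A small Python model of the ROR/RCL loop may make the bit reversal easier to follow. The function names are illustrative assumptions; the loop mirrors the 16 rotate steps of the listing.

```python
def reverse_bits16(value: int) -> int:
    """Mimic 16 rounds of ROR AX,1 followed by RCL DX,1."""
    ax = value & 0xFFFF
    dx = 0
    for _ in range(16):
        carry = ax & 1                          # ROR AX,1: bit 0 goes to CF
        ax = (ax >> 1) | (carry << 15)          # ...and wraps around to bit 15
        dx = ((dx << 1) | carry) & 0xFFFF       # RCL DX,1: CF enters bit 0
    return dx


def is_bit_palindrome(value: int) -> bool:
    """True when the 16-bit pattern reads the same in both directions."""
    return reverse_bits16(value) == (value & 0xFFFF)


if __name__ == "__main__":
    print(hex(reverse_bits16(0x0001)), is_bit_palindrome(0x8001))
```

After 16 rotations AX is back to its original value, which is why the program can compare AX with DX directly.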



Program to get the square root of a
number (works for perfect squares)
MOV AX, [0500H] ;move the data from offset 0500H to register AX
MOV CX, 0000H ;move 0000 to register CX (result counter)
MOV BX, FFFFH ;move FFFF (-1) to register BX
L1: ADD BX, 0002H ;BX = next odd number (1, 3, 5, ...)
INC CX ;increment the content of CX by 1
SUB AX, BX ;subtract contents of BX from AX
JNZ L1 ;jump to L1 if zero flag (ZF) is 0
MOV [0600H], CX ;store the contents of CX to offset 0600H
HLT ;end the program
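
The routine relies on the identity 1 + 3 + 5 + ... + (2k - 1) = k^2. The sketch below models it in Python; the function name is an assumption, and raising `ValueError` for inputs that are not perfect squares is an addition of this sketch (the assembly loop has no such check).

```python
def sqrt_by_odd_subtraction(n: int) -> int:
    """Subtract 1, 3, 5, ... from n and count the steps until n reaches 0."""
    odd = -1        # BX = FFFFH behaves as -1
    count = 0       # CX = 0000H
    while True:
        odd += 2    # ADD BX, 0002H
        count += 1  # INC CX
        n -= odd    # SUB AX, BX
        if n == 0:  # JNZ L1 falls through when ZF = 1
            return count
        if n < 0:   # not in the assembly version: guard against non-squares
            raise ValueError("input is not a perfect square")


if __name__ == "__main__":
    print(sqrt_by_odd_subtraction(144))
```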



CONVERSION OF BCD TO HEXADECIMAL
Step        Register   Value             Binary
Load BCD    AL         38                0 0 1 1 1 0 0 0
AND         mask       F0                1 1 1 1 0 0 0 0
Result      AL         30                0 0 1 1 0 0 0 0
ROR AL, 4   AL         03                0 0 0 0 0 0 1 1
MUL 0A      AL         1E (30 decimal)   0 0 0 1 1 1 1 0
Load BCD    AH         38                0 0 1 1 1 0 0 0
AND         mask       0F                0 0 0 0 1 1 1 1
Result      AH         08                0 0 0 0 1 0 0 0
Adding the two partial results gives 1EH + 08H = 26H (38 decimal).
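
The masking, rotating and multiplying steps in the table can be sketched in Python as follows. The function name is an assumption; the worked value uses the bit pattern 0011 1000 shown in the table.

```python
def packed_bcd_to_binary(bcd: int) -> int:
    """Convert one packed BCD byte (two decimal digits) to its binary value."""
    tens = (bcd & 0xF0) >> 4    # AND with F0, then move the nibble down by 4
    units = bcd & 0x0F          # AND with 0F
    return tens * 0x0A + units  # MUL 0A, then add the units digit


if __name__ == "__main__":
    print(hex(packed_bcd_to_binary(0b00111000)))
```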
Write a program to get the factorials of 10 numbers stored from the starting location
4000H:1000H. The results should be stored from 4000H:2000H
MOV AX, 4000H
MOV DS, AX
MOV SI, 1000H
MOV DI, 2000H
MOV CX, 000AH ;LOOP uses CX as its counter
NEXT: MOV AL, 01H ;restart the product for each number
MOV BL, [SI]
LOOK: MUL BL
DEC BL
JNZ LOOK
MOV [DI], AL ;8-bit result: valid for inputs 1 to 5 (5! = 120)
INC SI
INC DI
LOOP NEXT
HLT
Flowchart

Start
Initialize SI to memory location 1200
Move the contents of the location pointed to by SI into AX and BX
Shift the contents of BX one bit to the right
Perform an XOR operation on AX and BX
Compare the contents of BX with 0000
If equal? No: go back to the shift step. Yes: continue.
Move the contents of AX to memory location 1400
Stop

Program
Mnemonics
MOV SI, 1200
MOV AX, [SI]
MOV BX, [SI]
LOOP1: SHR BX, 01
XOR AX, BX
CMP BX, 0000
JE LOOP2
JMP LOOP1
LOOP2: MOV [1400], AX
HLT
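
The slide does not name this routine, but repeating the shift-and-XOR step until BX reaches zero appears to perform the inverse conversion, from Gray code back to binary. A minimal Python sketch under that reading (the function name is an assumption):

```python
def gray_to_binary(gray: int, width: int = 16) -> int:
    """Model of the LOOP1 body: SHR BX,1 / XOR AX,BX until BX == 0."""
    mask = (1 << width) - 1
    ax = bx = gray & mask
    while True:
        bx >>= 1          # SHR BX, 01
        ax ^= bx          # XOR AX, BX
        if bx == 0:       # CMP BX, 0000 / JE LOOP2
            return ax


if __name__ == "__main__":
    print(bin(gray_to_binary(0b1110)))
```

Feeding it `n ^ (n >> 1)` (the single-step conversion from the earlier program) returns `n`, which supports this interpretation.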



Program to find the greatest common divisor (GCD)
of two given numbers
MOV SI, 2300H ;Store offset address 2300H in SI.
MOV DI, 2400H ;Store offset address 2400H in DI.
MOV AX, [SI] ;Move the first number to AX.
MOV BX, [SI+2] ;Move the second number (the next word) to BX.
UP: CMP AX, BX ;Compare the two numbers.
JE EXIT ;If equal, go to EXIT label.
JB EXCG ;If the first number is below the second, go to EXCG label.
UP1: MOV DX, 0000H ;Clear DX (high word of the dividend).
DIV BX ;Divide DX:AX by BX; quotient in AX, remainder in DX.
CMP DX, 0000H ;Check whether the remainder is zero.
JE EXIT ;If zero, jump to EXIT label.
MOV AX, DX ;If non-zero, move the remainder to AX.
JMP UP ;Jump to UP label.
EXCG: XCHG AX, BX ;Exchange so that the larger number is in AX.
JMP UP1 ;Jump to UP1.
EXIT: MOV [DI], BX ;Store the result (GCD) at the address in DI.
HLT ;Stop
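
The control flow of the listing is Euclid's algorithm by repeated division. The Python sketch below follows the same labels; the function name is an assumption, and both inputs are assumed to be positive.

```python
def gcd_by_division(a: int, b: int) -> int:
    """Follow the UP / EXCG / UP1 / EXIT flow of the assembly listing."""
    while a != b:              # UP: CMP AX, BX / JE EXIT
        if a < b:              # JB EXCG
            a, b = b, a        # XCHG AX, BX
        remainder = a % b      # UP1: DIV BX leaves the remainder in DX
        if remainder == 0:     # CMP DX, 0000H / JE EXIT
            break
        a = remainder          # MOV AX, DX / JMP UP
    return b                   # EXIT: MOV [DI], BX


if __name__ == "__main__":
    print(gcd_by_division(12, 18))
```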



Program to find the least common multiple (LCM) of
two given numbers
MOV SI, 2300H ;Store offset address 2300H in SI
MOV DI, 2400H ;Store offset address 2400H in DI
MOV DX, 0000H ;Initialize DX (high word of the multiple)
MOV AX, [SI] ;Move the first number to AX
MOV BX, [SI+2] ;Move the second number to BX
UP: PUSH AX ;Save the low word of the current multiple on the stack
PUSH DX ;Save the high word of the current multiple on the stack
DIV BX ;Divide the multiple (DX:AX) by the second number
CMP DX, 0000H ;Check the remainder
JE EXIT ;If the remainder is zero, go to EXIT label
POP DX ;If the remainder is non-zero, retrieve the high word of the multiple
POP AX ;Retrieve the low word of the multiple
ADD AX, [SI] ;Add the first number to get the next multiple
JNC DOWN ;If no carry, jump to DOWN label
INC DX ;Propagate the carry into DX
DOWN: JMP UP ;Jump to UP label
EXIT: POP DX ;Retrieve the multiple (DIV left the quotient in AX)
POP AX
MOV [DI], AX ;Store the low word of the LCM at the destination
MOV [DI+2], DX ;Store the high word of the LCM
HLT
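
The idea behind the listing is to step through multiples of the first number until one is divisible by the second. A minimal Python sketch of that idea (the function name is an assumption; both inputs are assumed positive):

```python
def lcm_by_repeated_addition(first: int, second: int) -> int:
    """Step through multiples of `first` until `second` divides one of them."""
    multiple = first                     # MOV AX, [SI]
    while multiple % second != 0:        # DIV BX / CMP DX, 0000H
        multiple += first                # ADD AX, [SI]
    return multiple


if __name__ == "__main__":
    print(lcm_by_repeated_addition(4, 6))
```

Note that the value to store is the multiple itself, not the quotient left behind by the division.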
Some important programs
Program to count the logical 1’s and 0’s in a given data word
Program to get the squares of an array of numbers
Program to find the LCM of given numbers
Program to find the GCD of given numbers
Program to check whether a number is a bitwise palindrome
Program to check whether a 16-bit number is a nibble-wise palindrome
Program to reverse a string
Program to search for a character in a string


Write a program to swap the nibbles of 10
data bytes stored in memory starting from
2000H, storing the results from 3000H
MOV DL, 0AH ;loop counter (CL is needed for the rotate count)
MOV CL, 04H ;rotate count
MOV SI, 2000H
MOV DI, 3000H
L1: MOV AL, [SI]
ROR AL, CL ;on the 8086 a rotate by more than 1 takes its count from CL
MOV [DI], AL
INC SI
INC DI
DEC DL
JNZ L1
HLT


Consider that all 8 bits of an output port with address 54H are connected to 8 LEDs.
Write a program to blink all the LEDs ON and OFF with a time gap of 5 seconds. Assume
that every instruction takes 1 second to execute.
L1: MOV AL, FFH
OUT 54H, AL
CALL DELAY
MOV AL, 00H
OUT 54H, AL
CALL DELAY
JMP L1

DELAY: MOV CL, 02H
L2: DEC CL
JNZ L2
RET
Write a program to add the contents of the memory location 0500H to register BX
and CX. Add immediate byte 05H to the data residing in memory location, whose
address is computed using offset=0600H. Store the result of the addition in 0700H.
Assume data segment’s starting physical address is 20000H.
MOV AX, 2000H ;DS = 2000H gives physical base address 20000H
MOV DS, AX
ADD BX, [0500H]
ADD CX, [0500H]
MOV DL, 05H
ADD DL, [0600H]
MOV [0700H], DL
HLT


Bus arbitration for I/O devices
Data transfer techniques



The Purpose of Interrupts
I/O Organizations can be of 3 different types
1. Programmed I/O (Polling/Daisy chain)
2. Interrupt driven I/O
3. DMA
Interrupts are useful when interfacing I/O devices with low data-transfer rates, like a keyboard or a mouse, in
which case polling the device wastes valuable processing time
The peripheral interrupts the normal application execution, requesting to send or receive data.
The processor jumps to a special program called Interrupt Service Routine to service the peripheral
After the processor services the peripheral, the execution of the interrupted program continues.

[Figure: timeline of a main program that is interrupted in turn by a printer interrupt and two modem interrupts, resuming after each one]
Interrupts
An interrupt is used to cause a temporary halt in the execution of a program.
The microprocessor responds to the interrupt with an interrupt service routine (ISR): a short program or
subroutine that instructs the microprocessor on how to handle the interrupt.


BASIC INTERRUPT TERMINOLOGY
Interrupt pins: Set of pins used in hardware interrupts
Interrupt Service Routine (ISR) or Interrupt handler: code used for handling a specific interrupt
Interrupt priority: In systems with more than one interrupt input, some interrupts have a
higher priority than others
◦ They are serviced first if multiple interrupts are triggered simultaneously

Interrupt vector: Code loaded on the bus by the interrupting device that identifies the address
(segment and offset) of a specific interrupt service routine
Interrupt Masking: Ignoring (disabling) an interrupt
Non-Maskable Interrupt: Interrupt that cannot be ignored (e.g. power failure)


Interrupt processing flow
[Flowchart with the boxes: Main program, Interrupt request? (N), Accept interrupt? (N), Get interrupt vector, Jump to ISR, Save PC, Load PC]


Interrupts
A non-maskable interrupt requires an immediate response by the microprocessor; it is usually used
for serious circumstances like power failure.
A maskable interrupt is an interrupt that the microprocessor can ignore depending upon some
predetermined condition defined by the status register.


Interrupts
Hardware, software and internal interrupts are serviced on a priority basis.
Each interrupt is given a different priority level by assigning it to a type number.
Type 0 identifies the highest-priority and type 255 identifies the lowest-priority interrupt.
The 8086 allows up to 256 vectored interrupts: it can have up to 256 different sources
for an interrupt, and the 8086 will directly call the service routine for that interrupt without any
software processing.
This is in contrast to non-vectored interrupts, which transfer control directly to a single interrupt
service routine, regardless of the interrupt source.


Hardware Interrupts – Interrupt pins and timing
x86 Interrupt Pins
◦ INTR: Interrupt Request. Activated by a peripheral device to interrupt the processor.
◦ Level triggered. Activated with a logic 1.
◦ /INTA: Interrupt Acknowledge. Activated by the processor to inform the interrupting device that the
interrupt request (INTR) is accepted.
◦ Level triggered. Activated with a logic 0.
◦ NMI: Non-Maskable Interrupt. Used for major system faults such as power failures.
◦ Edge triggered. Activated with a positive edge (0 to 1) transition.
◦ Must remain at logic 1, until it is accepted by the processor.
◦ Before the 0 to 1 transition, NMI must be at logic 0 for at least 2 clock cycles.
◦ No need for interrupt acknowledgement.

[Timing diagram: INTR, /INTA, and D7-D0 carrying the interrupt vector]

Interrupts
The 8086 provides a 256 entry interrupt vector table beginning at address 0:0 in memory.
The Interrupt Vector Table occupies the address range from 00000H to 003FFH (the first
1024 bytes in the memory map).
This is a 1K table containing 256 4-byte entries.
Each entry in this table contains a segmented address that points at the interrupt service
routine in memory.
The lowest five types are dedicated to specific interrupts such as the divide by zero interrupt
and the non maskable interrupt.
The next 27 interrupt types, from 5 to 31 are reserved by Intel for use in future
microprocessors.
The upper 224 interrupt types, from 32 to 255, are available to use for hardware and
software interrupts.

Interrupt Vector - Example
Draw a circuit diagram to show how a device with interrupt vector 4CH can be connected
on an 8088 microprocessor system.
Answer:
◦ The peripheral device activates the INTR line
◦ The processor responds by activating the INTA signal
◦ The NAND gate enables the 74LS244 octal buffer
◦ the number 4CH appears on the data bus
◦ The processor reads the data bus to get the interrupt vector


[Circuit diagram: the peripheral device drives INTR of the 8088 system; /INTA enables a 74LS244 octal buffer (enable inputs E1, E2) whose inputs I7-I0 are hard-wired to the vector and whose outputs Y7-Y0 drive data bus lines D7-D0]
4C = 0 1 0 0 1 1 0 0


Interrupt Vector Table – Real Mode (16-bit) Example
Using the Interrupt Vector Table shown below, determine the address of the ISR of a device
with interrupt vector 42H.
Answer: Address in table = 4 X 42H = 108H
(Multiply by 4 since each entry is 4 bytes)
Offset Low = [108] = 2A, Offset High = [109] = 33
Segment Low = [10A] = 3C, Segment High = [10B] = 4A
Address = 4A3C:332A = 4A3C0 + 332A = 4D6EAH

        0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
00000  3C 22 10 38 6F 13 2C 2A 33 22 21 67 EE F1 32 25
00010  11 3C 32 88 90 16 44 32 14 30 42 58 30 36 34 66
.....  .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
00100  4A 33 3C 4A AA 1A 1B A2 2A 33 3C 4A AA 1A 3E 77
00110  C1 58 4E C1 4F 11 66 F4 C5 58 4E 20 4F 11 F0 F4
.....  .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
00250  00 10 10 20 3F 26 33 3C 20 26 20 C1 3F 10 28 32
00260  20 4E 00 10 50 88 22 38 10 5A 38 10 4C 55 14 54
.....  .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
003E0  3A 10 45 2F 4E 33 6F 90 3A 44 37 43 3A 54 54 7F
003F0  22 3C 80 01 3C 4F 4E 88 22 3C 50 21 49 3F F4 65
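
The lookup for vector 42H can be reproduced with a short Python sketch. The function name and the dictionary used as a stand-in for memory are assumptions; the four byte values are the ones read from the table at 00108H to 0010BH.

```python
def isr_address(memory: dict, vector: int) -> tuple:
    """Return (segment, offset, physical address) of the ISR for a vector."""
    base = 4 * vector                                   # each entry is 4 bytes
    offset = memory[base] | (memory[base + 1] << 8)     # little-endian word
    segment = memory[base + 2] | (memory[base + 3] << 8)
    physical = ((segment << 4) + offset) & 0xFFFFF      # 20-bit real-mode address
    return segment, offset, physical


if __name__ == "__main__":
    table = {0x108: 0x2A, 0x109: 0x33, 0x10A: 0x3C, 0x10B: 0x4A}
    seg, off, phys = isr_address(table, 0x42)
    print(f"{seg:04X}:{off:04X} = {phys:05X}H")
```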
Example
Write a sequence of instructions that initializes vector 40H to point to the ISR “isr40”.
Answer: Address in table = 4 x 40H = 100H
Set ds to 0 since the Interrupt Vector Table begins at 00000H
Get the offset address of the ISR using the OFFSET operator
and store it in the addresses 100H and 101H
Get the segment address of the ISR using the SEG operator
and store it in the addresses 102H and 103H

push ax                ; save registers on the stack
push ds
mov ax, 0              ; set ds to 0 to point to the interrupt vector table
mov ds, ax
mov ax, offset isr40   ; get the offset address of the ISR
mov [0100h], ax        ; and store it at address 0100h (4 x 40h = 100h)
mov ax, seg isr40      ; get the segment address of the ISR
mov [0102h], ax        ; and store it at address 0102h
pop ds                 ; restore registers from the stack
pop ax
Interrupt Masking
The processor can inhibit certain types of interrupts by use of a special interrupt mask bit.
This mask bit is part of the flags/condition code register, or a special interrupt register.
If this bit is clear, and an interrupt request occurs on the Interrupt Request input, it is ignored.
NMI cannot be masked



S/W Interrupt Processing
Save state
◦ Disable interrupts for the duration of the ISR or allow it to be interrupted too?
◦ Save program counter/Instruction Pointer
◦ Save flags
◦ Save register values?

Jump to interrupt service routine


◦ Location obtained by interrupt vector

Process interrupt
Restore state
◦ Load PC/IP, flags, registers etc.



H/W Interrupt Processing
1. The external interface sends an interrupt signal to the Interrupt Request (INTR) pin (or an internal interrupt occurs).
2. The CPU finishes the present instruction (for a hardware interrupt) and checks the INTR pin.
3. If IF=0 the processor ignores the interrupt; otherwise it sends Interrupt Acknowledge (INTA) to the hardware interface.
4. The interrupt type N is sent to the Central Processing Unit (CPU) via the data bus from the hardware interface.
5. The contents of the flag register are pushed onto the stack.
6. Both the interrupt flag (IF, FR bit 9) and the trap flag (TF, FR bit 8) are cleared. This disables the INTR pin and the trap or single-step feature.
7. The contents of the code segment register (CS) are pushed onto the stack.
8. The contents of the instruction pointer (IP) are pushed onto the stack.
9. The interrupt vector contents are fetched from (4 x N) and placed into the IP, and from (4 x N + 2) into the CS, so
that the next instruction executes at the interrupt service procedure addressed by the interrupt vector.
10. On return from the interrupt service routine by the Interrupt Return (IRET) instruction, the IP, CS and flag
register are popped from the stack and return to their state prior to the interrupt.
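
The push, clear, load and pop steps above can be traced with a small Python model. Everything here (the `CpuState` class, the list used as a stack, the dictionary used as memory) is an illustrative assumption, not a cycle-accurate description of the 8086.

```python
from dataclasses import dataclass, field

IF_MASK = 1 << 9   # interrupt flag, FR bit 9
TF_MASK = 1 << 8   # trap flag, FR bit 8


@dataclass
class CpuState:
    flags: int
    cs: int
    ip: int
    stack: list = field(default_factory=list)


def read_word(memory: dict, address: int) -> int:
    """Read a little-endian 16-bit word from the memory model."""
    return memory[address] | (memory[address + 1] << 8)


def enter_interrupt(cpu: CpuState, memory: dict, n: int) -> None:
    cpu.stack.append(cpu.flags)              # push the flag register
    cpu.flags &= ~(IF_MASK | TF_MASK)        # clear IF and TF
    cpu.stack.append(cpu.cs)                 # push CS
    cpu.stack.append(cpu.ip)                 # push IP
    cpu.ip = read_word(memory, 4 * n)        # IP from 4 x N
    cpu.cs = read_word(memory, 4 * n + 2)    # CS from 4 x N + 2


def iret(cpu: CpuState) -> None:
    cpu.ip = cpu.stack.pop()                 # pop IP
    cpu.cs = cpu.stack.pop()                 # pop CS
    cpu.flags = cpu.stack.pop()              # pop flags (restores IF and TF)


if __name__ == "__main__":
    memory = {0x108: 0x2A, 0x109: 0x33, 0x10A: 0x3C, 0x10B: 0x4A}
    cpu = CpuState(flags=0x0202, cs=0x1000, ip=0x0123)
    enter_interrupt(cpu, memory, 0x42)
    print(f"ISR at {cpu.cs:04X}:{cpu.ip:04X}")
    iret(cpu)
    print(f"back at {cpu.cs:04X}:{cpu.ip:04X}")
```

Because the flag register is saved before IF is cleared, IRET re-enables interrupts automatically when it restores the flags.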



The Intel x86 Interrupt Software Instructions
All x86 processors provide the following instructions related to interrupts:
◦ INT nn: Interrupt. Run the ISR pointed by vector nn.
◦ INT 0 is reserved for the Divide Error
◦ INT 1 is reserved for Single Step operation
◦ INT 2 is reserved for the NMI pin
◦ INT 3 is reserved for setting a Breakpoint
◦ INT 4 is reserved for Overflow (same as the INTO (Interrupt on Overflow) instruction).
◦ CLI: Clear Interrupt Flag. IF is set to 0, thus interrupts are disabled.
◦ STI: Set Interrupt Flag. IF is set to 1, thus interrupts are enabled.
◦ IRET: Return from interrupt. This is the last instruction in the ISR (Real Mode only). It pops from the
stack the IP, the CS and the flag register.
◦ After returning from an ISR the interrupts are enabled again, since the initial value of the flag register is popped from the stack.


Reset
Processor initialization or start up is accomplished with activation (HIGH) of the RESET pin.
The 8086 RESET is required to be HIGH for greater than 4 CLK cycles.
The 8086 will terminate operations on the high-going edge of RESET and will remain
dormant as long as RESET is HIGH.
The low-going transition of RESET triggers an internal reset sequence for approximately 10
CLK cycles.
After this interval the 8086 operates normally beginning with the instruction in absolute
location FFFF0H.



Non-Maskable interrupt
The processor provides a single non-maskable interrupt pin (NMI) which has higher priority
than the maskable interrupt request pin (INTR).
A typical use would be to activate a power failure routine.
The NMI is edge-triggered on a LOW-to-HIGH transition.
The activation of this pin causes a type 2 interrupt.
NMI is required to have a duration (in the HIGH state) of greater than two CLK cycles, but is
not required to be synchronized to the clock.



Intel 8259A Interrupt Controller
The primary sources of interrupts are the PC's timer chip, keyboard, serial ports, parallel
ports, disk drives, CMOS real-time clock, mouse, sound cards, and other peripheral devices.
These devices connect to an Intel 8259A programmable interrupt controller (PIC) that
prioritizes the interrupts and interfaces with the 8086 CPU.
The 8259A chip adds considerable complexity to the software that processes interrupts.
Programmable 82C59A
The 82C59A is programmable. The 80386 determines the priority scheme to be used by setting a
control word in the 82C59A.
The following interrupt modes are possible:
Fully nested: The interrupt requests are ordered in priority from 0 (IR0) through 7 (IR7)
Rotating: In some applications a number of interrupting devices are of equal priority. In this
mode a device, after being serviced, receives the lowest priority in the group.
Special mask: This allows the processor to inhibit interrupts from certain devices
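
The difference between the fully nested and rotating modes can be sketched in Python. This is a simplified model under stated assumptions: the function name and the `lowest` parameter are hypothetical, and the real controller's registers and command words are not modelled.

```python
def highest_priority_request(pending: set, lowest: int = 7):
    """Pick the pending IR line with the highest priority.

    `lowest` is the line that currently has the lowest priority. With the
    default of 7 the order is IR0..IR7 (fully nested). In rotating mode the
    line that was just serviced becomes `lowest`, so the order starts at the
    line after it.
    """
    for step in range(1, 9):
        line = (lowest + step) % 8
        if line in pending:
            return line
    return None  # no request pending


if __name__ == "__main__":
    requests = {2, 5}
    print(highest_priority_request(requests))            # fully nested order
    print(highest_priority_request(requests, lowest=2))  # after IR2 was serviced
```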



Necessary materials to read for
Interrupts
Stallings book chapter: Input / Output (Chapter 7)
Other materials will be provided in VTOP



Maskable interrupt
Whenever an external signal activates the INTR pin, the microprocessor will be interrupted
only if interrupts are enabled using set interrupt Flag instruction (STI).
If the interrupts are disabled using clear interrupt Flag instruction (CLI), the microprocessor
will not get interrupted even if INTR is activated.
That is, INTR can be masked.
INTR is a non vectored interrupt, which means, the 8086 does not know where to branch to
service the interrupt.
The 8086 has to be told by an external device like a Programmable Interrupt controller
regarding the branch.
Whenever the INTR pin is activated by an I/O port, if Interrupts are enabled and NMI is not
active at that time, the microprocessor finishes the current instruction that is being executed
and gives out a ‘0’ on INTA pin twice.



Maskable interrupt
When INTA pin goes low for the first time, it asks the external device to get ready.
In response to the second INTA the microprocessor receives the 8-bit type number, say N, from a
programmable interrupt controller.
The action taken is as follows:
1. Complete the current instruction.
2. Activates INTA output, and receives type Number, say N
3. Flag register value, CS value of the return address & IP value of the return address are
pushed on to the stack.
4. IP value is loaded from contents of word location N x 4.
5. CS is loaded from contents of the next word location.
6. Interrupt Flag and trap Flag are reset to 0.



Maskable interrupt
At the end of the ISR, there will be an IRET instruction.
Any register values that the ISR saved on the stack at its start are first restored from the stack.
IRET then pops IP, CS and the flag register from the top of the stack, and execution returns to
the interrupted program.


INT 21h
INT 21h / AH=1 - read character from standard input, with echo; the result is stored in AL. If there is
no character in the keyboard buffer, the function waits until any key is pressed.

INT 21h / AH=2 - write character to standard output.
entry: DL = character to write; after execution AL = DL.

INT 21h / AH=5 - output character to printer.
entry: DL = character to print; after execution AL = DL.

INT 21h / AH=6 - direct console input or output.
parameters for output: DL = 0..254 (ASCII code)
parameters for input: DL = 255
for output returns: AL = DL
for input returns: ZF set if no character is available and AL = 00h; ZF clear if a character is
available, in which case AL = character read and the buffer is cleared.




INT 21h
INT 21h / AH=7 - character input without echo to AL.
if there is no character in the keyboard buffer, the function waits until any key is pressed.

INT 21h / AH=9 - output of a string at DS:DX. String must be terminated by '$'.



INT 21h
INT 21h / AH=0Ah - input of a string to DS:DX. The first byte of the buffer is the buffer size and the
second byte is the number of characters actually read. This function does not add '$' at the end of
the string. To print the string using INT 21h / AH=9, you must place a dollar character at the end of
it and start printing from address DS:DX+2.
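
The buffer layout can be illustrated with a Python sketch that builds the '$'-terminated string AH=9 expects. The function name is hypothetical, and the carriage return after the typed characters in the sample buffer is an assumption about how the buffer is filled.

```python
def string_for_ah9(buffer: bytes) -> bytes:
    """Extract the typed characters from an AH=0Ah buffer and append '$'.

    buffer[0] = buffer size, buffer[1] = number of characters actually read,
    and the characters themselves start at offset 2 (DS:DX+2).
    """
    count = buffer[1]
    return bytes(buffer[2:2 + count]) + b"$"


if __name__ == "__main__":
    # Assumed sample: size 10, 3 characters read, then a carriage return.
    sample = bytes([10, 3]) + b"abc\r" + bytes(6)
    print(string_for_ah9(sample))
```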



Serial I/O Interface

Serial I/O Interface
Serial data transmission is used for digital communication between
Sensors and computers
Computers and computers
Computers and peripheral devices (printer, stylus, mouse, ..)
It is one of the most widely used communication techniques to interface external
equipment.
The process of sending data sequentially over a computer bus is called serial
communication: the data is transmitted bit by bit.
In parallel communication, by contrast, a byte (8 bits) or character is transmitted over
several data lines at the same time.
Serial communication is slower than parallel communication but is used for long-distance data
transmission because of its lower cost and for practical reasons.




RS232
RS232 is a standard protocol used for serial communication
It is used for serial communication over cable lengths up to about 50 feet, at data rates up to about 20 kbps.
RS232 is used for connecting Data Terminal Equipment (DTE) and Data Communication
Equipment (DCE).


Handshaking
Handshaking is a process of dynamically setting the parameters of a communication between
the transmitter and receiver before the communication begins.
The need for handshaking is dictated by the speed at which the transmitter (DTE) transmits the
data and the speed at which the receiver (DCE) receives it.


Hardware Handshaking
In Hardware Handshaking, the transmitter first asks the receiver whether it is ready to receive
data.
The receiver then checks its buffer and if the buffer is empty, it will then tell the transmitter
that it is ready to receive.
The transmitter will transmit the data and it is loaded into the receiver buffer.
During this time, the receiver tells the transmitter not to send any further data until the data in
the buffer has been read by the receiver.



Hardware Handshaking
The RS232 Protocol defines four signals for the purpose of Handshaking:
1. Request to Send (RTS)
2. Clear to Send (CTS)
3. Data Terminal Ready (DTR)
4. Data Set Ready (DSR)


Hardware Handshaking
With the help of Hardware Handshaking, the data from the transmitter is never lost or
overwritten in the receiver buffer.
When the transmitter (DTE) wants to send data, it pulls the RTS (Request to Send) line high.
The transmitter then waits for CTS (Clear to Send) to go high, so it keeps monitoring that line.
If the CTS line is low, it means that the receiver (DCE) is busy and not yet ready to receive data.
When the receiver is ready, it pulls the CTS line high.
The transmitter then transmits the data. This method is also called RTS/CTS handshaking.


Software Handshaking
Software Handshaking in RS232 involves two special characters for starting and stopping
the communication.
These characters are X-ON and X-OFF (Transmitter On and Transmitter Off).
When the receiver sends an X-OFF signal, the transmitter stops sending the data.
The transmitter starts sending data only after it receives the X-ON signal.
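
A tick-by-tick Python sketch shows how X-OFF and X-ON gate the transmitter. The function name and the tick model are assumptions of this sketch; it uses the conventional ASCII control codes DC1 (11H) for X-ON and DC3 (13H) for X-OFF, and it assumes the transmitter starts out enabled.

```python
XON = 0x11   # ASCII DC1
XOFF = 0x13  # ASCII DC3


def run_transmitter(data: bytes, control_by_tick: dict, ticks: int) -> list:
    """Return (tick, byte) pairs for every byte the transmitter sends.

    control_by_tick maps a tick number to the control character that the
    receiver sends back during that tick.
    """
    sent = []
    paused = False
    index = 0
    for tick in range(ticks):
        control = control_by_tick.get(tick)
        if control == XOFF:
            paused = True            # receiver buffer is filling up
        elif control == XON:
            paused = False           # receiver is ready again
        if not paused and index < len(data):
            sent.append((tick, data[index]))
            index += 1
    return sent


if __name__ == "__main__":
    print(run_transmitter(b"abcd", {1: XOFF, 3: XON}, ticks=6))
```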



RS232
Electrical characteristics
1. Logic 1: -3V to -25V; typically -12V
2. Logic 0: +3V to +25V; typically +12V
3. Any signal in the range -3V to +3V has an indeterminate logical state
4. Quiescent or inactive state is -12V (i.e. logic 1)


RS232 (Serial i/o Interface)
Connectors
1. DB25S is a 25 pin connector with full RS-232 functionality
2. The computer socket has a female outer casing with male connecting pins
3. The terminating cable connector has a male outer casing with female connecting pins



RS232 (DB9 connector)



RS232
RS232 takes bytes and transmits the individual bits in a sequential fashion in a frame.
A frame is a defined structure carrying a meaningful sequence of bits or bytes of data.
It has a start bit followed by 8 data bits, a parity bit and a stop bit.
[Figure: frame format of one character]


RS232
Each bit is sent one after the other.
This mode of transmission requires that the receiver knows when the actual data bits are arriving,
so that it can synchronize itself with the incoming data.
So logic 0 is sent as a start bit. The start bit in the frame signals the receiver that a new
character is coming.
After the start bit, the next five to eight bits are sent, which represent the
character.
This is followed by a parity bit used for error detection.
The parity bit makes the number of ones in the set of bits even (even parity) or odd (odd parity).
The stop bit helps the receiver to identify the end of the character.


RS232
If a receiver detects a value other than mark (logic 1) when the stop bit should be present, it knows that
there is a synchronization error.
This causes a framing error condition during reception.
The device then tries to resynchronize on new incoming bits.
RS232
Example
ASCII coding (7 data bits, sent LSB first), even parity, 2 stop bits:
11111010000010110000011111111111111000001111111100011001111010100111111111111
{inactive} 11111
{start bit} 0 {‘A’} 1000001 {parity bit} 0 {stop bits} 11
{start bit} 0 {‘p’} 0000111 {parity bit} 1 {stop bits} 11
{inactive} 11111111
{start bit} 0 {‘p’} 0000111 {parity bit} 1 {stop bits} 11
{inactive} 11
{start bit} 0 {‘L’} 0011001 {parity bit} 1 {stop bits} 11
Message is ‘AppL’
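
The framing used in this example (start bit 0, 7 ASCII data bits sent LSB first, even parity, 2 stop bits, idle line at 1) can be sketched as a Python encoder and decoder. The function names are assumptions, and raising `ValueError` on parity or framing errors is a choice made for this sketch.

```python
def encode_frame(ch: str, data_bits: int = 7, stop_bits: int = 2) -> str:
    """Build one frame: start bit, data LSB first, even parity, stop bits."""
    code = ord(ch)
    data = [(code >> i) & 1 for i in range(data_bits)]   # LSB first
    parity = sum(data) % 2                               # makes the count of ones even
    return "0" + "".join(str(b) for b in data) + str(parity) + "1" * stop_bits


def decode_stream(bits: str, data_bits: int = 7, stop_bits: int = 2) -> str:
    """Decode a bit string that may contain idle ones between frames."""
    frame_len = 1 + data_bits + 1 + stop_bits
    chars = []
    i = 0
    while i + frame_len <= len(bits):
        if bits[i] == "1":            # idle (mark) state, keep looking for a start bit
            i += 1
            continue
        data = bits[i + 1:i + 1 + data_bits]
        parity = int(bits[i + 1 + data_bits])
        stop = bits[i + 2 + data_bits:i + frame_len]
        if data.count("1") % 2 != parity:
            raise ValueError("parity error")
        if stop != "1" * stop_bits:
            raise ValueError("framing error")
        chars.append(chr(int(data[::-1], 2)))            # undo the LSB-first order
        i += frame_len
    return "".join(chars)


if __name__ == "__main__":
    stream = "11111" + encode_frame("A") + encode_frame("p") + "1" * 8
    stream += encode_frame("p") + "11" + encode_frame("L") + "111"
    print(decode_stream(stream))
```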



Universal serial bus
Industry standard that defines the cables, connectors and communications protocols used in a
bus for connection, communication, and power supply between computers and electronic
devices
Developed in 1996
By joint efforts of Intel, Microsoft, Digital Equipment Corporation, IBM, NEC, Nortel, Compaq
USB has superseded and effectively replaced serial, parallel and PS/2 ports


WHY USB?
To standardize connections of electrical devices, while maintaining and optimizing
High speed
Reliability
Cost of manufacturing

To make it fundamentally easier to connect external devices to PCs


by replacing the multitude of connectors at the back of PCs
addressing the usability issues of existing interfaces
simplifying software configuration of all devices connected to USB



Comparison of Speeds of USB
USB Version           Speed        Type
USB version 1.0/1.1   1.5 Mbit/s   Low Speed
USB version 1.0/1.1   12 Mbit/s    Full Speed
USB version 2.0       480 Mbit/s   High Speed
USB version 3.0       5 Gbit/s     SuperSpeed
USB version 3.1       10 Gbit/s    SuperSpeed+


Architectural Overview
Host-controlled
i.e. there can only be one host per bus

Tiered Star Topology


Many hubs or Devices can be connected to the root host



Tiered Star Topology
Uses 7-bit addressing of devices
Allowing connection of up to 127 devices on a single USB bus
Allows expandability and ease of use
Each device can be handled and removed individually without interrupting others

Allows plug’n’play connectivity which allows for dynamically loadable and unloadable devices
and drivers.
Plug the device in and the host loads the drivers without needing a reboot or initiation/termination of
connection, etc
Unplug the device the absence is automatically detected by the host and the drivers are unloaded



Electrical and Mechanical Characteristics
Mechanical Characteristics
There are commonly two kinds of USB connectors
1. Type A
2. Type B
Many cables are made with Type A connector on one end and Type B connector on the other
This is an attempt to prevent improper physical connections between devices, as the
connectors are not physically interchangeable



Protocol layer
USB Packet Types - Data is sent in packets Least Significant Bit (LSB) first
There are 4 main USB packet types:
1. Token
2. Data
3. Handshake
4. Start of Frame
The packets are then bundled into frames to create a USB message



USB Packet Fields
Each packet is constructed from different field types
1. SYNC (synchronize)
2. PID (packet ID)
3. Address
4. Data
5. Endpoint
6. CRC (cyclic redundancy check)
7. EOP (end of packet)

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 371


USB Packet Fields
Synchronize (Sync):
All packets must start with a sync field.
Used to synchronize clock rate between device and host.
It is 8 bits for low/full speed and 32 bits for high speed connections.
Packet ID (PID):
Packet ID identifies the type of packet being sent.
It has 4 bits for the value, and another 4 bits of the inverted value to prevent errors
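
The PID check described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the layout (PID value in the low nibble, inverted copy in the high nibble) is an assumption of this sketch. The value 0b0010 is used as the ACK code.

```python
def encode_pid(pid):
    """Build the 8-bit PID field: 4-bit value plus its inverted copy."""
    if not 0 <= pid <= 0xF:
        raise ValueError("PID must fit in 4 bits")
    check = ~pid & 0xF           # inverted copy of the 4-bit value
    return (check << 4) | pid    # assumed layout: value in the low nibble


def pid_is_valid(field):
    """Return True if the high nibble is the complement of the low nibble."""
    pid = field & 0xF
    check = (field >> 4) & 0xF
    return check == (~pid & 0xF)


ACK = 0b0010                     # handshake PID used for illustration
print(hex(encode_pid(ACK)))      # 0xd2
print(pid_is_valid(0xD2))        # True
```

A receiver that sees a field failing this check can discard the packet, which is how the inverted copy helps prevent errors.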

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 372


USB Packet Fields
Address:
7 bit field
Specifies the device the packet is intended for out of 127 devices that can be connected to a
single bus.
Endpoint:
4 bits long
Allows for further flexibility in addressing
Can also be split for IN or OUT data

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 373


USB Packet Fields
Cyclic Redundancy Checks (CRC):
Performed on the data within the packet payload.
All token packets have a 5-bit CRC while data packets have a 16-bit CRC.
End of Packet (EOP):
Signaled by a single-ended zero (SE0) for approximately 2 bit times followed by a J state for 1 bit time.

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 374


Token Packet Format
In
Informs the USB device that host wishes to read information
Out
Informs the USB device that the host wishes to send information
Setup
Used to begin control transfers

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 375


Data Packet Format
Maximum data payload size for low-speed devices is 8 bytes.
Maximum data payload size for full-speed devices is 1023 bytes.
Maximum data payload size for high-speed devices is 1024 bytes.
Payloads must be sent in multiples of bytes.

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 376


Handshake Packet Format
There are three types of handshake packets, each consisting simply of a PID:
ACK: Acknowledgement that the packet has been successfully received
NAK: Reports that the device cannot send or receive data for the time being. Also used during interrupt transactions to inform the host that there is no data to send
STALL: The device is in a state that requires intervention from the host

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 377


Start of Frame Packet Format
Data is split into frames before being transmitted.
The start of frame packet is sent by the host and carries the frame number.
The frame number field is 11 bits long.

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 378


Data Transfer Types
Data Transfer Types
1. Control
2. Isochronous
3. Bulk
4. Interrupt

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 379


DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 380
Control unit

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 381


Control unit
Each instruction is executed in one instruction cycle.
But each instruction cycle is made up of a number of smaller units
These smaller units, in general, are
1. Fetch
2. Indirect
3. Execute
4. Interrupt

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 382


Control unit
It can be seen that further decomposition is possible
Each of these smaller cycles involves a series of steps, each of which involves the processor
registers
Such steps are known as micro-operations, since they are very simple and accomplish very
little
Micro-operations are the functional, or atomic, operations of a processor

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 383


Micro-operations

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 384


Fetch - 4 Registers
Memory Address Register (MAR)
◦ Connected to address bus
◦ Specifies address for read or write op

Memory Buffer Register (MBR)


◦ Connected to data bus
◦ Holds data to write or last data read

Program Counter (PC)


◦ Holds address of next instruction to be fetched

Instruction Register (IR)


◦ Holds last instruction fetched

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 385


Fetch cycle
At the beginning of the fetch cycle, the address of the next instruction to be executed is in the program counter (PC); in this case, the address is 1100100.
The first step is to move that address to the memory address register (MAR) because this is the only register connected to the address lines of the system bus.

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 386


Fetch cycle
The second step is to bring in the instruction.
The desired address (in the MAR) is placed on the address bus → the control unit issues a READ command on the control bus → the result appears on the data bus and is copied into the memory buffer register (MBR).
We also need to increment the PC by the instruction length to get ready for the next instruction.
Because these two actions (read word from memory, increment PC) do not interfere with each other, we can do them simultaneously to save time.

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 387


Fetch cycle
The third step is to move the contents of the MBR to the
instruction register (IR). This frees up the MBR for use
during a possible indirect cycle.
Thus, the simple fetch cycle actually consists of three steps and four micro-operations.
Each micro-operation involves the movement of data into
or out of a register.
So long as these movements do not interfere with one
another, several of them can take place during one step,
saving time.

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 388


Fetch cycle
Symbolically, we can write this sequence of events as follows:
t1: MAR ← (PC)
t2: MBR ← Memory
    PC ← (PC) + I
t3: IR ← (MBR)
where I is the instruction length.


We assume that a clock is available for timing purposes and that it emits regularly spaced clock
pulses. Each clock pulse defines a time unit.
Thus, all time units are of equal duration. Each micro-operation can be performed within the
time of a single time unit.

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 389


Fetch cycle
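First time unit: Move contents of PC to MAR.
Second time unit: Move contents of the memory location specified by MAR to MBR. Increment the contents of the PC by I.
Third time unit: Move contents of MBR to IR.
The second and third micro-operations both take place during the second time unit. The third micro-operation (the PC increment) could instead have been grouped with the fourth without affecting the fetch operation:
t1: MAR ← (PC)
t2: MBR ← Memory
t3: PC ← (PC) + I
    IR ← (MBR)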
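
The time units of the fetch cycle can be traced with a small Python sketch. The dictionary used as a register file, the function name fetch_cycle, and an instruction length of 1 are assumptions made for illustration only; the memory contents are made up.

```python
def fetch_cycle(regs, memory, instr_len=1):
    """Trace the fetch cycle as three time units (four micro-operations)."""
    # t1: MAR <- (PC)
    regs["MAR"] = regs["PC"]
    # t2: MBR <- Memory and PC <- (PC) + I (independent, same time unit)
    regs["MBR"] = memory[regs["MAR"]]
    regs["PC"] = regs["PC"] + instr_len
    # t3: IR <- (MBR)
    regs["IR"] = regs["MBR"]
    return regs


memory = {0b1100100: "ADD R1, X"}          # hypothetical memory contents
regs = {"PC": 0b1100100, "MAR": None, "MBR": None, "IR": None}
fetch_cycle(regs, memory)
print(regs["IR"], bin(regs["PC"]))          # ADD R1, X 0b1100101
```

Moving the PC increment from t2 to t3 would leave the final register contents unchanged, which is why either grouping is acceptable.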

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 390


Rules for Grouping
1. The proper sequence of events must be followed. Thus (MAR ← (PC)) must precede (MBR ← Memory) because the memory read operation makes use of the address in the MAR.
2. Conflicts must be avoided. One should not attempt to read from and write to the same register in one time unit, because the results would be unpredictable. For example, the micro-operations (MBR ← Memory) and (IR ← MBR) should not occur during the same time unit.

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 391


Indirect cycle
Once an instruction is fetched, the next step is to fetch source operands.
If the instruction specifies an indirect address, then an indirect cycle must precede the execute
cycle.
The data flow includes the following micro-operations:
t1: MAR ← (IR(Address))
t2: MBR ← Memory
t3: IR(Address) ← (MBR(Address))

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 392


Indirect cycle
The address field of the instruction is transferred to the MAR.
This is then used to fetch the address of the operand.
Finally, the address field of the IR is updated from the MBR, so that it now contains a direct
rather than an indirect address.
The IR is now in the same state as if indirect addressing had not been used, and it is ready for
the execute cycle.

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 393


Interrupt cycle
At the completion of the execute cycle, a test is made to determine whether any enabled
interrupts have occurred. If so, the interrupt cycle occurs.
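The nature of this cycle varies greatly from one machine to another. A very simple sequence of events is:
t1: MBR ← (PC)
t2: MAR ← Save_Address
    PC ← Routine_Address
t3: Memory ← (MBR)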

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 394


Interrupt cycle
In the first step, the contents of the PC are transferred to the MBR, so that they can be saved
for return from the interrupt.
Then the MAR is loaded with the address at which the contents of the PC are to be saved, and
the PC is loaded with the address of the start of the interrupt-processing routine.
These two actions may each be a single micro-operation.
Once this is done, the final step is to store the MBR, which contains the old value of the PC, into
memory.
The processor is now ready to begin the next instruction cycle, which starts the interrupt service routine (ISR).

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 395


Execute cycle
The fetch, indirect, and interrupt cycles are simple and predictable. Each involves a small, fixed sequence of micro-operations, and in each case the same micro-operations are repeated each time around.
This is not true of the execute cycle. Because of the variety of opcodes, there are a number of different sequences of micro-operations that can occur.
Consider an add instruction: ADD R1, X which adds the contents of the location X to register R1.
The following sequence of micro-operations might occur:
t1: MAR ← (IR(address))
t2: MBR ← Memory
t3: R1 ← (R1) + (MBR)

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 396


Execute cycle
We begin with the IR containing the ADD instruction.
In the first step, the address portion of the IR is loaded into the MAR.
Then the referenced memory location is read. Finally, the contents of R1 and MBR are added by
the ALU.
However, this is a simplified example.

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 397


Execute cycle
Let us look at two more complex examples.
A common instruction is increment and skip if zero: ISZ X
The content of location X is incremented by 1. If the result is 0, the next instruction is skipped.
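A possible sequence of micro-operations is:
t1: MAR ← (IR(address))
t2: MBR ← Memory
t3: MBR ← (MBR) + 1
t4: Memory ← (MBR)
    If ((MBR) = 0) then (PC ← (PC) + I)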

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 398


Execute cycle
The new feature introduced here is the conditional action.
The PC is incremented if (MBR) = 0.
This test and action can be implemented as one micro-operation.
Note also that this micro-operation can be performed during the same time unit during which
the updated value in MBR is stored back to memory.
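
The conditional micro-operation of ISZ can be illustrated with a short Python sketch. The register dictionary, the 16-bit word size (so that incrementing 0xFFFF wraps to 0), and the instruction length of 1 are assumptions chosen for the example.

```python
def execute_isz(regs, memory, instr_len=1, word_bits=16):
    """Execute cycle of ISZ X: increment memory[X], skip next if result is 0."""
    regs["MAR"] = regs["IR"]["address"]                  # t1
    regs["MBR"] = memory[regs["MAR"]]                    # t2
    regs["MBR"] = (regs["MBR"] + 1) % (1 << word_bits)   # t3
    memory[regs["MAR"]] = regs["MBR"]                    # t4: store back ...
    if regs["MBR"] == 0:                                 # ... and test in the
        regs["PC"] += instr_len                          # same time unit
    return regs


memory = {20: 0xFFFF}                                    # hypothetical operand
regs = {"PC": 5, "MAR": None, "MBR": None, "IR": {"op": "ISZ", "address": 20}}
execute_isz(regs, memory)
print(memory[20], regs["PC"])                            # 0 6
```

The store and the test both only read the MBR, so they do not conflict and can share the fourth time unit.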

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 399


Execute cycle
Consider a subroutine call instruction.
As an example, consider a branch-and-save-address instruction: BSA X
The address of the instruction that follows the BSA instruction is saved in location X, and
execution continues at location X+I
The saved address will later be used for return.
This is a straightforward technique for providing subroutine calls.
The following micro-operations suffice:
t1: MAR ← (IR(address))
    MBR ← (PC)
t2: PC ← (IR(address))
    Memory ← (MBR)
t3: PC ← (PC) + I

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 400


Execute cycle
The address in the PC at the start of the instruction is the address of the next instruction in
sequence.
This is saved at the address designated in the IR.
The latter address is also incremented to provide the address of the instruction for the next
instruction cycle.
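
A minimal Python sketch of the BSA execute cycle follows. The register dictionary, the example addresses, and the instruction length of 1 are assumptions for illustration.

```python
def execute_bsa(regs, memory, instr_len=1):
    """Execute cycle of BSA X: save return address at X, continue at X + I."""
    # t1: MAR <- (IR(address)), MBR <- (PC)
    regs["MAR"] = regs["IR"]["address"]
    regs["MBR"] = regs["PC"]
    # t2: PC <- (IR(address)), Memory <- (MBR)
    regs["PC"] = regs["IR"]["address"]
    memory[regs["MAR"]] = regs["MBR"]
    # t3: PC <- (PC) + I
    regs["PC"] += instr_len
    return regs


memory = {}
# PC already points to the instruction after BSA (it was incremented in fetch).
regs = {"PC": 11, "MAR": None, "MBR": None, "IR": {"op": "BSA", "address": 50}}
execute_bsa(regs, memory)
print(memory[50], regs["PC"])    # 11 51
```

The return address (11) ends up in location X = 50, and execution resumes at X + I = 51, where the first instruction of the subroutine is assumed to live.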

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 401


Instruction cycle
Each phase of the instruction cycle can be decomposed into a sequence of elementary micro-
operations.
To complete the picture, we need to tie sequences of micro-operations together.
We assume a new 2-bit register called the instruction cycle code (ICC). The ICC designates the
state of the processor in terms of which portion of the cycle it is in:
1. 00: Fetch
2. 01: Indirect
3. 10: Execute
4. 11: Interrupt
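
The ICC can be viewed as the state of a small state machine. The sketch below assumes the usual transition rules (fetch goes to indirect only when indirect addressing is used, execute goes to interrupt only when an enabled interrupt is pending); these rules and the function name are assumptions of the sketch, since the flowchart itself is not reproduced in the text.

```python
FETCH, INDIRECT, EXECUTE, INTERRUPT = 0b00, 0b01, 0b10, 0b11


def next_icc(icc, indirect_addressing=False, interrupt_pending=False):
    """Return the ICC value that follows the current cycle."""
    if icc == FETCH:
        return INDIRECT if indirect_addressing else EXECUTE
    if icc == INDIRECT:
        return EXECUTE
    if icc == EXECUTE:
        return INTERRUPT if interrupt_pending else FETCH
    if icc == INTERRUPT:
        return FETCH
    raise ValueError("ICC is a 2-bit code")


# One instruction with indirect addressing and no pending interrupt.
state = FETCH
trace = [state]
for _ in range(3):
    state = next_icc(state, indirect_addressing=True)
    trace.append(state)
print(trace)    # [0, 1, 2, 0]
```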

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 402


Instruction cycle

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 403


Control of the processor
By reducing the operation of the processor to its most fundamental level, we are able to define
exactly what it is that the control unit must cause to happen.
Thus, we can define the functional requirements for the control unit: those functions that the
control unit must perform.
1. Define the basic elements of the processor.
2. Describe the micro-operations that the processor performs.
3. Determine the functions that the control unit must perform to cause the micro-operations to
be performed.

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 404


Control of the processor
1. Sequencing: The control unit causes the processor to step through a series of micro-
operations in the proper sequence, based on the program being executed.
2. Execution: The control unit causes each micro-operation to be performed.
The preceding is a functional description of what the control unit does.
The key to how the control unit operates is the use of control signals.

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 405


Control unit implementation
Implementation of the control unit is broadly of two types:
1. Hardwired implementation (typical of RISC processors)
2. Microprogrammed implementation (typical of CISC processors)

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 406


Hardwired Implementation
Control unit inputs
Flags and control bus
◦ Each bit means something
Instruction register
◦ Op-code causes different control signals for each different instruction
◦ Unique logic for each op-code
◦ Decoder takes encoded input and produces single output
◦ n binary inputs and 2^n outputs
Clock
◦ Repetitive sequence of pulses
◦ Useful for measuring duration of micro-ops
◦ Must be long enough to allow signal propagation
◦ Different control signals at different times within instruction cycle
◦ Need a counter with different control signals for t1, t2 etc.
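
The decoder behaviour listed above (n binary inputs, 2^n outputs, exactly one of them active) can be sketched as follows. The function name and the list-of-bits output are illustrative choices, not part of the slides.

```python
def decode(opcode, n):
    """n-to-2^n decoder: return a one-hot list with 2**n entries."""
    if not 0 <= opcode < (1 << n):
        raise ValueError("opcode does not fit in n bits")
    return [1 if line == opcode else 0 for line in range(1 << n)]


outputs = decode(0b0101, 4)      # a 4:16 decoder driven with opcode 5
print(len(outputs), outputs.index(1))    # 16 5
```

Each output line can then feed the logic for one instruction, so the control unit sees a single active line per op-code instead of an encoded bit pattern.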

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 407


Control Unit with Decoded Inputs

How to implement ???

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 408


4:16 DECODER

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 409


Hardwired control unit methods
State table method:
T-States   I1    I2    I3    ...   In
T1         C11   C12   C13   ...   C1n
T2         C21   C22   C23   ...   C2n
...        ...   ...   ...   ...   ...
Tm         Cm1   Cm2   Cm3   ...   Cmn

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 410


Hardwired Control Unit Logic
For each control signal, we need to derive a Boolean expression of that signal as a function of the inputs.
Let us consider a single control signal, C5, which causes data to be read from the external data bus into the MBR.
Let us define two new control signals, P and Q, that have the following interpretation:
PQ = 00 Fetch Cycle
PQ = 01 Indirect Cycle
PQ = 10 Execute Cycle
PQ = 11 Interrupt Cycle

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 411


State table Logic Example
PQ = 00 Fetch Cycle
PQ = 01 Indirect Cycle
PQ = 10 Execute Cycle
PQ = 11 Interrupt Cycle
Then C5 can be defined as:
C5 = P'·Q'·T2 + P'·Q·T2
where P' denotes the complement of P and T2 is the second time unit.
That is, the control signal C5 will be asserted during the second time unit of both the fetch and indirect cycles.
This is not complete: C5 is also needed during the execute cycle. For our simple example, let us assume that there are only three instructions that read from memory: LDA, ADD and AND. Now we can define C5 as
C5 = P'·Q'·T2 + P'·Q·T2 + P·Q'·(LDA + ADD + AND)·T2
Is it that simple?
No. In a modern complex processor, the number of Boolean equations needed to define the control unit is very large. Hence a simpler and more efficient approach, known as the flow chart/delay element method, is usually used.
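
The Boolean expression for C5 can be checked with a small Python function. This is a sketch under the assumptions of the example (PQ encodes the cycle, T2 marks the second time unit, and LDA, ADD and AND are the only instructions that read memory during execute); the function and parameter names are hypothetical.

```python
def c5(p, q, t2, lda=False, add=False, and_=False):
    """C5 = P'Q'T2 + P'QT2 + PQ'(LDA + ADD + AND)T2."""
    fetch = (not p) and (not q) and t2
    indirect = (not p) and q and t2
    execute = p and (not q) and (lda or add or and_) and t2
    return bool(fetch or indirect or execute)


print(c5(0, 0, 1))             # True: second time unit of the fetch cycle
print(c5(1, 0, 1))             # False: execute cycle, no memory-read opcode
print(c5(1, 0, 1, add=True))   # True: execute cycle of ADD
```

Writing one such function per control signal quickly becomes unwieldy, which illustrates why the number of equations is a problem for a real processor.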
DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 414
•Referring to the figures above, depict the control signals required for the execution cycle of LOAD 200 and
STORE 300. [5]
•Express the Boolean expression for each of the control signals. [5]
•Implement the Boolean expressions for each of the control signals using minimum number of gates. [5]

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 415


Flow chart/ Delay element method:
[Flowchart: Start → C1 → C2 → C3 → decision "Is x = 0?" → C4 (Y branch) or C5 (N branch) → C6 → End]

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 419


DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 420
Sequence counter method

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 421


Wilkes’ Design for Micro programmed CU

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 424


DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 425
Actual Microprogrammed control unit

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 426


CONTENT

• Problems on Pipeline

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 438


PIPELINE

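[Figure: stages A, B, C of successive instructions plotted against time, executed back to back in the non-pipeline case and overlapped in the pipeline case]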

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 439


MODEL-1 : PIPELINE PROBLEM-1

Problem 1: Find the number of clock cycles required to execute 10 instructions (I1 to I10) with the pipeline method and without the pipeline method for the following instruction structure.

I: F (2) D (1) E (1)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 1 clock cycle

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 440


MODEL-1 : PIPELINE PROBLEM-1

Non - Pipeline method
I1  F(2) D(1) E(1)
I2                 F(2) D(1) E(1)
I3                                F(2) D(1) E(1)
I4, I5, ... and so on
Time →

No. of clock cycles required to execute 10 instructions
= (No. of instructions) x (Total no. of cycles required for a single instruction)
= 10 x 4 = 40 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 441


MODEL-1 : PIPELINE PROBLEM-1

Pipeline method
I1  F F D E
I2      F F D E
I3          F F D E
I4              F F D E
I5                  F F D E   and so on ...
No. of clock cycles →

No. of clock cycles required to execute 10 instructions
= (No. of clock cycles for the 1st instruction) + ((No. of instructions - 1) x (difference between two successive instructions))
= 4 + ((10 - 1) x 2) = 4 + (9 x 2) = 4 + 18 = 22 clock cycles
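
The two formulas can be written as small Python functions. This is a sketch: it assumes that the "difference between two instructions" equals the duration of the longest stage, which matches the numbers worked out in these problems; the function names are illustrative.

```python
def non_pipelined_cycles(n, stages):
    """Each instruction runs alone: n x (sum of stage durations)."""
    return n * sum(stages)


def pipelined_cycles(n, stages):
    """First instruction takes sum(stages); each later one finishes
    max(stages) cycles after its predecessor (assumed issue interval)."""
    return sum(stages) + (n - 1) * max(stages)


stages = [2, 1, 1]                        # F (2), D (1), E (1)
print(non_pipelined_cycles(10, stages))   # 40
print(pipelined_cycles(10, stages))       # 22
```

The same functions can be reused for other stage timings by changing the stages list and the instruction count.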

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 442


MODEL-1 : PIPELINE PROBLEM-2

Problem 2: Find the number of clock cycles required to execute 100 instructions with the pipeline method and without the pipeline method for the following instruction structure.
[Figure: instruction stream I1, I2, ..., I10, ...]

I: F (2) D (1) E (3)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 3 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 443


MODEL-1 : PIPELINE PROBLEM-2

Non - Pipeline method
I1  F(2) D(1) E(3)
I2                 F(2) D(1) E(3)
I3                                F(2) D(1) E(3)
I4, I5, ... and so on
Time →

No. of clock cycles required to execute 100 instructions
= (No. of instructions) x (Total no. of cycles required for a single instruction)
= 100 x 6 = 600 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 444


MODEL-1 : PIPELINE PROBLEM-2

Pipeline method
I1  F F D E E E
I2        F F D E E E
I3              F F D E E E
I4                    F F D E E E   and so on ...
No. of clock cycles →

No. of clock cycles required to execute 100 instructions
= (No. of clock cycles for the 1st instruction) + ((No. of instructions - 1) x (difference between two successive instructions))
= 6 + ((100 - 1) x 3) = 6 + (99 x 3) = 6 + 297 = 303 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 445


MODEL-1 : PIPELINE PROBLEM ASSIGNMENT-1

Assignment - 1: Find the number of clock cycles required to execute 1000 instructions with the pipeline method and without the pipeline method for the following instruction structure.
[Figure: instruction stream I1, I2, ..., I10, ...]

I: F (1) D (2) E (3)
Fetch - 1 clock cycle
Decoding - 2 clock cycles
Execution - 3 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 446




MODEL-1 : PIPELINE PROBLEM ASSIGNMENT-1

Non - Pipeline method
I1  F(1) D(2) E(3)
I2                 F(1) D(2) E(3)
I3                                F(1) D(2) E(3)
I4, I5, ... and so on
Time →

No. of clock cycles required to execute 1000 instructions
= (No. of instructions) x (Total no. of cycles required for a single instruction)
= 1000 x 6 = 6000 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 448


MODEL-1 : PIPELINE PROBLEM ASSIGNMENT-1

Pipeline method
I1  F D D E E E
I2        F D D E E E
I3              F D D E E E
I4                    F D D E E E   and so on ...
No. of clock cycles →

No. of clock cycles required to execute 1000 instructions
= (No. of clock cycles for the 1st instruction) + ((No. of instructions - 1) x (difference between two successive instructions))
= 6 + ((1000 - 1) x 3) = 6 + (999 x 3) = 6 + 2997 = 3003 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 449


MODEL-1 : PIPELINE PROBLEM-3

Problem 3: Find the number of clock cycles required to execute 1000 instructions with the pipeline method and without the pipeline method for the following instruction structure. If the microcontroller clock frequency is 1 GHz, also find the maximum operating frequency.
[Figure: instruction stream I1, I2, ..., I10, ...]

I: F (1) D (1) E (4)
Fetch - 1 clock cycle
Decoding - 1 clock cycle
Execution - 4 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 450


MODEL-1 : PIPELINE PROBLEM-3

Non - Pipeline method
I1  F(1) D(1) E(4)
I2                 F(1) D(1) E(4)
I3                                F(1) D(1) E(4)
I4, I5, ... and so on
Time →

No. of clock cycles required to execute 1000 instructions
= (No. of instructions) x (Total no. of cycles required for a single instruction)
= 1000 x 6 = 6000 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 451


MODEL-1 : PIPELINE PROBLEM-3

Pipeline method
I1  F D E E E E
I2          F D E E E E
I3                  F D E E E E   and so on ...
No. of clock cycles →

No. of clock cycles required to execute 1000 instructions
= (No. of clock cycles for the 1st instruction) + ((No. of instructions - 1) x (difference between two successive instructions))
= 6 + ((1000 - 1) x 4) = 6 + (999 x 4) = 6 + 3996 = 4002 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 452


MODEL-1 : PIPELINE PROBLEM-3

Pipeline method
I1  F D E E E E
I2          F D E E E E
I3                  F D E E E E   and so on ...
No. of clock cycles →

Maximum operating frequency is given by

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 453


MODEL-1 : PIPELINE PROBLEM-4

Problem 4: If the microcontroller clock frequency is 1 GHz, find the maximum operating frequency for the following instruction structure.
[Figure: instruction stream I1, I2, ..., I10]

I: F (2) D (1) E (1)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 1 clock cycle

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 454


MODEL-1 : PIPELINE PROBLEM-4

Pipeline method
I1  F F D E
I2      F F D E
I3          F F D E
I4              F F D E
I5                  F F D E   and so on ...
No. of clock cycles →

Maximum operating frequency is given by

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 455


MODEL-1 : PIPELINE PROBLEM-5

Problem 5: If the microcontroller clock frequency is 1 GHz, find the maximum operating frequency for the following instruction structure.
[Figure: instruction stream I1, I2, ..., I10]

I: F (2) D (1) E (3)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 3 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 456


MODEL-1 : PIPELINE PROBLEM-5

Pipeline method
I1  F F D E E E
I2        F F D E E E
I3              F F D E E E
I4                    F F D E E E   and so on ...
No. of clock cycles →

Maximum operating frequency is given by

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 457


MODEL-2 PROBLEMS

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 458


MODEL-2 : PIPELINE PROBLEM-1

Problem 1: Find the number of clock cycles required to execute 4 instructions (I1 to I4) with the pipeline method and without the pipeline method for the following instruction structure. If the microcontroller clock frequency is 1 GHz, also find the maximum operating frequency.

I: F D E
Fetch – 2 clock cycles
Decoding – 1 clock cycle
Execution – 2 (I1), 4 (I2), 3 (I3) and 2 (I4) clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 459


MODEL-2 : PIPELINE PROBLEM-1

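Non - Pipeline method (instructions executed one after another)
I1: F(2) D(1) E(2) = 5 clock cycles
I2: F(2) D(1) E(4) = 7 clock cycles
I3: F(2) D(1) E(3) = 6 clock cycles
I4: F(2) D(1) E(2) = 5 clock cycles

No. of clock cycles required to execute 4 instructions
= 5 + 7 + 6 + 5 = 23 clock cycles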

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 460


MODEL-2 : PIPELINE PROBLEM-1

Pipeline method
I1  F F D E E
I2  F F D E E E E
I3  F F D E E E
I4  F F D E E
[Timing diagram: the four rows overlap in time along a clock axis running from 0 to 14]

No. of clock cycles required to execute 4 instructions
= 14 clock cycles (from the diagram)
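
When execution times differ per instruction, the count can be obtained by simulating the stages instead of reading a diagram. The sketch below assumes that each stage handles one instruction at a time and that an instruction may wait between stages until the next stage is free; this model is an assumption, but it reproduces the 14 cycles read from the diagram.

```python
def pipelined_cycles_variable(instructions):
    """instructions: list of per-instruction stage durations, in program order.
    Returns the clock cycle at which the last instruction completes."""
    n_stages = len(instructions[0])
    stage_free = [0] * n_stages          # cycle at which each stage is free
    finish = 0
    for durations in instructions:
        ready = 0                        # when this instruction can advance
        for stage, length in enumerate(durations):
            start = max(ready, stage_free[stage])
            ready = start + length
            stage_free[stage] = ready
        finish = ready
    return finish


# (F, D, E) durations for I1..I4: F = 2, D = 1, E = 2, 4, 3, 2
program = [(2, 1, 2), (2, 1, 4), (2, 1, 3), (2, 1, 2)]
print(pipelined_cycles_variable(program))    # 14
print(sum(sum(i) for i in program))          # 23 (non-pipeline)
```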

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 461


MODEL-2 : PIPELINE PROBLEM-1

Pipeline method
I1  F F D E E
I2  F F D E E E E
I3  F F D E E E
I4  F F D E E
[Timing diagram: the four rows overlap in time along a clock axis running from 0 to 14]

Maximum operating frequency is given by

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 462


MODEL-2 : PIPELINE PROBLEM-2

Problem 2: Find the number of clock cycles required to execute 4 instructions (I1 to I4) with the pipeline method and without the pipeline method for the following instruction structure. If the microcontroller clock frequency is 1 GHz, also find the maximum operating frequency.

I: F D E
Fetch – 1 clock cycle
Decoding – 1 clock cycle
Execution – 2 (I1), 1 (I2), 1 (I3) and 2 (I4) clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 463


MODEL-2 : PIPELINE PROBLEM-2

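Non - Pipeline method (instructions executed one after another)
I1: F(1) D(1) E(2) = 4 clock cycles
I2: F(1) D(1) E(1) = 3 clock cycles
I3: F(1) D(1) E(1) = 3 clock cycles
I4: F(1) D(1) E(2) = 4 clock cycles

No. of clock cycles required to execute 4 instructions
= 4 + 3 + 3 + 4 = 14 clock cycles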

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 464


MODEL-2 : PIPELINE PROBLEM-2

Pipeline method
I1  F D E E
I2  F D E
I3  F D E
I4  F D E E
[Timing diagram: the four rows overlap in time along a clock axis running from 0 to 8]

No. of clock cycles required to execute 4 instructions
= 8 clock cycles (from the diagram)

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 465


MODEL-2 : PIPELINE PROBLEM-2

Pipeline method
I1  F D E E
I2  F D E
I3  F D E
I4  F D E E
[Timing diagram: the four rows overlap in time along a clock axis running from 0 to 8]
Maximum operating frequency is given by

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 466


MODEL-2 : PIPELINE PROBLEM ASSIGNMENT-1

Assignment - 1: Find the number of clock cycles required to execute 5 instructions (I1 to I5) with the pipeline method and without the pipeline method for the following instruction structure. If the microcontroller clock frequency is 2 GHz, also find the maximum operating frequency.

I: F D E
Fetch – 1 clock cycle
Decoding – 1 clock cycle
Execution – 3 (I1), 4 (I2), 2 (I3), 1 (I4) and 2 (I5) clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 467


MODEL-3 PROBLEMS

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 468


MODEL-3 : PIPELINE PROBLEM-1

Problem - 1: If the pipeline is flushed after every 3 instructions, find the number of clock cycles required to execute 9 instructions (I1 to I9) with the pipeline method.

I: F (1) D (1) E (4)
Fetch - 1 clock cycle
Decoding - 1 clock cycle
Execution - 4 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 469


MODEL-3 : PIPELINE PROBLEM-1

Pipeline method
I1  F D E E E E
I2          F D E E E E
I3                  F D E E E E
    (pipeline flushed after I3, at clock cycle 14)
I4                              F D E E E E
I5                                      F D E E E E
I6                                              F D E E E E   and so on ...
No. of clock cycles → (0 to 28 shown)

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 470


MODEL-3 : PIPELINE PROBLEM-1

Pipeline method
I1  F D E E E E
I2          F D E E E E
I3                  F D E E E E
No. of clock cycles → (0 to 14)

No. of clock cycles required to execute 3 instructions
= (No. of clock cycles for the 1st instruction) + ((No. of instructions - 1) x (difference between two successive instructions))
= 6 + ((3 - 1) x 4) = 6 + (2 x 4) = 6 + 8 = 14 clock cycles

Total for 9 instructions (3 blocks of 3) = 14 + 14 + 14 = 42 clock cycles
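
The flush arithmetic can be sketched in Python. The sketch assumes that a flush makes the pipeline start from empty again, so every block of instructions between flushes is costed with the pipeline formula on its own, and that the issue interval equals the longest stage; the function name is illustrative.

```python
def flushed_pipeline_cycles(n, stages, flush_every):
    """Cycles for n instructions when the pipeline is flushed after every
    flush_every instructions (each block restarts from an empty pipeline)."""
    def block(k):
        # Pipeline formula for k instructions; an empty block costs nothing.
        return sum(stages) + (k - 1) * max(stages) if k > 0 else 0

    full_blocks, remainder = divmod(n, flush_every)
    return full_blocks * block(flush_every) + block(remainder)


print(flushed_pipeline_cycles(9, [1, 1, 4], 3))    # 42
```

When the instruction count is not a multiple of the flush interval, the remainder is treated as one shorter block, so a single leftover instruction costs one full pass through the stages.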

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 471


MODEL-3 : PIPELINE PROBLEM-2

Problem - 2: If the pipeline is flushed after every 10 instructions, find the number of clock cycles required to execute 40 instructions with the pipeline method.
[Figure: instruction stream I1, I2, ..., I10, ...]

I: F (1) D (1) E (4)
Fetch - 1 clock cycle
Decoding - 1 clock cycle
Execution - 4 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 472


MODEL-3 : PIPELINE PROBLEM-2

Pipeline method
I1  F D E E E E
I2          F D E E E E
I3                  F D E E E E   and so on ...
    (the pipeline is flushed after every 10 instructions and the pattern then restarts)
No. of clock cycles →

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 473


MODEL-3 : PIPELINE PROBLEM-2

Pipeline method
I1  F D E E E E
I2          F D E E E E
I3                  F D E E E E   up to 10 instructions
No. of clock cycles →

No. of clock cycles required to execute 10 instructions
= (No. of clock cycles for the 1st instruction) + ((No. of instructions - 1) x (difference between two successive instructions))
= 6 + ((10 - 1) x 4) = 6 + (9 x 4) = 6 + 36 = 42 clock cycles

Total for 40 instructions (4 blocks of 10) = 42 + 42 + 42 + 42 = 168 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 474


MODEL-3 : PIPELINE PROBLEM-3

Problem - 3: If the pipeline is flushed after every 10 instructions, find the number of clock cycles required to execute 41 instructions with the pipeline method.
[Figure: instruction stream I1, I2, ..., I10, ...]

I: F (1) D (1) E (4)
Fetch - 1 clock cycle
Decoding - 1 clock cycle
Execution - 4 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 475


MODEL-3 : PIPELINE PROBLEM-3

Pipeline method
I1  F D E E E E
I2          F D E E E E
I3                  F D E E E E   and so on ...
    (the pipeline is flushed after every 10 instructions and the pattern then restarts)
No. of clock cycles →

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 476


MODEL-3 : PIPELINE PROBLEM-3

Pipeline method
I1  F D E E E E
I2          F D E E E E
I3                  F D E E E E   up to 10 instructions
No. of clock cycles →

No. of clock cycles required to execute 10 instructions
= (No. of clock cycles for the 1st instruction) + ((No. of instructions - 1) x (difference between two successive instructions))
= 6 + ((10 - 1) x 4) = 6 + (9 x 4) = 6 + 36 = 42 clock cycles

Total for 41 instructions = 4 x 42 + 6 = 168 + 6 = 174 clock cycles
(4 blocks of 10 instructions, plus 6 clock cycles for the 41st instruction, which runs alone after the last flush)

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 477


MODEL-3 : PIPELINE PROBLEM-4

Problem - 4: If the pipeline is flushed after every 10 instructions, find the number of clock cycles required to execute 141 instructions with the pipeline method.
[Figure: instruction stream I1, I2, ..., I10, ...]

I: F (2) D (1) E (3)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 3 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 478


MODEL-3 : PIPELINE PROBLEM-4

Pipeline method
I1  F F D E E E
I2        F F D E E E
I3              F F D E E E   and so on ...
    (the pipeline is flushed after every 10 instructions and the pattern then restarts)
No. of clock cycles →

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 479


MODEL-3 : PIPELINE PROBLEM-4

Pipeline method
I1  F F D E E E
I2        F F D E E E
I3              F F D E E E   up to 10 instructions
No. of clock cycles →

No. of clock cycles required to execute 10 instructions
= (No. of clock cycles for the 1st instruction) + ((No. of instructions - 1) x (difference between two successive instructions))
= 6 + ((10 - 1) x 3) = 6 + (9 x 3) = 6 + 27 = 33 clock cycles

No. of clock cycles required to execute 141 instructions = 14 x 33 + 6 = 462 + 6 = 468 clock cycles
(14 blocks of 10 instructions, plus 6 clock cycles for the 141st instruction, which runs alone after the last flush)

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 480


MODEL-3 : PIPELINE PROBLEM ASSIGNMENT-1

Assignment - 1: If the pipeline is flushed after every 15 instructions, find the number of clock cycles required to execute 1501 instructions with the pipeline method.
[Figure: instruction stream I1, I2, ..., I10, ...]

I: F (2) D (1) E (4)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 4 clock cycles

DR. ELLISON | COMPUTER ARCHITECTURE (ECE2015) | VIT-AP 481


MODEL-4 PROBLEMS



MODEL-4 : PIPELINE PROBLEM-1

Problem 1: Find the number of clock cycles required to execute 10 instructions with the
pipeline method and without the pipeline method for the following instruction structure.
Improve the pipeline structure.

Instruction structure: F (1) D (1) E (4)
Fetch - 1 clock cycle
Decoding - 1 clock cycle
Execution - 4 clock cycles



MODEL-4 : PIPELINE PROBLEM-1

Non-pipeline method
[Timing diagram: instructions I1, I2, I3, ... execute one after another along the time axis, each as F(1) D(1) E(4)]

No. of clock cycles required to execute 10 instructions
= (No. of instructions) x (Total no. of clock cycles required for a single instruction)
= 10 x 6 = 60 clock cycles



MODEL-4 : PIPELINE PROBLEM-1
Less efficient design of pipeline
Pipeline method, instruction structure: F (1) D (1) E (4)
[Timing diagram: each instruction occupies F D E E E E; successive instructions are offset by 4 clock cycles]

No. of clock cycles required to execute 10 instructions
= (No. of clock cycles required for the 1st instruction) + ((No. of instructions - 1) x (No. of clock cycles between two successive instructions))
= 6 + ((10 - 1) x 4) = 6 + (9 x 4) = 6 + 36 = 42 clock cycles



MODEL-4 : PIPELINE PROBLEM-1
More efficient design of pipeline: break the execution stage into two parts of two cycles each
Pipeline method, instruction structure: F (1) D (1) Ex1 (2) Ex2 (2)
[Timing diagram: each instruction occupies F D Ex1 Ex1 Ex2 Ex2; successive instructions are offset by 2 clock cycles]

No. of clock cycles required to execute 10 instructions
= (No. of clock cycles required for the 1st instruction) + ((No. of instructions - 1) x (No. of clock cycles between two successive instructions))
= 6 + ((10 - 1) x 2) = 6 + (9 x 2) = 6 + 18 = 24 clock cycles
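
The three results for this problem can be compared side by side with a small Python sketch. The function and variable names are illustrative and not from the slides. The model is the slides' own: the first instruction takes the sum of the stage times, and the longest stage sets the gap between instructions.

```python
def pipeline_cycles(stage_cycles, n_instructions):
    """First instruction takes sum(stages); each later one adds max(stages)."""
    if n_instructions <= 0:
        return 0
    return sum(stage_cycles) + (n_instructions - 1) * max(stage_cycles)


n = 10
original = [1, 1, 4]       # F, D, E
improved = [1, 1, 2, 2]    # F, D, Ex1, Ex2 (E split into two 2-cycle stages)

non_pipelined = n * sum(original)               # every instruction runs alone
less_efficient = pipeline_cycles(original, n)   # bottleneck stage = 4 cycles
more_efficient = pipeline_cycles(improved, n)   # bottleneck stage = 2 cycles

print(non_pipelined, less_efficient, more_efficient)
```

Splitting the execution stage leaves the time for the first instruction at 6 cycles but halves the gap between instructions, which is where the saving comes from.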



MODEL-4 : PIPELINE PROBLEM-2

Problem 2: Find the number of clock cycles required to execute 100 instructions with the
pipeline method and without the pipeline method for the following instruction structure.
Improve the pipeline structure.

Instruction structure: F (2) D (1) E (6)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 6 clock cycles



MODEL-4 : PIPELINE PROBLEM-2

Non-pipeline method
[Timing diagram: instructions I1, I2, I3, ... execute one after another along the time axis, each as F(2) D(1) E(6)]

No. of clock cycles required to execute 100 instructions
= (No. of instructions) x (Total no. of clock cycles required for a single instruction)
= 100 x 9 = 900 clock cycles



MODEL-4 : PIPELINE PROBLEM-2

Pipeline method, instruction structure: F (2) D (1) E (6)
[Timing diagram: each instruction occupies F F D E E E E E E; successive instructions are offset by 6 clock cycles]

No. of clock cycles required to execute 100 instructions
= (No. of clock cycles required for the 1st instruction) + ((No. of instructions - 1) x (No. of clock cycles between two successive instructions))
= 9 + ((100 - 1) x 6) = 9 + (99 x 6) = 9 + 594 = 603 clock cycles



MODEL-4 : PIPELINE PROBLEM-2

Pipeline method, improved instruction structure: F (2) D (1) Ex1 (3) Ex2 (3)
[Timing diagram: each instruction occupies F F D Ex1 Ex1 Ex1 Ex2 Ex2 Ex2; successive instructions are offset by 3 clock cycles]

No. of clock cycles required to execute 100 instructions
= (No. of clock cycles required for the 1st instruction) + ((No. of instructions - 1) x (No. of clock cycles between two successive instructions))
= 9 + ((100 - 1) x 3) = 9 + (99 x 3) = 9 + 297 = 306 clock cycles
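
As a hedged extension of this problem, the sketch below splits the execution stage into different numbers of equal sub-stages and recomputes the total. The helper `split_stage` and the choice of 1, 2, 3 and 6 parts are assumptions made for illustration. The slides themselves only show the two-part split.

```python
def pipeline_cycles(stage_cycles, n_instructions):
    """First instruction takes sum(stages); each later one adds max(stages)."""
    if n_instructions <= 0:
        return 0
    return sum(stage_cycles) + (n_instructions - 1) * max(stage_cycles)


def split_stage(stage_cycles, index, parts):
    """Replace the stage at `index` by `parts` equal sub-stages."""
    cycles = stage_cycles[index]
    if cycles % parts != 0:
        raise ValueError("stage length must divide evenly into the sub-stages")
    sub_stages = [cycles // parts] * parts
    return stage_cycles[:index] + sub_stages + stage_cycles[index + 1:]


base = [2, 1, 6]   # F, D, E
results = {
    parts: pipeline_cycles(split_stage(base, 2, parts), 100)
    for parts in (1, 2, 3, 6)
}
print(results)
```

Under this model the gain stops once the 2-cycle fetch stage becomes the longest stage, so splitting execution into more than three parts does not reduce the count further.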



MODEL-4 : PIPELINE PROBLEM ASSIGNMENT-1

Problem - 1: Find the number of clock cycles required to execute 5432 instructions with the pipeline
method and without the pipeline method for the following instruction structure. Improve the
pipeline structure.

Instruction structure: F (2) D (1) E (8)
Fetch - 2 clock cycles
Decoding - 1 clock cycle
Execution - 8 clock cycles



MODEL-5 PROBLEMS



Problem-:

Consider a pipeline having 4 phases with duration 60, 50, 90 and 80 ns. Given latch delay is 10 ns.
Calculate-
1. Pipeline cycle time
2. Non-pipeline execution time
3. Speed up ratio
4. Pipeline time for 1000 instructions
5. Sequential time for 1000 instructions
6. Throughput
Solution:

Given-
o Four stage pipeline is used
o Delay of stages = 60, 50, 90 and 80 ns
o Latch delay or delay due to each register = 10 ns



[Figure: Non-pipelined architecture: stages S1 (60 ns), S2 (50 ns), S3 (90 ns) and S4 (80 ns) connected in sequence.
Pipelined architecture: the same four stages driven by a common clock, each followed by a latch (10 ns). Every stage output is saved in the latch or register.]
Note: In the pipeline, the output of each stage moves to the next stage after 100 ns
(max(60, 50, 90, 80) ns + 10 ns).



Part-01: Pipeline Cycle Time-

Cycle time = Maximum delay due to any stage + Delay due to its register (latch)
= Max { 60, 50, 90, 80 } ns + 10 ns
= 90 ns + 10 ns
= 100 ns

Part-02: Non-Pipeline Execution Time-

Non-pipeline execution time for one instruction = 60 ns + 50 ns + 90 ns + 80 ns
= 280 ns

Part-03: Speed Up Ratio-

Speed up = Non-pipeline execution time / Pipeline execution time
= 280 ns / Cycle time
= 280 ns / 100 ns
= 2.8



Part-04: Pipeline Time For 1000 Instructions-

Pipeline time for 1000 instructions
= Time taken for 1st instruction + Time taken for remaining 999 instructions
= 1 x 4 clock cycles + 999 x 1 clock cycle
= 4 x cycle time + 999 x cycle time
= 4 x 100 ns + 999 x 100 ns
= 400 ns + 99900 ns
= 100300 ns

Part-05: Sequential Time For 1000 Instructions-

Non-pipeline time for 1000 instructions
= 1000 x Time taken for one instruction
= 1000 x 280 ns
= 280000 ns

Part-06: Throughput-
Throughput for pipelined execution = Number of instructions executed per unit time
= 1000 instructions / 100300 ns
≈ 0.00997 instructions per ns (about 9.97 x 10^6 instructions per second)
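
All six parts of this problem can be reproduced with the following Python sketch. The variable names are illustrative and not from the slides. The conversion of throughput to millions of instructions per second is an added convenience, not part of the original solution.

```python
stage_delays_ns = [60, 50, 90, 80]   # delays of the four stages
latch_delay_ns = 10                  # delay of each latch (register)
n = 1000                             # number of instructions
k = len(stage_delays_ns)             # number of pipeline stages

cycle_time_ns = max(stage_delays_ns) + latch_delay_ns   # Part 1
non_pipelined_ns = sum(stage_delays_ns)                 # Part 2
speedup = non_pipelined_ns / cycle_time_ns              # Part 3
pipeline_total_ns = (k + (n - 1)) * cycle_time_ns       # Part 4
sequential_total_ns = n * non_pipelined_ns              # Part 5
throughput_per_ns = n / pipeline_total_ns               # Part 6
throughput_mips = throughput_per_ns * 1e3               # instr/ns -> million instr/s

print(cycle_time_ns, non_pipelined_ns, speedup)
print(pipeline_total_ns, sequential_total_ns)
print(round(throughput_per_ns, 5), round(throughput_mips, 2))
```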



Scalar to Superscalar pipeline



Diversified Pipeline



Loosely Coupled Multiprocessor system
It is a type of multiprocessing system in which there is distributed memory instead of shared
memory.
In a loosely coupled multiprocessor system, the data rate is lower than in a tightly coupled
multiprocessor system.
In a loosely coupled multiprocessor system, modules are connected through an MTS (message
transfer system) network.



Loosely Coupled Multiprocessor system



Tightly Coupled Multiprocessor system
It is a type of multiprocessing system in which there is shared memory.
In a tightly coupled multiprocessor system, the data rate is higher than in a loosely coupled
multiprocessor system.
In a tightly coupled multiprocessor system, modules are connected through PMIN (processor-memory
interconnection network), IOPIN (I/O-processor interconnection network) and ISIN (interrupt-signal
interconnection network) networks.



Tightly Coupled Multiprocessor system



Differences
Loosely coupled | Tightly coupled
Memory is distributed | Memory is shared
Has a low data rate | Has a high data rate
The cost of this system is lower | It is more costly
Modules are connected through a message transfer system (MTS) network | Modules are connected through PMIN, IOPIN and ISIN networks
Memory conflicts do not take place | This system has memory conflicts
It has a low degree of interaction between tasks | It has a high degree of interaction between tasks
There is a direct connection between processor and I/O devices | IOPIN provides the connection between processor and I/O devices
Applications are in distributed computing systems | Applications are in parallel processing systems



Symmetric Multiprocessor system
SMP systems have centralized shared memory called main memory (MM) operating under a
single operating system with two or more homogeneous processors.
Usually each processor has an associated private high-speed memory known as cache
memory (or cache) to speed up the main memory data access and to reduce the system bus
traffic.
Processors may be interconnected using buses, crossbar switches or on-chip mesh networks.
The bottleneck in the scalability of SMP using buses or crossbar switches is the bandwidth and
power consumption of the interconnect among the various processors, the memory, and the
disk arrays.
Mesh architectures avoid these bottlenecks and provide nearly linear scalability to much
higher processor counts, at the cost of programmability.



Symmetric Multiprocessor system
SMP systems allow any processor to work on
any task no matter where the data for that task
is located in memory, provided that each task
in the system is not in execution on two or
more processors at the same time.
With proper operating system support, SMP
systems can easily move tasks between
processors to balance the workload efficiently.



UMA
UMA stands for Uniform Memory Access; it is a shared memory architecture
for multiprocessors.
A single memory controller is used, and all the processors access it through
the interconnection network.
Each processor has equal memory access time (latency) and access
speed.
It can employ a single bus, multiple buses or a crossbar switch.
As it provides balanced shared memory access, it is also known as an SMP
(symmetric multiprocessor) system.
Uniform Memory Access is slower than Non-Uniform Memory Access.
Uniform Memory Access has limited bandwidth.
Uniform Memory Access is suitable for general-purpose applications and
time-sharing applications.



NUMA
NUMA stands for Non-Uniform Memory Access; it is a multiprocessor model
in which each processor is connected to a dedicated memory.
However, these small parts of the memory combine to form a single
address space.
Unlike UMA, the memory access time depends on how far the memory is from
the processor, which means the memory access time varies.
Any memory location can be accessed by using its physical address.
NUMA is intended to increase the available bandwidth to the memory,
for which it uses multiple memory controllers.
It combines numerous machine cores into “nodes”, where each node has a
memory controller.
To access local memory in a NUMA machine, a core retrieves the data
through the memory controller of its own node.
To access remote memory, which is handled by another node's memory
controller, the core sends the memory request through the interconnection
links.



Memories: Specification
How much?
◦ Capacity

How fast?
◦ Time is money

How expensive?



Hierarchy List



So you want fast?
It is possible to build a computer which uses only static RAM (see later)
This would be very fast
This would need no cache
◦ How can you cache cache?

This would cost a very large amount



Locality of Reference
During the course of the execution of a program, memory references tend to cluster
e.g. loops
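
The effect of a loop on memory references can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the function name `count_misses`, the address trace, the block size of 8 words, and the small fully associative store with least-recently-used replacement. None of these appear on the slide.

```python
from collections import OrderedDict


def count_misses(trace, block_size, capacity_blocks):
    """Count references whose block is not among the most recently used blocks."""
    recent = OrderedDict()   # block number -> True, ordered by recency
    misses = 0
    for address in trace:
        block = address // block_size
        if block in recent:
            recent.move_to_end(block)        # mark as most recently used
        else:
            misses += 1
            recent[block] = True
            if len(recent) > capacity_blocks:
                recent.popitem(last=False)   # drop the least recently used block
    return misses


# A loop that sweeps the same 64 addresses ten times
loop_trace = [addr for _ in range(10) for addr in range(64)]
misses = count_misses(loop_trace, block_size=8, capacity_blocks=8)
print(len(loop_trace), misses)
```

Because the loop keeps returning to the same few blocks, only the first touch of each block is a miss. This clustering is what a cache relies on.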



Cache
Small amount of fast memory
Sits between normal main memory and CPU
May be located on CPU chip or module



Cache memory
Cache memory is intended to give a memory speed approaching that of the
fastest memories available.
At the same time, it provides a large memory size at the price of less
expensive types of semiconductor memories.



Cache/Main Memory Structure



Exercise
Consider a main memory of size 8GB and a cache memory of size 64MB. Let the size of each block be
4KB. Perform fully associative mapping with suitable diagrams, assuming that the cache is full
with values from the RAM. If the CPU is trying to access memory location 4357, find out whether
there will be a cache hit or miss.
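
As a starting point for this exercise, the Python sketch below works out the address fields for fully associative mapping. It assumes byte-addressable memory and power-of-two sizes (1 KB = 2^10 bytes), neither of which the exercise states. The function name and dictionary keys are illustrative.

```python
def fully_associative_fields(main_memory_bytes, cache_bytes, block_bytes, address):
    """Split an address into tag and block offset (no index field is used)."""
    address_bits = (main_memory_bytes - 1).bit_length()
    offset_bits = (block_bytes - 1).bit_length()
    return {
        "address_bits": address_bits,
        "offset_bits": offset_bits,
        "tag_bits": address_bits - offset_bits,
        "cache_lines": cache_bytes // block_bytes,
        "tag": address >> offset_bits,            # main memory block number
        "offset": address & (block_bytes - 1),    # byte position inside the block
    }


KB, MB, GB = 2**10, 2**20, 2**30
fields = fully_associative_fields(8 * GB, 64 * MB, 4 * KB, 4357)
print(fields)
```

In fully associative mapping a block may sit in any cache line, so the access is a hit only if some line currently holds the computed tag. The exercise does not say which blocks fill the cache, so the hit or miss outcome depends on the cache contents you assume in your diagram.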



Exercise
Consider a main memory of size 32GB and a cache memory of size 16MB. Let the size of each block
be 1MB. Perform direct mapping with suitable diagrams, assuming that the cache is full with
values from the RAM. If the CPU is trying to access memory location 9893, find out whether
there will be a cache hit or miss.
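
For this exercise, the Python sketch below computes the tag, line and offset fields used in direct mapping. It assumes byte-addressable memory and power-of-two sizes (1 MB = 2^20 bytes), which the exercise does not state. The function name and dictionary keys are illustrative.

```python
def direct_mapped_fields(main_memory_bytes, cache_bytes, block_bytes, address):
    """Split an address into tag, cache line index and block offset."""
    address_bits = (main_memory_bytes - 1).bit_length()
    offset_bits = (block_bytes - 1).bit_length()
    cache_lines = cache_bytes // block_bytes
    index_bits = (cache_lines - 1).bit_length()
    block_number = address >> offset_bits          # main memory block number
    return {
        "address_bits": address_bits,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": address_bits - index_bits - offset_bits,
        "cache_lines": cache_lines,
        "line": block_number % cache_lines,        # the only line this block can use
        "tag": block_number // cache_lines,
        "offset": address & (block_bytes - 1),
    }


MB, GB = 2**20, 2**30
fields = direct_mapped_fields(32 * GB, 16 * MB, 1 * MB, 9893)
print(fields)
```

With direct mapping each block can occupy only one line, so the access is a hit exactly when the tag stored in that line equals the computed tag. Whether that holds depends on which blocks you assume are filling the cache.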



WEB Materials-1
Direct Mapping
https://www.youtube.com/watch?v=VePK5TNgQU8&t=1579s

Fully Associative Mapping
https://www.youtube.com/watch?v=3eriC-pIQKg

Set-Associative Mapping
https://www.youtube.com/watch?v=mCF5XNn_xfA
https://www.youtube.com/watch?v=j5PUJllPPVE&t=893s
https://www.youtube.com/watch?v=1J_DhymCJok



WEB Materials-2
Direct Mapping
https://www.youtube.com/watch?v=QcAaP5V2Gpc&authuser=1

Fully Associative Mapping
https://www.youtube.com/watch?v=vWxtmci1Nko&authuser=1

Set-Associative Mapping
https://www.youtube.com/watch?v=vWxtmci1Nko&authuser=1



