Computer Architecture and Organization
Objectives
Overview
• A modern computer is an electronic, digital, general-purpose
computing machine that automatically follows a step-by-step
list of instructions to solve a problem. This step-by-step list of
instructions that a computer follows is also called an algorithm
or a computer program.
• Why study computer organization and architecture?
– Design better programs, including system software such as compilers,
operating systems, and device drivers.
– Optimize program behavior.
– Evaluate (benchmark) computer system performance.
– Understand time, space, and price tradeoffs.
• Computer organization
– Encompasses all physical aspects of computer systems.
– E.g., circuit design, control signals, memory types.
– How does a computer work?
Computer architecture
• Focuses on the structure (the way in which the components are interrelated) and
behavior of the computer system, and refers to the logical aspects of system
implementation as seen by the programmer
• Computer architecture includes many elements such as
– instruction sets and formats, operation codes, data types, the number and types
of registers, addressing modes, main memory access methods, and
various I/O mechanisms.
• The architecture of a system directly affects the logical execution of
programs.
• The computer architecture for a given machine is the combination of its
hardware components plus its instruction set architecture (ISA).
• The ISA is the interface between all the software that runs on the machine
and the hardware
• Studying computer architecture helps us to answer the question: How
do I design a computer?
Overview
• In the case of the IBM, SUN and Intel ISAs, it is possible to
purchase processors which execute the same instructions
from more than one manufacturer
• All these processors may have quite different internal
organizations but they all appear identical to a programmer,
because their instruction sets are the same
• Organization & Architecture enables a family of computer
models
– Same Architecture, but with differences in Organization
– Different price and performance characteristics
• When technology changes, only organization changes
– This gives code compatibility (backwards)
Principle of Equivalence
• No clear distinction between matters related to computer
organization and matters relevant to computer architecture.
• Principle of Equivalence of Hardware and Software
– Anything that can be done with software can also be
done with hardware, and anything that can be done
with hardware can also be done with software.
Principle of Equivalence
Since hardware and software are equivalent, what is the
advantage of building digital circuits to perform
specific operations where the circuits, once created,
are frozen?
(Speed)
While computers are extremely fast, every instruction must
be fetched, decoded, and executed. If a program is instead
built directly out of circuits, then the speed of execution is
the speed at which current flows through those circuits.
Principle of Equivalence
Since hardware is so fast, why do we spend so
much time in our society with computers and
software engineering?
Flexibility
Specialized circuits are fast, but once constructed they are
frozen in place.
We have many general-purpose needs, and most of the
programs we use tend to evolve over time, requiring
replacement.
Replacing software is far cheaper and easier than having to
manufacture and install new chips
1.2 Computer Components
Measures of capacity and speed
• Kilo- (K) = 1 thousand = 10^3 or 2^10
• Mega- (M) = 1 million = 10^6 or 2^20
• Giga- (G) = 1 billion = 10^9 or 2^30
• Tera- (T) = 1 trillion = 10^12 or 2^40
• Peta- (P) = 1 quadrillion = 10^15 or 2^50
• Exa- (E) = 1 quintillion = 10^18 or 2^60
• Zetta- (Z) = 1 sextillion = 10^21 or 2^70
• Yotta- (Y) = 1 septillion = 10^24 or 2^80
Whether a metric prefix refers to a power of 10 or a power of 2
typically depends upon what is being measured.
• Hertz = clock cycles per second (frequency)
– 1MHz = 1,000,000Hz
– Processor speeds are measured in MHz or GHz.
• Byte = a unit of storage
– 1KB = 2^10 = 1,024 Bytes
– 1MB = 2^20 = 1,048,576 Bytes
– Main memory (RAM) is measured in MB
– Disk storage is measured in GB for small systems, TB for large systems.
1.3 An Example System
Measures of time and space:
• Milli- (m) = 1 thousandth = 10^-3
• Micro- (µ) = 1 millionth = 10^-6
• Nano- (n) = 1 billionth = 10^-9
• Pico- (p) = 1 trillionth = 10^-12
• Femto- (f) = 1 quadrillionth = 10^-15
• Atto- (a) = 1 quintillionth = 10^-18
• Zepto- (z) = 1 sextillionth = 10^-21
• Yocto- (y) = 1 septillionth = 10^-24
• Note that cycle time is the reciprocal of clock frequency: a bus
operating at 133 MHz has a cycle time of 7.52 nanoseconds.
1.3 An Example System
• Serial ports send data as a series of pulses along
one or two data lines.
• Parallel ports send data as a single pulse along at
least eight data lines.
• USB, Universal Serial Bus, is an intelligent serial
interface that is self-configuring. (It supports “plug
and play.”)
1st Generation Computers
– Used vacuum tubes for logic and storage (very little storage
available)
– A vacuum-tube circuit storing 1 byte
– Programmed in machine language
– Often programmed by physical connection (hardwiring)
– Slow, unreliable, expensive
– The ENIAC – often thought of as the first programmable
electronic computer – 1946
– 17468 vacuum tubes, 1800 square feet, 30 tons
2nd Generation Computers
• Transistors replaced vacuum tubes
• Magnetic core memory introduced
– Changes in technology brought about cheaper and more reliable
computers (vacuum tubes were very unreliable)
– Because these units were smaller, they were closer together providing
a speedup over vacuum tubes
– Various programming languages introduced (assembly, high-level)
– Rudimentary OS developed
• The first supercomputer was introduced, CDC 6600 ($10 million)
3rd Generation Computers
Integrated circuit (IC)
The ability to place circuits onto silicon chips
– Replaced both transistors and magnetic core memory
– Result was easily mass-produced components reducing the
cost of computer manufacturing significantly
– Also increased speed and memory capacity
– Computer families introduced
– Minicomputers introduced
– More sophisticated programming languages and OS developed
• Popular computers included the PDP-8, PDP-11, and IBM 360; Cray
produced their first supercomputer, the Cray-1
– Silicon chips now contained both logic (CPU) and
memory
– Large-scale computer usage led to time-sharing OS
4th Generation Computers
1971-Present: Microprocessors
• Miniaturization took over
– From SSI (10-100 components per chip) to
– MSI (100-1000), LSI (1,000-10,000), VLSI (10,000+)
• Thousands of ICs were built onto a single silicon chip (VLSI),
which allowed Intel, in 1971, to
– create the world’s first microprocessor, the 4004, which was a fully
functional, 4-bit system that ran at 108KHz.
– Intel also introduced the RAM chip, accommodating 4Kb of
memory on a single chip. This allowed computers of the 4th
generation to become smaller and faster than their solid-
state predecessors
– Computers also saw the development of GUIs, the mouse
and handheld devices
Moore’s Law
• How small can we make transistors?
• How densely can we pack chips?
• No one can say for sure
• In 1965, Intel founder Gordon Moore stated,
“The density of transistors in an integrated circuit will
double every year.”
• The current version of this prediction is usually conveyed as “the
density of silicon chips doubles every 18 months”
• Using current technology, Moore’s Law cannot hold forever
• There are physical and financial limitations
• At the current rate of miniaturization, it would take about 500
years to put the entire solar system on a chip
• Cost may be the ultimate constraint
Rock’s Law
• Rock’s Law, due to Arthur Rock, is a corollary to Moore’s law:
“The cost of capital equipment to build semiconductors will
double every four years”
• Rock’s Law arises from the observations of a financier who has seen
the price tag of new chip facilities escalate from about $12,000 in
1968 to $12 million in the late 1990s.
• At this rate, by the year 2035, not only will the size of a memory
element be smaller than an atom, but it would also require the entire
wealth of the world to build a single chip!
• So even if we continue to make chips smaller and faster, the ultimate
question may be whether we can afford to build them
The Computer Level Hierarchy
• Through the principle of abstraction, we can imagine the machine to
be built from a hierarchy of levels, in which each level has a specific
function and exists as a distinct hypothetical machine
• Abstraction is the ability to focus on important aspects of a
situation at a higher level while ignoring the underlying complex
details
• We call the hypothetical computer at each level a virtual machine.
• Each level’s virtual machine executes its own particular set of
instructions, calling upon machines at lower levels to carry out the
tasks when necessary
1.6 The Computer Level Hierarchy
Level 6: The User Level
• Composed of applications and is the level with which everyone is
most familiar.
• At this level, we run programs such as word processors, graphics
packages, or games. The lower levels are nearly invisible from the
User Level.
Level 5: High-Level Language Level
– The level with which we interact when we write
programs in languages such as C, Pascal, Lisp, and
Java
– These languages must be translated to a language the
machine can understand. (using compiler / interpreter)
– Compiled languages are translated into assembly
language and then assembled into machine code. (They
are translated to the next lower level.)
– The user at this level sees very little of the lower levels
Level 4: Assembly Language Level
– Acts upon assembly language produced from Level 5,
as well as instructions programmed directly at this level
– As previously mentioned, compiled higher-level
languages are first translated to assembly, which is then
directly translated to machine language. This is a one-to-
one translation, meaning that one assembly language
instruction is translated to exactly one machine language
instruction.
– By having separate levels, we reduce the semantic gap
between a high-level language and the actual machine
language
Level 3: System Software Level
Level 2: Machine Level
– Consists of instructions (the ISA) that are particular to
the architecture of the machine
– Programs written in machine language need no
compilers, interpreters, or assemblers
The Von Neumann Architecture
Named after John von Neumann of Princeton, who designed
a computer architecture whereby data and instructions
are retrieved from memory, operated on by an
ALU, and moved back to memory (or I/O)
This architecture is the basis for most modern computers
(only parallel processors and a few other unique
architectures use a different model)
Hardware consists of 3 units
CPU (control unit, ALU, registers)
Memory (stores programs and data)
I/O System (including secondary storage)
Instructions in memory are executed sequentially unless a
program instruction explicitly changes the order
Von Neumann Architectures
• There is a single pathway used to move both data
and instructions between memory, I/O and CPU
– the pathway is implemented as a bus
– the single pathway creates a bottleneck
• known as the von Neumann bottleneck
Fetch-execute cycle
• The von Neumann architecture operates on the
fetch-execute cycle
– Fetch an instruction from memory as indicated by the
Program Counter register
– Decode the instruction in the control unit
– Data operands needed for the instruction are fetched
from memory
– Execute the instruction in the ALU storing the result in
a register
– Move the result back to memory if needed
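To make the cycle concrete, here is a tiny C sketch of a simulator for a hypothetical accumulator machine; the opcode set, instruction encoding, and memory layout are invented for illustration and are not part of the slides:

#include <stdio.h>

/* A hypothetical 2-field instruction: opcode and operand address. */
enum { HALT = 0, LOAD = 1, ADD = 2, STORE = 3 };

int memory[32];   /* unified memory holding instructions and data (von Neumann) */
int PC = 0;       /* program counter */
int ACC = 0;      /* accumulator used by the "ALU" */

void run(void) {
    for (;;) {
        int instr   = memory[PC++];        /* fetch: use PC, then advance it */
        int opcode  = instr / 100;         /* decode: split into opcode ...  */
        int address = instr % 100;         /* ... and operand address        */
        switch (opcode) {                  /* execute                        */
            case LOAD:  ACC = memory[address];        break;
            case ADD:   ACC = ACC + memory[address];  break;  /* ALU operation     */
            case STORE: memory[address] = ACC;        break;  /* result to memory  */
            case HALT:  return;
        }
    }
}

int main(void) {
    memory[0] = 110;  /* LOAD  from address 10 */
    memory[1] = 211;  /* ADD   from address 11 */
    memory[2] = 312;  /* STORE to   address 12 */
    memory[3] = 0;    /* HALT                  */
    memory[10] = 7; memory[11] = 35;
    run();
    printf("%d\n", memory[12]);   /* prints 42 */
    return 0;
}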
1.7 The von Neumann Model
• This is a general depiction of a von Neumann system.
• These computers employ a fetch-decode-execute cycle to run programs as follows . . .
The von Neumann Model
• The control unit fetches the next instruction from memory using
the program counter to determine where the instruction is located
The von Neumann Model
• The instruction is decoded into a language that the ALU
can understand.
The von Neumann Model
• Any data operands required to execute the instruction
are fetched from memory and placed into registers
within the CPU.
The von Neumann Model
• The ALU executes the instruction and places results in
registers or memory.
Non-von Neumann Models
• Conventional stored-program computers have
undergone many incremental improvements over the
years
– specialized buses
– floating-point units
– cache memories
• But enormous improvements in computational power
require departure from the classic von Neumann
architecture
– Adding processors is one approach
Non-von Neumann Models
• In the late 1960s, high-performance computer systems were equipped with dual
processors to increase computational throughput.
• In the 1970s supercomputer systems were introduced with 32 processors.
• Supercomputers with 1,000 processors were built in the 1980s.
• In 1999, IBM announced its Blue Gene system containing over 1 million processors.
Computer Performance Measures
Program Execution Time
For a specific program compiled to run on a specific machine “A”, the following
parameters are provided:
– The total instruction count of the program.
– The average number of cycles per instruction (average CPI).
– Clock cycle of machine “A”
How can one measure the performance of this machine running this program?
– The machine is said to be faster or has better performance running this program if the
total execution time is shorter.
– Thus the inverse of the total measured program execution time is a possible
performance measure or metric:
• CPU execution time is the product of the above three parameters as follows:
CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)
T = I x CPI x C
Example
• A Program is running on a specific machine with the following
parameters:
– Total executed instruction count: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz.
What is the execution time for this program?
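Worked answer: the clock cycle time is 1 / 200 MHz = 5 ns, so
T = I x CPI x C = 10,000,000 x 2.5 x 5 ns = 0.125 seconds (125 ms).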
Example
• From the previous example: A Program is running on a specific machine
with the following parameters:
– Total executed instruction count, I: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz.
• Using the same program with these changes:
– A new compiler used: New instruction count 9,500,000
New CPI: 3.0
– Faster CPU implementation: New clock rate = 300 MHZ
• What is the speedup with the changes?
– Speedup = Old Execution Time / New Execution Time
          = (I_old x CPI_old x Clock cycle_old) / (I_new x CPI_new x Clock cycle_new)
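Worked answer: old execution time = 10,000,000 x 2.5 x 5 ns = 0.125 s; new execution time = 9,500,000 x 3.0 x 3.33 ns (1 / 300 MHz) = 0.095 s.
Speedup = 0.125 / 0.095 ≈ 1.32, so the new compiler and faster CPU together make the program about 1.3 times faster.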
CENTRAL PROCESSING UNIT
• Introduction
• General Register Organization
• Stack Organization
• Instruction Formats
• Addressing Modes
• Data Transfer and Manipulation
• Program Control
• Reduced Instruction Set Computer (RISC)
MAJOR COMPONENTS OF CPU
• Storage Components:
Registers
Flip-flops
• Execution (Processing)
Components:
• Arithmetic Logic Unit (ALU): Arithmetic
calculations, Logical computations,
Shifts/Rotates
• Transfer Components: Bus
• Control Components: Control Unit
GENERAL REGISTER ORGANIZATION
(Block diagram: seven registers R1 through R7 share a common clock input and individual load lines (7 lines); two multiplexers, selected by SELA and SELB, place the chosen register contents on the A bus and B bus feeding the ALU; a 3x8 decoder driven by SELD selects the destination register for the output bus; OPR selects the ALU operation; the ALU output is routed back to the register file.)
OPERATION OF CONTROL UNIT
The control unit directs the information flow through ALU by:
- Selecting various Components in the system
- Selecting the Function of ALU
Example: R1 <- R2 + R3
[1] MUX A selector (SELA): BUS A <- R2
[2] MUX B selector (SELB): BUS B <- R3
[3] ALU operation selector (OPR): ALU to ADD
[4] Decoder destination selector (SELD): R1 <- Out Bus
Control Word (field widths in bits: 3, 3, 3, 5):
SELA | SELB | SELD | OPR
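Filling in the fields for this example (assuming the 3x8 decoder encodes registers R1 through R7 as 001 through 111, with 000 reserved for external input, and taking the ADD encoding 00010 from the table below), the control word for R1 <- R2 + R3 would be:
SELA = 010 (R2), SELB = 011 (R3), SELD = 001 (R1), OPR = 00010 (ADD)
Control word = 010 011 001 00010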
ALU CONTROL
Encoding of ALU operations (OPR field):
OPR Select   Operation        Symbol
00000        Transfer A       TSFA
00001        Increment A      INCA
00010        Add A + B        ADD
00101        Subtract A - B   SUB
00110        Decrement A      DECA
01000        AND of A and B   AND
01010        OR of A and B    OR
01100        XOR of A and B   XOR
01110        Complement A     COMA
10000        Shift right A    SHRA
11000        Shift left A     SHLA
Register stack (figure): a small stack of registers holding items A, B, C at addresses 1, 2, 3; the stack pointer SP points to the top item C at address 3, and address 4 is the next free location; FULL and EMPTY flip-flops mark the stack state, and DR is the data register used for Push and Pop operations.
/* Initially, SP = 0, EMPTY = 1, FULL = 0 */
PUSH:
  SP <- SP + 1
  M[SP] <- DR
  If (SP = 0) then (FULL <- 1)
  EMPTY <- 0
POP:
  DR <- M[SP]
  SP <- SP - 1
  If (SP = 0) then (EMPTY <- 1)
  FULL <- 0
MEMORY STACK ORGANIZATION
Figure: Memory with Program, Data, and Stack Segments. The PC points to the program segment (instructions) starting at address 1000; AR points to the data segment (operands); SP points to the top of the stack at address 3000; the stack region extends through addresses 3997, 3998, 3999, 4000, 4001; DR holds the word being pushed or popped.
- A portion of memory is used as a stack, with a processor register as the stack pointer
- PUSH: SP <- SP - 1
        M[SP] <- DR
- POP:  DR <- M[SP]
        SP <- SP + 1
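A minimal C sketch of these PUSH and POP micro-operations for a stack that grows toward lower addresses; the memory size, starting SP value, and function names are illustrative, and overflow/underflow checks are omitted:

#include <stdio.h>

#define MEM_SIZE 4096

static int M[MEM_SIZE];   /* simulated memory */
static int SP = 4001;     /* stack pointer: stack grows toward lower addresses */

/* PUSH: SP <- SP - 1, then M[SP] <- DR */
void push(int DR) {
    SP = SP - 1;
    M[SP] = DR;
}

/* POP: DR <- M[SP], then SP <- SP + 1 */
int pop(void) {
    int DR = M[SP];
    SP = SP + 1;
    return DR;
}

int main(void) {
    push(5);
    push(7);
    printf("%d %d\n", pop(), pop());   /* prints 7 5 */
    return 0;
}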
REVERSE POLISH NOTATION
Arithmetic Expressions: A + B
A+B Infix notation
+AB Prefix or Polish notation
AB+ Postfix or reverse Polish notation
- The reverse Polish notation is very suitable for stack
manipulation
Example: (3 * 4) + (5 * 6)  =>  3 4 * 5 6 * +
Stack contents while scanning the postfix string (top of stack shown last):
  after 3 :  3
  after 4 :  3 4
  after * :  12
  after 5 :  12 5
  after 6 :  12 5 6
  after * :  12 30
  after + :  42
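A short C sketch of a stack evaluator for single-digit postfix strings such as "34*56*+" (the function name and digit-only operand handling are illustrative assumptions):

#include <stdio.h>
#include <ctype.h>

/* Evaluate a postfix (RPN) string of single-digit operands and + - * / operators. */
int eval_rpn(const char *expr) {
    int stack[64];
    int top = -1;                      /* stack grows upward; top = index of top item */
    for (const char *p = expr; *p; p++) {
        if (isdigit((unsigned char)*p)) {
            stack[++top] = *p - '0';   /* push operand */
        } else {
            int b = stack[top--];      /* pop right operand */
            int a = stack[top--];      /* pop left operand  */
            switch (*p) {
                case '+': stack[++top] = a + b; break;
                case '-': stack[++top] = a - b; break;
                case '*': stack[++top] = a * b; break;
                case '/': stack[++top] = a / b; break;
            }
        }
    }
    return stack[top];                 /* final result is on top of the stack */
}

int main(void) {
    printf("%d\n", eval_rpn("34*56*+"));   /* prints 42 */
    return 0;
}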
INSTRUCTION FORMAT
Instruction Fields
OP-code field - specifies the operation to be performed
Address field - designates memory address(es) or processor register(s)
Mode field - specifies the way the operand or the effective address is
determined
Three-Address Instructions:
Program to evaluate X = (A + B) * (C + D) :
ADD R1, A, B    /* R1 <- M[A] + M[B] */
ADD R2, C, D    /* R2 <- M[C] + M[D] */
MUL X, R1, R2   /* M[X] <- R1 * R2   */
Two-Address Instructions:
Program to evaluate X = (A + B) * (C + D) :
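A typical two-address sequence for the same expression (one possible version, using the MOV transfer instruction listed later in these notes) is:
MOV R1, A       /* R1 <- M[A]      */
ADD R1, B       /* R1 <- R1 + M[B] */
MOV R2, C       /* R2 <- M[C]      */
ADD R2, D       /* R2 <- R2 + M[D] */
MUL R1, R2      /* R1 <- R1 * R2   */
MOV X, R1       /* M[X] <- R1      */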
TYPES OF ADDRESSING MODES
Implied Mode
The address of the operand is specified implicitly
in the definition of the instruction
- No need to specify an address in the instruction
- EA = AC, or EA = Stack[SP] (EA: Effective Address)
Immediate Mode
Instead of specifying the address of the operand,
the operand itself is specified in the instruction
- No need to specify an address in the instruction
- However, the operand itself needs to be specified
- May require more bits than an address
- Fast to acquire an operand
Register Mode
The address specified in the instruction is a register address
- The designated operand needs to be in a register
- Shorter address than a memory address
- Saves address bits in the instruction
- Faster to acquire an operand than with memory addressing
- EA = IR(R) (IR(R): Register field of IR)
TYPES OF ADDRESSING MODES
Register Indirect Mode
Instruction specifies a register which contains
the memory address of the operand
- Saving instruction bits since register address
is shorter than the memory address
- Slower to acquire an operand than both
register addressing and memory addressing
- EA = [IR(R)] ([x]: Content of x)
TYPES OF ADDRESSING MODES
Direct Address Mode
The instruction specifies a memory address which
can be used directly to access physical memory
- Faster than the other memory addressing modes
- Too many bits are needed to specify the address
for a large physical memory space
- EA = IR(address), (IR(address): address field of IR)
TYPES OF ADDRESSING MODES
Relative Addressing Modes
The address field of an instruction specifies part of the address
(an abbreviated address) which is used along with a
designated register to calculate the address of the operand
ADDRESSING MODES - EXAMPLES
Memory snapshot for the example (figure):
  Address 200: Load to AC (mode field)     PC = 200
  Address 201: Address = 500
  Address 202: Next instruction
  Address 399: 450                         R1 = 400
  Address 400: 700                         XR = 100
  Address 500: 800                         AC = ?
  Address 600: 900
  Address 702: 325
  Address 800: 300

Addressing Mode      Effective Address   Content of AC
Direct address       500                 800    /* AC <- (500)      */
Immediate operand    -                   500    /* AC <- 500        */
Indirect address     800                 300    /* AC <- ((500))    */
Relative address     702                 325    /* AC <- (PC + 500) */
Indexed address      600                 900    /* AC <- (XR + 500) */
Register             -                   400    /* AC <- R1         */
Register indirect    400                 700    /* AC <- (R1)       */
Autoincrement        400                 700    /* AC <- (R1)+      */
Autodecrement        399                 450    /* AC <- -(R1)      */
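A compact C sketch showing how a few of these modes compute the effective address and load AC, using the register and memory values from the example above (the enum names and simplified memory array are illustrative):

#include <stdio.h>

enum mode { DIRECT, IMMEDIATE, INDIRECT, RELATIVE, INDEXED, REG_INDIRECT };

int M[1024];                 /* simulated memory */
int PC = 202;                /* PC after fetching the two-word instruction at 200-201 */
int XR = 100, R1 = 400, AC;

/* Load AC using the given addressing mode; addr is the instruction's address field (500). */
void load_ac(enum mode m, int addr) {
    switch (m) {
        case DIRECT:       AC = M[addr];      break;  /* AC <- M[500]       */
        case IMMEDIATE:    AC = addr;         break;  /* AC <- 500          */
        case INDIRECT:     AC = M[M[addr]];   break;  /* AC <- M[M[500]]    */
        case RELATIVE:     AC = M[PC + addr]; break;  /* AC <- M[202 + 500] */
        case INDEXED:      AC = M[XR + addr]; break;  /* AC <- M[100 + 500] */
        case REG_INDIRECT: AC = M[R1];        break;  /* AC <- M[400]       */
    }
}

int main(void) {
    M[400] = 700; M[500] = 800; M[600] = 900; M[702] = 325; M[800] = 300;
    load_ac(DIRECT, 500);   printf("Direct:   %d\n", AC);  /* 800 */
    load_ac(INDIRECT, 500); printf("Indirect: %d\n", AC);  /* 300 */
    load_ac(RELATIVE, 500); printf("Relative: %d\n", AC);  /* 325 */
    return 0;
}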
DATA TRANSFER INSTRUCTIONS
Typical Data Transfer Instructions
Name Mnemonic
Load LD
Store ST
Move MOV
Exchange XCH
Input IN
Output OUT
Push PUSH
Pop POP
Data Transfer Instructions with Different Addressing Modes
Mode                 Assembly Convention   Register Transfer
Direct address       LD ADR                AC <- M[ADR]
Indirect address     LD @ADR               AC <- M[M[ADR]]
Relative address     LD $ADR               AC <- M[PC + ADR]
Immediate operand    LD #NBR               AC <- NBR
Index addressing     LD ADR(X)             AC <- M[ADR + XR]
Register             LD R1                 AC <- R1
Register indirect    LD (R1)               AC <- M[R1]
Autoincrement        LD (R1)+              AC <- M[R1], R1 <- R1 + 1
Autodecrement        LD -(R1)              R1 <- R1 - 1, AC <- M[R1]
DATA MANIPULATION INSTRUCTIONS
Three Basic Types: Arithmetic instructions
Logical and bit manipulation instructions
Shift instructions
Arithmetic Instructions
Name Mnemonic
Increment INC
Decrement DEC
Add ADD
Subtract SUB
Multiply MUL
Divide DIV
Add with Carry ADDC
Subtract with Borrow SUBB
Negate(2’s Complement) NEG
SUBROUTINE CALL AND RETURN
SUBROUTINE CALL Call subroutine
Jump to subroutine
Branch to subroutine
Branch and save return address
Two most important operations are implied: branching to the beginning of the subroutine and saving the return address so that control can return to the calling program
PROGRAM INTERRUPT
Types of Interrupts:
External interrupts
External interrupts are initiated from outside the CPU and memory
- I/O Device -> Data transfer request or Data transfer complete
- Timing Device -> Timeout
- Power Failure
Software Interrupts
External and internal interrupts are initiated by the computer hardware,
whereas software interrupts are initiated by executing an instruction.
- Supervisor Call -> Switching from user mode to supervisor mode
-> Allows execution of a certain class of operations
which are not allowed in user mode
INTERRUPT PROCEDURE
Interrupt Procedure and Subroutine Call
- The interrupt is usually initiated by an internal or
an external signal rather than from the execution of
an instruction (except for the software interrupt)
RISC: REDUCED INSTRUCTION SET
COMPUTERS
Historical Background
IBM System/360, 1964
- The real beginning of modern computer architecture
- Distinction between Architecture and Implementation
- Architecture: The abstract structure of a computer
seen by an assembly-language programmer
(Diagram: a High-Level Language program is translated by the compiler into the Instruction Set (the Architecture), which the Hardware then realizes as the Implementation.)
PHILOSOPHY OF RISC
Reduce the semantic gap between
machine instruction and microinstruction
1-Cycle instruction
Most of the instructions complete their execution
in 1 CPU clock cycle - like a microoperation
* Functions of the instruction (contrast to CISC)
- Very simple functions
- Very simple instruction format
- Similar to microinstructions
=> No need for microprogrammed control
* Register-Register Instructions
- Avoid memory reference instructions except
Load and Store instructions
- Most of the operands can be found in the
registers instead of main memory
=> Shorter instructions
=> Uniform instruction cycle
=> Requirement of large number of registers
* Employ instruction pipeline
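For example, evaluating X = (A + B) * (C + D) on such a register-register (load/store) machine keeps all arithmetic in registers; a hypothetical RISC-style sequence (mnemonics and register numbering are illustrative, not from the slides):
LOAD  R1, A        /* R1 <- M[A]    */
LOAD  R2, B        /* R2 <- M[B]    */
ADD   R3, R1, R2   /* R3 <- R1 + R2 */
LOAD  R1, C        /* R1 <- M[C]    */
LOAD  R2, D        /* R2 <- M[D]    */
ADD   R4, R1, R2   /* R4 <- R1 + R2 */
MUL   R5, R3, R4   /* R5 <- R3 * R4 */
STORE X, R5        /* M[X] <- R5    */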
CHARACTERISTICS OF RISC
Common RISC Characteristics
- Operations are register-to-register, with only LOAD and
STORE accessing memory
- The operations and addressing modes are reduced
- Instruction formats are simple
CHARACTERISTICS OF RISC
RISC Characteristics
- Relatively few instructions
- Relatively few addressing modes
- Memory access limited to load and store instructions
- All operations done within the registers of the CPU
- Fixed-length, easily decoded instruction format
- Single-cycle instruction execution
- Hardwired rather than microprogrammed control
Memory Organization
Outline
Memory Hierarchy
Cache
Cache performance
Memory Hierarchy
The memory unit is an essential component
in any digital computer since it is needed for
storing programs and data
Not all accumulated information is needed by
the CPU at the same time
Therefore, it is more economical to use low-cost
storage devices to serve as a backup for storing
the information that is not currently used by the CPU
Memory Hierarchy
Since 1980, CPU speed has outpaced DRAM speed
Input-Output Organization
Contents
● Direct Memory Access
● Strobe Control
● Handshaking
● Priority Interrupt
● Input-Output Processor
● Serial Communication
● I/O Controllers
I/O Interface
Need for I/O Interface
● Input–output interface provides a method for transferring information between internal storage and external
I/O devices.
● Peripherals connected to a computer need special communication links for interfacing them with the CPU
and each peripheral.
● The purpose of the communication link is to resolve the differences that exist between the central computer
and each peripheral.
● The major differences are:
1. Peripherals are electromechanical and electromagnetic devices and their manner of operation is
different from the operation of the CPU and memory, which are electronic devices. Therefore, a
conversion of signal values may be required.
2. The data transfer rate of peripherals is usually slower than the transfer rate of the CPU, and
consequently, a synchronization mechanism may be needed.
3. Data codes and formats in peripherals differ from the word format in the CPU and memory.
4. The operating modes of peripherals are different from each other and each must be controlled so as
not to disturb the operation of other peripherals connected to the CPU.
I/O Interface
● To resolve these differences, computer systems include special hardware components between the CPU and
peripherals to supervise and synchronize all input and output transfers.
● These components are called interface units because they interface between the processor bus and the
peripheral device.
● To attach an input–output device to the CPU, an input–output interface circuit is placed between the device and
the system bus.
● The main functions of an input–output interface circuit are data conversion, synchronization, and device
selection.
I/O Bus and Interface Modules
● The I/O bus consists of data lines, address lines, and control lines.
● Each peripheral device has associated with it an interface unit.
● Each interface decodes the address and control received from the I/O bus, interprets them for the peripheral,
and provides signals for the peripheral controller.
● It also synchronizes the data flow and supervises the transfer between peripheral and processor.
I/O Bus and Interface Modules
● Each peripheral has its own controller that operates the particular electromechanical device.
● To communicate with a particular device, the processor places a device address on the address lines.
● Each interface attached to the I/O bus contains an address decoder that monitors the address lines.
● When the interface detects its own address, it activates the path between the bus lines and the device that it
controls.
● All other peripherals are disabled by their interface.
I/O Command
● At the same time that the address is made available in the address lines, the processor provides a function
code in the control lines.
● The function code (I/O command) is an instruction that is executed in the interface and its attached
peripheral unit.
● There are four types of commands that an interface may receive.
1. A control command is issued to activate the peripheral and to inform it what to do.
2. A status command is used to test various status conditions in the interface and the peripheral.
3. A data output command causes the interface to respond by transferring data from the bus into one of
its registers.
4. The data input command causes the interface to receive an item of data from the peripheral and place it
in its buffer register.
● The processor checks if data are available by means of a status command and then issues a data input
command.
● The interface places the data on the data lines, where they are accepted by the processor.
I/O versus Memory Bus
● In addition to communicating with I/O, the processor must communicate with the memory unit.
● Like the I/O bus, the memory bus contains data, address, and read/write control lines.
● There are three ways that computer buses can be used to communicate with memory and I/O:
1. Use two separate buses, one for memory and the other for I/O. (IOP)
2. Use one common bus for both memory and I/O but have separate control lines for each.(Isolated I/O)
3. Use one common bus for memory and I/O with common control lines.(Memory Mapped I/O)
Isolated I/O
● Isolated I/O uses one common bus to transfer information between memory or I/O and the CPU.
● The distinction between a memory transfer and I/O transfer is made through separate read and write lines.
● The I/O read and I/O write control lines are enabled during an I/O transfer. The memory read and memory
write control lines are enabled during a memory transfer.
● This configuration isolates all I/O interface addresses from the addresses assigned to memory and is referred
to as the isolated I/O method for assigning addresses in a common bus.
● The isolated I/O method isolates memory and I/O addresses so that memory address values are not affected
by interface address assignment since each has its own address space
Memory Mapped I/O
● Memory mapped I/O use the same address space for both memory and I/O.
● It employs only one set of read and write signals and does not distinguish between memory and I/O addresses.
● The computer treats an interface register as being part of the memory system.
● The assigned addresses for interface registers cannot be used for memory words, which reduces the memory
address range available.
● There are no specific input or output instructions.
● The CPU can manipulate I/O data residing in interface registers with the same instructions that are used to
manipulate memory words.
● The advantage is that the load and store instructions used for reading and writing from memory can be used
to input and output data from I/O registers.
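A hedged C sketch of what this looks like in practice, assuming a hypothetical memory-mapped UART whose status and transmit registers sit at invented addresses (nothing here is from the slides except the idea that plain loads and stores reach the device):

#include <stdint.h>

/* Hypothetical device register addresses in the memory-mapped region. */
#define UART_STATUS  ((volatile uint32_t *)0x10000000u)
#define UART_TXDATA  ((volatile uint32_t *)0x10000004u)
#define TX_READY     0x1u

/* Write one character using plain loads and stores; no special I/O instructions needed. */
static void uart_putc(char c) {
    while ((*UART_STATUS & TX_READY) == 0) {
        /* wait until the device reports it can accept data */
    }
    *UART_TXDATA = (uint32_t)c;   /* store to the interface register, just like a memory word */
}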
Modes Of Transfer
Modes Of Transfer
● Data transfer between the central computer and I/O devices may be handled in a variety of modes.
● Data transfer to and from peripherals may be handled in one of three possible modes:
1. Programmed I/O
2. Interrupt-initiated I/O
3. Direct memory access (DMA)
Programmed I/O
● Programmed I/O operations are the result of I/O instructions
written in the computer program.
● Each data item transfer is initiated by an instruction in the program.
● Transferring data under program control requires constant
monitoring of the peripheral by the CPU.
● Once a data transfer is initiated, the CPU is required to monitor the
interface to see when a transfer can again be made.
● Disadvantage : the CPU stays in a program loop until the I/O unit
indicates that it is ready for data transfer. This is a time-consuming
process since it keeps the processor busy needlessly.
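A minimal C sketch of programmed I/O, assuming a hypothetical device with a flag register and a data register at invented addresses; note that the CPU does nothing but spin on the flag between transfers:

#include <stdint.h>
#include <stddef.h>

#define DEV_FLAG  ((volatile uint32_t *)0x20000000u)  /* 1 when a data word is ready */
#define DEV_DATA  ((volatile uint32_t *)0x20000004u)  /* data register of the device */

/* Read 'count' words from the device into 'buf' under program control. */
void programmed_io_read(uint32_t *buf, size_t count) {
    for (size_t i = 0; i < count; i++) {
        while (*DEV_FLAG == 0) {
            /* CPU is kept busy in this loop until the device is ready */
        }
        buf[i] = *DEV_DATA;   /* transfer one data item per loop iteration */
    }
}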
Interrupt-Initiated I/O
● An alternative to the CPU constantly monitoring the flag is to let the interface inform the computer when it is
ready to transfer data.
● This mode of transfer uses the interrupt facility.
● While the CPU is running a program, it does not check the flag.
● However, when the flag is set, the computer is momentarily interrupted from proceeding with the current
program and is informed of the fact that the flag has been set.
● The CPU deviates from what it is doing to take care of the input or output transfer.
● After the transfer is completed, the computer returns to the previous program to continue what it was doing
before the interrupt.
● The CPU responds to the interrupt signal by storing the return address from the program counter into a
memory stack and then control branches to a service routine that processes the required I/O transfer.
● There are two methods for getting the branch address: one is called vectored interrupt and the other nonvectored interrupt.
● In a nonvectored interrupt, the branch address is assigned to a fixed location in memory.
● In a vectored interrupt, the interrupt source supplies the branch information to the computer. This
information is called the interrupt vector.
Direct Memory Access
Direct Memory Access
● In DMA, the CPU is idle and the peripheral device manages the memory buses directly.
● DMA controller takes over the buses to manage the transfer directly between the I/O device and memory.
● The DMA transfer can be made in several ways.
● Burst Transfer - A block of memory words is transferred in a continuous burst at a time. Used for fast devices
such as magnetic disks, where data transmission cannot be stopped or slowed down until an entire block is
transferred.
● Cycle Stealing - Allows the DMA controller to transfer one data word at a time, after which it must return
control of the buses to the CPU. The CPU merely delays its operation for one memory cycle to allow the
direct memory I/O transfer to “steal” one memory cycle.
Working of DMA
● The Bus Request (BR) input is used by the DMA controller to request the CPU to relinquish control of the
buses.
● When this input is active, the CPU terminates the execution of the current instruction and places the address
bus, the data bus, and the read and write lines into a high-impedance state.(like an open circuit)
● The CPU activates the Bus Grant (BG) output to inform the external DMA that the buses are in the
high-impedance state.(available)
● The DMA takes control of the buses to conduct memory transfers without processor intervention.
● When the DMA terminates the transfer, it disables the bus request line and the CPU disables the bus grant,
takes control of the buses, and returns to its normal operation.
DMA Controller
● The registers in the DMA are selected by the CPU
through the address bus by enabling the DMA Select
(DS) and Register Select (RS) inputs.
● When BG (bus grant) = 0, the CPU can
communicate with the DMA registers.
● When BG = 1, the DMA can communicate directly with the
memory.
● The DMA controller has three registers.
● The Address register contains an address to specify the
desired location in memory.
● The Word Count Register holds the number of words
to be transferred.
● The Control Register specifies the mode of transfer.
● The DMA communicates with the external peripheral
through the request and acknowledge lines.
DMA Transfer
➜ I/O Device sends a DMA request.
➜ DMA Controller activates the BR line.
➜ CPU responds with BG line.
➜ The DMA puts the current value of its address
register into the address bus, initiates the RD
or WR signal, and sends a DMA acknowledge
to the I/O device.
➜ I/O device puts a word in the data bus (for
write) or receives a word from the data bus
(for read).
➜ The peripheral unit can then communicate
with memory through the data bus for direct
transfer between the two units while the CPU
is momentarily disabled.
DMA Transfer
➜ For each word that is transferred, the DMA
increments its address register and
decrements its word count register.
➜ If the word count has not reached zero, the
DMA checks the request line coming from the
I/O device.
➜ If there is no request, the DMA disables BR so
that the CPU continues to execute its own
program.
➜ When the I/O device requests another transfer,
the DMA requests the buses again.
➜ If the word count register reaches zero, the
DMA stops any further transfer and removes
its bus request. It also informs the CPU of the
termination by an interrupt.
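A rough C sketch of the bookkeeping described above; the register names follow the slides, while the memory array, device data, and function names are illustrative simplifications:

#include <stdio.h>
#include <stdbool.h>

/* DMA controller registers (as named on the slides). */
static unsigned address_reg;     /* current memory address        */
static unsigned word_count_reg;  /* words remaining to transfer   */

static int memory[1024];

/* Transfer one word from the device to memory, then update the registers.
   Returns true when the whole block has been transferred. */
bool dma_transfer_word(int word_from_device) {
    memory[address_reg] = word_from_device;  /* word placed into memory over the bus */
    address_reg++;                           /* DMA increments its address register  */
    word_count_reg--;                        /* ...and decrements its word count     */
    if (word_count_reg == 0) {
        return true;                         /* remove bus request, interrupt the CPU */
    }
    return false;                            /* otherwise return the bus (cycle stealing) */
}

int main(void) {
    address_reg = 100;
    word_count_reg = 3;
    int device_data[3] = {11, 22, 33};
    for (int i = 0; i < 3; i++) {
        if (dma_transfer_word(device_data[i]))
            printf("block done, interrupt CPU\n");
    }
    printf("memory[100..102] = %d %d %d\n", memory[100], memory[101], memory[102]);
    return 0;
}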
Data Transfer
Synchronous v/s Asynchronous Data Transfer
Strobe Control and Handshaking
● Asynchronous data transfer between two independent units requires that control signals be transmitted
between the communicating units to indicate the time at which data is being transmitted.
● One way of achieving this is by means of a strobe pulse supplied by one of the units to indicate to the other
unit when the transfer has to occur.
● Another method commonly used is to accompany each data item being transferred with a control signal that
indicates the presence of data in the bus.
● The unit receiving the data item responds with another control signal to acknowledge receipt of the data.
● This type of agreement between two independent units is referred to as handshaking.
● The strobe pulse method and the handshaking method of asynchronous data transfer are not restricted to
I/O transfers.
Strobe Control
● It employs a single control line to time each transfer.
● The strobe may be activated by either the source or the destination unit.
● The data bus carries the binary information from source to the destination.
● The strobe is a single line that informs the destination unit when a valid
data word is available in the bus.
Source-initiated strobe:
● The source unit first places the data on the data bus.
● After a brief delay to ensure that the data settle to a steady value, the
source activates the strobe pulse.
● The information on the data bus and the strobe signal remain in the active state for a sufficient time period
to allow the destination unit to receive the data.
● The source removes the data from the bus a brief period after it disables its strobe pulse.
Destination-initiated strobe:
● The destination unit activates the strobe pulse, informing the source to provide the data.
● The source unit responds by placing the requested binary information on the data bus.
● The data must be valid and remain on the bus long enough for the destination unit to accept it.
● The destination unit then disables the strobe.
● The source removes the data from the bus after a predetermined time interval.
Hand Shaking
The disadvantage of the strobe method
● The source unit that initiates the transfer has no way of knowing whether the destination unit has actually
received the data item that was placed in the bus.
● A destination unit that initiates the transfer has no way of knowing whether the source unit has actually
placed the data on the bus.
Two-wire control
● The handshake method introduced a second control signal that provides a reply to the unit that initiates the
transfer.
● One control line is in the same direction as the data flow in the bus from the source to the destination.
● It is used by the source unit to inform the destination unit whether there are valid data in the bus.
● The other control line is in the other direction from the destination to the source.
● It is used by the destination unit to inform the source whether it can accept data.
● The sequence of control during the transfer depends on the unit that initiates the transfer.
● This provides a high degree of flexibility and reliability because the successful completion of a data transfer
relies on active participation by both units.
● If one unit is faulty, the data transfer will not be completed.(Detected by a time-out mechanism)
Source-initiated transfer using handshaking.
● The two handshaking lines are data valid(source unit) and data accepted (destination unit).
● The source unit initiates the transfer by placing the data on the bus and enabling its data valid signal.
● The data accepted signal is activated by the destination unit after it accepts the data from the bus.
● The source unit then disables its data valid signal, which invalidates the data on the bus.
● The destination unit then disables its data accepted signal and the system goes into its initial state.
● The source does not send the next data item until after the destination unit shows its readiness to accept
new data by disabling its data accepted signal.
● This scheme allows delays from one state to the next and permits each unit to respond at its own data
transfer rate.
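A toy C model of this source-initiated handshake sequence; the two boolean flags stand in for the data valid and data accepted control lines (real hardware signals are asynchronous, so this is only a conceptual sketch):

#include <stdio.h>
#include <stdbool.h>

static int bus_data;
static bool data_valid;     /* driven by the source      */
static bool data_accepted;  /* driven by the destination */

void source_send(int word) {
    bus_data = word;          /* 1. source places data on the bus     */
    data_valid = true;        /* 2. source enables data valid         */
}

void destination_accept(void) {
    if (!data_valid) return;  /* destination waits for data valid     */
    int received = bus_data;  /* 3. destination takes data off the bus */
    data_accepted = true;     /* 4. destination enables data accepted */
    printf("received %d\n", received);
}

void finish_cycle(void) {
    data_valid = false;       /* 5. source disables data valid; bus data no longer guaranteed */
    data_accepted = false;    /* 6. destination disables data accepted; back to initial state */
}

int main(void) {
    source_send(42);
    destination_accept();
    finish_cycle();           /* only now may the source send the next item */
    return 0;
}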
Destination-initiated transfer using handshaking.
● The source unit in this case does not place data on the bus until after it receives the ready for data signal
from the destination unit.
● From there on, the handshaking procedure follows the same pattern as in the source-initiated case.
● The sequence of events in both cases would be identical if we consider the ready for data signal as the
complement of data accepted.
● The only difference between the source-initiated and the destination-initiated transfer is in their choice of
initial state.
Priority Interrupt
Priority Interrupt
● In a typical application a number of I/O devices are attached to the computer, with each device being able to
originate an interrupt request.
● The first task of the interrupt system is to identify the source of the interrupt.
● Several sources may request service simultaneously. In this case the system must also decide which device to
service first.
● A priority interrupt is a system that establishes a priority over the various sources to determine which
condition is to be serviced first when two or more requests arrive simultaneously.
● The system may also determine which conditions are permitted to interrupt the computer while another
interrupt is being serviced.
● Devices with high speed transfers such as magnetic disks are given high priority, and slow devices such as
keyboards receive low priority.
● When two devices interrupt the computer at the same time, the computer services the device with the
higher priority first.
● Methods:
1. Software - Polling
2. Hardware - Daisy chaining, Parallel Priority.
Polling
● A polling procedure is used to identify the highest-priority source by software means.
● In this method there is one common branch address for all interrupts.
● The program that takes care of interrupts begins at the branch address and polls the interrupt sources in
sequence.
● The order in which they are tested determines the priority of each interrupt.
● The highest priority source is tested first, and if its interrupt signal is on, control branches to a service routine
for this source.
● Otherwise, the next-lower-priority source is tested, and so on.
● Disadvantage: If there are many interrupts, the time required to poll them can exceed the time available to
service the I/O device. In this situation a hardware priority-interrupt unit can be used to speed up the
operation.
● A hardware priority-interrupt unit accepts interrupt requests from many sources, determines which of the
incoming requests has the highest priority, and issues an interrupt request to the computer based on this
determination.
● To speed up the operation, each interrupt source has its own interrupt vector to access its own service
routine directly. Thus no polling is required.
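A small C sketch of the software polling scheme described at the start of this slide (the device count, priority order, and service routine are illustrative):

#include <stdio.h>
#include <stdbool.h>

#define NUM_SOURCES 4

/* Interrupt request flags; index 0 = highest priority (e.g., disk), 3 = lowest (e.g., keyboard). */
static bool irq_flag[NUM_SOURCES];

static void service(int source) {
    printf("servicing interrupt source %d\n", source);
    irq_flag[source] = false;
}

/* Common interrupt entry point: poll the sources in priority order. */
void interrupt_handler(void) {
    for (int s = 0; s < NUM_SOURCES; s++) {
        if (irq_flag[s]) {       /* highest-priority pending request found */
            service(s);
            return;
        }
    }
}

int main(void) {
    irq_flag[2] = true;
    interrupt_handler();         /* services source 2, the highest-priority pending request */
    return 0;
}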
Daisy Chaining
● The hardware priority function can be established by either a serial or a parallel connection of interrupt lines.
● The serial connection is also known as the daisy chaining method.
● It consists of a serial connection of all devices that request an interrupt.
● The device with the highest priority is placed in the first position, followed by lower-priority devices up to the
device with the lowest priority, which is placed last in the chain.
Daisy Chaining
● The interrupt request line is common to all devices and forms a wired logic connection.
● If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and
enables the interrupt input in the CPU.
● When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are
recognized by the CPU.
● The CPU responds to an interrupt request by enabling the interrupt acknowledge line.
● This signal is received by device 1 at its PI (priority in) input.
● The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is
not requesting an interrupt.
Daisy Chaining
● If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 in the
PO output.
● It then proceeds to insert its own interrupt vector address (VAD) into the data bus for the CPU to use during
the interrupt cycle.
● A device with a 0 in its PI input generates a 0 in its PO output to inform the next-lower-priority device that
the acknowledge signal has been blocked.
● A device that is requesting an interrupt and has a 1 in its PI input will intercept the acknowledge signal by
placing a 0 in its PO output.
● If the device does not have pending interrupts, it transmits the acknowledge signal to the next device.
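The PI/PO rule above reduces to simple per-device logic, PO = PI AND (NOT request); a hedged C model of one acknowledge pass down the chain (the device count and vector addresses are invented for illustration):

#include <stdio.h>
#include <stdbool.h>

#define NUM_DEVICES 3

static bool requesting[NUM_DEVICES];                       /* device interrupt requests      */
static const int vad[NUM_DEVICES] = {0x40, 0x44, 0x48};    /* hypothetical vector addresses  */

/* Propagate the CPU's acknowledge down the chain.
   Each device: PO = PI AND (NOT request); the first requester that sees PI = 1 wins. */
int acknowledge(void) {
    bool pi = true;                             /* CPU enables interrupt acknowledge */
    for (int d = 0; d < NUM_DEVICES; d++) {     /* device 0 has the highest priority */
        if (pi && requesting[d]) {
            return vad[d];                      /* device places its VAD on the data bus */
        }
        pi = pi && !requesting[d];              /* PO of this device feeds PI of the next */
    }
    return -1;                                  /* no device was requesting */
}

int main(void) {
    requesting[1] = true;
    requesting[2] = true;
    printf("vector address = 0x%x\n", acknowledge());   /* device 1 wins: 0x44 */
    return 0;
}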
Input Output Processor (IOP)
Input Output Processor
● The computer system with IOP can be divided into 3 separate modules:
1. The Memory unit
2. The CPU and
3. one or more IOPs.
● The IOP can fetch and execute its own instructions.
● IOP instructions are specifically designed to facilitate I/O transfers.
● The IOP can also perform other processing tasks, such as arithmetic, logic, branching, and code translation.
CPU-IOP Communication
● Instructions that are read from memory by an IOP are
sometimes called commands, to distinguish them from
instructions that are read by the CPU.
● The CPU sends an instruction to test the IOP path.
● The IOP responds by inserting a status word in memory for
the CPU to check.
● The bits of the status word indicate the condition of the IOP
and I/O device (IOP overload condition, device busy with
another transfer, or device ready for I/O transfer).
● The CPU refers to the status word in memory to decide
what to do next.
● If all is in order, the CPU sends the instruction to start I/O
transfer.
● The memory address received with this instruction tells the
IOP where to find its program.
CPU-IOP Communication
● The CPU can now continue with another program while the
IOP is busy with the I/O program.
● Both programs refer to memory by means of DMA transfer.
● When the IOP terminates the execution of its program, it
sends an interrupt request to the CPU.
● The CPU responds to the interrupt by issuing an instruction
to read the status from the IOP.
● The IOP responds by placing the contents of its status
report into a specified memory location.
● The status word indicates whether the transfer has been
completed or if any errors occurred during the transfer.
● From inspection of the bits in the status word, the CPU
determines if the I/O operation was completed
satisfactorily without errors.