Computer Architecture and Organization

Chapter 1 outlines the objectives of understanding computer organization and architecture, emphasizing the differences between the two concepts. It discusses the evolution of computers from vacuum tubes to microprocessors, highlighting key technological advancements and their impact on performance and programming. The chapter also introduces the von Neumann architecture and the fetch-execute cycle, which are foundational to modern computing systems.


Chapter 1

Objectives

• Know the difference between computer organization and computer architecture.
• Understand units of measure common to computer systems.
• Appreciate the evolution of computers.
• Understand the computer as a layered system.
• Be able to explain the von Neumann architecture and the function of basic computer components.

1
Overview
• A modern computer is an electronic, digital, general-purpose
computing machine that automatically follows a step-by-step
list of instructions to solve a problem. This step-by-step list of
instructions is also called an algorithm or a computer program.
• Why study computer organization and architecture?
– Design better programs, including system software such as compilers,
operating systems, and device drivers.
– Optimize program behavior.
– Evaluate (benchmark) computer system performance.
– Understand time, space, and price tradeoffs.
• Computer organization
– Encompasses all physical aspects of computer systems.
– E.g., circuit design, control signals, memory types.
– How does a computer work?
2
Computer architecture
• Focuses on the structure (the way in which the components are interrelated) and
behavior of the computer system, and refers to the logical aspects of system
implementation as seen by the programmer
• Computer architecture includes many elements such as
– instruction sets and formats, operation codes, data types, the number and types
of registers, addressing modes, main memory access methods, and
various I/O mechanisms.
• The architecture of a system directly affects the logical execution of
programs.
• The computer architecture for a given machine is the combination of its
hardware components plus its instruction set architecture (ISA).
• The ISA is the interface between all the software that runs on the machine
and the hardware that executes it.
• Studying computer architecture helps us to answer the question: How
do I design a computer?

3
Overview
• In the case of the IBM, SUN and Intel ISAs, it is possible to
purchase processors which execute the same instructions
from more than one manufacturer
• All these processors may have quite different internal
organizations but they all appear identical to a programmer,
because their instruction sets are the same
• Organization & Architecture enables a family of computer
models
– Same Architecture, but with differences in Organization
– Different price and performance characteristics
• When technology changes, only the organization changes
– This provides backward code compatibility

4
Principle of Equivalence
• No clear distinction between matters related to computer
organization and matters relevant to computer architecture.
• Principle of Equivalence of Hardware and Software
– Anything that can be done with software can also be
done with hardware, and anything that can be done
with hardware can also be done with software.

5
Principle of Equivalence
Since hardware and software are equivalent, what is the
advantage of building digital circuits to perform
specific operations where the circuits, once created,
are frozen?

(Speed)
While computers are extremely fast, every instruction must
be fetched, decoded, and executed. If a program is
constructed out of circuits, then the speed of execution is
the speed at which current flows across the circuits.

6
Principle of Equivalence
Since hardware is so fast, why do we spend so
much time in our society with computers and
software engineering?

Flexibility
 Specialized circuits are fast, but once constructed, the programs
they implement are frozen in place.
 We have many general-purpose needs, and most of the
programs that we use tend to evolve over time, requiring
replacement.
 Replacing software is far cheaper and easier than having to
manufacture and install new chips

7
1.2 Computer Components

At the most basic level, a computer is a device consisting of three pieces:
– A processor to interpret and execute programs
– A memory (includes cache, RAM, ROM) to store both data and program instructions
– A mechanism for transferring data to and from the outside world:
– I/O to communicate between the computer and the world
– Bus to move information from one computer component to another

8
Measures of capacity and speed
Whether a metric refers to a power of 10 or a power of 2 typically depends upon what is being measured.
• Kilo- (K) = 1 thousand = 10^3 or 2^10
• Mega- (M) = 1 million = 10^6 or 2^20
• Giga- (G) = 1 billion = 10^9 or 2^30
• Tera- (T) = 1 trillion = 10^12 or 2^40
• Peta- (P) = 1 quadrillion = 10^15 or 2^50
• Exa- (E) = 1 quintillion = 10^18 or 2^60
• Zetta- (Z) = 1 sextillion = 10^21 or 2^70
• Yotta- (Y) = 1 septillion = 10^24 or 2^80
• Hertz = clock cycles per second (frequency)
– 1MHz = 1,000,000Hz
– Processor speeds are measured in MHz or GHz.
• Byte = a unit of storage
– 1KB = 2^10 Bytes = 1,024 Bytes
– 1MB = 2^20 Bytes = 1,048,576 Bytes
– Main memory (RAM) is measured in MB
– Disk storage is measured in GB for small systems, TB for large systems.

10
1.3 An Example System
Measures of time and space:
• Milli- (m) = 1 thousandth = 10^-3
• Micro- (µ) = 1 millionth = 10^-6
• Nano- (n) = 1 billionth = 10^-9
• Pico- (p) = 1 trillionth = 10^-12
• Femto- (f) = 1 quadrillionth = 10^-15
• Atto- (a) = 1 quintillionth = 10^-18
• Zepto- (z) = 1 sextillionth = 10^-21
• Yocto- (y) = 1 septillionth = 10^-24

• Note that cycle time is the reciprocal of clock frequency: a bus operating at 133 MHz has a cycle time of about 7.52 nanoseconds.
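The reciprocal relationship between clock frequency and cycle time is easy to check numerically. A minimal Python sketch (illustrative only, not part of the original slides):

# Cycle time as the reciprocal of clock frequency.
def cycle_time_ns(frequency_hz):
    # Return the clock cycle time in nanoseconds for a given frequency in Hz.
    return 1.0 / frequency_hz * 1e9

print(cycle_time_ns(133e6))   # a 133 MHz bus -> ~7.52 ns per cycle
print(cycle_time_ns(2e9))     # a 2 GHz processor -> 0.5 ns per cycle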

• Millisecond = 1 thousandth of a second


– Hard disk drive access times are often 10 to 20 milliseconds.
• Nanosecond = 1 billionth of a second
– Main memory access times are often 50 to 70 nanoseconds.
• Micron (micrometer) = 1 millionth of a meter
– Circuits on computer chips are measured in microns.
11
1.3 An Example System

• Computers with large main memory capacity can


run larger programs with greater speed than
computers having small memories.
• RAM is an acronym for random access memory.
Random access means that memory contents can
be accessed directly if you know their location.
• Cache is a type of temporary memory that can be
accessed faster than RAM.

13
1.3 An Example System
• Serial ports send data as a series of pulses along
one or two data lines.
• Parallel ports send data as a single pulse along at
least eight data lines.
• USB, Universal Serial Bus, is an intelligent serial
interface that is self-configuring. (It supports “plug
and play.”)

18
1st Generation Computers
– Used vacuum tubes for logic and storage (very little storage
available)
– A vacuum-tube circuit storing 1 byte
– Programmed in machine language
– Often programmed by physical connection (hardwiring)
– Slow, unreliable, expensive
– The ENIAC (1946) is often thought of as the first programmable
electronic computer
– 17,468 vacuum tubes, 1,800 square feet, 30 tons

21
2nd Generation Computers
• Transistors replaced vacuum tubes
• Magnetic core memory introduced
– Changes in technology brought about cheaper and more reliable
computers (vacuum tubes were very unreliable)
– Because these units were smaller, they were closer together providing
a speedup over vacuum tubes
– Various programming languages introduced (assembly, high-level)
– Rudimentary OS developed
• The first supercomputer was introduced, CDC 6600 ($10 million)

22
3rd Generation Computers
Integrated circuit (IC)
The ability to place circuits onto silicon chips
– Replaced both transistors and magnetic core memory
– Result was easily mass-produced components reducing the
cost of computer manufacturing significantly
– Also increased speed and memory capacity
– Computer families introduced
– Minicomputers introduced
– More sophisticated programming languages and OS developed
• Popular computers included the PDP-8, PDP-11, and IBM 360; Cray
produced its first supercomputer, the Cray-1
– Silicon chips now contained both logic (CPU) and
memory
– Large-scale computer usage led to time-sharing OS

23
4th Generation Computers
1971-Present: Microprocessors
• Miniaturization took over
– From SSI (10-100 components per chip) to
– MSI (100-1000), LSI (1,000-10,000), VLSI (10,000+)
• Thousands of ICs were built onto a single silicon chip(VLSI),
which allowed Intel, in 1971, to
– create the world’s first microprocessor, the 4004, which was a fully
functional, 4-bit system that ran at 108KHz.
– Intel also introduced the RAM chip, accommodating 4Kb of
memory on a single chip. This allowed computers of the 4th
generation to become smaller and faster than their solid-
state predecessors
– Computers also saw the development of GUIs, the mouse
and handheld devices

24
Moore’s Law
• How small can we make transistors?
• How densely can we pack chips?
• No one can say for sure
• In 1965, Intel founder Gordon Moore stated,
“The density of transistors in an integrated circuit will
double every year.”
• The current version of this prediction is usually conveyed as “the
density of silicon chips doubles every 18 months”
• Using current technology, Moore’s Law cannot hold forever
• There are physical and financial limitations
• At the current rate of miniaturization, it would take about 500
years to put the entire solar system on a chip
• Cost may be the ultimate constraint
25
Rock’s Law
• Rock's Law, due to Arthur Rock, is a corollary to Moore's Law:
“The cost of capital equipment to build semiconductors will
double every four years”
• Rock’s Law arises from the observations of a financier who has seen
the price tag of new chip facilities escalate from about $12,000 in
1968 to $12 million in the late 1990s.
• At this rate, by the year 2035, not only will the size of a memory
element be smaller than an atom, but it would also require the entire
wealth of the world to build a single chip!
• So even if we continue to make chips smaller and faster, the ultimate
question may be whether we can afford to build them

26
The Computer Level Hierarchy
• Through the principle of abstraction, we can imagine the machine to
be built from a hierarchy of levels, in which each level has a specific
function and exists as a distinct hypothetical machine
• Abstraction is the ability to focus on important aspects of a
situation at a higher level while ignoring the underlying complex
details
• We call the hypothetical computer at each level a virtual machine.
• Each level’s virtual machine executes its own particular set of
instructions, calling upon machines at lower levels to carry out the
tasks when necessary

27
1.6 The Computer Level Hierarchy
Level 6: The User Level
• Composed of applications and is the level with which everyone is
most familiar.
• At this level, we run programs such as word processors, graphics
packages, or games. The lower levels are nearly invisible from the
User Level.

28
Level 5: High-Level Language Level
– The level with which we interact when we write
programs in languages such as C, Pascal, Lisp, and
Java
– These languages must be translated to a language the
machine can understand. (using compiler / interpreter)
– Compiled languages are translated into assembly
language and then assembled into machine code. (They
are translated to the next lower level.)
– The user at this level sees very little of the lower levels

29
Level 4: Assembly Language Level
– Acts upon assembly language produced from Level 5,
as well as instructions programmed directly at this level
– As previously mentioned, compiled higher-level
languages are first translated to assembly, which is then
directly translated to machine language. This is a one-to-
one translation, meaning that one assembly language
instruction is translated to exactly one machine language
instruction.
– By having separate levels, we reduce the semantic gap
between a high-level language and the actual machine
language

30
Level 3: System Software Level

– deals with operating system instructions.


– This level is responsible for multiprogramming,
protecting memory, synchronizing processes,
and various other important functions.
– Often, instructions translated from assembly
language to machine language are passed
through this level unmodified

31
Level 2: Machine Level
– Consists of instructions (ISA)that are particular to
the architecture of the machine
– Programs written in machine language need no
compilers, interpreters, or assemblers

Level 1: Control Level


– A control unit decodes and executes instructions
and moves data through the system.
– Control units can be microprogrammed or hardwired
– A microprogram is a program written in a low-level
language that is implemented by the hardware.
– Hardwired control units consist of hardware that
directly executes machine instructions
32
Level 0: Digital Logic Level
– This level is where we find digital circuits (the chips)
– Digital circuits consist of gates and wires.
– These components implement the mathematical
logic of all other levels

33
The Von Neumann Architecture
Named after John von Neumann of Princeton, who designed
a computer architecture whereby data and instructions
would be retrieved from memory, operated on by an
ALU, and moved back to memory (or I/O)
This architecture is the basis for most modern computers
(only parallel processors and a few other unique
architectures use a different model)

34
Hardware consists of 3 units
 CPU (control unit, ALU, registers)
 Memory (stores programs and data)
 I/O System (including secondary storage)
Instructions in memory are executed sequentially unless a
program instruction explicitly changes the order

35
Von Neumann Architectures
• There is a single pathway used to move both data
and instructions between memory, I/O and CPU
– the pathway is implemented as a bus
– the single pathway creates a bottleneck
• known as the von Neumann bottleneck
– A variation of this architecture is the Harvard architecture,
which separates data and instructions into two pathways
– Another variation, used in most computers, is the
system bus version, in which there are separate buses
between the CPU and memory and between memory and I/O

36
Fetch-execute cycle
• The von Neumann architecture operates on the
fetch-execute cycle
– Fetch an instruction from memory as indicated by the
Program Counter register
– Decode the instruction in the control unit
– Data operands needed for the instruction are fetched
from memory
– Execute the instruction in the ALU storing the result in
a register
– Move the result back to memory if needed

37
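To make the cycle concrete, here is a small Python sketch of a hypothetical machine stepping through fetch, decode, and execute; the three-field instruction format, the addresses, and the register names are invented for illustration and are not any real ISA:

# Toy fetch-decode-execute loop (hypothetical instruction format).
memory = {
    0: ("LOAD", 10, "R1"),    # R1 <- M[10]
    1: ("ADD", 11, "R1"),     # R1 <- R1 + M[11]
    2: ("STORE", 12, "R1"),   # M[12] <- R1
    3: ("HALT", None, None),
    10: 7, 11: 35, 12: 0,
}
registers = {"R1": 0}
pc = 0                                 # program counter

while True:
    opcode, addr, reg = memory[pc]     # fetch the instruction the PC points to
    pc += 1                            # PC now indicates the next instruction
    if opcode == "HALT":               # decode and execute
        break
    if opcode == "LOAD":
        registers[reg] = memory[addr]  # operand fetched from memory into a register
    elif opcode == "ADD":
        registers[reg] += memory[addr] # ALU executes; result kept in the register
    elif opcode == "STORE":
        memory[addr] = registers[reg]  # result moved back to memory

print(memory[12])                      # 42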
1.7 The von Neumann Model
• This is a general
depiction of a von
Neumann system:

• These computers
employ a fetch-
decode-execute
cycle to run
programs as
follows . . .

38
The von Neumann Model
• The control unit fetches the next instruction from memory using
the program counter to determine where the instruction is located

39
The von Neumann Model
• The instruction is decoded into a language that the ALU
can understand.

40
The von Neumann Model
• Any data operands required to execute the instruction
are fetched from memory and placed into registers
within the CPU.

41
The von Neumann Model
• The ALU executes the instruction and places results in
registers or memory.

42
Non-von Neumann Models
• Conventional stored-program computers have
undergone many incremental improvements over the
years
– specialized buses
– floating-point units
– cache memories
• But enormous improvements in computational power
require departure from the classic von Neumann
architecture
– Adding processors is one approach

43
Non-von Neumann Models
• In the late 1960s, high-performance computer systems were equipped with dual
processors to increase computational throughput.
• In the 1970s supercomputer systems were introduced with 32 processors.
• Supercomputers with 1,000 processors were built in the 1980s.
• In 1999, IBM announced its Blue Gene system containing over 1 million processors.

44
Computer Performance Measures
Program Execution Time
For a specific program compiled to run on a specific machine “A”, the following
parameters are provided:
– The total instruction count of the program.
– The average number of cycles per instruction (average CPI).
– Clock cycle of machine “A”
How can one measure the performance of this machine running this program?
– The machine is said to be faster or has better performance running this program if the
total execution time is shorter.
– Thus the inverse of the total measured program execution time is a possible
performance measure or metric:

PerformanceA = 1 / Execution TimeA

– How to compare performance of different machines?


– What factors affect performance? How to improve performance? 53
Comparing Computer Performance Using
Execution Time
• To compare the performance of two machines “A”, “B” running a given specific
program
PerformanceA = 1 / Execution TimeA
PerformanceB = 1 / Execution TimeB
• Machine A is n times faster than machine B means:
Speedup = n = PerformanceA / PerformanceB = Execution TimeB / Execution TimeA
• Example:
For a given program:
Execution time on machine A: ExecutionA = 1 second
Execution time on machine B: ExecutionB = 10 seconds
PerformanceA / PerformanceB = Execution TimeB / Execution TimeA
= 10 / 1 = 10
The performance of machine A is 10 times the performance of machine B when
running this program, or: machine A is said to be 10 times faster than machine B
when running this program.
54
CPU Execution Time
The CPU Equation
• A program consists of a number of executed instructions, I
– Measured in: instructions/program

• The average instruction takes a number of cycles per instruction (CPI) to


be completed.
– Measured in: cycles/instruction, CPI

• CPU has a fixed clock cycle time C = 1/clock rate


– Measured in: seconds/cycle

• CPU execution time is the product of the above three parameters as follows:
CPU time = Seconds / Program = (Instructions / Program) x (Cycles / Instruction) x (Seconds / Cycle)

T = I x CPI x C
55
Example
• A Program is running on a specific machine with the following
parameters:
– Total executed instruction count: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz.
What is the execution time for this program?

CPU time = Instruction count x CPI x Clock cycle


= 10,000,000 x 2.5 x (1 / clock rate)
= 10,000,000 x 2.5 x 5x10^-9
= 0.125 seconds

56
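A quick check of this arithmetic using the CPU equation T = I x CPI x C (a sketch; the variable names are ours):

# CPU time = instruction count x CPI x clock cycle time
instruction_count = 10_000_000
cpi = 2.5                         # average cycles per instruction
clock_rate_hz = 200e6             # 200 MHz
cycle_time_s = 1 / clock_rate_hz  # 5 ns per cycle

cpu_time = instruction_count * cpi * cycle_time_s
print(cpu_time)                   # 0.125 seconds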
Example
• From the previous example: A Program is running on a specific machine
with the following parameters:
– Total executed instruction count, I: 10,000,000 instructions
– Average CPI for the program: 2.5 cycles/instruction.
– CPU clock rate: 200 MHz.
• Using the same program with these changes:
– A new compiler used: New instruction count 9,500,000
New CPI: 3.0
– Faster CPU implementation: New clock rate = 300 MHZ
• What is the speedup with the changes?
– Speedup = Old Execution Time / New Execution Time
  = (I_old x CPI_old x Clock cycle_old) / (I_new x CPI_new x Clock cycle_new)

Speedup = (10,000,000 x 2.5 x 5x10^-9) / (9,500,000 x 3 x 3.33x10^-9)
= 0.125 / 0.095 = 1.32
or 32 % faster after changes.
57
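The same speedup can be verified numerically (a sketch; the helper function is ours):

# Speedup = old execution time / new execution time
def cpu_time(instructions, cpi, clock_rate_hz):
    return instructions * cpi / clock_rate_hz   # T = I x CPI x (1 / clock rate)

t_old = cpu_time(10_000_000, 2.5, 200e6)        # 0.125 s
t_new = cpu_time(9_500_000, 3.0, 300e6)         # 0.095 s
print(t_old / t_new)                            # ~1.32, i.e. about 32% faster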
58
Chapter 2
Central Processing Unit

1
CENTRAL PROCESSING UNIT
• Introduction
• General Register Organization
• Stack Organization
• Instruction Formats
• Addressing Modes
• Data Transfer and Manipulation
• Program Control
• Reduced Instruction Set Computer (RISC)

2
MAJOR COMPONENTS OF CPU
• Storage Components:
 Registers
 Flip-flops
• Execution (Processing)
Components:
• Arithmetic Logic Unit (ALU): Arithmetic
calculations, Logical computations,
Shifts/Rotates
• Transfer Components: Bus
• Control Components: Control Unit
3
GENERAL REGISTER ORGANIZATION

[Figure: bus organization of seven CPU registers. Registers R1–R7 (plus external
input) feed two multiplexers selected by SELA and SELB, which drive the ALU's A
and B buses; the OPR field selects the ALU operation, and a 3x8 decoder driven by
SELD selects the destination register loaded from the output bus.]

4
OPERATION OF CONTROL UNIT
The control unit directs the information flow through ALU by:
- Selecting various Components in the system
- Selecting the Function of ALU

Example: R1 ← R2 + R3
[1] MUX A selector (SELA): BUS A ← R2
[2] MUX B selector (SELB): BUS B ← R3
[3] ALU operation selector (OPR): ALU to ADD
[4] Decoder destination selector (SELD): R1 ← Out Bus
Control word format (3 + 3 + 3 + 5 bits): SELA | SELB | SELD | OPR

Encoding of register selection fields

Binary Code   SELA    SELB    SELD
000 Input Input None
001 R1 R1 R1
010 R2 R2 R2
011 R3 R3 R3
100 R4 R4 R4
101 R5 R5 R5
110 R6 R6 R6
111 R7 R7 R7
5

ALU CONTROL
Encoding of ALU operations OPR
Select Operation Symbol
00000 Transfer A TSFA
00001 Increment A INCA
00010 ADD A + B ADD
00101 Subtract A - B SUB
00110 Decrement A DECA
01000 AND A and B AND
01010 OR A and B OR
01100 XOR A and B XOR
01110 Complement A COMA
10000 Shift right A SHRA
11000 Shift left A SHLA

Examples of ALU Microoperations

Microoperation     SELA   SELB   SELD   OPR    Control Word
R1 ← R2 - R3       R2     R3     R1     SUB    010 011 001 00101
R4 ← R4 ∨ R5       R4     R5     R4     OR     100 101 100 01010
R6 ← R6 + 1        R6     -      R6     INCA   110 000 110 00001
R7 ← R1            R1     -      R7     TSFA   001 000 111 00000
Output ← R2        R2     -      None   TSFA   010 000 000 00000
Output ← Input     Input  -      None   TSFA   000 000 000 00000
R4 ← shl R4        R4     -      R4     SHLA   100 000 100 11000
R5 ← 0             R5     R5     R5     XOR    101 101 101 01100
6
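A short Python sketch that packs the four fields into the 14-bit control word, using the encodings from the two tables above (the dictionaries and function name are ours):

# Assemble the control word  SELA(3) | SELB(3) | SELD(3) | OPR(5).
REG = {"R1": 1, "R2": 2, "R3": 3, "R4": 4, "R5": 5, "R6": 6, "R7": 7}  # 000 = Input/None
OPR = {"TSFA": 0b00000, "INCA": 0b00001, "ADD": 0b00010, "SUB": 0b00101,
       "DECA": 0b00110, "AND": 0b01000, "OR": 0b01010, "XOR": 0b01100,
       "COMA": 0b01110, "SHRA": 0b10000, "SHLA": 0b11000}

def control_word(sela, selb, seld, opr):
    return (REG[sela] << 11) | (REG[selb] << 8) | (REG[seld] << 5) | OPR[opr]

# R1 <- R2 - R3  should give  010 011 001 00101
print(format(control_word("R2", "R3", "R1", "SUB"), "014b"))   # 01001100100101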
REGISTER STACK ORGANIZATION
Stack
- Very useful feature for nested subroutines, nested loops control
- Also efficient for arithmetic expression evaluation
- Storage which can be accessed in LIFO
- Pointer: SP
- Only PUSH and POP operations are applicable
[Figure: a 64-word register stack (addresses 0–63) with stack pointer SP, data
register DR, and FULL/EMPTY flag flip-flops; items A, B, C occupy locations 1–3
and SP points to the top item C.]

Push and Pop operations
/* Initially, SP = 0, EMPTY = 1, FULL = 0 */

PUSH:  SP ← SP + 1
       M[SP] ← DR
       If (SP = 0) then (FULL ← 1)
       EMPTY ← 0

POP:   DR ← M[SP]
       SP ← SP - 1
       If (SP = 0) then (EMPTY ← 1)
       FULL ← 0
7
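A small Python sketch of this 64-word register stack, following the PUSH/POP microoperations above (the class and method names are ours):

# Register stack with SP, FULL and EMPTY flags; SP wraps modulo the stack size.
class RegisterStack:
    def __init__(self, size=64):
        self.mem = [0] * size
        self.size = size
        self.sp = 0                           # initially SP = 0, EMPTY = 1, FULL = 0
        self.full, self.empty = False, True

    def push(self, word):
        assert not self.full, "stack overflow"
        self.sp = (self.sp + 1) % self.size   # SP <- SP + 1
        self.mem[self.sp] = word              # M[SP] <- DR
        if self.sp == 0:                      # SP wrapped around: stack is full
            self.full = True
        self.empty = False

    def pop(self):
        assert not self.empty, "stack underflow"
        word = self.mem[self.sp]              # DR <- M[SP]
        self.sp = (self.sp - 1) % self.size   # SP <- SP - 1
        if self.sp == 0:
            self.empty = True
        self.full = False
        return word

s = RegisterStack()
s.push(10); s.push(20)
print(s.pop(), s.pop())                       # 20 10 (last in, first out)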
MEMORY STACK ORGANIZATION
[Figure: memory with program, data, and stack segments — PC points to the
program (instructions) at address 1000, AR to the data (operands), and SP to
the top of a stack that grows downward from address 4001 toward 3000.]

- A portion of memory is used as a stack, with a
processor register (SP) serving as the stack pointer

- PUSH: SP ← SP - 1
        M[SP] ← DR
- POP:  DR ← M[SP]
        SP ← SP + 1

- Most computers do not provide hardware to check


stack overflow (full stack) or underflow(empty stack)

8
REVERSE POLISH NOTATION
Arithmetic Expressions: A + B
A+B Infix notation
+AB Prefix or Polish notation
AB+ Postfix or reverse Polish notation
- The reverse Polish notation is very suitable for stack
manipulation

Evaluation of Arithmetic Expressions


Any arithmetic expression can be expressed in parenthesis-free
Polish notation, including reverse Polish notation

(3 * 4) + (5 * 6)  →  3 4 * 5 6 * +

Stack trace while scanning 3 4 * 5 6 * + :
push 3, push 4, * → 12, push 5, push 6, * → 30, + → 42

9
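The stack evaluation above is straightforward to express in code. A minimal Python sketch of a postfix (reverse Polish) evaluator:

# Evaluate a reverse Polish (postfix) expression with an operand stack.
def eval_rpn(tokens):
    stack = []
    for tok in tokens:
        if tok in ("+", "-", "*", "/"):
            b = stack.pop()                  # right operand is on top of the stack
            a = stack.pop()
            stack.append({"+": a + b, "-": a - b,
                          "*": a * b, "/": a / b}[tok])
        else:
            stack.append(float(tok))         # operands are pushed onto the stack
    return stack.pop()

print(eval_rpn("3 4 * 5 6 * +".split()))     # (3 * 4) + (5 * 6) = 42.0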
Instruction Format

INSTRUCTION FORMAT
Instruction Fields
OP-code field - specifies the operation to be performed
Address field - designates memory address(s) or a processor register(s)
Mode field - specifies the way the operand or the effective address is
determined

The number of address fields in the instruction format


depends on the internal organization of CPU

- The three most common CPU organizations:


Single accumulator organization:
ADD X          /* AC ← AC + M[X] */
General register organization:
ADD R1, R2, R3 /* R1 ← R2 + R3 */
ADD R1, R2     /* R1 ← R1 + R2 */
MOV R1, R2     /* R1 ← R2 */
ADD R1, X      /* R1 ← R1 + M[X] */
Stack organization:
PUSH X         /* TOS ← M[X] */
ADD
10
THREE, and TWO-ADDRESS INSTRUCTIONS
Three-Address Instructions:

Program to evaluate X = (A + B) * (C + D) :
ADD R1, A, B   /* R1 ← M[A] + M[B] */
ADD R2, C, D   /* R2 ← M[C] + M[D] */
MUL X, R1, R2  /* M[X] ← R1 * R2 */

- Results in short programs


- Instruction becomes long (many bits)

Two-Address Instructions:
Program to evaluate X = (A + B) * (C + D) :

MOV R1, A   /* R1 ← M[A] */
ADD R1, B   /* R1 ← R1 + M[B] */
MOV R2, C   /* R2 ← M[C] */
ADD R2, D   /* R2 ← R2 + M[D] */
MUL R1, R2  /* R1 ← R1 * R2 */
MOV X, R1   /* M[X] ← R1 */
11
ONE, and ZERO-ADDRESS INSTRUCTIONS
One-Address Instructions:
- Use an implied AC register for all data manipulation
- Program to evaluate X = (A + B) * (C + D) :
LOAD A   /* AC ← M[A] */
ADD B    /* AC ← AC + M[B] */
STORE T  /* M[T] ← AC */
LOAD C   /* AC ← M[C] */
ADD D    /* AC ← AC + M[D] */
MUL T    /* AC ← AC * M[T] */
STORE X  /* M[X] ← AC */
Zero-Address Instructions:
- Can be found in a stack-organized computer
- Program to evaluate X = (A + B) * (C + D) :
PUSH A  /* TOS ← A */
PUSH B  /* TOS ← B */
ADD     /* TOS ← (A + B) */
PUSH C  /* TOS ← C */
PUSH D  /* TOS ← D */
ADD     /* TOS ← (C + D) */
MUL     /* TOS ← (C + D) * (A + B) */
POP X   /* M[X] ← TOS */
12
ADDRESSING MODES

Addressing Modes:

* Specifies a rule for interpreting or modifying the


address field of the instruction (before the operand
is actually referenced)

* Variety of addressing modes

- to give programming flexibility to the user


- to use the bits in the address field of the
instruction efficiently

13
TYPES OF ADDRESSING MODES
Implied Mode
Addresses of the operands are specified implicitly
in the definition of the instruction
- No need to specify address in the instruction
- EA = AC, or EA = Stack[SP], EA: Effective Address.

Immediate Mode
Instead of specifying the address of the operand,
operand itself is specified
- No need to specify address in the instruction
- However, operand itself needs to be specified
- Sometimes, require more bits than the address
- Fast to acquire an operand

Register Mode
Address specified in the instruction is the register address
- Designated operand needs to be in a register
- Shorter address than the memory address
- Saving address field in the instruction
- Faster to acquire an operand than the memory addressing
- EA = IR(R) (IR(R): Register field of IR)
14
TYPES OF ADDRESSING MODES
Register Indirect Mode
Instruction specifies a register which contains
the memory address of the operand
- Saving instruction bits since register address
is shorter than the memory address
- Slower to acquire an operand than both the
register addressing or memory addressing
- EA = [IR(R)] ([x]: Content of x)

Auto-increment or Auto-decrement features:


Same as the Register Indirect, but:
- When the address in the register is used to access memory, the
value in the register is incremented or decremented by 1 (after or
before the execution of the instruction)

15
TYPES OF ADDRESSING MODES
Direct Address Mode
Instruction specifies the memory address which
can be used directly to access the physical memory
- Faster than the other memory addressing modes
- Too many bits are needed to specify the address
for a large physical memory space
- EA = IR(address), (IR(address): address field of IR)

Indirect Addressing Mode


The address field of an instruction specifies the address of a memory
location that contains the address of the operand
- When the abbreviated address is used, large physical memory can
be addressed with a relatively small number of bits
- Slow to acquire an operand because of an additional memory
access
- EA = M[IR(address)]

16
TYPES OF ADDRESSING MODES
Relative Addressing Modes
The address field of an instruction specifies part of the address
(abbreviated address), which can be used along with a
designated register to calculate the address of the operand

PC Relative Addressing Mode(R = PC)


- EA = PC + IR(address)

- Address field of the instruction is short


- Large physical memory can be accessed with a small number of
address bits

Indexed Addressing Mode


XR: Index Register:
- EA = XR + IR(address)
Base Register Addressing Mode
BAR: Base Address Register:
- EA = BAR + IR(address)

17
ADDRESSING MODES - EXAMPLES
Example: PC = 200, R1 = 400, XR = 100. The two-word "load to AC" instruction
occupies addresses 200–201 (address field = 500), and the next instruction is at
address 202. Memory contents: M[399] = 450, M[400] = 700, M[500] = 800,
M[600] = 900, M[702] = 325, M[800] = 300.

Addressing Mode     Effective Address   Content of AC
Direct address      500                 800   /* AC ← M[500] */
Immediate operand   -                   500   /* AC ← 500 */
Indirect address    800                 300   /* AC ← M[M[500]] */
Relative address    702                 325   /* AC ← M[PC + 500] */
Indexed address     600                 900   /* AC ← M[XR + 500] */
Register            -                   400   /* AC ← R1 */
Register indirect   400                 700   /* AC ← M[R1] */
Autoincrement       400                 700   /* AC ← M[R1], then R1 ← R1 + 1 */
Autodecrement       399                 450   /* R1 ← R1 - 1, then AC ← M[R1] */
18
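The table can be reproduced with a few lines of Python (a sketch; note that by the time the effective address is computed, the PC has already advanced past the two-word instruction to 202, which is why the relative-mode address is 702):

# Effective address / AC content for each addressing mode in the example above.
pc_after_fetch, R1, XR, ADDR = 202, 400, 100, 500
M = {399: 450, 400: 700, 500: 800, 600: 900, 702: 325, 800: 300}

print("Direct address:    AC =", M[ADDR])                  # 800
print("Immediate operand: AC =", ADDR)                     # 500
print("Indirect address:  AC =", M[M[ADDR]])               # 300
print("Relative address:  AC =", M[pc_after_fetch + ADDR]) # M[702] = 325
print("Indexed address:   AC =", M[XR + ADDR])             # M[600] = 900
print("Register:          AC =", R1)                       # 400
print("Register indirect: AC =", M[R1])                    # M[400] = 700
print("Autoincrement:     AC =", M[R1])                    # 700, then R1 <- 401
print("Autodecrement:     AC =", M[R1 - 1])                # R1 <- 399 first, so 450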
DATA TRANSFER INSTRUCTIONS
Typical Data Transfer Instructions
Name Mnemonic
Load LD
Store ST
Move MOV
Exchange XCH
Input IN
Output OUT
Push PUSH
Pop POP
Data Transfer Instructions with Different Addressing Modes

Mode                Assembly Convention   Register Transfer
Direct address      LD ADR                AC ← M[ADR]
Indirect address    LD @ADR               AC ← M[M[ADR]]
Relative address    LD $ADR               AC ← M[PC + ADR]
Immediate operand   LD #NBR               AC ← NBR
Index addressing    LD ADR(X)             AC ← M[ADR + XR]
Register            LD R1                 AC ← R1
Register indirect   LD (R1)               AC ← M[R1]
Autoincrement       LD (R1)+              AC ← M[R1], R1 ← R1 + 1
Autodecrement       LD -(R1)              R1 ← R1 - 1, AC ← M[R1]

19
DATA MANIPULATION INSTRUCTIONS
Three Basic Types: Arithmetic instructions
Logical and bit manipulation instructions
Shift instructions
Arithmetic Instructions
Name Mnemonic
Increment INC
Decrement DEC
Add ADD
Subtract SUB
Multiply MUL
Divide DIV
Add with Carry ADDC
Subtract with Borrow SUBB
Negate(2’s Complement) NEG

Logical and Bit Manipulation Instructions
Name                Mnemonic
Clear               CLR
Complement          COM
AND                 AND
OR                  OR
Exclusive-OR        XOR
Clear carry         CLRC
Set carry           SETC
Complement carry    COMC
Enable interrupt    EI
Disable interrupt   DI

Shift Instructions
Name                      Mnemonic
Logical shift right       SHR
Logical shift left        SHL
Arithmetic shift right    SHRA
Arithmetic shift left     SHLA
Rotate right              ROR
Rotate left               ROL
Rotate right thru carry   RORC
Rotate left thru carry    ROLC
20
PROGRAM CONTROL INSTRUCTIONS
In-line sequencing: the PC is incremented (+1), so the next instruction is
fetched from the next adjacent location in memory.
Alternatively, the PC can be loaded with an address from another source
(current instruction, stack, etc.): Branch, Conditional Branch, Subroutine, etc.
Program Control Instructions
Name              Mnemonic
Branch            BR
Jump              JMP
Skip              SKP
Call              CALL
Return            RTN
Compare (by -)    CMP
Test (by AND)     TST

* CMP and TST instructions do not retain the results of their operations
(subtraction and AND, respectively); they only set or clear certain flags.

[Figure: status flag circuit — an 8-bit ALU with inputs A and B produces output
F7–F0; carries c7 and c8 set the overflow flag V, F7 gives the sign flag S, c8 gives
the carry flag C, and a check for zero output sets the zero flag Z.]
21
CONDITIONAL BRANCH INSTRUCTIONS
Mnemonic Branch condition Tested condition
BZ Branch if zero Z=1
BNZ Branch if not zero Z=0
BC Branch if carry C=1
BNC Branch if no carry C=0
BP Branch if plus S=0
BM Branch if minus S=1
BV Branch if overflow V=1
BNV Branch if no overflow V=0
Unsigned compare conditions (A - B)
BHI Branch if higher A > B
BHE Branch if higher or equal A ≥ B
BLO Branch if lower A < B
BLOE Branch if lower or equal A ≤ B
BE Branch if equal A = B
BNE Branch if not equal A ≠ B
Signed compare conditions (A - B)
BGT Branch if greater than A > B
BGE Branch if greater or equal A ≥ B
BLT Branch if less than A < B
BLE Branch if less or equal A ≤ B
BE Branch if equal A = B
BNE Branch if not equal A ≠ B

22
SUBROUTINE CALL AND RETURN
SUBROUTINE CALL Call subroutine
Jump to subroutine
Branch to subroutine
Branch and save return address
Two Most Important Operations are Implied;

* Branch to the beginning of the Subroutine


- Same as the Branch or Conditional Branch

* Save the Return Address to get the address


of the location in the Calling Program upon
exit from the Subroutine
- Locations for storing the return address:
• Fixed location in the subroutine (memory)
• Fixed location in memory
• In a processor register
• In a memory stack (the most efficient way)

CALL:  SP ← SP - 1
       M[SP] ← PC
       PC ← EA

RTN:   PC ← M[SP]
       SP ← SP + 1

23
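A sketch of the CALL/RTN register-transfer sequence using a memory stack (all addresses here are assumed for illustration):

# Subroutine call and return via a memory stack.
memory = {}            # simulated main memory
SP = 4001              # stack grows toward lower addresses (as in the memory stack slide)
PC = 202               # return address: the instruction following CALL (assumed)
EA = 3500              # effective address of the subroutine (assumed)

# CALL: push the return address, then jump to the subroutine
SP -= 1                # SP <- SP - 1
memory[SP] = PC        # M[SP] <- PC
PC = EA                # PC <- EA

# ... the subroutine body executes here ...

# RTN: pop the return address
PC = memory[SP]        # PC <- M[SP]
SP += 1                # SP <- SP + 1
print(PC)              # 202 -- execution resumes in the calling program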
PROGRAM INTERRUPT
Types of Interrupts:
External interrupts
External interrupts are initiated from outside of the CPU and memory
- I/O Device -> Data transfer request or Data transfer complete
- Timing Device -> Timeout
- Power Failure

Internal interrupts (traps)


Internal Interrupts are caused by the currently running program
- Register, Stack Overflow
- Divide by zero
- OP-code Violation
- Protection Violation

Software Interrupts
Both External and Internal Interrupts are initiated by the computer Hardware.
Software Interrupts are initiated by executing an instruction.
- Supervisor Call -> Switching from a user mode to the supervisor mode
-> Allows to execute a certain class of operations
which are not allowed in the user mode

24
INTERRUPT PROCEDURE
Interrupt Procedure and Subroutine Call
- The interrupt is usually initiated by an internal or
an external signal rather than from the execution of
an instruction (except for the software interrupt)

- The address of the interrupt service program is


determined by the hardware rather than from the
address field of an instruction

- An interrupt procedure usually stores all the


information necessary to define the state of CPU
rather than storing only the PC.

The state of the CPU is determined from;


Content of the PC
Content of all processor registers
Content of status bits
Many ways of saving the CPU state depending on the CPU architectures

25
RISC: REDUCED INSTRUCTION SET
COMPUTERS
Historical Background
IBM System/360, 1964
- The real beginning of modern computer architecture
- Distinction between Architecture and Implementation
- Architecture: The abstract structure of a computer
seen by an assembly-language programmer

Compiler -program
High-Level Instruction
Language Hardware
Set
Architecture Implementation

Continuing growth in semiconductor memory and microprogramming


-> Much richer and more complicated instruction sets
=> CISC (Complex Instruction Set Computer)
- Arguments advanced at that time
Richer instruction sets would simplify compilers
Richer instruction sets would alleviate the software crisis
- move as much functions to the hardware as possible
- close Semantic Gap between machine language
and the high-level language
Richer instruction sets would improve the architecture quality
26
COMPLEX INSTRUCTION SET COMPUTERS: CISC

High Performance General Purpose Instructions


Characteristics of CISC:
1. A large number of instructions (from 100-250 usually)
2. Some instructions that perform certain tasks are not used frequently.
3. Many addressing modes are used (5 to 20)
4. Variable length instruction format.
5. Instructions that manipulate operands in memory.

27
PHILOSOPHY OF RISC
Reduce the semantic gap between
machine instruction and microinstruction
1-Cycle instruction
Most of the instructions complete their execution
in 1 CPU clock cycle - like a microoperation
* Functions of the instruction (contrast to CISC)
- Very simple functions
- Very simple instruction format
- Similar to microinstructions
=> No need for microprogrammed control
* Register-Register Instructions
- Avoid memory reference instructions except
Load and Store instructions
- Most of the operands can be found in the
registers instead of main memory
=> Shorter instructions
=> Uniform instruction cycle
=> Requirement of large number of registers
* Employ instruction pipeline
28
CHARACTERISTICS OF RISC
Common RISC Characteristics
- Operations are register-to-register, with only LOAD and
STORE accessing memory
- The operations and addressing modes are reduced
- Instruction formats are simple

29
CHARACTERISTICS OF RISC
RISC Characteristics
- Relatively few instructions
- Relatively few addressing modes
- Memory access limited to load and store instructions
- All operations done within the registers of the CPU
- Fixed-length, easily decoded instruction format
- Single-cycle instruction format
- Hardwired rather than microprogrammed control

More RISC Characteristics


- A relatively large number of registers in the processor unit.
-Efficient instruction pipeline
-Compiler support: provides efficient translation of high-level language
programs into machine language programs.
Advantages of RISC
- VLSI Realization
- Computing Speed
- Design Costs and Reliability
- High Level Language Support

30
Memory Organization
Outline
 Memory Hierarchy
 Cache
 Cache performance
Memory Hierarchy
 The memory unit is an essential component
in any digital computer since it is needed for
storing programs and data
 Not all accumulated information is needed by
the CPU at the same time
 Therefore, it is more economical to use low-
cost storage devices to serve as a backup for
storing the information that is not currently
used by CPU
Memory Hierarchy
 Since 1980, CPU has outpaced DRAM; the gap grew about 50% per year
Memory Hierarchy

Q. How do architects address this gap?

A. Put smaller, faster “cache” memories


between CPU and DRAM. Create a
“memory hierarchy”.
Memory Hierarchy
 The memory unit that directly communicate
with CPU is called the main memory
 Devices that provide backup storage are
called auxiliary memory

 The memory hierarchy system consists of all


storage devices employed in a computer
system, from the slow but high-capacity
auxiliary memory to a relatively faster main
memory, to an even smaller and faster cache
memory
Memory Hierarchy
 The main memory occupies a central position by being able to
communicate directly with the CPU and with auxiliary memory
devices through an I/O processor
 A special very-high-speed memory called cache is used to
increase the speed of processing by making current programs
and data available to the CPU at a rapid rate
Memory Hierarchy
 CPU logic is usually faster than main memory access
time, with the result that processing speed is limited
primarily by the speed of main memory
 The cache is used for storing segments of programs
currently being executed in the CPU and temporary
data frequently needed in the present calculations
 The typical access time ratio between cache and
main memory is about 1 to 7~10
 Auxiliary memory access time is usually 1000 times
that of main memory
Main Memory
 Most of the main memory in a general-
purpose computer is made up of RAM
integrated circuit chips, but a portion of the
memory may be constructed with ROM chips

 RAM – Random Access Memory
 Integrated RAM chips are available in two possible
operating modes, Static and Dynamic
 ROM – Read Only Memory
Random-Access Memory
(RAM)
 Static RAM (SRAM)
 Each cell stores bit with a six-transistor circuit.
 Retains value indefinitely, as long as it is kept powered.
 Relatively insensitive to disturbances such as electrical noise.
 Faster (8-16 times faster) and more expensive (8-16 times more
expensive as well) than DRAM.

 Dynamic RAM (DRAM)


 Each cell stores bit with a capacitor and transistor.
 Value must be refreshed every 10-100 ms.
 Sensitive to disturbances.
 Slower and cheaper than SRAM.
SRAM vs DRAM Summary
        Tran./bit   Access time   Persist?   Sensitive?   Cost   Applications
SRAM    6           1X            Yes        No           100X   Cache memories
DRAM    1           10X           No         Yes          1X     Main memories, frame buffers

 Virtually all desktop or server computers since


1975 used DRAMs for main memory and
SRAMs for cache
ROM
 ROM is used for storing programs that are
PERMANENTLY resident in the computer
and for tables of constants that do not
change in value once the production of the
computer is completed
 The ROM portion of main memory is needed
for storing an initial program called the bootstrap
loader, which starts the computer
software operating when power is turned on
Main Memory
 A RAM chip is better suited for
communication with the CPU if it has one or
more control inputs that select the chip when
needed

 The block diagram of a RAM chip is shown on the
next slide; the capacity of the memory is 128
words of 8 bits (one byte) per word
[Figures: block diagrams of the 128 x 8 RAM chip and the 512 x 8 ROM chip.]
Memory Address Map
 Memory Address Map is a pictorial representation of
assigned address space for each chip in the system

 To demonstrate an example, assume that a computer


system needs 512 bytes of RAM and 512 bytes of
ROM

 Each RAM chip has 128 bytes and needs seven address
lines, whereas the ROM has 512 bytes and needs nine
address lines
Memory Address Map
Memory Address Map
 The hexadecimal address column assigns a range of
hexadecimal equivalent addresses to each chip

 Lines 8 and 9 represent four distinct binary
combinations that specify which of the four RAM chips is selected

 When line 10 is 0, the CPU selects a RAM chip; when
it is 1, it selects the ROM
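The decoding just described can be sketched in a few lines of Python (the function name and example addresses are ours):

# Address decoding for the example map: four 128-byte RAM chips plus a 512-byte ROM.
def decode(address):
    assert 0 <= address < 1024                    # ten address lines
    if (address >> 9) & 1 == 0:                   # line 10 = 0 -> one of the RAM chips
        chip = (address >> 7) & 0b11              # lines 8-9 select RAM chip 1..4
        offset = address & 0x7F                   # lines 1-7 select the byte in the chip
        return f"RAM chip {chip + 1}, byte {offset}"
    return f"ROM, byte {address & 0x1FF}"         # line 10 = 1 -> ROM, lines 1-9 used

print(decode(0x000))   # RAM chip 1, byte 0
print(decode(0x17F))   # RAM chip 3, byte 127
print(decode(0x200))   # ROM, byte 0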
Outline
 Memory Hierarchy
 Cache
 Cache performance
Cache memory
 If the active portions of the program and data
are placed in a fast small memory, the
average memory access time can be reduced,
thus reducing the total execution time of the
program
 Such a fast small memory is referred to as
cache memory
 The cache is the fastest component in the
memory hierarchy and approaches the speed
of CPU component
Cache memory
 When CPU needs to access memory,
the cache is examined
 If the word is found in the cache, it is
read from the fast memory
 If the word addressed by the CPU is not
found in the cache, the main memory is
accessed to read the word
Cache memory
 When the CPU refers to memory and finds
the word in cache, it is said to produce a hit
 Otherwise, it is a miss

 The performance of cache memory is


frequently measured in terms of a quantity
called hit ratio
 Hit ratio = hit / (hit+miss)
Cache memory
 The basic characteristic of cache memory is its fast
access time,
 Therefore, very little or no time must be wasted
when searching the words in the cache
 The transformation of data from main memory to
cache memory is referred to as a mapping process,
there are three types of mapping:
 Associative mapping
 Direct mapping
 Set-associative mapping
Cache memory
 To help understand the mapping
procedure, we have the following
example:
Associative mapping
 The fastest and most flexible cache organization uses
an associative memory
 The associative memory stores both the address and
data of the memory word
 This permits any location in cache to store any word
from main memory

 The address value of 15 bits is shown as a five-digit


octal number and its corresponding 12-bit word is
shown as a four-digit octal number
Associative mapping
Associative mapping
 A CPU address of 15 bits is placed in the
argument register and the associative
memory is searched for a matching address
 If the address is found, the corresponding 12-
bits data is read and sent to the CPU
 If not, the main memory is accessed for the
word
 If the cache is full, an address-data pair must
be displaced to make room for a pair that is
needed and not presently in the cache
Direct Mapping
 Associative memory is expensive
compared to RAM
 In general case, there are 2^k words in
cache memory and 2^n words in main
memory (in our case, k=9, n=15)
 The n bit memory address is divided
into two fields: k-bits for the index and
n-k bits for the tag field
Direct Mapping
[Figure: direct-mapped cache organization — the n-bit memory address is split
into a k-bit index that selects the cache word and an (n-k)-bit tag stored alongside the data.]
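A minimal Python sketch of a direct-mapped lookup with k = 9 index bits and n = 15 address bits, as in the example (the class and variable names are ours):

# Direct-mapped cache: index = low k bits of the address, tag = remaining n - k bits.
K, N = 9, 15

class DirectMappedCache:
    def __init__(self):
        self.lines = [None] * (1 << K)          # each line holds a (tag, data) pair

    def access(self, address, memory):
        index = address & ((1 << K) - 1)        # low k bits select the cache line
        tag = address >> K                      # remaining n - k bits form the tag
        line = self.lines[index]
        if line is not None and line[0] == tag: # hit: stored tag matches
            return line[1], "hit"
        data = memory[address]                  # miss: fetch the word from main memory
        self.lines[index] = (tag, data)         # ...and place it in the cache
        return data, "miss"

memory = {addr: addr * 2 for addr in range(1 << N)}   # dummy 32K-word main memory
cache = DirectMappedCache()
print(cache.access(0o02777, memory))            # miss on the first reference
print(cache.access(0o02777, memory))            # hit on the second reference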
Set-Associative Mapping
 The disadvantage of direct mapping is that
two words with the same index in their
address but with different tag values cannot
reside in cache memory at the same time

 Set-Associative Mapping is an improvement


over the direct mapping in that each word of
cache can store two or more words of memory
under the same index address
Set-Associative Mapping
[Figure: two-way set-associative cache — each index address stores two tag-data pairs.]
 In the slide, each index address refers
to two data words and their associated
tags
 Each tag requires six bits and each data
word has 12 bits, so the word length is
2*(6+12) = 36 bits
Outline
 Memory Hierarchy
 Cache
 Cache performance
Cache performance
 Although a single cache could try to supply
instructions and data, it can be a bottleneck.

 For example: when a load or store instruction is


executed, the pipelined processor will simultaneously
request both data AND instruction

 Hence, a single cache would present a structural


hazard for loads and stores, leading to a stall
Cache performance
 One simple way to conquer this
problem is to divide it:

 One cache is dedicated to instructions


and another to data.

 Separate caches are found in most


recent processors.
Average memory access time
 Average memory access time =
% instructions * (Hit_time + instruction miss rate * miss_penalty)
+ % data * (Hit_time + data miss rate * miss_penalty)
Average memory access time
 Assume 40% of the instructions are
data-accessing instructions.
 Let a hit take 1 clock cycle and the miss
penalty be 100 clock cycles
 Assume the instruction miss rate is 4% and
the data access miss rate is 12%; what is
the average memory access time?
Average memory access time
60% * (1 + 4% * 100) + 40% * (1 + 12% * 100)
= 0.6 * (5) + 0.4 * (13)
= 8.2 clock cycles
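The same calculation in a short Python sketch (the variable names are ours):

# Average memory access time for split instruction/data references.
hit_time = 1                      # clock cycles
miss_penalty = 100                # clock cycles
instr_fraction, data_fraction = 0.60, 0.40
instr_miss_rate, data_miss_rate = 0.04, 0.12

amat = (instr_fraction * (hit_time + instr_miss_rate * miss_penalty)
        + data_fraction * (hit_time + data_miss_rate * miss_penalty))
print(amat)                       # 8.2 clock cycles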
Input-Output Organization

Contents
● Peripheral devices
● I/O interface
● Modes of Transfer
● Direct Memory Access
● Strobe Control
● Handshaking
● Priority Interrupt
● Input-Output Processor
● Serial Communication
● I/O Controllers
2
I/O Interface

3
Need for I/O Interface
● Input–output interface provides a method for transferring information between internal storage and external
I/O devices.
● Peripherals connected to a computer need special communication links for interfacing them with the CPU
and each peripheral.
● The purpose of the communication link is to resolve the differences that exist between the central computer
and each peripheral.
● The major differences are:
1. Peripherals are electromechanical and electromagnetic devices and their manner of operation is
different from the operation of the CPU and memory, which are electronic devices. Therefore, a
conversion of signal values may be required.
2. The data transfer rate of peripherals is usually slower than the transfer rate of the CPU, and
consequently, a synchronization mechanism may be needed.
3. Data codes and formats in peripherals differ from the word format in the CPU and memory.
4. The operating modes of peripherals are different from each other and each must be controlled so as
not to disturb the operation of other peripherals connected to the CPU.

4
I/O Interface
● To resolve these differences, computer systems include special hardware components between the CPU and
peripherals to supervise and synchronize all input and output transfers.
● These components are called interface units because they interface between the processor bus and the
peripheral device.
● To attach an input–output device to the CPU, an input–output interface circuit is placed between the device and
the system bus.
● The main functions of an input–output interface circuit are data conversion, synchronization, and device
selection.

5
I/O Bus and Interface Modules

● The I/O bus consists of data lines, address lines, and control lines.
● Each peripheral device has associated with it an interface unit.
● Each interface decodes the address and control received from the I/O bus, interprets them for the peripheral,
and provides signals for the peripheral controller.
● It also synchronizes the data flow and supervises the transfer between peripheral and processor.

6
I/O Bus and Interface Modules

● Each peripheral has its own controller that operates the particular electromechanical device.
● To communicate with a particular device, the processor places a device address on the address lines.
● Each interface attached to the I/O bus contains an address decoder that monitors the address lines.
● When the interface detects its own address, it activates the path between the bus lines and the device that it
controls.
● All other peripherals are disabled by their interface.
7
I/O Command
● At the same time that the address is made available in the address lines, the processor provides a function
code in the control lines.
● The function code (I/O command) is an instruction that is executed in the interface and its attached
peripheral unit.
● There are four types of commands that an interface may receive.
1. A control command is issued to activate the peripheral and to inform it what to do.
2. A status command is used to test various status conditions in the interface and the peripheral.
3. A data output command causes the interface to respond by transferring data from the bus into one of
its registers.
4. The data input command causes the interface to receive an item of data from the peripheral and place it
in its buffer register.
● The processor checks if data are available by means of a status command and then issues a data input
command.
● The interface places the data on the data lines, where they are accepted by the processor.

8
I/O versus Memory Bus
● In addition to communicating with I/O, the processor must communicate with the memory unit.
● Like the I/O bus, the memory bus contains data, address, and read/write control lines.
● There are three ways that computer buses can be used to communicate with memory and I/O:
1. Use two separate buses, one for memory and the other for I/O. (IOP)
2. Use one common bus for both memory and I/O but have separate control lines for each.(Isolated I/O)
3. Use one common bus for memory and I/O with common control lines.(Memory Mapped I/O)

9
Isolated I/O
● Isolated I/O uses one common bus to transfer information between memory or I/O and the CPU.
● The distinction between a memory transfer and I/O transfer is made through separate read and write lines.
● The I/O read and I/O write control lines are enabled during an I/O transfer. The memory read and memory
write control lines are enabled during a memory transfer.
● This configuration isolates all I/O interface addresses from the addresses assigned to memory and is referred
to as the isolated I/O method for assigning addresses in a common bus.
● The isolated I/O method isolates memory and I/O addresses so that memory address values are not affected
by interface address assignment since each has its own address space

10
Memory Mapped I/O
● Memory mapped I/O use the same address space for both memory and I/O.
● It employs only one set of read and write signals and does not distinguish between memory and I/O addresses.
● The computer treats an interface register as being part of the memory system.
● The assigned addresses for interface registers cannot be used for memory words, which reduces the memory
address range available.
● There are no specific input or output instructions.
● The CPU can manipulate I/O data residing in interface registers with the same instructions that are used to
manipulate memory words.
● The advantage is that the load and store instructions used for reading and writing from memory can be used
to input and output data from I/O registers.

11
Modes Of Transfer

12
Modes Of Transfer
● Data transfer between the central computer and I/O devices may be handled in a variety of modes.
● Data transfer to and from peripherals may be handled in one of three possible modes:
1. Programmed I/O
2. Interrupt-initiated I/O
3. Direct memory access (DMA)

13
Programmed I/O
● Programmed I/O operations are the result of I/O instructions
written in the computer program.
● Each data item transfer is initiated by an instruction in the program.
● Transferring data under program control requires constant
monitoring of the peripheral by the CPU.
● Once a data transfer is initiated, the CPU is required to monitor the
interface to see when a transfer can again be made.

● Disadvantage : the CPU stays in a program loop until the I/O unit
indicates that it is ready for data transfer. This is a time-consuming
process since it keeps the processor busy needlessly.

14
Interrupt-Initiated I/O
● An alternative to the CPU constantly monitoring the flag is to let the interface inform the computer when it is
ready to transfer data.
● This mode of transfer uses the interrupt facility.
● While the CPU is running a program, it does not check the flag.
● However, when the flag is set, the computer is momentarily interrupted from proceeding with the current
program and is informed of the fact that the flag has been set.
● The CPU deviates from what it is doing to take care of the input or output transfer.
● After the transfer is completed, the computer returns to the previous program to continue what it was doing
before the interrupt.
● The CPU responds to the interrupt signal by storing the return address from the program counter into a
memory stack and then control branches to a service routine that processes the required I/O transfer.
● There are two methods for getting the branch address.
● One is called vectored interrupt and the other, non vectored interrupt.
● In a nonvectored interrupt, the branch address is assigned to a fixed location in memory.
● In a vectored interrupt, the interrupt source supplies the branch information to the computer. This
information is called the interrupt vector.
15
Direct Memory Access

16
Direct Memory Access
● In DMA, the CPU is idle and the peripheral device manages the memory buses directly.
● DMA controller takes over the buses to manage the transfer directly between the I/O device and memory.
● The DMA transfer can be made in several ways.
● Burst Transfer - A block of memory words is transferred in a continuous burst at a time. Used for fast devices
such as magnetic disks, where data transmission cannot be stopped or slowed down until an entire block is
transferred.
● Cycle Stealing - Allows the DMA controller to transfer one data word at a time, after which it must return
control of the buses to the CPU. The CPU merely delays its operation for one memory cycle to allow the
direct memory I/O transfer to “steal” one memory cycle.

17
Working of DMA
● The Bus Request (BR) input is used by the DMA controller to request the CPU to relinquish control of the
buses.
● When this input is active, the CPU terminates the execution of the current instruction and places the address
bus, the data bus, and the read and write lines into a high-impedance state.(like an open circuit)
● The CPU activates the Bus Grant (BG) output to inform the external DMA that the buses are in the
high-impedance state.(available)
● The DMA takes control of the buses to conduct memory transfers without processor intervention.
● When the DMA terminates the transfer, it disables the bus request line and the CPU disables the bus grant,
takes control of the buses, and returns to its normal operation.

18
DMA Controller
● The registers in the DMA are selected by the CPU
through the address bus by enabling the DMA Select
(DS) and Register Select (RS) inputs.
● When the BG (bus grant) = 0, the CPU can
communicate with the DMA registers.
● When BG = 1, the DMA can communicate directly with the
memory.
● The DMA controller has three registers.
● The Address register contains an address to specify the
desired location in memory.
● The Word Count Register holds the number of words
to be transferred.
● The Control Register specifies the mode of transfer.
● The DMA communicates with the external peripheral
through the request and acknowledge lines.

19
DMA Transfer
➜ I/O Device sends a DMA request.
➜ DMA Controller activates the BR line.
➜ CPU responds with BG line.
➜ The DMA puts the current value of its address
register into the address bus, initiates the RD
or WR signal, and sends a DMA acknowledge
to the I/O device.
➜ I/O device puts a word in the data bus (for
write) or receives a word from the data bus
(for read).
➜ The peripheral unit can then communicate
with memory through the data bus for direct
transfer between the two units while the CPU
is momentarily disabled.

20
DMA Transfer
➜ For each word that is transferred, the DMA
increments its address register and
decrements its word count register.
➜ If the word count does not reach zero, the
DMA checks the request line coming from the
I/O Device.
➜ If there is no request ,the DMA disables BR so
that the CPU continues to execute its own
program.
➜ When the I/O device requests another transfer, the
DMA requests the buses again.
➜ If the word count register reaches zero, the
DMA stops any further transfer and removes
its bus request. It also informs the CPU of the
termination by an interrupt.

21
Data Transfer

22
Synchronous vs. Asynchronous Data Transfer

Synchronous Data Transfer
● Internal operations are synchronized by means of clock pulses.
● All data transfers among internal registers occur simultaneously during the occurrence of a clock pulse.
● The registers in the interface share a common clock with the CPU registers.

Asynchronous Data Transfer
● It requires that control signals be transmitted between the communicating units to indicate the time at which data is being transmitted.
● Internal timing of each unit (interface and CPU) is independent.
● Each unit uses its own private clock for internal registers.
23
Strobe Control and Handshaking
● Asynchronous data transfer between two independent units requires that control signals be transmitted
between the communicating units to indicate the time at which data is being transmitted.
● One way of achieving this is by means of a strobe pulse supplied by one of the units to indicate to the other
unit when the transfer has to occur.
● Another method commonly used is to accompany each data item being transferred with a control signal that
indicates the presence of data in the bus.
● The unit receiving the data item responds with another control signal to acknowledge receipt of the data.
● This type of agreement between two independent units is referred to as handshaking.
● The strobe pulse method and the handshaking method of asynchronous data transfer are not restricted to
I/O transfers.
24
Strobe Control
● It employs a single control line to time each transfer.
● The strobe may be activated by either the source or the destination unit.
● The data bus carries the binary information from source to the destination.
● The strobe is a single line that informs the destination unit when a valid
data word is available in the bus.
Source-initiated strobe:
● The source unit first places the data on the data bus.
● After a brief delay to ensure that the data settle to a steady value, the source activates the strobe pulse.
● The information on the data bus and the strobe signal remain in the active state for a period long enough for the destination unit to receive the data.
● The source removes the data from the bus a brief period after it disables its strobe pulse.
Destination-initiated strobe:
● The destination unit activates the strobe pulse, informing the source to provide the data.
● The source unit responds by placing the requested binary information on the data bus.
● The data must be valid and remain on the bus long enough for the destination unit to accept it.
● The destination unit then disables the strobe.
● The source removes the data from the bus after a predetermined time interval.
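The source-initiated strobe sequence can be traced step by step. The following Python sketch models only the ordering of events; the delay value and the variable names are assumptions made for illustration.

# Source-initiated strobe transfer, modelled as an ordered event sequence (sketch).
import time

bus = {"data": None, "strobe": 0}

def source_initiated_strobe(word):
    bus["data"] = word            # 1. source places data on the bus
    time.sleep(0.001)             # 2. brief delay so the data settles
    bus["strobe"] = 1             # 3. source activates the strobe
    received = bus["data"]        # 4. destination latches the data
    bus["strobe"] = 0             # 5. source disables the strobe ...
    bus["data"] = None            # 6. ... then removes the data shortly after
    return received

print(source_initiated_strobe(0xAB))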
25
Handshaking
Disadvantages of the strobe method
● The source unit that initiates the transfer has no way of knowing whether the destination unit has actually
received the data item that was placed in the bus.
● A destination unit that initiates the transfer has no way of knowing whether the source unit has actually
placed the data on the bus.
Two-wire control
● The handshake method introduced a second control signal that provides a reply to the unit that initiates the
transfer.
● One control line is in the same direction as the data flow in the bus from the source to the destination.
● It is used by the source unit to inform the destination unit whether there are valid data in the bus.
● The other control line is in the other direction from the destination to the source.
● It is used by the destination unit to inform the source whether it can accept data.
● The sequence of control during the transfer depends on the unit that initiates the transfer.
● This provides a high degree of flexibility and reliability because the successful completion of a data transfer
relies on active participation by both units.
● If one unit is faulty, the data transfer will not be completed. (Such a failure can be detected by a time-out mechanism.)
26
Source-initiated transfer using handshaking.
● The two handshaking lines are data valid(source unit) and data accepted (destination unit).
● The source unit initiates the transfer by placing the data on the bus and enabling its data valid signal.
● The data accepted signal is activated by the destination unit after it accepts the data from the bus.
● The source unit then disables its data valid signal, which invalidates the data on the bus.
● The destination unit then disables its data accepted signal and the system goes into its initial state.
● The source does not send the next data item until after the destination unit shows its readiness to accept
new data by disabling its data accepted signal.
● This scheme allows delays from one state to the next and permits each unit to respond at its own data
transfer rate.
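The source-initiated handshake can likewise be written as an ordered trace of the two control signals. This Python sketch is an assumed model; the signal names follow the text, everything else is invented for illustration.

# Source-initiated handshake (data valid / data accepted), as an event trace (sketch).
def source_initiated_handshake(word):
    trace = []
    bus = {"data": None, "data_valid": 0, "data_accepted": 0}

    bus["data"], bus["data_valid"] = word, 1          # source: data on bus, data valid = 1
    trace.append("source: data valid = 1")
    received, bus["data_accepted"] = bus["data"], 1   # destination: latch data, data accepted = 1
    trace.append("destination: data accepted = 1")
    bus["data_valid"], bus["data"] = 0, None          # source: invalidate and remove the data
    trace.append("source: data valid = 0")
    bus["data_accepted"] = 0                          # destination: ready for the next item
    trace.append("destination: data accepted = 0")
    return received, trace

word, trace = source_initiated_handshake(0x5A)
print(word, trace)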
27
Destination-initiated transfer using handshaking.
● The source unit in this case does not place data on the bus until after it receives the ready for data signal
from the destination unit.
● From there on, the handshaking procedure follows the same pattern as in the source-initiated case.
● The sequence of events in both cases would be identical if we consider the ready for data signal as the
complement of data accepted.
● The only difference between the source-initiated and the destination-initiated transfer is in their choice of
initial state.
28
Priority Interrupt
29
Priority Interrupt
● In a typical application a number of I/O devices are attached to the computer, with each device being able to
originate an interrupt request.
● The first task of the interrupt system is to identify the source of the interrupt.
● Several sources may request service simultaneously. In this case the system must also decide which device to
service first.
● A priority interrupt is a system that establishes a priority over the various sources to determine which
condition is to be serviced first when two or more requests arrive simultaneously.
● The system may also determine which conditions are permitted to interrupt the computer while another
interrupt is being serviced.
● Devices with high speed transfers such as magnetic disks are given high priority, and slow devices such as
keyboards receive low priority.
● When two devices interrupt the computer at the same time, the computer services the device with the higher priority first.
● Methods:
1. Software - Polling
2. Hardware - Daisy chaining, Parallel Priority.
30
Polling
● A polling procedure is used to identify the highest-priority source by software means.
● In this method there is one common branch address for all interrupts.
● The program that takes care of interrupts begins at the branch address and polls the interrupt sources in
sequence.
● The order in which they are tested determines the priority of each interrupt.
● The highest priority source is tested first, and if its interrupt signal is on, control branches to a service routine
for this source.
● Otherwise, the next-lower-priority source is tested, and so on.
● Disadvantage: If there are many interrupts, the time required to poll them can exceed the time available to
service the I/O device. In this situation a hardware priority-interrupt unit can be used to speed up the
operation.
● A hardware priority-interrupt unit accepts interrupt requests from many sources, determines which of the
incoming requests has the highest priority, and issues an interrupt request to the computer based on this
determination.
● To speed up the operation, each interrupt source has its own interrupt vector to access its own service
routine directly. Thus no polling is required.
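A minimal Python sketch of software polling; the device names and their priority order are hypothetical.

# Software polling of interrupt sources in priority order (sketch).
# Sources are tested highest priority first; the first active one is serviced.
def service_disk():     print("disk service routine")
def service_printer():  print("printer service routine")
def service_keyboard(): print("keyboard service routine")

# The order in the list fixes the priority: disk > printer > keyboard (assumed).
sources = [("disk", service_disk), ("printer", service_printer),
           ("keyboard", service_keyboard)]

def poll(pending):
    for name, routine in sources:
        if pending.get(name):          # test each source's interrupt flag in turn
            routine()
            return name
    return None

print(poll({"disk": False, "printer": True, "keyboard": True}))  # printer is serviced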
31
Daisy Chaining
● The hardware priority function can be established by either a serial or a parallel connection of interrupt lines.
● The serial connection is also known as the daisy chaining method.
● It consists of a serial connection of all devices that request an interrupt.
● The device with the highest priority is placed in the first position, followed by lower-priority devices up to the
device with the lowest priority, which is placed last in the chain.
32
Daisy Chaining
● The interrupt request line is common to all devices and forms a wired logic connection.
● If any device has its interrupt signal in the low-level state, the interrupt line goes to the low-level state and
enables the interrupt input in the CPU.
● When no interrupts are pending, the interrupt line stays in the high-level state and no interrupts are recognized by the CPU.
● The CPU responds to an interrupt request by enabling the interrupt acknowledge line.
● This signal is received by device 1 at its PI (priority in) input.
● The acknowledge signal passes on to the next device through the PO (priority out) output only if device 1 is
not requesting an interrupt.
33
Daisy Chaining
● If device 1 has a pending interrupt, it blocks the acknowledge signal from the next device by placing a 0 in the
PO output.
● It then proceeds to insert its own interrupt vector address (VAD) into the data bus for the CPU to use during
the interrupt cycle.
● A device with a 0 in its PI input generates a 0 in its PO output to inform the next-lower-priority device that
the acknowledge signal has been blocked.
● A device that is requesting an interrupt and has a 1 in its PI input will intercept the acknowledge signal by
placing a 0 in its PO output.
● If the device does not have pending interrupts, it transmits the acknowledge signal to the next device.
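The PI/PO propagation can be sketched as a loop over the devices in priority order. The device names and vector addresses below are hypothetical.

# Daisy-chain priority sketch: the acknowledge signal (PI/PO) ripples down the
# chain and is blocked by the first device with a pending interrupt (assumed model).
def daisy_chain_acknowledge(devices):
    # 'devices' is ordered highest priority first; each entry is (name, requesting, vector).
    pi = 1                                  # CPU drives the acknowledge into device 1
    for name, requesting, vector in devices:
        if pi == 1 and requesting:
            print(f"{name} blocks PO and places VAD {hex(vector)} on the data bus")
            pi = 0                          # PO = 0: lower-priority devices are cut off
        else:
            print(f"{name} passes PI={pi} to PO")

daisy_chain_acknowledge([("device1", False, 0x40),
                         ("device2", True,  0x44),
                         ("device3", True,  0x48)])   # device2 wins, device3 is blocked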
34
Input Output Processor (IOP)
35
Input Output Processor
● A computer system with an IOP can be divided into three separate modules:
1. the memory unit,
2. the CPU, and
3. one or more IOPs.
● The IOP can fetch and execute its own instructions.
● IOP instructions are specifically designed to facilitate I/O transfers.
● The IOP can also perform other processing tasks, such as arithmetic, logic, branching, and code translation.
36
CPU-IOP Communication
● Instructions that are read from memory by an IOP are
sometimes called commands, to distinguish them from
instructions that are read by the CPU.
● The CPU sends an instruction to test the IOP path.
● The IOP responds by inserting a status word in memory for
the CPU to check.
● The bits of the status word indicate the condition of the IOP
and I/O device (IOP overload condition, device busy with
another transfer, or device ready for I/O transfer).
● The CPU refers to the status word in memory to decide
what to do next.
● If all is in order, the CPU sends the instruction to start I/O
transfer.
● The memory address received with this instruction tells the IOP where to find its I/O program.
37
CPU-IOP Communication
● The CPU can now continue with another program while the
IOP is busy with the I/O program.
● Both programs refer to memory by means of DMA transfer.
● When the IOP terminates the execution of its program, it
sends an interrupt request to the CPU.
● The CPU responds to the interrupt by issuing an instruction
to read the status from the IOP.
● The IOP responds by placing the contents of its status
report into a specified memory location.
● The status word indicates whether the transfer has been
completed or if any errors occurred during the transfer.
● From inspection of the bits in the status word, the CPU
determines if the I/O operation was completed
satisfactorily without errors.
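The whole exchange can be sketched in Python as follows; the memory locations, status strings, and method names are invented for illustration and do not correspond to any real IOP instruction set.

# Sketch of the CPU-IOP status-word protocol described above (all names assumed).
class IOP:
    def __init__(self, memory):
        self.memory = memory

    def test_path(self, status_location):
        # IOP answers the CPU's "test path" instruction with a status word in memory.
        self.memory[status_location] = "ready"

    def run_program(self, program_address, status_location):
        # IOP executes its own I/O program via DMA, then reports status and interrupts.
        print(f"IOP: executing I/O program at {hex(program_address)}")
        self.memory[status_location] = "transfer completed, no errors"
        print("IOP: interrupt CPU")

memory = {}
iop = IOP(memory)
iop.test_path(status_location=0x100)
if memory[0x100] == "ready":                     # CPU checks the status word
    iop.run_program(program_address=0x2000, status_location=0x100)
print("CPU reads status:", memory[0x100])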
38