COM 221 - Computer Organisation

Uploaded by Daniel Lumasia

Lecture 1: Introduction

Course outline:
- Components of a modern computer: introduction to the computer system and its submodules
- Number systems and representation of information
- Brief history of computer evolution
- ALU: hardware implementation and operation
- Memory organization: types of memory technology and systems; caches; registers
- Central Processing Unit (CPU) organisation: basics; data representation; instruction sets; control unit
- I/O and system control: polling, interrupts and DMA principles

Reference Books:
- William Stallings, Computer Organization and Architecture: Designing for Performance
- David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 4th Edition
- Andrew S. Tanenbaum, Structured Computer Organization, 4th Edition, Prentice Hall

Introduction
Representation of Basic Information
The basic functional units of a computer are made of electronic circuits and work with electrical signals. Input is provided to the computer in the form of electrical signals, and output is obtained in the form of electrical signals.
There are two basic types of electrical signals:
- Analog signals, which are continuous in nature
- Digital signals, which are discrete in nature
An electronic device that works with continuous signals is known as an analog device, and an electronic device that works with discrete signals is known as a digital device.

A computer is a digital device, which works on two levels of signal:

- The high-level signal corresponds to a higher voltage (say 5 V)
- The low-level signal corresponds to a lower voltage (say 0 V)
In the binary system, High represents 1 and Low represents 0. With the symbols 0 and 1, we have a mathematical system, which is known as the binary number system.
The binary number system is used to represent and manipulate information in the computer. This information is basically strings of 0s and 1s.
The smallest unit of information represented in a computer is known as a bit (binary digit), which is either 0 or 1. Four bits together are known as a nibble, and eight bits together are known as a byte.

Computer Organization and Architecture


Computer technology has made incredible improvements in the past half century. In the early part of computer evolution there were no stored-program computers, computational power was low, and on top of that the machines were very large.
This rapid rate of improvement has come both from advances in the technology used to build computers and from innovation in computer design.
The task that the computer designer handles is a complex one: determine what attributes are important for a new machine, then design a machine to maximize performance while staying within cost constraints. This task has many aspects, including instruction set design, functional organization, logic design, and implementation.
When looking at the task of computer design, both the terms computer organization and computer architecture come into the picture.

Computer architecture refers to those parameters of a computer system that are visible to a
programmer or those parameters that have a direct impact on the logical execution of a program.
Examples of architectural attributes include the instruction set, the number of bits used to
represent different data types, I/O mechanisms, and techniques for addressing memory.

Computer organization refers to the operational units and their interconnections that realize the
architectural specifications. Examples of organizational attributes include those hardware details
transparent to the programmer, such as control signals, interfaces between the computer and
peripherals, and the memory technology used.

Basic Computer Model and different units of Computer


The model of a computer can be described by four basic units at a high level of abstraction. These basic units are:
- Central Processor Unit
- Input Unit
- Output Unit
- Memory Unit

A. Central Processor Unit [CPU] :


The central processor unit consists of two basic blocks:
- The program control unit, which has a set of registers and control circuits to generate control signals.
- The execution unit or data processing unit, which contains a set of registers for storing data and an Arithmetic and Logic Unit (ALU) for the execution of arithmetic and logical operations.
In addition, the CPU may have some additional registers for temporary storage of data.

B. Input Unit :
With the help of the input unit, data from outside can be supplied to the computer. A program or data is read into main storage from an input device or secondary storage under the control of a CPU input instruction.
Examples of input devices: keyboard, mouse, hard disk, floppy disk, CD-ROM drive, etc.
C. Output Unit :
With the help of the output unit, computed results can be provided to the user, or they can be stored in a storage device permanently for future use. Output data go from main storage to the output device under the control of CPU output instructions.
Examples of output devices: printer, monitor, plotter, hard disk, floppy disk, etc.

D. Memory Unit :
The memory unit is used to store data and programs. The CPU can work with the information stored in the memory unit. This memory unit is termed primary memory or the main memory module. These are basically semiconductor memories.
There are two types of semiconductor memories:
- Volatile memory: RAM (Random Access Memory).
- Non-volatile memory: ROM (Read-Only Memory), PROM (Programmable ROM), EPROM (Erasable PROM), EEPROM (Electrically Erasable PROM).

Secondary Memory :
There is another kind of storage device, apart from primary or main memory, which is known as
secondary memory.
Secondary memories are non-volatile and are used for permanent storage of data and programs.
Examples of secondary memories:
Hard disk, floppy disk, magnetic tape ------ these are magnetic devices
CD-ROM ------ an optical device
Thumb drive (or pen drive) ------ a semiconductor memory

Buses
The processor, main memory, and I/O devices can be interconnected through common data communication lines, which are termed a common bus.
The primary function of a common bus is to provide a communication path between the devices for the transfer of data.
The bus includes the control lines needed to support interrupts and arbitration.
The bus lines used for transferring data may be grouped into three categories:
- data
- address
- control lines.
The bus control signals also carry timing information: they specify the times at which the processor and the I/O devices may place data on the bus or receive data from the bus.
Several schemes exist for handling the timing of data transfer over a bus.
These can be broadly classified as
I) Synchronous Bus:
In a synchronous bus, all the devices are synchronised by a common clock, so all devices derive timing information from a common clock line of the bus. A clock pulse on this common clock line defines equal time intervals.
In the simplest form of a synchronous bus, each clock pulse constitutes a bus cycle during which one data transfer can take place.
II) Asynchronous Bus
In the asynchronous mode of transfer, a handshake signal is used between master and slave.
In an asynchronous bus there is no common clock; the common clock signal is replaced by two timing control signals: master-ready and slave-ready.
The master-ready signal is asserted by the master to indicate that it is ready for a transaction, and the slave-ready signal is the response from the slave.

The handshaking protocol proceeds as follows:


- The master places the address and command information on the bus. Then it indicates to all devices that it has done so by activating the master-ready signal.
- This causes all devices on the bus to decode the address.
- The selected target device performs the required operation and informs the processor (or master) by activating the slave-ready line.
- The master waits for slave-ready to become asserted before it removes its signals from the bus.
- In the case of a read operation, it also strobes the data into its input buffer.

Stored Program
Present-day digital computers are based on the stored-program concept introduced by Von Neumann. In this stored-program concept, programs and data are stored in a storage unit called memory.
The Central Processing Unit, the main component of the computer, can work only with the information stored in the storage unit.
In 1946, Von Neumann and his colleagues began the design of a stored-program computer at the Institute for Advanced Studies in Princeton. This computer is referred to as the IAS computer.
- Assignment: What is the structure of the IAS computer?

NUMBER SYSTEM AND REPRESENTATION


Binary Number System
A computer can handle two types of signals, so to represent any information in the computer we have to take the help of these two signals. These two signals correspond to two levels of electrical signals, and symbolically we represent them as 0 and 1.
In our day to day activities for arithmetic, we use the Decimal Number System. The decimal
number system is said to be of base, or radix 10, because it uses ten digits and the coefficients
are multiplied by power of 10.
A decimal number such as 5273 represents a quantity equal to 5 thousands plus 2 hundreds plus 7 tens plus 3 units.
The thousands, hundreds, etc. are powers of 10 implied by the position of the coefficients. To be more precise, 5273 should be written as:

5 × 10^3 + 2 × 10^2 + 7 × 10^1 + 3 × 10^0
In the decimal number system we need 10 different symbols, but in the computer we have provision to represent only two symbols. So we cannot use the decimal number system directly in computer arithmetic.
For computer arithmetic the binary number system is used. The binary number system uses two symbols to represent numbers, and these two symbols are 0 and 1.
The binary number system is said to be of base 2, or radix 2, because it uses two digits and the coefficients are multiplied by powers of 2.
The binary number 110011 represents the quantity:

1 × 2^5 + 1 × 2^4 + 0 × 2^3 + 0 × 2^2 + 1 × 2^1 + 1 × 2^0 = 51 (in decimal)

So we can use the binary number system for computer arithmetic.
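The positional-weight rule above can be checked with a short Python sketch (the function name is illustrative, not part of the original notes):

```python
def binary_to_decimal(bits: str) -> int:
    """Sum each digit times its power-of-two positional weight."""
    value = 0
    for position, digit in enumerate(reversed(bits)):
        value += int(digit) * 2 ** position
    return value

print(binary_to_decimal("110011"))  # 32 + 16 + 2 + 1 = 51
```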

Representation of Unsigned Integers


Any integer can be stored in the computer in binary form. For example:
the binary equivalent of the integer 107 is 1101011, so 1101011 is stored to represent 107.

What is the size of Integer that can be stored in a Computer?


It depends on the word size of the computer. If we are working with an 8-bit computer, then we can use only 8 bits to represent the number; an 8-bit computer means the storage organization for data is 8 bits.
In the case of 8-bit numbers, the minimum number that can be stored is 00000000 (0) and the maximum number is 11111111 (255).
The domain of numbers is restricted by the storage capacity of the computer.
In general, for an n-bit number, the range for natural numbers is from 0 to 2^n - 1.

Example: addition.
Any arithmetic operation can be performed with the help of the binary number system. Consider the following two examples, where decimal and binary additions are shown side by side.

a)    1101000   104        b)   10000001   129
    +  110001    49            + 10101010   170
    ---------  -----           ----------  -----
     10011001   153            100101011   299

In a) the result is an 8-bit number, so it can be stored in the 8-bit computer and we get the correct result.
In b) the result is a 9-bit number, but we can store only 8 bits, and the most significant bit (MSB) cannot be stored. The result of this addition will be stored as 00101011, which is 43 and is not the desired result. This is known as overflow.
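The overflow behaviour above can be simulated in Python by masking the sum to 8 bits, just as an 8-bit machine discards the 9th bit (the function name is illustrative):

```python
def add_8bit(a: int, b: int) -> int:
    """Add two unsigned numbers, keeping only the low 8 bits, as an 8-bit machine would."""
    return (a + b) & 0xFF  # any bits above the 8th are lost

# a) the true sum fits in 8 bits, so the result is correct
print(add_8bit(104, 49))   # 153
# b) the true sum is 299, a 9-bit number, so the stored result overflows
print(add_8bit(129, 170))  # 43
```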

For convenience, while writing on paper, we may take the help of other number systems such as octal and hexadecimal. This reduces the burden of writing long strings of 0s and 1s.

Octal number : The octal number system is said to be of base, or radix, 8 because it uses 8 digits and the coefficients are multiplied by powers of 8.
The eight digits used in the octal system are: 0, 1, 2, 3, 4, 5, 6 and 7.

Hexadecimal number : The hexadecimal number system is said to be of base, or radix, 16 because it uses 16 symbols and the coefficients are multiplied by powers of 16.
The sixteen digits used in the hexadecimal system are: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E and F.
For example, in hexadecimal, 2B + 84 = AF (43 + 132 = 175 in decimal).
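Python's built-in conversions make it easy to check the octal and hexadecimal shorthand for a binary value (an illustrative sketch):

```python
n = 0b100101011       # 299, awkward to read as a raw bit string
print(oct(n))         # octal groups the bits three at a time
print(hex(n))         # hexadecimal groups the bits four at a time
print(int("AF", 16))  # parse a hexadecimal string back to decimal
```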

Signed Integer
We know that for an n-bit number, the range for natural numbers is from 0 to 2^n - 1.
With n bits we have altogether 2^n different combinations, and we use these combinations to represent numbers ranging from 0 to 2^n - 1.
If we want to include negative numbers, the range naturally decreases: half of the combinations are used for positive numbers and the other half for negative numbers.
For an n-bit signed-magnitude representation, the range is from -(2^(n-1) - 1) to +(2^(n-1) - 1).
For example, if we consider 8-bit numbers, the range for natural numbers is from 0 to 255, but for signed integers the range is from -127 to +127.

Representation of signed integer


There are three different schemes to represent negative number:
- Signed-Magnitude form.
- 1’s complement form.
- 2’s complement form.

a) Signed magnitude form:


In signed-magnitude form, one particular bit is used to indicate the sign of the number, whether it is a positive or a negative number. The other bits are used to represent the magnitude of the number.
For an n-bit number, one bit is used to indicate the sign information and the remaining (n - 1) bits are used to represent the magnitude. Therefore, the range is from -(2^(n-1) - 1) to +(2^(n-1) - 1).
Generally, the Most Significant Bit (MSB) is used to indicate the sign, and it is termed the sign bit:
- 0 in the sign bit indicates a positive number
- 1 in the sign bit indicates a negative number

For example, 01011001 represents +89 and

11011001 represents -89

What is 00000000 and 10000000 in signed magnitude form?
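Sign-magnitude encoding can be sketched in Python (the helper name and the 8-bit default are illustrative choices):

```python
def to_sign_magnitude(value: int, n_bits: int = 8) -> str:
    """Encode an integer in n-bit signed-magnitude form: MSB is the sign, the rest is |value|."""
    assert abs(value) < 2 ** (n_bits - 1), "magnitude does not fit in n-1 bits"
    sign = "1" if value < 0 else "0"
    return sign + format(abs(value), f"0{n_bits - 1}b")

print(to_sign_magnitude(89))    # 01011001
print(to_sign_magnitude(-89))   # 11011001
```

Note that 0 and -0 get different encodings (00000000 and 10000000), which is exactly the question posed above.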

b) 1's complement form:


Consider the eight-bit number 01011100; the 1's complement of this number is 10100011 (every bit inverted). If we perform the following addition:

  01011100
+ 10100011
----------
  11111111

the result is all 1s. If we now add 1 to this result, we get 100000000. Since we are considering eight-bit numbers, the 9th bit (the carry) of the result cannot be stored, so the final result is 00000000.
Since the sum of the two numbers (plus 1) is 0, one number can be treated as the negative of the other.
So, the 1's complement can be used to represent negative numbers.
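The 1's-complement property can be verified with a small Python sketch (helper names are illustrative):

```python
def ones_complement(bits: str) -> str:
    """Invert every bit of a binary string."""
    return "".join("1" if b == "0" else "0" for b in bits)

x = "01011100"
c = ones_complement(x)                     # 10100011
total = int(x, 2) + int(c, 2)
print(c, format(total, "08b"))             # a number plus its 1's complement is all 1s
print(format((total + 1) & 0xFF, "08b"))   # adding 1 and dropping the carry gives 0
```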

c) 2's complement form:


Consider the eight-bit number 01011100; the 2's complement of this number is 10100100 (invert every bit, then add 1). If we perform the following addition:

  01011100
+ 10100100
----------
 100000000

Since we are considering eight-bit numbers, the 9th bit (the carry) of the result cannot be stored, so the final result is 00000000.
Since the sum of the two numbers is 0, one number can be treated as the negative of the other.
So, the 2's complement can be used to represent negative numbers.
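Likewise, the 2's complement (invert, then add 1) and its wrap-around-to-zero property can be sketched in Python (the helper name is illustrative):

```python
def twos_complement(bits: str) -> str:
    """Invert every bit, then add 1, staying within the same bit width."""
    n = len(bits)
    inverted = int(bits, 2) ^ ((1 << n) - 1)          # flip all n bits
    return format((inverted + 1) & ((1 << n) - 1), f"0{n}b")

x = "01011100"
c = twos_complement(x)                                 # 10100100
print(c)
print(format((int(x, 2) + int(c, 2)) & 0xFF, "08b"))   # sum mod 2^8 is 00000000
```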

Representation of Real Number

The binary representation of 41.6875 is 101001.1011. Therefore any real number can be converted to the binary number system. There are two schemes to represent real numbers:
Fixed-point representation
Floating-point representation
a) Fixed-point representation:
The binary representation of 41.6875 is 101001.1011.
To store this number, we have to store two pieces of information:
-- the part before the binary point, and
-- the part after the binary point.
This is known as fixed-point representation, where the position of the binary point is fixed and the numbers of bits before and after the binary point are predefined.
If we use 16 bits before the binary point and 8 bits after it, in signed-magnitude form, the representable magnitudes run from 2^-8 (the smallest nonzero value) up to 2^16 - 2^-8, so the range is from -(2^16 - 2^-8) to +(2^16 - 2^-8).
One bit is required for the sign information, so the total size of the number is 25 bits, that is, 1 (sign) + 16 (before the binary point) + 8 (after the binary point).

b) Floating-point representation:
In this representation, numbers are represented by a mantissa comprising the significant digits and an exponent part of radix R. The format is:

  N = M × R^E

where M is the mantissa and E is the exponent.
Numbers are often normalized, such that the radix point is placed to the right of the first nonzero digit.
For example, the decimal number 5236 is equivalent to 5.236 × 10^3.
To store this number in floating-point representation, we store 5236 in the mantissa part and 3 in the exponent part.
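Normalization can be sketched in Python by repeatedly scaling by the radix until exactly one nonzero digit remains before the point (the function name is illustrative; real hardware uses binary formats such as IEEE 754 rather than this loop):

```python
def normalize(value: float, radix: int = 10):
    """Split a nonzero number into (mantissa, exponent) with one digit before the point."""
    exponent = 0
    m = abs(value)
    while m >= radix:       # too big: shift the point left
        m /= radix
        exponent += 1
    while 0 < m < 1:        # too small: shift the point right
        m *= radix
        exponent -= 1
    return (m if value >= 0 else -m), exponent

print(normalize(5236))  # 5236 = 5.236 x 10^3
```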

Representation of Character
Since we are working with 0s and 1s only, to represent characters in the computer we also use strings of 0s and 1s.
To represent characters we use a coding scheme, which is a mapping function. Some standard coding schemes are:
ASCII : American Standard Code for Information Interchange.
It uses a 7-bit code. Altogether we have 128 combinations of 7 bits, so we can represent 128 characters.
For example, 65 = 1000001 represents the character 'A'.
EBCDIC : Extended Binary Coded Decimal Interchange Code.
It uses an 8-bit code, so we can represent 256 characters.
UNICODE : It is used to capture most of the languages of the world. It uses a 16-bit code.
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others.
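Python exposes the character-to-code mapping directly through ord and chr, which makes the 7-bit ASCII code easy to inspect:

```python
# Each character maps to a numeric code point; ASCII codes fit in 7 bits
print(ord("A"))                    # 65
print(format(ord("A"), "07b"))     # 1000001, the 7-bit pattern stored for 'A'
print(chr(65))                     # A, mapping the code back to the character
```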

COMPUTER EVOLUTION AND PERFORMANCE

The Brief History of Computer Architecture

First Generation (1940-1950) :: Vacuum Tube


ENIAC [1945]: Designed by Mauchly & Eckert, built by the US Army to calculate trajectories for ballistic shells during World War II. Around 18,000 vacuum tubes and 1,500 relays were used to build ENIAC, and it was programmed by manually setting switches.
UNIVAC [1950]: the first commercial computer.
John Von Neumann architecture: Goldstine and Von Neumann took the idea of ENIAC and developed the concept of storing a program in memory. This is known as the Von Neumann architecture and has been the basis for virtually every machine designed since then.

Features:
- Electron emitting devices
- Data and programs are stored in a single read-write memory
- Memory contents are addressable by location, regardless of the content itself
- Machine language/Assembly language
- Sequential execution

Second Generation (1950-1964) :: Transistors


William Shockley, John Bardeen, and Walter Brattain invented the transistor, which reduced the size of computers and improved reliability. Vacuum tubes were replaced by transistors.
- First operating Systems: handled one program at a time
- On-off switches controlled electronically.
- High level languages
- Floating point arithmetic

Third Generation (1964-1974) :: Integrated Circuits (IC)


- Integrated circuits combine thousands of transistors; an entire circuit fits on one chip.
- Semiconductor memory
- Multiple computer models with different performance characteristics
- The size of computers has been reduced drastically

Fourth Generation (1974-Present) :: Very Large-Scale Integration (VLSI) / Ultra Large Scale
Integration (ULSI)
- Combines millions of transistors
- Single-chip processor and the single-board computer emerged
- Creation of the Personal Computer (PC)
- Use of data communications
- Massively parallel machine
A Brief History of Computer Organization
If computer architecture is a view of the whole design with the important characteristics visible to the programmer, computer organization is how those features are implemented with the specific building blocks visible to the designer, such as control signals, interfaces, memory technology, etc.

A stored program computer has the following basic units:


Processor -- center for manipulation and control
Memory -- storage for the instructions and data of currently executing programs
I/O system -- controllers which communicate with "external" devices: secondary memory, display devices, networks
Data-path & control -- collections of parallel wires that transmit data, instructions, or control signals

Computer organization defines the ways in which these components are interconnected and controlled. It
is the capabilities and performance characteristics of those principal functional units.

Under a rapidly changing set of forces, computer technology keeps changing dramatically, for example:
- Processor clock rate: about 20% increase a year
- Logic capacity: about 30% increase a year
- Memory speed: about 10% increase a year
- Memory capacity: about 60% increase a year
- Cost per bit: improves about 25% a year
- Disk capacity: about 60% increase a year

Computer organization has also made its historic progression accordingly.

The advance of microprocessor ( Intel)


1974: 8080 - the first general-purpose microprocessor, 8-bit data path, used in the first personal computer
1978: 8086 - much more powerful, with 16 bits, 1 MB addressable memory, an instruction cache, and prefetching of a few instructions
1980: 8087 - the floating-point coprocessor is added
1982: 80286 - 16 MB addressable memory space (24-bit addresses), plus additional instructions
1985: 80386 - 32 bits, new addressing modes and support for multitasking
1989 -- 1995:
- 80486 - 25, 33 MHz, 1.2 M transistors, 5-stage pipeline, sophisticated powerful cache and instruction pipelining, built-in math co-processor.
- Pentium - 60, 66 MHz, 3.1 M transistors, branch predictor, pipelined floating point, multiple instructions executed in parallel, first superscalar IA-32.
- Pentium Pro - increased superscalar execution, register renaming, branch prediction, data flow analysis, and speculative execution.
1995 -- 1997: Pentium II - 233 - 300 MHz, 7.5 M transistors, first compaction of micro-architecture, MMX technology, graphics, video and audio processing.
1999: Pentium III - additional floating-point instructions for 3D graphics
2000: Pentium 4 - further floating-point and multimedia enhancements

Evolution of Memory

1970: RAM /DRAM, 4.77 MHz


1987: FPM - fast page mode DRAM, 20 MHz
1995: EDO - Extended Data Output, which increases the read cycle between memory and CPU, 20 MHz
1997 - 1998: SDRAM - Synchronous DRAM, which synchronizes itself with the CPU bus and runs at higher clock speeds, PC66 at 66 MHz, PC100 at 100 MHz
1999: RDRAM - Rambus DRAM, a DRAM with very high bandwidth, 800 MHz
1999 - 2000: SDRAM - PC133 at 133 MHz, DDR at 266 MHz
2001: EDRAM - Enhanced DRAM, which is dynamic or power-refreshed RAM, also known as cached DRAM

Major buses and their features


A bus is a parallel circuit that connects the major components of a computer, allowing the transfer of electric impulses from one connected component to any other.
VESA - Video Electronics Standards Association: 32 bits, relied on the 486 processor to function
ISA - Industry Standard Architecture: 8-bit or 16-bit width, 8.3 MHz speed, 7.9 or 15.9 MB/s bandwidth respectively
EISA - Extended Industry Standard Architecture: 32 bits, 8.3 MHz, 31.8 MB/s bandwidth, the attempt to compete with IBM's MCA
PCI - Peripheral Component Interconnect: 32 bits, 33 MHz, 127.2 MB/s bandwidth
PCI-X - up to 133 MHz bus speed, 64-bit width, 1 GB/s throughput
AGP - Accelerated Graphics Port: 32 bits, 66 MHz, 254.3 MB/s bandwidth

Major ports and connectors/interface


IDE - Integrated Drive Electronics, also known as ATA, EIDE, Ultra ATA, Ultra DMA; the most widely used interface for hard disks
PS/2 port - mini-DIN plug with 6 pins for a mouse and keyboard
SCSI - Small Computer System Interface, 80 - 640 Mbit/s, capable of handling internal/external peripherals
Serial port - adheres to the RS-232C spec, uses a DB9 or DB25 connector, capable of speeds up to 115 kbit/s
Parallel port - also known as the printer port; enhanced types: ECP - extended capabilities port, EPP - enhanced parallel port
USB - Universal Serial Bus, two types: 1.0 and 2.0; hot plug-and-play; 1.0 runs at 12 Mbit/s with up to 127 devices in a chain; 2.0's data rate is 480 Mbit/s
FireWire - high-speed serial port, 400 Mbit/s, hot plug-and-play, about 30 times faster than USB 1.0
LECTURE 2: ALU (ARITHMETIC LOGIC UNIT)
The ALU is responsible for performing operations in the computer.
The basic operations are implemented at the hardware level. The ALU provides a collection of two types of operations:
- Arithmetic operations
- Logical operations
The ALU consists of hardware implementations of the basic operations. These basic operations can be used to implement more complicated operations that are not feasible to implement directly in hardware.
Several logic gates exist in digital logic circuits, and these logic gates can be used to implement the logical operations. Some of the common logic gates are:
a) AND gate: The output is high if both the inputs are high.

b) OR gate: The output is high if any one of the input is high.

c) EX-OR gate: The output is high if exactly one of the inputs is high


If we want to construct a circuit that will perform the AND operation on two 4-bit numbers, the implementation of the 4-bit AND operation uses one AND gate per bit position, as shown in the figure:

4-bit AND operator.
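In software, the same bit-parallel AND can be sketched with Python's bitwise operator, masking the result to 4 bits (the function name is illustrative):

```python
def and_4bit(a: int, b: int) -> int:
    """Bitwise AND of two 4-bit numbers; each output bit corresponds to one AND gate."""
    return (a & b) & 0xF

print(format(and_4bit(0b1101, 0b1011), "04b"))  # 1101 AND 1011 = 1001
```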

Arithmetic Circuit

Binary Adder : Binary adder is used to add two binary numbers.


In general, the adder circuit needs two binary inputs and two binary outputs. The input variables designate the augend and addend bits; the output variables produce the sum and carry.
The binary addition operation on single bits is shown in the truth table below:

  x  y | C  S
  0  0 | 0  0
  0  1 | 0  1
  1  0 | 0  1
  1  1 | 1  0

C: Carry Bit   S: Sum Bit

The simplified sum-of-products expressions are:

  S = x'y + xy' (i.e. S = x XOR y)
  C = xy

The circuit implementation follows directly from these two expressions.

This circuit cannot handle a carry input, so it is termed a half adder.
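The half-adder truth table maps directly onto XOR and AND, as this Python sketch shows (the function name is illustrative):

```python
def half_adder(x: int, y: int):
    """Sum is the XOR of the inputs, carry is the AND -- exactly the truth table above."""
    return x ^ y, x & y  # (S, C)

for x in (0, 1):
    for y in (0, 1):
        s, c = half_adder(x, y)
        print(x, y, "-> C =", c, "S =", s)
```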
Full Adder:
A full adder is a combinational circuit that forms the arithmetic sum of three bits. It consists of three inputs and two outputs. Two of the input variables, denoted by x and y, represent the two bits to be added. The third input, z, represents the carry from the previous lower position.
The two outputs are designated by the symbols S for sum and C for carry:

  S = x XOR y XOR z
  C = xy + xz + yz

The circuit diagram of the full adder is shown in the figure. n such single-bit full adder blocks are used to make an n-bit full adder.

To demonstrate the binary addition of four-bit numbers, let us consider a specific example. Consider two binary numbers:

  A = 1001    B = 0011

Their sum is 1100 (9 + 3 = 12 in decimal). To build the four-bit adder, we have to use 4 full adder blocks. The carry output of each lower bit is used as the carry input to the next higher bit.
The circuit of the 4-bit adder is shown in the figure.
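The full adder equations and the ripple-carry chain can be sketched in Python (function names and the MSB-first bit lists are illustrative choices):

```python
def full_adder(x: int, y: int, z: int):
    """One-bit full adder: S = x XOR y XOR z, C = xy + xz + yz."""
    s = x ^ y ^ z
    c = (x & y) | (x & z) | (y & z)
    return s, c

def ripple_add(a: list, b: list):
    """Chain one full adder per bit; lists are MSB-first, the carry ripples from the low end."""
    carry, out = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        s, carry = full_adder(x, y, carry)
        out.append(s)
    return list(reversed(out)), carry

print(ripple_add([1, 0, 0, 1], [0, 0, 1, 1]))  # 1001 + 0011 = 1100, carry-out 0
```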

Binary subtractor : The subtraction operation can be implemented with the help of the binary adder circuit, because

  A - B = A + (2's complement of B)

We know that the 2's complement representation of a number is treated as the negative of the given number.
We can get the 2's complement of a given number by complementing each bit and adding 1 to it.
The circuit for subtracting A - B consists of an adder with an inverter placed between each data input B and the corresponding input of the full adder. The input carry must be equal to 1 when performing subtraction. The operation thus performed becomes A plus the 1's complement of B plus 1, which is equal to A plus the 2's complement of B. With this principle, a single circuit can be used for both addition and subtraction.
With a mode input M feeding the inverters and the input carry:
- If M = 0, the circuit performs A + B.
- If M = 1, the circuit performs A + (1's complement of B) + 1, i.e. A - B.

4-bit adder subtractor
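The adder-subtractor principle (XOR the B inputs with the mode bit M and inject M as the input carry) can be sketched for 4 bits in Python (the function name is illustrative):

```python
def add_sub_4bit(a: int, b: int, m: int) -> int:
    """4-bit adder-subtractor: mode M = 0 adds; M = 1 inverts B via XOR and injects carry 1."""
    b_in = (b ^ (0xF if m else 0)) & 0xF   # the inverters are realized as XOR gates with M
    return (a + b_in + m) & 0xF            # carry-out beyond 4 bits is dropped

print(format(add_sub_4bit(0b1001, 0b0011, 0), "04b"))  # 9 + 3 = 12 -> 1100
print(format(add_sub_4bit(0b1001, 0b0011, 1), "04b"))  # 9 - 3 = 6  -> 0110
```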

Multiplication
Multiplication of two numbers in binary representation can be performed by a process of SHIFT and ADD operations. Since the binary number system allows only 0s and 1s, digit multiplication can be replaced by SHIFT and ADD operations alone, because multiplying by 1 gives the number itself and multiplying by 0 produces 0.
For example:

      1011     (11)
    × 1101     (13)
    ------
      1011     1 × multiplicand
     0000      0 × multiplicand, shifted left
    1011       1 × multiplicand, shifted left again
   1011        1 × multiplicand, shifted left again
  ----------
  10001111     (143)

The process consists of looking at successive bits of the multiplier, least significant bit first. If the multiplier bit is 1, the multiplicand is copied down; otherwise, zeros are copied down. The numbers copied down in successive lines are shifted one position to the left from the previous number. Finally, the numbers are added and their sum forms the product.

When multiplication is implemented in a digital computer, the process is changed slightly.


Instead of providing registers to store and add simultaneously as many binary numbers as there are bits in the multiplier, it is convenient to provide an adder for the summation of only two binary numbers and to successively accumulate the partial products in a register. This reduces the number of registers required.
Instead of shifting the multiplicand to the left, the partial product is shifted to the right. When the corresponding bit of the multiplier is 0, there is no need to add all zeros to the partial product.
Now consider an algorithm to multiply two binary numbers. Suppose the ALU does not provide a multiplication operation but does provide addition and shifting. Then we can write a microprogram for the multiplication operation and store the microprogram code in memory. When a multiplication operation is encountered, the machine executes this microcode to perform the multiplication.
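The shift-and-add algorithm can be sketched in Python; here the multiplicand is shifted left, as in the paper-and-pencil method above (the function name is illustrative):

```python
def shift_add_multiply(multiplicand: int, multiplier: int) -> int:
    """Examine multiplier bits LSB-first; add the shifted multiplicand for each 1 bit."""
    product = 0
    while multiplier:
        if multiplier & 1:             # current multiplier bit is 1
            product += multiplicand    # add the (already shifted) multiplicand
        multiplicand <<= 1             # shift left for the next bit position
        multiplier >>= 1               # move to the next multiplier bit
    return product

print(shift_add_multiply(0b1011, 0b1101))  # 11 * 13 = 143
```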
Lecture 3: MEMORIES

The digital computer works on the stored-program concept introduced by Von Neumann.
Memory is used to store information, which includes both programs and data.
For several reasons, we have different kinds of memory, and we use different kinds of memory at different levels.
The memory of a computer is broadly divided into two categories:
- Internal
- External
Internal memory is used by the CPU to perform its tasks, and external memory is used to store bulk information, which includes large software and data.

Memory is used to store the information in digital form. The memory hierarchy is given by:
- Register
- Cache Memory
- Main Memory
- Magnetic Disk
- Removable media (Magnetic tape)

Register:
Registers are a part of the Central Processor Unit, so they reside inside the CPU. Information from main memory is brought to the CPU and kept in registers. Due to space and cost constraints, we have only a limited number of registers in a CPU. These are the fastest storage devices.

Cache Memory:
Cache memory is a storage device placed between the CPU and main memory. These are semiconductor memories, and they are fast memory devices, faster than main memory.
We cannot have a large volume of cache memory due to its higher cost and some constraints of the CPU; for the same reason we cannot replace the whole main memory with faster memory.
Generally, the most recently used information is kept in the cache memory: it is brought from the main memory and placed in the cache memory.

Main Memory:
Like cache memory, main memory is also a semiconductor memory, but it is relatively slower. We first have to bring information (whether data or program) into main memory; the CPU can work only with the information available in main memory.

Magnetic Disk:
This is a bulk storage device. We have to deal with huge amounts of data in many applications, but we do not have enough semiconductor memory to keep all this information in the computer. Moreover, semiconductor memories are volatile in nature: they lose their content once we switch off the computer. For permanent storage, we use the magnetic disk, whose storage capacity is very high.
Removable media:
For different applications, we use different data, and it may not be possible to keep all the information on a magnetic disk. So whichever data we are not using currently can be kept on removable media. Magnetic tape is one kind of removable medium. A CD is also a removable medium; it is an optical device.

Internal Memories
a) Cache Memory:
A cache is a very fast memory which is placed between the main memory and the CPU, and used to hold segments of the program and data of the main memory.
Cache Memory Features


- It is transparent to the programmer.
- Only a small part of the program/data in the main memory has a copy in the cache (e.g., an 8 KB cache with 8 MB of memory).
- If the CPU wants to access program/data not in the cache (called a cache miss), the relevant block of the main memory is copied into the cache.
- Memory accesses in the near future will usually refer to the same word or words in the neighborhood, and will not have to involve the main memory. This property of program execution is known as locality of reference.
Locality of Reference

Temporal locality: If an item is referenced, it will tend to be referenced again soon.


Spatial locality: If an item is referenced, items whose addresses are close by will tend to be
referenced soon.
This access pattern is referred to as the locality of reference principle, an intrinsic feature of
the von Neumann architecture. It arises from sequential instruction storage, loops and
iterations (e.g., subroutine calls), and sequential data storage (e.g., arrays).
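As an illustrative sketch (the base address and element size here are assumptions, not from the lecture), summing an array shows both kinds of locality: consecutive elements give spatial locality, and the reuse of the loop accumulator each iteration gives temporal locality.

```python
# Sketch: the sequence of addresses touched when summing a[0..n-1],
# assuming the array starts at address 1000 and each element is one word.

def simulate_accesses(n):
    base = 1000  # assumed base address, purely illustrative
    return [base + i for i in range(n)]

addresses = simulate_accesses(8)
# Spatial locality: every access is adjacent to the previous one.
gaps = [b - a for a, b in zip(addresses, addresses[1:])]
print(addresses)  # [1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007]
print(gaps)       # [1, 1, 1, 1, 1, 1, 1]
```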
Cache Design issues

a) Mapping function

i) Direct Mapping: Each block of the main memory is mapped into a fixed cache slot, e.g., a
10,000-word main memory and a 100-word cache, with every 10 memory cells grouped into a block.

Pros and cons

- Simple and therefore inexpensive to implement, since each block has a fixed location.
- If a program repeatedly accesses two blocks that map to the same cache slot, the cache miss
rate is very high.
ii) Associative Mapping: A main memory block can be loaded into any slot of the cache.
To determine if a block is in the cache, a mechanism is needed to simultaneously examine every
slot’s tag.
iii) Set Associative Organization: The cache is divided into a number of sets (K). Each set
contains a number of slots (W). A given block maps to any slot in one particular set, e.g.,
block i can be in any slot of set j, where j = i mod K.
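A rough sketch of the mapping rules (the cache sizes below are assumed for illustration, not the 10,000/100-word example above): the target slot or set for a given block number is just a modulo.

```python
# Illustrative cache-mapping sketch (parameters are assumptions).

NUM_SLOTS = 8                  # total cache slots
WAYS = 2                       # slots per set (2-way set associative)
NUM_SETS = NUM_SLOTS // WAYS   # = 4 sets

def direct_slot(block):
    """Direct mapping: block i always goes to slot i mod NUM_SLOTS."""
    return block % NUM_SLOTS

def set_for_block(block):
    """Set associative: block i may occupy any of the WAYS slots in set i mod NUM_SETS."""
    return block % NUM_SETS

# Blocks 3 and 11 collide on the same slot under direct mapping ...
print(direct_slot(3), direct_slot(11))      # 3 3
# ... but in the 2-way set-associative cache they share a set that has
# two slots, so both blocks can be resident at the same time.
print(set_for_block(3), set_for_block(11))  # 3 3
```

This is exactly the high-miss-rate case the pros/cons above describe: direct mapping forces the two blocks to evict each other, while set-associative mapping lets them coexist.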

b) Replacement Algorithms

With direct mapping, no replacement algorithm is needed.

With associative mapping, a replacement algorithm is needed in order to determine which block
to replace. This can be:
- First-in-first-out (FIFO).
- Least-recently used (LRU): replace the block that has been in the cache longest without
being referenced.
- Least-frequently used (LFU): replace the block that has experienced the fewest references.
- Random.
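A minimal sketch of LRU replacement (an illustrative model, not how the hardware implements it): an ordered dictionary keeps blocks in recency order, and a miss on a full cache evicts the least-recently-used entry.

```python
from collections import OrderedDict

class LRUCache:
    """Toy fully-associative cache with LRU replacement."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()   # block number -> data (recency order)

    def access(self, block):
        if block in self.blocks:                 # hit: move to most-recent end
            self.blocks.move_to_end(block)
            return "hit"
        if len(self.blocks) >= self.capacity:    # miss on full cache: evict LRU
            self.blocks.popitem(last=False)
        self.blocks[block] = None
        return "miss"

cache = LRUCache(capacity=2)
results = [cache.access(b) for b in [1, 2, 1, 3, 2]]
print(results)  # ['miss', 'miss', 'hit', 'miss', 'miss']
```

In the trace, the second access to block 1 hits; referencing block 3 then evicts block 2 (least recently used), and re-referencing block 2 misses again.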

c) Write Policy

What mechanism is used to keep cache content and main memory content consistent without
losing too much performance?

i) Write through:

All write operations are passed to main memory: if the addressed location is currently
in the cache, the cache is updated so that it remains coherent with the main memory. For writes,
the processor always slows down to main memory speed.
Since the percentage of writes is small (ca. 15%), this scheme does not lead to a large
performance reduction.

ii) Write through with buffered write:

The same as write-through, but instead of the processor slowing down to write directly to main
memory, the write address and data are stored in a high-speed write buffer; the write buffer
transfers data to main memory while the processor continues its task. Higher speed, but more
complex hardware.
iii) Write back:
Write operations update only the cache memory, which is not kept coherent with main memory.
When a slot is replaced from the cache, its content has to be copied back to memory. Good
performance (usually several writes are performed on a cache block before it is replaced), but
more complex hardware is needed.

-Cache coherence problems are very complex and difficult to solve in multiprocessor systems
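The performance difference between the two basic policies can be sketched by counting main-memory writes for repeated writes to one cached block (the numbers are illustrative, not measurements):

```python
# Sketch: main-memory traffic for n writes to a single cached block.

def memory_writes(policy, writes_to_block):
    if policy == "write-through":
        return writes_to_block   # every write is passed to main memory
    if policy == "write-back":
        return 1                 # one write-back when the block is evicted
    raise ValueError(policy)

n = 10
print(memory_writes("write-through", n))  # 10
print(memory_writes("write-back", n))     # 1
```

This is why write-back gives good performance when several writes hit the same block before it is replaced, at the cost of the dirty-block bookkeeping described above.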

Internal memories are semiconductor memories. Semiconductor memories are categorized as
volatile memory and non-volatile memory.

b) Main Memory

The main memory of a computer is semiconductor memory. The main memory unit of a
computer basically consists of two kinds of memory:
RAM: Random Access Memory, which is volatile in nature. That is, as soon as the computer is
switched off, the contents of memory are lost.

ROM: Read Only Memory, which is non-volatile. The storage is permanent, but it is read-only
memory; we cannot store new information in ROM.
Permanent information is kept in ROM, and the user space is basically in RAM.

The maximum size of main memory that can be used in any computer is determined by the
addressing scheme.
A computer that generates 16-bit addresses is capable of addressing up to 2^16 = 64K
memory locations. Similarly, for 32-bit addresses, the total capacity will be 2^32 = 4G
memory locations.
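The addressing arithmetic above can be checked directly: a k-bit address reaches 2^k distinct locations.

```python
# Checking the addressing-scheme arithmetic: k address bits -> 2**k locations.

def capacity(bits):
    return 2 ** bits

print(capacity(16))  # 65536 locations, i.e. 64K
print(capacity(32))  # 4294967296 locations, i.e. 4G
```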

In some computers, the smallest addressable unit of information is a memory word, and the
machine is called word-addressable.

In other computers, an individual address is assigned to each byte of information; such a
machine is called byte-addressable. In these computers, one memory word contains one or more
memory bytes which can be addressed individually.
The main memory is usually designed to store and retrieve data in word length quantities. The
word length of a computer is generally defined by the number of bits actually stored or
retrieved in one main memory access.

The data transfer between main memory and the CPU takes place through two CPU registers.
- MAR : Memory Address Register
- MDR : Memory Data Register.
If the MAR is k bits long, then the total number of addressable memory locations will be 2^k.
If the MDR is n bits long, then n bits of data are transferred in one memory cycle.

The transfer of data takes place through the memory bus, which consists of an address bus and a
data bus. In the above example, the size of the data bus is n bits and the size of the address
bus is k bits.
It also includes control lines like Read, Write and Memory Function Complete (MFC) for
coordinating data transfer. In a byte-addressable computer, another control line must be
added to indicate a byte transfer instead of a whole-word transfer.

For a memory operation, the CPU initiates the operation by loading the appropriate data,
i.e., the address, into the MAR.
If it is a memory read operation, it sets the read memory control line to 1. The
contents of the memory location are then brought to the MDR, and the memory control circuitry
indicates this to the CPU by setting MFC to 1.
If the operation is a memory write operation, the CPU places the data into the MDR and sets
the write memory control line to 1. Once the contents of the MDR are stored in the specified
memory location, the memory control circuitry indicates the end of the operation by setting MFC to 1.
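The read/write handshake above can be modelled as a toy sketch (register names follow the text; the sizes and Python structure are assumptions, since the real mechanism is hardware, not software):

```python
# Toy model of the MAR/MDR/MFC handshake described above.

class Memory:
    def __init__(self, size):
        self.cells = [0] * size
        self.MAR = 0   # Memory Address Register
        self.MDR = 0   # Memory Data Register
        self.MFC = 0   # Memory Function Complete flag

    def read(self, address):
        self.MAR = address
        self.MFC = 0
        self.MDR = self.cells[self.MAR]  # memory places the data in MDR
        self.MFC = 1                     # signal completion to the CPU
        return self.MDR

    def write(self, address, data):
        self.MAR, self.MDR = address, data
        self.MFC = 0
        self.cells[self.MAR] = self.MDR  # MDR contents stored at address MAR
        self.MFC = 1

mem = Memory(size=16)
mem.write(5, 42)
print(mem.read(5), mem.MFC)  # 42 1
```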

A useful measure of the speed of memory unit is the time that elapses between the initiation of
an operation and the completion of the operation. This is referred to as Memory Access
Time. Another measure is memory cycle time. This is the minimum time delay between the
initiation of two independent memory operations. Memory cycle time is slightly larger than
memory access time.

Binary Storage Cell:


The binary storage cell is the basic building block of a memory unit. The binary storage cell that
stores one bit of information can be modelled by an SR latch with associated gates. This
model of binary storage cell is as follows:
The binary cell stores one bit of information

The memory constructed with the help of transistors is known as semiconductor memory.
Semiconductor memories are termed Random Access Memory (RAM), because it is possible
to access any memory location at random.
Depending on the technology used to construct a RAM, there are two types of RAM -
SRAM: Static Random Access Memory.
DRAM: Dynamic Random Access Memory.

Dynamic Ram (DRAM):


A DRAM is made with cells that store data as charge on capacitors. The presence or absence of
charge in a capacitor is interpreted as binary 1 or 0.
Because capacitors have a natural tendency to discharge due to leakage current, dynamic RAMs
require periodic charge refreshing to maintain data storage. The term dynamic refers to this
tendency of the stored charge to leak away, even with power continuously applied.
A typical DRAM cell that stores one bit of information is structured as follows.

For the write operation, a voltage signal is applied to the bit line B, a high voltage represents 1
and a low voltage represents 0. A signal is then applied to the address line, which will turn on
the transistor T, allowing a charge to be transferred to the capacitor.
For the read operation, when a signal is applied to the address line, the transistor T turns on
and the charge stored on the capacitor is fed out onto the bit line B.

Static RAM (SRAM):


In an SRAM, binary values are stored using traditional flip-flop constructed with the help of
transistors. A static RAM will hold its data as long as power is supplied to it.

SRAM Versus DRAM :


- Both static and dynamic RAMs are volatile, that is, they retain the information only as long
as power is applied.
- A dynamic memory cell is simpler and smaller than a static memory cell. Thus a DRAM is
more dense, i.e., its packing density is higher (more cells per unit area), and DRAM is less
expensive than the corresponding SRAM.
- DRAM requires supporting refresh circuitry. For larger memories, the fixed cost of the
refresh circuitry is more than compensated for by the lower cost of DRAM cells.
- SRAM cells are generally faster than DRAM cells. Therefore, to construct faster memory
modules (like cache memory), SRAM is used.

Internal Organization of Memory Chips


A memory cell is capable of storing 1-bit of information. A number of memory cells are
organized in the form of a matrix to form the memory chip.
Example:

Each row of cells constitutes a memory word, and all cells of a row are connected to a common
line referred to as the word line. An address decoder is used to drive the word lines. At a
particular instant, one word line is enabled depending on the address present on the address
bus. The cells in each column are connected by two lines,
known as bit lines. These bit lines are connected to the data input line and data output
line through a Sense/Write circuit.
During a Read operation, the Sense/Write circuit senses, or reads, the information stored in the
cells selected by a word line and transmits it to the output data line. During a
Write operation, the Sense/Write circuit receives information and stores it in the cells of the
selected word.
E.g
128 x 8 memory chip:
If a chip is organised as 128 x 8, it has 128 memory words of size 8 bits. So
the size of the data bus is 8 bits and the size of the address bus is 7 bits (128 = 2^7).

1024 x 1 memory chips:


If a chip is organized as 1024 x 1, it has 1024 memory words of size 1 bit
only. Therefore, the size of the data bus is 1 bit and the size of the address bus is 10 bits (1024 = 2^10).
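The bus widths in both examples follow one rule: for a W x B chip (W words of B bits each), the data bus is B bits wide and the address bus needs ceil(log2(W)) bits.

```python
import math

# Deriving the bus sizes quoted above for a W x B memory chip.

def bus_sizes(words, bits_per_word):
    """Return (address bus width, data bus width)."""
    return math.ceil(math.log2(words)), bits_per_word

print(bus_sizes(128, 8))   # (7, 8)  -> the 128 x 8 chip
print(bus_sizes(1024, 1))  # (10, 1) -> the 1024 x 1 chip
```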

ROM: Read only memories are non volatile in nature. The storage is permanent, but it is read
only memory. We cannot store new information in ROM.
Several types of ROM are available:
PROM: Programmable Read Only Memory; it can be programmed once as per user
requirements.
EPROM: Erasable Programmable Read Only Memory; the contents of the memory can be
erased and new data stored into the memory. In this case, the whole contents must be erased.
EEPROM: Electrically Erasable Programmable Read Only Memory; in this type of memory the
contents of a particular location can be changed without affecting the contents of other locations.
EXTERNAL MEMORY

External memories include the following:


Magnetic disks are the foundation of external memory on virtually all computer systems.
The use of disk arrays to achieve greater performance, known as RAID (Redundant Array
of Independent Disks).
An increasingly important component of many computer systems is external optical
memory.
Magnetic tape was the first kind of secondary memory. It is still widely used as the
lowest-cost, slowest-speed member of the memory hierarchy.

Magnetic Disk
A disk is a circular platter constructed of metal or of plastic coated with a magnetizable
material.
- Data are recorded on and later retrieved from the disk via a conducting coil named
the head.
- During a read or write operation, the head is stationary while the platter rotates
beneath it.
The write mechanism is based on the fact that electricity flowing through a coil produces a
magnetic field. Pulses are sent to the head, and magnetic patterns are recorded on the surface
below. The pattern depends on whether the current is positive or negative, and the direction of
the current depends on the information stored: positive current for a '1' and
negative current for a '0'.
The read mechanism is based on the fact that a magnetic field moving relative to a coil
produces an electric current in the coil. When the surface of the disk passes under the head, it
generates a current of the same polarity as the one already recorded.

Read/ Write head detail

The head is a relatively small device capable of reading from or writing to a portion of the
platter rotating beneath it.
Data Organization and Formatting

The data on the disk are organized in a concentric set of rings, called tracks.

- Each track has the same width as the head. There are thousands of tracks per surface.
- Adjacent tracks are separated by intertrack gaps. This prevents, or at least minimizes,
errors due to misalignment of the head or simply interference of magnetic fields.
- Data are transferred to and from the disk in sectors. There are typically hundreds of
sectors per track, and these may be of either fixed or variable length. 
- In most contemporary systems, fixed-length sectors are used, with 512 bytes being
the nearly universal sector size. 
- Adjacent sectors are separated by intersector gaps.
- A bit near the center of a rotating disk travels past a fixed point (such as a read–write
head) slower than a bit on the outside. Therefore, some way must be found to
compensate for the variation in speed so that the head can read all the bits at the same
rate. This can be done by increasing the spacing between bits of information recorded
in segments of the disk.
- The information can then be scanned at the same rate by rotating the disk at a fixed
speed, known as the constant angular velocity (CAV).
- The disk is divided into a number of pie-shaped sectors and into a series of concentric
tracks

Physical characteristics of disk

-The major physical characteristics that differentiate among the various types of magnetic
disks are considered.

1.Head Motion
-The head may either be fixed (one per track) or movable (one per surface) with respect to
the radial direction of the platter.
-In a fixed-head disk, there is one read-write head per track. All of the heads are mounted on
a rigid arm that extends across all tracks; such systems are rare today.
-In a movable-head disk, there is only one read-write head per surface. The head is mounted
on an arm that can position it over any track.

2.Disk Portability
-A nonremovable disk is permanently mounted in the disk drive. The hard disk is an example
of a nonremovable disk.
-A removable disk can be removed and replaced with another disk. The advantage is that
unlimited amounts of data are available with a limited number of disk systems.
-Furthermore, such a disk may be moved from one computer system to another. Floppy disks
and ZIP cartridge disks are examples of removable disks.

3.Sides
-For most disks, the magnetizable coating is applied to both sides of the platter, which is then
referred to as double-sided.
-Some less expensive disk systems use single-sided disks.
4.Platters
-Multiple-platter disks employ a movable head, with one read-write head per platter surface. All
of the heads are mechanically fixed so that all are at the same distance from the center of the disk
and move together.
-Thus, at any time, all of the heads are positioned over tracks that are of equal distance from the
center of the disk.
-Single-platter disks employ only a single platter.

5. Head mechanism
-The head mechanism provides a classification of disks into three types:
- Fixed-gap: the read-write head is positioned a fixed distance above the platter,
allowing an air gap.
- Contact: the head actually comes into physical contact with the medium during a read or
write operation. This mechanism is used with the floppy disk.
-Aerodynamic heads (Winchester) are designed to operate closer to the disk’s surface than
conventional rigid disk heads, thus allowing greater data density. The head is actually an
aerodynamic foil that rests lightly on the platter’s surface when the disk is motionless. The air
pressure generated by a spinning disk is enough to make the foil rise above the surface.
Organization and accessing of data on a disk
The organization of data on a disk is shown in the figure below.

- Each surface is divided into concentric tracks and each track is divided into sectors.
The set of corresponding tracks on all surfaces of a stack of disks form a logical
cylinder. Data bits are stored serially on each track.
- Data on disks are addressed by specifying the surface number, the track number, and
the sector number. In most disk systems, read and write operations always start at
sector boundaries.
- The address of the disk contains track no., sector no., and surface no. If more than
one drive is present, then drive number must also be specified.

For moving head system, there are two components involved in the time delay between
receiving an address and the beginning of the actual data transfer.
i) Seek Time:
Seek time is the time required to move the read/write head to the proper track. This depends
on the initial position of the head relative to the track specified in the address.
ii) Rotational Delay:
Rotational delay, also called the latency time is the amount of time that elapses after the head
is positioned over the correct track until the starting position of the addressed sector comes
under the Read/write head.
iii) Access Time:
The access time is the sum of the seek time and the rotational delay; it is the time it takes to
get into position to read or write.
-Once the head is in position, the read or write operation is performed as the sector
moves under the head; this is the data transfer portion of the operation, and the time required
for the transfer is the transfer time.
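A rough access-time estimate follows directly from these definitions (the drive parameters below are hypothetical, not from the lecture): average rotational delay is half a revolution.

```python
# Rough disk access-time estimate: seek time plus average rotational delay.

def access_time_ms(avg_seek_ms, rpm):
    rotational_delay_ms = (60_000 / rpm) / 2  # half a revolution, in ms
    return avg_seek_ms + rotational_delay_ms

# A hypothetical 7200 RPM drive with a 9 ms average seek time:
print(round(access_time_ms(9, 7200), 2))  # 13.17 (ms)
```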
RAID
-The rate of improvement in secondary storage performance has been considerably less than
the rate for processors and main memory. This mismatch has made the disk storage system
perhaps the main focus of concern in improving overall computer system performance.
-This leads to the development of arrays of disks that operate independently and in parallel.
-With multiple disks, separate I/O requests can be handled in parallel, as long as the data
required reside on separate disks.
-Further, a single I/O request can be executed in parallel if the block of data to be accessed is
distributed across multiple disks.
-With the use of multiple disks, there is a wide variety of ways in which the data can be
organized and in which redundancy can be added to improve reliability.
-Industry has agreed on a standardized scheme for multiple-disk database design, known as
RAID (Redundant Array of Independent Disks).
-The RAID scheme consists of seven levels that share three common characteristics:
1. RAID is a set of physical disk drives viewed by operating system as a single logical drive.
2. Data are distributed across the physical drives of an array, a technique known as striping.
3. Redundant disk capacity is used to store parity information, which guarantees data
recoverability in case of a disk failure (except RAID level 0).

•The RAID strategy employs multiple disk drives and distributes data in such a way as to
enable simultaneous access to data from multiple drives, thereby improving I/O performance.
•Although allowing multiple heads and actuators to operate simultaneously achieves higher
I/O and transfer rates, the use of multiple devices increases the probability of failure.
•To compensate for this decreased reliability, RAID makes use of stored parity information
that enables the recovery of data lost due to a disk failure.

•RAID 0 (Level 0)
It is not a true member of the RAID family, because it does not include redundancy; it uses
striping only to improve performance.
- The user and system data are striped across all of the disks in the array.
A set of logically consecutive strips that maps exactly one strip to each array member is
referred to as a stripe.
Array management software is used to map logical and physical disk space. This software
may execute either in the disk subsystem or in a host computer.
This has a notable advantage over the use of a single large disk: If two different I/O
requests are pending for two different blocks of data, then there is a good chance that the
requested blocks are on different disks.
Thus, the two requests can be issued in parallel, reducing the I/O queuing time.

RAID 1 (Mirrored)
Redundancy is achieved simply by duplicating all the data.
Each logical strip is mapped to two separate physical disks so that every disk in the array
has a mirror disk that contains the same data.

There are a number of positive aspects to the RAID 1 organization:


1. A read request can be serviced by either of the two disks that contains the requested data,
whichever one involves the minimum seek time plus rotational latency.
2. A write request requires that both corresponding strips be updated, but this can be done in
parallel. Thus, the write performance is dictated by the slower of the two writes.
3. Recovery from a failure is simple. When a drive fails, the data may still be accessed from the
second drive.


The principal disadvantage of RAID 1 is the cost. It requires twice the disk space of the
logical disk that it supports. 
RAID 1 can achieve high I/O request rates if the requests are mostly read operations. In this
situation, the performance of RAID 1 can approach double that of RAID 0. However, if
the requests require write operations, then there may be no significant performance gain over
RAID 0.

RAID 2 (Redundancy through Hamming Code)


Make use of a parallel access technique. In a parallel access array, all member disks participate
in the execution of every I/O request.
Typically, the spindles of the individual drives are synchronized so that each disk head is in the
same position on each disk at any given time.
RAID 2 uses very small strips, often as small as a single byte or word.
Error-correcting code is calculated across corresponding bits on each data disk, and the bits of
the code are stored in the corresponding bit positions on multiple parity disks.
Typically, Hamming code is used to correct single-bit errors and detect double-bit errors.
Although RAID 2 requires fewer disks than RAID 1, it is still rather costly. 
On a single read, all disks are simultaneously accessed. The requested data and the
associated error-correcting code are delivered to the array controller. 
On a single write, all data disks and parity disks must be accessed for the write operation. 
RAID 2 would only be an effective choice in an environment in which many disk errors
occur. 
RAID 2 has never been implemented.

RAID 3 (Bit-Interleaved Parity)


RAID 3 requires only a single redundant disk, no matter how large the disk array. It employs
parallel access, with data distributed in small strips.
Instead of an error-correcting code, a simple parity bit is computed for the set of individual bits
in the same position on all of the data disks.
In the event of a drive failure, the parity drive is accessed and data is reconstructed from the
remaining devices. Once the failed drive is replaced, the missing data can be restored on the new
drive and operation resumed.
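The reconstruction relies on a property of XOR parity (used by RAID 3, 4 and 5 alike): the parity strip is the XOR of all data strips, so any single lost strip equals the XOR of the survivors plus parity. A sketch with made-up strip contents:

```python
from functools import reduce

# Illustrative strip contents on three data disks (4-bit values for clarity).
data_disks = [0b1011, 0b0110, 0b1100]
parity = reduce(lambda a, b: a ^ b, data_disks)  # stored on the parity disk

# Suppose disk 1 fails: rebuild its strip from the survivors plus parity.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = reduce(lambda a, b: a ^ b, survivors)
print(rebuilt == data_disks[1])  # True
```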

RAID 4 (Block-Level Parity)


RAID 4 makes use of an independent access technique. In an independent access array, each
member disk operates independently, so that separate I/O requests can be satisfied in parallel.
Because of this, independent access arrays are more suitable for applications that require high
I/O request rates and are relatively less suited for applications that require high data transfer rates.
Data striping is used where the strips are relatively large. A bit-by-bit parity strip is calculated across
corresponding strips on each data disk, and the parity bits are stored in the corresponding strip on the parity
disk.
RAID 5 (Block-Level Distributed Parity)
RAID 5 is organized in a similar fashion to RAID 4. The difference is that RAID 5
distributes the parity strips across all disks.
The distribution of parity strips across all drives avoids the potential I/O bottleneck found
in RAID 4.

RAID 6 (Dual Redundancy)


In the RAID 6 scheme, two different parity calculations are carried out and stored in
separate blocks on different disks.
Thus, a RAID 6 array whose user data require N disks consists of N+2 disks.
This makes it possible to regenerate data even if two disks containing user data fail.
Three disks would have to fail to cause data to be lost.

Optical Memory

•In 1983, the compact disk (CD) digital audio system was introduced.
•The CD is a nonerasable disk that can store more than 60 minutes of audio information on
one side.
•The huge commercial success of the CD enabled the development of low-cost optical-disk
storage technology that has revolutionized computer data storage.

Compact Disk Read-Only Memory (CD-ROM)


It is a non-erasable disk used for storing computer data, with a capacity of about 700 Mbytes.
Digitally recorded information is imprinted as a series of microscopic pits on the surface of
polycarbonate by a laser.
The pitted surface is then coated with a highly reflective surface, usually aluminum or gold.
This shiny surface is protected against dust and scratches by a top coat of clear acrylic.
Finally, a label can be silkscreened onto the acrylic.

Information is retrieved from a CD or CD-ROM by a low-powered laser housed in an
optical-disk player, or drive unit.
The laser shines through the clear polycarbonate while a motor spins the disk.
The intensity of the reflected light of the laser changes as it encounters a pit.
The areas between pits are called lands.
The change between pits and lands is detected by a photosensor and converted into a digital
signal.


CD Recordable (CD-R)
It is a write-once read-many CD. It is prepared in such a way that it can subsequently be
written once with a laser beam of modest intensity.
Thus, with a somewhat more expensive disk controller than for CD-ROM, the customer
can write once as well as read the disk.
For a CD-R, the medium includes a dye layer which is used to change reflectivity and is
activated by a high-intensity laser.
CD-R disk can be read on a CD-R drive or a CD-ROM drive.
The CD-R optical disk is attractive for archival storage of documents and files. It provides
a permanent record of large volumes of user data.

CD Rewritable (CD-RW)
Can be repeatedly written and overwritten. It uses an approach called phase change.
The phase change disk uses a material that has two significantly different reflectivities in two
different phase states.
A beam of laser light can change the material from one phase to the other.


Digital Versatile Disk (DVD)
The DVD has replaced the videotape used in video cassette recorders (VCRs) and the CD-
ROM in personal computers and servers. 
The DVD takes video into the digital age. DVDs come in writeable as well as read-only
versions.
The DVD’s greater capacity is due to three differences from CDs: 
Bits are packed more closely on a DVD, thus resulting in a capacity of 4.7GB. 
The DVD can employ a second layer of pits and lands on top of the first layer, known as dual
layer. It increased the capacity to 8.5GB. 
The DVD can be double sided. This brings total capacity up to 17 GB. 

Blu-ray Disk
There is a newer technology, the Blu-ray disk, designed to store HD video and provide
greater storage capacity compared to DVDs.
The higher bit density is achieved by using a laser with a shorter wavelength, in the blue-
violet range.
Blu-ray can store 25 GB on a single layer. Three versions are available: read only (BD-
ROM), recordable once (BD-R), and rerecordable (BD-RE).

Magnetic Tape
•The medium is flexible polyester tape coated with magnetizable material. Virtually all tapes
are housed in cartridges.
•A tape drive is a sequential-access device. If the tape head is positioned at record 1, then to
read record N, it is necessary to read physical records 1 through N-1, one at a time.
•If the head is currently positioned beyond the desired record, it is necessary to rewind the
tape a certain distance and begin reading forward.
•Unlike the disk, the tape is in motion only during a read or write operation.
•Magnetic tape was the first kind of secondary memory. It is still widely used as the lowest-
cost, slowest-speed member of the memory hierarchy.
•The dominant tape technology today is a cartridge system known as linear tape-open (LTO)
that has a capacity range between 200GB and 6.25TB.
Flash Memory
•Flash memories are also used very widely as external storage devices.
•Memory cards are the most commonly used implementation of flash memories. They are
used in lots of devices such as MP3 players, mobile phones and digital cameras.
•Card readers are used for writing and reading a memory card in computers.

•There are also flash memories that can be connected to the computer through the USB
interface
LECTURE 2: PROCESSOR
Introduction to CPU
The operations or tasks that the CPU must perform are:
 Fetch Instruction: The CPU reads an instruction from memory.
 Interpret Instruction: The instruction is decoded to determine what action is
required.
 Fetch Data: The execution of an instruction may require reading data from
memory or an I/O module.
 Process Data: The execution of an instruction may require performing some
arithmetic or logical operation on data.
 Write Data: The result of an execution may require writing data to memory or an
I/O module.
To do these tasks, the CPU needs to store some data temporarily. It must remember the
location of the last instruction so that it knows where to get the next instruction, and it
needs to store instructions and data temporarily while an instruction is being executed.

The CPU is connected to the rest of the system through the system bus. Through the system
bus, data or information is transferred between the CPU and the other components of
the system. The system bus may have three components:
 Data Bus:
The data bus is used to transfer data between main memory and the CPU.
 Address Bus:
The address bus is used to access a particular memory location by putting the
address of that memory location on it.
 Control Bus:
The control bus is used to carry the different control signals generated by the CPU to
different parts of the system.
Example:
Memory Read is a signal generated by the CPU to indicate that a memory read operation has
to be performed. Through the control bus this signal is transferred to the memory module to
indicate the required operation.
Figure : CPU with the system Bus

Figure : Internal Structure of the CPU

Register Organization
A computer system employs a memory hierarchy. At the highest level of hierarchy,
memory is faster, smaller and more expensive. Within the CPU, there is a set of registers
which can be treated as a memory in the highest level of hierarchy. The registers in the
CPU can be categorized into two groups:
- User-visible registers: These enable the machine- or assembly-language
programmer to minimize main memory references by optimizing the use of registers.
- Control and status registers: These are used by the control unit to control the
operation of the CPU. Operating system programs may also use them in
privileged mode to control the execution of programs.
User-visible Registers:
The user-visible registers can be categorized as follows:
i) General-purpose registers which can be assigned to a variety of functions by the
programmer. In some cases, general purpose registers can be used for addressing
functions (e.g., register indirect, displacement). In other cases, there is a partial or
clean separation between data registers and address registers.
ii) Data registers may be used to hold only data and cannot be employed in the
calculation of an operand address.
iii) Address registers may be somewhat general purpose, or they may be devoted to
a particular addressing mode. e.g
- Segment pointer: In a machine with segment addressing, a segment register
holds the address of the base of the segment. There may be multiple registers,
one for the code segment and one for the data segment.
- Index registers: These are used for indexed addressing and may be autoindexed.
- Stack pointer: If there is user visible stack addressing, then typically the stack is
in memory and there is a dedicated register that points to the top of the stack.

iv) Condition Codes (also referred to as flags) are bits set by the CPU hardware as
the result of operations. For example, an arithmetic operation may produce a
positive, negative, zero or overflow result. In addition to the result itself being
stored in a register or memory, a condition code is also set. The code may
subsequently be tested as part of a conditional branch operation. Condition code
bits are collected into one or more registers.
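The flag-setting behaviour described above can be illustrated with a small sketch. The flag names (Z, N, C, V) follow common convention, but the 8-bit width and the add8 helper are assumptions for the example, not the behaviour of any particular CPU.

```python
# Illustrative sketch: setting condition-code flags after an 8-bit addition.
def add8(a, b):
    result = (a + b) & 0xFF
    flags = {
        "Z": result == 0,              # zero result
        "N": bool(result & 0x80),      # negative (sign bit set)
        "C": (a + b) > 0xFF,           # carry out of bit 7
        # overflow: operands share a sign but the result's sign differs
        "V": bool(~(a ^ b) & (a ^ result) & 0x80),
    }
    return result, flags

result, flags = add8(0x7F, 0x01)   # 127 + 1 in signed 8-bit arithmetic
print(hex(result), flags)          # result 0x80: N and V set (signed overflow)
```

A conditional branch instruction would then test one of these bits, e.g. branch if Z is set.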
There are a variety of CPU registers that are employed to control the operation of the
CPU. Most of these, on most machines, are not visible to the user. Different machines
will have different register organizations and use different terminology. The most
commonly used registers, which are part of most machines, are:
- Program Counter (PC): Contains the address of the instruction to be fetched.
Typically, the PC is updated by the CPU after each instruction fetch so that it
always points to the next instruction to be executed. A branch or skip instruction
will also modify the contents of the PC.
- Instruction Register (IR): Contains the instruction most recently fetched. The
fetched instruction is loaded into an IR, where the opcode and operand
specifiers are analyzed.
- Memory Address Register (MAR): Contains the address of the location of main
memory from where information has to be fetched or to which information has to be
stored. The contents of the MAR are directly connected to the address bus.
- Memory Buffer Register (MBR): Contains a word of data to be written to
memory or the word most recently read. The contents of the MBR are directly
connected to the data bus. It is also known as the Memory Data Register (MDR).
Concept of Program Execution
The instructions constituting a program to be executed by a computer are loaded in
sequential locations in its main memory. To execute this program, the CPU fetches one
instruction at a time and performs the functions specified.
Instructions are fetched from successive memory locations until the execution of a
branch or a jump instruction.
The CPU keeps track of the address of the memory location where the next instruction
is located through the use of a dedicated CPU register, referred to as the program
counter (PC). After fetching an instruction, the contents of the PC are updated to point
at the next instruction in sequence.

Example: Let us assume that each instruction occupies one memory word. Therefore,
execution of one instruction requires the following three steps to be performed by the
CPU:
1. Fetch the contents of the memory location pointed at by the PC. The contents of this
location are interpreted as an instruction to be executed. Hence, they are stored in the
instruction register (IR). Symbolically this can be written as:
IR = [ [PC] ]
2. Increment the contents of the PC by 1.
PC = [PC] + 1
3. Carry out the actions specified by the instruction stored in the IR.

The first two steps are usually referred to as the fetch phase and the step 3 is known as
the execution phase.
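The three steps above can be sketched as a loop. The tiny one-word instruction set (LOAD/ADD/HALT) and the accumulator ACC are invented for illustration; only the fetch (IR = [[PC]]), increment (PC = [PC] + 1) and execute phases mirror the text.

```python
# Minimal fetch-increment-execute sketch with a hypothetical instruction set.
memory = [("LOAD", 5), ("ADD", 3), ("HALT", 0), 0, 0, 10]
PC, ACC = 0, 0

while True:
    IR = memory[PC]        # step 1: IR = [[PC]]
    PC = PC + 1            # step 2: PC = [PC] + 1
    op, operand = IR       # step 3: carry out the action specified in the IR
    if op == "LOAD":
        ACC = memory[operand]
    elif op == "ADD":
        ACC = ACC + operand
    elif op == "HALT":
        break

print(ACC)   # 10 loaded from location 5, plus 3 -> 13
```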
The fetch cycle basically involves reading the next instruction from memory into the
CPU and, along with that, updating the contents of the program counter.
In the execution phase, the CPU interprets the opcode and performs the indicated operation.
The instruction fetch and execution phases together are known as the instruction cycle.
The basic instruction cycle is shown below

Figure B : Basic Instruction cycle


In cases where an instruction occupies more than one word, steps 1 and 2 can be
repeated as many times as necessary to fetch the complete instruction. In these cases,
the execution of an instruction may involve one or more operands in memory, each of
which requires a memory access.

The fetched instruction is loaded into the instruction register. The instruction contains
bits that specify the action to be performed by the processor. The processor interprets
the instruction and performs the required action. In general, the actions fall into four
categories:

- Processor-memory: Data may be transferred from processor to memory or from


memory to processor.
- Processor-I/O: Data may be transferred to or from a peripheral device by
transferring between the processor and an I/O module.
- Data processing: The processor may perform some arithmetic or logic operation
on data.
- Control: An instruction may specify that the sequence of execution be altered.
The main line of activity consists of alternating instruction fetch and instruction
execution activities. For any given instruction cycle, some states may be null and others
may be visited more than once. The states are:
- Instruction Address Calculation (IAC): Determine the address of the next
instruction to be executed. Usually, this involves adding a fixed number to the
address of the previous instruction.
- Instruction Fetch (IF): Read instruction from the memory location into the
processor.
- Instruction Operation Decoding (IOD): Analyze instruction to determine type
of operation to be performed and operand(s) to be used.
- Operand Address Calculation (OAC): If the operation involves reference to an
operand in memory or available via I/O, then determine the address of the
operand.
- Operand Fetch (OF): Fetch the operand from memory or read it from I/O.
- Data Operation (DO): Perform the operation indicated in the instruction.
- Operand Store (OS): Write the result into memory or out to I/O.

Interrupts
Virtually all computers provide a mechanism by which other module (I/O, memory
etc.) may interrupt the normal processing of the processor. The most common classes of
interrupts are:
- Program: Generated by some condition that occurs as a result of an instruction
execution, such as arithmetic overflow, division by zero, attempt to execute an
illegal machine instruction, or reference outside the user's allowed memory
space.
- Timer: Generated by a timer within the processor. This allows the operating
system to perform certain functions on a regular basis.
- I/O: Generated by an I/O controller, to signal normal completion of an operation
or to signal a variety of error conditions.
- Hardware failure: Generated by a failure such as power failure or memory
parity error.
Processor Organization
The CPU consists of several components. There are several ways to place these
components and interconnect them. These are:
1) Single Bus organization

The arithmetic and logic unit (ALU), and all CPU registers are connected via a single
common bus. This bus is internal to CPU and this internal bus is used to transfer the
information between different components of the CPU. This organization is termed as
single bus organization, since only one internal bus is used for transferring of
information between different components of CPU. We have external bus or buses to
CPU also to connect the CPU with the memory module and I/O devices.

In this organization, two registers, namely Y and Z are used which are transparent to
the user. Programmer cannot directly access these two registers. These are used as input
and output buffer to the ALU which will be used in ALU operations. They will be used
by CPU as temporary storage for some instructions.
2) Two bus Organization
An alternative structure is the two bus structure, where two different internal buses
are used in the CPU. All register outputs are connected to bus A, and all register
inputs are connected to bus B.
There is a special arrangement to transfer data from one bus to the other. The buses
are connected through the bus tie G. When this tie is enabled, data on bus A is
transferred to bus B. When G is disabled, the two buses are electrically isolated.
Since two buses are used here, the temporary register Z, which in the single bus
organization stores the result of the ALU, is not required. The result can now be
directly transferred to bus B, since one of the ALU inputs is on bus A. With the bus
tie disabled, the result can directly be transferred to the destination register.
For example, the operation [R3] ← [R1] + [R2] can now be performed as
1. R1out, G enable, Yin
2. R2out, Add, ALUout, R3in

In this case, the source register R2 and destination register R3 have to be different,
because the two operations R2in and R2out cannot be performed together.
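The two control steps above can be simulated directly. The bus model below is a deliberate simplification (buses as plain variables and registers as a dictionary); only the order of the signals follows the text.

```python
# Illustrative simulation of the two control steps for [R3] <- [R1] + [R2].
regs = {"R1": 4, "R2": 7, "R3": 0}

# Step 1: R1out, G enable, Yin -- R1 drives bus A, the enabled tie G copies
# it onto bus B, and the temporary register Y latches it.
bus_a = regs["R1"]
bus_b = bus_a          # bus tie G enabled
Y = bus_b

# Step 2: R2out, Add, ALUout, R3in -- R2 drives bus A, the ALU adds Y to it,
# and the result reaches R3 over bus B with the tie disabled.
bus_a = regs["R2"]
bus_b = Y + bus_a      # ALU output placed on bus B
regs["R3"] = bus_b

print(regs["R3"])      # 4 + 7 -> 11
```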

3) Three bus Organisation


Three internal CPU buses are used. In this organization each bus is connected to only one
output and a number of inputs. The elimination of the need for connecting more than one
output to the same bus leads to faster bus transfer and simpler control.
A multiplexer is provided at the input to each of the two working registers A and B, which
allows them to be loaded from either the input data bus or the register data bus.
In this three bus organization, we keep two input data buses instead of the one used
in the two bus organization. Two separate input data buses are present – one for
external data transfer, i.e. retrieving data from memory, and the other for internal
data transfer, that is, transferring data from a general purpose register to other
building blocks inside the CPU.
As in the two bus organization, a bus tie can be used to connect the input bus and the output bus.

Design of Control Unit


To execute an instruction, the control unit of the CPU must generate the required
control signals in the proper sequence. For example, during the fetch phase, the CPU has
to generate the PCout signal along with the other required signals in the first clock pulse.
In the second clock pulse the CPU has to generate the PCin signal along with the other
required signals.
So, during the fetch phase, the proper sequence for generating the signals to retrieve from
and store to the PC is PCout followed by PCin.
To generate the control signals in the proper sequence, a wide variety of techniques exist.
Most of these techniques, however, fall into one of two categories:

1. Hardwired Control
2. Microprogrammed Control.
Hardwired Control
In the hardwired control technique, the control signals are generated by means of a
hardwired circuit. The main objective of the control unit is to generate the control
signals in the proper sequence.
Since the control unit is implemented with hardware devices, and every device has a
propagation delay, some time is required to obtain a stable signal at the output port
after the input signal is applied.
Example:
Looking at the design of the CPU, we may say that there are various instructions for
the add operation. For example,
- ADD NUM R1: Add the contents of the memory location specified by NUM to the
contents of register R1.
- ADD R2 R1: Add the contents of register R2 to the contents of register R1.

The control sequences for the execution of these two ADD instructions are different.
Therefore it is clear that the control signals depend on the instruction, i.e., the contents of
the instruction register. It is also observed that the execution of some instructions
depends on the contents of the condition code or status flag register; for example, the
control sequence of a conditional branch instruction depends on these flags.
The required control signals are uniquely determined by the following information:
- Contents of the control counter.
- Contents of the instruction register.
- Contents of the condition code and other status flags.
The external inputs represent the state of the CPU and the various control lines connected
to it, such as the MFC status signal. The condition codes/status flags indicate the state of
the CPU. These include flags like carry, overflow, zero, etc.
Control Unit Organization
The decoder/encoder block is simply a combinational circuit that generates the
required control outputs depending on the state of all its inputs.
The decoder part provides a separate signal line for each control step, or time slot, in
the control sequence. Similarly, the output of the instruction decoder consists of a
separate line for each machine instruction: for the instruction loaded in the IR, one of
the output lines INS1 to INSm is set to 1 and all other lines are set to 0. All input
signals to the encoder block are combined to generate the individual control signals.

Microprogrammed Control
In microprogrammed control unit, the logic of the control unit is specified by a
microprogram.
A microprogram consists of a sequence of instructions in a microprogramming
language. These are instructions that specify microoperations.
A microprogrammed control unit is a relatively simple logic circuit that is capable of
(1) Sequencing through microinstructions
(2) Generating control signals to execute each microinstruction.
The concept of a microprogram is similar to a computer program. In a computer program
the complete set of instructions is stored in main memory, and during execution the CPU
fetches the instructions from main memory one after another. The sequence of
instruction fetches is controlled by the program counter (PC).
Microprograms are stored in microprogram memory and their execution is controlled by
the microprogram counter (µPC).
A microprogram consists of microinstructions, which are strings of 0s and 1s. At a
particular instant, we read the contents of one location of the microprogram memory,
which is a microinstruction.
Each output line (data line) of the microprogram memory corresponds to one control
signal. If the content of a memory cell is 0, the signal is not generated; if the content
is 1, the control signal is generated at that instant of time.

Control Word (CW) :


A control word is defined as a word whose individual bits represent the various control
signals. Therefore each of the control steps in the control sequence of an instruction
defines a unique combination of 0s and 1s in the CW.
A sequence of control words (CWs) corresponding to the control sequence of a machine
instruction constitutes the microprogram for that instruction. The individual control
words in this microprogram are referred to as microinstructions.
The microprograms corresponding to the instruction set of a computer are stored in a
special memory, referred to as the microprogram memory. The control words related
to an instruction are stored in the microprogram memory.
The control unit can generate the control signals for any instruction by sequentially
reading the CWs of the corresponding microprogram from the microprogram memory.
To read the control words sequentially from the microprogram memory, a microprogram
counter (µPC) is needed.
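The idea of stepping a µPC through control words can be sketched as follows. The five control signals and the two-word microprogram (here, an instruction-fetch fragment) are invented for illustration; real control words contain many more bits.

```python
# Hypothetical sketch: a uPC reading control words from a microprogram memory.
SIGNALS = ["PCout", "MARin", "Read", "MDRout", "IRin"]

# Each row is one control word (microinstruction); a 1 asserts that signal.
microprogram_memory = [
    [1, 1, 1, 0, 0],   # step 0: PCout, MARin, Read
    [0, 0, 0, 1, 1],   # step 1: MDRout, IRin
]

uPC = 0
while uPC < len(microprogram_memory):
    cw = microprogram_memory[uPC]
    asserted = [s for s, bit in zip(SIGNALS, cw) if bit]
    print("step", uPC, "->", ", ".join(asserted))
    uPC += 1           # clock advances the uPC to the next microinstruction
```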
The basic organization of a microprogrammed control unit is shown below.

The "starting address generator" block is responsible for loading the starting address of
the microprogram into the µPC every time a new instruction is loaded in the IR.
The µPC is then automatically incremented by the clock, and it reads the successive
microinstructions from memory.

Each microinstruction basically provides the required control signal at that time step.
The microprogram counter ensures that the control signal will be delivered to the
various parts of the CPU in correct sequence.
In a microprogrammed control unit, a common microprogram is used to fetch
the instruction. This microprogram is stored in a specific location, and execution of each
instruction starts from that memory location.
At the end of the fetch microprogram, the starting address generator unit calculates the
appropriate starting address of the microprogram for the instruction currently
present in the IR. Thereafter the µPC controls the execution of the microprogram, which
generates the appropriate control signals in the proper sequence.
During the execution of a microprogram, the µPC is always incremented every time a
new microinstruction is fetched from the microprogram memory, except in the
following situations:
1. When an End instruction is encountered, the µPC is loaded with the address of
the first CW in the microprogram for the instruction fetch cycle.
2. When a new instruction is loaded into the IR, the µPC is loaded with the starting
address of the microprogram for that instruction.
3. When a branch microinstruction is encountered and the branch condition is
satisfied, the µPC is loaded with the branch address.
In a microprogrammed control unit,
- Each machine instruction can be implemented by a microroutine.
- Each microroutine can be accessed initially by decoding the machine instruction
into the starting address to be loaded into the µPC.
This indicates that the microprogrammed control unit has to perform two basic tasks:
- Microinstruction sequencing: Get the next microinstruction from the control
memory.
- Microinstruction execution: Generate the control signals needed to execute the
microinstruction.
In designing a control unit, these tasks must be considered together, because both affect
the format of the microinstruction and the timing of the control unit.
Two concerns are involved in the design of a microinstruction sequencing technique:
the size of the microinstruction and the address generation time.
Lecture: INPUT / OUTPUT

A computer system's I/O architecture is its interface to the outside world. This architecture
provides a systematic means of controlling interaction with the outside world and provides
the operating system with the information it needs to manage I/O activity effectively.
Each I/O module interfaces to the system bus and controls one or more peripheral devices.

There are several reasons why an I/O device or peripheral device is not directly connected to the
system bus. Some of them are as follows –
- There are a wide variety of peripherals with various methods of operation. It would be
impractical to include the necessary logic within the processor to control several devices.
- The data transfer rate of peripherals is often much slower than that of the memory or
processor. Thus, it is impractical to use the high-speed system bus to communicate directly
with a peripheral.
- Peripherals often use different data formats and word lengths than the computer to which
they are attached.
Thus, an I/O module is required.
Two major functions of an I/O module are:
-Interface to CPU and memory via system bus or central switch.
-Interface to one or more peripheral devices by tailored data links.

Input / Output Modules


The major functions of an I/O module are categorized as follows –
- Control and timing: The I/O function includes a control and timing requirement to
coordinate the flow of traffic between internal resources and external devices.
- Processor Communication: During the I/O operation, the I/O module must communicate
with the processor.
- Device Communication: The I/O module must also communicate with the external
device, exchanging commands, status information and data.
- Data Buffering: An essential task of an I/O module is data buffering. Data buffering is
required due to the mismatch between the speeds of the CPU, memory and the peripheral
devices. In general, the speed of the CPU is higher than the speed of the other peripheral
devices. So, the I/O module stores the data in a data buffer and regulates the transfer of
data as per the speed of the devices. Thus the I/O module must be able to operate at both
device and memory speed.
- Error Detection: The I/O module must detect errors and subsequently correct them or
report them to the CPU.
During any period of time, the processor may communicate with one or more external devices in
an unpredictable manner, depending on the program's need for I/O.
The internal resources, such as main memory and the system bus, must be shared among a
number of activities, including data I/O.
There will be many I/O devices connected through I/O modules to the system. Each device is
identified by a unique address. When the processor issues an I/O command, the command
contains the address of the device it is intended for. The I/O module must interpret the
address lines to check whether the command is for itself.
Generally, in most processors, the processor, main memory and I/O share a common
bus (data, address and control lines).

Two types of addressing are possible -


I) Memory-mapped I/O
There is a single address space for memory locations and I/O devices.
The processor treats the status and data registers of the I/O modules as memory locations.
For example, if the size of the address bus of a processor is 16, then there are 2^16
combinations, so 2^16 locations can be addressed with these 16 address lines.
Since I/O devices are included in the same address space, the status and data registers
of I/O modules are treated as memory locations by the processor.
II) Isolated or I/O-mapped I/O
In this scheme, the full range of addresses may be available for both. Whether an address
refers to a memory location or to an I/O device is specified with the help of a command line.
Since the full range of addresses is available for both memory and I/O devices, with 16
address lines the system may now support both 2^16 memory locations and 2^16 I/O addresses.
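Both schemes can be contrasted in a short sketch, assuming a 16-bit address bus. The device address 0xFF00 and the register name are hypothetical.

```python
# Illustrative sketch of memory-mapped vs isolated I/O addressing.
ADDRESS_LINES = 16
print(2 ** ADDRESS_LINES)   # 65536 addressable locations per space

# Memory-mapped I/O: a single address space. A device register simply
# occupies a memory address, so an ordinary load/store reaches it.
address_space = {0xFF00: "keyboard status register"}   # rest is RAM

def load(addr):
    return address_space.get(addr, "RAM cell")

print(load(0xFF00))   # keyboard status register
print(load(0x1234))   # RAM cell

# Isolated I/O: a separate command line (modelled as a flag) selects which
# of the two full 2^16 spaces the same address refers to.
def access(addr, io_select):
    return "I/O port" if io_select else "memory location"

print(access(0x1234, io_select=True))    # I/O port
print(access(0x1234, io_select=False))   # memory location
```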

I/O Sub System


There are three techniques for I/O operations.
-Programmed I/O
- Interrupt-driven I/O
-Direct memory access (DMA)

1) Programmed I/O
With programmed I/O, data are exchanged between the processor and the I/O module. When the
processor issues a command to the I/O module, it must wait until the I/O operation is complete.
•If the processor is faster than the I/O module, this is wasteful of processor time.
The processor is responsible for extracting data from main memory for output and storing data in
main memory for input.
Overview of programmed I/O:
-When the processor is executing a program and encounters an instruction relating to I/O, it
executes that instruction by issuing a command to the appropriate I/O module.
-The I/O module will perform the requested action and then set the appropriate bits in the I/O
status register.
-The I/O module takes no further action to alert the processor.
-Thus, it is the responsibility of the processor periodically to check the status of the I/O module
until it finds that the operation is complete. This is called polling.
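The polling loop described above can be sketched as follows. The Device class, its delay counter and the data value are stand-ins for a real status and data register read over the bus.

```python
# Sketch of programmed I/O with polling: the processor busy-waits on status.
class Device:
    def __init__(self, delay):
        self.delay = delay
        self.data = 99

    def status_ready(self):
        # Each status check "consumes" one unit of time until ready.
        self.delay -= 1
        return self.delay <= 0

dev = Device(delay=3)
dev_polls = 0

# The processor issues the I/O command, then repeatedly checks status.
while not dev.status_ready():
    dev_polls += 1          # processor time wasted on each unproductive poll

value = dev.data            # operation complete: read the data register
print(value, "after", dev_polls, "wasted polls")
```

The wasted polls are exactly the processor time that interrupt-driven I/O recovers.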

2) Interrupt-Driven I/O
The problem with programmed I/O is that the processor has to wait a long time for the I/O
module. The processor must repeatedly check the status of the I/O module. As a result, the level
of the performance of the entire system is severely degraded.
-An alternative is for the processor to issue an I/O command to a module and then go on to do
some other useful work.
-The I/O module will then interrupt the processor to request service when it is ready to exchange
data with the processor.

Two design issues in implementing interrupt-driven I/O should be considered.


-How does the processor determine which device issued the interrupt?
-If multiple interrupts have occurred, how does the processor decide which one to process?
There are four general techniques for device identification:
 Multiple interrupt lines
 Software poll
 Daisy chain
 Bus arbitration

Multiple interrupt lines


It provides multiple interrupt lines between the CPU and I/O modules. However, it is
impractical to dedicate more than a few bus lines. Even if multiple lines are used, it is likely that
each line will have multiple I/O modules attached to it. Thus, one of the other three techniques
must be used on each line.

Software poll
When the processor detects an interrupt, it branches to an interrupt-service routine whose job
is to poll each I/O module to determine which module caused the interrupt. The poll could be
in the form of a separate command line. In this case, the processor raises a TEST I/O command
and places the address of a particular I/O module on the address lines.
-The I/O module responds positively if it set the interrupt. Once the correct module is
identified, the processor branches to a device-service routine specific to that device. It is a
time-consuming process.

Daisy chain
For interrupts, all I/O modules share a common interrupt request line. The interrupt
acknowledge line is daisy chained through the modules. When the processor senses an interrupt,
it sends out an interrupt acknowledge. This signal propagates through a series of I/O modules
until it gets to a requesting module.
-The requesting module typically responds by placing a word on the data lines. This word is
referred to as a vector and is either the address of the I/O module or some other unique
identifier.
-In either case, the processor uses the vector as a pointer to the appropriate device-service
routine. This avoids the need to execute a general interrupt-service routine first. This technique is
also called a vectored interrupt.
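A minimal model of the daisy chain: the acknowledge ripples through the modules in order, and the first requester supplies its vector. The module names and vector values are invented for illustration.

```python
# Illustrative daisy-chain model: INTA propagates module by module.
modules = [
    {"name": "printer", "requesting": False, "vector": 0x10},
    {"name": "disk",    "requesting": True,  "vector": 0x20},
    {"name": "modem",   "requesting": True,  "vector": 0x30},
]

def acknowledge(chain):
    for m in chain:              # acknowledge ripples down the chain in order
        if m["requesting"]:
            return m["vector"]   # first requester wins: position = priority
    return None

print(hex(acknowledge(modules)))   # 0x20 -- the disk, nearer the processor
```

Note how priority falls out of the wiring order: the modem never gets to answer while the disk is requesting.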

Bus arbitration
An I/O module must first gain control of the bus before it can raise the interrupt request line.
Only one module can raise the line at a time. When the processor detects the interrupt, it
responds on the interrupt acknowledge line. The requesting module then places its vector on the
data lines.

If multiple interrupts have occurred, how does the processor decide which one to process?
- With multiple lines, the processor picks the interrupt line with the highest priority.
- With software polling, the order in which modules are polled determines their priority.
- The order of modules on a daisy chain determines their priority.
- Bus arbitration can also employ a priority scheme.

3) Direct Memory Access


Both programmed I/O and interrupt-driven I/O require the active intervention of the processor
to transfer data between memory and the I/O module, and any data transfer must traverse a
path through the processor. Thus both these forms of I/O suffer from two inherent drawbacks.
- The I/O transfer rate is limited by the speed with which the processor can test and service a
device.
- The processor is tied up in managing an I/O transfer; a number of instructions must be
executed for each I/O transfer.
To transfer large block of data at high speed, a special control unit may be provided to allow
transfer of a block of data directly between an external device and the main memory, without
continuous intervention by the processor. This approach is called direct memory access or DMA.
DMA transfers are performed by a control circuit associated with the I/O device and this circuit
is referred as DMA controller. The DMA controller allows direct data transfer between the device
and the main memory without involving the processor.

To transfer data between memory and I/O devices, the DMA controller takes over control of the
system from the processor, and the transfer of data takes place over the system bus. For this
purpose, the DMA controller must use the bus only when the processor does not need it, or it
must force the processor to suspend operation temporarily. The latter technique is more common
and is referred to as cycle stealing, because the DMA module in effect steals a bus cycle.

DMA Function
When the CPU wishes to read or write a block of data, it issues a command to the DMA module
containing:
-Whether a read or write is requested, using read or write control lines.
-Address of the I/O device involved, communicated on data lines.
-Starting location in memory to read from or write to, communicated on data lines and stored by
DMA module in its address register.
-Number of words to be read or written, communicated via data lines and stored in data count
register.

In this way, the CPU continues with other work. The DMA module handles the entire operation,
transferring data directly to or from memory without going through the CPU. When the transfer
is complete, the DMA module sends an interrupt signal to the processor. Thus, the processor is
involved only at the beginning and end of the transfer.
•The DMA module can force the CPU to suspend operation while it transfers a word. This is not
an interrupt, since the CPU does not save anything; it is just a wait state.
•The overall effect is to slow the operation of the CPU, but DMA is still far more efficient than
interrupt-driven or programmed I/O.
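The command fields and the word-by-word transfer can be sketched as follows. The DMAController class and its method names are hypothetical; only the roles of the address register and data count register mirror the description above.

```python
# Illustrative sketch of a DMA block transfer between memory and a device.
class DMAController:
    def __init__(self, memory):
        self.memory = memory

    def transfer(self, read, device, start, count):
        # CPU's command loads: address register <- start, count register <- count
        addr, remaining = start, count
        while remaining:
            # One "stolen" bus cycle per word, without CPU involvement.
            if read:
                device.append(self.memory[addr])   # memory -> device
            else:
                self.memory[addr] = device.pop(0)  # device -> memory
            addr += 1
            remaining -= 1
        return "interrupt: transfer complete"      # CPU re-enters only here

memory = [10, 20, 30, 40]
device_buffer = []
dma = DMAController(memory)
print(dma.transfer(read=True, device=device_buffer, start=1, count=2))
print(device_buffer)    # the two words copied out of memory
```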

The DMA mechanism can be configured in different ways. The most common amongst them are:
- Single bus, detached DMA - I/O configuration.
- Single bus, Integrated DMA - I/O configuration.
- Using separate I/O bus.

The evolution of the I/O function can be summarized as follows:


- The CPU directly controls a peripheral device.
- A controller or I/O module is added. The CPU uses programmed I/O without interrupts.
- Interrupts are employed and the CPU need not spend time waiting for an I/O operation to be
performed.
- The I/O module is given direct access to memory via DMA.
- The I/O module is enhanced to become a processor in its own right, with a specialized
instruction set tailored for I/O.
- The I/O module has a local memory of its own and is, in fact, a computer in its own right.

The External Interface


One major characteristic of the interface is whether it is serial or parallel.
•In a parallel interface, there are multiple lines connecting the I/O module and the peripheral,
and multiple bits are transferred simultaneously.
•In a serial interface, there is only one line used to transmit data, and bits must be transmitted
one at a time.

Key to the operation of an I/O module is an internal buffer that can store data being passed
between the peripheral and the rest of the system. With a new generation of high-speed serial
interfaces, parallel interfaces are becoming much less common.

The connection between an I/O module in a computer system and external devices can be
either point-to-point or multipoint. A point-to-point interface provides a dedicated line
between the I/O module and the external device. On small systems (PCs, workstations),
typical point-to-point links include those to the keyboard, printer and external modem.

Of increasing importance are multipoint external interfaces, used to support external mass
storage devices (disk and tape drives) and multimedia devices (CD-ROMs, video, audio). These
multipoint interfaces are in effect external buses.
Two key examples are FireWire and InfiniBand for point-to-point and multipoint interfaces,
respectively.
FireWire
FireWire is a high-speed, low-cost and easy-to-implement serial bus. In fact, FireWire is finding
favor not only for computer systems, but also in consumer electronics products, such as digital
cameras, DVD players/recorders, and televisions.
•In these products, FireWire is used to transport video images, which are increasingly coming
from digitized sources. One of the strengths of the FireWire interface is that it uses serial
transmission (bit at a time) rather than parallel. It does not require wide cables like parallel
architectures. Thus, it is physically small.
•FireWire provides hot plugging, which makes it possible to connect and disconnect peripherals
without having to power the computer system down or reconfigure the system.

InfiniBand
InfiniBand is a recent I/O specification aimed at the high-end server market. InfiniBand has
become a popular interface for storage area networking and other large storage configurations. In
essence, InfiniBand enables servers, remote storage, and other network devices to be attached in a
central fabric of switches and links. The switch-based architecture can connect up to 64,000
servers, storage systems, and networking devices.

The key elements are:


Host channel adapter (HCA) links the server to an InfiniBand switch. The HCA attaches to the
server at a memory controller, which has access to the system bus and controls traffic between
the processor and memory and between the HCA and memory.
Target channel adapter (TCA) is used to connect storage systems, routers and other peripheral
devices to an InfiniBand switch.
InfiniBand switch provides point-to-point physical connections to a variety of devices and
switches traffic from one link to another.

Fig: InfiniBand architecture:


- Links are used between a switch and a channel adapter, or between two switches.
- Subnet consists of one or more interconnected switches plus the links that connect other
devices to those switches.
- Router connects InfiniBand subnets, or connects an InfiniBand switch to a network, such as a
local area network, wide area network, or storage area network.
Lec 7: Machine instructions
Machine Instructions
The CPU can only execute machine code in binary format, called machine instructions.
A machine instruction specifies the following information:
– What has to be done (the operation code)
– To whom the operation applies (source operands)
– Where does the result go (destination/Result operand)
– How to continue after the operation is finished (next instruction address).
The next instruction to be fetched is located in main memory. In a virtual
memory system, however, it may be in either main memory or secondary memory (disk). In most
cases, the next instruction to be fetched immediately follows the current instruction. In
those cases, there is no explicit reference to the next instruction. When an explicit
reference is needed, the main memory or virtual memory address must be given.
Source and result operands can be in one of the three areas:
- Main or virtual memory,
- CPU register or
- I/O device.
The steps involved in instruction execution are shown in the figure below.

Instruction Representation
Within the computer, each instruction is represented by a sequence of bits. The
instruction is divided into fields, corresponding to the constituent elements of the
instruction. The instruction format is highly machine specific and it mainly depends on
the machine architecture.
It is difficult to deal with binary representation of machine instructions. Thus, it has
become common practice to use a symbolic representation of machine instructions.
Opcodes are represented by abbreviations, called mnemonics, that indicate the
operations.
Common examples include:
ADD Add
SUB Subtract
MULT Multiply
DIV Division
LOAD Load data from memory to CPU
STORE Store data to memory from CPU.
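The mnemonic table above can be sketched as a tiny assembler lookup. The binary opcode values below are hypothetical (real assignments are machine-specific); only the mnemonic names come from the list above.

```python
# Hypothetical opcode assignments; real values depend on the machine.
MNEMONICS = {"ADD": 0b0000, "SUB": 0b0001, "MULT": 0b0010,
             "DIV": 0b0011, "LOAD": 0b0100, "STORE": 0b0101}

def opcode_of(mnemonic):
    """Translate a symbolic mnemonic into its binary opcode."""
    return MNEMONICS[mnemonic]

print(format(opcode_of("MULT"), "04b"))  # prints 0010
```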

Operands are also represented symbolically. For example, the instruction


MULT R, X : R ← R × (X)
means multiply the value contained in data location X by the contents of register R
and put the result in register R.
In this example, X refers to the address of a location in memory and R refers to a
particular register.
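The effect of MULT R, X can be simulated with a toy register and memory model. All addresses and values here are illustrative, not taken from any real instruction set.

```python
# Toy model of MULT R, X : R <- R * (X).
memory = {100: 7}       # data location X = 100 holds the value 7
registers = {"R": 6}    # register R initially holds 6

def mult(reg, addr):
    """Multiply register contents by the memory operand; result goes to the register."""
    registers[reg] = registers[reg] * memory[addr]

mult("R", 100)
print(registers["R"])   # prints 42
```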

Instruction Format:
An instruction format defines the layout of the bits of an instruction, in terms of its
constituents parts.
A simple example of an instruction format is shown in the figure.

It is assumed that this is a 16-bit CPU. 4 bits are used for the operation code, so we
may have 16 (2^4 = 16) different instructions. Each instruction has two
operands, and 6 bits are used to specify each operand.
An instruction format must include an opcode and, implicitly or explicitly, zero or more
operands. Each explicit operand is referenced using one of the addressing modes that are
available for that machine. The format must, implicitly or explicitly, indicate the
addressing mode of each operand. For most instruction sets, more than one instruction
format is used. Four common instruction formats are shown below.
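The 16-bit example format above (a 4-bit opcode and two 6-bit operand fields) can be sketched as bit-packing code. The field widths come from the example; the specific opcode and operand values are illustrative.

```python
# Layout: [ opcode:4 | operand1:6 | operand2:6 ] = 16 bits total.
def encode(opcode, op1, op2):
    """Pack the three fields into one 16-bit instruction word."""
    assert 0 <= opcode < 16 and 0 <= op1 < 64 and 0 <= op2 < 64
    return (opcode << 12) | (op1 << 6) | op2

def decode(word):
    """Unpack a 16-bit instruction word back into its three fields."""
    return (word >> 12) & 0xF, (word >> 6) & 0x3F, word & 0x3F

word = encode(0b0010, 5, 33)
print(decode(word))  # prints (2, 5, 33)
```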
Instruction Types
The instruction set of a CPU can be categorized as follows:
i) Data Processing:
Arithmetic and Logic instructions Arithmetic instructions provide computational
capabilities for processing numeric data.
Logic (Boolean) instructions operate on the bits of a word as bits rather than as
numbers. Logic instructions thus provide capabilities for processing any other type of
data. These operations are performed primarily on data in CPU registers.

ii) Data Storage:


Memory instructions Memory instructions are used for moving data between memory
and CPU registers.
iii) Data Movement:
I/O instructions I/O instructions are needed to transfer programs and data into memory
from a storage or input device, and to transfer the results of computation back to the user.
iv) Control:
Test and branch instructions. Test instructions are used to test the value of a data word
or the status of a computation. Branch instructions are then used to branch to a different
set of instructions depending on the decision made.

Number of Addresses
What is the maximum number of addresses one might need in an instruction? Most of
the arithmetic and logic operations are either unary (one operand) or binary (two
operands). Thus we need a maximum of two addresses to reference operands. The
result of an operation must be stored, suggesting a third address. Finally after
completion of an instruction, the next instruction must be fetched, and its address is
needed. This reasoning suggests that an instruction may need to contain up to four address
references: two operands, one result, and the address of the next instruction. In practice,
four-address instructions are rare. Most instructions have one, two or three operand
addresses, with the address of the next instruction being implicit (obtained from the
program counter).
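The trade-off can be illustrated with hypothetical mnemonics: the same computation, Y = A + B, takes more instructions as the number of addresses per instruction drops (the one-address form uses an implicit accumulator, AC).

```python
# Illustrative instruction sequences for Y = A + B (hypothetical mnemonics).
three_address = ["ADD Y, A, B"]           # Y  <- (A) + (B)
two_address   = ["MOVE Y, A",             # Y  <- (A)
                 "ADD  Y, B"]             # Y  <- (Y) + (B)
one_address   = ["LOAD  A",               # AC <- (A)
                 "ADD   B",               # AC <- (AC) + (B)
                 "STORE Y"]               # Y  <- (AC)

# Fewer addresses per instruction means longer programs.
print(len(three_address), len(two_address), len(one_address))  # prints 1 2 3
```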

Instruction Set Design


One of the most interesting, and most analyzed, aspects of computer design is
instruction set design. The instruction set defines the functions performed by the CPU.
The instruction set is the programmer's means of controlling the CPU. Thus
programmer requirements must be considered in designing the instruction set.
Most important and fundamental design issues:
- Operation repertoire : How many and which operations to provide, and how
complex operations should be.
- Data Types : The various type of data upon which operations are performed.
- Instruction format : Instruction length (in bits), number of addresses, size of
various fields and so on.
- Registers : Number of CPU registers that can be referenced by instructions and
their use.
- Addressing : The mode or modes by which the address of an operand is
specified.

Types of Operands
Machine instructions operate on data. Data can be categorised as follows :
- Addresses: An address indicates a memory location. Addresses are essentially
unsigned integers, but they are treated in a special way to identify a memory
location. Address arithmetic is somewhat different from normal arithmetic and
is related to the machine architecture.
- Numbers: All machine languages include numeric data types. Numeric data are
classified into two broad categories: integer or fixed point and floating point.
- Characters: A common form of data is text or character strings. Since computers
work with bits, characters are represented by sequences of bits. The most
commonly used coding scheme is ASCII (American Standard Code for
Information Interchange) code.
- Logical Data: Normally each word or other addressable unit (byte, halfword,
and so on) is treated as a single unit of data. It is sometime useful to consider an
n-bit unit as consisting of n 1-bit items of data, each item having the value 0 or 1.
When data are viewed this way, they are considered to be logical data. Generally
1 is treated as true and 0 is treated as false.
Types of Operations
The number of different opcodes and their types varies widely from machine to machine.
However, some general types of operations are found in most machine
architectures. Those operations can be categorized as follows:
- Data Transfer
- Arithmetic
- Logical
- Conversion
- Input/Output (I/O)
- System Control
- Transfer Control
Addressing Modes
The most common addressing techniques are:
- Immediate
- Direct
- Indirect
- Register
- Register Indirect
- Displacement
- Stack
All computer architectures provide more than one of these addressing modes. The
question arises as to how the control unit can determine which addressing mode is
being used in a particular instruction. Several approaches are used. Often, different
opcodes will use different addressing modes. Also, one or more bits in the instruction
format can be used as a mode field. The value of the mode field determines which
addressing mode is to be used.
In a system without virtual memory, the effective address will be either a main memory
address or a register. In a virtual memory system, the effective address is a virtual
address or a register. The actual mapping to a physical address is a function of the
paging mechanism and is invisible to the programmer.

To explain the addressing modes, we use the following notation:


A = contents of an address field in the instruction that refers to a memory address
R = contents of an address field in the instruction that refers to a register
EA = actual (effective) address of the location containing the referenced operand
(X) = contents of location X

i) Immediate Addressing:
The simplest form of addressing is immediate addressing, in which the operand is
actually present in the instruction:
OPERAND = A
This mode can be used to define and use constants or set initial values of variables. The
advantage of immediate addressing is that no memory reference other than the
instruction fetch is required to obtain the operand. The disadvantage is that the size of
the number is restricted to the size of the address field, which, in most instruction sets,
is small compared with the word length.
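For instance, reusing the 6-bit operand fields of the earlier example format, an unsigned immediate operand would be limited to 2^6 = 64 distinct values (the unsigned interpretation is an assumption; a signed field would halve the positive range).

```python
FIELD_BITS = 6                      # operand-field width from the example format
max_unsigned = 2**FIELD_BITS - 1    # largest immediate value that fits
print(max_unsigned)                 # prints 63
```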
ii) Direct Addressing:
A very simple form of addressing is direct addressing, in which the address field
contains the effective address of the operand:
EA = A
It requires only one memory reference and no special calculation.
iii) Indirect Addressing:
With direct addressing, the length of the address field is usually less than the word
length, thus limiting the address range. One solution is to have the address field refer to
the address of a word in memory, which in turn contains a full-length address of the
operand. This is known as indirect addressing:
EA = (A)

iv) Register Addressing:


Register addressing is similar to direct addressing. The only difference is that the
address field refers to a register rather than a main memory address:
EA = R
The advantages of register addressing are that only a small address field is needed in
the instruction and no memory reference is required. The disadvantage of register
addressing is that the address space is very limited.

v) Register Indirect Addressing:


Register indirect addressing is similar to indirect addressing, except that the address
field refers to a register instead of a memory location:
EA = (R)
It requires only one memory reference and no special calculation.
Register indirect addressing uses one less memory reference than indirect addressing,
because the operand's address is obtained from a register rather than from memory;
only the operand itself must then be fetched from memory. In general, register access
is much faster than memory access.

vi) Displacement Addressing:


A very powerful mode of addressing combines the capabilities of direct addressing and
register indirect addressing, which is broadly categorized as displacement addressing:
EA = A + (R)
Displacement addressing requires that the instruction have two address fields, at least
one of which is explicit. The value contained in one address field (value = A) is used
directly. The other address field, or an implicit reference based on opcode, refers to a
register whose contents are added to A to produce the effective address.
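The effective-address equations for several of the modes above can be sketched with a toy memory and register file. All addresses and contents here are hypothetical; the point is only the EA calculation for each mode.

```python
# Toy machine state for EA calculations (illustrative values).
memory = {100: 200, 200: 55}    # memory[address] -> contents
registers = {"R1": 100}         # register file

A = 100     # address field of the instruction
R = "R1"    # register field of the instruction

ea_direct            = A                    # Direct:            EA = A
ea_indirect          = memory[A]            # Indirect:          EA = (A)
ea_register_indirect = registers[R]         # Register indirect: EA = (R)
ea_displacement      = A + registers[R]     # Displacement:      EA = A + (R)

print(ea_direct, ea_indirect, ea_register_indirect, ea_displacement)
# prints 100 200 100 200
```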

Three of the most common use of displacement addressing are:


i) Relative Addressing:
For relative addressing, the implicitly referenced register is the program counter (PC).
That is, the current instruction address is added to the address field to produce the EA.
Thus, the effective address is a displacement relative to the address of the instruction.

ii) Base-Register Addressing:


The reference register contains a memory address, and the address field contains a
displacement from that address. The register reference may be explicit or implicit.
In some implementations, a single segment/base register is employed and is used
implicitly. In others, the programmer may choose a register to hold the base address of
a segment, and the instruction must reference it explicitly.

iii) Indexing:
The address field references a main memory address, and the reference register contains
a positive displacement from that address. In this case also the register reference is
sometimes explicit and sometimes implicit.
Index registers are generally used for iterative tasks, so it is typical that the index
register must be incremented or decremented after each reference to it.
Because this is such a common operation, some systems will do this automatically as
part of the same instruction cycle.
This is known as auto-indexing. If general-purpose registers are used, the auto-index
operation may need to be signaled by a bit in the instruction.
Auto-indexing using increment can be depicted as follows:
EA = A + (R)
R = (R) + 1
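Auto-indexing with increment can be sketched as a hypothetical index register R1 stepping through three consecutive memory locations starting at base address A:

```python
registers = {"R1": 0}   # index register, initially zero
A = 100                 # base address from the instruction
visited = []
for _ in range(3):
    ea = A + registers["R1"]   # EA = A + (R)
    visited.append(ea)
    registers["R1"] += 1       # R = (R) + 1  (auto-increment)
print(visited)                 # prints [100, 101, 102]
```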
Stack Addressing:
A stack is a linear array or list of locations. It is sometimes referred to as a pushdown list
or last-in-first-out (LIFO) list. A stack is a reserved block of locations. Items are appended to
the top of the stack so that, at any given time, the block is partially filled. Associated
with the stack is a pointer whose value is the address of the top of the stack. The stack
pointer is maintained in a register. References to stack locations in memory are in fact
register indirect addresses.
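A small sketch of a descending stack with a stack-pointer register: the stack pointer names the top of the stack, and pushes and pops are register-indirect references through it (the block size and values here are hypothetical).

```python
memory = [0] * 16   # reserved block of locations for the stack
sp = 16             # stack pointer; stack grows downward and is empty

def push(value):
    """Decrement SP, then store the item at the new top of stack."""
    global sp
    sp -= 1
    memory[sp] = value

def pop():
    """Read the item at the top of stack, then increment SP."""
    global sp
    value = memory[sp]
    sp += 1
    return value

push(7); push(9)
print(pop(), pop())  # prints 9 7
```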
