POLYTECHNIC
ROURKELA
COMPUTER SYSTEM
&
ARCHITECTURE
PREPARED BY
BIJAYALAXMI PADHIARY
SKDAV GOVT. POLYTECHNIC, ROURKELA
Th-1 COMPUTER SYSTEM ARCHITECTURE
Common to (CSE/IT)
Theory: 4 Periods per week          Internal Assessment: 20 Marks
Total Periods: 60 Periods           End Sem Exam: 80 Marks
Examination: 3 hours                Total Marks: 100 Marks
B. RATIONALE: Nowadays the use of computers has become essential in areas such as
education, entertainment, business and sports. This subject exposes the learner to the
architecture of the different components of a computer system and their operating
procedures. The learner will also see how the different components integrate to execute
a task and produce a result, and how the processing capability can be improved.
C. OBJECTIVE: After completion of this course the student will be able to:
Understand the basic structure of a computer with instructions.
Learn about machine instructions and program execution.
Learn about the internal functional units of a processor and how they are interconnected.
Understand how I/O transfer is performed.
Learn about basic memory circuit, organization and secondary storage.
Understand concept of parallel processing.
D. COURSE CONTENTS:
3. Processor System
3.1 Register Files
3.2 Complete instruction execution
Fetch
Decode
Execution
3.3 Hardware control
3.4 Micro program control
4. Memory System
4.1 Memory characteristics
4.2 Memory hierarchy
4.3 RAM and ROM organization
4.4 Interleaved Memory
4.5 Cache memory
4.6 Virtual memory
5. Input – Output System
5.1 Input - Output Interface
5.2 Modes of Data transfer
5.3 Programmed I/O Transfer
5.4 Interrupt driven I/O
5.5 DMA
5.6 I/O Processor
7. Parallel Processing
7.1 Parallel Processing
7.2 Linear Pipeline
7.3 Multiprocessor
7.4 Flynn's Classification
Book Recommended:-
CENTRAL PROCESSING UNIT (CPU)
o The CPU (Central Processing Unit) is the electronic circuitry within a computer that
carries out the instructions given by a computer program by performing the basic
arithmetic, logical, control and input/output (I/O) operations specified by the
instructions.
MEMORY UNIT
o The Memory unit can be referred to as the storage area in which running programs
are kept and which contains the data needed by those running programs.
o The Memory unit can be categorized in two ways namely, primary memory and
secondary memory.
o It enables the processor to access running applications and services that are
temporarily stored in specific memory locations.
o Primary storage is the fastest memory and operates at electronic speeds. Primary
memory contains a large number of semiconductor storage cells, each capable of
storing a bit of information. The word length of a computer typically ranges from
16 to 64 bits.
o It is also known as a volatile form of memory, meaning that when the computer is
shut down, anything contained in RAM is lost.
o Cache memory is also a kind of memory, used to fetch data very quickly. It is
tightly coupled with the processor.
o The most common examples of primary memory are RAM and ROM.
o Secondary memory is used when a large amount of data and programs have to be
stored for a long-term basis.
o It is also known as non-volatile memory, meaning that the data is stored
permanently irrespective of shut down.
o The most common examples of secondary memory are magnetic disks, magnetic
tapes, and optical disks.
CONTROL UNIT
o The control unit is a component of a computer's central processing unit that directs
the operation of the processor. It tells the memory, the arithmetic/logic unit and
the input and output devices how to respond to the instructions that have been
sent to the processor.
OUTPUT UNIT
o The primary function of the output unit is to send the processed results to the user.
Output devices display information in a way that the user can understand.
o Output devices are pieces of equipment that are used to generate information or
any other response processed by the computer. These devices display information
that has been held or generated within a computer.
o The most common example of an output device is a monitor.
COMPUTER COMPONENTS:
The main memory stores both data and instructions in binary form. There are many
ways to store binary data and to perform operations on that data.
A configuration of logic components can be constructed to perform a particular
computation. This can be thought of as a form of programming in hardware, and is
called a "hardwired program".
Alternatively, general-purpose hardware interprets each instruction and generates the
control signals. This is called "software or microprogrammed control".
PERFORMANCE MEASURES: -
Response Time:-
Response time is the time spent to complete an event or an operation.
It is also called execution time or latency.
Throughput:-
Throughput is the amount of work done per unit of time. i.e. the amount of
processing that can be accomplished during a given interval of time.
It is also called as bandwidth of the system.
In general, faster response time leads to better throughput.
Elapsed time:-
Elapsed time is the time spent from the start of execution of a program to its
completion.
This performance measure is affected by the clock speed of the processor and the
input-output devices concerned.
Fraction of elapsed time spent in the CPU = (user CPU time + system CPU time) / elapsed time
MIPS
An early measure of computer performance was the rate at which a given machine
executes instructions, expressed in MIPS (millions of instructions per second).
It is calculated by dividing the number of instructions executed by the time required
to run the program.
CPI/IPC
CPI – Clock cycles per instruction
IPC – Instructions per cycle
These are two further measures: CPI is the average number of clock cycles required
to execute one instruction, and IPC is the number of instructions executed per clock
cycle. Each is the reciprocal of the other.
Speed up:-
Computer architects use speed up to describe the performance of architectural
changes as different improvements are made to the system.
It is defined as the ratio of the execution time before the change to the execution
time after the change.
Amdahl's law:-
This law states that "the performance improvement to be gained by using a faster
mode of execution is limited by the fraction of time the faster mode can be used".
Amdahl’s law defines the term speed up.
Speed up = execution time without using enhancement /Execution time with using
enhancement
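Amdahl's law can also be applied numerically. A minimal Python sketch, with example
values assumed purely for illustration (they are not from these notes): f is the fraction
of execution time that can use the enhancement, s is the speedup of that fraction.

def amdahl_speedup(f, s):
    # Overall speedup = 1 / ((1 - f) + f / s)
    return 1.0 / ((1.0 - f) + f / s)

# Assumed example: 60% of a program is made 3 times faster.
print(amdahl_speedup(0.6, 3))  # about 1.67, well below 3, because the other 40% is unchanged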
Performance parameter
The basic performance equation is given by
T = (N x S) / R
where
T – processor time required to execute the program,
N – number of machine instructions executed to complete the program,
S – average number of basic steps (clock cycles) required to execute one machine instruction,
R – clock rate of the processor in cycles per second.
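A short worked example of the basic performance equation in Python; all of the values
below are assumed for illustration only.

N = 50_000_000       # instructions executed (assumed)
S = 4                # average basic steps (clock cycles) per instruction (assumed)
R = 2_000_000_000    # clock rate: 2 GHz in cycles per second (assumed)

T = (N * S) / R      # processor time for the program, in seconds
print(T)             # 0.1 seconds for these values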
Clock rate:-
Clock rate is one of the important performance measures; performance improves as
the clock rate is increased. There are two ways in which the clock rate may be
increased:
1) Improving the IC technology, which makes logic circuits faster and thus reduces
the time taken to complete a basic step.
2) Reducing the amount of processing done in one basic step, which reduces the
clock period P, since
R = 1/P
Big-endian is an order in which the "big end" (most significant value in the
sequence) is stored first (at the lowest storage address).
Little-endian is an order in which the "little end" (least significant value in the
sequence) is stored first.
For example, in a big-endian computer, the two bytes required for
the hexadecimal number 4F52 would be stored as 4F52 in storage (if 4F is stored
at storage address 1000, for example, 52 will be at address 1001).
In a little-endian system, it would be stored as 524F (52 at address 1000, 4F at
1001).
For people who use languages that read left-to-right, big-endian seems like the
natural way to store a string of characters or numbers, in the same order you
expect to see it presented to you.
Many of us would thus think of big-endian as storing something in forward fashion,
just as we read.
An argument for little-endian order is that as you increase a numeric value, you
may need to add digits to the left (a higher non-exponential number has more
digits).
Thus, an addition of two numbers often requires moving all the digits of a big-
endian ordered number in storage, moving everything to the right.
In a number stored in little-endian fashion, the least significant bytes can stay
where they are and new digits can be added to the right at a higher address. This
means that some computer operations may be simpler and faster to perform.
Note that within both big-endian and little-endian byte orders, the bits within each
byte are big-endian. That is, there is no attempt to be big- or little-endian about the
entire bit stream represented by a given number of stored bytes.
For example, whether hexadecimal 4F is put in storage first or last with other bytes
in a given storage address range, the bit order within the byte will be 01001111
(most significant bit first).
IBM's 370 mainframes, most RISC-based computers, and Motorola
microprocessors use the big-endian approach. TCP/IP also uses the big-endian
approach (and thus big-endian is sometimes called network order).
On the other hand, Intel processors (CPUs) and DEC Alphas and at least some
programs that run on them are little-endian.
There are also mixed forms of endianness. For example, VAX floating point uses
mixed-endian (also referred to as middle-endian).
The ordering of bytes in a 16-bit word differs from the ordering of 16-bit words
within a 32-bit word.
Bi-endian processors can operate in either little-endian or big-endian mode, and
switch between the two.
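The 4F52 example above can be reproduced with a small Python sketch (illustrative
only); int.to_bytes produces the stored byte sequence for each byte order.

value = 0x4F52
print(value.to_bytes(2, "big").hex())     # '4f52': most significant byte first
print(value.to_bytes(2, "little").hex())  # '524f': least significant byte first

import sys
print(sys.byteorder)  # byte order of the machine running this sketch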
The number of address lines l needed to address a memory of N bytes is given by
N = 2^l,
where
l is the total number of address lines,
N is the memory size in bytes.
For example, some storage sizes can be expressed in bytes using the above formula:
1 kB = 2^10 bytes
64 kB = 2^6 x 2^10 bytes = 2^16 bytes
4 GB = 2^2 x 2^10 (k) x 2^10 (M) x 2^10 (G) bytes = 2^32 bytes
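These conversions can be checked with a short Python sketch (illustrative), using the
inverse relation l = log2(N):

import math

for n_bytes in (1024, 64 * 1024, 4 * 1024**3):   # 1 kB, 64 kB, 4 GB
    print(n_bytes, "bytes need", int(math.log2(n_bytes)), "address lines")
# prints 10, 16 and 32 address lines respectively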
Memory Address Register (MAR) is the address register which is used to store the
address of the memory location where the operation is being performed.
Memory Data Register (MDR) is the data register which is used to store the data on
which the operation is being performed.
In the above diagram, initially the MDR contains a garbage value and the
MAR contains the memory address 2003.
After the execution of read instruction, the data of memory location 2003
will be read and the MDR will get updated by the value of the 2003 memory
location (3D).
In the above diagram, the MAR contains 2003 and MDR contains 3D. After
the execution of write instruction 3D will be written at 2003 memory location.
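A toy Python model of the MAR/MDR behaviour just described, reusing the 2003/3D
values assumed in the diagrams:

memory = {0x2003: 0x3D}   # toy memory: location 2003 holds 3D (hex)

MAR = 0x2003              # address of the location being operated on
MDR = None                # initially holds a garbage value

MDR = memory[MAR]         # READ: contents of memory[MAR] are copied into MDR
print(hex(MDR))           # 0x3d

memory[MAR] = MDR         # WRITE: contents of MDR are written to memory[MAR]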
UNIT-2
INSTRUCTIONS & INSTRUCTION SEQUENCING
FUNDAMENTAL TO INSTRUCTION
The Register-reference instructions are represented by the Opcode 111 with a 0 in the
leftmost bit (bit 15) of the instruction.
Input-Output instruction
Just like the Register-reference instruction, an Input-Output instruction does not need a
reference to memory and is recognized by the operation code 111 with a 1 in the leftmost
bit of the instruction. The remaining 12 bits are used to specify the type of the input-output
operation or test performed.
Note
o The three operation code bits in positions 12 through 14 should be equal to 111.
Otherwise, the instruction is a memory-reference type, and the bit in position 15 is
taken as the addressing mode I.
o When the three operation code bits are equal to 111, control unit inspects the bit
in position 15. If the bit is 0, the instruction is a register-reference type. Otherwise,
the instruction is an input-output type having bit 1 at position 15.
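A hedged Python sketch of this decoding rule for the 16-bit format described above;
the example instruction words are assumed for illustration.

def classify(instr):
    # instr is a 16-bit instruction word
    opcode = (instr >> 12) & 0b111   # operation code bits, positions 12-14
    i_bit = (instr >> 15) & 0b1      # bit in position 15
    if opcode != 0b111:
        return "memory-reference (addressing mode I = %d)" % i_bit
    return "register-reference" if i_bit == 0 else "input-output"

print(classify(0x7800))  # opcode 111, bit 15 = 0: register-reference
print(classify(0xF800))  # opcode 111, bit 15 = 1: input-output
print(classify(0x2800))  # opcode 010: memory-reference (addressing mode I = 0)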
Instruction Set Completeness:-
Arithmetic, logic and shift instructions provide computational capabilities for processing
the type of data the user may wish to employ.
A huge amount of binary information is stored in the memory unit, but all computations
are done in processor registers. Therefore, one must possess the capability of moving
information between these two units.
Program control instructions such as branch instructions are used to change the
sequence in which the program is executed.
Input and Output instructions act as an interface between the computer and the user.
Programs and data must be transferred into memory, and the results of computations
must be transferred back to the user.
OPERANDS:-
Addresses
Numbers
Characters
Logical data
1. Addresses:- Addresses are a form of number that represent specific locations in
memory.
2. Numbers:- All machine languages include numeric data types. Even in
nonnumeric data processing, there is a need for numbers to act as counters, field
widths, and so forth. An important distinction between numbers used in ordinary
mathematics and numbers stored in a computer is that the latter are limited. Thus,
the programmer is faced with understanding the consequences of rounding,
overflow and underflow.
Three types of numerical data are common in computers: binary integer (fixed point),
binary floating point, and (packed) decimal.
TYPES OF OPERATIONS:-
1) Data transfer
2) Arithmetic
3) Logical
4) Conversion
5) I/O
6) System control
7) Transfer of control
1) Data transfer:
The most fundamental type of machine instruction is the data transfer instruction. The
data transfer instruction must specify several things.
The location of the source and destination operands must be specified. Each
location could be memory, a register, or the top of the stack.
The length of data to be transferred must be indicated.
As with all instructions with operands, the mode of addressing for each operand
must be specified.
In terms of CPU action, data transfer operations are perhaps the simplest type. If both
source and destination are registers, then the CPU simply causes data to be transferred
from one register to another; this is an operation internal to the CPU. If one or both
operands are in memory, then the CPU must perform some or all of the following actions:
1. Determine the memory address, based on the addressing mode.
2. If the address refers to virtual memory, translate from the virtual to the actual
(physical) memory address.
2) Arithmetic:
Most machines provide the basic arithmetic operations of add, subtract, multiply, and
divide. These are invariably provided for signed integer (fixed-point) numbers, and often
they are also provided for floating-point and packed decimal numbers.
3) Logical:
Most machines also provide a variety of operations for manipulating individual bits of a
word or other addressable units, often referred to as "bit twiddling." They are based upon
Boolean operations.
Some of the basic logical operations that can be performed on Boolean or binary data are
AND, OR, NOT, XOR, …
These logical operations can be applied bitwise to n-bit logical data units. For example,
if two registers contain the bit patterns 10100101 and 00001111, then their bitwise AND
is 00000101 and their bitwise OR is 10101111.
4) Conversion:
Conversion instructions are those that change the format or operate on the format of data.
An example is converting from decimal to binary.
5) Input/Output:
As we saw, there are a variety of approaches taken, including isolated programmed IO,
memory-mapped programmed I/O, DMA, and the use of an I/O processor. Many
implementations provide only a few I/O instructions, with the specific actions specified by
parameters, codes, or command words.
6) System Control:
System control instructions are those that can be executed only while the processor is in
a certain privileged state or is executing a program in a special privileged area of memory.
Typically, these instructions are reserved for the use of the operating system.
7) Transfer of control:
For all of the operation types discussed so far, the next instruction to be performed is the
one that immediately follows, in memory, the current instruction. However, a significant
fraction of the instructions in any program have as their function changing the sequence
of instruction execution. For these instructions, the operation performed by the CPU is to
update the program counter to contain the address of some instruction in memory.
INSTRUCTION FORMATS:-
Zero Address Instructions:-
A stack-based computer does not use an address field in the instruction. To evaluate an
expression, it is first converted to reverse Polish notation, i.e. postfix notation.
Expression: X = (A+B)*(C+D)
Postfix: X = AB+CD+*
TOP means top of stack
M[X] is any memory location
PUSH A    TOP = A
PUSH B    TOP = B
ADD       TOP = A + B
PUSH C    TOP = C
PUSH D    TOP = D
ADD       TOP = C + D
MUL       TOP = (A+B)*(C+D)
POP X     M[X] = TOP
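A minimal Python sketch of the same evaluation on a stack, as a zero-address machine
would perform it; the operand values are assumed for illustration.

A, B, C, D = 2, 3, 4, 5                   # assumed operand values
stack = []
stack.append(A); stack.append(B)          # PUSH A, PUSH B
stack.append(stack.pop() + stack.pop())   # ADD: TOP = A + B
stack.append(C); stack.append(D)          # PUSH C, PUSH D
stack.append(stack.pop() + stack.pop())   # ADD: TOP = C + D
stack.append(stack.pop() * stack.pop())   # MUL: TOP = (A+B)*(C+D)
X = stack.pop()                           # POP X
print(X)                                  # 45 for these values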
One Address Instructions:-
These use an implied ACCUMULATOR register for data manipulation. One operand is in
the accumulator and the other is in a register or memory location. Implied means that the
CPU already knows that one operand is in the accumulator, so there is no need to specify it.
Expression: X = (A+B)*(C+D)
AC is accumulator
M[] is any memory location
M[T] is temporary location
LOAD A AC = M[A]
ADD B AC = AC + M[B]
STORE T M[T] = AC
LOAD C AC = M[C]
ADD D AC = AC + M[D]
MUL T AC = AC * M[T]
STORE X M[X] = AC
Two Address Instructions:-
Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location
MOV R1, A    R1 = M[A]
ADD R1, B    R1 = R1 + M[B]
MOV R2, C    R2 = M[C]
ADD R2, D    R2 = R2 + M[D]
MUL R1, R2   R1 = R1 * R2
MOV X, R1    M[X] = R1
Three Address Instructions:-
These have three address fields to specify a register or a memory location. Programs
created are much shorter in size, but the number of bits per instruction increases. These
instructions make the creation of programs much easier, but this does not mean that
programs will run much faster, because each instruction merely carries more information;
each micro-operation (changing the content of a register, loading an address onto the
address bus, etc.) is still performed in one cycle.
Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location
ADD R1, A, B    R1 = M[A] + M[B]
ADD R2, C, D    R2 = M[C] + M[D]
MUL X, R1, R2   M[X] = R1 * R2
ADDRESSING MODES
Addressing modes are an important topic in microprocessor and computer
organization. The addressing modes in computer architecture define how an operand
is chosen to execute an instruction; an addressing mode is the way used to identify
the location of an operand specified in an instruction.
Whenever an instruction executes, it requires operands to be operated on. An
instruction consists of an opcode and operands, where the operands are the data and
the opcode identifies the operation itself. Operations like addition or subtraction
require two operands, so they are called binary instructions. On the other hand,
increment or decrement operations need only one operand and are called unary
instructions. Now the question is how these operands can be obtained. The common
addressing modes are:
Implicit
Immediate
Direct
Indirect
Register
Register Indirect
Displacement
o Relative
o Base register
o Indexing
Stack
Register File
A register file is a means of memory storage within a computer's central
processing unit (CPU). A register file contains bits of data and mapping locations;
these locations specify certain addresses that are inputs to the register file. Other
inputs include data, read and write enables, and an execute function.
When a program runs on a computer, the machine instructions it executes name
registers in the register file as their operands. The CPU reads source operands from
the register file and writes results back into it, so the register file holds the data the
processor needs to perform its current functions.
Decoders are a part of a register file. The register address supplied by an instruction
is decoded so that exactly one register is selected for reading or writing. Part of the
read process involves driving the selected register's data bits onto the register file's
output port. Once a program completes a function, it may write a code or message
indicating the result of the operation.
Register files utilize one of two technologies related to memory. The first is known
as static random access memory, or SRAM. With static random access memory
there are several bits of memory that are labeled according to binary code. The
status of each memory bit is labeled with a zero or one, indicating an active or
inactive state.
A second type of register memory is dynamic random access memory, or DRAM.
Each section of memory contains a capacitor and transistor. Data values are
equated with different charges and must be constantly updated by the memory
chip. The update or "refreshing" will typically take up to 2 percent of the total
processing time.
There are two components to the memory chip's ability to process data: cycle time
and access time. The cycle time is the minimum time that must elapse between
successive data requests. Access time is the time that elapses between the CPU
requesting data from a register file and actually receiving that information.
While SRAM is usually used with memory caches, its cycle time and access time
are the same. With DRAM technology, the cycle time is typically longer than its
access time. This is because memory reads or extractions involve a destroy and
re-write process.
1. Accumulator: This is the most common register, used to store data taken out
from the memory.
o MAR: The Memory Address Register holds the address of the location in the
memory unit to be accessed.
o MBR: The Memory Buffer Register stores instructions and data received from,
or to be sent to, the memory.
Register Transfer
The statement R2 ← R1 denotes a transfer of the contents of register R1 into
register R2.
Instruction Cycle
A program residing in the memory unit of a computer consists of a sequence of
instructions. These instructions are executed by the processor by going through a cycle
for each instruction.
Step 1: The address in the program counter is transferred to the Memory Address
Register(MAR), as this is the only register that is connected to the system bus address
lines.
Step 2: The address in MAR is put on the address bus, now a Read order is provided
by the control unit on the control bus, and the result appears on the data bus and is then
copied into the memory buffer register. Program counter is incremented by one, to get
ready for the next instruction. These two acts can be carried out concurrently to save
time.
Example
ADD R, X (add the contents of memory location X to register R)
T1: MAR ← (IR(address))
T2: MBR ← Memory
T3: R ← (R) + (MBR)
Step 1: The address portion of the IR is loaded into the MAR.
Step 2: The referenced memory location is read, and its contents are placed into
the MBR.
Step 3: The contents of R and the MBR are added by the ALU.
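A toy Python model of this fetch-and-execute sequence; one-word instructions and a
dictionary memory are simplifying assumptions, not the actual machine organization.

memory = {0: ("ADD", 10), 10: 7}   # address 0 holds ADD R, X with X = 10; M[10] = 7
PC, R = 0, 5                       # program counter and register R (assumed values)

MAR = PC                           # fetch step 1: PC -> MAR
MBR = memory[MAR]                  # fetch step 2: read memory into MBR ...
PC = PC + 1                        # ... and increment the program counter
IR = MBR                           # instruction moves into the IR

opcode, address = IR               # decode
MAR = address                      # T1: MAR <- (IR(address))
MBR = memory[MAR]                  # T2: MBR <- Memory
if opcode == "ADD":
    R = R + MBR                    # T3: R <- (R) + (MBR)
print(R)                           # 12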
Hardwired Control
Hardwired Control Unit is implemented using various electronic components such
as combinational logic units and gates. The circuit uses a fixed architecture. If the
instruction set is changed, the wiring should also be changed. As it is hardwired,
the instruction set is constant and does not change. Therefore, a Hardwired
Control Unit is used in processors that use simple instruction set known as the
Reduced Instruction Set Computers (RISC).
Usually, these control units execute faster. However, Hardwired Control Units are
difficult to modify and implement. It is also difficult to add new features to the
existing design. Therefore, it has minimum flexibility.
Memory system
A memory unit is the collection of storage units or devices together. The memory unit
stores the binary information in the form of bits. Generally, memory/storage is classified
into 2 categories:
Volatile Memory: This loses its data, when power is switched off.
Non-Volatile Memory: This is a permanent storage and does not lose any data
when power is switched off.
Memory Hierarchy
Main Memory
The memory unit that communicates directly with the CPU, auxiliary memory and
cache memory is called main memory. It is the central storage unit of the computer
system: a large and fast memory used to store data during computer operations. Main
memory is made up of RAM and ROM, with RAM integrated circuit chips holding the
major share.
o SRAM: Static RAM, has a six transistor circuit in each cell and retains data,
until powered off.
o NVRAM: Non-Volatile RAM, retains its data, even when turned off.
Example: Flash memory.
ROM: Read Only Memory, is non-volatile and is more like a permanent storage for
information. It also stores the bootstrap loader program, to load and start the
operating system when computer is turned on. PROM(Programmable
ROM), EPROM(Erasable PROM) and EEPROM(Electrically Erasable PROM) are
some commonly used ROMs.
Auxiliary Memory
Devices that provide backup storage are called auxiliary memory. For
example: Magnetic disks and tapes are commonly used auxiliary devices. Other
devices used as auxiliary memory are magnetic drums, magnetic bubble memory
and optical disks.
It is not directly accessible to the CPU, and is accessed using the Input / Output
channels.
Cache Memory
The data or contents of the main memory that are used repeatedly by the CPU
are stored in the cache memory so that the data can be accessed in a shorter
time.
Whenever the CPU needs to access memory, it first checks the cache memory. If
the data is not found in cache memory then the CPU moves onto the main memory.
It also transfers block of recent data into the cache and keeps on deleting the old
data in cache to accommodate the new one.
Main Memory
The main memory acts as the central storage unit in a computer system. It is a relatively
large and fast memory which is used to store programs and data during the run time
operations.
The primary technology used for the main memory is based on semiconductor integrated
circuits. The integrated circuits for the main memory are classified into two major units.
The primary compositions of a static RAM are flip-flops that store the binary information.
The nature of the stored information is volatile, i.e. it remains valid as long as power is
applied to the system. The static RAM is easy to use and takes less time performing read
and write operations as compared to dynamic RAM.
The dynamic RAM stores binary information in the form of electric charges on
capacitors. The capacitors are integrated inside the chip by MOS transistors. Dynamic
RAM consumes less power and provides larger storage capacity in a single memory
chip.
RAM chips are available in a variety of sizes and are used as per the system requirement.
The following block diagram demonstrates the chip interconnection in a 128 * 8 RAM chip.
A 128 * 8 RAM chip has a memory capacity of 128 words of eight bits (one byte)
per word. This requires a 7-bit address and an 8-bit bidirectional data bus.
The 8-bit bidirectional data bus allows the transfer of data either from memory to
CPU during a read operation or from CPU to memory during a write operation.
The read and write inputs specify the memory operation, and the two chip select
(CS) control inputs are for enabling the chip only when the microprocessor selects
it.
The bidirectional data bus is constructed using three-state buffers.
The output generated by three-state buffers can be placed in one of the three
possible states which include a signal equivalent to logic 1, a signal equal to logic
0, or a high-impedance state.
Note: The logic 1 and 0 are standard digital signals whereas the high-impedance state behaves like
an open circuit, which means that the output does not carry a signal and has no logic significance.
The following function table specifies the operations of a 128 * 8 RAM chip.
From the function table, we can conclude that the unit is in operation only when CS1 =
1 and CS2 = 0. The bar on top of the second select variable (CS2 is active-low)
indicates that this input is enabled when it is equal to 0.
ROM integrated circuit
The primary component of the main memory is RAM integrated circuit chips, but a portion
of memory may be constructed with ROM chips.
A ROM memory is used for keeping programs and data that are permanently resident in
the computer.
Apart from the permanent storage of data, the ROM portion of main memory is needed
for storing an initial program called a bootstrap loader. The primary function of
the bootstrap loader program is to start the computer software operating when power is
turned on.
ROM chips are also available in a variety of sizes and are also used as per the system
requirement. The following block diagram demonstrates the chip interconnection in a 512
* 8 ROM chip.
A ROM chip has a similar organization as a RAM chip. However, a ROM can only
perform read operation; the data bus can only operate in an output mode.
The 9-bit address lines in the ROM chip specify any one of the 512 bytes stored in
it.
The value for chip select 1 and chip select 2 must be 1 and 0 for the unit to operate.
Otherwise, the data bus is said to be in a high-impedance state.
Types of Read Only Memory (ROM) –
1. PROM (Programmable read-only memory) –
It can be programmed by the user. Once programmed, the data and instructions in it
cannot be changed.
2. EPROM (Erasable Programmable read only memory) –
It can be reprogrammed. To erase data from it, expose it to ultraviolet light. To
reprogram it, erase all the previous data.
3. EEPROM (Electrically erasable programmable read only memory) –
The data can be erased by applying an electric field, with no need for ultraviolet light.
We can erase only portions of the chip.
4. MROM (Masked ROM) –
The very first ROMs were hard-wired devices that contained a pre-programmed set
of data or instructions. These kinds of ROMs are known as masked ROMs, and they
are inexpensive.
Interleaved Memory
Interleaved memory is designed to compensate for the relatively slow speed of dynamic
random-access memory (DRAM) or core memory by spreading memory addresses
evenly across memory banks. In this way, contiguous memory reads and writes use each
memory bank, resulting in higher memory throughput due to reduced waiting for memory
banks to become ready for the operations.
It is an abstraction technique that divides memory into many modules such that
successive words in the address space are placed in different modules.
Suppose we have 4 memory banks, each containing 256 bytes, and then the Block
Oriented scheme (no interleaving) will assign virtual addresses 0 to 255 to the first bank
and 256 to 511 to the second bank. But in Interleaved memory, virtual address 0 will be
with the first bank, 1 with the second memory bank, 2 with the third bank and 3 with the
fourth, and then 4 with the first memory bank again.
Hence, the CPU can access alternate sections immediately without waiting for memory
to be cached. There are multiple memory banks that take turns for the supply of data.
In the above example of 4 memory banks, data with virtual addresses 0, 1, 2 and 3 can
be accessed simultaneously as they reside in separate memory banks. Hence we do not
have to wait to complete a data fetch to begin the next operation.
An interleaved memory with n banks is said to be n-way interleaved. If there are two
banks of DRAM in an interleaved memory system, logically the system appears to be
one bank of memory that is twice as large.
In the interleaved bank representation below with 2 memory banks, the first long word of
bank 0 is flowed by that of bank 1, followed by the second long word of bank 0, followed
by the second long word of bank 1 and so on.
The following image shows the organization of two physical banks of n long words. All
even long words of the logical bank are located in physical bank 0, and all odd long words
are located in physical bank 1.
There are two ways of distributing addresses across the modules:
1. High order interleaving: The most significant bits of the memory address decide
the memory bank in which a particular location resides, and the least significant bits
are sent as addresses within each chip. One problem is that consecutive addresses
tend to be in the same chip, so the maximum rate of data transfer is limited by the
memory cycle time. This scheme is also known as Memory Banking.
2. Low order interleaving: The least significant bits select the memory bank (module),
and the higher bits address the data within the module. In this scheme, consecutive
memory addresses are in different memory modules, so all the modules can be
accessed at the same time, achieving parallelism and allowing memory access faster
than the cycle time. This method uses memory effectively.
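A brief Python sketch of both schemes, assuming 4 banks of 256 bytes each as in the
example above:

BANKS = 4
BANK_SIZE = 256                  # bytes per bank (assumed, as in the example)

def low_order_bank(addr):
    return addr % BANKS          # least significant bits select the bank

def high_order_bank(addr):
    return addr // BANK_SIZE     # most significant bits select the bank

for addr in (0, 1, 2, 3, 4):
    print(addr, low_order_bank(addr), high_order_bank(addr))
# low-order: addresses 0,1,2,3 fall in banks 0,1,2,3 and address 4 wraps back to bank 0
# high-order: addresses 0..255 all fall in bank 0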
Benefits of Interleaved Memory
An instruction pipeline may require an instruction and its operands at the same time
from main memory, which is not possible in the traditional method of memory access.
Similarly, an arithmetic pipeline requires two operands to be fetched simultaneously
from main memory. Memory interleaving resolves this problem.
Levels of memory:
Level 1 or Register –
It is the memory in which data that is immediately required by the CPU is stored
and accessed. The most commonly used registers are the accumulator, program
counter, address register, etc.
Level 2 or Cache memory –
It is the fastest memory which has faster access time where data is temporarily
stored for faster access.
Level 3 or Main Memory –
It is memory on which computer works currently. It is small in size and once power
is off data no longer stays in this memory.
Level 4 or Secondary Memory –
It is external memory which is not as fast as main memory but data stays
permanently in this memory.
Cache Performance:
When the processor needs to read or write a location in main memory, it first checks for
a corresponding entry in the cache.
If the processor finds that the memory location is in the cache, a cache hit has
occurred and data is read from cache
If the processor does not find the memory location in the cache, a cache miss has
occurred. For a cache miss, the cache allocates a new entry and copies in data from
main memory, then the request is fulfilled from the contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity
called Hit ratio.
Hit ratio = hit / (hit + miss) = no. of hits/total accesses
We can improve Cache performance using higher cache block size, higher associativity,
reduce miss rate, reduce miss penalty, and reduce the time to hit in the cache.
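A short illustration of the hit-ratio formula in Python; the access counts are assumed.

hits, misses = 950, 50               # assumed counts of cache hits and misses
hit_ratio = hits / (hits + misses)   # hits divided by total accesses
print(hit_ratio)                     # 0.95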
Cache Mapping:
There are three different types of mapping used for the purpose of cache memory which
are as follows: Direct mapping, Associative mapping, and Set-Associative mapping.
These are explained below.
1. Direct Mapping –
The simplest technique, known as direct mapping, maps each block of main memory
into only one possible cache line.
In direct mapping, each memory block is assigned to a specific line in the cache. If a
line is already occupied by a memory block when a new block needs to be loaded,
the old block is trashed. An address is split into two parts: an index field and a tag
field. The tag field is stored in the cache, while the index field selects the cache line.
Direct mapping's performance is directly proportional to the hit ratio.
i = j modulo m
where
i=cache line number
j= main memory block number
m=number of lines in the cache
For purposes of cache access, each main memory address can be viewed as
consisting of three fields. The least significant w bits identify a unique word or byte
within a block of main memory. In most contemporary machines, the address is at the
byte level. The remaining s bits specify one of the 2^s blocks of main memory. The
cache logic interprets these s bits as a tag of s-r bits (the most significant portion) and
a line field of r bits. This latter field identifies one of the m = 2^r lines of the cache.
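The field split can be sketched in Python; the widths below are assumed example
values, not from the notes.

w, r = 2, 4          # assumed: 4-byte blocks (w bits) and 16 cache lines (m = 2^r)

def direct_map(addr):
    word = addr & ((1 << w) - 1)          # least significant w bits: word within block
    line = (addr >> w) & ((1 << r) - 1)   # next r bits: cache line, i = j mod m
    tag = addr >> (w + r)                 # remaining s-r bits: tag
    return tag, line, word

print(direct_map(0b110100110110))  # prints the (tag, line, word) fields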
2. Associative Mapping –
In this type of mapping, the associative memory is used to store content and addresses
of the memory word. Any block can go into any line of the cache. This means that the
word id bits are used to identify which word in the block is needed, but the tag becomes
all of the remaining bits. This enables the placement of any word at any place in the
cache memory. It is considered to be the fastest and the most flexible mapping form.
3. Set-associative Mapping –
This form of mapping is an enhanced form of direct mapping in which the drawbacks
of direct mapping are removed. Set-associative mapping addresses the problem of
possible thrashing in the direct mapping method: instead of having exactly one line
that a block can map to in the cache, a few lines are grouped together to create a set,
and a block in memory can map to any one of the lines of a specific set.
Set-associative mapping allows each index address in the cache to correspond to two
or more words in the main memory. Set-associative cache mapping combines the
best of the direct and associative cache mapping techniques.
In this case, the cache consists of a number of sets, each of which consists of a
number of lines. The relationships are
m = v * k
i = j mod v
where
i = cache set number
j = main memory block number
v = number of sets
m = number of lines in the cache
k = number of lines in each set
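A minimal sketch of the set-number calculation, with assumed cache parameters:

m, k = 16, 2        # assumed: 16 cache lines, 2 lines per set
v = m // k          # number of sets, from m = v * k (v = 8 here)

def cache_set(j):   # j is the main memory block number
    return j % v    # i = j mod v

print(cache_set(13))  # block 13 maps to set 5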
Application of Cache Memory –
1. Usually, the cache memory can store a reasonable number of blocks at any given
time, but this number is small compared to the total number of blocks in the main
memory.
2. The correspondence between the main memory blocks and those in the cache is
specified by a mapping function.
Types of Cache –
Primary Cache –
A primary cache is always located on the processor chip. This cache is small and
its access time is comparable to that of processor registers.
Secondary Cache –
Secondary cache is placed between the primary cache and the rest of the memory.
It is referred to as the level 2 (L2) cache. Often, the Level 2 cache is also housed
on the processor chip.
VIRTUAL MEMORY
Virtual memory is the separation of logical memory from physical memory. This
separation provides large virtual memory for programmers when only small physical
memory is available.
Virtual memory is used to give programmers the illusion that they have a very large
memory even though the computer has a small main memory. It makes the task of
programming easier because the programmer no longer needs to worry about the amount
of physical memory available.
A virtual memory can be configured using either of the following techniques:
1. Paging technique
2. Segmentation technique
PAGING:-
Each process is divided into parts, where the size of each part is the same as the page
size. The size of the last part may be less than the page size.
The pages of a process are stored in the frames of main memory depending upon their
availability.
Example-
Consider a process is divided into 4 pages P0, P1, P2 and P3.
Depending upon the availability, these pages may be stored in the main memory frames
in a non-contiguous fashion as shown-
Translating Logical Address into Physical Address-
Step-01:
The CPU always generates a logical address consisting of two parts: a page number
and a page offset.
Page Number specifies the specific page of the process from which the CPU wants to
read the data.
Page Offset specifies the specific word on the page that the CPU wants to read.
Step-02:
For the page number generated by the CPU,
Page Table provides the corresponding frame number (base address of the frame)
where that page is stored in the main memory.
Step-03:
The frame number combined with the page offset forms the required physical address.
Frame number specifies the specific frame where the required page is stored.
Page Offset specifies the specific word that has to be read from that page.
Diagram-
The following diagram illustrates the above steps of translating logical address into
physical address-
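In addition to the diagram, the three steps can be modelled as a toy Python sketch;
the page size and the page-table contents are assumed for illustration.

PAGE_SIZE = 1024                        # assumed page size in bytes
page_table = {0: 5, 1: 2, 2: 7, 3: 0}   # page number -> frame number (assumed)

def translate(logical_addr):
    page = logical_addr // PAGE_SIZE    # Step-01: split into page number ...
    offset = logical_addr % PAGE_SIZE   # ... and page offset
    frame = page_table[page]            # Step-02: page table gives the frame number
    return frame * PAGE_SIZE + offset   # Step-03: frame base + offset

print(translate(2 * PAGE_SIZE + 100))   # page 2 -> frame 7, offset 100 -> 7268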
Advantages-
It allows parts of a single process to be stored in a non-contiguous fashion.
It solves the problem of external fragmentation.
Disadvantages-
It suffers from internal fragmentation.
There is an overhead of maintaining a page table for each process.
The time taken to fetch the instruction increases since now two memory accesses are
required.
SEGMENTATION-
Like Paging, Segmentation is another non-contiguous memory allocation technique.
In segmentation, process is not divided blindly into fixed size pages.
Rather, the process is divided into modules for better visualization.
Characteristics-
Segmentation is a variable size partitioning scheme.
In segmentation, secondary memory and main memory are divided into partitions of
unequal size.
The size of partitions depends on the length of modules.
The partitions of secondary memory are called as segments.
Example-
Consider a program is divided into 5 segments as-
Segment Table-
Segment table is a table that stores the information about each segment of the process.
It has two columns.
First column stores the size or length of the segment.
Second column stores the base address or starting address of the segment in the main
memory.
Segment table is stored as a separate segment in the main memory.
Segment table base register (STBR) stores the base address of the segment table.
Translating Logical Address into Physical Address-
Step-01:
The CPU generates a logical address consisting of two parts: a segment number and
a segment offset.
Segment Number specifies the specific segment of the process from which the CPU
wants to read the data.
Segment Offset specifies the specific word in the segment that the CPU wants to read.
Step-02:
For the generated segment number, corresponding entry is located in the segment table.
Then, segment offset is compared with the limit (size) of the segment.
Now, two cases are possible-
Case-01: Segment Offset >= Limit
If segment offset is found to be greater than or equal to the limit, a trap is generated.
Case-02: Segment Offset < Limit
If segment offset is found to be smaller than the limit, then request is treated as a valid
request.
The segment offset must always lie in the range [0, limit-1].
The segment offset is then added to the base address of the segment.
The result obtained after the addition is the address of the memory location storing
the required word.
Diagram-
The following diagram illustrates the above steps of translating logical address into
physical address-
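The same translation for segmentation, including the limit check of Case-01 and
Case-02, as a toy Python sketch with assumed segment-table values:

segment_table = {0: (1000, 1400),   # segment number -> (limit, base address), assumed
                 1: (400, 6300)}

def translate(segment, offset):
    limit, base = segment_table[segment]
    if offset >= limit:                # Case-01: offset out of range, trap
        raise Exception("trap: segment offset out of range")
    return base + offset               # Case-02: valid request, base + offset

print(translate(0, 63))                # 1463
try:
    translate(1, 500)                  # offset 500 >= limit 400
except Exception as e:
    print(e)                           # trap: segment offset out of range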
Advantages-
It allows the program to be divided into modules, which provides better visualization.
Segment table consumes less space as compared to Page Table in paging.
It solves the problem of internal fragmentation.
Disadvantages-
There is an overhead of maintaining a segment table for each process.
The time taken to fetch the instruction increases since now two memory accesses are
required.
Segments of unequal size are not suited for swapping.
It suffers from external fragmentation as the free space gets broken down into smaller
pieces with the processes being loaded and removed from the main memory.
Advantages of Virtual Memory
1. The degree of Multiprogramming will be increased.
2. User can run large application with less real RAM.
3. There is no need to buy more memory RAMs.
INPUT-OUTPUT SYSTEM
Peripheral Devices:-
For example, a keyboard and a mouse, which provide input to the computer, are called
input devices, while a monitor and a printer, which present output from the computer,
are called output devices. Just like external hard drives, some peripheral devices are
available which can provide both input and output.
The I/O bus includes data lines, address lines, and control lines. In any general-purpose
computer, the magnetic disk, printer, keyboard and display terminal are commonly
employed. Each peripheral unit has an interface unit associated with it. Each interface
decodes the control and address received from the I/O bus.
The interface interprets the address and control received from the I/O bus for the
peripheral and supplies signals to the peripheral controller. It also conducts the transfer
of information between the peripheral and the processor and synchronizes the data flow.
The I/O bus is linked to all peripheral interfaces from the processor. The processor locates
a device address on the address line to interact with a specific device. Each interface
contains an address decoder attached to the I/O bus that monitors the address lines.
When the address is recognized by the interface, it activates the path between the
bus lines and the device that it controls. The interfaces of peripherals whose address
does not correspond to the address on the bus are disabled.
Control − A command control is given to activate the peripheral and to inform its
next task. This control command depends on the peripheral, and each peripheral
receives its sequence of control commands, depending on its mode of operation.
Status − A status command can test multiple test conditions in the interface and
the peripheral.
Data Output − A data output command causes the interface to respond by
transferring data from the bus into one of its registers.
Data Input − The data input command is opposite to the data output command. In
data input, the interface gets an element of data from the peripheral and places it
in its buffer register.
These components are called interface units because they interface between the
processor bus and the peripheral devices. The I/O bus and the interface modules
define the typical link between the processor and several peripherals.
To communicate with I/O, the processor must communicate with the memory unit. Like
the I/O bus, the memory bus contains data, address and read/write control lines. There
are 3 ways that computer buses can be used to communicate with memory and I/O:
i. Use two separate buses, one for memory and the other for I/O.
ii. Use one common bus for both memory and I/O but separate control lines for each.
iii. Use one common bus for memory and I/O with common control lines.
Functions of Input-Output Interface:
Multiplexed - The address and data lines are sent on the same physical cable but
at different timings.
Width – The number of lines that carry Data and Address. More width on address
lines increases the addressing range. More width on data lines increases Data
Bandwidth.
Access Protocol - Synchronous or Asynchronous
Synchronous vs Asynchronous bus:
Synchronous: All activities on the bus are synchronized to the clock and occur at
predefined clock cycles. Useful when all the devices on the bus are in the same speed
range. The internal system bus is an example of a synchronous bus.
Asynchronous: The bus transfers information based on a handshake protocol. Since
there is a handshake, any device may interact with any other device. Generally used
for slow-speed device communications.
Arbitration - The protocol to gain access to the bus amongst the eligible
competing devices is called bus arbitration. During the bus operation, there is a
Bus Master (Initiator) and a Bus Slave (Responder). Thus a device wanting to
initiate communication has to become bus master. Always, the communication
happens between two devices. Since there are more devices connected on a bus,
there is a bus arbitrator and an identified arbitration mechanism.
Clock Rate – The speed of the bus is determined by the Synchronous clock. It is
a design decision to fix the clock rate.
There are a few other physical parameters, like how the wires are to be run, what type
of connectors are to be used, the length of the wires, etc. The electrical characteristics
define the voltage levels of the signals and the power source.
I/O is also a Slave component in the computer. An I/O operation is initiated by CPU and
the I/O controllers take care of transferring and completing the I/O operation.
I/O Controllers
Devices operate at wide-ranging data transfer speeds and with many different
interface standards. The keyboard and mouse have very small data rates and
transfer data to the computer asynchronously. Disks and solid state disks have high
data rates. USB has a middling data rate. And each one has a different connector
and interface standard.
It would overload the CPU to deal with these devices directly. I/O controllers play a
bridging role between the CPU, memory and I/O devices by taking care of all kinds
of communication.
Due to heterogeneity of the devices, each device /type of interface requires an I/O
Controller
I/O controllers also act as a buffer during data transfer
To do the above, each I/O controller will typically have data register(s), status
register(s), control register(s), address decoding logic and control circuitry. The I/O
controller is connected to the system bus. Whenever the I/O controller wants to use the
bus, it has to contend for it and obtain it. All communication from the CPU and memory
happens via the registers shown in the diagram. These registers are given a unique
address for each I/O controller.
I/O Controller
The address decoding logic is connected to the address bus. The value on the address
bus indicates the register to be accessed. The decoding logic converts this as an address
selection signal to one of these registers.
The control circuitry is connected to the control signals of the system bus. These signals
are MEMW, MEMR, IOR, IOW, INTERRUPT, BREQ, etc. These signals ensure the
synchronization and validation of address and Data on the bus, demanding the Bus for
data transfer, sending interrupt after the normal or abnormal end operation.
The Data Registers take care of Data Transfer. There may be Data In and/or Data Out
Registers depending on the device. Also, fast devices like Disk will have a Buffer so that
the fast bulk data from the disk is stored and then sent to Memory when the system bus
is available.
The Control Register has information like:
I/O-Mapped I/O
The registers are assigned port numbers and accessed via special instructions,
namely IN and OUT. This is Intel's method. IN is for reading from the Data or Status
Registers. OUT is for writing onto the Data or Command Registers.
Memory-Mapped I/O
Portions of the Memory address space are assigned to I/O device. This method reduces
the number of control signals required on the System Bus for READ/WRITE.
I/O Mapped I/O vs Memory-Mapped I/O:
1. I/O Mapped I/O: An exclusive I/O address space is used for I/O addressing; the
addresses are called port addresses. Memory-Mapped I/O: Part of the memory
address space is reserved for I/O addressing.
2. I/O Mapped I/O: The total addressable space is available for memory.
Memory-Mapped I/O: The addressable memory space is the total space less the
space reserved for I/O addressing.
3. I/O Mapped I/O: On the control bus, an I/O or memory operation is differentiated by
separate sets of control signals, i.e. IOR/IOW and MEMR/MEMW. Memory-Mapped
I/O: On the control bus, there is only one set of control signals, i.e. MEMR/MEMW; a
memory or I/O operation is differentiated by decoding the address.
4. I/O Mapped I/O: The number of lines required for addressing I/O ports is less, hence
the address decoding logic on the I/O controllers is smaller and simpler.
Memory-Mapped I/O: The number of lines used for addressing is equal to the
maximum addressable space, hence the address decoding logic on the I/O controller
is larger.
5. I/O Mapped I/O: Separate I/O instructions like IN/OUT are used for I/O
communication. Memory-Mapped I/O: No separate instructions are used; the same
LOAD/STORE instructions that are used for memory are used for I/O.
6. I/O Mapped I/O: More opcodes are needed in the CPU. Memory-Mapped I/O: Does
not increase the opcodes.
Thus, I/O data transfer is about how the three subsystems i.e. CPU, Memory and I/O
Controller, are involved in achieving the data exchange with peripherals. The word Data
Exchange means successful Data in/Data Out with necessary handshake and
coordination.
Data exchange with a peripheral involves a few steps and is not straightforward, as
devices are mostly electromechanical and/or operate at a different speed than the CPU
and memory.
The device is to be initiated and checked for status, i.e. whether it is ready for data
exchange.
The data transfer has to happen in units and at speeds acceptable to the device,
using the applicable protocol.
This may happen in more than one sequence, as necessitated.
If the device status is an error status, it is to be suitably handled.
I/O Controller's role is to ensure seamless data exchange by ensuring the following :
Processor/Memory Communication
Device Communication
Control timing and status processing – coordination of the data traffic between
CPU, memory and I/O devices
Data buffering – to manage the data transfer speed mismatch
Error management (transmission errors are corrected; other status errors are
communicated)
There are three modes of I/O data transfer:
Programmed I/O
Interrupt driven I/O
Direct Memory Access
Programmed I/O
A program controls the I/O operation; hence the CPU is fully involved. The CPU
monitors the status of the device and the I/O data transfer. This keeps the CPU hooked
until the I/O operation is completed for the desired number of bytes. In the case of a
READ from an I/O device, the final destination for the data is memory: the CPU writes
the data collected from the device into memory. It happens vice versa in the case of a
WRITE onto the device.
Programmed I/O is used for small data transfers, i.e. a few bytes. Before an actual data
transfer to a device, selecting the device, readying it and verifying its status have to be
done, e.g. selecting the printer and initializing it before printing. This is shown by the
decision box loop in figure 20.2.
In the diagram, at each step, the direction of information flow between the subsystems is
marked to reinforce your understanding.
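A hedged Python sketch of the programmed I/O busy-wait loop; the device registers
here are simulated, not a real device API.

status = {"ready": False, "polls": 0}   # simulated status register
data_register = 0x41                    # simulated data register

def read_status():
    status["polls"] += 1
    if status["polls"] >= 3:            # pretend the device becomes ready eventually
        status["ready"] = True
    return status["ready"]

buffer = []
while not read_status():                # CPU polls the status register ...
    pass                                # ... and stays hooked here until it is ready
buffer.append(data_register)            # CPU copies the data register into memory
print(buffer)                           # [65]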
Interrupt Driven I/O
The CPU loop during the status check is eliminated in interrupt-driven I/O operation.
The CPU initiates the command on the device and takes up other tasks.
Once the device is ready with data, the I/O controller generates an Interrupt.
On recognizing the pending Interrupt signal from I/O controller, the CPU
temporarily suspends the ongoing execution, goes in for interrupt service routine
and collects the data and stores in memory or vice versa and then resumes the
suspended execution. In this mode also, it is the CPU which interacts with memory
for storing the device data.
Most of the devices are electromechanical and have high latency. The period for which
the CPU is freed is proportional to this. Hence the benefit achieved is reasonable. The
interrupt is asynchronous. The I/O controller waits until the interrupt is serviced by CPU.
Thus, what we have achieved is the I/O controller is made to wait instead of the CPU.
CPU time is more precious. Details of Interrupt servicing will be dealt with in the next
chapter.
Interrupt servicing is also a complex and costly matter involving the OS. This method
too is feasible only for a small quantum of data transfer. Direct Memory Access (DMA)
resolves the CPU availability problem in hardware.
Direct Memory Access (DMA)
The CPU passes necessary information to DMA and goes on with any other work.
The DMA takes care of data transfer to Memory.
On completion, DMA generates an Interrupt to inform the CPU of the job done.
Note that the CPU, as a boss, assigns the work and is informed when the job is done,
but is not involved in doing the job; it is independently taken care of by the DMAC.
The DMA method improves on the interrupt method with optimal efficiency, as the data
is not routed via the CPU. Thus DMA unburdens the CPU.
The sequence of events described below clarifies how an end to end data transfer
happens between I/O device and Memory using DMA.
CPU delegates the responsibility of data transfer to DMA by sending the following
details:
o A device on which I/O to be carried out
o What is the command (R/W) to be carried out
o The starting address of Memory location for Data Transfer
o Length of the data transfer (byte count) - the first two items are given
to the device controller, while the last two are stored in the channel
registers in the DMAC.
The I/O controller initiates the necessary actions with the device and requests
DMAC when it is ready with data.
DMAC raises the HOLD signal to CPU conveying its intention to take the System
bus for data transfer.
At the end of the current machine cycle, the CPU disengages itself from the system
bus. Then, the CPU responds with a HOLD ACKNOWLEDGE signal to DMAC,
indicating that the system bus is available for use.
DMAC places the memory address on the address bus, the location at which data
transfer is to take place.
A read or write signal is then generated by the DMAC, and the I/O device either
generates or latches the data. Then DMA transfers data to memory.
A register is used as a byte count, decremented for each byte transferred to
memory.
Increment the memory address by the number of bytes transferred and generate
new memory address for the next memory cycle.
When the byte count reaches zero, the DMAC generates an interrupt to the CPU.
As part of the Interrupt Service Routine, the CPU collects the status of Data
transfer.
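The sequence above can be condensed into an illustrative Python sketch in which all
DMA registers and the device are simulated:

memory = [0] * 16
address_reg = 4                         # starting memory address, set by the CPU
byte_count = 3                          # length of the transfer, set by the CPU
device_data = iter([0x10, 0x20, 0x30])  # simulated device output

while byte_count > 0:                   # the DMAC, not the CPU, runs this loop
    memory[address_reg] = next(device_data)  # device -> memory transfer
    address_reg += 1                    # generate the next memory address
    byte_count -= 1                     # decrement the byte count
print("interrupt to CPU: transfer complete", memory[4:7])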
Cycle Stealing: The DMAC steals the system bus (Memory cycles) from the CPU, during
its non-memory machine cycles like Decode or execute, to transfer a byte/word to/from
memory. Cycle stealing is possible when there is a separate arbitrator logic for the system
bus. It is to be noted that, in the case of HOLD and HOLD ACK sequence, CPU acts as
bus arbitrator. The DMAC steals the bus for every transaction, hence the name cycle
stealing.
Burst Mode: Once the DMAC takes the bus, it completes the data transfer with the
necessary number of cycles, holding the system bus until the byte count reaches zero,
after which it releases the bus. The CPU is halted during the data transfer. Since the
intended block is transferred in one bus access, burst mode is also called Block
Transfer Mode. In this case, the DMAC is assigned an equal priority with the CPU.
Transparent or Hidden Mode: There are some internal states during which the CPU
frees the bus, disengaging itself. At these times, DMA takes up data transfers to memory.
In this mode, the DMA is assigned the lowest priority, rather it never contends for the bus.
The CPU speed is never affected by DMA. However, not only does the DMA require
extra logic to detect the floating bus, but the CPU also requires extra logic to indicate
its status. The throughput of DMA in this mode is poor, and the data transfer is the
slowest of all the modes. The figure below shows the points at which the CPU can
recognize a DMA request or an interrupt; these points are called breakpoints.
Breakpoints in CPU
DMA Controller Functional Components
It has all the functional components of any I/O controller. Since the DMAC communicates on
the system bus, the required additional logic is built in. The bus control logic generates the
R/W signals and the Interrupt. Depending on the DMA data transfer mode, bus arbitration
signals are generated. The bus arbitration signals are DMA HOLD or Bus Request, and
DMA HOLD ACK or Bus Grant, as the case may be.
The Byte Count Register is an important register which holds and tracks the byte transfers
taking place with memory. The CPU sets its value. It is decremented for every byte
transferred, and the Memory Address Register is updated to point to the next location for
data transfer. When the byte count becomes zero, it indicates the End of Data Transfer.
This status is used for generating an interrupt signal to the CPU.
Buffered Data Transfer takes place at the device end. The required control signals are
generated. This part is very much similar to I/O Controller.
Generally, a DMAC has more than one channel to facilitate the connection of more I/O
devices. In such a case, we will have duplication of the Memory Address Register, Data
Buffer, Status and Control Registers and Byte Count Register, while the interface logic
remains common, as the sketch below illustrates.
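A multi-channel DMAC can thus be pictured as duplicated register groups behind common
interface logic. The minimal Python sketch below models this; the class and field names
are assumptions made for illustration, not the register names of any actual chip.

    from dataclasses import dataclass

    @dataclass
    class DMAChannel:
        # Registers duplicated per channel, as described above.
        memory_address: int = 0   # where the next byte goes to / comes from
        byte_count: int = 0       # remaining bytes; zero => raise interrupt
        status: int = 0           # e.g. busy / done / error flags
        control: int = 0          # e.g. direction (read/write), mode bits

    # The common interface logic serves several such channels.
    channels = [DMAChannel() for _ in range(4)]
    channels[0].memory_address, channels[0].byte_count = 0x1000, 512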
Advantages of DMAC
Certainly speeds up the data transfer to memory with minimal bus holding
Relieves the CPU for its own tasks
In the case of the DMAC, the CPU is involved in initiating every single I/O operation.
Although the DMA method is better than Programmed I/O and Interrupt-driven I/O, overall
system performance can be improved if there is a design methodology that takes care of
the total I/O with simple initiation from the CPU. The I/O Processor (IOP) design supports
this requirement.
I/O Processors
I/O processors (IOPs) are an extension of the DMA concept of I/O operation. As the name
implies, I/O processors have some processing capacity for serving the device controllers
connected to them. IOPs can decode and execute I/O instructions. These instructions are
brought from main memory by the IOP. Data transfer happens with memory directly. Thus
an IOP has a fair amount of control over I/O operations with very minimal initiation from
the CPU.
IOPs are also called Peripheral Processing Units (PPUs). I/O Channel is the name used
in mainframes to mean an IOP.
A System configuration with IOP
Characteristics of IOP
IO Program
A typical I/O program is much like any other program, except that it has I/O instructions to
be executed on a selected device.
IO Program in Memory
An I/O instruction received by the IOP from memory as part of the I/O program has four
parts as below:
Any data transfer operation on a DISK is preceded by a SEEK command to position the
heads at the place where the file is located. Command chaining helps here.
Also, if data transfer is to happen from more than one area of the disk, chaining of SEEK +
data transfer is done easily with an I/O program.
Many such examples in the case of printers, tapes and other devices can be given.
For the CPU it is too costly to get involved in these scenarios. Thus, the IOP improves both
I/O and CPU performance.
Interrupts Categorization
Template of ISR
In the case of an I/O interrupt, the cause is analysed in the ISR. Any remedial action to be
taken is also part of the ISR. For example, if an interrupt is raised to report an error
in the device, the error needs to be reported to the user and any possible retry
done to recover from the error.
Context switching
Essentially, an Interrupt alters the flow of the program execution. Context switching is
about the CPU taking necessary steps to store the current status of the CPU so that on
return the CPU status is restored for resumption. Context switching helps the CPU to
switch processes or tasks and is an essential feature supported by the Operating System.
Interrupt Identification
In the figure below, the I/O interrupt is conveyed to the CPU by asserting the signal INTR.
When the CPU recognizes the INTR, it returns the signal INTR ACK as an
acknowledgement. From the figure, you may understand that the line INTR is a summation
of the interrupts from all the I/O controllers. The issue is how the interrupting device is
identified. The possibilities for a pending interrupt are:
The hardware is designed to handle the second case, as many I/O controllers may raise
interrupts asynchronously but they are conveyed to the CPU on a single line.
The CPU services all the interrupts one by one as it finds the chance to service each
interrupt.
Amongst the I/O controllers, interrupt priority is assigned in the hardware, so the
highest-priority one gets serviced first and cleared of its pending interrupt. This
method is called Daisy Chaining. Generally, the slow-speed device controllers are
assigned lower interrupt priority. Starvation may be a possibility. The INTR and
INT ACK signals are passed through all the controllers in the order of the pre-
assigned priority.
Yet the question of which device interrupted is not answered. The identification is
done by one of the following methods (a sketch is given at the end of this discussion):
o Polling - As part of the ISR, by reading the various status registers, the
interrupting device is identified. In this case, the I/O ISR always starts from
a particular common vector address. Polling is a software method.
o Vectored Interrupts – Extra signal lines between the CPU and I/O are run to
indicate the INT CODE (type code) of the interrupt. Every controller is
assigned an INT CODE. For example, if there are eight I/O controllers, 3
lines are used so that the INT CODE is presented in binary coded form. This
type code, as said earlier, is useful in generating the vector for the ISR.
Multiple Interrupts Handling – Two possibilities exist:
o More than one interrupt could be pending simultaneously, necessitating
some priority assignment and identification mechanism. We have just
discussed this case.
o During an ISR, another interrupt could come in. A decision to deal with it (as
a Nested Interrupt) or to defer it (by Masking the Interrupts) is required. Not
all interrupts are maskable.
Internal interrupts have higher priority than I/O interrupts. Internal interrupts are vectored
interrupts. In the case of software interrupts too, the instruction code helps identify the
ISR vector.
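To make the identification methods above concrete, the following Python sketch models
both ideas: a daisy-chain style resolver that picks the pending controller with the highest
priority (assuming, for the example, that controller 0 has the highest priority), and the
3-bit INT CODE that vectored interrupts would use for eight controllers. All names here
are hypothetical.

    # Hypothetical sketch: identifying the interrupting device.
    pending = [False, True, False, True, False, False, False, False]

    def daisy_chain_resolve(pending):
        # Return the highest-priority pending controller
        # (assumption: controller 0 has the highest priority).
        for number, is_pending in enumerate(pending):
            if is_pending:
                return number
        return None

    def int_code(number):
        # Vectored interrupts: 8 controllers need only 3 code lines.
        return format(number, "03b")

    isr_table = {n: f"ISR_for_controller_{n}" for n in range(8)}
    winner = daisy_chain_resolve(pending)
    print(winner, int_code(winner), isr_table[winner])  # 1 001 ISR_for_controller_1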
UNIT – 6
I/O INTERFACE & BUS ARCHITECTURE
Bus
A bus is a group of wires that connects the different components of the computer. It is
used for transmitting data, control signals and memory addresses from one
component to another. A bus can be 8-bit, 16-bit, 32-bit or 64-bit wide; a 32-bit bus
can transmit 32 bits of information at a time. A bus can be internal or external.
The bus in the computer is a shared transmission medium. This means multiple
components or devices use the same bus structure to transmit information
signals to each other. At a time, only one pair of devices can use the bus to
communicate successfully. If multiple devices transmit information signals over
the bus at the same time, the signals overlap each other and get jumbled.
Bus Structure
A system bus typically has from fifty to hundreds of distinct lines, where each line is meant
for a certain function. These lines can be categorized into three functional groups i.e., data
lines, address lines, and control lines.
1. Data Lines
The data lines carry the data transferred among the system components. The
data lines are collectively called the data bus. A data bus may have 32 lines, 64 lines,
128 lines, or even more. The number of lines present in the data bus defines
the width of the data bus.
Each data line is able to transfer only one bit at a time, so the number of data lines
in a data bus determines how many bits it can transfer at once. The performance
of the system thus also depends on the width of the data bus.
2. Address Lines
The content of the address lines of the bus determines the source or destination
of the data present on the data bus. The number of address lines together is
referred to as address bus. The number of address lines in the address bus
determines its width.
The width of the address bus determines the memory capacity of the system. The
content of address lines is also used for addressing I/O ports.
The higher-order bits select the bus module and the lower-order bits
determine the address of the memory location or I/O port within it.
Whenever the processor has to read a word from memory, it simply places the
address of the corresponding word on the address lines.
3. Control Lines
The address lines and data lines are shared by all the components of the system,
so there must be some means to control the use and access of the data and address
lines.
The control signals are placed on the control lines. The control bus lines are used to
control access to the address and data lines of the bus. A control signal consists of
command and timing information.
Here, the command in the control signal specifies the operation that has to be
performed, and the timing information specifies for how long the
data and address information is valid.
Memory Write: This command causes the data on the data bus to be placed over the
addressed memory location.
Memory Read: This command causes the data on the addressed memory location to be
placed on the data bus.
I/O Write: The command over this control line causes the data on the data bus to be
placed over the addressed I/O port.
I/O Read: The command over this control line causes the data from the addressed I/O
port to be placed over the data bus.
Transfer ACK: This control line indicates the data has been received from the data bus
or is placed over the data bus.
Bus Request: This control line indicates that the component has requested control over
the bus.
Bus Grant: This control line indicates that the bus has been granted to the requesting
component.
Interrupt Request: This control line indicates that interrupts are pending.
Interrupt ACK: This control line provides acknowledgment when the pending interrupt is
serviced.
Clock: This control line is used to synchronize the operations.
Reset: The bit information issued over this control line initializes all the modules.
During the transfer of data between two components, one component acts as a master
and the other acts as a slave. The device initiating the data transfer is referred to
as the master; usually it is a processor, though sometimes it may be some other device or
component. The component addressed by the master component is referred to as
a slave.
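As a toy illustration of how the command lines tie the master and slave together, the
following Python sketch shows a master issuing Memory Write and Memory Read
commands and the addressed slave answering with Transfer ACK. The class and method
names are invented for the example; it models the roles of the line groups, not the
electrical signalling.

    # Toy bus model (illustrative): address, data and control line groups.
    class Bus:
        def __init__(self, memory_size=256):
            self.memory = [0] * memory_size   # the slave in this example

        def memory_write(self, address, data):
            # Master drives the address lines and data lines, then asserts
            # Memory Write; the slave latches the data.
            self.memory[address] = data
            return "Transfer ACK"             # slave acknowledges

        def memory_read(self, address):
            # Master drives the address lines, then asserts Memory Read;
            # the slave places the data on the data lines.
            return self.memory[address], "Transfer ACK"

    bus = Bus()
    print(bus.memory_write(0x10, 0xAB))       # Transfer ACK
    print(bus.memory_read(0x10))              # (171, 'Transfer ACK')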
The above figure shows the high-performance integrated bus architecture. It is
similar to the traditional bus architecture; this too contains a local bus that connects
the processor to a cache controller.
This in turn is connected to the main memory through the system bus.
The cache controller is integrated into a bridge or buffering device that connects
to the high-speed bus. This bus supports connections to high-speed LANs, such
as Fast Ethernet at 100 Mbps, video & graphics workstation controllers & interface
controllers to local peripheral controllers like SCSI & FireWire.
FireWire is a high-speed bus specially designed for high-capacity I/O devices.
The lower-speed devices are still supported by the expansion bus, with an interface
buffering traffic between the expansion & high-speed buses.
This way the high-speed devices are more closely integrated with the processor through
the high-speed bus while remaining independent of it. Additionally,
the speed differences between the processor, the high-speed bus & the single-line
connections are easily managed without pulling down the overall performance. Also, the
processor & the high-speed bus are independent of each other, making design changes easy.
Basic Parameters Of Bus Design
1) Bus Types
Bus lines can be classified into two generic types: dedicated & multiplexed.
Dedicated
A line is permanently assigned to one function.
An example of functional dedication is the use of separate dedicated
address and data lines.
Multiplexed
Using the same lines for multiple purposes.
Eg:- Address and data information may be transmitted over the same set of
lines.
At the beginning of the data transfer, the address is placed on the bus and
the address valid line is activated.
The address is then removed, and the same bus lines are used for data
transfer.
Physical dedication refers to the use of multiple buses, each of which
connects only a subset of the modules.
2) Method of Arbitration
As there are many modules connected to the bus, more than one module may need
control of the bus.
Centralized
A single hardware device, called the bus controller or arbiter, is responsible
for allocating time on the bus.
The device may be separate or a part of a processor.
Distributed
There is no centralized controller.
Each module contains access control logic, and the modules act together to
share the bus.
In both centralized & distributed schemes, one device, the processor or an I/O module, is
designated as the master. The master then initiates a data transfer to/from some other
device, which acts as the slave.
3) Timing
Synchronous Timing
In this method of timing, a clock controls the occurrence of events on the
bus.
Bus includes a clock line upon which a clock transmits a regular sequence
of alternating 1's and 0's of equal duration.
A single 1-0 transition is referred to as a clock cycle or bus cycle.
All other devices on the bus can read the clock line.
All events start at the beginning of a clock cycle.
For a read operation, a read command is issued at the start of the second
cycle. The addressed memory module places the data on the data lines
after a delay of one clock cycle.
For a write operation, the processor puts the data on the data lines at the start
of the second cycle and, once the data lines stabilize, issues a write command.
The memory module copies the information available on the data lines in
the third clock cycle.
Asynchronous Timing
The occurrence of one event on a bus follows and depends on the occurrence
of a previous event.
In this method, the processor places the address & the status signals on the
bus. After these signals stabilize, it issues a read command, signifying the
presence of valid address & control signals.
The addressed memory responds by placing the data on the data lines. After
the data lines stabilize, the memory module asserts the acknowledge line to
signal to the processor that the data are available.
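The following minimal Python sketch walks through this asynchronous read handshake,
recording each event as it is enabled by the previous one. The event descriptions and the
function name are illustrative assumptions, not signal names from any particular bus.

    # Illustrative asynchronous read handshake: each step waits on the last.
    def async_read(memory, address):
        events = []
        events.append("master: address + status lines placed and stable")
        events.append("master: assert READ (address/control now valid)")
        data = memory[address]   # slave decodes the address and fetches data
        events.append("slave: data placed on the data lines")
        events.append("slave: assert ACKNOWLEDGE (data now valid)")
        events.append("master: latch the data, release READ")
        events.append("slave: release ACKNOWLEDGE (bus free)")
        return data, events

    data, log = async_read([10, 20, 30, 40], address=2)
    print(data)              # 30
    print(*log, sep="\n")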
4) Bus Width
The width of the data bus has an impact on system performance: the wider
the data bus, the greater the number of bits transferred at one time.
The wider the address bus, the greater the range of locations that can be referenced.
All buses provide both write (master to slave) and read (slave to master) transfers.
The sequence of events in a disk read operation carried out over the SCSI bus is as
follows:
1. The SCSI controller, acting as an initiator, contends for control of the bus.
2. When the initiator wins the arbitration process, it selects the target controller
and hands over control of the bus to it.
3. The target starts an output operation (from initiator to target); in response
to this, the initiator sends a command specifying the required read
operation.
4. The target, realizing that it first needs to perform a disk seek operation,
sends a message to the initiator indicating that it will temporarily suspend
the connection between them. Then it releases the bus.
5. The target controller sends a command to the disk drive to move the read
head to the first sector involved in the requested read operation. Then, it
reads the data stored in that sector and stores them in a data buffer. When
it is ready to begin transferring data to the initiator, the target requests
control of the bus. After it wins arbitration, it reselects the initiator controller,
thus restoring the suspended connection.
6. The target transfers the contents of the data buffer to the initiator and then
suspends the connection again. Data are transferred either 8 or 16 bits in
parallel, depending on the width of the bus.
7. The target controller sends a command to the disk drive to perform another
seek operation. Then, it transfers the contents of the second disk sector to
the initiator, as before. At the end of this transfer, the logical connection
between the two controllers is terminated.
8. As the initiator controller receives the data, it stores them into the main
memory using the DMA approach.
9. The SCSI controller sends an interrupt to the processor to inform it that the
requested operation has been completed.
Bus signals
The main phases involved in the operation of the SCSI bus are:
Arbitration
Selection
Information Transfer
Reselection
Arbitration
The bus is free when the -BSY signal is in the inactive (high voltage) state. Any controller
can request the use of the bus while it is in this state. Since two or more controllers may
generate such a request at the same time, an arbitration scheme must be implemented.
A controller requests the bus by asserting the -BSY signal and by asserting its associated
data line to identify itself. Each controller in the bus is assigned a fixed priority, with
controller 7 having highest priority. When -BSY becomes active, all controllers that are
requesting the bus examine the data lines and determine whether a higher priority device
is requesting the bus at the same time. The controller using the highest numbered line
realizes that it has won the arbitration.
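Because each controller asserts the data line corresponding to its own ID, arbitration
reduces to finding the highest-numbered asserted line. The Python sketch below models
only this decision rule, not the electrical protocol; the function name is an assumption.

    # Illustrative SCSI arbitration: the highest asserted data line wins.
    def arbitrate(requesters):
        # requesters: IDs (0-7) of controllers asserting -BSY and their own
        # data line. Controller 7 has the highest priority.
        data_lines = [False] * 8
        for controller_id in requesters:
            data_lines[controller_id] = True   # each asserts its own line
        # Every requester examines the lines; the highest-numbered one wins.
        return max(i for i, asserted in enumerate(data_lines) if asserted)

    print(arbitrate([2, 6, 5]))    # 6 wins the arbitration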
Selection
Having won arbitration, controller 6 continues to assert -BSY and -DB6 (its address). It
indicates that it wishes to select controller 5 by asserting the -SEL and then the -DB5 line.
The selected target controller responds by asserting -BSY. This informs the initiator that
the connection it is requesting has been established, so that it may remove the address
information from data lines. The selection process is now complete, and the target
controller is asserting -BSY.
Information Transfer
The information transferred between two controllers may consist of commands from the
initiator to the target, status responses from the target to the initiator, or data being
transferred to or from the I/O device. Handshake signaling is used to control information
transfers. The target asserts -I/O during an input operation (target to initiator) and it
asserts -C/D to indicate that information is a command or status response. At the end of
the transfer, the target controller releases -BSY signal, thus freeing the bus for use by
other devices.
Reselection
When a logical connection is suspended and the target is ready to restore it, the target
must first gain control of the bus. It starts an arbitration cycle, and after winning arbitration,
it selects the initiator. The initiator is now asserting -BSY. The initiator waits for a short
period after being selected to make sure that the target has asserted -BSY, and then
releases the -BSY line. The connection between the two controllers has now been reestablished,
with the target in control of the bus as required for data transfer to proceed. The SCSI
standard defines the structure and contents of various types of packets that the controllers
exchange to handle different situations.
Port limitation:
Only a few ports are provided in a typical computer. To add new ports, a user must
open the computer box to gain access to the internal expansion bus and install a
new interface card.
The user may also need to know how to configure the device and the software.
An objective of the USB is to make it possible to add many devices to a computer
system at any time, without opening the computer box.
Device Characteristics:
The different kinds of devices that may be connected to a computer cover a wide
range of functionality.
The speed, volume, and timing constraints associated with data transfers to and
from such devices vary significantly.
In the case of a keyboard, one byte of data is generated every time a key is
pressed, which may happen at any time. These data should be transferred to the
computer promptly.
Since the event of pressing a key is not synchronized to any other event in a
computer system, the data generated by the keyboard are called asynchronous.
Furthermore, the rate at which the data are generated is quite low. It is limited by
the speed of the human operator to about 100 bytes per second (100 × 8 = 800 bits
per second), which is less than 1000 bits per second.
Plug-and-play:
A serial transmission format has been chosen for the USB because a serial bus
satisfies the low-cost and flexibility requirements. Clock and data information are
encoded together and transmitted as a single signal. Hence, there are no
limitations on clock frequency or distance arising from data skew. Therefore, it is
possible to provide a high data transfer bandwidth by using a high clock frequency.
As pointed out earlier, the USB offers three bit rates, ranging from 1.5 to 480
megabits/s, to suit the needs of different I/O devices.
To accommodate a large number of devices that can be added or removed at any
time, the USB has the tree structure shown in Figure 5.33.
Each node of the tree has a device called a hub, which acts as an intermediate
control point between the host and the I/O devices. At the root of the tree, a root
hub connects the entire tree to the host computer. The leaves of the tree are the
I/O devices being served (for example, keyboard, speaker, or digital TV), which
are called functions in USB terminology.
The tree structure enables many devices to be connected while using only simple
point-to-point serial links. Each hub has a number of ports where devices may be
connected, including other hubs.
The USB enables the host to communicate with the I/O devices, but it does not
enable these devices to communicate with each other.
The tree makes it possible to connect a large number of devices to a computer through
a few ports (the root hub), and each I/O device is connected through a serial point-
to-point connection. This is an important consideration in facilitating the plug-and-play
feature.
The USB operates strictly on the basis of polling. A device may send a message
only in response to a poll message from the host. Hence, upstream messages do
not encounter conflicts or interfere with each other, as no two devices can send
messages at the same time. This restriction allows hubs to be simple, low-cost
devices.
Consider the situation in Figure 5.34. Hub A is connected to the root hub by a high-
speed link. This hub serves one high-speed device, C, and one low-speed device,
D. Normally, a message to device D would be sent at low speed from the root hub.
At 1.5 megabits/s, even a short message takes several tens of microseconds.
For the duration of this message, no other data transfers can take place, thus
reducing the effectiveness of the high-speed links and introducing unacceptable
delays for high-speed devices.
The purpose of the USB software is to provide bidirectional communication links
between application software and I/O devices. These links are called pipes. Any
data entering at one end of a pipe is delivered at the other end. Issues such as
addressing, timing, or error detection and recovery are handled by the USB
protocols.
Addressing:
I/O devices are normally identified by assigning them a unique memory address.
In fact, a device usually has several addressable locations to enable the software
to send and receive control and status information and to transfer data.
When a USB is connected to a host computer, its root hub is attached to the
processor bus, where it appears as a single device. The host software
communicates with individual devices attached to the USB by sending packets of
information, which the root hub forwards to the appropriate device in the USB tree.
Each device on the USB, whether it is a hub or an I/O device, is assigned a 7-bit
address. This address is local to the USB tree and is not related in any way to the
addresses used on the processor bus.
A hub may have any number of devices or other hubs connected to it, and
addresses are assigned arbitrarily. When a device is first connected to a hub, or
when it is powered on, it has the address 0.
The hardware of the hub to which this device is connected is capable of detecting
that the device has been connected, and it records this fact as part of its own status
information.
Periodically, the host polls each hub to collect status information and learn about
new devices that may have been added or disconnected.
USB Protocol:
All information transferred over the USB is organized in packets, where a packet
consists of one or more bytes of information.
The information transferred on the USB can be divided into two broad categories:
control and data.
Control packets perform such tasks as addressing a device to initiate data transfer,
acknowledging that data have been received correctly, or indicating an error. Data
packets carry information that is delivered to a device.
For example, input and output data are transferred inside data packets.
A packet consists of one or more fields containing different kinds of information.
The first field of any packet is called the packet identifier, PID, which identifies the
type of that packet. There are 4 bits of information in this field, but they are
transmitted twice. The first time they are sent with their true values, and the second
time with each bit complemented, as shown in Figure 5.35(a).
This enables the receiving device to verify that the PID byte has been received
correctly.
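This check can be expressed directly in code: the receiver verifies that the second copy
of the four bits is the bitwise complement of the first. A sketch in Python, assuming for
illustration that the true bits occupy the low nibble and the complemented copy the high
nibble (the exact bit ordering on the wire is a detail not fixed here):

    # Illustrative USB PID byte check: 4 PID bits plus their complement.
    def pid_valid(pid_byte):
        pid = pid_byte & 0x0F            # true PID bits (assumed low nibble)
        check = (pid_byte >> 4) & 0x0F   # complemented copy (high nibble)
        return check == (~pid & 0x0F)    # must be the bitwise complement

    def make_pid_byte(pid):
        return (pid & 0x0F) | ((~pid & 0x0F) << 4)

    byte = make_pid_byte(0b1001)
    print(pid_valid(byte))          # True
    print(pid_valid(byte ^ 0x01))   # False: a corrupted bit is detected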
Electrical characteristics:
The cables used for USB connections consist of four wires. Two are used to carry power,
5 V and Ground. Thus, a hub or an I/O device may be powered directly from the bus, or
it may have its own external power connection. The other two wires are used to carry
data.
UNIT – 7
PARALLEL PROCESSING
Parallel Processing
Parallel processing can be described as a class of techniques which enables the system
to achieve simultaneous data-processing tasks to increase the computational speed of a
computer system.
The following diagram shows one possible way of separating the execution unit into eight
functional units operating in parallel.
The operation performed in each functional unit is indicated in each block of the diagram:
o The adder and integer multiplier perform arithmetic operations on integer
numbers.
o The floating-point operations are separated into three circuits operating in parallel.
o The logic, shift, and increment operations can be performed concurrently on
different data. All units are independent of each other, so one number can be
shifted while another number is being incremented.
According to the number of instruction and data streams, computers can be divided into 4 major groups:
1. SISD
2. SIMD
3. MISD
4. MIMD
PIPELINING
Pipelining is the technique of passing instructions through a pipeline of processing
stages so that their execution overlaps. It allows storing and executing instructions in an
orderly process. It is also known as pipeline processing.
For example, take a car manufacturing plant. At the first stage, the automobile chassis is
prepared, in the next stage workers add body to the chassis, further, the engine is
installed, then painting work is done and so on.
The group of workers, after working on the chassis of the first car, do not sit idle. They
start working on the chassis of the next car, and the next group takes that chassis and
adds a body to it. The same thing is repeated at every stage: after finishing work on the
current car body, the workers take on the next car body, which is the output of the
previous stage.
Here, though the first car is completed only after several hours or days, thanks to the
assembly-line arrangement it becomes possible to have a new car at the end of the
assembly line at every stage interval.
The concept of pipelining works similarly: the output of one pipeline stage becomes the
input of the next stage. It is like a set of data processing units connected in series to
utilize the processor up to its maximum.
Now let us understand how the n instructions of a process are divided into subtasks and
pipelined.
Look at the figure below, where 5 instructions are pipelined. The first instruction
completes in 5 clock cycles; after that, a new instruction completes its execution in every
new clock cycle.
Observe that when the instruction fetch operation of the first instruction is completed, in
the next clock cycle the instruction fetch of the second instruction starts. This way the
hardware never sits idle; it is always busy performing some operation. But no two
instructions can execute the same stage in the same clock cycle, as the sketch below
shows.
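The timing in the figure follows a simple formula: with k pipeline stages and n
instructions, the first instruction needs k cycles and each later instruction completes one
cycle after its predecessor, for a total of k + (n - 1) cycles. The Python sketch below
prints such a timetable; the five stage names are an assumption for the illustration.

    # Illustrative 5-stage pipeline timetable: k + (n - 1) cycles in total.
    STAGES = ["IF", "ID", "EX", "MEM", "WB"]    # assumed stage names

    def pipeline_cycles(n_instructions, k_stages=len(STAGES)):
        return k_stages + (n_instructions - 1)

    def timetable(n_instructions):
        # Instruction i occupies stage s during cycle i + s (0-based),
        # so no two instructions share a stage in the same cycle.
        for i in range(n_instructions):
            row = ["  . "] * pipeline_cycles(n_instructions)
            for s, name in enumerate(STAGES):
                row[i + s] = f"{name:>4}"
            print(f"I{i + 1}:", "".join(row))

    timetable(5)                # 5 instructions finish in 9 cycles
    print(pipeline_cycles(5))   # 9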
Types of Pipelining
In 1977 Handler and Ramamoorthy classified pipeline processors depending on their
functionality.
1. Arithmetic Pipelining
Here, multiple arithmetic logic units are built into the system to perform
parallel arithmetic computation on various data formats.
For example: The input to the Floating Point Adder pipeline is:
X = A*2^a
Y = B*2^b
Here A and B are mantissas (significant digit of floating point numbers), while a and b are
exponents.
The floating point addition and subtraction is done in 4 parts:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
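As an illustration, the following Python sketch runs these four parts on X = A*2^a and
Y = B*2^b, using ordinary numbers rather than hardware bit fields; the function name and
the normalized mantissa range [0.5, 1) are assumptions made for the example.

    # Illustrative 4-stage floating point adder: X = A*2^a, Y = B*2^b.
    def fp_add(A, a, B, b):
        # Stage 1: compare the exponents (ensure a >= b by swapping).
        if a < b:
            (A, a), (B, b) = (B, b), (A, a)
        # Stage 2: align the mantissas (shift the smaller operand right).
        B = B / (2 ** (a - b))
        # Stage 3: add the mantissas.
        M, e = A + B, a
        # Stage 4: normalize the result into the range [0.5, 1).
        while abs(M) >= 1.0:
            M, e = M / 2, e + 1
        while 0 < abs(M) < 0.5:
            M, e = M * 2, e - 1
        return M, e

    # 0.75 * 2^3 (= 6) plus 0.5 * 2^2 (= 2) equals 8 = 0.5 * 2^4.
    print(fp_add(0.75, 3, 0.5, 2))    # (0.5, 4)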
2. Instruction Pipelining
Here, the instructions are pipelined and the execution of the current
instruction is overlapped by the execution of the subsequent instruction. It is also
called instruction lookahead.
3. Processor Pipelining
Here, the processors are pipelined to process the same data stream.
The data stream is processed by the first processor and the result is stored in the
memory block. The result in the memory block is accessed by the second
processor.
The second processor reprocesses the result obtained by the first processor and
then passes the refined result to the third processor, and so on.
A pipeline performing the same precise function every time is a unifunctional pipeline.
On the other hand, a pipeline performing multiple functions at different times, or
multiple functions at the same time, is a multifunction pipeline.
A static pipeline performs a fixed function each time; the static pipeline is
unifunctional. It executes the same type of instructions continuously, and frequent
changes in the type of instruction may degrade the performance of the pipeline.
Whenever a pipeline has to stall for some reason, the cause is called a pipeline hazard.
Below we discuss four pipelining hazards.
1. Data Dependency
In the figure above, you can see that the result of the Add instruction is stored in
register R2, and we know that the final result is stored at the end of the execution of the
instruction, which happens at clock cycle t4.
But the Sub instruction needs the value of register R2 at cycle t3. So the Sub
instruction has to stall for two clock cycles; if it does not stall, it will generate an incorrect
result. Thus, the dependence of one instruction on another instruction for data is data
dependency.
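The stall decision can be phrased as a simple register-overlap test: a later instruction
must wait if it reads a register that an earlier, still-unfinished instruction writes. A toy
Python check, using an invented instruction format:

    # Toy read-after-write hazard check; the instruction format is invented.
    def raw_hazard(earlier, later):
        # earlier/later: (opcode, destination_register, source_registers).
        _, dest, _ = earlier
        _, _, sources = later
        return dest in sources

    add = ("ADD", "R2", ["R0", "R1"])   # result lands in R2 only at cycle t4
    sub = ("SUB", "R4", ["R2", "R3"])   # needs R2 already at cycle t3
    print(raw_hazard(add, sub))         # True: the Sub instruction must stall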
2. Memory Delay
When an instruction or data item is required, it is first searched for in the cache memory;
if it is not found there, it is a cache miss. The data is then fetched from main memory,
which may take ten or more cycles. For that number of cycles the pipeline has to stall,
and this is a memory delay hazard. A cache miss also delays all the subsequent
instructions.
3. Branch Delay
Suppose four instructions I1, I2, I3, I4 are pipelined in sequence. Instruction I1 is
a branch instruction and its target instruction is Ik. Now processing starts: instruction
I1 is fetched and decoded, and the target address is computed at the 4th stage, in cycle t3.
But by then the instructions I2, I3, I4 have already been fetched in cycles 1, 2 & 3, before
the target branch address is computed. As I1 is found to be a branch instruction, the
instructions I2, I3, I4 have to be discarded, because instruction Ik has to be processed
after I1. This delay of three cycles (1, 2, 3) is the branch delay.
Prefetching the branch target will reduce the branch delay. For example, if the branch
target is identified at the decode stage, the branch delay reduces to 1 clock cycle.
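The saving can be counted directly: if the branch target is known after stage s (counting
the fetch stage as stage 1), the s - 1 instructions fetched in the meantime are discarded,
so the penalty is s - 1 cycles. A one-function sketch under that assumption:

    # Branch penalty when the target is resolved after pipeline stage s.
    def branch_penalty(resolve_stage):
        return resolve_stage - 1    # wrongly fetched instructions discarded

    print(branch_penalty(4))   # 3-cycle delay when resolved at the 4th stage
    print(branch_penalty(2))   # 1-cycle delay when resolved at decode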
4. Resource Limitation
If two instructions request access to the same resource in the same clock cycle,
then one of the instructions has to stall and let the other instruction use the resource.
This stalling is due to resource limitation. However, it can be prevented by adding more
hardware.
Advantages
Disadvantages
In shared-memory multiprocessors, all the CPUs share a common memory,
but in a distributed-memory multiprocessor, every CPU has its own private
memory.
Applications of Multiprocessor
1. As a uniprocessor, such as single instruction, single data stream (SISD).
2. As a multiprocessor, such as single instruction, multiple data stream (SIMD), which
is usually used for vector processing.
3. Multiple series of instructions in a single perspective, such as multiple instruction,
single data stream (MISD), which is used for describing hyper-threading or pipelined
processors.
4. Inside a single system for executing multiple, individual series of instructions in
multiple perspectives, such as multiple instruction, multiple data stream (MIMD).
Types of Multiprocessors
There are mainly two types of multiprocessors i.e. symmetric and asymmetric
multiprocessors.
Symmetric Multiprocessors
In these types of systems, each processor contains a similar copy of the operating
system and they all communicate with each other.
All the processors are in a peer-to-peer relationship, i.e., no master-slave
relationship exists between them.
Asymmetric Multiprocessors
Message Passing
Shared Memory
Even though multiprocessor systems are cheaper in the long run than using multiple
computer systems, they are still quite expensive. It is much cheaper to buy a simple
single-processor system than a multiprocessor system.
All the processors in the multiprocessor system share the memory, so a much larger
pool of memory is required than in single-processor systems.
Flynn's Classical Taxonomy
M.J. Flynn proposed a classification for the organization of a computer system by
the number of instructions and data items that are manipulated simultaneously.
The sequence of instructions read from memory constitutes an instruction
stream.
The operations performed on the data in the processor constitute a data stream.
There are different ways to classify parallel computers.
One of the more widely used classifications, in use since 1966, is called Flynn's
Taxonomy.
Flynn's taxonomy distinguishes multi-processor computer architectures according
to how they can be classified along the two independent dimensions of Instruction
Stream and Data Stream. Each of these dimensions can have only one of two
possible states: Single or Multiple.
Flynn's classification
1. Single instruction stream, single data stream (SISD)
2. Single instruction stream, multiple data stream (SIMD)
3. Multiple instruction stream, single data stream (MISD)
4. Multiple instruction stream, multiple data stream (MIMD)
The matrix below defines the 4 possible classifications according to Flynn:

                         Single Data    Multiple Data
  Single Instruction        SISD            SIMD
  Multiple Instruction      MISD            MIMD

Single Instruction, Multiple Data (SIMD):
A type of parallel computer
Single Instruction: All processing units execute the same instruction at any given clock cycle
Multiple Data: Each processing unit can operate on a different data element
Best suited for specialized problems characterized by a high degree of regularity, such as
graphics/image processing.
Synchronous (lockstep) and deterministic execution
Two varieties: Processor Arrays and Vector Pipelines
Examples:
o Processor Arrays: Thinking Machines CM-2, MasPar MP-1 & MP-2, ILLIAC IV
o Vector Pipelines: IBM 9000, Cray X-MP, Y-MP & C90, Fujitsu VP, NEC SX-2, Hitachi
S820, ETA10
Most modern computers, particularly those with graphics processor units (GPUs) employ
SIMD instructions and execution units.
Multiple Instruction, Single Data (MISD):
A type of parallel computer
Multiple Instruction: Each processing unit operates on the data independently via separate
instruction streams.
Single Data: A single data stream is fed into multiple processing units.
Few (if any) actual examples of this class of parallel computer have ever existed.
Some conceivable uses might be:
o multiple frequency filters operating on a single signal stream
o multiple cryptography algorithms attempting to crack a single coded message.
Multiple Instruction, Multiple Data (MIMD):
An MIMD system is a multiprocessor machine which is capable of
executing multiple instructions on multiple data sets.
Each PE (processing element) in the MIMD model has separate instruction
and data streams; therefore machines built using this model are capable of
handling any kind of application.
Unlike SIMD and MISD machines, the PEs in MIMD machines work
asynchronously.
A type of parallel computer
Multiple Instruction: Every processor may be executing a different instruction stream
Multiple Data: Every processor may be working with a different data stream
Execution can be synchronous or asynchronous, deterministic or non-deterministic
Currently, the most common type of parallel computer - most modern supercomputers fall
into this category.
Examples: most current supercomputers, networked parallel computer clusters and "grids",
multi-processor SMP computers, multi-core PCs.
Note: many MIMD architectures also include SIMD execution sub-components