BCA COA Full Notes


Bachelor of Computer Applications (Mahatma Gandhi University)


Module 1
Syllabus
Basic Computer Organization and Design
Operational concepts, Instruction codes, Computer Registers, Computer Instructions,
Memory locations and addresses, Instruction cycle, Timing and control, Bus
organization.

Operational concepts
A computer has five functionally independent units: the Input Unit, Memory Unit,
Arithmetic & Logic Unit, Output Unit, and Control Unit.
Input Unit :-
Computers take coded information via the input unit. The most common input device is
the keyboard. Whenever we press a key, it is automatically translated into the
corresponding binary code and transmitted over a cable to the memory or the processor.
Memory Unit :-
It stores programs as well as data, and there are two types: Primary and Secondary
Memory.
Primary Memory is quite fast and works at electronic speed. Programs must be stored
in memory before they can be executed. Random Access Memory (RAM) is memory in
which any location can be accessed in a short, fixed time after specifying its address.
Primary memory is essential but expensive, so we also use secondary memory, which
is much cheaper. It is used when large amounts of data and programs need to be
stored, particularly information that we don't access very frequently. Examples:
magnetic disks and tapes.
Arithmetic & Logic Unit :-
All arithmetic and logical operations are performed by the ALU, and these operations
are initiated once the operands have been brought into the processor.
Output Unit :– It presents the processed results to the outside world.
Basic Operational Concepts
■ Instructions play a vital role in the proper working of the computer.
■ An appropriate program, consisting of a list of instructions, is stored in the
memory so that the tasks can be started.
■ The processor fetches individual instructions from the memory and
executes the specified operations.
■ Data to be used as operands are also stored in the memory.

Example:
Add LOCA, R0


■ This instruction adds the operand at memory location LOCA to the operand
present in the register R0.
■ The above-mentioned example can also be written as the following two
instructions:

Load LOCA, R1
Add R1, R0
■ The first instruction transfers the contents of the memory location LOCA into
the processor register R1, and the second instruction adds the contents of
registers R1 and R0 and places the sum in register R0.
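The two-instruction sequence can be traced with a small Python sketch. The address of LOCA, the memory contents, and the initial register values below are made-up for illustration, not from the notes:

```python
# Made-up memory image: LOCA is taken to be address 0x100, holding 25.
memory = {0x100: 25}
LOCA = 0x100

regs = {"R0": 10, "R1": 0}   # illustrative starting values

# Load LOCA, R1 : transfer the memory operand into R1
regs["R1"] = memory[LOCA]

# Add R1, R0 : add the contents of R1 and R0, leaving the sum in R0
regs["R0"] = regs["R0"] + regs["R1"]

print(regs["R0"])  # 35
```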

Transfers between the memory and the processor are started by sending the address of
the memory location to be accessed to the memory unit and issuing the appropriate
control signals.
■ The data is then transferred to or from the memory.

Analysing how processor and memory are connected :–


■ Processors have various registers to perform various functions :-
■ Program Counter :- It contains the memory address of next instruction to be
fetched.
■ Instruction Register:- It holds the instruction which is currently being executed.
■ MDR :- It facilitates communication with memory. It contains the data to be
written into or read out of the addressed location.
■ MAR :- It holds the address of the location that is to be accessed.
■ There are n general-purpose registers, R0 through Rn-1.

Performance :-
■ Performance means how quickly a program can be executed.


■ In order to get the best performance, the compiler, the machine instruction
set, and the hardware must be designed in a coordinated manner.

Connection B/W Processor & Memory
■ The above mentioned block diagram consists of the following components

1) Memory
2) MAR
3) MDR
4) PC
5) IR
6) General Purpose Registers
7) Control Unit
8) ALU
■ The instruction that is currently being executed is held by the Instruction
Register.
■ The IR output is available to the control circuits, which generate the timing
signals that control the various processing elements involved in executing the
instruction.
■ The memory address of the next instruction to be fetched and executed is
contained in the Program Counter.
■ It is a specialized register.
■ It keeps track of the program as it executes.
■ The role of the general-purpose registers is to handle the data required by the
instructions. They store the data temporarily.
■ Two registers facilitate the communication with memory.

These registers are:


1) MAR (Memory Address Register)
2) MDR (Memory Data Register)
Memory Address Register:
■ The address of the location to be accessed is held by MAR.

Memory Data Register:


■ It contains the data to be written into or to be read out of the addressed
location.


Working Explanation
The PC is set to point to the first instruction of the program. The contents of the PC are
transferred to the MAR, and a Read control signal is sent to the memory. The addressed
word is fetched from the location specified in the MAR and loaded into the MDR.
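These fetch steps can be sketched in Python, using a made-up one-word memory (the instruction word 0x2104 is an arbitrary illustration):

```python
# Made-up memory: one 16-bit word at address 0.
memory = {0: 0x2104}

PC = 0
MAR = PC            # contents of PC transferred to MAR
MDR = memory[MAR]   # Read control signal: addressed word loaded into MDR
IR = MDR            # the fetched word becomes the current instruction
PC = PC + 1         # PC now points to the next instruction

print(hex(IR))  # 0x2104
```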

Instruction codes
A Program, as we all know, is a set of instructions that specify the operations, the
operands, and the sequence in which processing has to occur. An instruction code is
a group of bits that tells the computer to perform a specific operation.

Operation Code
The operation code of an instruction is a group of bits that define operations such as
add, subtract, multiply, shift and complement. The number of bits required for the
operation code depends upon the total number of operations available in the computer.
The operation code must consist of at least n bits for a given 2^n operations. The
operation part of an instruction code specifies the operation to be performed.
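The "at least n bits for 2^n operations" rule can be checked with a short sketch:

```python
import math

def opcode_bits(num_operations: int) -> int:
    """Minimum number of opcode bits needed to encode the given operations."""
    return math.ceil(math.log2(num_operations))

print(opcode_bits(8))   # 8 operations fit in 3 bits
print(opcode_bits(9))   # a 9th operation forces a 4th bit
```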

Register Part
The operation must be performed on the data stored in registers. An instruction code
therefore specifies not only operations to be performed but also the registers where the
operands(data) will be found as well as the registers where the result has to be stored.

Stored Program Organisation


The simplest way to organize a computer is to have one processor register and an
instruction code with two parts. The first part specifies the operation to be performed
and the second specifies an address. The memory address tells where the operand will
be found in memory.
Instructions are stored in one section of memory and data in another.


Some computers have a single processor register, known as the Accumulator (AC).
The operation is performed on the memory operand and the contents of the AC.

Common Bus System


The basic computer has 8 registers, a memory unit and a control unit. Paths must be
provided to transfer data from one register to another. An efficient method for
transferring data in a system is to use a Common Bus System. The outputs of the
registers and the memory are connected to the common bus.

Load(LD)
The lines from the common bus are connected to the inputs of each register and the
data inputs of the memory. The particular register whose LD input is enabled receives
the data from the bus during the next clock pulse transition.
Before studying instruction formats, let us first look at the operand address part.
When the second part of an instruction code specifies the operand itself, the instruction
is said to have an immediate operand. When the second part specifies the address of
an operand, the instruction is said to have a direct address. And in indirect addressing,
the second part of the instruction code specifies the address of a memory word in
which the address of the operand is found.
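The three cases can be contrasted with a small sketch; the memory contents below are made-up for illustration:

```python
# Made-up memory for illustrating the three cases.
memory = {20: 7, 30: 20}

def fetch_operand(mode: str, field: int) -> int:
    if mode == "immediate":   # the field is the operand itself
        return field
    if mode == "direct":      # the field is the operand's address
        return memory[field]
    if mode == "indirect":    # the field addresses a word holding the address
        return memory[memory[field]]
    raise ValueError(mode)

print(fetch_operand("immediate", 20))  # 20
print(fetch_operand("direct", 20))     # 7
print(fetch_operand("indirect", 30))   # 7 (via memory[30] = 20)
```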


Computer Instructions
The basic computer has three instruction code formats. The Operation code (opcode)
part of the instruction contains 3 bits, and the meaning of the remaining 13 bits depends
upon the operation code encountered.
There are three types of formats:

1. Memory Reference Instruction


It uses 12 bits to specify the address and 1 bit to specify the addressing mode (I). I is
equal to 0 for a direct address and 1 for an indirect address.

2. Register Reference Instruction


These instructions are recognized by the opcode 111 with a 0 in the leftmost bit of the
instruction. The other 12 bits specify the operation to be executed.

3. Input-Output Instruction
These instructions are recognized by the operation code 111 with a 1 in the leftmost bit
of the instruction. The remaining 12 bits are used to specify the input-output operation.
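The three formats can be told apart mechanically from bits 12-14 and bit 15; a sketch (the sample instruction words are made-up illustrations):

```python
def classify(instr: int) -> str:
    """Classify a 16-bit basic-computer instruction word by its format."""
    opcode = (instr >> 12) & 0b111   # bits 12-14
    i_bit = (instr >> 15) & 1        # bit 15
    if opcode != 0b111:
        return "memory-reference"    # here i_bit is the addressing mode I
    return "register-reference" if i_bit == 0 else "input-output"

print(classify(0x2104))  # opcode 010 -> memory-reference
print(classify(0x7800))  # 0 111 ... -> register-reference
print(classify(0xF800))  # 1 111 ... -> input-output
```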

Format of Instruction
The format of an instruction is depicted in a rectangular box symbolizing the bits of an
instruction. Basic fields of an instruction format are given below:
1. An operation code field that specifies the operation to be performed.
2. An address field that designates the memory address or register.
3. A mode field that specifies the way the operand or effective address is
determined.
Computers may have instructions of different lengths containing varying numbers of
addresses. The number of address fields in the instruction format depends upon the
internal organization of its registers.

Computer Registers
Registers are a type of computer memory used to quickly accept, store, and transfer
data and instructions that are being used immediately by the CPU. The registers used
by the CPU are often termed processor registers.
A processor register may hold an instruction, a storage address, or any data (such as
bit sequence or individual characters).


The computer needs processor registers for manipulating data and a register for holding
a memory address. The register holding the memory location is used to calculate the
address of the next instruction after the execution of the current instruction is
completed.

Following is the list of some of the most common registers used in a basic computer:

Register Symbol Number of bits Function

Data register DR 16 Holds memory operand

Address register AR 12 Holds address for the memory

Accumulator AC 16 Processor register

Instruction register IR 16 Holds instruction code

Program counter PC 12 Holds address of the instruction

Temporary register TR 16 Holds temporary data

Input register INPR 8 Carries input character

Output register OUTR 8 Carries output character

The following image shows the register and memory configuration for a basic computer.


● The Memory unit has a capacity of 4096 words, and each word contains 16 bits.
● The Data Register (DR) contains 16 bits which hold the operand read from the
memory location.
● The Address Register (AR) contains 12 bits which hold the address for
the memory location.
● The Program Counter (PC) also contains 12 bits which hold the address of the
next instruction to be read from memory after the current instruction is executed.
● The Accumulator (AC) register is a general purpose processing register.
● The instruction read from memory is placed in the Instruction register (IR).
● The Temporary Register (TR) is used for holding the temporary data during the
processing.
● The Input Register (INPR) holds the input characters given by the user.
● The Output Register (OUTR) holds the output after processing the input data.

Computer Instructions
Computer instructions are a set of machine language instructions that a particular
processor understands and executes. A computer performs tasks on the basis of the
instruction provided.

An instruction comprises groups called fields. These fields include:


● The Operation code (Opcode) field which specifies the operation to be
performed.
● The Address field which contains the location of the operand, i.e., register or
memory location.
● The Mode field which specifies how the operand will be located.

A basic computer has three instruction code formats which are:

1. Memory-reference instruction
2. Register-reference instruction
3. Input-Output instruction

Memory-reference instruction

In a Memory-reference instruction, 12 bits are used to specify an address and
one bit to specify the addressing mode 'I'.

Register-reference instruction

The Register-reference instructions are represented by the Opcode 111 with a 0 in the
leftmost bit (bit 15) of the instruction.

Note: The Operation code (Opcode) of an instruction refers to a group of bits that define
arithmetic and logic operations such as add, subtract, multiply, shift, and complement.

A Register-reference instruction specifies an operation on, or a test of, the AC
(Accumulator) register.


Input-Output instruction

Just like the Register-reference instruction, an Input-Output instruction does not need a
reference to memory and is recognized by the operation code 111 with a 1 in the
leftmost bit of the instruction. The remaining 12 bits are used to specify the type of the
input-output operation or test performed.

Note

● The three operation code bits in positions 12 through 14 should be equal to 111.
Otherwise, the instruction is a memory-reference type, and the bit in position 15
is taken as the addressing mode I.
● When the three operation code bits are equal to 111, the control unit inspects the
bit in position 15. If the bit is 0, the instruction is a register-reference type.
Otherwise, the instruction is an input-output type, having a 1 in bit position 15.

Instruction Set Completeness


A set of instructions is said to be complete if the computer includes a sufficient number
of instructions in each of the following categories:

● Arithmetic, logical and shift instructions
● A set of instructions for moving information to and from memory and processor
registers.
● Instructions which control the program, together with instructions that check
status conditions.
● Input and Output instructions

Arithmetic, logic and shift instructions provide computational capabilities for processing
the type of data the user may wish to employ.
A huge amount of binary information is stored in the memory unit, but all computations
are done in processor registers. Therefore, one must possess the capability of moving
information between these two units.
Program control instructions such as branch instructions are used to change the
sequence in which the program is executed.


Input and Output instructions act as an interface between the computer and the user.
Programs and data must be transferred into memory, and the results of computations
must be transferred back to the user.

Memory locations and addresses



• Memory consists of many millions of storage cells (flip-flops).

• Each cell can store a bit of information i.e. 0 or 1 (Figure).


• Each group of n bits is referred to as a word of information, and n is called the word
length.
• The word length can vary from 8 to 64 bits.

• A unit of 8 bits is called a byte.

• Accessing the memory to store or retrieve a single item of information (word/byte)
requires a distinct address for each item location. (It is customary to use the numbers
from 0 through 2^k - 1 as the addresses of successive locations in the memory.)
• If 2^k = the number of addressable locations,
then these 2^k addresses constitute the address space of the computer.
For example, a 24-bit address generates an address space of 2^24 locations (16 MB).


BYTE-ADDRESSABILITY

A byte is always 8 bits, but the word length typically ranges from 16 to 64 bits.

• In byte-addressable memory, successive addresses refer to successive byte locations
in the memory.
• Byte locations have addresses 0, 1, 2, . . .
• If the word length is 32 bits, successive words are located at addresses 0, 4, 8, . . . ,
with each word having 4 bytes.
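Both points — the size of a 2^k address space and word alignment in a byte-addressable memory — can be checked numerically:

```python
# A 24-bit address reaches 2**24 byte locations (16 MB).
k = 24
print(2 ** k == 16 * 1024 * 1024)  # True

# With 32-bit words (4 bytes), successive words start at byte addresses 0, 4, 8, ...
word_bytes = 4
word_addresses = [i * word_bytes for i in range(4)]
print(word_addresses)  # [0, 4, 8, 12]
```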

Instruction cycle
A program residing in the memory unit of a computer consists of a sequence of instructions.
These instructions are executed by the processor by going through a cycle for each
instruction.

In a basic computer, each instruction cycle consists of the following phases:

1. Fetch instruction from memory.
2. Decode the instruction.
3. Read the effective address from memory.
4. Execute the instruction.
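The four phases can be traced with a toy interpreter. The opcodes, addresses, and memory contents below are made-up for illustration, not the basic computer's real encoding:

```python
# Made-up two-instruction program: LOAD then ADD, with operands at 10 and 11.
memory = {0: ("LOAD", 10), 1: ("ADD", 11), 10: 5, 11: 7}

AC, PC = 0, 0
for _ in range(2):
    instr = memory[PC]      # 1. fetch instruction from memory
    PC += 1
    op, addr = instr        # 2. decode the instruction
    operand = memory[addr]  # 3. read the effective address / operand
    if op == "LOAD":        # 4. execute the instruction
        AC = operand
    elif op == "ADD":
        AC += operand

print(AC)  # 12
```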


Input-Output Configuration
In computer architecture, input-output devices act as an interface between the machine and
the user.

Instructions and data stored in the memory must come from some input device. The results
are displayed to the user through some output device.

The following block diagram shows the input-output configuration for a basic computer.


● The input-output terminals send and receive information.
● The information transferred is always an eight-bit alphanumeric code.
● The information generated through the keyboard is shifted into an input register
'INPR'.
● The information for the printer is stored in the output register 'OUTR'.
● Registers INPR and OUTR communicate with a communication interface serially and
with the AC in parallel.
● The transmitter interface receives information from the keyboard and transmits it to
INPR.
● The receiver interface receives information from OUTR and sends it to the printer
serially.

Design of a Basic Computer


A basic computer consists of the following hardware components.

1. A memory unit with 4096 words of 16 bits each.
2. Registers: AC (Accumulator), DR (Data register), AR (Address register), IR
(Instruction register), PC (Program counter), TR (Temporary register), SC (Sequence
Counter), INPR (Input register), and OUTR (Output register).
3. Flip-Flops: I, S, E, R, IEN, FGI and FGO.


Note: FGI and FGO are the corresponding input and output flags, which are considered
control flip-flops.

4. Two decoders: a 3 x 8 operation decoder and a 4 x 16 timing decoder.
5. A 16-bit common bus.
6. Control logic gates.
7. Adder and logic circuits connected to the input of AC.

Timing and control


The timing for all registers in the basic computer is controlled by a master clock
generator. The clock pulses are applied to all flip-flops and registers in the system,
including the flip-flops and registers in the control unit. The clock pulses do not change
the state of a register unless the register is enabled by a control signal. The control
signals are generated in the control unit and provide control inputs for the multiplexers
in the common bus, control inputs in processor registers, and microoperations for the
accumulator.
There are two major types of control organization:

1. hardwired control, and
2. microprogrammed control.

In the hardwired organization, the control logic is implemented with gates, flip-flops,
decoders, and other digital circuits. It has the advantage that it can be optimized to
produce a fast mode of operation. In the microprogrammed organization, the control
information is stored in a control memory. The control memory is programmed to
initiate the required sequence of microoperations. A hardwired control, as the name
implies, requires changes in the wiring among the various components if the design
has to be modified or changed.
In the microprogrammed control, any required changes or modifications can be
done by updating the microprogram in control memory.
The block diagram of the control unit is shown in Fig. 5.6.
It consists of:

1. two decoders,
2. a sequence counter, and
3. a number of control logic gates.

An instruction read from memory is placed in the instruction register (IR). The position
of this register in the common bus system is indicated in Fig. 5.4.


The instruction register is shown again in Fig. 5.6, where it is divided into three parts:

1. the I bit,
2. the operation code, and
3. bits 0 through 11.

The operation code in bits 12 through 14 is decoded with a 3 x 8 decoder. The eight
outputs of the decoder are designated by the symbols D0 through D7. The subscripted
decimal number is equivalent to the binary value of the corresponding operation code.
Bit 15 of the instruction is transferred to a flip-flop designated by the symbol I. Bits 0
through 11 are applied to the control logic gates. The 4-bit sequence counter can count
in binary from 0 through 15. The outputs of the counter are decoded into 16 timing
signals T0 through T15.


The sequence counter SC can be incremented or cleared synchronously. Most of the
time, the counter is incremented to provide the sequence of timing signals out of the
4 x 16 decoder. Once in a while, the counter is cleared to 0, causing the next active
timing signal to be T0.
As an example, consider the case where SC is incremented to provide timing signals
T0, T1, T2, T3, and T4 in sequence. At time T4, SC is cleared to 0 if decoder output D3 is
active. This is expressed symbolically by the statement
D3T4: SC <- 0
The timing diagram of Fig. 5-7 shows the time relationship of the control signals.

The sequence counter SC responds to the positive transition of the clock. Initially, the
CLR input of SC is active. The first positive transition of the clock clears SC to 0, which
in turn activates the timing signal T0 out of the decoder. T0 is active during one clock
cycle. The positive clock transition labeled T0 in the diagram will trigger only those
registers whose control inputs are connected to timing signal T0. SC is incremented
with every positive clock transition unless its CLR input is active. This produces the
sequence of timing signals T0, T1, T2, T3, T4 and so on, as shown in the diagram. (Note
the relationship between the timing signal and its corresponding positive clock


transition.) If SC is not cleared, the timing signals will continue with T5, T6, up to T15,
and back to T0.
The last three waveforms in Fig. 5-7 show how SC is cleared when D3T4 = 1. Output D3
from the operation decoder becomes active at the end of timing signal T2. When timing
signal T4 becomes active, the output of the AND gate that implements the control
function D3T4 becomes active. This signal is applied to the CLR input of SC. On the
next positive clock transition (the one marked T4 in the diagram) the counter is cleared
to 0. This causes the timing signal T0 to become active instead of T5, which would have
been active if SC had been incremented instead of cleared.
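The increment-and-clear behaviour of SC can be mimicked with a short sketch, where the `clears_at=4` parameter stands in for the condition D3T4 = 1 (the function name and interface are made-up for illustration):

```python
def timing_signals(clears_at: int, clocks: int):
    """Yield the active timing-signal index for each clock cycle."""
    SC = 0
    for _ in range(clocks):
        yield SC                               # decoder asserts T(SC)
        SC = 0 if SC == clears_at else (SC + 1) % 16  # clear or increment

print(list(timing_signals(clears_at=4, clocks=7)))  # [0, 1, 2, 3, 4, 0, 1]
```

After T4 the counter is cleared, so T0 follows instead of T5, matching the waveforms described above.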
A memory read or write cycle will be initiated with the rising edge of a timing signal. It
will be assumed that a memory cycle time is less than the clock cycle time. According
to this assumption, a memory read or write cycle initiated by a timing signal will be
completed by the time the next clock goes through its positive transition. The clock
transition will then be used to load the memory word into a register. This timing
relationship is not valid in many computers because the memory cycle time is usually
longer than the processor clock cycle. In such a case it is necessary to provide wait
cycles in the processor until the memory word is available. To facilitate the
presentation, we will assume that a wait period is not necessary in the basic computer.
To fully comprehend the operation of the computer, it is crucial that one understands
the timing relationship between the clock transition and the timing signals. For
example, the register transfer statement
T0: AR <- PC
specifies a transfer of the content of PC into AR if timing signal T0 is active. T0 is active
during an entire clock cycle interval. During this time the content of PC is placed onto
the bus (with S2S1S0 = 010) and the LD (load) input of AR is enabled. The actual
transfer does not occur until the end of the clock cycle when the clock goes through a
positive transition. This same positive clock transition increments the sequence counter
SC from 0000 to 0001. The next clock cycle has T1 active and T0 inactive.

Bus Organization
What Is Bus

A bus is a subsystem used to transfer data and other information between devices.
The various devices in a computer (memory, CPU, I/O and others) communicate with
each other through buses. In general, a bus is the communication pathway connecting
two or more devices.
A key characteristic of a bus is that it is a shared transmission medium, as multiple
devices are attached to it.

Typically, a bus consists of multiple communication pathways, or lines, which are
either wires or metal lines etched in a card or board (a printed circuit board). Each
line is capable of transmitting a binary 1 or a binary 0.


A computer system contains a number of different buses that provide pathways
between components at various levels of the computer system hierarchy.
Before discussing the types of buses, we will first describe one of the most
important aspects of buses, given below.

A bus typically consists of about 50 to 100 separate lines. On any bus, the
lines may generally be classified into three functional groups, as depicted in the figure
below:

We will now discuss each of them briefly, one by one.

● Data Lines​:

Data lines provide a path for moving data between system modules. They are
bidirectional, which means the data lines are used to transfer data in both
directions. As an example, the CPU can read data over these lines from memory as
well as send data out over these lines to a memory location or to a port. The number
of data lines in a bus is 8, 16, 32 or more, depending on the size of the bus.
Collectively, these lines are called the data bus.

● Address Lines:


The address lines are collectively called the address bus. In any bus, the number of
address lines is usually 16, 20, 24, or more, depending on the type and architecture of
the bus. On these lines, the CPU sends out the address of the memory location or I/O
port that is to be written to or read from. In short, the address bus is an internal
channel from the CPU to memory across which the address of the data (not the data
itself) is transmitted. Here the communication is one-way: the address is sent from the
CPU to memory and I/O ports, but memory and I/O ports do not send addresses to the
CPU on these lines. Hence these lines are unidirectional.

● Control Lines:

The control lines are collectively called the control bus. The control lines are the
gateway used to transmit and receive control signals between the microprocessor and
the various devices attached to it. In other words, the control lines are used by the
CPU for communicating with other devices within the computer. As an example, the
CPU sends signals on the control bus to enable the outputs of addressed
memory devices and port devices. Typical control line signals are:
-Memory Read
-Memory Write
-I/O Read
-I/O Write
-Bus Request
-Bus Grant, etc.

Operation of Bus
The Operation of Bus is as follows:
● If one module wishes to send data to another, it must do two things:

1. Obtain the use of the bus.
2. Transfer the data over the bus.

● If one module wishes to request data from another module, it must:

1. Obtain the use of the bus.
2. Transfer a request to the other module over the appropriate control and address
lines. It must then wait for that second module to send the data.

Types Of Bus
There are a variety of buses, but here we describe only those that are widely used.

● System Bus:

A Bus that connects the major computer components (Processor, Memory, I/O) is
called a System Bus. It is the single bus that connects all these


components of a computer system, and it is the only bus in which data lines, address
lines, and control lines are all present. It is also known as the "front side" bus. It is
faster than a peripheral bus (PCI, ISA, etc.) but slower than the backside bus.

● Peripheral Bus(I/O Bus /External Bus):

The Peripheral Bus is also known as the "I/O Bus". It is the data pathway that connects
peripheral devices to the CPU. In other words, in computing, a peripheral bus is a
computer bus designed to support computer peripherals such as printers and hard
drives. The PCI and USB buses are commonly used peripheral buses and are found in
many PCs today. We will now discuss both of them briefly:

PCI (Peripheral Component Interconnect): The PCI bus connects the CPU and
expansion boards such as modem cards, network cards and sound cards. These
expansion boards are normally plugged into expansion slots on the motherboard. That
is why the PCI bus is also known as an expansion bus or external bus.

USB (Universal Serial Bus): The Universal Serial Bus is used to attach USB devices,
such as pen drives, to the CPU.

● Local Bus:

Local buses are the traditional I/O (peripheral) buses, such as the ISA, MCA, and EISA
buses. We will now discuss each of them briefly, one by one.

ISA (Industry Standard Architecture) Bus: The ISA bus permits bus mastering,
i.e., it enables a peripheral connected directly to the bus to communicate directly with
other peripherals without going through the processor. One of the consequences of bus
mastering is Direct Memory Access (DMA). Until the end of the 1990s almost all PCs
were equipped with the ISA bus, but it was progressively replaced by the PCI bus,
which offers better performance.

MCA (Micro Channel Architecture): An improved proprietary bus designed by IBM in
1987 to be used in their PS/2 line of computers.

EISA (Extended Industry Standard Architecture): The EISA bus uses connectors that
are the same size as ISA connectors, but with 4 rows of contacts instead of 2, for
32-bit addressing.

● High Speed Bus:

A High-Speed Bus is specifically designed to support high-capacity I/O devices. It
brings high-demand devices into closer integration with the processor. This bus
supports connections to high-speed LANs (such as Fast Ethernet at 100 Mbps), video
and graphics workstations, FireWire, etc.


Module 2
Syllabus
Central Processing Unit:
General Register Organization, Stack Organization, Addressing modes, Instruction
Classification, Program control.

General Register Organization


The set of registers in a computer is connected to the ALU using buses and
multiplexers. A 14-bit control word specifies two source registers (SELA and SELB), a
destination register (SELD), and an operation (OPR). The registers are specified using
three bits each, and the remaining five bits of the control word specify the ALU
operation.
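Assuming the field order SELA | SELB | SELD | OPR (an assumption; the notes do not give the layout, and the OPR codes below are made-up), the 14-bit control word can be packed and unpacked like this:

```python
# Pack three 3-bit register selects and a 5-bit operation into 14 bits.
def pack(sela: int, selb: int, seld: int, opr: int) -> int:
    return (sela << 11) | (selb << 8) | (seld << 5) | opr

def unpack(word: int):
    return (word >> 11) & 0b111, (word >> 8) & 0b111, (word >> 5) & 0b111, word & 0b11111

cw = pack(2, 3, 1, 0b00010)   # e.g. R1 <- R2 op R3, with a made-up OPR code
print(unpack(cw))  # (2, 3, 1, 2)
```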

Stack organization:
A stack is a storage device that stores information in a last-in, first-out (LIFO) fashion.
A stack has two operations: push, which places data onto the stack, and pop, which
removes data from the stack. A computer can have a separate memory reserved just
for stack operations. However, most computers use main memory for representing
stacks; hence, all assembly programs should allocate memory for a stack. The SP
register is initially loaded with the address of the top of the stack. In memory, the stack
actually grows downward, so when something is pushed onto the stack, the stack
pointer is decremented.


SP<-SP-1
M[SP] <- DR
And, when something is popped off the stack, the stack pointer is incremented.
DR<-M[SP]
SP <- SP + 1
The push and pop instructions can be executed explicitly in a program. However, they
are also executed implicitly for operations such as procedure calls and interrupts.
Care must be taken when performing stack operations to ensure that an overflow
or underflow of the stack does not occur.

Addressing Modes
The operation field of an instruction specifies the operation to be performed. This
operation will be executed on some data which is stored in computer registers or the
main memory. The way any operand is selected during the program execution is
dependent on the addressing mode of the instruction. The purpose of using addressing
modes is as follows:
1. To give programming versatility to the user.
2. To reduce the number of bits in the addressing field of the instruction.
Types of Addressing Modes
Below we have discussed different types of addressing modes one by one:

Immediate Mode
In this mode, the operand is specified in the instruction itself. An immediate-mode
instruction has an operand field rather than an address field.
For example: ​ADD 7​ adds 7 to the contents of the accumulator; 7 is the operand
here.
Register Mode
In this mode the operand is stored in a register, and this register is inside the CPU.
The instruction holds the address of the register where the operand is stored.

Downloaded by Dinkan Du ([email protected])


lOMoARcPSD|26759719

Advantages
● Shorter instructions and faster instruction fetch.
● Faster memory access to the operand(s)
Disadvantages
● Very limited address space
● Using multiple registers helps performance but it complicates the instructions.

Register Indirect Mode


In this mode, the instruction specifies the register whose contents give us the address of
operand which is in memory. Thus, the register contains the address of operand rather
than the operand itself.

Downloaded by Dinkan Du ([email protected])


lOMoARcPSD|26759719

Auto Increment/Decrement Mode


In this mode, the register is automatically incremented or decremented before or after its value is used.

Direct Addressing Mode


In this mode, the effective address of the operand is given in the instruction itself.
● A single memory reference is needed to access the data.
● No additional calculations are needed to find the effective address of the operand.

For example:​ ​ADD R1, 4000​ - here 4000 is the effective address of the operand.
NOTE:​ The effective address is the location where the operand is present.

Indirect Addressing Mode


In this mode, the address field of the instruction gives the address where the effective
address is stored in memory. This slows down execution, as it requires multiple memory
lookups to find the operand.


Displacement Addressing Mode


In this mode, the contents of the index register are added to the address part of the
instruction to obtain the effective address of the operand.
EA = A + (R)​: the instruction holds two values, A (the base value) and R (the register
holding the displacement), or vice versa.
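A minimal sketch of the EA = A + (R) computation (the register name and address values here are made up for illustration):

```python
# Displacement addressing: effective address = address field + register contents.
def effective_address(a, r_contents):
    """EA = A + (R)"""
    return a + r_contents

registers = {"R1": 4}    # R1 holds the displacement (or index)
ea = effective_address(1000, registers["R1"])   # EA = 1000 + 4
```

The same helper covers relative addressing (pass the PC as the register) and base-register addressing (pass the base register), since all three share the EA = A + (R) form.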

Relative Addressing Mode

It is a version of Displacement addressing mode.


In this mode, the contents of the PC (Program Counter) are added to the address part of
the instruction to obtain the effective address.
EA = A + (PC)​, where EA is the effective address and PC is the program counter.
The operand is A cells away from the current cell (the one pointed to by the PC).

Base Register Addressing Mode


It is again a version of displacement addressing mode. It can be defined as ​EA = A +
(R)​, where A is the displacement and R holds a pointer to the base address.

Stack Addressing Mode


In this mode, the operand is at the top of the stack. For example, ​ADD​ will
POP the top two items from the stack, add them, and then ​PUSH the result back onto the
top of the stack.
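The zero-address ADD described above can be sketched as (sample operand values are arbitrary):

```python
# Stack addressing: ADD pops two operands, adds them, pushes the result.
stack = []

def push(v):
    stack.append(v)

def add():
    b = stack.pop()      # second operand (pushed last)
    a = stack.pop()      # first operand
    push(a + b)          # result goes back on top

push(3)
push(4)
add()
result = stack[-1]       # the sum is now the new top of stack
```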


Instruction Cycle
An instruction cycle, also known as the ​fetch-decode-execute cycle, is the basic
operational process of a computer. This process is repeated continuously by the CPU
from boot-up to shut-down of the computer. The following steps occur during an
instruction cycle:

1. Fetch the Instruction


The instruction is fetched from the memory address stored in the PC (Program Counter)
and placed in the instruction register (IR). At the end of the fetch operation, the PC is
incremented by 1 so that it points to the next instruction to be executed.

2. Decode the Instruction


The instruction in the IR is decoded by the decoder to determine the operation and the operands.

3. Read the Effective Address


If the instruction has an indirect address, the effective address is read from memory.
Otherwise, the operand is read directly, as in an immediate-operand instruction.

4. Execute the Instruction


The control unit passes the information in the form of control signals to the functional
units of the CPU. The result generated is stored in main memory or sent to an output
device. The cycle is then repeated by fetching the next instruction; in this way the
instruction cycle repeats continuously.
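The steps above can be sketched as a toy fetch-decode-execute loop (the instruction names, encoding, and program here are invented purely for illustration; a real ISA is far richer):

```python
# Toy fetch-decode-execute loop with a one-address accumulator machine.
program = [("LOAD", 5), ("ADD", 3), ("HALT", None)]   # pretend instruction memory
pc = 0        # program counter
acc = 0       # accumulator
running = True

while running:
    opcode, operand = program[pc]   # 1. fetch the instruction at M[PC]
    pc += 1                         #    PC now points to the next instruction
    if opcode == "LOAD":            # 2-4. decode and execute
        acc = operand
    elif opcode == "ADD":
        acc += operand
    elif opcode == "HALT":
        running = False
```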


Instruction Classification
Computers perform tasks on the basis of the instructions provided. An instruction in a
computer comprises groups of bits called fields. These fields contain different
information; since for a computer everything is in 0s and 1s, each field has a different
significance, on the basis of which the CPU decides what to perform. The most common
fields are:

Operation field, which specifies the operation to be performed, such as addition.

Address field, which contains the location of the operand, i.e., a register or memory location.
Mode field, which specifies how the operand is to be found.
An instruction varies in length depending upon the number of addresses it contains.
Generally, CPU organizations are of three types on the basis of the number of address fields:

1. Single Accumulator organization


2. General register organization
3. Stack organization
In the first organization, operations involve a special register called the accumulator. In
the second, multiple registers are used for computation. In the third, operations work on
a stack, so instructions do not contain an address field. It is not necessary that only a
single organization is applied; a blend of these organizations is what we mostly see in
practice.

On the basis of number of address, instruction are classified as:

Note that we will use X = (A+B)*(C+D) expression to showcase the procedure.


Zero Address Instructions –


A stack-based computer does not use an address field in its instructions. To evaluate an
expression, it is first converted to reverse Polish notation, i.e., postfix notation.

Expression: X = (A+B)*(C+D)
Postfixed : X = AB+CD+*
TOP means top of stack
M[X] is any memory location

PUSH A TOP = A
PUSH B TOP = B
ADD TOP = A+B
PUSH C TOP = C
PUSH D TOP = D
ADD TOP = C+D
MUL TOP = (C+D)*(A+B)
POP X M[X] = TOP
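The same postfix evaluation can be traced with a short sketch (the operand values are arbitrary sample data):

```python
# Evaluating X = (A+B)*(C+D) from its postfix form AB+CD+* on a stack,
# mirroring the zero-address sequence above.
values = {"A": 1, "B": 2, "C": 3, "D": 4}
stack = []

for token in ["A", "B", "+", "C", "D", "+", "*"]:
    if token in values:
        stack.append(values[token])   # PUSH operand
    elif token == "+":
        b, a = stack.pop(), stack.pop()
        stack.append(a + b)           # ADD: TOP = a + b
    elif token == "*":
        b, a = stack.pop(), stack.pop()
        stack.append(a * b)           # MUL: TOP = a * b

x = stack.pop()                       # POP X: M[X] = TOP
```

With A=1, B=2, C=3, D=4 this computes (1+2)*(3+4) = 21.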

One Address Instructions –


These instructions use an implied ACCUMULATOR register for data manipulation. One
operand is in the accumulator and the other is in a register or memory location. Implied
means that the CPU already knows that one operand is in the accumulator, so there is no
need to specify it.

Expression: X = (A+B)*(C+D)
AC is accumulator
M[] is any memory location
M[T] is temporary location

LOAD A AC = M[A]
ADD B AC = AC + M[B]
STORE T M[T] = AC
LOAD C AC = M[C]
ADD D AC = AC + M[D]
MUL T AC = AC * M[T]
STORE X M[X] = AC
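The accumulator sequence above, traced step by step (the names AC, M, and T follow the listing; the data values are arbitrary):

```python
# One-address evaluation of X = (A+B)*(C+D) with an accumulator AC
# and a temporary memory location T.
M = {"A": 1, "B": 2, "C": 3, "D": 4, "T": 0, "X": 0}

AC = M["A"]            # LOAD A
AC += M["B"]           # ADD B    -> AC = A + B
M["T"] = AC            # STORE T
AC = M["C"]            # LOAD C
AC += M["D"]           # ADD D    -> AC = C + D
AC *= M["T"]           # MUL T    -> AC = (C+D)*(A+B)
M["X"] = AC            # STORE X
```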


Two Address Instructions –


This is common in commercial computers. Here two addresses can be specified in the
instruction. Unlike one-address instructions, where the result is stored in the
accumulator, here the result can be stored at a different location, but this requires
more bits to represent the addresses.

The destination address can also contain an operand.

Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location

MOV R1, A R1 = M[A]
ADD R1, B R1 = R1 + M[B]
MOV R2, C R2 = M[C]
ADD R2, D R2 = R2 + M[D]
MUL R1, R2 R1 = R1 * R2
MOV X, R1 M[X] = R1

Three Address Instructions –


This format has three address fields, each specifying a register or a memory location.
Programs created are much shorter in size, but the number of bits per instruction
increases. These instructions make program creation easier, but that does not mean the
program will run faster, because each instruction only carries more information; each
micro-operation (changing the contents of a register, loading an address onto the
address bus, etc.) is still performed in one cycle.

Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location


ADD R1, A, B R1 = M[A] + M[B]


ADD R2, C, D R2 = M[C] + M[D]
MUL X, R1, R2 M[X] = R1 * R2
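The three-address listing above, traced with sample memory contents (the values are arbitrary):

```python
# Three-address evaluation of X = (A+B)*(C+D): each instruction names
# both sources and the destination explicitly.
M = {"A": 1, "B": 2, "C": 3, "D": 4}

R1 = M["A"] + M["B"]       # ADD R1, A, B
R2 = M["C"] + M["D"]       # ADD R2, C, D
M["X"] = R1 * R2           # MUL X, R1, R2
```

Three instructions suffice, versus seven in the one-address version, at the cost of wider instruction words.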

Program Control
Program control instructions are the machine-code instructions used by the machine, or
in assembly language by the user, to command the processor to act accordingly. These
instructions are of various types and are also used in assembly language directly.
In a high-level language, user code is translated into machine code, and the resulting
instructions direct the processor to do the task.
Types of Program Control Instructions:
There are different types of Program Control Instructions:
1. Compare Instruction:
A compare instruction is specifically provided; it is similar to a subtract instruction
except that the result is not stored anywhere, but the flags are set according to the result.
Example:
CMP R1, R2 ;
2. Unconditional Branch Instruction:
It causes an unconditional change of execution sequence to a new location.
Example:
Assembly code : JUMP L2
High-level code: goto L2
3. Conditional Branch Instruction:
A conditional branch instruction is used to examine the values stored in the condition
code register to determine whether the specific condition exists and to branch if it does.
Example:
Assembly Code : BE R1, R2, L1
Compiler allocates R1 for x and R2 for y
High Level Code: if (x==y) goto L1;
4. Subroutines:
A subroutine is a program fragment that lives in user space and performs a well-defined
task. It is invoked by another user program and returns control to the calling program
when finished.
Example:
CALL and RET
5. Halting Instructions:
● NOP Instruction – NOP means no operation. It causes no change in the
processor state other than an advancement of the program counter. It can
be used to synchronize timing.


● HALT – It brings the processor to an orderly halt, remaining in an idle state
until restarted by an interrupt, trace, reset, or external action.
6. Interrupt Instructions:
An interrupt is a mechanism by which an I/O device or an instruction can suspend the
normal execution of the processor and get itself serviced.
● RESET – It resets the processor. This may include setting any or all registers
to an initial value or setting the program counter to a standard starting location.
● TRAP – It is a non-maskable, edge- and level-triggered interrupt. TRAP has the
highest priority and is a vectored interrupt.
● INTR – It is a level-triggered and maskable interrupt. It has the lowest priority.
It can be disabled by resetting the processor.


Module 3
Syllabus
Memory Organization
Memory Hierarchy, Main Memory, Organization of RAM, SRAM, DRAM, Read Only Memory-
ROM-PROM,EPROM,EEPROM, Auxiliary memory, Cache memory, Virtual Memory, Memory
mapping Techniques.

Memory Hierarchy
A memory unit is the collection of storage units or devices together. The memory unit
stores the binary information in the form of bits. Generally, memory/storage is classified
into 2 categories:
● Volatile Memory​: This loses its data, when power is switched off.
● Non-Volatile Memory​: This is a permanent storage and does not lose any data
when power is switched off.
Fig: Memory Hierarchy

The total memory capacity of a computer can be visualized by hierarchy of components.


The memory hierarchy system consists of all storage devices contained in a computer
system from the slow Auxiliary Memory to fast Main Memory and to smaller Cache
memory.
Auxiliary memory access time is generally ​1000 times that of the main memory,
hence it is at the bottom of the hierarchy.
The ​main memory occupies the central position because it is equipped to communicate
directly with the CPU and with auxiliary memory devices through Input/output processor
(I/O).
When programs not residing in main memory are needed by the CPU, they are brought
in from auxiliary memory. Programs not currently needed in main memory are


transferred into auxiliary memory to provide space in main memory for other programs
that are currently in use.
The ​cache memory is used to store program data that is currently being executed in
the CPU. The approximate access-time ratio between cache memory and main memory is
about ​1 to 7~10​.

Memory Access Methods


Each memory type, is a collection of numerous memory locations. To access data from
any memory, first it must be located and then the data is read from the memory location.
Following are the methods to access information from memory locations:
1. Random Access​: Main memories are random access memories, in which each
memory location has a unique address. Using this unique address any memory
location can be reached in the same amount of time in any order.
2. Sequential Access​: This method allows memory access only in a sequence or in
order.
3. Direct Access​: In this mode, information is stored in tracks, with each track
having a separate read/write head.
Main Memory
The memory unit that communicates directly with the CPU, auxiliary memory, and
cache memory is called main memory. It is the central storage unit of the computer
system: a large and fast memory used to store data during computer operations.
Main memory is made up of ​RAM and ​ROM​, with RAM integrated circuit chips holding
the major share.
● RAM: Random Access Memory
○ DRAM​: Dynamic RAM, is made of capacitors and transistors, and must be
refreshed every 10~100 ms. It is slower and cheaper than SRAM.
○ SRAM​: Static RAM, has a six transistor circuit in each cell and retains
data, until powered off.
○ NVRAM​: Non-Volatile RAM, retains its data, even when turned off.
Example: Flash memory.
● ROM: Read Only Memory, is non-volatile and is more like a permanent storage
for information. It also stores the ​bootstrap loader​ program, to load and start the
operating system when computer is turned on. ​PROM​(Programmable ROM),


EPROM​(Erasable PROM) and ​EEPROM​(Electrically Erasable PROM) are some


commonly used ROMs.
Auxiliary Memory
Devices that provide backup storage are called auxiliary memory. ​For example:
Magnetic disks and tapes are commonly used auxiliary devices. Other devices used as
auxiliary memory are magnetic drums, magnetic bubble memory and optical disks.
It is not directly accessible to the CPU, and is accessed using the Input/Output
channels.
Cache Memory
The data or contents of the main memory that are used again and again by CPU, are
stored in the cache memory so that we can easily access that data in shorter time.
Whenever the CPU needs to access memory, it first checks the cache memory. If the
data is not found in the cache memory, the CPU moves on to the main memory. It also
transfers blocks of recent data into the cache and keeps deleting old data in the cache
to accommodate the new data.

Hit Ratio
The performance of cache memory is measured in terms of a quantity called ​hit ratio​.
When the CPU refers to memory and finds the word in cache it is said to produce a ​hit​.
If the word is not found in the cache but is in main memory, it counts as a ​miss​.
The ratio of the number of hits to the total CPU references to memory is called hit ratio.
Hit Ratio = Hit/(Hit + Miss)
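For example, with made-up hit and miss counts the formula gives:

```python
# Hit ratio = hits / (hits + misses); the counts below are sample data.
hits, misses = 90, 10
hit_ratio = hits / (hits + misses)   # 90 hits out of 100 total references
```

So 90 hits in 100 memory references yields a hit ratio of 0.9.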
Associative Memory
It is also known as ​content addressable memory (CAM)​. It is a memory chip in which
each bit position can be compared. In this the content is compared in each bit cell which
allows very fast table lookup. Since the entire chip can be compared, contents are
randomly stored without considering addressing scheme. These chips have less
storage capacity than regular memory chips.

Main Memory
The main memory acts as the central storage unit in a computer system. It is a relatively
large and fast memory which is used to store programs and data during the run time
operations.
The primary technology used for the main memory is based on semiconductor
integrated circuits. The integrated circuits for the main memory are classified into two
major units.

1. RAM (Random Access Memory) integrated circuit chips


2. ROM (Read Only Memory) integrated circuit chips


RAM integrated circuit chips


The RAM integrated circuit chips are further classified into two possible operating
modes, ​static​ and ​dynamic​.
A static RAM is composed primarily of flip-flops that store the binary information.
The stored information is volatile, i.e. it remains valid only as long as power is
applied to the system. Static RAM is easy to use and takes less time to perform
read and write operations compared to dynamic RAM.
A dynamic RAM stores the binary information in the form of electric charges
applied to capacitors. The capacitors are integrated inside the chip using MOS
transistors. Dynamic RAM consumes less power and provides a large storage capacity
in a single memory chip.
RAM chips are available in a variety of sizes and are used as per the system
requirement. The following block diagram demonstrates the chip interconnection in a
128 * 8 RAM chip.

● A 128 * 8 RAM chip has a memory capacity of 128 words of eight bits (one byte)
per word. This requires a 7-bit address and an 8-bit bidirectional data bus.
● The 8-bit bidirectional data bus allows the transfer of data either from memory to
CPU during a ​read​ operation or from CPU to memory during a ​write​ operation.
● The ​read and ​write inputs specify the memory operation, and the two chip select
(CS) control inputs are for enabling the chip only when the microprocessor
selects it.
● The bidirectional data bus is constructed using ​three-state buffers​.
● The output generated by three-state buffers can be placed in one of the three
possible states which include a signal equivalent to logic 1, a signal equal to logic
0, or a high-impedance state.


Note: The logic 1 and 0 are standard digital signals whereas the high-impedance state
behaves like an open circuit, which means that the output does not carry a signal and
has no logic significance.

The following function table specifies the operations of a 128 * 8 RAM chip.

From the functional table, we can conclude that the unit is in operation only when CS1 =
1 and CS2 = 0. The bar on top of the second select variable indicates that this input is
enabled when it is equal to 0.
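The chip-select logic described above can be sketched as follows (the function name is illustrative; the CS1 = 1, CS2 = 0 condition is from the function table):

```python
# The 128 x 8 RAM chip responds only when CS1 = 1 and CS2 = 0
# (the bar over CS2 marks it as active-low).
def chip_enabled(cs1, cs2):
    # Any other input combination leaves the data bus in a
    # high-impedance state (no logic significance).
    return cs1 == 1 and cs2 == 0

enabled = chip_enabled(1, 0)    # chip selected: read/write possible
disabled = chip_enabled(1, 1)   # bus in high-impedance state
```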

ROM integrated circuit


The primary component of the main memory is RAM integrated circuit chips, but a
portion of memory may be constructed with ROM chips.
A ROM memory is used for keeping programs and data that are permanently resident in
the computer.

Apart from the permanent storage of data, the ROM portion of main memory is needed
for storing an initial program called a ​bootstrap loader​. The primary function of the
bootstrap loader program is to start the computer software operating when power is
turned on.
ROM chips are also available in a variety of sizes and are also used as per the system
requirement. The following block diagram demonstrates the chip interconnection in a
512 * 8 ROM chip.


● A ROM chip has a similar organization as a RAM chip. However, a ROM can
only perform read operation; the data bus can only operate in an output mode.
● The 9-bit address lines in the ROM chip specify any one of the 512 bytes stored
in it.
● The value for chip select 1 and chip select 2 must be 1 and 0 for the unit to
operate. Otherwise, the data bus is said to be in a high-impedance state.

Organization of RAM, SRAM, DRAM, Read Only Memory-


ROM-PROM,EPROM,EEPROM
Memory is the most essential element of a computing system because without it
computer can’t perform simple tasks. Computer memory is of two basic type – Primary
memory(RAM and ROM) and Secondary memory(hard drive,CD,etc.). Random Access
Memory (RAM) is primary-volatile memory and Read Only Memory (ROM) is
primary-non-volatile memory.


1. Random Access Memory (RAM) –


● It is also called ​read/write memory​, the ​main memory​, or the ​primary
memory​.
● The programs and data that the CPU requires during execution of a
program are stored in this memory.
● It is a volatile memory, as the data is lost when the power is turned off.
● RAM is further classified into two types- ​SRAM (Static Random Access
Memory)​ and ​DRAM (Dynamic Random Access Memory)​.

2. Read Only Memory (ROM) –

● Stores crucial information essential to operate the system, like the program
essential to boot the computer.
● It is not volatile.
● Always retains its data.
● Used in embedded systems or where the programming needs no change.
● Used in calculators and peripheral devices.
● ROM is further classified into 4 types- ​ROM​, ​PROM​, ​EPROM​, and
EEPROM​.


Types of Read Only Memory (ROM) –


1. PROM (Programmable read-only memory) – It can be programmed by
user. Once programmed, the data and instructions in it cannot be changed.
2. EPROM (Erasable Programmable read only memory) – It can be
reprogrammed. To erase data from it, expose it to ultraviolet light. To
reprogram it, all the previous data must first be erased.
3. EEPROM (Electrically erasable programmable read only memory) –
The data can be erased by applying an electric field; no ultraviolet
light is needed. Portions of the chip can be erased selectively.

Auxiliary memory
An auxiliary memory is the lowest-cost, highest-capacity, and slowest-access
storage in a computer system. It is where programs and data are kept for long-term
storage or when not in immediate use. The most common examples of auxiliary
memories are magnetic tapes and magnetic disks.

Magnetic Disks
A magnetic disk is a type of memory constructed using a circular plate of metal or
plastic coated with magnetized materials. Usually, both sides of the disks are used to
carry out read/write operations. However, several disks may be stacked on one spindle
with read/write head available on each surface.
The following image shows the structural representation for a magnetic disk.


● The memory bits are stored in the magnetized surface in spots along the
concentric circles called tracks.
● The concentric circles (tracks) are commonly divided into sections called sectors.

Magnetic Tape
Magnetic tape is a storage medium that allows data archiving, collection, and backup for
different kinds of data. The magnetic tape is constructed using a plastic strip coated with
a magnetic recording medium.

The bits are recorded as magnetic spots on the tape along several tracks. Usually,
seven or nine bits are recorded simultaneously to form a character together with a parity
bit.
Magnetic tape units can be halted, started to move forward or in reverse, or can be
rewound. However, they cannot be started or stopped fast enough between individual
characters. For this reason, information is recorded in blocks referred to as records.


Cache memory
Cache memory is a special very high-speed memory. It is used to speed up and
synchronize with the high-speed CPU. Cache memory is costlier than main memory or
disk memory but more economical than CPU registers. Cache memory is an extremely fast
memory type that acts as a buffer between RAM and the CPU. It holds frequently
requested data and instructions so that they are immediately available to the CPU when
needed.
Cache memory is used to reduce the average time to access data from the Main
memory. The cache is a smaller and faster memory which stores copies of the data
from frequently used main memory locations. There are various different independent
caches in a CPU, which store instructions and data.

Levels of memory:
● Level 1 or Registers –
Registers hold the data that the CPU is working on immediately. The most
commonly used registers are the accumulator, program counter, address
register, etc.
● Level 2 or Cache memory –
It is a very fast memory with a short access time, where data is
temporarily stored for faster access.
● Level 3 or Main Memory –
It is the memory on which the computer currently works. It is small in size
compared to secondary memory, and once power is off the data no longer stays
in this memory.
● Level 4 or Secondary Memory –
It is external memory which is not as fast as main memory but data stays
permanently in this memory.
Cache Performance:
When the processor needs to read or write a location in main memory, it first checks for
a corresponding entry in the cache.
● If the processor finds that the memory location is in the cache, a ​cache hit
has occurred and data is read from cache
● If the processor ​does not find the memory location in the cache, a ​cache
miss has occurred. For a cache miss, the cache allocates a new entry and


copies in data from main memory, then the request is fulfilled from the
contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity called
Hit ratio.
Hit ratio = hit / (hit + miss) = no. of hits/total accesses
We can improve Cache performance using higher cache block size, higher associativity,
reduce miss rate, reduce miss penalty, and reduce the time to hit in the cache.
Application of Cache Memory –

1. Usually, the cache memory can store a reasonable number of


blocks at any given time, but this number is small compared to
the total number of blocks in the main memory.
2. The correspondence between the main memory blocks and
those in the cache is specified by a mapping function.
Types of Cache –
1. Primary Cache –
A primary cache is always located on the processor chip. This cache is small and
its access time is comparable to that of processor registers.
2. Secondary Cache –
Secondary cache is placed between the primary cache and the rest of the
memory. It is referred to as the level 2 (L2) cache. Often, the Level 2 cache is
also housed on the processor chip.

Locality of reference –
Since the size of the cache memory is small compared to main memory, which part of
main memory should be given priority and loaded into the cache is decided based on the
locality of reference.
Types of Locality of reference
1. Spatial Locality of reference
This says that a word close to the recently referenced word is likely to be
needed soon, so when a word is referenced the complete block containing it is
loaded into the cache, bringing its neighbours along with it.
2. Temporal Locality of reference
This says that a recently referenced word is likely to be referenced again
soon, so recently used blocks are kept in the cache; replacement policies such
as least recently used (LRU) exploit this by evicting the block that has gone
unused the longest.


Virtual Memory
Virtual memory is a memory management technique where secondary memory can be
used as if it were a part of the main memory. Virtual memory is a very common
technique used in the operating systems (OS) of computers. Virtual memory uses
hardware and software to allow a computer to compensate for physical memory
shortages, by temporarily transferring data from random access memory (​RAM​) to disk
storage. In essence, virtual memory allows a computer to treat secondary memory as
though it were the main memory.

Today, most PCs come with up to around 4 GB of RAM. However, sometimes this isn't
enough to run all the programs a user might want to use at once. This is where virtual
memory comes in. Virtual memory can be used to swap data that has not been used
recently -- and move it over to a storage device such as a hard drive or ​solid-state drive
(SSD). This will free up more space in the RAM.

Virtual memory is important for ​improving system performance​, multitasking, using large
programs and flexibility. However, users shouldn't rely on virtual memory too much,
because using virtual data is considerably slower than the use of RAM. If the OS has to
swap data between virtual memory and RAM too often, it can make the computer feel
very slow -- this is called ​thrashing​.
Virtual memory was developed at a time when physical memory -- also referenced as
RAM -- was expensive. Computers have a finite amount of RAM, so memory can run
out, especially when multiple programs run at the same time. A system using virtual
memory uses a section of the hard drive to emulate RAM. With virtual memory, a
system can load larger programs or multiple programs running at the same time,
allowing each one to operate as if it has infinite memory and without having to purchase
more RAM.
How virtual memory works
Virtual memory uses both computer hardware and software to work. When an
application is in use, data from that program is stored in a physical address using RAM.
More specifically, virtual memory will map that address to RAM using a memory
management unit (​MMU​). The OS will make and manage memory mappings by using
page tables and other data structures. The MMU, which acts as an address translation
hardware, will automatically translate the addresses.
If at any point later the RAM space is needed for something more urgent, the
data can be swapped out of RAM and into virtual memory. The computer's memory
manager is in charge of keeping track of the shifts between physical and virtual
memory. If that data is needed again, a ​context switch can be used to resume execution
again.
While copying virtual memory into physical memory, the OS divides memory into
pagefiles or ​swap files with a fixed number of addresses. Each page is stored on a disk,
and when the page is needed, the OS copies it from the disk to main memory and
translates the virtual addresses into real addresses.
However, the process of swapping virtual memory to physical is rather slow. This
means that using virtual memory generally causes a noticeable reduction in


performance. Because of swapping, computers with more RAM are seen to have better
performance.
Types of virtual memory
A computer's MMU handles memory operations, including ​managing virtual memory​. In
most computers, the MMU hardware is integrated into the ​CPU​. There are two ways in
which virtual memory is handled: ​paged​ and ​segmented​.
Paging divides memory into sections or paging files, usually approximately 4 KB
in size. When a computer uses up its RAM, pages not in use are transferred to the
section of the hard drive designated for virtual memory using a swap file. A swap file is
a space set aside on the hard drive as the virtual memory extensions of the computer's
RAM. When the swap file is needed, it's sent back to RAM using a process called page
swapping. This system ensures that the computer's OS and applications don't run out of
real memory.
The paging process includes the use of page tables, which translate the virtual
addresses that the OS and applications use into the physical addresses that the MMU
uses. Entries in the page table indicate whether the page is in real memory. If the OS or
a program doesn't find what it needs in RAM, then the MMU responds to the missing
memory reference with a page fault exception to get the OS to move the page back to
memory when it's needed. Once the page is in RAM, its virtual address appears in the
page table.
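The page-table lookup and page-fault flow described above can be sketched in a few lines. This is a toy model, not any particular OS's implementation; the page size, the page-table contents and the example addresses below are illustrative assumptions.

```python
# A toy model of page-table address translation. Real MMUs do this in
# hardware, with multi-level tables and TLBs; this sketch shows only the
# split-and-map idea described in the text.

PAGE_SIZE = 4096  # 4 KB pages, as in the paging description above

class PageFault(Exception):
    """Raised when the referenced page is not present in real memory."""

def translate(virtual_addr, page_table):
    """Split a virtual address into (page number, offset) and map it to a
    physical address via the page table."""
    page_num = virtual_addr // PAGE_SIZE
    offset = virtual_addr % PAGE_SIZE
    frame = page_table.get(page_num)   # is the entry present in RAM?
    if frame is None:
        raise PageFault(page_num)      # OS must move the page back in
    return frame * PAGE_SIZE + offset

# Page table: virtual page number -> physical frame number (made-up values)
table = {0: 5, 1: 9}
print(translate(4100, table))   # page 1, offset 4 -> 9*4096 + 4 = 36868
```

A reference to a page absent from the table raises `PageFault`, modelling the exception that prompts the OS to load the page.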
Segmentation is also used to manage virtual memory. This approach divides
virtual memory into segments of different lengths. Segments not in use in memory can
be moved to virtual memory space on the hard drive. Segmented information or
processes are tracked in a segment table, which shows if a segment is present in
memory, whether it's been modified and what its physical address is. In addition, file
systems in segmentation are only made up of a list of segments mapped into a
process's potential address space.
Segmentation and paging differ as memory models in terms of how memory is
divided; however, the two can also be combined. Some virtual memory systems combine
segmentation and paging. In this case, memory gets divided into frames or pages. The
segments take up multiple pages, and the virtual address includes both the segment
number and the page number.
How to manage virtual memory
Operating systems have default settings that determine the amount of hard drive space
to allocate for virtual memory. That setting will work for most applications and
processes, but there may be times when it's necessary to manually reset the amount of
hard drive space allocated to virtual memory, such as with applications that depend on
fast response times or when the computer has multiple HDDs.
When manually resetting virtual memory, the minimum and maximum amount of
hard drive space to be used for virtual memory must be specified. Allocating too little
HDD space for virtual memory can result in a computer running out of RAM. If a system
continually needs more virtual memory space, it may be wise to consider adding RAM.
Common operating systems generally recommend that users not increase virtual
memory beyond 1.5 times the amount of RAM.
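As simple arithmetic, the sizing rule of thumb above works out like this; the 1.5x factor is the guideline quoted in the text (not a hard OS limit), and the 8 GB figure is just an example.

```python
# Guideline from the text: cap virtual memory at ~1.5x installed RAM.
def max_virtual_memory_gb(ram_gb, factor=1.5):
    return ram_gb * factor

print(max_virtual_memory_gb(8))   # 12.0 GB for an example 8 GB machine
```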

Managing virtual memory differs across operating systems, however, so IT
professionals should understand the basics of managing physical memory, virtual
memory and virtual addresses.

Benefits of using virtual memory

Benefits of virtual memory include:


● it can handle twice as many addresses as main memory;
● it frees applications from managing shared memory and saves users from
having to add memory modules when RAM space runs out;
● increased security because of memory isolation;
● multiple larger applications can be run simultaneously;
● allocating memory is relatively cheap;
● it avoids external fragmentation;
● effective CPU use;
● data can be moved automatically; and
● pages in the original process can be shared during a fork system call.
In addition, in a virtualized computing environment, administrators can use virtual
memory management techniques to allocate additional memory to a virtual machine
(VM) that has run out of resources. Such virtualization management tactics can improve
VM performance and management flexibility.

Limitations
● The use of virtual memory has its tradeoffs, particularly with speed. It's
generally better to have as much physical memory as possible, so programs
work directly from RAM or physical memory.
● The use of virtual memory slows a computer because data must be mapped
between virtual and physical memory, which requires extra hardware support
for address translations.
● The size of virtual storage is limited by the amount of secondary storage, as
well as the addressing scheme with the computer system.
● Thrashing can happen if the amount of RAM is too small, which will make the
computer perform slower.
● It may take time to switch between applications using virtual memory.
Virtual memory vs. physical memory
When talking about the differences between virtual and physical memory, the biggest
distinction is speed: RAM is considerably faster than virtual memory, but it also tends
to be more expensive. When a computer requires storage, RAM is used first; virtual
memory is used once RAM is full, because it's slower. Users can add RAM to a computer by buying
and installing more RAM chips if they are experiencing slowdowns due to memory
swaps happening too often. The amount of RAM depends on what's installed on a
computer. Virtual memory, on the other hand, is limited by the size of the computer's
hard drive. Virtual memory settings can often be controlled through the operating
system.

Memory Mapping Techniques

Cache Mapping:
There are three different types of mapping used for the purpose of cache memory which
are as follows: Direct mapping, Associative mapping, and Set-Associative mapping.
These are explained below.
1. Direct Mapping –
The simplest technique, known as direct mapping, maps each block of main memory
into only one possible cache line. If a line is already occupied by a memory block when
a new block needs to be loaded, the old block is evicted. The memory address is split
into two parts: an index field and a tag field. The cache stores the tag field, while the
index field selects the cache line. Direct mapping's performance is directly proportional
to the hit ratio.
i = j modulo m
where
i=cache line number
j= main memory block number
m=number of lines in the cache
For purposes of cache access, each main memory address can be viewed as consisting
of three fields. The least significant w bits identify a unique word or byte within a block of
main memory. In most contemporary machines, the address is at the byte level. The
remaining s bits specify one of the 2^s blocks of main memory. The cache logic
interprets these s bits as a tag of s-r bits (the most significant portion) and a line field of r
bits. This latter field identifies one of the m = 2^r lines of the cache.
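The tag/line/word split can be sketched with a few bit operations. The field widths here (r = 4 line bits, w = 2 word bits) are made-up values chosen only for illustration.

```python
# Decompose a main-memory address into tag / line / word fields for a
# direct-mapped cache, per the s, r, w description above.

def split_address(addr, r, w):
    word = addr & ((1 << w) - 1)          # least significant w bits: word in block
    line = (addr >> w) & ((1 << r) - 1)   # next r bits: cache line, i = j mod 2^r
    tag = addr >> (w + r)                 # remaining s - r bits: the tag
    return tag, line, word

tag, line, word = split_address(0b10110111_0110_11, r=4, w=2)
# tag = 0b10110111, line = 0b0110, word = 0b11
```

Note that the line field is exactly the block number modulo the number of lines, which is the i = j modulo m rule stated above.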

2. Associative Mapping –
In this type of mapping, the associative memory is used to store content and addresses
of the memory word. Any block can go into any line of the cache. This means that the
word id bits are used to identify which word in the block is needed, but the tag becomes
all of the remaining bits. This enables the placement of any word at any place in the
cache memory. It is considered to be the fastest and the most flexible mapping form.

3. Set-associative Mapping –
This form of mapping is an enhanced form of direct mapping where the drawbacks of
direct mapping are removed. Set associative addresses the problem of possible
thrashing in the direct mapping method. It does this by saying that instead of having
exactly one line that a block can map to in the cache, we will group a few lines together
creating a set. A block in memory can then map to any one of the lines of a specific
set. Set-associative mapping allows each index address in the cache to correspond to
two or more words in main memory. Set-associative cache mapping combines the best
of direct and associative cache mapping techniques. In this case, the cache consists of
a number of sets, each of which consists of a number of lines. The relationships are
m=v*k
i= j mod v
where
i=cache set number
j=main memory block number
v=number of sets
m=number of lines in the cache

k=number of lines in each set
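The relationships above can be checked with a tiny sketch; the cache dimensions (v = 4 sets of k = 2 lines each) are made-up illustration values.

```python
# Set-associative relationships: m = v * k and i = j mod v.
v, k = 4, 2          # 4 sets of 2 lines each (illustrative sizes)
m = v * k            # total cache lines

def cache_set(block_number):
    return block_number % v   # i = j mod v

# Blocks 3, 7 and 11 all contend for set 3, but any of that set's
# k = 2 lines may hold each of them.
print([cache_set(j) for j in (3, 7, 11)])
```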

Module 4
Syllabus
Parallel Computer Structures:
Introduction to parallel processing, Pipeline computers, Multi processing systems,
Architectural classification scheme-SISD, SIMD, MISD, MIMD.

Introduction to parallel processing


Parallel processing can be described as a class of techniques which enables the
system to achieve simultaneous data-processing tasks to increase the computational
speed of a computer system. A parallel processing system can carry out simultaneous
data-processing to achieve faster execution time. For instance, while an instruction is
being processed in the ALU component of the CPU, the next instruction can be read
from memory. The primary purpose of parallel processing is to enhance the computer
processing capability and increase its throughput, i.e. the amount of processing that can
be accomplished during a given interval of time. A parallel processing system can be
achieved by having a multiplicity of functional units that perform identical or different
operations simultaneously. The data can be distributed among various multiple
functional units.
The following diagram shows one possible way of separating the execution unit into
eight functional units operating in parallel. The operation performed in each functional
unit is indicated in each block of the diagram:

● The adder and integer multiplier perform arithmetic operations on integer
numbers.
● The floating-point operations are separated into three circuits operating in
parallel.
● The logic, shift, and increment operations can be performed concurrently on
different data. All units are independent of each other, so one number can be
shifted while another number is being incremented.

Pipeline computers
The term Pipelining refers to a technique of decomposing a sequential process into
sub-operations, with each sub-operation being executed in a dedicated segment that
operates concurrently with all other segments.
The most important characteristic of a pipeline technique is that several computations
can be in progress in distinct segments at the same time. The overlapping of
computation is made possible by associating a register with each segment in the
pipeline. The registers provide isolation between each segment so that each can
operate on distinct data simultaneously.

The structure of a pipeline organization can be represented simply by including an input
register for each segment followed by a combinational circuit.

Let us consider an example of a combined multiplication and addition operation to get a
better understanding of the pipeline organization.

The combined multiplication and addition operation is done with a stream of numbers
such as:
Ai * Bi + Ci for i = 1, 2, 3, ..., 7

The operation to be performed on the numbers is decomposed into sub-operations, with
each sub-operation implemented in a segment within the pipeline.

The sub-operations performed in each segment of the pipeline are defined as:
R1 ← Ai, R2 ← Bi          Input Ai and Bi
R3 ← R1 * R2, R4 ← Ci     Multiply, and input Ci
R5 ← R3 + R4              Add Ci to product

The following block diagram represents the combined as well as the sub-operations
performed in each segment of the pipeline.

Registers R1, R2, R3, and R4 hold the data and the combinational circuits operate in a
particular segment.
The output generated by the combinational circuit in a given segment is applied as an
input register of the next segment. For instance, from the block diagram, we can see
that the register R3 is used as one of the input registers for the combinational adder
circuit.
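The register-transfer steps above can be simulated cycle by cycle. A minimal sketch follows, with arbitrary example data; each loop iteration is one clock pulse, and all registers are committed together at the clock edge, so each segment works on a different item at the same time.

```python
# Cycle-by-cycle simulation of the three-segment pipeline:
#   R1,R2 <- Ai,Bi ; R3 <- R1*R2, R4 <- Ci ; R5 <- R3+R4
# Data values are arbitrary illustrations.

A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [10, 10, 10, 10, 10, 10, 10]

R1 = R2 = R3 = R4 = R5 = None
results = []

for clock in range(len(A) + 2):               # 2 extra cycles drain the pipe
    # Compute next register values from the CURRENT ones, then commit
    # them all at once, modelling simultaneous clocked registers.
    nR5 = R3 + R4 if R3 is not None else None          # segment 3
    nR3 = R1 * R2 if R1 is not None else None          # segment 2
    nR4 = C[clock - 1] if 1 <= clock <= len(A) else None
    if clock < len(A):                                 # segment 1
        nR1, nR2 = A[clock], B[clock]
    else:
        nR1 = nR2 = None
    R1, R2, R3, R4, R5 = nR1, nR2, nR3, nR4, nR5
    if R5 is not None:
        results.append(R5)
```

After the pipeline fills (two cycles), one result emerges every clock cycle, and `results` equals Ai*Bi + Ci for every i.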
In general, the pipeline organization is applicable for two areas of computer design
which includes:

1. Arithmetic Pipeline
2. Instruction Pipeline

Arithmetic Pipeline

Arithmetic Pipelines are mostly used in high-speed computers. They are used to
implement floating-point operations, multiplication of fixed-point numbers, and similar
computations encountered in scientific problems.

To understand the concepts of arithmetic pipeline in a more convenient way, let us


consider an example of a pipeline unit for floating-point addition and subtraction.

The inputs to the floating-point adder pipeline are two normalized floating-point binary
numbers defined as:
X = A * 2^a
Y = B * 2^b

where A and B are two fractions that represent the mantissas and a and b are the
exponents. For illustration, a decimal example is used:
X = 0.9504 * 10^3
Y = 0.8200 * 10^2
The combined operation of floating-point addition and subtraction is divided into four
segments. Each segment contains the corresponding sub-operation to be performed in
the given pipeline. The sub-operations that are shown in the four segments are:

1. Compare the exponents by subtraction.


2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.

We will discuss each sub-operation in a more detailed manner later in this section.
The following block diagram represents the sub-operations performed in each segment
of the pipeline.

Note: Registers are placed after each sub-operation to store the intermediate results.

1. Compare exponents by subtraction:


The exponents are compared by subtracting them to determine their difference. The
larger exponent is chosen as the exponent of the result.
The difference of the exponents, i.e., 3 - 2 = 1, determines how many times the mantissa
associated with the smaller exponent must be shifted to the right.

2. Align the mantissas:

The mantissa associated with the smaller exponent is shifted according to the difference
of exponents determined in segment one.

X = 0.9504 * 10^3
Y = 0.08200 * 10^3

3. Add mantissas:

The two mantissas are added in segment three.

Z = X + Y = 1.0324 * 10^3

4. Normalize the result:

After normalization, the mantissa is again a fraction less than 1 and the result is written as:

Z = 0.10324 * 10^4
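The four segments can be sketched as plain steps on (mantissa, exponent) pairs, using decimal arithmetic to match the worked example. This is a simplification: underflow normalization (mantissas with leading zeros) and rounding details are omitted.

```python
# The four sub-operations of the floating-point addition pipeline,
# applied sequentially here for clarity (in hardware each step is a
# separate pipeline segment working on a different operand pair).

def fp_add(x, y):
    (ma, ea), (mb, eb) = x, y
    # Segment 1: compare exponents by subtraction; keep the larger.
    if ea < eb:
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    # Segment 2: align the mantissa of the smaller exponent.
    mb = mb / 10 ** (ea - eb)
    # Segment 3: add the mantissas.
    m, e = ma + mb, ea
    # Segment 4: normalize so the mantissa is a fraction < 1.
    while abs(m) >= 1:
        m, e = m / 10, e + 1
    return round(m, 6), e

result = fp_add((0.9504, 3), (0.8200, 2))   # -> (0.10324, 4)
```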

Instruction Pipeline

Pipeline processing can occur not only in the data stream but in the instruction stream
as well. Most of the digital computers with complex instructions require instruction
pipeline to carry out operations like fetch, decode and execute instructions.
In general, the computer needs to process each instruction with the following sequence
of steps.

1. Fetch instruction from memory.


2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.

Each step is executed in a particular segment, and there are times when different
segments may take different times to operate on the incoming information. Moreover,
there are times when two or more segments may require memory access at the same
time, causing one segment to wait until another is finished with the memory.
The organization of an instruction pipeline will be more efficient if the instruction cycle is
divided into segments of equal duration. One of the most common examples of this type
of organization is a four-segment instruction pipeline.

A four-segment instruction pipeline combines two or more of the steps above into a
single segment. For instance, the decoding of the instruction can be combined with the
calculation of the effective address into one segment.
The following block diagram shows a typical example of a four-segment instruction
pipeline. The instruction cycle is completed in four segments.

Segment 1:

The instruction fetch segment can be implemented using a first-in, first-out (FIFO) buffer.

Segment 2:
The instruction fetched from memory is decoded in the second segment, and
eventually, the effective address is calculated in a separate arithmetic circuit.

Segment 3:
An operand from memory is fetched in the third segment.

Segment 4:
The instructions are finally executed in the last segment of the pipeline organization.

Multi processing systems


Most computer systems are single processor systems, i.e., they have only one processor.
However, multiprocessor or parallel systems are increasing in importance nowadays.
These systems have multiple processors working in parallel that share the computer
clock, memory, bus, peripheral devices etc. An image demonstrating the multiprocessor
architecture is −

A multiprocessor is a computer system with two or more central processing units
(CPUs) that share full access to a common RAM. The main objective of using a
multiprocessor is to boost the system's execution speed; other objectives are
fault tolerance and application matching.
There are two types of multiprocessors, one is called shared memory multiprocessor
and another is distributed memory multiprocessor. In shared memory multiprocessors,

all the CPUs share the common memory, but in a distributed memory multiprocessor,
every CPU has its own private memory.

Applications of Multiprocessor –
1. As a uniprocessor, such as single instruction, single data stream (SISD).
2. As a multiprocessor, such as single instruction, multiple data stream
(SIMD), which is usually used for vector processing.
3. Multiple series of instructions on a single data stream, such as multiple
instruction, single data stream (MISD), which is used to describe
hyper-threaded or pipelined processors.
4. Multiple, independent series of instructions on multiple data streams inside
a single system, such as multiple instruction, multiple data stream
(MIMD).

Types of Multiprocessors
There are mainly two types of multiprocessors i.e. symmetric and asymmetric
multiprocessors. Details about them are as follows −

Symmetric Multiprocessors
In these types of systems, each processor contains a similar copy of the operating
system and they all communicate with each other. All the processors are in a
peer-to-peer relationship, i.e., no master-slave relationship exists between them.

An example of the symmetric multiprocessing system is the Encore version of Unix for
the Multimax Computer.

Asymmetric Multiprocessors
In asymmetric systems, each processor is given a predefined task. There is a master
processor that gives instruction to all the other processors. Asymmetric multiprocessor
system contains a master slave relationship.
Asymmetric multiprocessors were the only type of multiprocessor available before
symmetric multiprocessors were created. Even now, they are the cheaper option.
Advantages of Multiprocessor Systems
There are multiple advantages to multiprocessor systems. Some of these are −

More reliable Systems


In a multiprocessor system, even if one processor fails, the system will not halt. This
ability to continue working despite hardware failure is known as graceful degradation.
For example, if there are 5 processors in a multiprocessor system and one of them fails,
the remaining 4 processors keep working. So the system only becomes slower and does
not grind to a halt.
Enhanced Throughput

If multiple processors work in tandem, the throughput of the system increases, i.e., the
number of processes executed per unit time increases. If there are N processors, the
throughput increases by a factor just under N, because of coordination overhead.
More Economic Systems
Multiprocessor systems are cheaper than single processor systems in the long run
because they share the data storage, peripheral devices, power supplies etc. If there
are multiple processes that share data, it is better to schedule them on multiprocessor
systems with shared data than have different computer systems with multiple copies of
the data.
Disadvantages of Multiprocessor Systems
There are some disadvantages as well to multiprocessor systems. Some of these are:

Increased Expense
Even though multiprocessor systems are cheaper in the long run than using multiple
computer systems, still they are quite expensive. It is much cheaper to buy a simple
single processor system than a multiprocessor system.

Complicated Operating System Required


There are multiple processors in a multiprocessor system that share peripherals,
memory etc. So, it is much more complicated to schedule processes and allocate
resources to processes than in single processor systems. Hence, a more complex and
complicated operating system is required in multiprocessor systems.

Large Main Memory Required


All the processors in the multiprocessor system share the memory. So a much larger
pool of memory is required as compared to single processor systems.
Multicomputer:
A multicomputer system is a computer system with multiple processors that are
connected together to solve a problem. Each processor has its own memory, accessible
only to that particular processor, and the processors can communicate with each other
via an interconnection network.

Because a multicomputer is capable of message passing between the processors, it is
possible to divide a task between the processors to complete it. Hence, a
multicomputer can be used for distributed computing. It is cost effective and easier to
build a multicomputer than a multiprocessor.
Difference between multiprocessor and Multicomputer:
1. Multiprocessor is a system with two or more central processing units (CPUs)
that is capable of performing multiple tasks whereas a multicomputer is a
system with multiple processors that are attached via an interconnection
network to perform a computation task.

2. A multiprocessor system is a single computer that operates with multiple


CPUs whereas a multicomputer system is a cluster of computers that
operate as a singular computer.
3. Construction of a multicomputer is easier and cost effective than a
multiprocessor.
4. In a multiprocessor system, programming tends to be easier, whereas in a
multicomputer system, programming tends to be more difficult.
5. Multiprocessor supports parallel computing, Multicomputer supports
distributed computing.

Architectural classification scheme-SISD, SIMD, MISD, MIMD

Parallel computing is a computing where the jobs are broken into discrete parts that
can be executed concurrently. Each part is further broken down to a series of
instructions. Instructions from each part execute simultaneously on different CPUs.
Parallel systems deal with the simultaneous use of multiple computer resources that
can include a single computer with multiple processors, a number of computers
connected by a network to form a parallel processing cluster or a combination of both.
Parallel systems are more difficult to program than computers with a single processor
because the architecture of parallel computers varies accordingly and the processes of
multiple CPUs must be coordinated and synchronized.
The crux of parallel processing is the CPU. Based on the number of instruction and
data streams that can be processed simultaneously, computing systems are classified
into four major categories:

Flynn’s classification/taxonomy –
1. Single-instruction, single-data (SISD) systems –
An SISD computing system is a uniprocessor machine which is capable of
executing a single instruction, operating on a single data stream. In SISD,
machine instructions are processed in a sequential manner and computers
adopting this model are popularly called sequential computers. Most
conventional computers have SISD architecture. All the instructions and
data to be processed have to be stored in primary memory.

The speed of the processing element in the SISD model is limited by the rate at which
the computer can transfer information internally. Dominant representative SISD
systems are IBM PCs and workstations.
2. Single-instruction, multiple-data (SIMD) systems –
An SIMD system is a multiprocessor machine capable of executing the
same instruction on all the CPUs but operating on different data streams.
Machines based on an SIMD model are well suited to scientific computing
since they involve lots of vector and matrix operations. So that the
information can be passed to all the processing elements (PEs), the organized
data elements of the vectors can be divided into multiple sets (N sets for N-PE
systems), and each PE can process one data set.
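The data-distribution idea can be sketched as follows; the vectors, their length and the choice of N_PE = 4 are arbitrary illustration values, with ordinary list slices standing in for the per-PE data sets.

```python
# SIMD sketch: one instruction (elementwise add) carried out by N PEs,
# each on its own slice of the operand vectors.

N_PE = 4
a = list(range(8))        # [0, 1, ..., 7]
b = [10] * 8

def pe_slices(vec, n):
    """Divide a vector into n equal sets, one per processing element."""
    step = len(vec) // n
    return [vec[i * step:(i + 1) * step] for i in range(n)]

# Each "PE" applies the same instruction to its own data set.
partial = [[x + y for x, y in zip(sa, sb)]
           for sa, sb in zip(pe_slices(a, N_PE), pe_slices(b, N_PE))]
result = [v for chunk in partial for v in chunk]
# result == [10, 11, 12, 13, 14, 15, 16, 17]
```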

A dominant representative SIMD system is Cray's vector processing machine.
3. Multiple-instruction, single-data (MISD) systems –
An MISD computing system is a multiprocessor machine capable of
executing different instructions on different PEs, with all of them operating on
the same data set.

Example Z = sin(x)+cos(x)+tan(x)
The system performs different operations on the same data set. Machines
built using the MISD model are not useful in most applications; a few
machines have been built, but none of them are available commercially.
4. Multiple-instruction, multiple-data (MIMD) systems –
An MIMD system is a multiprocessor machine capable of executing
multiple instructions on multiple data sets. Each PE in the MIMD model has
separate instruction and data streams; therefore, machines built using this
model are suited to any kind of application. Unlike SIMD and MISD
machines, PEs in MIMD machines work asynchronously.

MIMD machines are broadly categorized into shared-memory MIMD and
distributed-memory MIMD based on the way PEs are coupled to the main
memory.
In the shared memory MIMD model (tightly coupled multiprocessor
systems), all the PEs are connected to a single global memory and they all
have access to it. The communication between PEs in this model takes
place through the shared memory, modification of the data stored in the
global memory by one PE is visible to all other PEs. Dominant
representative shared memory MIMD systems are Silicon Graphics
machines and Sun/IBM’s SMP (Symmetric Multi-Processing).
In distributed memory MIMD machines (loosely coupled multiprocessor
systems), all PEs have a local memory. The communication between PEs in
this model takes place through the interconnection network (the
interprocess communication channel, or IPC). The network connecting PEs
can be configured as a tree, mesh or other topology in accordance with the requirement.
The shared-memory MIMD architecture is easier to program but is less
tolerant to failures and harder to extend with respect to the distributed
memory MIMD model. Failures in a shared-memory MIMD affect the entire
system, whereas this is not the case of the distributed model, in which each
of the PEs can be easily isolated. Moreover, shared memory MIMD
architectures are less likely to scale because the addition of more PEs leads
to memory contention. This is a situation that does not happen in the case
of distributed memory, in which each PE has its own memory. As a result of
practical outcomes and user requirements, the distributed memory MIMD
architecture is considered superior to the other existing models.

Module 5
Syllabus
Pipelining and Vector processing
Introduction to pipelining, Instruction and Arithmetic pipelines (design) Vector
processing, Array Processors.

Introduction to pipelining
Pipelining organizes the execution of multiple instructions simultaneously.
Pipelining improves the throughput of the system. In pipelining, the instruction is
divided into subtasks, and each subtask performs a dedicated task.
The instruction is divided into 5 subtasks: instruction fetch, instruction
decode, operand fetch, instruction execution and operand store. The instruction
fetch subtask performs only the instruction fetching operation, the instruction decode
subtask only decodes the fetched instruction, and so on for the other subtasks.
In this section, we will discuss the types of pipelining, pipelining hazards, and the
advantages of pipelining. So let us start.
Introduction

Have you ever visited an industrial plant and seen the assembly lines there? A
product passes through the assembly line and, while passing, is worked on at different
stages simultaneously. For example, take a car manufacturing plant. At the first stage,
the automobile chassis is prepared, in the next stage workers add a body to the
chassis, further, the engine is installed, then painting work is done and so on. The group
of workers after working on the chassis of the first car don’t sit idle. They start working
on the chassis of the next car. And the next group take the chassis of the car and add
body to it. The same thing is repeated at every stage, after finishing the work on the
current car body they take on the next car body which is the output of the previous
stage.

Here, though the first car is completed in several hours or days, due to the assembly
line arrangement it becomes possible to have a new car at the end of an assembly line
in every clock cycle. The concept of pipelining works similarly: the output of one
pipeline segment becomes the input of the next. It is like a set of data processing units
connected in series to utilize the processor up to its maximum.

An instruction in a process is divided into 5 subtasks as follows:

1. In the first subtask, the instruction is fetched.


2. The fetched instruction is decoded in the second stage.
3. In the third stage, the operands of the instruction are fetched.
4. In the fourth stage, arithmetic and logical operations are performed on the operands to
execute the instruction.
5. In the fifth stage, the result is stored in memory.

Having understood the division of the instruction into subtasks, let us see how
n instructions in a process are pipelined.

Look at the figure below, where 5 instructions are pipelined. The first instruction
completes in 5 clock cycles; after that, a new instruction completes its execution in
every clock cycle.

Observe that when the instruction fetch of the first instruction is completed, the
instruction fetch of the second instruction starts in the next clock cycle. This way the
hardware never sits idle; it is always busy performing some operation. But no
two instructions can execute the same stage in the same clock cycle.
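The overlap the figure describes can be generated as a small space-time chart. The stage names FI, DE, OF, EX and SO are shorthand for the five subtasks listed earlier; the choice of 5 instructions matches the figure.

```python
# Generate the space-time chart of an ideal pipeline: instruction i
# occupies stage s during clock cycle i + s, so the stages march
# diagonally and no two instructions share a stage in the same cycle.

STAGES = ["FI", "DE", "OF", "EX", "SO"]
N_INSTR = 5

def chart(n_instr, stages):
    rows = []
    total_cycles = n_instr + len(stages) - 1   # 5 + 5 - 1 = 9 cycles
    for i in range(n_instr):
        row = ["  "] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows

for i, row in enumerate(chart(N_INSTR, STAGES), start=1):
    print(f"I{i}: " + " ".join(row))
```

The first instruction finishes at cycle 5; each later instruction finishes one cycle after the previous one, so 5 instructions need 9 cycles rather than 25.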

Types of Pipelining

In 1977, Handler and Ramamoorthy classified pipeline processors depending on their
functionality.

1. Arithmetic Pipelining

It is designed to perform high-speed floating-point addition, multiplication and division.
Here, multiple arithmetic logic units are built into the system to perform parallel
arithmetic computation on various data formats. Examples of arithmetic pipelined
processors are the Star-100, TI-ASC, Cray-1 and Cyber-205.

2. Instruction Pipelining

Here, a number of instructions are pipelined, and the execution of the current instruction
is overlapped with the execution of subsequent instructions. It is also called instruction
lookahead.

3. Processor Pipelining

Here, the processors are pipelined to process the same data stream. The data stream
is processed by the first processor and the result is stored in a memory block. The
result in the memory block is accessed by the second processor, which reprocesses it
and then passes the refined result to the third processor, and so on.

4. Unifunction Vs. Multifunction Pipelining

A pipeline that performs the same precise function every time is a unifunctional pipeline. On the other hand, a pipeline that performs multiple functions, either at different times or at the same time, is a multifunctional pipeline.

5. Static vs Dynamic Pipelining

A static pipeline performs a fixed function each time; it is unifunctional. The static pipeline executes the same type of instruction continuously, so frequent changes in the type of instruction may degrade its performance.

A dynamic pipeline performs several functions simultaneously; it is a multifunctional pipeline.

6. Scalar vs Vector Pipelining

A scalar pipeline processes instructions with scalar operands, while a vector pipeline processes instructions with vector operands.


Pipelining Hazards

Whenever a pipeline has to stall for some reason, the cause is called a pipeline hazard. Below we discuss four pipelining hazards.

1. Data Dependency

Consider the following two instructions and their pipeline execution:

In the figure above, you can see that the result of the Add instruction is stored in register R2, and we know that the final result is stored at the end of the execution of the instruction, which happens at clock cycle t4.

But the Sub instruction needs the value of register R2 at cycle t3. So the Sub instruction has to stall for two clock cycles; if it does not stall, it will generate an incorrect result. This dependence of one instruction on another instruction for data is data dependency.
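A minimal sketch of detecting such a dependency, using a hypothetical (name, destination register, source registers) instruction format (not any real ISA encoding):

```python
# A toy check for a read-after-write (RAW) hazard between two instructions.
# Instruction format (hypothetical): (name, dest_register, source_registers).
def raw_hazard(first, second):
    """True if `second` reads a register that `first` writes."""
    _, dest, _ = first
    _, _, sources = second
    return dest in sources

add = ("Add", "R2", ["R0", "R1"])   # R2 <- R0 + R1
sub = ("Sub", "R4", ["R2", "R3"])   # R4 <- R2 - R3, needs R2 from Add

print(raw_hazard(add, sub))  # True: Sub must stall until Add writes R2
```

A real pipeline would use such a check (in hardware interlock logic) to decide how many stall cycles to insert before Sub's operand-fetch stage.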

2. Memory Delay

When an instruction or data is required, it is first searched for in the cache memory; if it is not found there, it is a cache miss. The data is then searched for in main memory, which may take ten or more cycles. The pipeline has to stall for that many cycles, and this is a memory delay hazard. A cache miss also delays all the subsequent instructions.

3. Branch Delay

Suppose four instructions I1, I2, I3, I4 are pipelined in sequence. Instruction I1 is a branch instruction and its target instruction is Ik. Processing starts: instruction I1 is fetched and decoded, and the target address is computed at the 4th stage, in cycle t3.

But by then the instructions I2, I3, I4 have already been fetched in cycles 1, 2 and 3, before the target branch address was computed. Since I1 is found to be a branch instruction, the instructions I2, I3, I4 have to be discarded, because instruction Ik has to be processed next after I1. This delay of three cycles is a branch delay.

Prefetching the target branch address reduces the branch delay. For example, if the target branch is identified at the decode stage, the branch delay reduces to 1 clock cycle.
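Assuming one instruction is fetched per cycle, the number of wrongly fetched instructions that must be discarded depends only on the stage at which the branch is resolved; a small sketch:

```python
def branch_penalty(resolve_stage: int) -> int:
    """Cycles wasted on wrongly fetched instructions when a branch is
    resolved at `resolve_stage` (1 = fetch, 2 = decode, ...), assuming
    one instruction is fetched per cycle."""
    return resolve_stage - 1

print(branch_penalty(4))  # target known at stage 4 -> 3 discarded instructions
print(branch_penalty(2))  # target known at decode  -> penalty of 1 cycle
```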


4. Resource Limitation

If two instructions request access to the same resource in the same clock cycle, one of the instructions has to stall and let the other use the resource. This stalling is due to resource limitation. It can be prevented by adding more hardware.

Advantages

1. Pipelining improves the throughput of the system.
2. In every clock cycle, a new instruction finishes its execution.
3. It allows multiple instructions to be executed concurrently.
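The throughput advantage can be quantified with the usual ideal-pipeline speedup formula, n*k / (k + n - 1), where k is the number of stages and n the number of instructions (this assumes equal-length stages and no hazards); a small sketch:

```python
def speedup(n: int, k: int) -> float:
    """Ideal speedup of a k-stage pipeline over non-pipelined execution
    for n instructions: (n * k) / (k + n - 1)."""
    return (n * k) / (k + n - 1)

print(round(speedup(100, 5), 2))  # -> 4.81, approaching k = 5 for large n
```

As n grows, the speedup approaches k, which is why long instruction streams benefit most from pipelining.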

Key Takeaways

● Pipelining divides an instruction into 5 stages: instruction fetch, instruction decode, operand fetch, instruction execution and operand store.
● The pipeline allows the execution of multiple instructions concurrently, with the limitation that no two instructions are executed at the same stage in the same clock cycle.
● All the stages must process at equal speed, else the slowest stage becomes the bottleneck.
● Whenever a pipeline has to stall for any reason, it is a pipeline hazard.

This is all about pipelining. In short, pipelining improves the performance of the system by improving its throughput.

Instruction and Arithmetic pipelines (design)


1. Arithmetic Pipeline:
An arithmetic pipeline divides an arithmetic problem into various subproblems for execution in various pipeline segments. It is used for floating-point operations, multiplication and various other computations. The flowchart of the arithmetic pipeline for floating-point addition is shown in the diagram.


Floating point addition using arithmetic pipeline :


The following sub-operations are performed in this case:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalise the result.

First of all, the two exponents are compared and the larger of the two is chosen as the result exponent. The difference between the exponents decides how many positions the mantissa of the number with the smaller exponent must be shifted to the right. After this shift, both mantissas are aligned. Finally, the two mantissas are added (or subtracted), followed by normalisation of the result in the last segment.
Example:
Let us consider two numbers,
X=0.3214*10^3 and Y=0.4500*10^2
Explanation:
First, the two exponents are subtracted to give 3-2=1. Thus 3 becomes the exponent of the result, and the mantissa of the number with the smaller exponent is shifted 1 place to the right to give
Y=0.0450*10^3
Finally the two numbers are added to produce
Z=0.3664*10^3
As the result is already normalised, it remains the same.
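The four sub-operations can be sketched in Python, one comment per pipeline segment. This uses a hypothetical (mantissa, exponent) pair meaning mantissa * 10^exponent, not any particular hardware format:

```python
def fp_add(x, y):
    """Floating-point addition, one step per pipeline segment.
    x and y are (mantissa, exponent) pairs meaning mantissa * 10**exponent."""
    (mx, ex), (my, ey) = x, y

    # Segment 1: compare exponents; the larger one becomes the result exponent.
    shift = abs(ex - ey)
    exp = max(ex, ey)

    # Segment 2: align mantissas by shifting the smaller-exponent mantissa right.
    if ex < ey:
        mx /= 10 ** shift
    else:
        my /= 10 ** shift

    # Segment 3: add the aligned mantissas.
    m = mx + my

    # Segment 4: normalise so the mantissa lies in [0.1, 1).
    while abs(m) >= 1:
        m /= 10
        exp += 1
    while m != 0 and abs(m) < 0.1:
        m *= 10
        exp -= 1
    return round(m, 4), exp

print(fp_add((0.3214, 3), (0.4500, 2)))  # -> (0.3664, 3)
```

In the real pipeline each of these four segments handles a different pair of numbers in the same clock cycle; here they run one after another only because Python is sequential.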
2. Instruction Pipeline:
Here a stream of instructions is executed by overlapping the fetch, decode and execute phases of the instruction cycle. This technique is used to increase the throughput of the computer system. An instruction pipeline reads instructions from memory while previous instructions are being executed in other segments of the pipeline; thus multiple instructions can be executed simultaneously. The pipeline is more efficient if the instruction cycle is divided into segments of equal duration.
In the most general case, the computer needs to process each instruction in the following sequence of steps:
1. Fetch the instruction from memory (FI)
2. Decode the instruction (DA)
3. Calculate the effective address
4. Fetch the operands from memory (FO)
5. Execute the instruction (EX)
6. Store the result in the proper place
The flowchart for instruction pipeline is shown below.

Downloaded by Dinkan Du ([email protected])


lOMoARcPSD|26759719

Let us see an example of instruction pipeline.


Example:


Here the instruction is fetched in the first clock cycle in segment 1. It is decoded in the next clock cycle, then its operands are fetched, and finally the instruction is executed. We can see that the fetch and decode phases overlap due to pipelining: by the time the first instruction is being decoded, the next instruction is fetched by the pipeline.

The third instruction is a branch instruction. While it is being decoded, the 4th instruction is fetched simultaneously. But since it is a branch instruction, it may point to some other instruction once it is decoded. Thus the fourth instruction is kept on hold until the branch instruction is executed; once it has executed, the fourth instruction is fetched again and the other phases continue as usual.

Vector processing
Vector processing performs arithmetic operations on large arrays of integers or floating-point numbers. It operates on all the elements of the array in parallel, provided each operation is independent of the others. Vector processing avoids the overhead of the loop-control mechanism that occurs in general-purpose computers.

Introduction

We need computers that can solve mathematical problems for us, which includes performing arithmetic operations on large arrays of integers or floating-point numbers quickly. A general-purpose computer would use loops to operate on such an array, but for a large array the loop causes overhead for the processor.

To avoid the overhead of processing loops and to speed up the computation, some kind of parallelism must be introduced. Vector processing operates on the entire array in just one operation, i.e. it operates on the elements of the array in parallel. But vector processing is possible only if the operations performed in parallel are independent.


Look at the figure below and compare vector processing with general computer processing; you will notice the difference. In both blocks, the instructions add two arrays and store the result in a third array. Vector processing adds the two arrays in parallel, avoiding the use of a loop.
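Python can only emulate this contrast (real vector hardware adds all the elements in parallel in one instruction), but the two styles of the figure look like this:

```python
# General-purpose (loop) style: one element per iteration, with loop-control
# overhead on every pass.
a = [10, 20, 30, 40]
b = [1, 2, 3, 4]

c_loop = []
for i in range(len(a)):
    c_loop.append(a[i] + b[i])

# Vector style (conceptually): one "instruction" over whole operands.
# zip only emulates this; vector hardware performs the adds in parallel.
c_vector = [x + y for x, y in zip(a, b)]

print(c_loop)    # [11, 22, 33, 44]
print(c_vector)  # [11, 22, 33, 44]
```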

Operating on multiple data items in just one instruction is called Single Instruction Multiple Data (SIMD); such instructions are also termed vector instructions. The data for vector instructions are stored in vector registers. Each vector register is capable of storing several data elements at a time; these elements together are termed a vector operand. If there are n elements in a vector operand, then n is the length of the vector.

Supercomputers evolved to deal with billions of floating-point operations per second. A supercomputer optimises numerical (vector) computations, but along with vector processing it is also capable of scalar processing. Later, the array processor was introduced, which deals particularly with vector processing and does not indulge in scalar processing.

Characteristics of Vector Processing

Each element of a vector operand is a scalar quantity, which can be an integer, a floating-point number, a logical value or a character. Below we classify the vector instructions into four types.


Here, V represents the vector operands and S the scalar operands. In the figure below, O1 and O2 are unary operations and O3 and O4 are binary operations.

Most vector instructions are pipelined, as a vector instruction performs the same operation on different data sets repeatedly. Since pipelining has a start-up delay, longer vectors perform better here. Pipelined vector processors can be classified into two types based on where the operands are fetched from for vector processing: the two architectural classifications are memory-to-memory and register-to-register.

In a memory-to-memory vector processor, the operands for the instruction, the intermediate results and the final result are all retrieved from main memory. TI-ASC, CDC STAR-100 and Cyber-205 use the memory-to-memory format for vector instructions.

In a register-to-register vector processor, the source operands for the instruction, the intermediate results and the final result are all retrieved from vector or scalar registers. Cray-1 and Fujitsu VP-200 use the register-to-register format for vector instructions.

Vector Instruction

A vector instruction has the following fields:

1. Operation Code

Operation code indicates the operation that has to be performed in the given instruction.
It decides the functional unit for the specified operation or reconfigures the multifunction
unit.


2. Base Address

The base address field refers to the memory location from where the operands are to be fetched, or where the result has to be stored. A base address is found in memory-reference instructions. In a vector instruction, the operand and the result are both stored in vector registers; here the base address refers to the designated vector register.

3. Address Increment

A vector operand has several data elements, and the address increment specifies the address of the next element in the operand. Some computers store the data elements consecutively in main memory, for which the increment is always 1. Computers that do not store the data elements consecutively require a variable address increment.

4. Address Offset

The address offset is always specified relative to the base address. The effective memory address is calculated using the address offset.

5. Vector Length

Vector length specifies the ​number of elements in a vector operand​. It identifies the
termination​ of a vector instruction.
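A small sketch (hypothetical word-addressed memory, made-up numbers) of how these fields together locate every element of a vector operand:

```python
def element_addresses(base, offset, increment, length):
    """Addresses of each element of a vector operand, assuming the effective
    start address is base + offset and elements are `increment` words apart."""
    start = base + offset
    return [start + i * increment for i in range(length)]

# Consecutively stored elements: increment = 1.
print(element_addresses(base=1000, offset=8, increment=1, length=4))
# -> [1008, 1009, 1010, 1011]

# Non-consecutive (strided) storage: increment = 2.
print(element_addresses(base=0, offset=0, increment=2, length=3))
# -> [0, 2, 4]
```

The vector length field tells the hardware when to stop generating addresses, which is how it identifies the termination of the instruction.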

Improving Performance

In vector processing we come across two overheads: setup time and flushing time. When vector processing is pipelined, the time required to route the vector operands to the functional unit is called setup time. Flushing time is the time a vector instruction takes from its decoding until its first result comes out of the pipeline.

The vector length also affects the efficiency of processing, as a longer vector causes the overhead of subdividing it for processing.

For better performance, optimised object code must be produced in order to utilise the pipeline resources to the maximum.

1. Improving the vector instruction

We can improve the vector instructions by reducing memory accesses and maximising resource utilisation.

2. Integrate the scalar instructions

Scalar instructions of the same type should be integrated as a batch, as this reduces the overhead of reconfiguring the pipeline again and again.

3. Algorithm

Choose the algorithm that would work faster for vector pipelined processing.

4. Vectorizing Compiler

A vectorising compiler must regenerate the parallelism lost in using a higher-level programming language. In advanced programming, four stages are identified in the development of parallelism:

Parallel algorithm (A)

High-level language (L)

Efficient object code (O)

Target machine code (M)

The parameter in parentheses at each stage denotes the degree of parallelism. In the ideal situation, the parameters are expected in the order A ≥ L ≥ O ≥ M.

Key Takeaways

● Computers having vector instructions are vector processors.
● Vector processors have vector instructions which operate on large arrays of integers, floating-point numbers, logical values or characters, all elements in parallel; this is called vectorization.
● Vectorization is possible only if the operations performed in parallel are independent of each other.
● Operands of a vector instruction are stored in vector registers. A vector register stores several data elements at a time, which together form a vector operand.
● A vector operand has several scalar data elements.
● A vector instruction performs the same operation on different data sets; hence vector processors have a pipelined structure.
● Vector processing avoids the overhead caused by loops while operating on an array.

So, this is how vector processing allows parallel operations on large arrays and speeds up processing.


Array Processors
An array processor performs computations on a large array of data. There are two types of array processors: the attached array processor and the SIMD array processor. They are explained below.
1. Attached Array Processor:
To improve the performance of the host computer in numerical computational tasks, an auxiliary processor is attached to it.

An attached array processor has two interfaces:

1. An input/output interface to a common processor.
2. An interface with a local memory.
Here the local memory interconnects with the main memory. The host computer is a general-purpose computer, and the attached processor is a back-end machine driven by the host computer. The array processor is connected through an I/O controller to the computer, and the computer treats it as an external interface.
2. SIMD Array Processor:
This is a computer with multiple processing units operating in parallel. Both types of array processors manipulate vectors, but their internal organization is different.


SIMD is a computer with multiple processing units operating in parallel. The processing units are synchronized to perform the same operation under the control of a common control unit, thus providing a single instruction stream, multiple data stream (SIMD) organization. As shown in the figure, SIMD contains a set of identical processing elements (PEs), each having a local memory M.
Each PE includes:
ALU
Floating-point arithmetic unit
Working registers
The master control unit controls the operations in the PEs. Its function is to decode each instruction and determine how the instruction is to be executed. If the instruction is a scalar or program-control instruction, it is executed directly within the master control unit. Main memory is used for storage of the program, while each PE uses operands stored in its local memory.
