BCA COA Full Notes
Module 1
Syllabus
Basic Computer Organization and Design
Operational concepts, Instruction codes, Computer Registers, Computer Instructions,
Memory locations and addresses, Instruction cycle, Timing and control, Bus
organization.
Operational concepts
A computer has five functionally independent units: the Input Unit, Memory Unit,
Arithmetic & Logic Unit, Output Unit, and Control Unit.
Input Unit :-
Computers accept coded information through the input unit. The most common input
device is the keyboard. Whenever we press a key, it is automatically translated into the
corresponding binary code and transmitted over a cable to the memory or the processor.
Memory Unit :-
It stores programs as well as data, and there are two types: primary and secondary
memory.
Primary memory is quite fast and works at electronic speed. Programs must be stored
in memory before they can be executed. Random Access Memory (RAM) is memory in
which any location can be accessed in a short, fixed time after specifying its address.
Primary memory is essential but expensive, so we also use secondary memory, which is
much cheaper. It is used when large amounts of data and programs need to be stored,
particularly information that we don't access very frequently. Examples: magnetic
disks, tapes.
Arithmetic & Logic Unit :-
All arithmetic and logical operations are performed by the ALU, and these operations are
initiated once the operands are brought into the processor.
Output Unit :– It displays the processed result to the outside world.
Control Unit :– It coordinates the operations of all the other units by sending the
control signals that determine the sequence of their actions.
Basic Operational Concepts
■ Instructions play a vital role in the proper working of the computer.
■ An appropriate program consisting of a list of instructions is stored in the
memory so that the tasks can be started.
■ Individual instructions are brought from the memory into the processor, which
executes the specified operations.
■ Data to be used as operands are also stored in the memory.
Example:
Add LOCA, R0
■ This instruction adds the operand at memory location LOCA to the operand
present in the register R0 and places the sum back into R0.
■ The above mentioned example can be written as follows:
Load LOCA, R1
Add R1, R0
■ The first instruction sends the contents of the memory location LOCA into
processor register R1, and then the second instruction adds the contents of
registers R1 and R0 and places the sum in register R0.
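To make the effect of this sequence concrete, here is a minimal sketch in Python (the memory contents and the initial register values are assumptions made up for illustration):
# Simulate the two-instruction sequence with a dictionary as memory.
memory = {'LOCA': 25}        # operand stored at memory location LOCA
R0, R1 = 10, 0               # initial register contents (illustrative)
R1 = memory['LOCA']          # Load LOCA, R1 : R1 <- M[LOCA]
R0 = R1 + R0                 # Add R1, R0    : R0 <- R1 + R0
print(R0)                    # 35, the sum of M[LOCA] and the original R0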
Transfers between the memory and the processor are started by sending the address of
the memory location to be accessed to the memory unit and issuing the appropriate
control signals.
■ The data is then transferred to or from the memory.
Performance :-
■ Performance means how quickly a program can be executed.
■ The processor communicates with the memory through the following components:
1) Memory
2) MAR
3) MDR
4) PC
5) IR
6) General Purpose Registers
7) Control Unit
8) ALU
■ The instruction that is currently being executed is held by the Instruction
Register (IR).
■ The IR output is available to the control circuits, which generate the timing
signals that control the various processing elements involved in executing the
instruction.
■ The memory address of the next instruction to be fetched and executed is
contained in the Program Counter (PC).
■ It is a specialized register.
■ It keeps track of how far the execution of the program has progressed.
■ The role of the general purpose registers is to handle the data required by the
instructions; they store the data temporarily.
■ Two registers, the Memory Address Register (MAR) and the Memory Data Register
(MDR), facilitate the communication with memory.
Working Explanation
The PC is set to point to the first instruction of the program. The contents of the PC are
transferred to the MAR, and a Read control signal is sent to the memory. The addressed
word is fetched from the location specified in the MAR and loaded into the MDR.
Instruction codes
A program, as we all know, is a set of instructions that specify the operations,
operands, and the sequence in which processing has to occur. An instruction code is
a group of bits that tells the computer to perform a specific operation.
Operation Code
The operation code of an instruction is a group of bits that define operations such as
add, subtract, multiply, shift and complement. The number of bits required for the
operation code depends upon the total number of operations available on the computer.
The operation code must consist of at least n bits for a given 2^n operations. The
operation part of an instruction code specifies the operation to be performed.
Register Part
The operation must be performed on data stored in registers. An instruction code
therefore specifies not only the operation to be performed but also the registers where
the operands (data) will be found, as well as the register where the result has to be
stored. Some computers have a single processor register, known as the Accumulator (AC).
In that case, the operation is performed with the memory operand and the content of AC.
Load(LD)
The lines from the common bus are connected to the inputs of each register and data
inputs of memory. The particular register whose LD input is enabled receives the data
from the bus during the next clock pulse transition.
Before studying instruction formats, let's first study the operand address parts.
When the second part of an instruction code specifies the operand itself, the instruction
is said to have an immediate operand. When the second part of the instruction code
specifies the address of an operand, the instruction is said to have a direct address. And
in indirect addressing, the second part of the instruction code specifies the address of a
memory word in which the address of the operand is found.
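A short Python sketch may help contrast direct and indirect addressing; the addresses and memory contents below are invented for illustration:
# M[300] holds 1350; M[1350] holds 45.
memory = {300: 1350, 1350: 45}
address_field = 300                       # second part of the instruction code
direct = memory[address_field]            # direct address: operand = M[300] = 1350
indirect = memory[memory[address_field]]  # indirect: operand = M[M[300]] = 45
print(direct, indirect)                   # 1350 45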
Computer Instructions
The basic computer has three instruction code formats. The operation code (opcode)
part of the instruction contains 3 bits, and the meaning of the remaining 13 bits depends
upon the operation code encountered.
There are three types of formats:
1. Memory-Reference Instruction
A memory-reference instruction uses 12 bits to specify an address and one bit (bit 15)
to specify the addressing mode I, where I = 0 for a direct address and I = 1 for an
indirect address.
2. Register-Reference Instruction
These instructions are recognized by the operation code 111 with a 0 in the leftmost bit
(bit 15) of the instruction. The remaining 12 bits specify the operation to be executed on
the registers.
3. Input-Output Instruction
These instructions are recognized by the operation code 111 with a 1 in the leftmost bit
of the instruction. The remaining 12 bits are used to specify the input-output operation.
Format of Instruction
The format of an instruction is depicted in a rectangular box symbolizing the bits of an
instruction. Basic fields of an instruction format are given below:
1. An operation code field that specifies the operation to be performed.
2. An address field that designates the memory address or register.
3. A mode field that specifies the way the operand of effective address is
determined.
Computers may have instructions of different lengths containing varying numbers of
addresses. The number of address fields in the instruction format depends upon the
internal organization of its registers.
Computer Registers
Registers are a type of computer memory used to quickly accept, store, and transfer
data and instructions that are being used immediately by the CPU. The registers used
by the CPU are often termed processor registers.
A processor register may hold an instruction, a storage address, or any data (such as
bit sequence or individual characters).
The computer needs processor registers for manipulating data and a register for holding
a memory address. The register holding the memory address is used to calculate the
address of the next instruction after the execution of the current instruction is
completed.
Following is the list of some of the most common registers used in a basic computer:
The following image shows the register and memory configuration for a basic computer.
● The Memory unit has a capacity of 4096 words, and each word contains 16 bits.
● The Data Register (DR) contains 16 bits which hold the operand read from the
memory location.
● The Memory Address Register (MAR) contains 12 bits which hold the address for
the memory location.
● The Program Counter (PC) also contains 12 bits which hold the address of the
next instruction to be read from memory after the current instruction is executed.
● The Accumulator (AC) register is a general purpose processing register.
● The instruction read from memory is placed in the Instruction register (IR).
● The Temporary Register (TR) is used for holding the temporary data during the
processing.
● The Input Register (INPR) holds the input characters given by the user.
● The Output Register (OUTR) holds the output after processing the input data.
Computer Instructions
Computer instructions are a set of machine language instructions that a particular
processor understands and executes. A computer performs tasks on the basis of the
instruction provided.
The Register-reference instructions are represented by the Opcode 111 with a 0 in the
leftmost bit (bit 15) of the instruction.
Note: The Operation code (Opcode) of an instruction refers to a group of bits that define
arithmetic and logic operations such as add, subtract, multiply, shift, and complement.
Input-Output instruction
Just like the Register-reference instruction, an Input-Output instruction does not need a
reference to memory and is recognized by the operation code 111 with a 1 in the
leftmost bit of the instruction. The remaining 12 bits are used to specify the type of the
input-output operation or test performed.
Note
● The three operation code bits in positions 12 through 14 should be equal to 111.
Otherwise, the instruction is a memory-reference type, and the bit in position 15
is taken as the addressing mode I.
● When the three operation code bits are equal to 111, control unit inspects the bit
in position 15. If the bit is 0, the instruction is a register-reference type.
Otherwise, the instruction is an input-output type having bit 1 at position 15.
Arithmetic, logic and shift instructions provide computational capabilities for processing
the type of data the user may wish to employ.
A huge amount of binary information is stored in the memory unit, but all computations
are done in processor registers. Therefore, one must possess the capability of moving
information between these two units.
Program control instructions such as branch instructions are used to change the
sequence in which the program is executed.
Input and Output instructions act as an interface between the computer and the user.
Programs and data must be transferred into memory, and the results of computations
must be transferred back to the user.
Memory Locations and Addresses
If an address has n bits, the address space consists of 2^n locations. For example, a
24-bit address generates an address-space of 2^24 locations (16 MB).
BYTE-ADDRESSABILITY
A byte is always 8 bits, but the word length typically ranges from 16 to 64 bits.
Instruction cycle
A program residing in the memory unit of a computer consists of a sequence of instructions.
These instructions are executed by the processor by going through a cycle for each
instruction.
Input-Output Configuration
In computer architecture, input-output devices act as an interface between the machine and
the user.
Instructions and data stored in the memory must come from some input device. The results
are displayed to the user through some output device.
The following block diagram shows the input-output configuration for a basic computer.
Note: FGI and FGO are corresponding input and output flags which are considered as
control flip-flops.
Timing and Control
In the hardwired organization, the control logic is implemented with gates, flip-flops,
decoders, and other digital circuits. It has the advantage that it can be optimized to
produce a fast mode of operation. In the microprogrammed organization, the control
information is stored in a control memory. The control memory is programmed to
initiate the required sequence of microoperations. A hardwired control, as the name
implies, requires changes in the wiring among the various components if the design
has to be modified or changed.
In the microprogrammed control, any required changes or modifications can be
done by updating the microprogram in control memory.
The block diagram of the control unit is shown in Fig. 5.6.
It consists of two decoders, a sequence counter, and a number of control logic gates.
An instruction read from memory is placed in the instruction register (IR). The position
of this register in the common bus system is indicated in Fig. 5.4.
The instruction register is shown again in Fig. 5.6, where it is divided into three parts:
1. the I bit,
2. the operation code, and
3. bits 0 through 11.
The operation code in bits 12 through 14 is decoded with a 3 x 8 decoder. The eight
outputs of the decoder are designated by the symbols D0 through D7. The subscripted
decimal number is equivalent to the binary value of the corresponding operation code.
Bit 15 of the instruction is transferred to a flip-flop designated by the symbol I. Bits 0
through 11 are applied to the control logic gates. The 4-bit sequence counter can count
in binary from 0 through 15. The outputs of the counter are decoded into 16 timing
signals T0 through T15.
The sequence counter SC responds to the positive transition of the clock. Initially, the
CLR input of SC is active. The first positive transition of the clock clears SC to 0, which
in turn activates the timing signal T0 out of the decoder. T0 is active during one clock
cycle. The positive clock transition labeled T0 in the diagram will trigger only those
registers whose control inputs are connected to timing signal T0. SC is incremented
with every positive clock transition unless its CLR input is active. This produces the
sequence of timing signals T0, T1, T2, T3, T4 and so on, as shown in the diagram. (Note
the relationship between the timing signal and its corresponding positive clock
transition.) If SC is not cleared, the timing signals will continue with T5, T6, up to T15
and back to T0.
The last three waveforms in Fig. 5-7 show how SC is cleared when D3T4 = 1. Output D3
from the operation decoder becomes active at the end of timing signal T2. When timing
signal T4 becomes active, the output of the AND gate that implements the control
function D3T4 becomes active. This signal is applied to the CLR input of SC. On the
next positive clock transition (the one marked T4 in the diagram) the counter is cleared
to 0. This causes the timing signal T0 to become active instead of T5 that would have
been active if SC were incremented instead of cleared.
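The following Python sketch mimics this behaviour, with SC as a 4-bit counter and the control function D3T4 clearing it (a simplified model of the timing, not the actual gate-level circuit):
SC = 0
D3 = True                        # assume decoder output D3 is active
for clock in range(8):
    print(f"timing signal T{SC} is active")
    if D3 and SC == 4:           # control function D3T4 = 1
        SC = 0                   # CLR input clears the counter
    else:
        SC = (SC + 1) % 16       # increment on each positive clock transition
# prints T0 T1 T2 T3 T4 and then starts again from T0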
A memory read or write cycle will be initiated with the rising edge of a timing signal. It
will be assumed that a memory cycle time is less than the clock cycle time. According
to this assumption, a memory read or write cycle initiated by a timing signal will be
completed by the time the next clock goes through its positive transition. The clock
transition will then be used to load the memory word into a register. This timing
relationship is not valid in many computers because the memory cycle time is usually
longer than the processor clock cycle. In such a case it is necessary to provide wait
cycles in the processor until the memory word is available. To facilitate the
presentation, we will assume that a wait period is not necessary in the basic computer.
To fully comprehend the operation of the computer, it is crucial that one understands
the timing relationship between the clock transition and the timing signals. For
example, the register transfer statement
T0: AR ← PC
specifies a transfer of the content of PC into AR if timing signal T0 is active. T0 is active
during an entire clock cycle interval. During this time the content of PC is placed onto
the bus (with S2S1S0 = 010) and the LD (load) input of AR is enabled. The actual
transfer does not occur until the end of the clock cycle when the clock goes through a
positive transition. This same positive clock transition increments the sequence counter
SC from 0000 to 0001. The next clock cycle has T1 active and T0 inactive.
Bus Organization
What Is Bus
A bus is a subsystem that is used to transfer data and other information between
devices. This means that the various devices in a computer (memory, CPU, I/O and
others) communicate with each other through buses. In general, a bus is said to be the
communication pathway connecting two or more devices.
A key characteristic of a bus is that it is a shared transmission medium, as multiple
devices are attached to a bus.
Typically, a bus consists of multiple communication pathways or lines, which are either
wires or metal lines etched in a card or board (printed circuit board). Each line is
capable of transmitting binary 1 and binary 0.
A computer system contains a number of different buses that provide pathways between
components at various levels of the computer system hierarchy.
But before discussing the types of buses, I will first describe one of the most important
aspects of buses, given below.
Any bus typically consists of about 50 to 100 separate lines. On any bus, the lines may
generally be classified into three functional groups, as depicted in the figure below:
● Data Lines:
Data lines provide a path for moving data between system modules. They are
bidirectional, which means data lines are used to transfer data in both directions.
As an example, the CPU can read data on these lines from memory as well as send
data out on these lines to a memory location or to a port. In any bus the number of
data lines is 8, 16, 32 or more, depending on the size of the bus. These lines,
collectively, are called the data bus.
● Address Lines:
Address lines are collectively called the address bus. In any bus, the number of
address lines is usually 16, 20, 24, or more, depending on the type and architecture
of the bus. On these lines, the CPU sends out the address of the memory location or
I/O port that is to be written to or read from. In short, it is an internal channel
from the CPU to memory across which the address of the data (not the data itself) is
transmitted. Here the communication is one-way: the address is sent from the CPU to
memory and I/O ports, but memory and I/O ports never send addresses to the CPU on
these lines; hence these lines are unidirectional.
● Control Lines:
Control lines are collectively called the control bus. Control lines are the gateway
used to transmit and receive control signals between the microprocessor and the
various devices attached to it. In other words, control lines are used by the CPU
for communicating with other devices within the computer. As an example, the CPU
sends signals on the control bus to enable the outputs of addressed memory devices
and port devices. Typical control line signals are:
-Memory Read
-Memory Write
-I/O Read
-I/O Write
-Bus Request
-Bus Grant, etc.
Operation of Bus
The Operation of Bus is as follows:
● If one module wishes to send data to another, it must do two things: (1) obtain
the use of the bus, and (2) transfer the data via the bus.
Types Of Bus
There are a variety of buses, but here I will describe only those that are widely used.
● System Bus:
A system bus connects the major components of a computer system. It is the only bus
in which data lines, address lines and control lines are all present. It is also known
as the "front side" bus. It is faster than a peripheral bus (PCI, ISA, etc.) but
slower than the backside bus.
● Peripheral Bus:
A peripheral bus, also known as an "I/O bus", is the data pathway that connects
peripheral devices to the CPU. In other words, in computing, a peripheral bus is a
computer bus designed to support computer peripherals like printers and hard drives.
The PCI and USB buses are commonly used peripheral buses and are today found in many
PCs. Now we will discuss both of them in brief:
PCI (Peripheral Component Interconnect): PCI is a bus used to attach peripheral
hardware devices such as network, sound and graphics cards to the CPU.
USB (Universal Serial Bus): Universal Serial Bus is used to attach USB devices, like
pen drives, to the CPU.
● Local Bus:
Local buses are the traditional I/O (peripheral) buses such as the ISA, MCA, or EISA
buses. Now we will discuss each in brief, one by one.
ISA (Industry Standard Architecture) Bus: The ISA bus permits bus mastering,
i.e., it enables a peripheral connected directly to the bus to communicate directly with
other peripherals without going through the processor. One of the consequences of bus
mastering is direct memory access (DMA). Until the end of the 1990s almost all PCs were
equipped with the ISA bus, but it was progressively replaced by the PCI bus, which
offers better performance.
● High-Speed Bus:
High-speed buses are specifically designed to support high-capacity I/O devices. A
high-speed bus brings high-demand devices into closer integration with the processor.
This bus supports connection to high-speed LANs, such as Fast Ethernet at 100 Mbps,
video and graphics workstations, FireWire, etc.
Module 2
Syllabus
Central Processing Unit:
General Register Organization, Stack Organization, Addressing modes, Instruction
Classification, Program control.
Stack organization:
A stack is a storage device that stores information in a last-in, first-out (LIFO) fashion.
A stack has two operations: push, which places data onto the stack, and pop, which
removes data from the stack. A computer can have a separate memory reserved just for
stack operations. However, most utilize main memory for representing stacks. Hence, all
assembly programs should allocate memory for a stack. The SP register is initially
loaded with the address of the top of the stack. In memory, the stack is actually
upside-down, so when something is pushed onto the stack, the stack pointer is
decremented.
SP ← SP - 1
M[SP] ← DR
And, when something is popped off the stack, the stack pointer is incremented.
DR ← M[SP]
SP ← SP + 1
The push and pop instructions can be explicitly executed in a program. However, they
are also implicitly executed for such things as procedure calls and interrupts as well.
Care must be taken when performing stack operations to ensure that an overflow
or underflow of the stack does not occur.
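A minimal Python sketch of these stack operations, assuming a small memory array and a downward-growing stack (overflow/underflow checks omitted for brevity):
memory = [0] * 16
SP = 16                          # SP starts just past the stack area

def push(DR):
    global SP
    SP = SP - 1                  # SP ← SP - 1
    memory[SP] = DR              # M[SP] ← DR

def pop():
    global SP
    DR = memory[SP]              # DR ← M[SP]
    SP = SP + 1                  # SP ← SP + 1
    return DR

push(7); push(9)
print(pop(), pop())              # 9 7 : last-in, first-out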
Addressing Modes
The operation field of an instruction specifies the operation to be performed. This
operation will be executed on some data which is stored in computer registers or the
main memory. The way any operand is selected during the program execution is
dependent on the addressing mode of the instruction. The purpose of using addressing
modes is as follows:
1. To give programming versatility to the user.
2. To reduce the number of bits in addressing field of instruction.
Types of Addressing Modes
Below we have discussed different types of addressing modes one by one:
Immediate Mode
In this mode, the operand is specified in the instruction itself. An immediate mode
instruction has an operand field rather than the address field.
For example: ADD 7, which says add 7 to the contents of the accumulator; 7 is the
operand here.
Register Mode
In this mode the operand is stored in the register and this register is present in CPU.
The instruction has the address of the Register where the operand is stored.
Advantages
● Shorter instructions and faster instruction fetch.
● Faster memory access to the operand(s)
Disadvantages
● Very limited address space
● Using multiple registers helps performance but it complicates the instructions.
Direct Addressing Mode
In this mode, the address field of the instruction contains the effective address of the
operand.
For Example: ADD R1, 4000 - here 4000 is the effective address of the operand.
NOTE: The effective address is the location where the operand is present.
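The sketch below contrasts how the operand is obtained in the immediate, register, and direct modes (the register and memory contents are assumptions for illustration):
memory = {4000: 17}                  # M[4000] = 17
registers = {'R1': 5}
imm = 7                              # immediate: operand is part of the instruction
reg = registers['R1']                # register: operand is held in a CPU register
direct = memory[4000]                # direct: 4000 is the effective address
registers['R1'] += direct            # ADD R1, 4000
print(imm, reg, registers['R1'])     # 7 5 22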
Instruction Cycle
An instruction cycle, also known as fetch-decode-execute cycle is the basic
operational process of a computer. This process is repeated continuously by CPU from
boot up to shut down of the computer. Following are the steps that occur during an
instruction cycle:
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Read the effective address from memory if the instruction has an indirect address.
4. Execute the instruction.
Instruction Classification
Computers perform tasks on the basis of the instructions provided. An instruction in a
computer comprises groups called fields. These fields contain different information:
since for computers everything is in 0s and 1s, each field has a different significance,
on the basis of which the CPU decides what to perform. The most common fields are the
operation field, the address field, and the mode field.
Zero Address Instructions –
A stack-based computer does not use an address field in its instructions. To evaluate an
expression, it is first converted to Reverse Polish Notation, i.e., postfix notation.
Expression: X = (A+B)*(C+D)
Postfixed : X = AB+CD+*
TOP means top of stack
M[X] is any memory location
PUSH A TOP = A
PUSH B TOP = B
ADD TOP = A+B
PUSH C TOP = C
PUSH D TOP = D
ADD TOP = C+D
MUL TOP = (C+D)*(A+B)
POP X M[X] = TOP
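The same evaluation can be mirrored in Python with an explicit stack; the values of A, B, C, D below are made up so the trace produces a concrete number:
A, B, C, D = 2, 3, 4, 5
stack = []
stack.append(A)                            # PUSH A
stack.append(B)                            # PUSH B
stack.append(stack.pop() + stack.pop())    # ADD  : TOP = A+B
stack.append(C)                            # PUSH C
stack.append(D)                            # PUSH D
stack.append(stack.pop() + stack.pop())    # ADD  : TOP = C+D
stack.append(stack.pop() * stack.pop())    # MUL  : TOP = (C+D)*(A+B)
X = stack.pop()                            # POP X : M[X] = TOP
print(X)                                   # 45 = (2+3)*(4+5)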
One Address Instructions –
These instructions use an implied accumulator (AC) register for data manipulation; one
operand is in the accumulator and the other is in a register or memory location.
Expression: X = (A+B)*(C+D)
AC is the accumulator
M[] is any memory location
M[T] is a temporary location
LOAD A AC = M[A]
ADD B AC = AC + M[B]
STORE T M[T] = AC
LOAD C AC = M[C]
ADD D AC = AC + M[D]
MUL T AC = AC * M[T]
STORE X M[X] = AC
Two Address Instructions –
Here two addresses can be specified in the instruction; the result is stored in one of
the operand locations.
Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location
MOV R1, A R1 = M[A]
ADD R1, B R1 = R1 + M[B]
MOV R2, C R2 = M[C]
ADD R2, D R2 = R2 + M[D]
MUL R1, R2 R1 = R1 * R2
MOV X, R1 M[X] = R1
Three Address Instructions –
Here three addresses can be specified in the instruction: two for the operands and one
for the result.
Expression: X = (A+B)*(C+D)
R1, R2 are registers
M[] is any memory location
ADD R1, A, B R1 = M[A] + M[B]
ADD R2, C, D R2 = M[C] + M[D]
MUL X, R1, R2 M[X] = R1 * R2
Program Control
Program control instructions are the machine code instructions used by the machine, or
written in assembly language by the user, to command the processor to act accordingly.
These instructions are of various types and are also used directly in assembly language
by the user. In a high-level language, user code is translated into machine code, and
thus the instructions are passed to instruct the processor to do the task.
Types of Program Control Instructions:
There are different types of Program Control Instructions:
1. Compare Instruction:
A compare instruction is specifically provided; it is similar to a subtract instruction
except that the result is not stored anywhere, but the flags are set according to the result.
Example:
CMP R1, R2 ;
2. Unconditional Branch Instruction:
It causes an unconditional change of execution sequence to a new location.
Example:
JUMP L2 ; control transfers unconditionally to label L2 (high-level equivalent: goto L2)
MOV R3, R1 ; the instruction following the jump, which is skipped
3. Conditional Branch Instruction:
A conditional branch instruction is used to examine the values stored in the condition
code register to determine whether the specific condition exists and to branch if it does.
Example:
Assembly Code : BE R1, R2, L1
Compiler allocates R1 for x and R2 for y
High Level Code: if (x==y) goto L1;
4. Subroutines:
A subroutine is a program fragment that lives in user space and performs a well-defined
task. It is invoked by another user program and returns control to the calling program
when finished.
Example:
CALL and RET
5. Halting Instructions:
● NOP Instruction – NOP means no operation. It causes no change in the
processor state other than an advancement of the program counter. It can
be used to synchronize timing.
● HLT Instruction – HLT (halt) stops the processor until it is restarted by an
external interrupt or a reset.
Module 3
Syllabus
Memory Organization
Memory Hierarchy, Main Memory, Organization of RAM, SRAM, DRAM, Read Only Memory-
ROM-PROM,EPROM,EEPROM, Auxiliary memory, Cache memory, Virtual Memory, Memory
mapping Techniques.
Memory Hierarchy
A memory unit is the collection of storage units or devices together. The memory unit
stores the binary information in the form of bits. Generally, memory/storage is classified
into 2 categories:
● Volatile Memory: This loses its data, when power is switched off.
● Non-Volatile Memory: This is a permanent storage and does not lose any data
when power is switched off.
Fig:- Memory Hierarchy
Programs and data not currently needed by the processor are transferred into auxiliary
memory to provide space in main memory for other programs that are currently in use.
The cache memory is used to store program data which is currently being executed in
the CPU. The approximate access-time ratio between cache memory and main memory is
roughly 1:7 to 1:10.
Hit Ratio
The performance of cache memory is measured in terms of a quantity called hit ratio.
When the CPU refers to memory and finds the word in the cache, it is said to produce a
hit. If the word is not found in the cache, it has to be read from main memory, and this
counts as a miss.
The ratio of the number of hits to the total CPU references to memory is called hit ratio.
Hit Ratio = Hit/(Hit + Miss)
Associative Memory
It is also known as content addressable memory (CAM). It is a memory chip in which
each bit position can be compared. In this the content is compared in each bit cell which
allows very fast table lookup. Since the entire chip can be compared, contents are
randomly stored without considering addressing scheme. These chips have less
storage capacity than regular memory chips.
Main Memory
The main memory acts as the central storage unit in a computer system. It is a relatively
large and fast memory which is used to store programs and data during the run time
operations.
The primary technology used for the main memory is based on semiconductor
integrated circuits. The integrated circuits for the main memory are classified into two
major units.
● A 128 * 8 RAM chip has a memory capacity of 128 words of eight bits (one byte)
per word. This requires a 7-bit address and an 8-bit bidirectional data bus.
● The 8-bit bidirectional data bus allows the transfer of data either from memory to
CPU during a read operation or from CPU to memory during a write operation.
● The read and write inputs specify the memory operation, and the two chip select
(CS) control inputs are for enabling the chip only when the microprocessor
selects it.
● The bidirectional data bus is constructed using three-state buffers.
● The output generated by three-state buffers can be placed in one of the three
possible states which include a signal equivalent to logic 1, a signal equal to logic
0, or a high-impedance state.
Note: The logic 1 and 0 are standard digital signals whereas the high-impedance state
behaves like an open circuit, which means that the output does not carry a signal and
has no logic significance.
The following function table specifies the operations of a 128 * 8 RAM chip.
From the functional table, we can conclude that the unit is in operation only when CS1 =
1 and CS2 = 0. The bar on top of the second select variable indicates that this input is
enabled when it is equal to 0.
Apart from the permanent storage of data, the ROM portion of main memory is needed
for storing an initial program called a bootstrap loader. The primary function of the
bootstrap loader program is to start the computer software operating when power is
turned on.
ROM chips are also available in a variety of sizes and are also used as per the system
requirement. The following block diagram demonstrates the chip interconnection in a
512 * 8 ROM chip.
● A ROM chip has a similar organization as a RAM chip. However, a ROM can
only perform read operation; the data bus can only operate in an output mode.
● The 9-bit address lines in the ROM chip specify any one of the 512 bytes stored
in it.
● The value for chip select 1 and chip select 2 must be 1 and 0 for the unit to
operate. Otherwise, the data bus is said to be in a high-impedance state.
● Stores crucial information essential to operate the system, like the program
essential to boot the computer.
● It is not volatile.
● Always retains its data.
● Used in embedded systems or where the programming needs no change.
● Used in calculators and peripheral devices.
● ROM is further classified into 4 types- ROM, PROM, EPROM, and
EEPROM.
Auxiliary memory
An Auxiliary memory is known as the lowest-cost, highest-capacity and slowest-access
storage in a computer system. It is where programs and data are kept for long-term
storage or when not in immediate use. The most common examples of auxiliary
memories are magnetic tapes and magnetic disks.
Magnetic Disks
A magnetic disk is a type of memory constructed using a circular plate of metal or
plastic coated with magnetized materials. Usually, both sides of the disks are used to
carry out read/write operations. However, several disks may be stacked on one spindle
with read/write head available on each surface.
The following image shows the structural representation for a magnetic disk.
● The memory bits are stored in the magnetized surface in spots along the
concentric circles called tracks.
● The concentric circles (tracks) are commonly divided into sections called sectors.
Magnetic Tape
Magnetic tape is a storage medium that allows data archiving, collection, and backup for
different kinds of data. The magnetic tape is constructed using a plastic strip coated with
a magnetic recording medium.
The bits are recorded as magnetic spots on the tape along several tracks. Usually,
seven or nine bits are recorded simultaneously to form a character together with a parity
bit.
Magnetic tape units can be halted, started to move forward or in reverse, or can be
rewound. However, they cannot be started or stopped fast enough between individual
characters. For this reason, information is recorded in blocks referred to as records.
Cache memory
Cache Memory is a special very high-speed memory. It is used to speed up and
synchronize with the high-speed CPU. Cache memory is costlier than main memory or
disk memory but more economical than CPU registers. Cache memory is an extremely fast
memory type that acts as a buffer between RAM and the CPU. It holds frequently
requested data and instructions so that they are immediately available to the CPU when
needed.
Cache memory is used to reduce the average time to access data from the Main
memory. The cache is a smaller and faster memory which stores copies of the data
from frequently used main memory locations. There are various different independent
caches in a CPU, which store instructions and data.
Levels of memory:
● Level 1 or Registers –
These are memory locations within the CPU itself, where data is stored and
accepted immediately. The most commonly used registers are the accumulator,
program counter, address register, etc.
● Level 2 or Cache memory –
It is a very fast memory with a short access time, where data is temporarily
stored for faster access.
● Level 3 or Main Memory –
It is memory on which computer works currently. It is small in size and once
power is off data no longer stays in this memory.
● Level 4 or Secondary Memory –
It is external memory which is not as fast as main memory but data stays
permanently in this memory.
Cache Performance:
When the processor needs to read or write a location in main memory, it first checks for
a corresponding entry in the cache.
● If the processor finds that the memory location is in the cache, a cache hit
has occurred and data is read from cache
● If the processor does not find the memory location in the cache, a cache
miss has occurred. For a cache miss, the cache allocates a new entry and
copies in data from main memory, then the request is fulfilled from the
contents of the cache.
The performance of cache memory is frequently measured in terms of a quantity called
Hit ratio.
Hit ratio = hit / (hit + miss) = no. of hits/total accesses
We can improve cache performance by using a larger cache block size, higher
associativity, and by reducing the miss rate, the miss penalty, and the time to hit in
the cache.
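A quick worked example of the hit ratio formula above (the counts are invented):
hits, misses = 950, 50
hit_ratio = hits / (hits + misses)   # Hit ratio = hit / (hit + miss)
print(hit_ratio)                     # 0.95 : 95% of references served from the cache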
Application of Cache Memory –
Locality of reference –
Since the size of cache memory is small compared to main memory, the decision of which
part of main memory should be given priority and loaded into the cache is based on
locality of reference.
Types of Locality of reference
1. Spatial Locality of reference
This says that there is a good chance that the words in close proximity to the
currently referenced word will be needed soon, so it pays to bring the whole
neighbourhood of words (a block) into the cache.
2. Temporal Locality of reference
This says that a recently referenced word is likely to be referenced again soon,
which is why the Least Recently Used (LRU) replacement algorithm is used. Whenever
a miss (page fault) occurs, not just the required word but the complete block (page)
containing it is loaded, because the spatial locality rule says that if you refer to
any word, the next word is likely to be referred to next; that is why the complete
block is loaded.
Virtual Memory
Virtual memory is a memory management technique where secondary memory can be
used as if it were a part of the main memory. Virtual memory is a very common
technique used in the operating systems (OS) of computers. Virtual memory uses
hardware and software to allow a computer to compensate for physical memory
shortages, by temporarily transferring data from random access memory (RAM) to disk
storage. In essence, virtual memory allows a computer to treat secondary memory as
though it were the main memory.
Today, most PCs come with up to around 4 GB of RAM. However, sometimes this isn't
enough to run all the programs a user might want to use at once. This is where virtual
memory comes in. Virtual memory can be used to swap data that has not been used
recently -- and move it over to a storage device like a hard drive or solid-state drive
(SSD). This will free up more space in the RAM.
Virtual memory is important for improving system performance, multitasking, using large
programs and flexibility. However, users shouldn't rely on virtual memory too much,
because using virtual data is considerably slower than the use of RAM. If the OS has to
swap data between virtual memory and RAM too often, it can make the computer feel
very slow -- this is called thrashing.
Virtual memory was developed at a time when physical memory -- also referenced as
RAM -- was expensive. Computers have a finite amount of RAM, so memory can run
out, especially when multiple programs run at the same time. A system using virtual
memory uses a section of the hard drive to emulate RAM. With virtual memory, a
system can load larger programs or multiple programs running at the same time,
allowing each one to operate as if it has infinite memory and without having to purchase
more RAM.
How virtual memory works
Virtual memory uses both computer hardware and software to work. When an
application is in use, data from that program is stored in a physical address using RAM.
More specifically, virtual memory will map that address to RAM using a memory
management unit (MMU). The OS will make and manage memory mappings by using
page tables and other data structures. The MMU, which acts as an address translation
hardware, will automatically translate the addresses.
If at any point later the RAM space is needed for something more urgent, the
data can be swapped out of RAM and into virtual memory. The computer's memory
manager is in charge of keeping track of the shifts between physical and virtual
memory. If that data is needed again, a context switch can be used to resume execution
again.
While copying virtual memory into physical memory, the OS divides memory into
pagefiles or swap files with a fixed number of addresses. Each page is stored on a disk,
and when the page is needed, the OS copies it from the disk to main memory and
translates the virtual addresses into real addresses.
However, the process of swapping virtual memory to physical is rather slow. This
means that using virtual memory generally causes a noticeable reduction in
performance. Because of swapping, computers with more RAM are seen to have better
performance.
Types of virtual memory
A computer's MMU handles memory operations, including managing virtual memory. In
most computers, the MMU hardware is integrated into the CPU. There are two ways in
which virtual memory is handled: paged and segmented.
Paging divides memory into sections or paging files, usually approximately 4 KB
in size. When a computer uses up its RAM, pages not in use are transferred to the
section of the hard drive designated for virtual memory using a swap file. A swap file is
a space set aside on the hard drive as the virtual memory extensions of the computer's
RAM. When the swap file is needed, it's sent back to RAM using a process called page
swapping. This system ensures that the computer's OS and applications don't run out of
real memory.
The paging process includes the use of page tables, which translate the virtual
addresses that the OS and applications use into the physical addresses that the MMU
uses. Entries in the page table indicate whether the page is in real memory. If the OS or
a program doesn't find what it needs in RAM, then the MMU responds to the missing
memory reference with a page fault exception to get the OS to move the page back to
memory when it's needed. Once the page is in RAM, its virtual address appears in the
page table.
Segmentation is also used to manage virtual memory. This approach divides
virtual memory into segments of different lengths. Segments not in use in memory can
be moved to virtual memory space on the hard drive. Segmented information or
processes are tracked in a segment table, which shows if a segment is present in
memory, whether it's been modified and what its physical address is. In addition, file
systems in segmentation are only made up of a list of segments mapped into a
process's potential address space.
Segmentation and paging differ as a memory model in terms of how memory is
divided; however, it can also be combined. Some virtual memory systems combine
segmentation and paging. In this case, memory gets divided into frames or pages. The
segments take up multiple pages, and the virtual address includes both the segment
number and the page number.
How to manage virtual memory
Operating systems have default settings that determine the amount of hard drive space
to allocate for virtual memory. That setting will work for most applications and
processes, but there may be times when it's necessary to manually reset the amount of
hard drive space allocated to virtual memory, such as with applications that depend on
fast response times or when the computer has multiple HDDs.
When manually resetting virtual memory, the minimum and maximum amount of
hard drive space to be used for virtual memory must be specified. Allocating too little
HDD space for virtual memory can result in a computer running out of RAM. If a system
continually needs more virtual memory space, it may be wise to consider adding RAM.
Common operating systems may generally recommend that users not increase virtual
memory beyond 1.5 times the amount of RAM.
Limitations
● The use of virtual memory has its tradeoffs, particularly with speed. It's
generally better to have as much physical memory as possible, so programs
work directly from RAM or physical memory.
● The use of virtual memory slows a computer because data must be mapped
between virtual and physical memory, which requires extra hardware support
for address translations.
● The size of virtual storage is limited by the amount of secondary storage, as
well as the addressing scheme with the computer system.
● Thrashing can happen if the amount of RAM is too small, which will make the
computer perform slower.
● It may take time to switch between applications using virtual memory.
Virtual memory vs. physical memory
When talking about the differences between virtual and physical memory, the biggest
distinction is normally seen to be in speed. RAM is considerably faster than virtual
memory. RAM, however, tends to be more expensive than virtual memory. When a
computer requires storage, RAM is the first used. Virtual memory is used when the
RAM is filled, because it's slower. Users can actively add RAM to a computer by buying
and installing more RAM chips if they are experiencing slowdowns due to memory
swaps happening too often. The amount of RAM depends on what's installed on a
computer. Virtual memory, on the other hand, is limited by the size of the computer's
hard drive. Virtual memory settings can often be controlled through the operating
system.
Cache Mapping:
There are three different types of mapping used for the purpose of cache memory which
are as follows: Direct mapping, Associative mapping, and Set-Associative mapping.
These are explained below.
1. Direct Mapping –
The simplest technique, known as direct mapping, maps each block of main memory
into only one possible cache line. Or In Direct mapping, assign each memory block to a
specific line in the cache. If a line is previously taken up by a memory block when a new
block needs to be loaded, the old block is trashed. An address space is split into two
parts: index field and a tag field. The cache is used to store the tag field whereas the
rest is stored in the main memory. Direct mapping's performance is directly proportional
to the hit ratio.
i = j modulo m
where
i = cache line number
j = main memory block number
m = number of lines in the cache
For purposes of cache access, each main memory address can be viewed as consisting
of three fields. The least significant w bits identify a unique word or byte within a block of
main memory. In most contemporary machines, the address is at the byte level. The
remaining s bits specify one of the 2^s blocks of main memory. The cache logic
interprets these s bits as a tag of s-r bits (most significant portion) and a line field
of r bits. This latter field identifies one of the m = 2^r lines of the cache.
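As a worked example of this field split, the Python sketch below decomposes an address using the s/r/w notation above (the field widths w = 2, r = 4, s = 8 are assumptions chosen for illustration):
w, r, s = 2, 4, 8                    # word, line, and tag/block field widths
address = 0b1010110111               # a (s + w)-bit = 10-bit address
word = address & (2**w - 1)          # least significant w bits: word within block
line = (address >> w) & (2**r - 1)   # next r bits: cache line
tag = address >> (w + r)             # most significant s - r bits: tag
block = address >> w                 # main memory block number j
assert line == block % 2**r          # i = j modulo m, with m = 2^r lines
print(tag, line, word)               # 10 13 3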
2. Associative Mapping –
In this type of mapping, the associative memory is used to store content and addresses
of the memory word. Any block can go into any line of the cache. This means that the
word id bits are used to identify which word in the block is needed, but the tag becomes
all of the remaining bits. This enables the placement of any word at any place in the
cache memory. It is considered to be the fastest and the most flexible mapping form.
3. Set-associative Mapping –
This form of mapping is an enhanced form of direct mapping where the drawbacks of
direct mapping are removed. Set associative addresses the problem of possible
thrashing in the direct mapping method. It does this by saying that instead of having
exactly one line that a block can map to in the cache, we will group a few lines
together, creating a set. Then a block in memory can map to any one of the lines of a
specific set. Set-associative mapping allows each word that is present in the cache to
have two or more words in the main memory for the same index address. Set-associative
cache mapping combines the best of direct and associative cache mapping techniques.
In this case, the cache consists of a number of sets, each of which consists of a
number of lines. The relationships are
m = v * k
i = j mod v
where
i = cache set number
j = main memory block number
v = number of sets
k = number of lines in each set
m = number of lines in the cache
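A small sketch applying these relations, assuming a 16-line cache organized as 2-way set-associative (sizes and block number are illustrative):
m, k = 16, 2              # m lines in the cache, k lines per set
v = m // k                # m = v * k gives v = 8 sets
j = 173                   # main memory block number
i = j % v                 # i = j mod v
print(i)                  # 5 : block 173 may go into either line of set 5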
Module 4
Syllabus
Parallel Computer Structures:
Introduction to parallel processing, Pipeline computers, Multi processing systems,
Architectural classification scheme-SISD, SIMD, MISD, MIMD.
Introduction to parallel processing
Parallel processing denotes the techniques used to perform simultaneous data-processing
tasks in a computer system. One way to achieve it is to separate the execution unit into
several functional units that operate in parallel:
● The adder and integer multiplier perform the arithmetic operations with integer
numbers.
● The floating-point operations are separated into three circuits operating in
parallel.
● The logic, shift, and increment operations can be performed concurrently on
different data. All units are independent of each other, so one number can be
shifted while another number is being incremented.
Pipeline computers
The term Pipelining refers to a technique of decomposing a sequential process into
sub-operations, with each sub-operation being executed in a dedicated segment that
operates concurrently with all other segments.
The most important characteristic of a pipeline technique is that several computations
can be in progress in distinct segments at the same time. The overlapping of
computation is made possible by associating a register with each segment in the
pipeline. The registers provide isolation between each segment so that each can
operate on distinct data simultaneously.
The combined multiplication and addition operation is done with a stream of numbers
such as:
Ai * Bi + Ci for i = 1, 2, 3, ..., 7
The sub-operations performed in each segment of the pipeline are defined as:
R1 ← Ai, R2 ← Bi Input Ai, and Bi
R3 ← R1 * R2, R4 ← Ci Multiply, and input Ci
R5 ← R3 + R4 Add Ci to product
The following block diagram represents the combined as well as the sub-operations
performed in each segment of the pipeline.
Registers R1, R2, R3, and R4 hold the data and the combinational circuits operate in a
particular segment.
The output generated by the combinational circuit in a given segment is applied as an
input register of the next segment. For instance, from the block diagram, we can see
that the register R3 is used as one of the input registers for the combinational adder
circuit.
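A cycle-by-cycle Python sketch of this three-segment pipeline; the data values are made up, and the segments are evaluated in reverse order each cycle so that values move forward exactly one segment per clock:
A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [1, 1, 1, 1, 1, 1, 1]
R1 = R2 = R3 = R4 = None         # segment registers
i2 = None                        # index of the item sitting in segment 1's output
results = []                     # completed R5 values
for clock in range(len(A) + 2):  # two extra cycles drain the pipeline
    if R3 is not None:           # segment 3: R5 ← R3 + R4
        results.append(R3 + R4)
    if R1 is not None:           # segment 2: R3 ← R1 * R2, R4 ← Ci
        R3, R4 = R1 * R2, C[i2]
    else:
        R3 = R4 = None
    if clock < len(A):           # segment 1: R1 ← Ai, R2 ← Bi
        R1, R2, i2 = A[clock], B[clock], clock
    else:
        R1 = R2 = None
print(results)                   # A[i]*B[i] + C[i] for each i, one per clock once full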
In general, the pipeline organization is applicable for two areas of computer design
which includes:
1. Arithmetic Pipeline
2. Instruction Pipeline
Arithmetic Pipeline
Arithmetic Pipelines are mostly used in high-speed computers. They are used to
implement floating-point operations, multiplication of fixed-point numbers, and similar
computations encountered in scientific problems.
The inputs to the floating-point adder pipeline are two normalized floating-point binary
numbers defined as:
X = A * 2^a = 0.9504 * 10^3
Y = B * 2^b = 0.8200 * 10^2
where A and B are two fractions that represent the mantissas, and a and b are the
exponents.
The combined operation of floating-point addition and subtraction is divided into four
segments. Each segment contains the corresponding sub-operation to be performed in
the given pipeline. The sub-operations performed in the four segments are:
1. Compare the exponents.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.
We will discuss each sub-operation in a more detailed manner later in this section.
The following block diagram represents the sub-operations performed in each segment
of the pipeline.
Note: Registers are placed after each sub-operation to store the intermediate results.
1. Compare exponents: the difference of the exponents is 3 - 2 = 1.
2. Align mantissas: the mantissa associated with the smaller exponent is shifted right
according to the difference of exponents determined in segment one:
X = 0.9504 * 10^3
Y = 0.08200 * 10^3
3. Add mantissas:
Z = X + Y = 1.0324 * 10^3
4. Normalize the result:
Z = 0.10324 * 10^4
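The four segment operations can be sketched in Python using decimal (mantissa, exponent) pairs, as in the example above (a simplification of the real binary hardware):
def fp_add(a_mant, a_exp, b_mant, b_exp):
    diff = a_exp - b_exp                     # 1. compare the exponents
    if diff > 0:                             # 2. align the smaller mantissa
        b_mant = b_mant / 10**diff
    elif diff < 0:
        a_mant, a_exp = a_mant / 10**(-diff), b_exp
    z_mant, z_exp = a_mant + b_mant, a_exp   # 3. add the mantissas
    while abs(z_mant) >= 1:                  # 4. normalize the result
        z_mant, z_exp = z_mant / 10, z_exp + 1
    return z_mant, z_exp

print(fp_add(0.9504, 3, 0.8200, 2))          # ~(0.10324, 4), i.e. 0.10324 * 10^4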
Instruction Pipeline
Pipeline processing can occur not only in the data stream but in the instruction stream
as well. Most digital computers with complex instructions require an instruction
pipeline to carry out operations like fetching, decoding and executing instructions.
In general, the computer needs to process each instruction with the following sequence
of steps:
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
Each step is executed in a particular segment, and there are times when different
segments may take different times to operate on the incoming information. Moreover,
there are times when two or more segments may require memory access at the same
time, causing one segment to wait until another is finished with the memory.
The organization of an instruction pipeline will be more efficient if the instruction cycle is
divided into segments of equal duration. One of the most common examples of this type
of organization is a Four-segment instruction pipeline.
Segment 1:
The instruction fetch segment can be implemented using a first-in, first-out (FIFO) buffer.
Segment 2:
The instruction fetched from memory is decoded in the second segment, and
eventually, the effective address is calculated in a separate arithmetic circuit.
Segment 3:
An operand from memory is fetched in the third segment.
Segment 4:
The instructions are finally executed in the last segment of the pipeline organization.
In a shared memory multiprocessor, all the CPUs share the common memory, but in a
distributed memory multiprocessor, every CPU has its own private memory.
Applications of Multiprocessor –
1. As a uniprocessor, such as single instruction, single data stream (SISD).
2. As a multiprocessor, such as single instruction, multiple data stream
(SIMD), which is usually used for vector processing.
3. Multiple series of instructions in a single perspective, such as multiple
instruction, single data stream (MISD), which is used for describing
hyper-threading or pipelined processors.
4. Inside a single system for executing multiple, individual series of instructions
in multiple perspectives, such as multiple instruction, multiple data stream
(MIMD).
Types of Multiprocessors
There are mainly two types of multiprocessors i.e. symmetric and asymmetric
multiprocessors. Details about them are as follows −
Symmetric Multiprocessors
In these types of systems, each processor contains a similar copy of the operating
system and they all communicate with each other. All the processors are in a
peer-to-peer relationship, i.e., no master-slave relationship exists between them.
An example of the symmetric multiprocessing system is the Encore version of Unix for
the Multimax Computer.
Asymmetric Multiprocessors
In asymmetric systems, each processor is given a predefined task. There is a master
processor that gives instructions to all the other processors. An asymmetric
multiprocessor system contains a master-slave relationship.
Asymmetric multiprocessor was the only type of multiprocessor available before
symmetric multiprocessors were created. Now also, this is the cheaper option.
Advantages of Multiprocessor Systems
There are multiple advantages to multiprocessor systems. Some of these are −
Increased Throughput
If multiple processors are working in tandem, then the throughput of the system
increases, i.e., the number of processes getting executed per unit of time increases.
If there are N processors, then the throughput increases by an amount just under N.
More Economic Systems
Multiprocessor systems are cheaper than single processor systems in the long run
because they share the data storage, peripheral devices, power supplies etc. If there
are multiple processes that share data, it is better to schedule them on multiprocessor
systems with shared data than have different computer systems with multiple copies of
the data.
Disadvantages of Multiprocessor Systems
There are some disadvantages as well to multiprocessor systems. Some of these are:
Increased Expense
Even though multiprocessor systems are cheaper in the long run than using multiple
computer systems, still they are quite expensive. It is much cheaper to buy a simple
single processor system than a multiprocessor system.
Parallel computing is computing in which jobs are broken into discrete parts that
can be executed concurrently. Each part is further broken down into a series of
instructions. Instructions from each part execute simultaneously on different CPUs.
Parallel systems deal with the simultaneous use of multiple computer resources that
can include a single computer with multiple processors, a number of computers
connected by a network to form a parallel processing cluster or a combination of both.
Parallel systems are more difficult to program than computers with a single processor
because the architecture of parallel computers varies accordingly and the processes of
multiple CPUs must be coordinated and synchronized.
The crux of parallel processing is the CPUs. Based on the number of instruction and
data streams that can be processed simultaneously, computing systems are classified
into four major categories:
Flynn’s classification/taxonomy –
1. Single-instruction, single-data (SISD) systems –
An SISD computing system is a uniprocessor machine which is capable of
executing a single instruction, operating on a single data stream. In SISD,
machine instructions are processed in a sequential manner and computers
adopting this model are popularly called sequential computers. Most
conventional computers have SISD architecture. All the instructions and
data to be processed have to be stored in primary memory.
2. Single-instruction, multiple-data (SIMD) systems –
An SIMD computing system is a multiprocessor machine capable of executing the
same instruction on all the CPUs but operating on different data streams.
3. Multiple-instruction, single-data (MISD) systems –
An MISD computing system is a multiprocessor machine capable of executing
different instructions on different PEs, all operating on the same data set.
Example: Z = sin(x) + cos(x) + tan(x)
The system performs different operations on the same data set. Machines
built using the MISD model are not useful in most applications; a few
machines have been built, but none of them are available commercially.
4. Multiple-instruction, multiple-data (MIMD) systems –
An MIMD system is a multiprocessor machine which is capable of executing
multiple instructions on multiple data sets. Each PE in the MIMD model has
separate instruction and data streams; therefore, machines built using this
model are well suited to any kind of application. Unlike SIMD and MISD
machines, PEs in MIMD machines work asynchronously.
Module 5
Syllabus
Pipelining and Vector processing
Introduction to pipelining, Instruction and Arithmetic pipelines (design) Vector
processing, Array Processors.
Introduction to pipelining
Pipelining organizes the execution of multiple instructions simultaneously.
Pipelining improves the throughput of the system. In pipelining, the instruction is
divided into subtasks, each of which performs a dedicated task.
The instruction is divided into 5 subtasks: instruction fetch, instruction
decode, operand fetch, instruction execution and operand store. The instruction
fetch subtask will only perform the instruction fetching operation, the instruction
decode subtask will only decode the fetched instruction, and so on for the other
subtasks.
In this section, we will discuss the types of pipelining, pipelining hazards, and the
advantages of pipelining. So let us start.
Introduction
Have you ever visited an industrial plant and seen the assembly lines there? A product
passes through the assembly line and, while passing, it is worked on at different
phases simultaneously. For example, take a car manufacturing plant. At the first stage,
the automobile chassis is prepared; in the next stage, workers add a body to the
chassis; further on, the engine is installed, then the painting work is done, and so on.
The group of workers, after working on the chassis of the first car, don't sit idle.
They start working on the chassis of the next car, and the next group takes the chassis
of the car and adds a body to it. The same thing is repeated at every stage: after
finishing the work on the current car body, they take on the next car body, which is
the output of the previous stage.
Here, though the first car is completed in several hours or days, due to the assembly
line arrangement it becomes possible to have a new car at the end of the assembly line
in every clock cycle. The concept of pipelining works similarly: the output of one
pipeline segment becomes the input of the next. It is like a set of data processing
units connected in series to utilize the processor to its maximum.
Now that we understand the division of an instruction into subtasks, let us see how n instructions of a program are pipelined.
Look at the figure below, where 5 instructions are pipelined. The first instruction gets completed in 5 clock cycles; after the completion of the first instruction, a new instruction completes its execution in every subsequent clock cycle.
Observe that when the instruction fetch operation of the first instruction is completed, the instruction fetch of the second instruction starts in the very next clock cycle. This way the hardware never sits idle; it is always busy performing some operation. But no two instructions can occupy the same pipeline stage in the same clock cycle.
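Under the same one-cycle-per-stage assumption, this timing is captured by the standard formula: n instructions on a k-stage pipeline need k + (n - 1) cycles, rather than the n*k cycles of strictly sequential execution. A quick check of the arithmetic for the figure's values:

```c
#include <stdio.h>

int main(void) {
    int k = 5;   /* pipeline stages                      */
    int n = 5;   /* instructions, as in the figure above */

    int pipelined    = k + (n - 1);   /* first instruction needs k cycles,
                                         then one completes per cycle     */
    int nonpipelined = n * k;         /* strictly sequential execution    */

    printf("non-pipelined: %d cycles\n", nonpipelined);   /* 25   */
    printf("pipelined    : %d cycles\n", pipelined);      /*  9   */
    printf("speedup      : %.2f\n",
           (double)nonpipelined / (double)pipelined);     /* 2.78 */
    return 0;
}
```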
Types of Pipelining
1. Arithmetic Pipelining
Here, an arithmetic operation such as floating-point addition or fixed-point multiplication is divided into sub-operations that are executed in successive pipeline segments (the floating-point adder discussed later in this module is the classic example).
2. Instruction Pipelining
Here, a number of instructions are pipelined, and the execution of the current instruction is overlapped by the execution of the subsequent instruction. It is also called instruction lookahead.
3. Processor Pipelining
Here, the processors are pipelined to process the same data stream. The data stream is processed by the first processor and the result is stored in a memory block. The result in the memory block is accessed by the second processor, which reprocesses the result obtained by the first processor and then passes the refined result to the third processor, and so on.
A pipeline that performs the same specific function every time is a unifunctional pipeline. On the other hand, a pipeline that performs multiple functions, at different times or at the same time, is a multifunctional pipeline.
A static pipeline performs a fixed function each time; it is unifunctional and executes the same type of instruction continuously. Frequent changes in the type of instruction may degrade the performance of such pipelining.
Scalar pipelining processes instructions with scalar operands, while vector pipelining processes instructions with vector operands.
Pipelining Hazards
Whenever the pipeline has to stall for some reason, this is called a pipeline hazard. Below we discuss four pipelining hazards.
1. Data Dependency
In the figure above, you can see that the result of the Add instruction is stored in the register R2, and we know that the final result is stored only at the end of the instruction's execution, which happens at clock cycle t4.
But the Sub instruction needs the value of the register R2 at cycle t3. So the Sub instruction has to stall for two clock cycles; if it does not stall, it will generate an incorrect result. This dependence of one instruction on another instruction for its data is called data dependency.
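To make the dependency concrete, here is a minimal C sketch of the same register transfers; the register names other than R2 (which appears in the text) are illustrative assumptions:

```c
#include <stdio.h>

int main(void) {
    /* Register-transfer view of the hazard described above. */
    int R1 = 10, R3 = 4, R5 = 3;

    int R2 = R1 + R3;   /* Add R2, R1, R3 : R2 is written back at t4   */
    int R4 = R2 - R5;   /* Sub R4, R2, R5 : needs R2 already at t3, so */
                        /* the hardware must stall it for 2 cycles;    */
                        /* without the stall it would read a stale R2  */

    printf("R2 = %d, R4 = %d\n", R2, R4);   /* R2 = 14, R4 = 11 */
    return 0;
}
```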
2. Memory Delay
When an instruction or data item is required, it is first searched for in the cache memory; if it is not found there, we have a cache miss. The data is then searched for in main memory, which may take ten or more cycles. For that many cycles the pipeline has to stall, and this is a memory delay hazard. A cache miss also delays all the subsequent instructions.
3. Branch Delay
Suppose four instructions I1, I2, I3, I4 are pipelined in sequence. The instruction I1 is a branch instruction and its target instruction is Ik. Processing starts, instruction I1 is fetched and decoded, and the target address is computed at the 4th stage, in cycle t3.
But by then the instructions I2, I3 and I4 have already been fetched, in cycles 1, 2 and 3, before the target branch address was computed. Since I1 is found to be a branch instruction, the instructions I2, I3 and I4 have to be discarded, because the instruction Ik must be processed next after I1. So this delay of the three cycles 1, 2 and 3 is a branch delay.
Prefetching the target branch address reduces the branch delay. For example, if the branch target is already identified at the decode stage, the branch delay reduces to 1 clock cycle.
4. Resource Limitation
If two instructions request access to the same resource in the same clock cycle, one of the instructions has to stall and let the other use the resource. This stalling is due to resource limitation. However, it can be prevented by adding more hardware.
Advantages
■ More instructions can be completed per unit time, so the throughput of the system increases.
■ The hardware never sits idle: in every clock cycle each stage can be working on some instruction, which improves resource utilization.
Key Takeaways
This is all about pipelining. So, basically, pipelining is used to improve the performance of the system by improving its efficiency.
1. Arithmetic Pipeline :
An arithmetic pipeline divides an arithmetic operation into sub-operations executed in successive pipeline segments; floating-point addition is the classic example.
First of all, the two exponents are compared and the larger of the two is chosen as the result exponent. The difference between the exponents then decides how many places we must shift the mantissa of the number with the smaller exponent to the right. After this shifting, the two mantissas are aligned. Finally the addition of the two mantissas takes place, followed by normalisation of the result in the last segment.
Example:
Let us consider two numbers,
X=0.3214*10^3 and Y=0.4500*10^2
Explanation:
First of all, the two exponents are subtracted to give 3-2=1. Thus 3 becomes the exponent of the result, and the mantissa of the smaller number is shifted 1 place to the right to give
Y=0.0450*10^3
Finally the two numbers are added to produce
Z=0.3664*10^3
As the result is already normalised, it remains the same.
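The four segments can be sketched in C using the decimal representation of the example (mantissa and power-of-ten exponent kept separately). This is a minimal illustration of the segment ordering under that representation, not a real floating-point unit:

```c
#include <stdio.h>

int main(void) {
    double mx = 0.3214, my = 0.4500;   /* X = 0.3214 * 10^3 */
    int    ex = 3,      ey = 2;        /* Y = 0.4500 * 10^2 */

    /* Segment 1: compare the exponents; the larger one is kept. */
    int diff = ex - ey;                /* 3 - 2 = 1 */
    int ez   = (diff >= 0) ? ex : ey;

    /* Segment 2: align the mantissas by shifting the mantissa of
       the number with the smaller exponent to the right.         */
    for (int i = 0; i < (diff >= 0 ? diff : -diff); i++) {
        if (diff >= 0) my /= 10.0; else mx /= 10.0;
    }                                  /* Y becomes 0.0450 * 10^3 */

    /* Segment 3: add the aligned mantissas. */
    double mz = mx + my;               /* 0.3664 */

    /* Segment 4: normalise so the mantissa stays below 1.0. */
    while (mz >= 1.0) { mz /= 10.0; ez++; }

    printf("Z = %.4f * 10^%d\n", mz, ez);   /* Z = 0.3664 * 10^3 */
    return 0;
}
```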
2. Instruction Pipeline :
In this, a stream of instructions is executed by overlapping the fetch, decode and execute phases of the instruction cycle. This technique is used to increase the throughput of the computer system. An instruction pipeline reads instructions from memory while previous instructions are being executed in other segments of the pipeline. Thus we can execute multiple instructions simultaneously. The pipeline is more efficient if the instruction cycle is divided into segments of equal duration.
In the most general case, the computer needs to process each instruction in the following sequence of steps:
1. Fetch the instruction from memory (FI)
2. Decode the instruction (DA)
3. Calculate the effective address
4. Fetch the operands from memory (FO)
5. Execute the instruction (EX)
6. Store the result in the proper place
In a typical four-segment version of this pipeline, steps 2 and 3 are combined into one decode/effective-address segment (DA), and steps 5 and 6 into the execute segment (EX), which is why only four labels appear above.
The flowchart for instruction pipeline is shown below.
Vector processing
Vector processing performs arithmetic operations on large arrays of integers or floating-point numbers. Vector processing operates on all the elements of the array in parallel, provided each pass is independent of the others. Vector processing avoids the overhead of the loop-control mechanism that occurs in general-purpose computers.
Introduction
We need computers that can quickly solve mathematical problems for us, including arithmetic operations on large arrays of integers or floating-point numbers. A general-purpose computer would use loops to operate on such an array; but for a large array, the loop causes overhead for the processor.
To avoid the overhead of processing loops and to speed up the computation, some kind of parallelism must be introduced. Vector processing operates on the entire array in just one operation, i.e. it operates on the elements of the array in parallel. But vector processing is possible only if the operations performed in parallel are independent of one another.
Look at the figure below and compare vector processing with general computer processing; you will notice the difference. The instructions in both blocks are set to add two arrays and store the result in a third array. Vector processing adds both arrays in parallel, avoiding the use of the loop.
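As a rough sketch of this comparison (the array names a, b and c and the length 100 are illustrative assumptions), the general-purpose version is the C loop below, while the vector version is the single conceptual operation shown in the comment:

```c
#include <stdio.h>

#define N 100

int main(void) {
    double a[N], b[N], c[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* General-purpose computer: the loop issues fetch/decode/execute
       plus loop control (increment, compare, branch) for every
       single element.                                              */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    /* A vector processor would express the same work roughly as
           C(1:100) = A(1:100) + B(1:100)
       one vector instruction, no per-element loop control, with all
       the element additions independent and performed in parallel. */

    printf("c[99] = %.1f\n", c[99]);   /* 99 + 198 = 297.0 */
    return 0;
}
```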
Operating on multiple data items with just one instruction is also called Single Instruction Multiple Data (SIMD), and such instructions are also termed vector instructions. The data for a vector instruction are stored in vector registers. Each vector register is capable of storing several data elements at a time. These several data elements in a vector register are termed a vector operand; so, if there are n elements in a vector operand, then n is the length of the vector.
Each element of the vector operand is a scalar quantity, which can be an integer, a floating-point number, a logical value or a character. Below we have classified the vector instructions into four types.
Here, V represents the vector operands and S the scalar operands. In the figure below, O1 and O2 are unary operations and O3 and O4 are binary operations.
Most vector instructions are pipelined, since a vector instruction performs the same operation on different data sets repeatedly. As the pipeline has a start-up delay, longer vectors perform better here. Pipelined vector processors can be classified into two types based on where the operands are fetched from for vector processing. The two architectural classifications are Memory-to-Memory and Register-to-Register.
Vector Instruction
A vector instruction typically contains the following fields; an illustrative sketch of such a format appears after the list.
1. Operation Code
Operation code indicates the operation that has to be performed in the given instruction.
It decides the functional unit for the specified operation or reconfigures the multifunction
unit.
2. Base Address
The base address field refers to the memory location from which the operands are to be fetched or to which the result has to be stored; the base address is found in memory-reference vector instructions. When the operands and the result are instead held in vector registers, the base address field designates the vector register.
3. Address Increment
A vector operand has several data elements, and the address increment specifies how to find the address of the next element in the operand. Some computers store the data elements consecutively in main memory, in which case the increment is always 1. Computers that do not store the data elements consecutively require a variable address increment.
4. Address Offset
The address offset is always specified relative to the base address. The effective memory address is calculated using the address offset.
5. Vector Length
Vector length specifies the number of elements in a vector operand. It identifies the
termination of a vector instruction.
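As an illustration only, the five fields could be collected in a structure like the one below; the field names follow the list above, but the layout is an assumption for this sketch, not any real machine's instruction format:

```c
#include <stdio.h>

/* Hypothetical vector instruction format, one member per field. */
struct vector_instruction {
    unsigned opcode;        /* 1. operation / functional-unit select  */
    unsigned base_address;  /* 2. memory address or vector register   */
    int      increment;     /* 3. distance to the next data element   */
    int      offset;        /* 4. added to the base -> effective addr */
    unsigned length;        /* 5. number of elements in the operand   */
};

int main(void) {
    /* A hypothetical "vector add" over 100 consecutive elements. */
    struct vector_instruction vadd = {0x01, 0x2000, 1, 0, 100};

    /* Effective address of element i = base + offset + i * increment. */
    unsigned first = vadd.base_address + vadd.offset;
    unsigned last  = first + (vadd.length - 1) * vadd.increment;

    printf("operand elements at 0x%X .. 0x%X\n", first, last);
    return 0;
}
```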
Improving Performance
In vector processing we come across two overheads: setup time and flushing time. When the vector processing is pipelined, the time required to route the vector operands to the functional unit is called the setup time. The flushing time is the duration a vector instruction takes from its decoding until its first result comes out of the pipeline.
The vector length also affects the efficiency of processing, as a longer vector incurs the overhead of being subdivided for processing.
For obtaining better performance, optimized object code must be produced in order to utilize the pipeline resources to the maximum. This can be done in the following ways:
1. Improving the vector instructions
We can improve the vector instructions by reducing memory accesses and maximizing resource utilization.
2. Integrating the scalar instructions
Scalar instructions of the same type must be integrated as a batch, as this reduces the overhead of reconfiguring the pipeline again and again.
3. Algorithm
Choose the algorithm that works faster for vector pipelined processing.
4. Vectorizing Compiler
A vectorizing compiler regenerates the parallelism that is lost when a program is written in a conventional sequential language. The program passes through the following stages:
Parallel Algorithm(A)
High-level Language(L)
Object Code(O)
Target Machine Code(M)
You can see a parameter in the parentheses at each stage which denotes the degree of parallelism. In the ideal situation, the parameters are expected to be in the order A≥L≥O≥M.
Key Takeaways
So, this is how vector processing allows parallel operations on large arrays and speeds up processing.
Array Processors
An Array Processor performs computations on large arrays of data. There are two types of array processors: the Attached Array Processor and the SIMD Array Processor. These are explained below.
1. Attached Array Processor :
To improve the performance of the host computer in numerical computation tasks, an auxiliary processor is attached to it.
2. SIMD Array Processor :
An SIMD array processor is a computer with multiple processing units operating in parallel. The processing units are synchronized to perform the same operation under the control of a common control unit, thus providing a single instruction stream, multiple data stream (SIMD) organization. As shown in the figure, it contains a set of identical processing elements (PEs), each having a local memory M.
Each PE includes –
ALU
Floating point arithmetic unit
Working registers
The master control unit controls the operation in the PEs. Its function is to decode the instructions and determine how they are to be executed. If an instruction is a scalar or program-control instruction, it is executed directly within the master control unit. Main memory is used for storage of the program, while each PE uses the operands stored in its own local memory.
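A minimal, purely conceptual simulation of this organization: one "instruction" is decoded once and then applied by every PE to the operands in its own local memory, in lock-step. The number of PEs and the operand values are illustrative assumptions:

```c
#include <stdio.h>

#define PES 4   /* number of identical processing elements */

int main(void) {
    /* Each PE has a local memory holding its own operands. */
    int local_a[PES] = {1, 2, 3, 4};
    int local_b[PES] = {10, 20, 30, 40};
    int local_c[PES];

    /* The master control unit decodes one instruction ("add")
       and broadcasts it; every PE performs the same operation
       on the operands in its own local memory.                */
    for (int pe = 0; pe < PES; pe++)
        local_c[pe] = local_a[pe] + local_b[pe];

    for (int pe = 0; pe < PES; pe++)
        printf("PE%d: %d\n", pe, local_c[pe]);
    return 0;
}
```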