COA Unit-5
(CS304PC)
D.Koteshwar Rao
Assistant Professor,
Department of ECE
Pipelining
General Considerations
The horizontal axis displays the time in clock cycles and the
vertical axis gives the segment number.
The diagram shows six tasks T1 through T6 executed in
four segments. Initially, task T1 is handled by segment 1.
After the first clock, segment 2 is busy with T1, while
segment 1 is busy with task T2.
Continuing in this manner, the first task T1 is completed
after the fourth clock cycle. From then on, the pipe
completes a task every clock cycle. Once the pipeline is full,
it takes only one clock period to obtain an output.
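The behavior described above can be reproduced with a short sketch. It assumes that task i occupies segment j during clock cycle i + j - 1, which matches the description of T1 and T2; the function names `space_time_diagram` and `render` are illustrative and not part of the original notes.

```python
def space_time_diagram(k, n):
    """Build a grid with k rows (segments) and k + n - 1 columns (clock cycles).

    grid[s][c] holds the task number occupying segment s + 1 during
    clock cycle c + 1, or None when that segment is idle.
    """
    total_cycles = k + n - 1
    grid = [[None] * total_cycles for _ in range(k)]
    for task in range(1, n + 1):
        for segment in range(1, k + 1):
            # Assumption: task i is in segment j during cycle i + j - 1.
            cycle = task + segment - 1
            grid[segment - 1][cycle - 1] = task
    return grid


def render(grid):
    """Format the grid as text, one line per segment."""
    lines = []
    for s, row in enumerate(grid, start=1):
        cells = " ".join(f"T{t}" if t is not None else "--" for t in row)
        lines.append(f"Segment {s}: {cells}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Six tasks in four segments, as in the diagram discussed above.
    print(render(space_time_diagram(k=4, n=6)))
```

Reading the last row of the output shows when each task leaves the pipe: T1 in the fourth cycle, then one task per cycle.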
Consider the case where a k-segment pipeline with a clock
cycle time t_p is used to execute n tasks. The first task T1
requires a time equal to k·t_p to complete its operation since
there are k segments in the pipe.
The remaining n - 1 tasks emerge from the pipe at the rate
of one task per clock cycle and they will be completed after
a time equal to (n - 1)·t_p.
Therefore, to complete n tasks using a k-segment pipeline
requires k + (n - 1) clock cycles.
Consider a nonpipeline unit that performs the same
operation and takes a time equal to t_n to complete each task.
The total time required for n tasks is n·t_n.
The speedup of pipeline processing over equivalent
nonpipeline processing is defined by the ratio
S = (n·t_n) / ((k + n - 1)·t_p)
As the number of tasks increases, n becomes much larger
than k - 1, and k + n - 1 approaches the value of n. Under this
condition, the speedup becomes
S = t_n / t_p
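The timing relations above can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the sample values (4 segments, a 20 ns clock, 100 tasks, and a nonpipelined task time of 80 ns) are assumptions chosen for the example, not figures from these notes.

```python
def pipeline_time(k, n, tp):
    """Time for n tasks on a k-segment pipeline: (k + n - 1) clock cycles of length tp."""
    return (k + n - 1) * tp


def nonpipeline_time(n, tn):
    """Time for n tasks on a nonpipelined unit that needs tn per task."""
    return n * tn


def speedup(k, n, tp, tn):
    """Ratio of nonpipelined time to pipelined time for the same n tasks."""
    return nonpipeline_time(n, tn) / pipeline_time(k, n, tp)


if __name__ == "__main__":
    # Assumed sample values for illustration only.
    k, tp, tn = 4, 20, 80
    for n in (1, 10, 100, 1_000_000):
        print(n, round(speedup(k, n, tp, tn), 4))
```

With these assumed values the speedup starts at 1 for a single task and approaches tn / tp = 4 as n grows, which is the limiting case described in the text.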
Vector Processing
Matrix Multiplication
For example, the number in the first row and first column of
matrix C is calculated by letting i = 1, j = 1, to obtain
C11 = a11 b11 + a12 b21 + a13 b31
This requires three multiplications and (after initializing C11
to 0) three additions.
The total number of multiplications or additions required to
compute the matrix product is 9 x 3 = 27.
If we consider the linked multiply-add operation c + a x b as
a cumulative operation, the product of two n x n matrices
requires n^3 multiply-add operations.
The computation consists of n^2 inner products, with each
inner product requiring n multiply-add operations, assuming
that c is initialized to zero before computing each element in
the product matrix.
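The operation count above can be confirmed with a direct implementation. This is a minimal sketch, not an optimized routine; the function name `matmul_count` and the sample matrices are assumptions made for illustration.

```python
def matmul_count(a, b):
    """Multiply two n x n matrices given as lists of lists.

    Returns the product and the number of multiply-add operations performed.
    """
    n = len(a)
    # Each element of the product starts at zero, as the text assumes.
    c = [[0] * n for _ in range(n)]
    ops = 0
    for i in range(n):
        for j in range(n):
            # Inner product of row i of a with column j of b: n multiply-adds.
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
                ops += 1
    return c, ops


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    product, ops = matmul_count(a, identity)
    print(product, ops)
```

For 3 x 3 inputs the counter reaches 27, that is, 9 inner products of 3 multiply-adds each.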
In general, the inner product consists of the sum of k
product terms of the form
C = A1 B1 + A2 B2 + A3 B3 + ... + Ak Bk
Multiprocessors
Characteristics of Multiprocessors
A multiprocessor system is an interconnection of two or
more CPUs with memory and input-output equipment.
The term "processor" in multiprocessor can mean either a
central processing unit (CPU) or an input-output processor
(IOP).
However, a system with a single CPU and one or more IOPs
is usually not included in the definition of a multiprocessor
system unless the IOP has computational facilities
comparable to a CPU.
As it is most commonly defined, a multiprocessor system
implies the existence of multiple CPUs, although usually
there will be one or more IOPs as well.
A multiprocessor system is controlled by one operating
system that provides interaction between processors, and all
the components of the system cooperate in the solution of a
problem.
Multiprocessing improves the reliability of the system so that
a failure or error in one part has a limited effect on the rest of
the system.
If a fault causes one processor to fail, a second processor can
be assigned to perform the functions of the disabled
processor. The system as a whole can continue to function
correctly with perhaps some loss in efficiency.
The benefit derived from a multiprocessor organization is
improved system performance.
The system derives its high performance from the fact that
computations can proceed in parallel in one of two ways.
1. Multiple independent jobs can be made to operate in parallel.
2. A single job can be partitioned into multiple parallel tasks.
An example is a computer system where one processor performs
the computations for an industrial process control while others
monitor and control the various parameters, such as temperature
and flow rate.
Another example is a computer where one processor performs
high speed floating-point mathematical computations and
another takes care of routine data-processing tasks.
An overall function can be partitioned into a number of
tasks that each processor can handle individually.
Multiprocessing can improve performance by decomposing
a program into parallel executable tasks.
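The idea of decomposing one job into parallel tasks can be sketched as follows. This is an illustration under stated assumptions: worker threads from Python's standard `concurrent.futures` module stand in for the processors, the job (a sum of squares) and the helper names are hypothetical, and the sketch shows the partitioning only, not a measured performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, parts):
    """Split data into at most `parts` contiguous chunks of nearly equal size."""
    size = max(1, (len(data) + parts - 1) // parts)
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum_of_squares(chunk):
    """The task each worker handles individually."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Partition one job into tasks, run them on workers, and combine the results."""
    chunks = partition(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is handed to a worker; results come back in chunk order.
        return sum(pool.map(partial_sum_of_squares, chunks))


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1, 101))))
```

The final `sum` is the step that combines the partial results, which is the part of the job that cannot be split among the workers.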
Multiprocessors are classified by the way their memory is
organized. A multiprocessor system with common shared
memory is classified as a shared memory or tightly coupled
multiprocessor.
An alternative model of multiprocessor is the
distributed-memory or loosely coupled system. Each processor
element in a loosely coupled system has its own private local
memory.
Interconnection Structures
The components that form a multiprocessor system are
CPUs, IOPs connected to input-output devices, and a
memory unit that may be partitioned into a number of
separate modules.
The interconnection between the components can have
different physical configurations, depending on the number
of transfer paths that are available between the processors
and memory in a shared memory system or among the
processing elements in a loosely coupled system.
Some physical forms available for establishing an
interconnection network are:
1. Time-shared common bus
2. Multiport memory
3. Crossbar switch
4. Multistage switching network
5. Hypercube system
Time-Shared Common Bus
A common-bus multiprocessor system consists of a number
of processors connected through a common path to a
memory unit. A time-shared common bus for five
processors is shown in the figure.
Only one processor can communicate with the memory or
another processor at any given time.
Transfer operations are conducted by the processor that is in
control of the bus at the time.
Any other processor wishing to initiate a transfer must first
determine the availability status of the bus, and only after
the bus becomes available can the processor address the
destination unit to initiate the transfer.
A command is issued to inform the destination unit what
operation is to be performed. The receiving unit recognizes
its address on the bus and responds to the control signals
from the sender, after which the transfer is initiated.
The system may exhibit transfer conflicts since one
common bus is shared by all processors.
These conflicts must be resolved by incorporating a bus
controller that establishes priorities among the requesting
units.
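A bus controller of this kind can be sketched as a simple arbiter. The sketch assumes a fixed-priority scheme in which a lower processor number means higher priority; the notes do not specify the priority rule, so this is one possible choice, and the function name `arbitrate` is hypothetical.

```python
def arbitrate(requests):
    """Grant a single shared bus to one requesting processor per bus cycle.

    requests maps a processor number to its count of pending transfers.
    Assumption: fixed priority, where the lowest processor number wins.
    Returns the list of processors granted the bus, one entry per cycle.
    """
    pending = dict(requests)
    schedule = []
    while any(count > 0 for count in pending.values()):
        requesting = [p for p, count in pending.items() if count > 0]
        winner = min(requesting)  # highest-priority requester gets the bus
        schedule.append(winner)   # only one transfer per bus cycle
        pending[winner] -= 1
    return schedule


if __name__ == "__main__":
    # Processor 1 has two pending transfers; processors 2 and 3 have one each.
    print(arbitrate({1: 2, 2: 1, 3: 1}))
```

The schedule has exactly as many entries as there are pending transfers, since the single bus carries one transfer at a time. A fixed-priority rule can keep low-priority processors waiting for long periods, which is why other schemes such as rotating priority are sometimes used instead.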
A single common-bus system is restricted to one transfer at
a time.
A more economical implementation of a dual bus structure
is depicted in the figure.
Each local bus may be connected to a CPU, an IOP, or any
combination of processors.