Unit 6 - Pipeline, Vector Processing and Multiprocessors
SIMD represents an organization that includes many processing units under the supervision
of a common control unit. All processors receive the same instruction from the control unit
but operate on different items of data. The shared memory unit must contain multiple
modules so that it can communicate with all the processors simultaneously.
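The lockstep behavior described above can be sketched in a few lines. This is an illustrative simulation, not from the text: one "control unit" broadcasts a single instruction, and every processing element applies it to its own local data item in the same step.

```python
# Sketch of SIMD lockstep execution: the same instruction is broadcast
# to all processing elements; each operates on a different data item.

def simd_step(instruction, data_items):
    """Broadcast one instruction to every processing element."""
    return [instruction(x) for x in data_items]  # all PEs act in lockstep

# Four processing elements, each holding a different data item:
local_data = [1, 2, 3, 4]

# The control unit issues "add 10" to all of them at once:
result = simd_step(lambda x: x + 10, local_data)
print(result)  # [11, 12, 13, 14]
```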
MISD structure is only of theoretical interest since no practical system has been constructed
using this organization.
1. Resource conflicts caused by access to memory by two segments at the same time.
Most of these conflicts can be resolved by using separate instruction and data memories.
2. Data dependency conflicts arise when an instruction depends on the result of a previous
instruction, but this result is not yet available.
3. Branch difficulties arise from branch and other instructions that change the value of the program counter (PC).
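The data-dependency conflict in item 2 can be made concrete with a small sketch (illustrative names, not from the text): an instruction that reads a register which the previous instruction writes creates a read-after-write (RAW) dependency, forcing the pipeline to stall or forward the result.

```python
# Sketch: detecting a read-after-write (RAW) data dependency between two
# adjacent instructions. Each instruction is modeled as
# (destination_register, source_registers).

def has_raw_hazard(instr_a, instr_b):
    """True if instr_b reads the register that instr_a writes."""
    dest_a, _ = instr_a
    _, sources_b = instr_b
    return dest_a in sources_b

add = ("R1", ("R2", "R3"))   # ADD R1, R2, R3
sub = ("R4", ("R1", "R5"))   # SUB R4, R1, R5  -- needs R1 before it is ready
print(has_raw_hazard(add, sub))  # True: the pipeline must stall or forward
```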
(a) Multiple streams - A brute-force approach that replicates the initial portions of the pipeline, allowing it to fetch instructions from both paths of the branch and maintain two instruction streams until the branch outcome is known.
(b) Prefetch branch target - When a conditional branch is recognized, the target of the
branch is prefetched, in addition to the instruction following the branch. This target is then
saved until the branch instruction is executed. If the branch is taken, the target has already
been prefetched.
(c) Branch prediction - Uses additional logic to predict the outcome of a conditional branch before it is executed. Popular approaches are: predict never taken, predict always taken, predict by opcode, the taken/not-taken switch, and the branch history table.
(d) Delayed branch - This technique is employed in most RISC processors. The compiler detects the branch instructions and rearranges the instruction sequence, inserting useful instructions into the delay slots to avoid pipeline hazards.
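The taken/not-taken switch mentioned under branch prediction is commonly described as a 2-bit saturating counter; the following is a sketch of that textbook formulation (details assumed). States 0-1 predict "not taken" and states 2-3 predict "taken", so a single mispredict does not flip a strongly-biased prediction.

```python
# Sketch of a 2-bit saturating-counter branch predictor (the
# "taken / not taken switch"). Each actual outcome nudges the counter
# toward "taken" (up) or "not taken" (down).

class TwoBitPredictor:
    def __init__(self):
        self.state = 2  # start in "weakly taken"

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

p = TwoBitPredictor()
correct = 0
for taken in [True, True, False, True]:  # e.g. a loop branch
    if p.predict() == taken:
        correct += 1
    p.update(taken)
print(correct)  # 3 of 4 branches predicted correctly
```

The one miss is the single not-taken outcome; because the counter was saturated at "strongly taken", the very next prediction is still "taken" and is correct again.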
In many science and engineering applications, the problems can be formulated in terms of
vectors and matrices that lend themselves to vector processing.
To achieve the required level of high performance it is necessary to utilize the fastest and
most reliable hardware and apply innovative procedures from vector and parallel processing
techniques.
Consider a program for adding two vectors A and B of length 100 to produce a vector C.
A computer capable of vector processing eliminates the overhead associated with the time it takes to fetch and execute the instructions in the program loop. It allows the operation to be specified with a single vector instruction of the form
C(1:100) = A(1:100) + B(1:100)
The vector instruction includes the initial address of the operands, the length of the vectors, and the operation to be
performed, all in one composite instruction.
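The loop overhead that the vector instruction removes is easy to see in a scalar sketch of the same computation (illustrative, not from the text): every iteration repeats the loop-control work, while the single vector instruction specifies the whole operation once.

```python
# Sketch: the conventional scalar loop that the single vector instruction
# C(1:100) = A(1:100) + B(1:100) replaces. Each iteration re-executes the
# increment/compare/branch overhead around one useful addition.

A = list(range(100))         # sample operand vectors
B = list(range(100, 200))
C = [0] * 100

i = 0
while i < 100:               # loop-control overhead, repeated 100 times
    C[i] = A[i] + B[i]       # the one useful operation per iteration
    i += 1

print(C[0], C[99])  # 100 298
```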
For example, the number in the first row and first column of the product matrix C is calculated by letting i = 1, j = 1, to obtain
c11 = a11b11 + a12b21 + a13b31 + ... (one product term for each column of A)
Inner Product
In general, the inner product consists of the sum of k product terms of the form
C = A1B1 + A2B2 + A3B3 + ... + AkBk
In a typical application k may be equal to 100 or even 1000. On a pipeline vector processor with a four-segment floating-point adder, the sum can be accumulated as four interleaved partial sums that are added together when the loop ends:
C = (A1B1 + A5B5 + A9B9 + ...) + (A2B2 + A6B6 + ...) + (A3B3 + A7B7 + ...) + (A4B4 + A8B8 + ...)
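The partial-sum scheme can be simulated in software (a sketch, assuming a four-segment pipelined adder as above): terms 1, 5, 9, ... accumulate into one partial sum, terms 2, 6, 10, ... into the next, and so on, so a new product can enter the adder every cycle.

```python
# Sketch: inner product accumulated as four interleaved partial sums,
# mimicking a four-segment pipelined floating-point adder.

def pipelined_inner_product(a, b, segments=4):
    partial = [0.0] * segments
    for i in range(len(a)):
        partial[i % segments] += a[i] * b[i]  # terms 1,5,9,... share partial 0, etc.
    return sum(partial)  # combine the four partial sums at the end

a = [1.0] * 8
b = [2.0] * 8
print(pipelined_inner_product(a, b))  # 16.0
```

The result is identical to a sequential accumulation; only the order of the additions changes, which is what lets the pipeline stay full.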
Let’s take an example of a pipeline unit for floating-point addition and subtraction. The
inputs to the floating-point adder pipeline are two normalized floating-point binary numbers.
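The four segments commonly used for such a pipeline are: compare exponents, align mantissas, add mantissas, and normalize the result. The following sketch walks one pair of operands through those steps (decimal mantissa/exponent pairs are used for readability; real hardware works in binary).

```python
# Sketch of the four floating-point adder pipeline segments. Numbers are
# held as (mantissa, exponent) pairs with base-10 exponents here.

def fp_add(x, y):
    (mx, ex), (my, ey) = x, y
    # Segment 1: compare exponents (operand with the larger exponent wins)
    if ex < ey:
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    # Segment 2: align the mantissa of the smaller operand
    my = my / (10 ** (ex - ey))
    # Segment 3: add the mantissas
    m, e = mx + my, ex
    # Segment 4: normalize so that 0.1 <= mantissa < 1
    while m >= 1.0:
        m /= 10.0
        e += 1
    while 0 < m < 0.1:
        m *= 10.0
        e -= 1
    return (m, e)

# 0.9504 x 10^3 + 0.8200 x 10^2: align to 0.0820 x 10^3, add to get
# 1.0324 x 10^3, normalize to mantissa ~0.10324 with exponent 4.
print(fp_add((0.9504, 3), (0.8200, 2)))
```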
2. Multiport Memory
A multiport memory system employs separate buses between each memory module and each CPU.
3. Crossbar Switch
It consists of a number of crosspoints placed at the intersections between processor buses and memory module paths; closing a crosspoint connects a processor to a memory module.
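The key property of the crossbar organization is that transfers to different memory modules can proceed simultaneously; only requests for the same module conflict. A small sketch of that arbitration (illustrative names and policy, not from the text):

```python
# Sketch: crossbar arbitration. Every processor can reach a different
# memory module at the same time; only requests for the SAME module
# conflict, and here the first requester wins while the rest retry.

def crossbar_grant(requests):
    """requests: {processor: memory_module}. Returns the granted subset."""
    granted, busy = {}, set()
    for cpu, module in requests.items():
        if module not in busy:       # crosspoint free: close the switch
            granted[cpu] = module
            busy.add(module)
    return granted

# Three CPUs, two of them contending for module M1:
reqs = {"P1": "M1", "P2": "M2", "P3": "M1"}
print(crossbar_grant(reqs))  # {'P1': 'M1', 'P2': 'M2'} -- P3 must retry
```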