Differentiate Organization and Architecture: Advanced Computer Architecture Assignment 1
A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the RISC
instruction set. The 5 stages of the RISC pipeline, with their respective operations, are as follows:
Consider a ‘k’-segment pipeline with clock cycle time ‘Tp’, and let there be ‘n’ tasks to be
completed in the pipelined processor. The first instruction takes ‘k’ cycles to come out of the
pipeline, but each of the other ‘n – 1’ instructions completes only ‘1’ cycle after the one before it,
i.e., they take a total of ‘n – 1’ further cycles. So, the time taken to execute ‘n’ instructions in a
pipelined processor is:
ETpipeline = k + n – 1 cycles
= (k + n – 1) Tp
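A small space-time diagram makes the count of k + n – 1 cycles concrete. The following is a minimal Python sketch, assuming an ideal pipeline with no stalls or hazards; the stage names IF, ID, EX, MEM and WB are the classic five RISC stages, used here only for illustration, and the names `STAGES`, `space_time` and `total_cycles` are hypothetical.

```python
# Space-time diagram of an ideal k-stage pipeline (no stalls, no hazards).
# The five stage names are the classic RISC stages, assumed for illustration.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def total_cycles(n, k):
    """First result after k cycles, then one more result per cycle."""
    return k + n - 1


def space_time(n, stages=STAGES):
    """Return rows[i][c]: the stage instruction i occupies in cycle c ('' if none)."""
    k = len(stages)
    rows = []
    for i in range(n):
        row = [""] * total_cycles(n, k)
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i is in stage s during cycle i + s
        rows.append(row)
    return rows


if __name__ == "__main__":
    diagram = space_time(4)  # n = 4 instructions, k = 5 stages -> 8 cycles
    print("cycle " + " ".join(f"{c + 1:>4}" for c in range(len(diagram[0]))))
    for i, row in enumerate(diagram):
        print(f"I{i + 1:<4} " + " ".join(f"{cell:>4}" for cell in row))
```

Each row is shifted one cycle to the right of the row above, so the last instruction leaves WB in cycle k + n – 1 = 8.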
In the same case, a non-pipelined processor spends all ‘k’ cycles on one instruction before starting the next, so the execution time of ‘n’ instructions is:
ETnon-pipeline = n * k * Tp
So, the speedup (S) of the pipelined processor over a non-pipelined processor, when ‘n’ tasks are
executed on the same processor, is:
S = Performance of pipelined processor / Performance of non-pipelined processor
As the performance of a processor is inversely proportional to its execution time, we have:
S = ETnon-pipeline / ETpipeline
=> S = [n * k * Tp] / [(k + n – 1) * Tp]
S = [n * k] / [k + n – 1]
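For instance, with illustrative values (not from the source) of k = 5 stages and n = 100 tasks, the exact speedup is already close to k:

```latex
k = 5,\; n = 100:\qquad
S = \frac{n k}{k + n - 1} = \frac{100 \times 5}{5 + 100 - 1} = \frac{500}{104} \approx 4.81
```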
When the number of tasks ‘n’ is much larger than ‘k’ (n >> k), k + n – 1 ≈ n, so:
S ≈ n * k / n
S ≈ k
where ‘k’ is the number of stages in the pipeline. So the speedup of a ‘k’-stage pipeline approaches, but never exceeds, ‘k’.
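The limit can also be checked numerically. This Python sketch (the function names are hypothetical) computes both execution times from the formulas above and the resulting speedup for a 5-stage pipeline as n grows; the cycle time Tp cancels out of the ratio, so `speedup` does not take it as a parameter.

```python
def pipeline_time(n, k, tp):
    """ET_pipeline = (k + n - 1) * Tp."""
    return (k + n - 1) * tp


def non_pipeline_time(n, k, tp):
    """ET_non-pipeline = n * k * Tp (each task uses all k cycles alone)."""
    return n * k * tp


def speedup(n, k):
    """S = n*k / (k + n - 1); Tp cancels between the two execution times."""
    return (n * k) / (k + n - 1)


if __name__ == "__main__":
    k = 5
    for n in (1, 10, 1_000, 100_000):
        print(f"n={n:>7}  S={speedup(n, k):.4f}")
```

For k = 5, S is exactly 1 when n = 1 (a single task gains nothing from overlap) and climbs toward, while staying below, 5 as n grows.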
Chaining starts when a match occurs between one of the V register operand designators of an
instruction awaiting issue in CIP (the Current Instruction Parcel register) and the V register result
designator of a previously issued instruction which has not yet returned its first result element. When this element becomes
available for delivery to the result register, the instruction in CIP is issued (provided there are no
other hold-ups) and the result element is forwarded with this instruction to the appropriate
functional unit. Successive elements follow until the whole vector has been both written into its
result register and forwarded to the second functional unit. The results of this second vector
operation may themselves be chained into a third operation, and so on, as shown in the example
in diagram (a).
Diagram (a) shows a schematic representation of the execution of the instruction sequence
V0 <- Memory
V1 <- V0 * S1
V3 <- V1 + V2
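Functionally, the sequence computes V3[i] = Memory[i] * S1 + V2[i] for each of the 64 elements. The minimal Python sketch below models only the order in which elements flow under chaining, not clock timing: each operation is a generator that writes an element into its result register and immediately forwards it, so element 0 of V3 is produced before element 1 is even read from memory. The function names, the memory contents and the values of S1 and V2 are hypothetical.

```python
VL = 64  # vector length used in the text


def load(memory, v0, trace):
    """Memory read 'pseudo functional unit': fill V0 and forward each element."""
    for i in range(VL):
        v0.append(memory[i])          # write into result register V0 ...
        trace.append(f"V0[{i}]")
        yield memory[i]               # ... and forward it to the multiplier


def mul_scalar(src, s1, v1, trace):
    """V1 <- V0 * S1; s1 is fixed when the operation starts."""
    for i, x in enumerate(src):
        v1.append(x * s1)
        trace.append(f"V1[{i}]")
        yield v1[-1]                  # forward to the adder


def add_vec(src, v2, v3, trace):
    """V3 <- V1 + V2."""
    for i, x in enumerate(src):
        v3.append(x + v2[i])
        trace.append(f"V3[{i}]")
        yield v3[-1]


def run_chain(memory, s1, v2):
    v0, v1, v3, trace = [], [], [], []
    chain = add_vec(mul_scalar(load(memory, v0, trace), s1, v1, trace), v2, v3, trace)
    for _ in chain:                   # pull elements through all three operations
        pass
    return v0, v1, v3, trace


if __name__ == "__main__":
    memory = [float(i) for i in range(VL)]
    v0, v1, v3, trace = run_chain(memory, 2.0, [1.0] * VL)
    print(trace[:6])
```

The trace interleaves V0, V1 and V3 element by element, which is the behaviour chaining provides in hardware; an unchained machine would instead finish all of V0 before starting V1.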
The first instruction causes 64 operands from a designated area in memory to be read out and
copied in sequence into the 64 element positions in V0. Memory requests are pipelined in such a
way that the store (main memory) appears to the processor as a pseudo functional unit. Thus, after a
start-up delay of seven clock periods, the first element of the vector from store becomes available for
delivery to V0, and successive elements follow in successive clock periods.
In the clock period following the issue of the first instruction, the second instruction in the
sequence is copied into CIP, but the reservation on V0 prevents it from being issued
immediately. However, this reservation is lifted during the clock period in which the first vector
element arrives from store ready for delivery to V0, allowing the instruction to issue. This
clock period is known as the chain slot time. Chaining allows the vector elements being copied into
V0 to flow directly from the memory read pipeline into the Floating-point Multiply Unit pipeline,
where each element is multiplied by the value taken from S1 at the start of the operation, to
produce the vector V1.
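The issue rule just described can be sketched as a small function. This is a simplified, hypothetical model in Python (`Reservation` and `issue_cycle` are illustrative names, not a real interface): an instruction waiting in CIP issues no earlier than the chain slot time of any outstanding result whose V register designator matches one of its operand designators. Clock 0 is taken, by assumption, to be the issue of V0 <- Memory.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    """An outstanding vector result of a previously issued instruction (hypothetical model)."""
    result_reg: str           # V register result designator, e.g. "V0"
    first_element_cycle: int  # clock period its first element is ready for delivery


def issue_cycle(operands, ready_cycle, reservations):
    """Clock period in which an instruction waiting in CIP can issue.

    operands:      V register operand designators of the waiting instruction
    ready_cycle:   clock period in which the instruction reached CIP
    reservations:  outstanding results of previously issued instructions
    Assumes the instruction is already in CIP by the chain slot time, as in the
    text's example; other hold-ups and a missed chain slot are not modelled.
    """
    cycle = ready_cycle
    for r in reservations:
        if r.result_reg in operands:                   # designators match
            cycle = max(cycle, r.first_element_cycle)  # issue at chain slot time
    return cycle


if __name__ == "__main__":
    # Clock 0: V0 <- Memory issues; after the seven-clock start-up delay its
    # first element is ready for delivery to V0 in clock 7.
    pending = [Reservation("V0", 7)]
    # Clock 1: V1 <- V0 * S1 reaches CIP but is held by the reservation on V0.
    print(issue_cycle({"V0"}, 1, pending))  # 7
```

An instruction whose operands do not match any reservation (for example one reading only V2) would issue in the clock it reached CIP under this model.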
The third instruction in the sequence becomes ready for issue in the clock period following issue
of the second instruction, and it too is held up by a reservation on one of its input operands, this
time V1. When the first element of V1 appears from the Floating-point Multiply Unit, the
reservation on V1 is lifted, allowing this third instruction to issue. Now the elements emanating
from the Floating-point Multiply Unit can flow directly into the Floating-point Add Unit pipeline
as well as into the result register V1. Thus the memory read pipeline and the Floating-point
Multiply and Floating-point Add Unit pipelines are all chained together to produce the elements
of V3. This makes the need for all pipelines to have the same segment time very apparent: each
element passes from one pipeline directly into the next in lock-step, so pipelines such as those in
the CDC 7600, which have different segment times, could not be chained together in this way.
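A rough timing model shows what chaining buys for this sequence. The Python sketch below assumes every pipeline accepts and delivers one element per clock period (the common segment time chaining requires) and that an operation started in clock `start` delivers element i in clock start + latency + i. The seven-clock memory start-up comes from the text; the multiply and add start-up latencies of 7 and 6 clock periods are assumed values for illustration, and register-transfer delays are ignored.

```python
VL = 64  # vector length used in the text

# Start-up latencies in clock periods: memory read (7, from the text), then
# Floating-point Multiply and Floating-point Add (7 and 6, assumed values).
LATENCIES = (7, 7, 6)


def completion_cycle(latencies, chained, vl=VL):
    """Clock period in which the last element of the final vector appears.

    Every pipeline takes one element per clock (same segment time), so an
    operation started in clock `start` delivers element i in clock
    start + latency + i. Clock 0 is the issue of the first instruction.
    Chained:   the next operation starts when its first input element appears.
    Unchained: it starts one clock after the last input element is delivered.
    """
    start = 0
    for lat in latencies[:-1]:
        start += lat if chained else lat + vl
    return start + latencies[-1] + vl - 1


if __name__ == "__main__":
    print("chained:  ", completion_cycle(LATENCIES, chained=True))   # 83
    print("unchained:", completion_cycle(LATENCIES, chained=False))  # 211
```

Under these assumptions the last element of V3 appears in clock period 83 with chaining versus 211 without it: chaining overlaps the three 64-element streams, so the vector length is paid for roughly once instead of three times.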