
SILVER OAK COLLEGE OF ENGINEERING and TECHNOLOGY

Computer Organization
Module Solution -4

Q. 21 Discuss the Instruction Pipelining with example.


Ans. ● Pipeline processing can occur in the data stream as well as in the instruction stream.
● An instruction pipeline reads consecutive instructions from memory while
previous instructions are being executed in other segments.
● This causes the instruction fetch and execute phases to overlap and perform simultaneous operations.
● One possible complication associated with such a scheme is that an instruction may cause a branch out of sequence.
● In that case the pipeline must be emptied and all the instructions that have been
read from memory after the branch instruction must be discarded.
● Consider a computer with an instruction fetch unit and an instruction execution
unit designed to provide a two-segment pipeline.
● The instruction fetch segment can be implemented by means of a first-in,
first-out (FIFO) buffer.
● The buffer acts as a queue from which control then extracts the instructions for
the execution unit.

Instruction cycle:

● An instruction cycle consists of the fetch and execute phases needed to process an instruction completely.


● In the most general case, the computer needs to process each instruction with the following sequence of steps:
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
● There are certain difficulties that will prevent the instruction pipeline from
operating at its maximum rate.
● Different segments may take different times to operate on the incoming
information.
● Some segments are skipped for certain operations.
● The design of an instruction pipeline will be most efficient if the instruction
cycle is divided into segments of equal duration.
● The time that each step takes to fulfill its function depends on the instruction
and the way it is executed.
Example: Four-Segment Instruction Pipeline

● Assume that the decoding of the instruction can be combined with the
calculation of the effective address into one segment.
● Assume further that most of the instructions place the result into processor registers so that the instruction execution and storing of the result can be combined into one segment.
● This reduces the instruction pipeline into four segments.
1. FI: Fetch an instruction from memory
2. DA: Decode the instruction and calculate the effective address of the operand
3. FO: Fetch the operand
4. EX: Execute the operation

● Figure 1 shows how the instruction cycle in the CPU can be processed with a four-segment pipeline.
● While an instruction is being executed in segment 4, the next instruction in
sequence is busy fetching an operand from memory in segment 3.
● The effective address may be calculated in a separate arithmetic circuit for the third instruction, and whenever the memory is available, the fourth and all subsequent instructions can be fetched and placed in an instruction FIFO.
● Thus up to four sub operations in the instruction cycle can overlap and up to
four different instructions can be in progress of being processed at the same
time.

Fig 1: Four-segment CPU pipeline

Figure 2 shows the operation of the instruction pipeline. The time in the horizontal axis
is divided into steps of equal duration. The four segments are represented in the
diagram with an abbreviated symbol.

Fig 2: Timing of instruction pipeline


● It is assumed that the processor has separate instruction and data memories so
that the operation in FI and FO can proceed at the same time.
● Thus, in step 4, instruction 1 is being executed in segment EX; the operand for
instruction 2 is being fetched in segment FO; instruction 3 is being decoded in
segment DA; and instruction 4 is being fetched from memory in segment FI.
● Assume now that instruction 3 is a branch instruction.
● As soon as this instruction is decoded in segment DA in step 4, the transfer from
FI to DA of the other instructions is halted until the branch instruction is
executed in step 6.
● If the branch is taken, a new instruction is fetched in step 7. If the branch is not
taken, the instruction fetched previously in step 4 can be used.
● The pipeline then continues until a new branch instruction is encountered.
● Another delay may occur in the pipeline if the EX segment needs to store the
result of the operation in the data memory while the FO segment needs to fetch
an operand.
● In that case, segment FO must wait until segment EX has finished its operation.
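The throughput gain described above can be illustrated with a short Python sketch. This is an illustrative helper, not part of the original solution; it assumes an ideal pipeline with no branch or memory stalls:

```python
def total_steps(n_instructions, n_segments=4):
    """Steps for an ideal pipeline: the first instruction takes
    n_segments steps, and each later instruction completes one
    step after the previous one."""
    return n_segments + (n_instructions - 1)

# Six instructions in the 4-segment FI-DA-FO-EX pipeline finish in
# 4 + 5 = 9 steps, versus 6 * 4 = 24 steps without pipelining.
print(total_steps(6))  # 9
```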
Q.22 What is Array Processors? Explain Attached Array Processors and
SIMD Array Processors.
Ans. ● An array processor is a processor that performs computations on large
arrays of data.
● The term is used to refer to two different types of processors, attached
array processor and SIMD array processor.
● An attached array processor is an auxiliary processor attached to a
general-purpose computer.
● It is intended to improve the performance of the host computer in specific
numerical computation tasks.
● An SIMD array processor is a processor that has a single-instruction
multiple-data organization.
● It manipulates vector instructions by means of multiple functional units
responding to a common instruction.
Attached Array Processor

● An attached array processor is designed as a peripheral for a conventional host computer, and its purpose is to enhance the performance of the computer by providing vector processing for complex scientific applications.
● It achieves high performance by means of parallel processing with
multiple functional units.
● It includes an arithmetic unit containing one or more pipelined
floating-point adders and multipliers.
● The array processor can be programmed by the user to accommodate a
variety of complex arithmetic problems.
● Figure 3 shows ​the interconnection of an attached array processor to
host computer.
Fig 3: Attached Array Processor with host computer


● The host computer is a general-purpose commercial computer and the attached processor is a back-end machine driven by the host computer.
● The array processor is connected through an input-output controller to
the computer and the computer treats it like an external interface.
● The data for the attached processor are transferred from main memory to a local memory through a high-speed bus.
● The general-purpose computer without the attached processor serves the users that need conventional data processing.
● The system with the attached processor satisfies the needs for complex
arithmetic applications.
SIMD Array Processor

● An SIMD array processor is a computer with multiple processing units operating in parallel.
● The processing units are synchronized to perform the same operation
under the control of a common control unit, thus providing a single
instruction stream, multiple data stream (SIMD) organization.
● A general block diagram of an array processor is shown in Figure 4.

Fig 4: SIMD Array Processor Organization
● It contains a set of identical processing elements (PEs), each having a
local memory M.
● Each processor element includes an ALU, a floating-point arithmetic unit
and working registers.
● The master control unit controls the operations in the processor
elements.
● The main memory is used for storage of the program.
● The function of the master control unit is to decode the instructions and
determine how the instruction is to be executed.
● Scalar and program control instructions are directly executed within the
master control unit.
● Vector instructions are broadcast to all PEs simultaneously.
● Vector operands are distributed to the local memories prior to the
parallel execution of the instruction.
● Masking schemes are used to control the status of each PE during the
execution of vector instructions.
● Each PE has a flag that is set when the PE is active and reset when the
PE is inactive.
● This ensures that only those PEs that need to participate are active during the execution of the instruction.
● SIMD processors are highly specialized computers.
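The masking scheme described above can be sketched in a few lines of Python. The list-based PE model and the function name simd_add are assumptions for illustration, not part of the text:

```python
def simd_add(a, b, mask):
    """One ADD instruction 'broadcast' to all PEs: each active PE adds
    its pair of operands; a masked-off (inactive) PE keeps its old value."""
    return [x + y if active else x
            for x, y, active in zip(a, b, mask)]

# PEs 1, 2 and 4 are active; PE 3 has its flag reset (masked off).
result = simd_add([1, 2, 3, 4], [10, 10, 10, 10], [True, True, False, True])
print(result)  # [11, 12, 3, 14]
```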
Q.23 What is pipeline conflict? Explain data dependency and handling of branch
instruction in detail.
Ans. Pipeline conflict:
There are three major difficulties that cause instruction pipeline conflicts.
1. Resource conflicts caused by access to memory by two segments at the
same time.
2. Data dependency conflicts arise when an instruction depends on the result of
a previous instruction, but this result is not yet available.
3. Branch difficulties arise from branch and other instructions that change the
value of PC.

Data Dependency:

● A collision occurs when an instruction cannot proceed because previous instructions did not complete certain operations.
● A data dependency occurs when an instruction needs data that are not
yet available.
● Similarly, an address dependency may occur when an operand address
cannot be calculated because the information needed by the addressing
mode is not available.
● Pipelined computers deal with such data dependency conflicts in a variety of ways, as follows:

Hardware interlocks:
● An interlock is a circuit that detects instructions whose source operands
are destinations of instructions farther up in the pipeline.
● Detection of this situation causes the instruction whose source is not
available to be delayed by enough clock cycles to resolve the conflict.
● This approach maintains the program sequence by using hardware to
insert the required delays.

Operand forwarding:
● It uses special hardware to detect a conflict and then avoid it by routing
the data through special paths between pipeline segments.
● This method requires additional hardware paths through multiplexers as
well as the circuit that detects the conflict.

Delayed load:
● Sometimes the compiler has the responsibility for solving data conflict problems.
● The compiler for such computers is designed to detect a data conflict and reorder the instructions as necessary to delay the loading of the conflicting data by inserting a no-operation instruction; this method is called delayed load.
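The delayed-load idea can be sketched as a toy compiler pass. The tuple instruction format and the names used here are invented for illustration; a real compiler works on actual machine code:

```python
def insert_load_delays(program):
    """program: list of (opcode, dest_register, source_registers) tuples.
    Insert a NOP whenever an instruction reads the register loaded by
    the instruction directly before it (a load-use data conflict)."""
    out = []
    for op, dest, srcs in program:
        if out:
            prev_op, prev_dest, _ = out[-1]
            if prev_op == "LOAD" and prev_dest in srcs:
                out.append(("NOP", None, ()))  # fill the load delay slot
        out.append((op, dest, srcs))
    return out

prog = [("LOAD", "R1", ("A",)),
        ("ADD",  "R2", ("R1", "R3"))]  # ADD uses R1 right after the load
print(insert_load_delays(prog))
```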

Handling of Branch Instructions:

● One of the major problems in operating an instruction pipeline is the occurrence of branch instructions.
● A branch instruction can be conditional or unconditional.
● The branch instruction breaks the normal sequence of the instruction
stream, causing difficulties in the operation of the instruction pipeline.
● Various hardware techniques are available to minimize the performance
degradation caused by instruction branching.

Pre-fetch target:

● One way of handling a conditional branch is to prefetch the target instruction in addition to the instruction following the branch.
● If the branch condition is successful, the pipeline continues from the
branch target instruction.
● An extension of this procedure is to continue fetching instructions from
both places until the branch decision is made.

Branch target buffer:

● Another possibility is the use of a branch target buffer (BTB).
● The BTB is an associative memory included in the fetch segment of the
pipeline.
● Each entry in the BTB consists of the address of a previously executed
branch instruction and the target instruction for that branch.
● It also stores the next few instructions after the branch target instruction.
● The advantage of this scheme is that branch instructions that have
occurred previously are readily available in the pipeline without
interruption.

Loop buffer:
● A variation of the BTB is the loop buffer. This is a small very high speed
register file maintained by the instruction fetch segment of the pipeline.
● When a program loop is detected in the program, it is stored in the loop
buffer in its entirety, including all branches.

Branch Prediction:

● A pipeline with branch prediction uses some additional logic to guess the
outcome of a conditional branch instruction before it is executed.
● The pipeline then begins pre-fetching the instruction stream from the
predicted path.
● A correct prediction eliminates the wasted time caused by branch
penalties.
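As one concrete (assumed) way to implement such guessing logic, a 2-bit saturating counter, a classic prediction scheme not spelled out in the text, can be sketched:

```python
class TwoBitPredictor:
    """States 0-1 predict not-taken; states 2-3 predict taken."""
    def __init__(self):
        self.state = 1          # start weakly not-taken (an assumption)

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        # Saturating move toward the observed outcome.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

p = TwoBitPredictor()
correct = 0
for taken in [True, True, True, False, True]:  # actual branch behaviour
    if p.predict() == taken:
        correct += 1
    p.update(taken)
print(correct)  # 3 of the 5 predictions are correct
```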

Delayed branch:

● A procedure employed in most RISC processors is the delayed branch.
● In this procedure, the compiler detects the branch instructions and
rearranges the machine language code sequence by inserting useful
instructions that keep the pipeline operating without interruptions.
Q. 24 Explain the Booth’s algorithm in depth with the help of flowchart.
Ans. ● The Booth algorithm gives a procedure for multiplying binary integers in signed 2's-complement representation.
● It operates on the fact that strings of 0's in the multiplier require no addition but just shifting, and a string of 1's in the multiplier from bit weight 2^k to weight 2^m can be treated as 2^(k+1) − 2^m.
● For example, the binary number 001110 (+14) has a string of 1's from 2^3 to 2^1 (k = 3, m = 1). The number can be represented as 2^(k+1) − 2^m = 2^4 − 2^1 = 16 − 2 = 14. Therefore, the multiplication M × 14, where M is the multiplicand and 14 the multiplier, can be done as M × 2^4 − M × 2^1.
● Thus the product can be obtained by shifting the binary multiplicand M four times to the left and subtracting M shifted left once.
● As in other multiplication schemes, the Booth algorithm requires examination of the multiplier bits and shifting of the partial product.
● Prior to the shifting, the multiplicand may be added to the partial product, subtracted from the partial product, or left unchanged according to the following rules:
1. The multiplicand is subtracted from the partial product upon encountering the first least significant 1 in a string of 1's in the multiplier.
2. The multiplicand is added to the partial product upon encountering the first 0 in a string of 0's in the multiplier.
3. The partial product does not change when the multiplier bit is identical to the previous multiplier bit.
● The algorithm works for positive or negative multipliers in 2's complement representation.
● This is because a negative multiplier ends with a string of 1’s and the
last operation will be a subtraction of the appropriate weight.
● The two bits of the multiplier in Qn and Qn+1 are inspected.
● If the two bits are equal to 10, it means that the first 1 in a string of 1's has been encountered. This requires a subtraction of the multiplicand from the partial product in AC.
● If the two bits are equal to 01, it means that the first 0 in a string of 0's has been encountered. This requires the addition of the multiplicand to the partial product in AC.
● When the two bits are equal, the partial product does not change.
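The register-transfer rules above can be sketched in Python. This is an illustrative implementation, not the text's own code; the register names AC, QR and Qn1 follow the text's convention, and n-bit two's-complement operands are assumed:

```python
def booth_multiply(multiplicand, multiplier, n):
    """Booth multiplication of two n-bit signed integers;
    returns the signed 2n-bit product."""
    mask = (1 << n) - 1
    M = multiplicand & mask          # multiplicand in n-bit 2's complement
    AC, QR, Qn1 = 0, multiplier & mask, 0
    for _ in range(n):
        pair = (QR & 1, Qn1)         # inspect Qn and Qn+1
        if pair == (1, 0):           # first 1 in a string of 1's: AC <- AC - M
            AC = (AC - M) & mask
        elif pair == (0, 1):         # first 0 in a string of 0's: AC <- AC + M
            AC = (AC + M) & mask
        # Arithmetic shift right of the combined register [AC, QR, Qn1].
        Qn1 = QR & 1
        QR = ((QR >> 1) | ((AC & 1) << (n - 1))) & mask
        AC = (AC >> 1) | (AC & (1 << (n - 1)))   # replicate the sign bit
    product = (AC << n) | QR
    if product & (1 << (2 * n - 1)):             # interpret as signed
        product -= 1 << (2 * n)
    return product

print(booth_multiply(-9, -13, 5))  # 117
```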
Q. 25 Explain with proper block diagram the Addition Operation on two floating point
numbers.
Ans. The operations in a typical 32-bit floating-point adder can be explained as follows: First the floating-point numbers are unpacked, i.e. the sign bit, exponent and mantissa are isolated, and the exponents are given as input to an 8-bit comparator/subtractor. The mantissa with the smaller exponent is selected using a multiplexer and is right-shifted a number of times equal to the difference of the exponents, making the exponents equal. The new mantissas are then added/subtracted as per the control signal, resulting in a value that may be a positive or negative number. In the case of a negative number, the 2's complement of the number is taken to get the final result. Then, if required, the result is normalized and rounded. The architecture of the adder is shown in Fig.

A comparator is required for exponent’s comparison and these exponents are subtracted
to initiate the operation of addition/subtraction. This can be taken care of by the
comparator/subtractor block. Since these are the initial blocks which can’t be avoided
in the critical path, an optimized design is needed which can reduce the delay and
power of these blocks.
Further, the mantissas are added/subtracted according to the given operation by reusing an efficient adder/subtractor. Moreover, a 2's complement block is required to correct the result during subtraction. Some designs have been developed to eliminate the need for the 2's complement block, but these lead to circuit overhead. An efficient design which reduces the overheads is thus required.
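The compare, align, add, normalize sequence above can be sketched with plain integers. The (mantissa, exponent) pair representation and the 8-bit mantissa width here are simplifying assumptions; real IEEE-style hardware also handles signs, hidden bits and rounding modes:

```python
def fp_add(a, b):
    """a, b are (mantissa, exponent) pairs representing m * 2**e."""
    (ma, ea), (mb, eb) = a, b
    # 1. Compare exponents; treat the operand with the larger one as 'a'.
    if ea < eb:
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    # 2. Right-shift the smaller operand's mantissa to equalize exponents.
    mb >>= (ea - eb)
    # 3. Add the aligned mantissas.
    m, e = ma + mb, ea
    # 4. Normalize: shift right while the mantissa overflows 8 bits (assumed width).
    while abs(m) >= 1 << 8:
        m >>= 1
        e += 1
    return m, e

# (80 * 2**0) + (96 * 2**1) = 80 + 192 = 272, i.e. 136 * 2**1.
print(fp_add((80, 0), (96, 1)))  # (136, 1)
```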
Q. 26 Draw the block diagram for BCD adder and explain it.
Ans. ● BCD representation is a class of binary encodings of decimal numbers
where each decimal digit is represented by a fixed number of bits.
● A BCD adder is a circuit that adds two BCD digits in parallel and produces a sum digit in BCD form.
● Since each input digit does not exceed 9, the output sum cannot be greater than 19 (9 + 9 + 1).
● For example, suppose we apply two BCD digits to a 4-bit binary adder.
● The adder will form the sum in binary and produce a result that may
range from 0 to 19.
● In figure 7.5, these binary numbers are represented by K, Z8, Z4, Z2, and Z1.
● K is the carry and subscripts under the Z represent the weights 8, 4, 2,
and 1 that can be assigned to the four bits in the BCD code.
● When the binary sum is less than or equal to 9, the corresponding BCD number is identical and therefore no conversion is needed.
● The condition for correction and output carry can be expressed by the
Boolean function:
C= K + Z8 Z4 + Z8 Z2
● When it is greater than 9, we obtain an invalid BCD representation; adding binary 6 (0110) to the binary sum converts it to the correct BCD representation.
● The two decimal digits, together with the input-carry, are first added in
the top 4-bit binary adder to produce the binary sum. When the
output-carry is equal to 0, nothing is added to the binary sum.
● When C is equal to 1, binary 0110 is added to the binary sum using the bottom 4-bit binary adder.
● The output carry generated from the bottom binary-adder may be
ignored.
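The two-adder structure and the correction condition C = K + Z8·Z4 + Z8·Z2 can be sketched behaviourally. This is an illustrative model of the logic, not a gate-level design:

```python
def bcd_add(a, b, carry_in=0):
    """Add two BCD digits (0-9) plus a carry; return (carry_out, bcd_sum)."""
    z = a + b + carry_in            # top 4-bit binary adder; K is bit 4
    K = z >> 4
    Z8, Z4, Z2 = (z >> 3) & 1, (z >> 2) & 1, (z >> 1) & 1
    C = K | (Z8 & Z4) | (Z8 & Z2)   # correction condition / output carry
    if C:
        z = (z + 0b0110) & 0xF      # bottom adder adds 6; its carry is ignored
    return C, z & 0xF

print(bcd_add(9, 8))  # (1, 7) -> BCD carry 1 and digit 7, i.e. decimal 17
```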
Q.27 Explain Booth multiplication algorithm for multiplying binary integers in signed
2’s complement representation.
Ans. For the explanation, refer to Ans. 24.

● A numerical example of the Booth algorithm is shown for n = 5. It shows the step-by-step multiplication of (−9) × (−13) = +117.
