Silver Oak College of Engineering and Technology: Computer Organization Module Solution - 4
Computer Organization
Module Solution -4
Instruction cycle:
● Assume that the decoding of the instruction can be combined with the
calculation of the effective address into one segment.
● Assume further that most of the instructions place the result into a processor
register, so that the instruction execution and storing of the result can be
combined into one segment.
● This reduces the instruction pipeline into four segments.
1. FI: Fetch an instruction from memory
2. DA: Decode the instruction and calculate the effective address of the operand
3. FO: Fetch the operand
4. EX: Execute the operation
● Figure 1 shows how the instruction cycle in the CPU can be processed with a
four-segment pipeline.
● While an instruction is being executed in segment 4, the next instruction in
sequence is busy fetching an operand from memory in segment 3.
● The effective address may be calculated in a separate arithmetic circuit for the
third instruction, and whenever the memory is available, the fourth and all
subsequent instructions can be fetched and placed in an instruction FIFO.
● Thus up to four suboperations in the instruction cycle can overlap, and up to
four different instructions can be in progress at the same time.
Figure 2 shows the operation of the instruction pipeline. Time on the horizontal axis
is divided into steps of equal duration. The four segments are represented in the
diagram by their abbreviated symbols (FI, DA, FO, EX).
Fig 2
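The overlap shown in the space-time diagram can be reproduced with a short Python sketch. It assumes an ideal pipeline with no conflicts, so every instruction advances one segment per clock step; the function names `space_time` and `total_steps` are illustrative and do not come from the source.

```python
# Segments of the four-segment instruction pipeline described above.
SEGMENTS = ["FI", "DA", "FO", "EX"]

def total_steps(num_instructions, k=len(SEGMENTS)):
    """Clock steps needed by an ideal k-segment pipeline: k + (n - 1)."""
    return k + num_instructions - 1

def space_time(num_instructions, segments=SEGMENTS):
    """Return one row per instruction; row[t] is the segment used at step t + 1.

    Assumption: no stalls, so instruction i enters FI at step i + 1.
    """
    n_steps = total_steps(num_instructions, len(segments))
    rows = []
    for i in range(num_instructions):
        row = ["--"] * n_steps
        for s, name in enumerate(segments):
            row[i + s] = name
        rows.append(row)
    return rows

# Print the diagram for four instructions.
for i, row in enumerate(space_time(4), start=1):
    print(f"I{i}: " + " ".join(row))
```

Reading a single column of the printed diagram shows which segment each instruction occupies during that clock step; at step 4 all four segments are busy with different instructions.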
Fig 4
SIMD Array Processor Organization
● It contains a set of identical processing elements (PEs), each having a
local memory M.
● Each processor element includes an ALU, a floating-point arithmetic unit
and working registers.
● The master control unit controls the operations in the processor
elements.
● The main memory is used for storage of the program.
● The function of the master control unit is to decode the instructions and
determine how the instruction is to be executed.
● Scalar and program control instructions are directly executed within the
master control unit.
● Vector instructions are broadcast to all PEs simultaneously.
● Vector operands are distributed to the local memories prior to the
parallel execution of the instruction.
● Masking schemes are used to control the status of each PE during the
execution of vector instructions.
● Each PE has a flag that is set when the PE is active and reset when the
PE is inactive.
● This ensures that only those PEs that need to participate are active during
the execution of the instruction.
● SIMD processors are highly specialized computers.
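The masking scheme can be illustrated with a small Python sketch. It is a software model only: each list element stands for the value held in one PE's local memory, and the hypothetical function `broadcast` plays the role of the master control unit sending one vector instruction to all PEs.

```python
def broadcast(op, local_values, operand, mask):
    """Apply op(value, operand) in every PE whose flag is set.

    local_values: one value per PE (stands in for its local memory M)
    mask:         one flag per PE; True = active, False = inactive
    Inactive PEs keep their old value unchanged.
    """
    if len(local_values) != len(mask):
        raise ValueError("one mask flag is needed per PE")
    return [op(v, operand) if active else v
            for v, active in zip(local_values, mask)]

# Four PEs; only PE0 and PE2 take part in the vector add.
values = [1, 2, 3, 4]
mask = [True, False, True, False]
result = broadcast(lambda a, b: a + b, values, 10, mask)
print(result)
```

In real hardware all active PEs execute at the same time; the list comprehension here only models the result, not the parallelism.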
Q.23 What is pipeline conflict? Explain data dependency and handling of branch
instruction in detail.
Ans. Pipeline conflict:
Three major difficulties cause instruction pipeline conflicts:
1. Resource conflicts caused by access to memory by two segments at the
same time.
2. Data dependency conflicts arise when an instruction depends on the result of
a previous instruction, but this result is not yet available.
3. Branch difficulties arise from branch and other instructions that change the
value of PC.
Data Dependency:
Hardware interlocks:
● An interlock is a circuit that detects instructions whose source operands
are destinations of instructions farther up in the pipeline.
● Detection of this situation causes the instruction whose source is not
available to be delayed by enough clock cycles to resolve the conflict.
● This approach maintains the program sequence by using hardware to
insert the required delays.
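The effect of an interlock can be sketched in Python as below. The timing model is an assumption made for illustration, not taken from the source: an instruction issued at cycle c uses FI, DA, FO and EX at cycles c, c+1, c+2 and c+3, and a result written in EX can be fetched by another instruction only from the following cycle onward.

```python
FO_OFFSET = 2   # operand fetch happens 2 cycles after issue (assumed model)
EX_OFFSET = 3   # execution happens 3 cycles after issue (assumed model)

def schedule_with_interlock(program):
    """Return the issue cycle of each instruction.

    program: list of (dest, sources) pairs, e.g. ("R1", ["R2", "R3"]).
    An instruction is delayed until every source that is the destination
    of an earlier instruction is available at its FO step.
    """
    ready = {}        # register -> first cycle in which its value can be fetched
    issue = []
    next_cycle = 1
    for dest, sources in program:
        cycle = next_cycle
        for src in sources:
            if src in ready:
                # FO (at cycle + FO_OFFSET) must not come before ready[src].
                cycle = max(cycle, ready[src] - FO_OFFSET)
        issue.append(cycle)
        if dest is not None:
            ready[dest] = cycle + EX_OFFSET + 1
        next_cycle = cycle + 1   # in-order issue: later instructions wait too
    return issue

program = [
    ("R1", ["R2", "R3"]),   # R1 <- R2 + R3
    ("R4", ["R1", "R5"]),   # needs R1: the interlock delays this instruction
    ("R6", ["R7", "R8"]),   # independent, but issues after the delayed one
]
print(schedule_with_interlock(program))
```

Under this model the second instruction is issued one cycle late, which is the delay the interlock hardware would insert.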
Operand forwarding:
● It uses special hardware to detect a conflict and then avoid it by routing
the data through special paths between pipeline segments.
● This method requires additional hardware paths through multiplexers as
well as the circuit that detects the conflict.
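A minimal Python sketch of the forwarding decision is given below. It models only the multiplexer choice; the function name `read_operand` and the `(register, value)` form of the bypassed result are assumptions made for illustration.

```python
def read_operand(reg, register_file, ex_result):
    """Select an operand either from the register file or from the bypass path.

    ex_result: (dest_register, value) just produced by the previous
               instruction and not yet written back, or None.
    """
    if ex_result is not None and ex_result[0] == reg:
        # Conflict detected: the multiplexer selects the forwarded value.
        return ex_result[1]
    # No conflict: the normal path through the register file is used.
    return register_file[reg]

register_file = {"R1": 0, "R2": 5}
pending = ("R1", 42)          # previous instruction is about to write R1 = 42
print(read_operand("R1", register_file, pending))
print(read_operand("R2", register_file, pending))
```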
Delayed load:
● Sometimes the compiler is given the responsibility for solving data conflict
problems.
● The compiler for such computers is designed to detect a data conflict
and reorder the instructions as necessary to delay the loading of the
conflicting data by inserting no-operation instructions. This method is
called delayed load.
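The compiler pass can be sketched in Python as follows. It assumes a simplified rule in which a loaded register is not available to the instruction immediately after the load, so one no-operation is enough; the instruction format `(opcode, dest, sources)` is hypothetical.

```python
NOP = ("NOP", None, ())

def insert_delay_slots(program):
    """Insert a NOP after every LOAD whose destination register is a
    source of the very next instruction (assumed one-cycle load delay)."""
    out = []
    for idx, instr in enumerate(program):
        out.append(instr)
        opcode, dest, _sources = instr
        if opcode == "LOAD" and idx + 1 < len(program):
            next_sources = program[idx + 1][2]
            if dest in next_sources:
                out.append(NOP)
    return out

program = [
    ("LOAD",  "R1", ("A",)),          # R1 <- M[A]
    ("LOAD",  "R2", ("B",)),          # R2 <- M[B]
    ("ADD",   "R3", ("R1", "R2")),    # R3 <- R1 + R2, uses R2 right after its load
    ("STORE", None, ("R3",)),         # M[C] <- R3
]
for instr in insert_delay_slots(program):
    print(instr[0])
```

A real compiler would first try to move a useful, independent instruction into the slot and fall back to a no-operation only when none is available.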
Pre-fetch target:
Branch Prediction:
● A pipeline with branch prediction uses some additional logic to guess the
outcome of a conditional branch instruction before it is executed.
● The pipeline then begins pre-fetching the instruction stream from the
predicted path.
● A correct prediction eliminates the wasted time caused by branch
penalties.
Delayed branch:
A comparator is required to compare the exponents, and the exponents are subtracted
to initiate the addition/subtraction operation. Both tasks can be handled by the
comparator/subtractor block. Since these initial blocks cannot be removed from the
critical path, an optimized design is needed to reduce their delay and power.
Further, the mantissas are added or subtracted, according to the given operation, by
reusing an efficient adder/subtractor. Moreover, a 2’s complement block is required to
correct the result during subtraction. Some designs have been developed to eliminate
the need for the 2’s complement block, but they lead to circuit overhead. An efficient
design which reduces these overheads is thus required.
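The blocks named above (exponent comparator/subtractor, mantissa adder/subtractor, 2's complement correction) can be followed in this Python sketch. It is a simplified model under stated assumptions: operands are unsigned integer mantissas with an exponent, value = mantissa * 2**exponent, mantissas fit in `WIDTH` bits, and normalization and rounding are omitted. All names are illustrative.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def twos_complement(x):
    """2's complement of x within WIDTH bits."""
    return (~x + 1) & MASK

def fp_add_sub(m_a, e_a, m_b, e_b, subtract=False):
    """Return (sign, mantissa, exponent) of A + B or A - B."""
    # Comparator/subtractor block: compare exponents and align the
    # mantissa of the smaller operand by shifting it right.
    if e_a >= e_b:
        m_b >>= (e_a - e_b)
        exp = e_a
    else:
        m_a >>= (e_b - e_a)
        exp = e_b

    if not subtract:
        return (+1, m_a + m_b, exp)

    # Subtraction: add the 1's complement of m_b with a carry-in of 1.
    total = m_a + (~m_b & MASK) + 1
    carry = total >> WIDTH
    total &= MASK
    if carry:
        # m_a >= m_b: the result is already a correct magnitude.
        return (+1, total, exp)
    # m_a < m_b: the 2's complement block corrects the result.
    return (-1, twos_complement(total), exp)

# 12 * 2**3 = 96 and 8 * 2**1 = 16
print(fp_add_sub(12, 3, 8, 1))                 # 96 + 16
print(fp_add_sub(12, 3, 8, 1, subtract=True))  # 96 - 16
print(fp_add_sub(8, 1, 12, 3, subtract=True))  # 16 - 96
```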
Q. 26 Draw the block diagram for BCD adder and explain it.
Ans. ● BCD representation is a class of binary encodings of decimal numbers
where each decimal digit is represented by a fixed number of bits.
● BCD adder is a circuit that adds two BCD digits in parallel and produces
a sum digit in BCD form.
● Since each input digit does not exceed 9, the output sum cannot be
greater than 19 (9 + 9 + 1, the 1 being an input carry).
● Suppose we apply two BCD digits to a 4-bit binary adder.
● The adder will form the sum in binary and produce a result that may
range from 0 to 19.
● In the following figure 7.5, the bits of this binary sum are represented by
K, Z8, Z4, Z2, and Z1.
● K is the carry and subscripts under the Z represent the weights 8, 4, 2,
and 1 that can be assigned to the four bits in the BCD code.
● When the binary sum is less than or equal to 9, the corresponding
BCD number is identical and therefore no conversion is needed.
● The condition for correction and output carry can be expressed by the
Boolean function:
C= K + Z8 Z4 + Z8 Z2
● When the binary sum is greater than 9, it is not a valid BCD
representation; adding binary 6 (0110) to the binary sum converts it to
the correct BCD representation.
● The two decimal digits, together with the input-carry, are first added in
the top 4-bit binary adder to produce the binary sum. When the
output-carry is equal to 0, nothing is added to the binary sum.
● When C is equal to 1, binary 0110 is added to the binary sum using
the bottom 4-bit binary adder.
● The output carry generated from the bottom binary-adder may be
ignored.
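The correction logic can be checked with a short Python sketch of one BCD adder stage. It follows the steps above: the top adder forms K Z8 Z4 Z2 Z1, the Boolean function gives C, and the bottom adder adds 0110 when C = 1. The function name `bcd_digit_add` is illustrative.

```python
def bcd_digit_add(a, b, carry_in=0):
    """Add two BCD digits; return (output_carry, sum_digit)."""
    if not (0 <= a <= 9 and 0 <= b <= 9 and carry_in in (0, 1)):
        raise ValueError("inputs must be BCD digits and a 1-bit carry")

    # Top 4-bit binary adder: binary sum in the range 0..19.
    binary_sum = a + b + carry_in
    k = (binary_sum >> 4) & 1     # carry out of the top adder
    z8 = (binary_sum >> 3) & 1
    z4 = (binary_sum >> 2) & 1
    z2 = (binary_sum >> 1) & 1

    # Condition for correction and output carry: C = K + Z8 Z4 + Z8 Z2
    c = k | (z8 & z4) | (z8 & z2)

    # Bottom 4-bit binary adder: add 0110 when C = 1, ignore its carry.
    z = binary_sum & 0b1111
    digit = (z + (0b0110 if c else 0)) & 0b1111
    return c, digit

# Check every input combination against ordinary decimal addition.
for a in range(10):
    for b in range(10):
        for cin in (0, 1):
            c, digit = bcd_digit_add(a, b, cin)
            assert 10 * c + digit == a + b + cin

print(bcd_digit_add(9, 9, 1))   # largest case: 9 + 9 + 1
print(bcd_digit_add(4, 3))      # no correction needed
```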
Q.27 Explain Booth multiplication algorithm for multiplying binary integers in signed
2’s complement representation.
Ans. For the explanation, refer to Ans. 24.