Co Unit-5
SC [SEM-3]
UNIT-5
Computer Arithmetic and Parallel Processing: Data representation – fixed point, floating point,
addition and subtraction, Multiplication and division algorithms.
Parallel Processing:- Parallel Processing, Pipelining, Arithmetic Pipeline, Instruction Pipeline.
Hardware Implementation:
To implement the two arithmetic operations with hardware, it is first necessary that the two
numbers be stored in registers.
i. Let A and B be two registers that hold the magnitudes of the numbers, and AS and BS be two
flip-flops that hold the corresponding signs.
ii. The result of the operation may be transferred to a third register: however, a saving is
achieved if the result is transferred into A and AS. Thus A and AS together form an
accumulator register.
Consider now the hardware implementation of the algorithms above.
• First, a parallel-adder is needed to perform the microoperation A + B.
• Second, a comparator circuit is needed to establish if A > B, A = B, or A < B.
• Third, two parallel-subtractor circuits are needed to perform the microoperations A - B and
B - A. The sign relationship can be determined from an exclusive-OR gate with AS and BS as
inputs.
The below figure shows a block diagram of the hardware for implementing the addition and
subtraction operations. It consists of registers A and B and sign flip-flops AS and BS.
• Subtraction is done by adding A to the 2's complement of B. The output carry is transferred
to flip-flop E, where it can be checked to determine the relative magnitudes of the two
numbers.
• The add-overflow flip-flop AVF holds the overflow bit when A and B are added.
Figure (i): Hardware for addition and subtraction with Signed-Magnitude Data
The complementer provides an output of B or the complement of B depending on the state of the
mode control M.
❖ When M = 0, the output of B is transferred to the adder, the input carry is 0, and the output
of the adder is equal to the sum A + B.
❖ When M = 1, the 1's complement of B is applied to the adder, the input carry is 1, and the output
of the adder is A plus the 1's complement of B plus 1. This is equal to A plus the 2's complement
of B, which is equivalent to the subtraction A - B.
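The behaviour of the complementer and parallel adder can be sketched in a few lines of Python. This is only an illustrative model, not the circuit itself: the function name add_or_subtract and the default 5-bit register width are assumptions made for the example.

```python
def add_or_subtract(a: int, b: int, m: int, n: int = 5):
    """Model the adder with mode control M on n-bit magnitudes.

    Returns (E, result): the output carry and the n-bit adder output.
    The name and the default width n are assumptions for this sketch.
    """
    mask = (1 << n) - 1
    # Complementer: passes B when M = 0, the 1's complement of B when M = 1.
    b_in = (b ^ mask) if m else (b & mask)
    # The input carry of the adder is equal to M.
    total = (a & mask) + b_in + m
    return total >> n, total & mask


# M = 0: plain addition A + B.
print(add_or_subtract(0b01011, 0b00110, 0))  # 11 + 6
# M = 1: A plus the 2's complement of B; E = 1 indicates A >= B.
print(add_or_subtract(0b01011, 0b00110, 1))  # 11 - 6
# E = 0 indicates A < B; the result is then in 2's complement form.
print(add_or_subtract(0b00110, 0b01011, 1))  # 6 - 11
```

When E = 0 after a subtraction, the adder output is the 2's complement of the true difference, which is why the carry in E is checked to determine the relative magnitudes.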
Hardware Algorithm:
Multiplication Algorithms:
Multiplication of two fixed-point binary numbers in signed-magnitude representation is done with
paper and pencil by a process of successive shift and adds operations. This process is best
illustrated with a numerical example.
• The numbers copied down in successive lines are shifted one position to the left from the
previous number.
• Finally, the numbers are added and their sum forms the product.
The sign of the product is determined from the signs of the multiplicand and multiplier. If they are
alike, the sign of the product is positive. If they are unlike, the sign of the product is negative.
→ Initially, the multiplicand is in register B and the multiplier in Q. Their corresponding signs
are in Bs and Qs, respectively.
→ The sum of A and B forms a partial product which is transferred to the EA register.
→ Both partial product and multiplier are shifted to the right. This shift will be denoted by the
statement shr EAQ to designate the right shift.
→ The least significant bit of A is shifted into the most significant position of Q, the bit from E is
shifted into the most significant position of A, and 0 is shifted into E. After the shift, one bit
of the partial product is shifted into Q, pushing the multiplier bits one position to the right. In
this manner, the rightmost flip-flop in register Q, designated by Qn, will hold the bit of the
multiplier, which must be inspected next.
Hardware Algorithm:
→ Initially, the multiplicand is in B and the multiplier in Q. Their corresponding signs are in Bs
and Qs, respectively. The signs are compared, and both As and Qs are set to correspond to the
sign of the product, since a double-length product will be stored in registers A and Q. Registers
A and E are cleared and the sequence counter SC is set to a number equal to the number of
bits of the multiplier.
→ After the initialization, the low-order bit of the multiplier in Qn is tested.
i. If it is 1, the multiplicand in B is added to the present partial product in A.
ii. If it is 0, nothing is done.
In either case, register EAQ is then shifted once to the right to form the new partial product.
→ The sequence counter is decremented by 1 and its new value checked. If it is not equal to
zero, the process is repeated and a new partial product is formed. The process stops when
SC = 0.
→ The final product is available in both A and Q, with A holding the most significant bits and Q
holding the least significant bits. A flowchart of the hardware multiply algorithm is shown in
the below figure (l).
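The add-and-shift loop described above can be traced with a short Python sketch. The function name multiply_signed_magnitude and its argument layout are assumptions for illustration; the registers A, E, Q and the counter SC follow the text.

```python
def multiply_signed_magnitude(bs: int, b: int, qs: int, q: int, n: int):
    """Shift-and-add multiplication of two n-bit magnitudes.

    bs/qs are the sign bits, b the multiplicand, q the multiplier.
    Returns (sign, A, Q); the double-length product magnitude is A:Q.
    The function name and signature are assumptions for this sketch.
    """
    mask = (1 << n) - 1
    sign = bs ^ qs          # alike signs -> 0 (positive), unlike -> 1 (negative)
    a, e = 0, 0             # A and E are cleared
    sc = n                  # SC <- number of bits in the multiplier
    while sc != 0:
        if q & 1:           # Qn = 1: EA <- A + B
            total = a + b
            e, a = total >> n, total & mask
        # shr EAQ: low bit of A moves into Q, E moves into A, 0 moves into E
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (e << (n - 1))
        e = 0
        sc -= 1
    return sign, a, q


sign, a, q = multiply_signed_magnitude(0, 0b10111, 0, 0b10011, 5)  # 23 x 19
print(sign, format(a, "05b"), format(q, "05b"), (a << 5) | q)
```

For the 5-bit operands 23 and 19 the loop leaves 01101 in A and 10101 in Q, which together read as 437.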
An extra flip-flop Qn+1 is appended to QR to facilitate a double bit inspection of the
multiplier. The flowchart for the Booth algorithm is shown in Figure (o).
Hardware Algorithm for Booth Multiplication:
• AC and the appended bit Qn+1 are initially cleared to 0 and the sequence counter SC is
set to a number n equal to the number of bits in the multiplier. The two bits of the
multiplier in Qn and Qn+1 are inspected.
i. If the two bits are equal to 10, it means that the first 1 in a string of 1's has been
encountered. This requires a subtraction of the multiplicand from the partial product in
AC.
ii. If the two bits are equal to 01, it means that the first 0 in a string of 0's has been
encountered. This requires the addition of the multiplicand to the partial product in AC.
iii. When the two bits are equal, the partial product does not change.
iv. The next step is to shift right the partial product and the multiplier (including bit Qn+1).
This is an arithmetic shift right (ashr) operation which shifts AC and QR to the right and
leaves the sign bit in AC unchanged. The sequence counter is then decremented, and the
loop is repeated until SC reaches 0, that is, n times.
Example: multiplication of (-9) × (-13) = +117 is shown below. Note that the multiplier in QR is
negative and that the multiplicand in BR is also negative. The 10-bit product appears in AC and QR
and is positive.
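The Booth steps can be checked against this example with a small Python model. It is a sketch under stated assumptions: the function name booth_multiply is made up for illustration, operands are n-bit 2's complement values, and the most negative multiplicand (whose negation does not fit in n bits) is not handled.

```python
def booth_multiply(br: int, qr: int, n: int) -> int:
    """Booth multiplication of two n-bit 2's complement integers.

    br is the multiplicand, qr the multiplier. Returns the signed
    2n-bit product held in AC:QR. Name and signature are assumptions.
    """
    mask = (1 << n) - 1
    sign_bit = 1 << (n - 1)
    br &= mask
    qr &= mask
    ac, q_extra = 0, 0                  # AC and Qn+1 are cleared
    for _ in range(n):                  # SC counts n iterations
        pair = ((qr & 1) << 1) | q_extra
        if pair == 0b10:                # first 1 in a string of 1's
            ac = (ac - br) & mask
        elif pair == 0b01:              # first 0 in a string of 0's
            ac = (ac + br) & mask
        # ashr (AC & QR): the sign bit of AC is kept, Qn moves into Qn+1
        q_extra = qr & 1
        qr = (qr >> 1) | ((ac & 1) << (n - 1))
        ac = (ac >> 1) | (ac & sign_bit)
    product = (ac << n) | qr
    if product & (1 << (2 * n - 1)):    # reinterpret as a signed value
        product -= 1 << (2 * n)
    return product


print(booth_multiply(-9, -13, 5))
```

With n = 5, BR = 10111 and QR = 10011, the registers end as AC = 00011 and QR = 10101, which is the 10-bit pattern for +117.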
Division Algorithms:
→ Division of two fixed-point binary numbers in signed-magnitude representation is done with
paper and pencil by a process of successive compare, shift, and subtract operations.
→ The division process is illustrated by a numerical example in the below figure (q).
→ The divisor B consists of five bits and the dividend A consists of ten bits. The five most
significant bits of the dividend are compared with the divisor. Since the 5-bit number is
smaller than B, we try again by taking the six most significant bits of A and comparing this
number with B. The 6-bit number is greater than B, so we place a 1 for the quotient bit. The
divisor is then shifted once to the right and subtracted from the dividend.
→ The difference is called a partial remainder because the division could have stopped here
to obtain a quotient of 1 and a remainder equal to the partial remainder. The process is
continued by comparing a partial remainder with the divisor.
→ If the partial remainder is greater than or equal to the divisor, the quotient bit is equal to 1.
The divisor is then shifted right and subtracted from the partial remainder.
→ If the partial remainder is smaller than the divisor, the quotient bit is 0 and no subtraction
is needed. The divisor is shifted once to the right in any case. Note that the result gives both
a quotient and a remainder.
Hardware Algorithm:
1. The dividend is in A and Q and the divisor in B. The sign of the result is transferred into Qs
to be part of the quotient. A constant is set into the sequence counter SC to specify the number of
bits in the quotient.
2. A divide-overflow condition is tested by subtracting the divisor in B from half of the bits of
the dividend stored in A. If A ≥ B, the divide-overflow flip-flop DVF is set and the operation is
terminated prematurely. If A < B, no divide overflow occurs so the value of the dividend is restored
by adding B to A.
3. The division of the magnitudes starts by shifting the dividend in AQ to the left, with the
high-order bit shifted into E. If the bit shifted into E is 1, we know that EA > B because EA consists
of a 1 followed by n-1 bits while B consists of only n-1 bits. In this case, B must be subtracted from
EA and 1 inserted into Qn for the quotient bit.
4. If the shift-left operation inserts a 0 into E, the divisor is subtracted by adding its 2's
complement value and the carry is transferred into E. If E = 1, it signifies that A ≥ B; therefore, Qn
is set to 1. If E = 0, it signifies that A < B and the original number is restored by adding B to A. In
the latter case we leave a 0 in Qn.
This process is repeated with registers EAQ. After n iterations, the quotient is formed in
register Q and the remainder is found in register A.
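The four steps above can be followed in a short Python sketch of the restoring method. The function name divide_magnitudes is an assumption, and divide overflow is modelled by raising an exception rather than by setting a DVF flip-flop.

```python
def divide_magnitudes(a: int, q: int, b: int, n: int):
    """Restoring division of the 2n-bit dividend A:Q by the n-bit divisor B.

    Returns (quotient, remainder). Name, signature and the use of an
    exception for divide overflow are assumptions for this sketch.
    """
    mask = (1 << n) - 1
    if a >= b:
        # Step 2: the quotient would not fit in n bits (DVF would be set).
        raise OverflowError("divide overflow")
    for _ in range(n):                      # SC counts n quotient bits
        # Step 3: shl EAQ, the high-order bit of A goes into E
        e = a >> (n - 1)
        a = ((a << 1) | (q >> (n - 1))) & mask
        q = (q << 1) & mask
        if e == 1:
            a = (a - b) & mask              # EA > B for sure: subtract, Qn <- 1
            q |= 1
        else:
            # Step 4: subtract by adding the 2's complement of B
            total = a + ((~b) & mask) + 1
            e, diff = total >> n, total & mask
            if e == 1:                      # A >= B: keep difference, Qn <- 1
                a = diff
                q |= 1
            # E = 0 means A < B: A is left unchanged (restored), Qn stays 0
    return q, a


print(divide_magnitudes(0b01110, 0b00000, 0b10001, 5))  # 448 / 17
```

In hardware the restore is an explicit addition of B; the sketch gets the same effect by simply not overwriting A when the subtraction produces no carry.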
Division:-
The following diagram shows one possible way of separating the execution unit into eight functional units
operating in parallel.
The operation performed in each functional unit is indicated in each block of the diagram:
• The adder and integer multiplier perform the arithmetic operations with integer numbers.
• The floating-point operations are separated into three circuits operating in parallel.
• The logic, shift, and increment operations can be performed concurrently on different data. All units
are independent of each other, so one number can be shifted while another number is being
incremented.
Pipelining is a technique of decomposing a sequential process into suboperations, with each
suboperation being executed in a special dedicated segment that operates concurrently with all other
segments.
Example:
Suppose we need to perform a combined multiply and add operation on a stream of numbers.
The operation to be performed on the numbers is decomposed into suboperations, with each
suboperation implemented in a segment within a pipeline.
The sub-operations performed in each segment of the pipeline are defined as:
The following block diagram represents the combined as well as the sub-operations performed
in each segment of the pipeline.
Registers R1 through R5 hold the data, and the combinational circuits (the multiplier and the
adder) each operate in a particular segment. As the above figure shows, R1 through R5 receive new
data with every clock pulse.
The above diagram shows that the first clock pulse transfers A1 and B1 into R1 and R2. The second
clock pulse transfers the product of R1 and R2 into R3 and C1 into R4. The same clock pulse
transfers A2 and B2 into R1 and R2. The third clock pulse operates on all three segments
simultaneously.
It places A3 and B3 into R1 and R2, transfers the product of R1 and R2 into R3, transfers C2 into
R4, and places the sum of R3 and R4 into R5. It takes three clock pulses to fill up the pipe and
retrieve the first output from R5.
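The register transfers described above can be simulated clock by clock. The sketch below is illustrative only: the function name run_pipeline and the use of None for an empty register are assumptions, while the registers R1 to R5 and the three segments follow the text.

```python
def run_pipeline(a, b, c):
    """Simulate the three-segment pipeline that computes Ai * Bi + Ci.

    Every register loads on the same clock pulse, so the new contents
    are computed from the old ones before any register is updated.
    The function name and the None convention are assumptions.
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    outputs = []
    for clock in range(n + 2):              # n + 2 pulses drain the pipe
        # Segment 3: R5 <- R3 + R4
        new_r5 = r3 + r4 if r3 is not None else None
        # Segment 2: R3 <- R1 * R2, R4 <- Ci
        new_r3 = r1 * r2 if r1 is not None else None
        new_r4 = c[clock - 1] if 1 <= clock <= n else None
        # Segment 1: R1 <- Ai, R2 <- Bi
        new_r1 = a[clock] if clock < n else None
        new_r2 = b[clock] if clock < n else None
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        if r5 is not None:
            outputs.append((clock + 1, r5))  # (pulse number, value in R5)
    return outputs


print(run_pipeline([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

The first result appears in R5 on the third pulse, and one further result follows on every pulse after that.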
The general structure of a four-segment pipeline is shown in the figure. The operands pass through all
four segments in a fixed sequence. Each segment consists of a combinational circuit Si that performs
a suboperation on the data stream flowing through the pipe.
In the above space-time diagram, the horizontal axis displays the time in clock cycles and the
vertical axis gives the segment number. The diagram shows six tasks T1 through T6 executed in
four segments. Initially, task T1 is handled by segment 1. After the first clock, segment 2 is busy
with T1, while segment 1 is busy with task T2. Continuing in this manner, T1 leaves segment 4 after
the fourth clock cycle.
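A space-time diagram of this kind can be generated programmatically, which also makes the total running time visible: a pipeline with k segments completes n tasks in k + n - 1 clock cycles. The function name space_time and the "--" marker for an idle segment are assumptions for this sketch.

```python
def space_time(k: int, n: int):
    """Build the space-time table for n tasks on a k-segment pipeline.

    Row i is segment i + 1; column j is clock cycle j + 1. Idle slots
    are marked "--". Name and layout are assumptions for this sketch.
    """
    cycles = k + n - 1
    table = []
    for segment in range(1, k + 1):
        row = []
        for clock in range(1, cycles + 1):
            task = clock - segment + 1      # task seen by this segment now
            row.append(f"T{task}" if 1 <= task <= n else "--")
        table.append(row)
    return table


for segment, row in enumerate(space_time(4, 6), start=1):
    print(f"Segment {segment}:", " ".join(row))
```

For four segments and six tasks the table has nine columns, so the whole stream finishes in 9 clock cycles instead of the 24 that a non-pipelined unit with the same four steps per task would need.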
To understand the concepts of arithmetic pipeline in a more convenient way, let us consider an
example of a pipeline unit for floating-point addition and subtraction.
The inputs to the floating-point adder pipeline are two normalized floating-point binary numbers
defined as:
X = A × 2^a = 0.9504 × 10^3
Y = B × 2^b = 0.8200 × 10^2
Where A and B are two fractions that represent the mantissa and a and b are the exponents.
The combined operation of floating-point addition and subtraction is divided into four segments. Each
segment contains the corresponding suboperation to be performed in the given pipeline. The suboperations
that are shown in the four segments are:
Note: Registers are placed after each suboperation to store the intermediate results.
1. Compare exponents by subtraction:
The exponents are compared by subtracting them to determine their difference. The larger exponent is
chosen as the exponent of the result.
The difference of the exponents, i.e., 3 - 2 = 1 determines how many times the mantissa associated with the
smaller exponent must be shifted to the right.
Z = X + Y = 1.0324 × 10^3
Z = 0.10324 × 10^4
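The example can be reproduced step by step in Python. This is a sketch, not a hardware description: it uses decimal fractions as in the example above (via the standard decimal module), the function name fp_add is an assumption, and the four commented stages (compare exponents, align mantissas, add mantissas, normalize) are inferred from the worked example.

```python
from decimal import Decimal


def fp_add(mant_x: Decimal, exp_x: int, mant_y: Decimal, exp_y: int):
    """Add two decimal floating-point numbers given as (mantissa, exponent).

    Returns the normalized (mantissa, exponent) with 0.1 <= |mantissa| < 1.
    The function name and the decimal base are assumptions for this sketch.
    """
    # Segment 1: compare exponents by subtraction; keep the larger one
    diff = exp_x - exp_y
    exp = max(exp_x, exp_y)
    # Segment 2: shift right the mantissa that has the smaller exponent
    if diff > 0:
        mant_y = mant_y.scaleb(-diff)
    elif diff < 0:
        mant_x = mant_x.scaleb(diff)
    # Segment 3: add the aligned mantissas
    mant = mant_x + mant_y
    # Segment 4: normalize the result
    while abs(mant) >= 1:                           # overflow: shift right
        mant = mant.scaleb(-1)
        exp += 1
    while mant != 0 and abs(mant) < Decimal("0.1"): # underflow: shift left
        mant = mant.scaleb(1)
        exp -= 1
    return mant, exp


print(fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2))
```

Here the exponent difference 3 - 2 = 1 shifts the mantissa of Y to 0.0820, the sum 1.0324 overflows the fraction range, and normalization gives 0.10324 with exponent 4.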
Most digital computers with complex instructions use an instruction pipeline to overlap
operations such as fetching, decoding, and executing instructions.
In general, the computer needs to process each instruction with the following sequence of steps.
Each step is executed in a particular segment, and there are times when different segments may
take different times to operate on the incoming information. Moreover, there are times when two
or more segments may require memory access at the same time, causing one segment to wait until
another is finished with the memory.
The organization of an instruction pipeline will be more efficient if the instruction cycle is divided
into segments of equal duration. One of the most common examples of this type of organization is
a Four-segment instruction pipeline.
A four-segment instruction pipeline combines two or more of these steps into a single
segment. For instance, the decoding of the instruction can be combined with the calculation of
the effective address into one segment.
The following block diagram shows a typical example of a four-segment instruction pipeline. The
instruction cycle is completed in four segments.
Segment 1:
The instruction fetch segment can be implemented using first in, first out (FIFO) buffer.
Segment 2:
The instruction fetched from memory is decoded in the second segment, and eventually, the effective
address is calculated in a separate arithmetic circuit.
Segment 3:
An operand from memory is fetched in the third segment.
Segment 4:
The instructions are finally executed in the last segment of the pipeline organization.