Hardware For Multiplication

Uploaded by Sarafat Hossain
Copyright © All Rights Reserved
178 Chapter 3 Arithmetic for Computers

3.1 Introduction

Computer words are composed of bits; thus, words can be represented as binary
numbers. Chapter 2 shows that integers can be represented either in decimal or
binary form, but what about the other numbers that commonly occur? For example:
■ What about fractions and other real numbers?
■ What happens if an operation creates a number bigger than can be represented?
■ And underlying these questions is a mystery: How does hardware really
multiply or divide numbers?
The goal of this chapter is to unravel these mysteries, including representation of
real numbers, arithmetic algorithms, hardware that follows these algorithms, and
the implications of all this for instruction sets. These insights may explain quirks
that you have already encountered with computers. Moreover, we show how to use
this knowledge to make arithmetic-intensive programs go much faster.

3.2 Addition and Subtraction

Subtraction: Addition’s Tricky Pal
No. 10, Top Ten Courses for Athletes at a Football Factory, David Letterman et al., Book of Top Ten Lists, 1990

Addition is just what you would expect in computers. Digits are added bit by bit
from right to left, with carries passed to the next digit to the left, just as you would
do by hand. Subtraction uses addition: the appropriate operand is simply negated
before being added.

Binary Addition and Subtraction

EXAMPLE
Let’s try adding 6ten to 7ten in binary and then subtracting 6ten from 7ten in binary.

ANSWER
  0000 0000 0000 0000 0000 0000 0000 0111two = 7ten
+ 0000 0000 0000 0000 0000 0000 0000 0110two = 6ten
= 0000 0000 0000 0000 0000 0000 0000 1101two = 13ten

The 4 bits to the right have all the action; Figure 3.1 shows the sums and
carries. The carries are shown in parentheses, with the arrows showing how
they are passed.

Subtracting 6ten from 7ten can be done directly:



     (0)  (0)  (1)  (1)  (0)       (Carries)
...   0    0    0    1    1    1
...   0    0    0    1    1    0
...(0)0 (0)0 (0)1 (1)1 (1)0 (0)1

FIGURE 3.1 Binary addition, showing carries from right to left. The rightmost bit adds 1
to 0, resulting in the sum of this bit being 1 and the carry out from this bit being 0. Hence, the operation
for the second digit to the right is 0 + 1 + 1. This generates a 0 for this sum bit and a carry out of 1. The
third digit is the sum of 1 + 1 + 1, resulting in a carry out of 1 and a sum bit of 1. The fourth bit is 1 +
0 + 0, yielding a 1 sum and no carry.

  0000 0000 0000 0000 0000 0000 0000 0111two = 7ten
– 0000 0000 0000 0000 0000 0000 0000 0110two = 6ten
= 0000 0000 0000 0000 0000 0000 0000 0001two = 1ten

or via addition using the two’s complement representation of 6:

  0000 0000 0000 0000 0000 0000 0000 0111two = 7ten
+ 1111 1111 1111 1111 1111 1111 1111 1010two = –6ten
= 0000 0000 0000 0000 0000 0000 0000 0001two = 1ten
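The bit-by-bit procedure and subtraction as negate-then-add can both be modeled in a few lines of C. This is a minimal software sketch. It assumes a uint32_t stands in for a 32-bit register, and the names ripple_add and ripple_sub are hypothetical. Real hardware computes each bit with gates rather than a loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two 32-bit words one bit at a time, right to left, passing the
   carry to the next bit as in Figure 3.1. If carries_out is non-NULL,
   bit i of *carries_out records the carry out of bit i. */
uint32_t ripple_add(uint32_t a, uint32_t b, uint32_t carry_in,
                    uint32_t *carries_out) {
    uint32_t sum = 0, couts = 0, carry = carry_in & 1u;
    for (int i = 0; i < 32; i++) {
        uint32_t ai = (a >> i) & 1u, bi = (b >> i) & 1u;
        uint32_t s = ai ^ bi ^ carry;                     /* sum bit */
        carry = (ai & bi) | (ai & carry) | (bi & carry);  /* carry out */
        sum |= s << i;
        couts |= carry << i;
    }
    if (carries_out != NULL) *carries_out = couts;
    return sum;
}

/* Subtract by negating the second operand and adding:
   a - b = a + (~b + 1), with the +1 entering as the carry into bit 0. */
uint32_t ripple_sub(uint32_t a, uint32_t b) {
    return ripple_add(a, ~b, 1u, NULL);
}
```

With these definitions, ripple_add(7, 6, 0, &c) returns 13 and sets c to 0110two, which are the carries out of bits 0 through 3 in Figure 3.1. Inverting b and supplying a carry in of 1 forms the two’s complement of b. For 6 that is the 1111 ... 1010two shown above, so this model needs no separate subtraction circuit.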

Recall that overflow occurs when the result from an operation cannot be
represented with the available hardware, in this case a 32-bit word. When can
overflow occur in addition? When adding operands with different signs, overflow
cannot occur. The reason is that the magnitude of the sum can be no larger than that of
one of the operands. For example, –10 + 4 = –6. Since the operands fit in 32 bits and the sum is no
larger than an operand, the sum must fit in 32 bits as well. Therefore, no overflow
can occur when adding positive and negative operands.
There are similar restrictions to the occurrence of overflow during subtraction, but
it’s just the opposite principle: when the signs of the operands are the same, overflow
cannot occur. To see this, remember that c – a = c + (–a) because we subtract by
negating the second operand and then adding. Therefore, when we subtract operands
of the same sign we end up adding operands of different signs. From the prior
paragraph, we know that overflow cannot occur in this case either.
Knowing when overflow cannot occur in addition and subtraction is all well and
good, but how do we detect it when it does occur? Clearly, adding or subtracting
two 32-bit numbers can yield a result that needs 33 bits to be fully expressed.
The lack of a 33rd bit means that when overflow occurs, the sign bit is set with
the value of the result instead of the proper sign of the result. Since we need just one
extra bit, only the sign bit can be wrong. Hence, overflow occurs when adding two
positive numbers and the sum is negative, or when adding two negative numbers and
the sum is positive. This spurious sign means that the carry into the sign bit differed
from the carry out of it.
Overflow occurs in subtraction when we subtract a negative number from a
positive number and get a negative result, or when we subtract a positive number
from a negative number and get a positive result. Such a ridiculous result means a
borrow occurred from the sign bit. Figure 3.2 shows the combination of operations,
operands, and results that indicate an overflow.

Operation   Operand A   Operand B   Result indicating overflow
A + B       ≥ 0         ≥ 0         < 0
A + B       < 0         < 0         ≥ 0
A – B       ≥ 0         < 0         < 0
A – B       < 0         ≥ 0         ≥ 0

FIGURE 3.2 Overflow conditions for addition and subtraction.
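Figure 3.2 translates directly into sign-bit tests. The C sketch below assumes the operands arrive as raw 32-bit patterns in uint32_t. That keeps the wrap-around add and subtract well defined in C, where signed overflow is undefined behavior. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 31 of a 32-bit pattern is the two's complement sign. */
static uint32_t sign_bit(uint32_t x) { return x >> 31; }

/* Rows 1-2 of Figure 3.2: the operands have the same sign,
   but the wrapped sum has the other sign. */
bool add_overflows(uint32_t a, uint32_t b) {
    uint32_t sum = a + b;                         /* modulo 2^32 */
    return sign_bit(a) == sign_bit(b) && sign_bit(sum) != sign_bit(a);
}

/* Rows 3-4 of Figure 3.2: the operands have different signs,
   and the wrapped difference takes the sign of B. */
bool sub_overflows(uint32_t a, uint32_t b) {
    uint32_t diff = a - b;                        /* modulo 2^32 */
    return sign_bit(a) != sign_bit(b) && sign_bit(diff) == sign_bit(b);
}
```

For example, add_overflows(0x7FFFFFFF, 1) is true: the largest positive number plus 1 wraps to a negative pattern. Adding 7 and the pattern for –6 (0xFFFFFFFA) never overflows because the signs differ.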

We have just seen how to detect overflow for two’s complement numbers in a
computer. What about overflow with unsigned integers? Unsigned integers are
commonly used for memory addresses where overflows are ignored.
The computer designer must therefore provide a way to ignore overflow in
some cases and to recognize it in others. The MIPS solution is to have two kinds of
arithmetic instructions to recognize the two choices:
■ Add (add), add immediate (addi), and subtract (sub) cause exceptions on
overflow.
■ Add unsigned (addu), add immediate unsigned (addiu), and subtract
unsigned (subu) do not cause exceptions on overflow.
Because C ignores overflows, the MIPS C compilers will always generate the
unsigned versions of the arithmetic instructions addu, addiu, and subu, no
matter what the type of the variables. The MIPS Fortran compilers, however, pick
the appropriate arithmetic instructions, depending on the type of the operands.
Appendix B describes the hardware that performs addition and subtraction,
which is called an Arithmetic Logic Unit or ALU.

Arithmetic Logic Unit (ALU) Hardware that performs addition, subtraction, and usually
logical operations such as AND and OR.

Elaboration: A constant source of confusion for addiu is its name and what happens
to its immediate field. The u stands for unsigned, which means addition cannot cause an
overflow exception. However, the 16-bit immediate field is sign extended to 32 bits, just
like addi, slti, and sltiu. Thus, the immediate field is signed, even if the operation
is “unsigned.”
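To see what sign extension does to addiu, here is a small C model. The names sign_extend16 and addiu are hypothetical and chosen for illustration; they are not a real API. With an immediate field of 0xFFFF, the instruction adds –1, not 65,535.

```c
#include <stdint.h>

/* Sign-extend a 16-bit immediate field to 32 bits, as addi, addiu,
   slti, and sltiu do: bit 15 is copied into bits 16 through 31. */
uint32_t sign_extend16(uint16_t imm) {
    return (imm & 0x8000u) ? (0xFFFF0000u | imm) : (uint32_t)imm;
}

/* Model of addiu: add the sign-extended immediate, wrapping modulo
   2^32 and never raising an overflow exception. */
uint32_t addiu(uint32_t rs, uint16_t imm) {
    return rs + sign_extend16(imm);
}
```

So addiu(10, 0xFFFF) yields 9, and addiu(0xFFFFFFFF, 1) wraps silently to 0 where addi would trap.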

Hardware/Software Interface
The computer designer must decide how to handle arithmetic overflows. Although
some languages like C and Java ignore integer overflow, languages like Ada and
Fortran require that the program be notified. The programmer or the programming
environment must then decide what to do when overflow occurs.
MIPS detects overflow with an exception, also called an interrupt on many
computers. An exception or interrupt is essentially an unscheduled procedure
call. The address of the instruction that overflowed is saved in a register, and the
computer jumps to a predefined address to invoke the appropriate routine for that
exception. The interrupted address is saved so that in some situations the program
can continue after corrective code is executed. (Section 4.9 covers exceptions in
more detail; Chapter 5 describes other situations where exceptions and interrupts
occur.)

exception Also called interrupt on many computers. An unscheduled event that disrupts
program execution; used to detect overflow.

MIPS includes a register called the exception program counter (EPC) to contain
the address of the instruction that caused the exception. The instruction move from
system control (mfc0) is used to copy EPC into a general-purpose register so that
MIPS software has the option of returning to the offending instruction via a jump
register instruction.

interrupt An exception that comes from outside of the processor. (Some architectures
use the term interrupt for all exceptions.)

Summary
A major point of this section is that, independent of the representation, the finite
word size of computers means that arithmetic operations can create results that
are too large to fit in this fixed word size. It’s easy to detect overflow in unsigned
numbers, although these are almost always ignored because programs don’t want to
detect overflow for address arithmetic, the most common use of natural numbers.
Two’s complement presents a greater challenge, yet some software systems require
detection of overflow, so today all computers have a way to detect it.
Check Yourself
Some programming languages allow two’s complement integer arithmetic
on variables declared byte and half, whereas MIPS only has integer arithmetic
operations on full words. As we recall from Chapter 2, MIPS does have data transfer
operations for bytes and halfwords. What MIPS instructions should be generated
for byte and halfword arithmetic operations?
1. Load with lbu, lhu; arithmetic with add, sub, mult, div; then store using
sb, sh.
2. Load with lb, lh; arithmetic with add, sub, mult, div; then store using
sb, sh.
3. Load with lb, lh; arithmetic with add, sub, mult, div, using AND to mask
result to 8 or 16 bits after each operation; then store using sb, sh.
Elaboration: One feature not generally found in general-purpose microprocessors is
saturating operations. Saturation means that when a calculation overflows, the result
is set to the largest positive number or most negative number, rather than a modulo
calculation as in two’s complement arithmetic. Saturation is likely what you want for media
operations. For example, the volume knob on a radio set would be frustrating if, as you
turned it, the volume would get continuously louder for a while and then immediately very
soft. A knob with saturation would stop at the highest volume no matter how far you turned
it. Multimedia extensions to standard instruction sets often offer saturating arithmetic.
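Here is a software sketch of a saturating 32-bit add in C, assuming signed 32-bit operands. The name saturating_add is hypothetical. Multimedia extensions typically apply the same clamping in hardware, often to narrower lanes such as 8- or 16-bit values.

```c
#include <stdint.h>

/* Saturating signed add: clamp to INT32_MAX or INT32_MIN instead of
   wrapping around. The sum is formed in 64 bits, which cannot
   overflow for two 32-bit operands. */
int32_t saturating_add(int32_t a, int32_t b) {
    int64_t sum = (int64_t)a + (int64_t)b;
    if (sum > INT32_MAX) return INT32_MAX;   /* knob stops at the top */
    if (sum < INT32_MIN) return INT32_MIN;   /* and at the bottom */
    return (int32_t)sum;
}
```

Turning the "volume" past the maximum with saturating_add(INT32_MAX, 1) leaves it at INT32_MAX. Modulo arithmetic would jump to the most negative value instead.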

Elaboration: MIPS can trap on overflow, but unlike many other computers, there is
no conditional branch to test overflow. A sequence of MIPS instructions can discover
overflow. For signed addition, the sequence is the following (see the Elaboration on page
89 in Chapter 2 for a description of the xor instruction):

      addu $t0, $t1, $t2            # $t0 = sum, but don't trap
      xor  $t3, $t1, $t2            # Check if signs differ
      slt  $t3, $t3, $zero          # $t3 = 1 if signs differ
      bne  $t3, $zero, No_overflow  # $t1, $t2 signs ≠,
                                    #   so no overflow
      xor  $t3, $t0, $t1            # signs =; sign of sum match too?
                                    # $t3 negative if sum sign different
      slt  $t3, $t3, $zero          # $t3 = 1 if sum sign different
      bne  $t3, $zero, Overflow     # All 3 signs ≠; goto overflow
No_overflow:                        # execution continues here when there is
                                    #   no overflow (Overflow is a handler
                                    #   label defined elsewhere)

For unsigned addition ($t0 = $t1 + $t2), the test is

      addu $t0, $t1, $t2            # $t0 = sum
      nor  $t3, $t1, $zero          # $t3 = NOT $t1
                                    # (2's comp – 1: 2^32 – $t1 – 1)
      sltu $t3, $t3, $t2            # (2^32 – $t1 – 1) < $t2
                                    #   ⇒ 2^32 – 1 < $t1 + $t2
      bne  $t3, $zero, Overflow     # if (2^32 – 1 < $t1 + $t2) goto overflow
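Both MIPS sequences can be transcribed into C, which also has no overflow flag to test. In this sketch, which assumes uint32_t operands, each line mirrors one instruction above. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Signed test, following the MIPS sequence step by step. */
bool signed_add_overflow(uint32_t t1, uint32_t t2) {
    uint32_t t0 = t1 + t2;            /* addu: sum, no trap */
    if ((t1 ^ t2) >> 31)              /* xor + slt: do the signs differ? */
        return false;                 /* different signs: no overflow */
    return ((t0 ^ t1) >> 31) != 0;    /* sum sign differs from both */
}

/* Unsigned test: NOT t1 equals 2^32 - t1 - 1, so (~t1 < t2)
   holds exactly when 2^32 - 1 < t1 + t2. */
bool unsigned_add_overflow(uint32_t t1, uint32_t t2) {
    uint32_t t3 = ~t1;                /* nor $t3, $t1, $zero */
    return t3 < t2;                   /* sltu $t3, $t3, $t2 */
}
```

For the unsigned case, an equivalent and common C idiom is to compute the wrapped sum and check whether it is smaller than either operand.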

Elaboration: In the preceding text, we said that you copy EPC into a register via
mfc0 and then return to the interrupted code via jump register. This directive leads to
an interesting question: since you must first transfer EPC to a register to use with jump
register, how can jump register return to the interrupted code and restore the original
values of all registers? Either you restore the old registers first, thereby destroying your
return address from EPC, which you placed in a register for use in jump register, or you
restore all registers but the one with the return address so that you can jump, meaning
an exception would result in changing that one register at any time during program
execution! Neither option is satisfactory.
To rescue the hardware from this dilemma, MIPS programmers agreed to reserve
registers $k0 and $k1 for the operating system; these registers are not restored on
exceptions. Just as the MIPS compilers avoid using register $at so that the assembler
can use it as a temporary register (see Hardware/Software Interface in Section 2.10),
compilers also abstain from using registers $k0 and $k1 to make them available for the
operating system. Exception routines place the return address in one of these registers
and then use jump register to restore the instruction address.

Elaboration: The speed of addition is increased by determining the carry into the
high-order bits sooner. There are a variety of schemes to anticipate the carry so that
the worst-case scenario is a function of the log2 of the number of bits in the adder.
These anticipatory signals are faster because they go through fewer gates in sequence,
but it takes many more gates to anticipate the proper carry. The most popular is carry
lookahead, which Section B.6 in Appendix B describes.
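To make the idea concrete, here is a 4-bit carry-lookahead adder sketched in C. It assumes the generate/propagate formulation (generate g = a AND b, propagate p = a OR b) and is an illustration, not a copy of the Appendix B circuit; cla_add4 is a hypothetical name. Each carry is a flat OR of AND terms over g, p, and the carry in. In hardware, every carry is therefore ready after the same few gate delays instead of rippling through the lower bits.

```c
#include <stddef.h>
#include <stdint.h>

/* 4-bit carry-lookahead add of the low 4 bits of a and b plus c0.
   Returns the 4-bit sum; *cout receives the carry out of bit 3. */
uint32_t cla_add4(uint32_t a, uint32_t b, uint32_t c0, uint32_t *cout) {
    uint32_t g[4], p[4];
    for (int i = 0; i < 4; i++) {
        g[i] = ((a >> i) & 1u) & ((b >> i) & 1u);   /* bit i creates a carry */
        p[i] = ((a >> i) & 1u) | ((b >> i) & 1u);   /* bit i passes one on */
    }
    c0 &= 1u;
    /* every carry depends only on g, p, and c0, never on another carry */
    uint32_t c1 = g[0] | (p[0] & c0);
    uint32_t c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0);
    uint32_t c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
                | (p[2] & p[1] & p[0] & c0);
    uint32_t c4 = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
                | (p[3] & p[2] & p[1] & g[0])
                | (p[3] & p[2] & p[1] & p[0] & c0);
    uint32_t carries = c0 | (c1 << 1) | (c2 << 2) | (c3 << 3);
    if (cout != NULL) *cout = c4;
    return (a ^ b ^ carries) & 0xFu;  /* sum bit i = a_i XOR b_i XOR c_i */
}
```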

3.3 Multiplication

Multiplication is vexation,
Division is as bad;
The rule of three doth puzzle me,
And practice drives me mad.
Anonymous, Elizabethan manuscript, 1570

Now that we have completed the explanation of addition and subtraction, we are
ready to build the more vexing operation of multiplication.
First, let’s review the multiplication of decimal numbers in longhand to remind
ourselves of the steps of multiplication and the names of the operands. For reasons
that will become clear shortly, we limit this decimal example to using only the
digits 0 and 1. Multiplying 1000ten by 1001ten:

Multiplicand             1000ten
Multiplier             × 1001ten
                         1000
                        0000
                       0000
                      1000
Product               1001000ten

The first operand is called the multiplicand and the second the multiplier.
The final result is called the product. As you may recall, the algorithm learned in
grammar school is to take the digits of the multiplier one at a time from right to
left, multiplying the multiplicand by the single digit of the multiplier, and shifting
the intermediate product one digit to the left of the earlier intermediate products.
The first observation is that the number of digits in the product is considerably
larger than the number in either the multiplicand or the multiplier. In fact, if we
ignore the sign bits, the length of the multiplication of an n-bit multiplicand and an
m-bit multiplier is a product that is n + m bits long. That is, n + m bits are required
to represent all possible products. Hence, like add, multiply must cope with
overflow because we frequently want a 32-bit product as the result of multiplying
two 32-bit numbers.
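A quick C check of the n + m bound, assuming unsigned operands, shows that the largest 32-bit value squared needs all 64 bits of a uint64_t. bit_length and widest_product32 are hypothetical helper names.

```c
#include <stdint.h>

/* Number of bits needed to write x in binary (0 needs 0 bits). */
int bit_length(uint64_t x) {
    int n = 0;
    while (x != 0) { n++; x >>= 1; }
    return n;
}

/* (2^32 - 1)^2 = 2^64 - 2^33 + 1: the largest product of two 32-bit
   unsigned numbers, computed exactly in 64 bits. */
uint64_t widest_product32(void) {
    uint64_t max32 = UINT32_MAX;
    return max32 * max32;            /* 0xFFFFFFFE00000001 */
}
```

bit_length(widest_product32()) is 64 = 32 + 32. A 32-bit register would keep only the low word, 0x00000001.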
In this example, we restricted the decimal digits to 0 and 1. With only two
choices, each step of the multiplication is simple:
1. Just place a copy of the multiplicand (1 × multiplicand) in the proper place
if the multiplier digit is a 1, or
2. Place 0 (0 × multiplicand) in the proper place if the digit is 0.
Although the decimal example above happens to use only 0 and 1, multiplication
of binary numbers must always use 0 and 1, and thus always offers only these two
choices.
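The two rules can be applied mechanically even in decimal. This C sketch assumes every decimal digit of the multiplier is 0 or 1, as in the example above, and reproduces 1000ten × 1001ten = 1001000ten. longhand_01 is a hypothetical name.

```c
#include <stdint.h>

/* Base-10 longhand multiply, assuming every digit of mplier is 0 or 1.
   Right to left, each multiplier digit contributes either a copy of the
   multiplicand (digit 1) or 0 (digit 0), shifted one more decimal place
   to the left than the previous row. */
uint64_t longhand_01(uint64_t mcand, uint64_t mplier) {
    uint64_t product = 0;
    uint64_t row = mcand;              /* multiplicand in its current place */
    while (mplier != 0) {
        if (mplier % 10 == 1)
            product += row;            /* rule 1: 1 x multiplicand */
                                       /* rule 2: a 0 digit adds 0 */
        row *= 10;                     /* next row moves one digit left */
        mplier /= 10;
    }
    return product;
}
```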
Now that we have reviewed the basics of multiplication, the traditional next
step is to provide the highly optimized multiply hardware. We break with tradition
in the belief that you will gain a better understanding by seeing the evolution of
the multiply hardware and algorithm through multiple generations. For now, let’s
assume that we are multiplying only positive numbers.

[Figure 3.3 diagram: 64-bit Multiplicand register (shift left), 32-bit Multiplier register (shift right), 64-bit ALU, 64-bit Product register (write), and Control test logic.]

FIGURE 3.3 First version of the multiplication hardware. The Multiplicand register, ALU,
and Product register are all 64 bits wide, with only the Multiplier register containing 32 bits. (Appendix B
describes ALUs.) The 32-bit multiplicand starts in the right half of the Multiplicand register and is shifted left
1 bit on each step. The multiplier is shifted in the opposite direction at each step. The algorithm starts with
the product initialized to 0. Control decides when to shift the Multiplicand and Multiplier registers and when
to write new values into the Product register.

Sequential Version of the Multiplication Algorithm and Hardware
This design mimics the algorithm we learned in grammar school; Figure 3.3 shows
the hardware. We have drawn the hardware so that data flows from top to bottom
to resemble more closely the paper-and-pencil method.
Let’s assume that the multiplier is in the 32-bit Multiplier register and that the 64-
bit Product register is initialized to 0. From the paper-and-pencil example above,
it’s clear that we will need to move the multiplicand left one digit each step, as it may
be added to the intermediate products. Over 32 steps, a 32-bit multiplicand would
move 32 bits to the left. Hence, we need a 64-bit Multiplicand register, initialized
with the 32-bit multiplicand in the right half and zero in the left half. This register
is then shifted left 1 bit each step to align the multiplicand with the sum being
accumulated in the 64-bit Product register.
Figure 3.4 shows the three basic steps needed for each bit. The least significant
bit of the multiplier (Multiplier0) determines whether the multiplicand is added to
the Product register. The left shift in step 2 has the effect of moving the intermediate
operands to the left, just as when multiplying with paper and pencil. The shift right
in step 3 gives us the next bit of the multiplier to examine in the following iteration.
These three steps are repeated 32 times to obtain the product. If each step took a
clock cycle, this algorithm would require almost 100 clock cycles (3 steps × 32
repetitions = 96) to multiply two 32-bit numbers. The relative importance of arithmetic
operations like multiply varies with the program, but addition and subtraction may be
anywhere from 5 to 100 times more popular than multiply. Accordingly, in many
applications, multiply can take multiple clock cycles without significantly affecting
performance. Yet Amdahl’s Law (see Section 1.10) reminds us that even a moderate
frequency for a slow operation can limit performance.

Start
1. Test Multiplier0.
   Multiplier0 = 1: go to step 1a.
   Multiplier0 = 0: go to step 2.
1a. Add multiplicand to product and place the result in Product register.
2. Shift the Multiplicand register left 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition?
   No (< 32 repetitions): return to step 1.
   Yes (32 repetitions): Done.

FIGURE 3.4 The first multiplication algorithm, using the hardware shown in Figure 3.3. If
the least significant bit of the multiplier is 1, add the multiplicand to the product. If not, go to the next step.
Shift the multiplicand left and the multiplier right in the next two steps. These three steps are repeated 32
times.
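The flowchart maps onto a short loop. This is a C model of the first version. It assumes unsigned 32-bit operands and treats each register as a C variable of the width given in Figure 3.3; multiply_v1 is a hypothetical name. The loop performs the three steps one after another rather than in parallel.

```c
#include <stdint.h>

/* Model of the first multiply hardware for unsigned operands:
   64-bit Multiplicand, 32-bit Multiplier, 64-bit Product registers. */
uint64_t multiply_v1(uint32_t multiplicand_in, uint32_t multiplier_in) {
    uint64_t multiplicand = multiplicand_in;  /* right half; left half 0 */
    uint32_t multiplier = multiplier_in;
    uint64_t product = 0;                     /* Product starts at 0 */
    for (int rep = 0; rep < 32; rep++) {
        if (multiplier & 1u)                  /* 1. test Multiplier0 */
            product += multiplicand;          /* 1a. add to product */
        multiplicand <<= 1;                   /* 2. shift Multiplicand left */
        multiplier >>= 1;                     /* 3. shift Multiplier right */
    }
    return product;                           /* done after 32 repetitions */
}
```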

This algorithm and hardware are easily refined to take 1 clock cycle per step.
The speed-up comes from performing the operations in parallel: the multiplier
and multiplicand are shifted while the multiplicand is added to the product if the
multiplier bit is a 1. The hardware just has to ensure that it tests the right bit of
the multiplier and gets the preshifted version of the multiplicand. The hardware is
usually further optimized to halve the width of the adder and registers by noticing
where there are unused portions of registers and adders. Figure 3.5 shows the
revised hardware.

[Figure 3.5 diagram: 32-bit Multiplicand register, 32-bit ALU, 64-bit Product register (shift right, write), and Control test logic.]

FIGURE 3.5 Refined version of the multiplication hardware. Compare with the first version in
Figure 3.3. The Multiplicand register and ALU are both 32 bits wide, with only the Product
register left at 64 bits. Now the product is shifted right. The separate Multiplier register also disappeared. The
multiplier is placed instead in the right half of the Product register. These changes are highlighted in color.
(The Product register should really be 65 bits to hold the carry out of the adder, but it’s shown here as 64 bits
to highlight the evolution from Figure 3.3.)
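The refined hardware can be modeled the same way. In this C sketch, which assumes unsigned operands, the multiplier starts in the right half of a 64-bit product variable and the multiplicand is added to the left half. A separate 1-bit carry stands in for the 65th bit mentioned in the caption. refined_multiply is a hypothetical name.

```c
#include <stdint.h>

/* Model of the refined multiply hardware for unsigned operands.
   `product` is the 64-bit Product register; `carry` is the 65th bit
   that holds the adder's carry out. */
uint64_t refined_multiply(uint32_t multiplicand, uint32_t multiplier) {
    uint64_t product = multiplier;        /* multiplier in the right half */
    for (int rep = 0; rep < 32; rep++) {
        uint64_t carry = 0;
        if (product & 1u) {               /* test the current multiplier bit */
            /* 32-bit ALU: add the multiplicand to the left half */
            uint64_t left = (product >> 32) + multiplicand;  /* up to 33 bits */
            carry = left >> 32;
            product = (left << 32) | (product & 0xFFFFFFFFu);
        }
        /* shift {carry, product} right 1 bit; the carry enters bit 63 */
        product = (product >> 1) | (carry << 63);
    }
    return product;
}
```

Only the left half of the product is ever added to, which is why a 32-bit ALU suffices. Each right shift discards the multiplier bit just tested and moves the growing product toward the right half.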

Hardware/Software Interface
Replacing arithmetic by shifts can also occur when multiplying by constants. Some
compilers replace multiplies by short constants with a series of shifts and adds.
Because one bit to the left represents a number twice as large in base 2, shifting
the bits left has the same effect as multiplying by a power of 2. As mentioned in
Chapter 2, almost every compiler will perform the strength reduction optimization
of substituting a left shift for a multiply by a power of 2.
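The rewrite a compiler performs can be written out by hand. Below is a C sketch with hypothetical function names. It assumes unsigned 32-bit values, so the multiply and the shifts both wrap modulo 2^32 and agree for every input.

```c
#include <stdint.h>

/* Multiply by a power of 2 with a left shift: x * 8 == x << 3. */
uint32_t times8(uint32_t x) { return x << 3; }

/* Multiply by a short constant with shifts and adds:
   10 = 8 + 2, so x * 10 == (x << 3) + (x << 1). */
uint32_t times10(uint32_t x) { return (x << 3) + (x << 1); }
```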

A Multiply Algorithm

EXAMPLE
Using 4-bit numbers to save space, multiply 2ten × 3ten, or 0010two × 0011two.

ANSWER
Figure 3.6 shows the value of each register for each of the steps labeled
according to Figure 3.4, with the final value of 0000 0110two or 6ten. Color is
used to indicate the register values that change on that step, and the bit circled
is the one examined to determine the operation of the next step.

Iteration  Step                            Multiplier  Multiplicand  Product
0          Initial values                  0011        0000 0010     0000 0000
1          1a: 1 ⇒ Prod = Prod + Mcand     0011        0000 0010     0000 0010
           2: Shift left Multiplicand      0011        0000 0100     0000 0010
           3: Shift right Multiplier       0001        0000 0100     0000 0010
2          1a: 1 ⇒ Prod = Prod + Mcand     0001        0000 0100     0000 0110
           2: Shift left Multiplicand      0001        0000 1000     0000 0110
           3: Shift right Multiplier       0000        0000 1000     0000 0110
3          1: 0 ⇒ No operation             0000        0000 1000     0000 0110
           2: Shift left Multiplicand      0000        0001 0000     0000 0110
           3: Shift right Multiplier       0000        0001 0000     0000 0110
4          1: 0 ⇒ No operation             0000        0001 0000     0000 0110
           2: Shift left Multiplicand      0000        0010 0000     0000 0110
           3: Shift right Multiplier       0000        0010 0000     0000 0110

FIGURE 3.6 Multiply example using algorithm in Figure 3.4. The bit examined to determine the
next step is circled in color.
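Figure 3.6 can be regenerated by a 4-bit version of the Figure 3.4 loop. This C sketch assumes a 4-bit multiplier and 8-bit Multiplicand and Product registers. It masks each register to its width and prints one row per step, using => in place of ⇒. multiply4_trace is a hypothetical name.

```c
#include <stdio.h>

/* Print the low `width` bits of v in groups of 4, most significant first. */
static void print_bits(unsigned v, int width) {
    for (int i = width - 1; i >= 0; i--) {
        putchar((v >> i) & 1u ? '1' : '0');
        if (i % 4 == 0 && i != 0) putchar(' ');
    }
}

/* One table row: iteration, step, Multiplier, Multiplicand, Product. */
static void row(int iter, const char *step, unsigned mplier,
                unsigned mcand, unsigned prod) {
    printf("%d  %-28s  ", iter, step);
    print_bits(mplier, 4);
    printf("  ");
    print_bits(mcand, 8);
    printf("  ");
    print_bits(prod, 8);
    putchar('\n');
}

/* 4-bit version of the Figure 3.4 algorithm, repeated 4 times. */
unsigned multiply4_trace(unsigned mcand, unsigned mplier) {
    unsigned prod = 0;
    mcand &= 0xFu;
    mplier &= 0xFu;
    row(0, "Initial values", mplier, mcand, prod);
    for (int it = 1; it <= 4; it++) {
        if (mplier & 1u) {                      /* 1. test Multiplier0 */
            prod = (prod + mcand) & 0xFFu;      /* 1a. Prod = Prod + Mcand */
            row(it, "1a: 1 => Prod = Prod + Mcand", mplier, mcand, prod);
        } else {
            row(it, "1: 0 => No operation", mplier, mcand, prod);
        }
        mcand = (mcand << 1) & 0xFFu;           /* 2. shift Multiplicand left */
        row(it, "2: Shift left Multiplicand", mplier, mcand, prod);
        mplier >>= 1;                           /* 3. shift Multiplier right */
        row(it, "3: Shift right Multiplier", mplier, mcand, prod);
    }
    return prod;
}
```

Calling multiply4_trace(2, 3) prints thirteen rows in the order of the figure and returns 6.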

Signed Multiplication
So far, we have dealt with positive numbers. The easiest way to understand how
to deal with signed numbers is to first convert the multiplier and multiplicand to
positive numbers and then remember the original signs. The algorithms should
then be run for 31 iterations, leaving the signs out of the calculation. As we learned
in grammar school, we need to negate the product only if the original signs disagree.
It turns out that the last algorithm will work for signed numbers, provided that
we remember that we are dealing with numbers that have infinite digits, and we are
only representing them with 32 bits. Hence, the shifting steps would need to extend
the sign of the product for signed numbers. When the algorithm completes, the
lower word would have the 32-bit product.
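The first approach (convert to positive numbers, then fix the sign at the end) is sketched below in C. It assumes 32-bit signed operands and uses C's unsigned multiply for the magnitudes. signed_multiply is a hypothetical name, and any unsigned shift-and-add multiplier could take the place of the * operator.

```c
#include <stdint.h>

/* Signed multiply by "convert to positive, remember the signs":
   multiply the magnitudes as unsigned numbers, then negate the
   product if the original signs disagree. */
int64_t signed_multiply(int32_t a, int32_t b) {
    int a_neg = a < 0, b_neg = b < 0;
    /* 0u - (uint32_t)x gives |x| even for INT32_MIN, whose
       magnitude 2^31 does not fit in an int32_t */
    uint32_t ma = a_neg ? 0u - (uint32_t)a : (uint32_t)a;
    uint32_t mb = b_neg ? 0u - (uint32_t)b : (uint32_t)b;
    uint64_t mag = (uint64_t)ma * mb;  /* at most 2^62, fits in int64_t */
    int64_t p = (int64_t)mag;
    return (a_neg != b_neg) ? -p : p;  /* negate only if signs disagree */
}
```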

Faster Multiplication
Moore’s Law has provided so much more in resources that hardware designers can
now build much faster multiplication hardware. Whether the multiplicand is to be
added or not is known at the beginning of the multiplication by looking at each of
the 32 multiplier bits. Faster multiplications are possible by essentially providing
one 32-bit adder for each bit of the multiplier: one input is the multiplicand ANDed
with a multiplier bit, and the other is the output of a prior adder.
A straightforward approach would be to connect the outputs of adders on the
right to the inputs of adders on the left, making a stack of adders 32 high. An
alternative way to organize these 32 additions is in a parallel tree, as Figure 3.7
shows. Instead of waiting for 32 add times, we wait just the log2(32) or five 32-bit
add times.

[Figure 3.7 diagram: the terms Mplier31 • Mcand through Mplier0 • Mcand (32 bits each) feed a tree of adders whose outputs form Product63 through Product0.]

FIGURE 3.7 Fast multiplication hardware. Rather than use a single 32-bit adder 31 times, this hardware “unrolls the loop” to use 31
adders and then organizes them to minimize delay.
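The tree organization can be modeled in C. This sketch assumes unsigned operands, forms all 32 terms Mplier_i • Mcand up front, and adds them pairwise, counting how many levels of addition the tree needs. tree_multiply is a hypothetical name. As a simplification, it uses full 64-bit additions rather than the 32-bit adders of Figure 3.7.

```c
#include <stddef.h>
#include <stdint.h>

/* Form the 32 partial products at once, then add them pairwise:
   32 -> 16 -> 8 -> 4 -> 2 -> 1 terms. *levels receives the tree depth. */
uint64_t tree_multiply(uint32_t mcand, uint32_t mplier, int *levels) {
    uint64_t terms[32];
    for (int i = 0; i < 32; i++)   /* Mcand ANDed with Mplier_i, shifted i */
        terms[i] = ((mplier >> i) & 1u) ? ((uint64_t)mcand << i) : 0;
    int n = 32, depth = 0;
    while (n > 1) {                /* one level of the tree */
        for (int j = 0; j < n / 2; j++)   /* these adds run in parallel in hardware */
            terms[j] = terms[2 * j] + terms[2 * j + 1];
        n /= 2;
        depth++;
    }
    if (levels != NULL) *levels = depth;
    return terms[0];
}
```

For 32 terms the loop runs 5 levels (16 + 8 + 4 + 2 + 1 = 31 additions). That matches the log2(32) add times in the text and the 31 adders in the caption.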

In fact, multiply can go even faster than five add times because of the use of carry
save adders (see Section B.6 in Appendix B) and because it is easy to pipeline
such a design to be able to support many multiplies simultaneously (see Chapter 4).

Multiply in MIPS
MIPS provides a separate pair of 32-bit registers to contain the 64-bit product,
called Hi and Lo. To produce a properly signed or unsigned product, MIPS has two
instructions: multiply (mult) and multiply unsigned (multu). To fetch the integer
32-bit product, the programmer uses move from lo (mflo). The MIPS assembler
provides a pseudoinstruction for multiply that specifies three general-purpose
registers; it generates mflo and mfhi instructions to place the product into registers.
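The only difference between mult and multu is how the same bit patterns are interpreted. Here is a C model that assumes a hypothetical HiLo struct in place of the two registers. model_mult and model_multu are illustrative names, not a real API. Reading the lo field corresponds to mflo, and reading the hi field corresponds to mfhi.

```c
#include <stdint.h>

/* Hypothetical stand-in for the Hi and Lo registers. */
typedef struct { uint32_t hi, lo; } HiLo;

/* multu: operands are unsigned 32-bit patterns. */
HiLo model_multu(uint32_t rs, uint32_t rt) {
    uint64_t p = (uint64_t)rs * rt;              /* exact 64-bit product */
    HiLo r = { (uint32_t)(p >> 32), (uint32_t)p };
    return r;
}

/* mult: the same 32-bit patterns read as two's complement. */
HiLo model_mult(int32_t rs, int32_t rt) {
    uint64_t p = (uint64_t)((int64_t)rs * rt);   /* exact, as a bit pattern */
    HiLo r = { (uint32_t)(p >> 32), (uint32_t)p };
    return r;
}
```

For example, the pattern 0xFFFFFFFF times 2 gives Hi = 1 under multu (4,294,967,295 × 2) but Hi = 0xFFFFFFFF under mult (–1 × 2 = –2), while Lo is 0xFFFFFFFE in both cases.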

Summary
Multiplication hardware simply shifts and adds, as derived from the paper-and-
pencil method learned in grammar school. Compilers even use shift instructions
for multiplications by powers of 2. With much more hardware we can do the adds
in parallel, and do them much faster.

Hardware/Software Interface
Both MIPS multiply instructions ignore overflow, so it is up to the software to
check to see if the product is too big to fit in 32 bits. There is no overflow if Hi is
0 for multu or the replicated sign of Lo for mult. The instruction move from hi
(mfhi) can be used to transfer Hi to a general-purpose register to test for overflow.
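The software check described above is a one-line comparison on the Hi and Lo values. Below is a C sketch with hypothetical function names. It assumes Hi and Lo have already been copied into ordinary variables, as mfhi and mflo would do.

```c
#include <stdbool.h>
#include <stdint.h>

/* multu: the 32-bit result in Lo is exact only if Hi is 0. */
bool multu_overflows(uint32_t hi, uint32_t lo) {
    (void)lo;                          /* Lo does not affect the test */
    return hi != 0;
}

/* mult: exact only if every bit of Hi equals the sign bit of Lo. */
bool mult_overflows(uint32_t hi, uint32_t lo) {
    uint32_t replicated_sign = (lo >> 31) ? 0xFFFFFFFFu : 0u;
    return hi != replicated_sign;
}
```

For instance, –2 arrives as Hi = 0xFFFFFFFF, Lo = 0xFFFFFFFE and passes the mult test. The positive product 2^31 (Hi = 0, Lo = 0x80000000) fails it because Lo alone would read as negative.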
