+ Chapter 10
Computer Arithmetic
+
Arithmetic & Logic Unit (ALU)
Part of the computer that actually performs arithmetic
and logical operations on data
All of the other elements of the computer system are
there mainly to bring data into the ALU for it to process
and then to take the results back out
Based on the use of simple digital logic devices that
can store binary digits and perform simple Boolean
logic operations
+
Integer Representation
In the binary number system arbitrary numbers can be
represented with:
The digits zero and one
The minus sign (for negative numbers)
The period, or radix point (for numbers with a fractional
component)
For purposes of computer storage and processing we
do not have the benefit of special symbols for the
minus sign and radix point
Only binary digits (0,1) may be used to represent
numbers
Sign-Magnitude Representation
There are several alternative conventions used to represent
negative as well as positive integers
All of these alternatives involve treating the most
significant (leftmost) bit in the word as a sign bit
• If the sign bit is 0 the number is positive
• If the sign bit is 1 the number is negative
Sign-magnitude representation is the simplest form that
employs a sign bit
Drawbacks:
• Addition and subtraction require a consideration of both
the signs of the numbers and their relative magnitudes
to carry out the required operation
• There are two representations of 0
Because of these drawbacks, sign-magnitude representation
is rarely used in implementing the integer portion of the
ALU
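A minimal sketch of sign-magnitude encoding, assuming an 8-bit word. The helper names `to_sign_magnitude` and `from_sign_magnitude` are illustrative and do not come from the slides. The last line shows the second drawback: two different bit patterns decode to zero.

```python
def to_sign_magnitude(value: int, bits: int = 8) -> str:
    """Encode value as a sign-magnitude bit string (hypothetical helper)."""
    magnitude = abs(value)
    if magnitude >= 1 << (bits - 1):
        raise ValueError("magnitude does not fit in the given word length")
    sign = "1" if value < 0 else "0"   # leftmost bit is the sign bit
    return sign + format(magnitude, f"0{bits - 1}b")


def from_sign_magnitude(pattern: str) -> int:
    """Decode a sign-magnitude bit string back to an integer."""
    magnitude = int(pattern[1:], 2)
    return -magnitude if pattern[0] == "1" else magnitude


print(to_sign_magnitude(18))    # 00010010
print(to_sign_magnitude(-18))   # 10010010
# "Positive zero" and "negative zero" are distinct patterns with the same value.
print(from_sign_magnitude("00000000"), from_sign_magnitude("10000000"))  # 0 0
```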
Table 10.1
Characteristics of Twos Complement Representation and
Arithmetic
Table 10.2
Alternative Representations for 4-Bit Integers
+
Range Extension
Range of numbers that can be expressed is extended
by increasing the bit length
In sign-magnitude notation this is accomplished by
moving the sign bit to the new leftmost position and
filling in with zeros
This procedure will not work for twos complement
negative integers
For twos complement, the rule is to move the sign bit to
the new leftmost position and fill in with copies of the
sign bit
For positive numbers, fill in with zeros, and for negative
numbers, fill in with ones
This is called sign extension
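The sign extension rule can be sketched as follows. The function names are illustrative, and bit strings stand in for hardware registers. The last line shows why filling with zeros fails for a negative twos complement value.

```python
def sign_extend(pattern: str, new_bits: int) -> str:
    """Widen a twos complement bit string by copying its sign bit."""
    if new_bits < len(pattern):
        raise ValueError("new length must not be shorter than the original")
    return pattern[0] * (new_bits - len(pattern)) + pattern


def twos_value(pattern: str) -> int:
    """Interpret a bit string as a twos complement integer."""
    raw = int(pattern, 2)
    return raw - (1 << len(pattern)) if pattern[0] == "1" else raw


print(sign_extend("00010010", 16))              # 0000000000010010 (+18)
print(sign_extend("11101110", 16))              # 1111111111101110 (-18)
print(twos_value(sign_extend("11101110", 16)))  # -18
# Zero fill changes the value of a negative number:
print(twos_value("00000000" + "11101110"))      # 238
```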
Fixed-Point Representation
The radix point (binary point) is fixed and assumed to be
to the right of the rightmost digit
Programmer can use the same representation for binary
fractions by scaling the numbers so that the binary point
is implicitly positioned at some other location
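As an illustration of the scaling idea, the sketch below stores fractions as plain integers with the binary point assumed to sit 4 bits from the right. The constant `FRACTION_BITS` and both helper names are assumptions made for this example only.

```python
FRACTION_BITS = 4  # assumed position of the implicit binary point


def to_fixed(x: float) -> int:
    """Scale a real number into an integer with an implicit binary point."""
    return round(x * (1 << FRACTION_BITS))


def from_fixed(n: int) -> float:
    """Undo the scaling to recover the represented value."""
    return n / (1 << FRACTION_BITS)


a = to_fixed(2.5)    # stored as the integer 40
b = to_fixed(1.25)   # stored as the integer 20

# Addition is ordinary integer addition; the binary point stays in place.
print(from_fixed(a + b))                      # 3.75
# A product carries twice the fraction bits, so it is rescaled once.
print(from_fixed((a * b) >> FRACTION_BITS))   # 3.125
```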
+
Negation
Twos complement operation
Take the Boolean complement of each bit of the integer
(including the sign bit)
Treating the result as an unsigned binary integer, add 1
+18 = 00010010 (twos complement)
bitwise complement = 11101101
+ 1
11101110 = -18
The negative of the negative of that number is itself:
-18 = 11101110 (twos complement)
bitwise complement = 00010001
+ 1
00010010 = +18
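The two steps (complement every bit, then add 1 as an unsigned integer) can be sketched as below. The function name `negate` is illustrative, and bit strings are used in place of registers.

```python
def negate(pattern: str) -> str:
    """Twos complement negation of a fixed-width bit string."""
    bits = len(pattern)
    # Step 1: Boolean complement of each bit, including the sign bit.
    complement = "".join("1" if b == "0" else "0" for b in pattern)
    # Step 2: add 1, treating the result as unsigned; any carry out of
    # the leftmost bit is dropped to keep the word length fixed.
    total = (int(complement, 2) + 1) % (1 << bits)
    return format(total, f"0{bits}b")


print(negate("00010010"))          # 11101110  (-18)
print(negate(negate("00010010")))  # 00010010  (+18 again)
```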
+
Negation Special Case 1
0 = 00000000 (twos complement)
Bitwise complement = 11111111
Add 1 to LSB + 1
Result 100000000
The carry out of the most significant bit is ignored, so:
-0 = 0
+
Negation Special Case 2
-128 = 10000000 (twos complement)
Bitwise complement = 01111111
Add 1 to LSB + 1
Result 10000000
So:
-(-128) = -128 (wrong: +128 cannot be represented in 8 bits)
Monitor MSB (sign bit)
It should change during negation
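One way to monitor the sign bit is sketched below. The name `negate_with_check` is hypothetical. The flag is raised only when a negative operand yields a negative result, so zero (whose sign bit legitimately stays 0) is not reported as an error.

```python
from typing import Tuple


def negate_with_check(pattern: str) -> Tuple[str, bool]:
    """Negate a twos complement bit string and flag the unrepresentable case."""
    bits = len(pattern)
    mask = (1 << bits) - 1
    # Complement, add 1, and drop any carry out of the leftmost bit.
    result = format(((int(pattern, 2) ^ mask) + 1) & mask, f"0{bits}b")
    # A negative operand whose negation is still negative has overflowed.
    overflow = pattern[0] == "1" and result[0] == "1"
    return result, overflow


print(negate_with_check("00000000"))  # ('00000000', False)  -0 = 0
print(negate_with_check("10000000"))  # ('10000000', True)   -(-128) fails
print(negate_with_check("11101110"))  # ('00010010', False)  -(-18) = +18
```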
+
Overflow
OVERFLOW RULE:
If two numbers are added, and they are both positive or
both negative, then overflow occurs if and only if the
result has the opposite sign.
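The overflow rule translates almost directly into code. The sketch below assumes 4-bit twos complement operands given as bit strings; `add_with_overflow` is an illustrative name.

```python
from typing import Tuple


def add_with_overflow(a: str, b: str) -> Tuple[str, bool]:
    """Add two equal-width twos complement bit strings and apply the rule."""
    if len(a) != len(b):
        raise ValueError("operands must have the same word length")
    bits = len(a)
    total = (int(a, 2) + int(b, 2)) % (1 << bits)  # carry out is dropped
    result = format(total, f"0{bits}b")
    # Overflow if and only if the operands share a sign and the result does not.
    overflow = a[0] == b[0] and result[0] != a[0]
    return result, overflow


print(add_with_overflow("0101", "0100"))  # ('1001', True)    5 + 4
print(add_with_overflow("1001", "0101"))  # ('1110', False)  -7 + 5 = -2
print(add_with_overflow("1100", "1100"))  # ('1000', False)  -4 + -4 = -8
print(add_with_overflow("1001", "1010"))  # ('0011', True)   -7 + -6
```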
+
Subtraction
SUBTRACTION RULE:
To subtract one number (subtrahend) from another
(minuend), take the twos complement (negation) of the
subtrahend and add it to the minuend.
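A short sketch of the subtraction rule, again with 4-bit bit strings standing in for registers. The name `subtract` is illustrative. Only complement, increment, and add are used, so no separate subtractor is needed.

```python
def subtract(minuend: str, subtrahend: str) -> str:
    """Compute minuend - subtrahend by adding the negated subtrahend."""
    if len(minuend) != len(subtrahend):
        raise ValueError("operands must have the same word length")
    bits = len(minuend)
    mask = (1 << bits) - 1
    # Twos complement negation of the subtrahend: complement, then add 1.
    negated = ((int(subtrahend, 2) ^ mask) + 1) & mask
    # Ordinary addition, with any carry out of the leftmost bit dropped.
    return format((int(minuend, 2) + negated) & mask, f"0{bits}b")


print(subtract("0010", "0111"))  # 1011   2 - 7 = -5
print(subtract("0101", "0010"))  # 0011   5 - 2 = +3
```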
+
Floating-Point Representation
Principles
With a fixed-point notation it is possible to represent a
range of positive and negative integers centered on or
near 0
By assuming a fixed binary or radix point, this format
allows the representation of numbers with a fractional
component as well
Limitations:
Very large numbers cannot be represented nor can very
small fractions
The fractional part of the quotient in a division of two large
numbers could be lost
+
Floating-Point
Significand
The final portion of the word
Any floating-point number can be expressed in many
ways
The following are equivalent, where the significand
is expressed in binary form:
0.110 * 2^5
110 * 2^2
0.0110 * 2^6
Normal number
The most significant digit of the significand is nonzero
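As a check that the three forms listed above denote the same number, the sketch below evaluates a binary significand scaled by a power of two (exponents 5, 2, and 6). The helper `binary_value` is an illustrative name; exact fractions are used so that no rounding enters the comparison.

```python
from fractions import Fraction


def binary_value(significand: str, exponent: int) -> Fraction:
    """Value of a binary significand such as '0.110' times 2**exponent."""
    whole, _, frac = significand.partition(".")
    # Read all digits as one integer, then shift by the fraction length.
    digits = int(whole + frac, 2)
    return Fraction(digits, 2 ** len(frac)) * Fraction(2) ** exponent


print(binary_value("0.110", 5))   # 24
print(binary_value("110", 2))     # 24
print(binary_value("0.0110", 6))  # 24
```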
IEEE Standard 754
Most important floating-point representation is defined
Standard was developed to facilitate the portability of
programs from one processor to another and to encourage
the development of sophisticated, numerically oriented
programs
Standard has been widely adopted and is used on virtually
all contemporary processors and arithmetic coprocessors
IEEE 754-2008 covers both binary and decimal
floating-point representations
+
IEEE 754-2008
Defines the following different types of floating-point formats:
Arithmetic format
All the mandatory operations defined by the standard are
supported by the format. The format may be used to represent
floating-point operands or results for the operations described in
the standard.
Basic format
This format covers five floating-point representations, three binary
and two decimal, whose encodings are specified by the standard,
and which can be used for arithmetic. At least one of the basic
formats is implemented in any conforming implementation.
Interchange format
A fully specified, fixed-length binary encoding that allows data
interchange between different platforms and that can be used for
storage.
Table 10.3 IEEE 754 Format Parameters
* not including implied bit and not including sign bit
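To make the format parameters concrete, the sketch below splits a binary64 value into its three fields: 1 sign bit, 11 biased exponent bits (bias 1023), and 52 trailing significand bits. The function name `decode_binary64` is illustrative; the packing relies on Python's standard `struct` module, whose `d` code is the 8-byte IEEE 754 double.

```python
import struct


def decode_binary64(x: float):
    """Return (sign, biased_exponent, fraction) for a binary64 value."""
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = raw >> 63                      # 1 bit
    biased_exponent = (raw >> 52) & 0x7FF  # 11 bits, bias 1023
    fraction = raw & ((1 << 52) - 1)       # 52 bits, implied leading 1 excluded
    return sign, biased_exponent, fraction


print(decode_binary64(1.0))    # (0, 1023, 0)
print(decode_binary64(-2.0))   # (1, 1024, 0)
print(decode_binary64(0.5))    # (0, 1022, 0)
```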
+
Additional Formats
Extended Precision Formats
Provide additional bits in the exponent (extended range)
and in the significand (extended precision)
Lessens the chance of a final result that has been
contaminated by excessive roundoff error
Lessens the chance of an intermediate overflow aborting a
computation whose final result would have been
representable in a basic format
Affords some of the benefits of a larger basic format
without incurring the time penalty usually associated
with higher precision
Extendable Precision Format
Precision and range are defined under user control
May be used for intermediate calculations but the standard
places no constraint on format or length
Table 10.4
IEEE Formats
Table 10.5
Interpretation of IEEE 754 Floating-Point Numbers (page 2 of 3)
(a) binary64 format
+
Precision Considerations
Rounding
IEEE standard approaches:
Round to nearest:
The result is rounded to the nearest representable
number.
Round toward +∞ :
The result is rounded up toward plus infinity.
Round toward -∞:
The result is rounded down toward negative infinity.
Round toward 0:
The result is rounded toward zero.
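The four approaches can be compared side by side with Python's `decimal` module, which offers one rounding constant for each. This is only an analogy: it rounds decimal digits rather than binary significands, and using `ROUND_HALF_EVEN` for ties in round to nearest is an assumption of this sketch. The name `round_all` is illustrative.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR,
                     ROUND_HALF_EVEN)

MODES = {
    "nearest": ROUND_HALF_EVEN,   # round to nearest
    "toward +inf": ROUND_CEILING,  # round toward plus infinity
    "toward -inf": ROUND_FLOOR,    # round toward negative infinity
    "toward 0": ROUND_DOWN,        # round toward zero (truncate)
}


def round_all(text: str, quantum: str = "0.1") -> dict:
    """Round one value to the given quantum under each rounding mode."""
    value = Decimal(text)
    return {name: str(value.quantize(Decimal(quantum), rounding=mode))
            for name, mode in MODES.items()}


for sample in ("2.47", "-2.47"):
    print(sample, round_all(sample))
```

The modes differ only in which neighbouring representable value they pick, which is why the positive and negative samples swap their results for the two directed modes.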
+
Interval Arithmetic
Provides an efficient method for monitoring and
controlling errors in floating-point computations by
producing two values for each result
The two values correspond to the lower and upper
endpoints of an interval that contains the true result
The width of the interval indicates the accuracy of the
result
If the endpoints are not representable then the interval
endpoints are rounded down and up respectively
If the range between the upper and lower bounds is
sufficiently narrow then a sufficiently accurate result
has been obtained
Rounding to plus and minus infinity is useful in
implementing interval arithmetic
Truncation
Round toward zero
Extra bits are ignored
Simplest technique
A consistent bias toward zero in the operation
Serious bias because it affects every operation for which
there are nonzero extra bits
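Returning to interval arithmetic, the sketch below adds two intervals and pushes each endpoint outward. Python does not expose the hardware rounding mode, so the directed rounding is emulated conservatively with `math.nextafter` (available from Python 3.9), which moves one representable value down or up. The name `interval_add` is illustrative.

```python
import math
from typing import Tuple

Interval = Tuple[float, float]


def interval_add(a: Interval, b: Interval) -> Interval:
    """Add two intervals, widening the result so it contains the true sum."""
    # Lower endpoint is nudged toward minus infinity, upper toward plus infinity.
    lo = math.nextafter(a[0] + b[0], -math.inf)
    hi = math.nextafter(a[1] + b[1], math.inf)
    return lo, hi


x = (0.1, 0.1)
y = (0.2, 0.2)
lo, hi = interval_add(x, y)
print(lo, hi)
print("width:", hi - lo)  # a narrow width indicates an accurate result
```

Stepping a full representable value outward is wider than true directed rounding would be, which keeps the enclosure safe at the cost of a slightly pessimistic width.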
+
IEEE Standard for Binary Floating-Point Arithmetic
Infinity
Is treated as the limiting case of real arithmetic, with the
infinity values given the following interpretation:
-∞ < (every finite number) < +∞
For example:
5 + (+∞) = +∞
5 - (+∞) = -∞
5 + (-∞) = -∞
5 - (-∞) = +∞
5 * (+∞) = +∞
5 ÷ (+∞) = +0
(+∞) + (+∞) = +∞
(-∞) + (-∞) = -∞
(-∞) - (+∞) = -∞
(+∞) - (-∞) = +∞
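These identities can be checked with Python floats, using `math.inf` for the infinity values. The function name `infinity_examples` is illustrative. The last entry is not in the list above: it shows an operation with no meaningful limit, which yields a NaN rather than an infinity.

```python
import math


def infinity_examples() -> dict:
    """Evaluate sample operations on infinity with ordinary float arithmetic."""
    inf = math.inf
    return {
        "5 + inf": 5 + inf,
        "5 - inf": 5 - inf,
        "5 * inf": 5 * inf,
        "5 / inf": 5 / inf,
        "inf + inf": inf + inf,
        "-inf - inf": -inf - inf,
        "inf - inf": inf - inf,   # no limiting value exists
    }


results = infinity_examples()
for label, value in results.items():
    print(f"{label:>10} = {value}")
# The sign of the zero quotient is positive, matching 5 ÷ (+∞) = +0.
print(math.copysign(1.0, results["5 / inf"]))  # 1.0
```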
Table 10.7
Operations that Produce a Quiet NaN