Chapter 1

Preliminaries
To understand algorithms well, we should know how the arithmetic operations of addition, subtraction, multiplication and division are performed on computers. In this chapter we first introduce computer arithmetic, which includes how numbers are stored and operated on in a computer and how large the machine error is; we then introduce some pitfalls in machine operations and how to avoid loss of significance.
Definition 1.1.1 (Binary number) Binary numbers are expressed as (· · · b_2 b_1 b_0 . b_{−1} b_{−2} · · ·)_2, where each binary digit b_i is 0 or 1. This binary number is equivalent to the base-10 number
· · · + b_2 × 2^2 + b_1 × 2^1 + b_0 × 2^0 + b_{−1} × 2^{−1} + b_{−2} × 2^{−2} + · · · .
Example 1.1.1 Convert the binary numbers (a) (101.0)_2; (b) −(0.11)_2; (c) (101101.1011)_2 into decimal numbers.
Solution. (a) Adding up the digits times the corresponding powers of 2 leads to
(101.0)_2 = 1 × 2^2 + 0 × 2^1 + 1 × 2^0 + 0 × 2^{−1} = 5;
(b) Adding up the digits after the point times the negative powers of 2 leads to
−(0.11)_2 = −(1 × 2^{−1} + 1 × 2^{−2}) = −3/4;
(c) Since
(101101)_2 = 1 × 2^0 + 0 × 2^1 + 1 × 2^2 + 1 × 2^3 + 0 × 2^4 + 1 × 2^5 = 45,
(0.1011)_2 = 1 × 2^{−1} + 0 × 2^{−2} + 1 × 2^{−3} + 1 × 2^{−4} = 11/16,
we obtain
(101101.1011)_2 = (101101)_2 + (0.1011)_2 = 45 + 11/16.
Theorem 1.1.1 Conversion of decimal integers to binary is obtained by dividing the decimal
number by 2 successively and recording the remainders from the bottom to the top.
Example 1.1.2 Convert the decimal number 57 to a binary number.
Solution. Dividing successively by 2 and recording the remainders gives
57 ÷ 2 = 28 R 1
28 ÷ 2 = 14 R 0
14 ÷ 2 = 7 R 0
7 ÷ 2 = 3 R 1
3 ÷ 2 = 1 R 1
1 ÷ 2 = 0 R 1
When the quotient becomes 0, the process stops, since the additional equation 0 ÷ 2 = 0 R 0
is trivial. Then, the binary number is obtained by writing the remainders (binary digits in the
rightmost column) from the bottom to the top: 57 = (111001)2 .
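The successive-division procedure of Theorem 1.1.1 can be sketched in a few lines of code (Python here rather than the chapter's Matlab; the function name is ours):

```python
def int_to_binary(n):
    """Convert a nonnegative decimal integer to a binary digit string by
    successive division by 2: the remainders, read from bottom to top
    (i.e. last to first), are the binary digits."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)          # quotient and remainder
        remainders.append(str(r))
    return "".join(reversed(remainders))

print(int_to_binary(57))             # 111001, as in the worked example
```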
Theorem 1.1.2 Decimal fractions are converted to binary numbers by multiplying the decimal fraction and the resulting fractions by 2 successively, and recording the integer parts from the top to the bottom.
Example 1.1.3 Convert the decimal fractions: 0.6 and −0.75 to the binary numbers.
Solution. Multiplying 0.6 by 2 leads to the integer 1 and the fraction 0.2; recording the integer 1 and multiplying the resulting fraction 0.2 by 2 leads to the integer 0 and the fraction 0.4. Keeping this process going, a sequence of binary digits is obtained. The process is as follows:
0.6 × 2 = 0.2 + 1
0.2 × 2 = 0.4 + 0
0.4 × 2 = 0.8 + 0
0.8 × 2 = 0.6 + 1
0.6 × 2 = 0.2 + 1
0.2 × 2 = 0.4 + 0
0.4 × 2 = 0.8 + 0
0.8 × 2 = 0.6 + 1
· · ·
When the new fractional part becomes 0, the process stops; otherwise, the process goes on. Writing the digits from top to bottom gives the binary fraction
0.6 = (0.10011001 · · ·)_2 = (0.\overline{1001})_2.
The bar over 1001 means the four digits repeat forever.
Similarly, we consider the binary expansion of −0.75. For 0.75,
0.75 × 2 = 0.5 + 1
0.5 × 2 = 0 + 1,
so 0.75 = (0.11)_2 and hence −0.75 = −(0.11)_2.
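The successive-multiplication procedure of Theorem 1.1.2 admits a similar sketch (again Python, with a digit limit since expansions such as that of 0.6 never terminate):

```python
def frac_to_binary(f, max_digits=12):
    """Convert a decimal fraction 0 <= f < 1 to binary by successive
    multiplication by 2: the integer parts, recorded top to bottom,
    are the binary digits after the point."""
    digits = []
    for _ in range(max_digits):
        f *= 2
        d = int(f)                   # the integer part is the next digit
        digits.append(str(d))
        f -= d                       # keep only the fractional part
        if f == 0:                   # expansion terminates
            break
    return "0." + "".join(digits)

print(frac_to_binary(0.75))          # 0.11
print(frac_to_binary(0.6))           # 0.100110011001  (repeating 1001)
```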
Example 1.1.4 Convert the numbers 57.6 and −57.75 to binary numbers.
Solution. Combining Examples 1.1.2 and 1.1.3, we obtain
57.6 = 57 + 0.6 = (111001.\overline{1001})_2,
which is a repeating binary number; the bar over the digits means the four digits are infinitely repeated. Similarly, for −57.75 we have
−57.75 = −(111001.11)_2.
Example 1.1.5 Convert the repeating binary number x = (0.\overline{1101})_2 to a decimal fraction.
Solution. Multiplying by 2^4 shifts the point four places: 2^4 x = (1101.\overline{1101})_2 = 13 + x, so 15x = 13. Thus, x = (0.\overline{1101})_2 = 13/15.
By Examples 1.1.3-1.1.5 we see that every decimal number can be uniquely converted into
a binary number and vice versa.
4. (a) Suppose x = (0.1001)_2. Convert this binary number to decimal; (b) use the result in (a) to convert (1.11001)_2 to a decimal number.
5. dec2bin and bin2dec are two commands in the Matlab library. Use these two functions to check your results in Exercises 1-3.
In scientific notation all numbers are written in the form ±a × 10b (a times ten raised to the
power of b), where the coefficient a is any real number, called the significand or mantissa, and
the exponent b is chosen so that the absolute value of a remains at least one but less than ten
(1 ≤ |a| < 10). For example, 57.6 = 5.76 × 10^1.
Similar to scientific notations for decimal numbers, the binary numbers in Examples 1.1.1
and 1.1.2 can be rewritten as
(101.0)_2 = (1.01)_2 × 2^2;
−(0.11)_2 = −(1.1)_2 × 2^{−1};
(101101.1011)_2 = (1.011011011)_2 × 2^5;
(111001.1001)_2 = (1.110011001)_2 × 2^5.
Shifting-point rule: the power of 2 is increased by m if the binary point is moved m places to the left; conversely, the power of 2 is decreased by m if the binary point is moved m places to the right.
Binary numbers with infinitely many binary digits are stored on computers as binary numbers with a fixed number of binary digits, and the redundant digits are removed by the IEEE rounding rules. Binary numbers stored on computers are called floating-point numbers.
Definition 1.2.1 (IEEE rounding rule) The IEEE rounding-to-nearest rule (IEEE 754 Floating Point Standard) states that, for storing a binary number r in double precision as a number r′:
(a) if the 53rd bit to the right of the binary point is 0, then round down (truncate after the 52nd bit);
(b) if the 53rd bit is 1 and the bits after the 53rd bit are not all zero, then round up (add 1 to the 52nd bit);
(c) if the 53rd bit to the right of the binary point is 1 and all the bits after the 53rd bit are zero, then round to even: if the 52nd bit is 1, add 1 to the 52nd bit; otherwise, if the 52nd bit is 0, round down.
(d) The rules for storing binary numbers in single and long double precision are similarly given; single precision and long double precision keep 23 and 64 bits after the binary point, respectively.
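As a quick illustration of the rule in Python (exact arithmetic via the standard fractions module, not part of the text's Matlab toolchain): the repeating binary fraction 0.6 from Example 1.1.3 cannot be stored exactly, and rounding to nearest leaves an error of at most one unit in the 53rd binary place.

```python
from fractions import Fraction

stored = Fraction(0.6)       # exact rational value of the double nearest to 0.6
exact = Fraction(3, 5)

print(stored == exact)                            # False: 0.6 is not representable
print(abs(stored - exact) <= Fraction(1, 2**53))  # True: rounding error is tiny
```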
Example 1.2.1 By the IEEE rounding rules, give the binary number of r = (1.11001\overline{1001})_2 × 2^5 stored on computers with 32-bit accuracy (called single precision).
Solution. Single precision keeps 23 bits to the right of the binary point. The 24th bit of r to the right of the binary point is 0, so we round down (truncate after the 23rd bit); if the 24th bit were 1, we would round up (add 1 to the 23rd bit). In terms of the rounding rules, the number stored in single precision is
r′ = (1.11001 1001 1001 1001 1001 10)_2 × 2^5.
The binary number r′ is called the binary floating-point representation of r, where
1.11001100110011001100110
is called the mantissa and 5 is the exponent. The binary number r′ approximates r and agrees with r in its first 23 bits after the binary point.
Definition 1.2.2 (Floating point number) A floating point number has three parts: the sign (+ or −), a mantissa, which contains the string of significant bits, and an exponent. The form of a normalized floating point number is ±1.bbb · · · × 2^p, where each b is 0 or 1 and p is an M-bit binary number representing the exponent.
Remark 1.2.1 (a) Normalization means that the leading or leftmost bit must be the digit 1;
(b) A floating-point number is a rational number, because it has a finite number of digits and can be represented as one integer divided by another. For example, (1.1011)_2 × 2^2 is (110.11)_2, which is equal to
1 × 2^2 + 1 × 2^1 + 1 × 2^{−1} + 1 × 2^{−2} = 6 + 3/4 = 27/4.
(c) In a normalized floating point number, the sign, exponent and mantissa are stored together in a computer word: | sign | exponent | mantissa |.
For example, the IEEE floating-point representation of the real number r = 57.6 in Example 1.1.4 in single precision is
0 | 10000100 | 11001100110011001100110,
where the 8-bit exponent field stores the exponent 5 in biased form (5 + 127 = 132 = (10000100)_2).
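The three fields can be inspected in practice with a sketch like the following (Python's standard struct module; not part of the text's Matlab examples):

```python
import struct

# Pack 57.6 as an IEEE single-precision number (big-endian) and split
# the 32 bits into sign, exponent and mantissa fields.
bits = int.from_bytes(struct.pack('>f', 57.6), 'big')
sign     = bits >> 31
exponent = (bits >> 23) & 0xFF       # stored in biased form (bias 127)
mantissa = bits & 0x7FFFFF           # the 23 bits after the binary point

print(sign, exponent - 127, format(mantissa, '023b'))
# 0 5 11001100110011001100110
```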
The lengths of the significand and exponent determine the precision to which numbers can be represented. There are three commonly used levels of precision for floating point numbers: single precision (32 bits), double precision (64 bits), and long double precision (80 bits). The fields used in the three levels of precision are as follows:
single: 1 sign bit, 8 exponent bits, 23 mantissa bits;
double: 1 sign bit, 11 exponent bits, 52 mantissa bits;
long double: 1 sign bit, 15 exponent bits, 64 mantissa bits.
Definition 1.2.3 (Round-off error) Most real numbers have to be rounded off in order to be represented as t-digit floating point numbers. The difference between the floating point number x′ and the original number x is called the round-off error.
Definition 1.2.4 (Absolute and relative error) If x is a real number and x′ is its floating-point approximation, then the absolute value of the difference, |x′ − x|, is called the absolute error, and the absolute value of the quotient (x′ − x)/x, |x′ − x|/|x|, is called the relative error.
Example 1.2.2 Find the round-off error, absolute error and relative error of the floating-point number r′ stored in single precision in Example 1.2.1.
Solution. Note that the correct value is r = 57.6 = (1.11001\overline{1001})_2 × 2^5, and that the number stored in single precision is r′ = (1.11001 1001 1001 1001 1001 10)_2 × 2^5. The difference, or truncation error, is
|r − r′| = (1.\overline{1001})_2 × 2^{−20} = 1.6 × 2^{−20},
so the relative error is
Err_R = |r − r′| / |r| = (1.6 × 2^{−20}) / 57.6 ≈ 0.0278 × 2^{−20} ≈ 2.65 × 10^{−8}.
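This relative error can be checked numerically by rounding 57.6 through single precision and back (a Python sketch using a pack/unpack round-trip; the helper name is ours):

```python
import struct

def to_single(x):
    """Round a double x to the nearest IEEE single-precision value."""
    return struct.unpack('>f', struct.pack('>f', x))[0]

r = 57.6
rel_err = abs(r - to_single(r)) / r
print(rel_err < 2**-24)      # True: within half an ulp of relative error
```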
Next we consider how to describe significant digits in terms of error. For example, let a = √3 = 0.1732050807568877 · · · × 10^1 be the correct value, and let a∗ = 1.732051 be an approximate value obtained by applying the rounding rules to the decimal number. By counting the decimal digits we see that a∗ has 7 significant digits. The absolute error of a∗ is
|a − a∗| ≈ 0.19 × 10^{−6} ≤ 0.5 × 10^{1−7},
which has the general form |a − a∗| ≤ 0.5 × 10^{m−n} with m = 1 and n = 7. We now conclude and give the relation between significant digits and error.
Theorem 1.2.1 (Absolute error determines number of significant digits) Let a∗ be an approximate value of the correct value a, written as
a∗ = ±0.a1 a2 · · · an · · · × 10^m, a1 ≠ 0.
If |a − a∗| ≤ 0.5 × 10^{m−n}, then a∗ has n significant digits.
Theorem 1.2.2 (Relation between significant digits and relative error) Let a∗ be an approximate value of the correct value a with n significant digits, that is,
a∗ = ±0.a1 a2 · · · an × 10^m, a1 ≠ 0.
Then the relative error satisfies e_r ≤ (1/(2a1)) × 10^{1−n}. Conversely, if e_r ≤ (1/(2(a1 + 1))) × 10^{1−n}, then a∗ has at least n significant digits.
Proof. Let the approximate value a∗ have n significant digits. By Theorem 1.2.1, we see that
|a − a∗| ≤ (1/2) × 10^{m−n}.
Note that |a∗| ≥ a1 × 10^{m−1}. Then
e_r = |a − a∗| / |a∗| ≤ ((1/2) × 10^{m−n}) / (a1 × 10^{m−1}) = (1/(2a1)) × 10^{1−n}.
Conversely, if
e_r = |a − a∗| / |a∗| ≤ (1/(2(a1 + 1))) × 10^{1−n},
then, by the following two relations:
|a − a∗| = |a∗| · e_r,   |a∗| ≤ (a1 + 1) × 10^{m−1},
we obtain
|a − a∗| = |a∗| · e_r ≤ (a1 + 1) × 10^{m−1} · (1/(2(a1 + 1))) × 10^{1−n} = 0.5 × 10^{m−n},
so a∗ has n significant digits by Theorem 1.2.1.
Example 1.2.3 Determine the absolute error, relative error and number of significant digits of the following approximate values of √2 = 1.41421356237 . . . and 10 + √2 = 11.41421356237 . . . :
(a) x1 = 1.414213; (b) x2 = 1.414214; (c) x3 = 1.414213321;
(d) y1 = 11.414213; (e) y2 = 11.414214; (f) y3 = 11.414213321.
By the definition of significant digits we see that x1, x2 and x3 have 6, 7 and 7 significant digits, and that y1, y2 and y3 have 7, 8 and 8 significant digits, since, for example,
|√2 − x1| ≈ 0.56 × 10^{−6} > 0.5 × 10^{1−7}, while |√2 − x2| ≈ 0.44 × 10^{−6} ≤ 0.5 × 10^{1−7}.
The two theorems above can be used as a stopping criterion in programs to set a tolerance when the number of significant digits is given.
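Theorem 1.2.1 gives a direct way to count significant digits by machine; a Python sketch (the function name is ours):

```python
import math

def significant_digits(a, a_star):
    """Largest n with |a - a_star| <= 0.5 * 10**(m - n), where a is
    written as +/-0.a1a2... * 10**m (Theorem 1.2.1); capped at 16
    since doubles carry no more decimal digits than that."""
    m = math.floor(math.log10(abs(a))) + 1
    err = abs(a - a_star)
    n = 0
    while n < 16 and err <= 0.5 * 10.0 ** (m - (n + 1)):
        n += 1
    return n

s2 = math.sqrt(2)
print([significant_digits(s2, x) for x in (1.414213, 1.414214, 1.414213321)])
# [6, 7, 7]
print([significant_digits(10 + s2, y) for y in (11.414213, 11.414214, 11.414213321)])
# [7, 8, 8]
```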
Apart from round-off error in storage, additional round-off errors may occur when arithmetic operations are applied to floating-point numbers on computers. To see these errors and the pitfalls in computer operations, in the next section we introduce arithmetic operations for binary numbers.
1. Convert the following decimal numbers to binary and express them as a floating point
number with single precision by using the Rounding to Nearest Rule.
2. Convert the following decimal numbers to binary and express them as a floating point
number fl(x) with single precision by using the Rounding to Nearest Rule.
3. Do the following sums by hand in IEEE double precision computer arithmetic, using the
rounding rules.
4. Decide whether 1 + x > 1 in double precision floating point arithmetic with the rounding
rules.
Give the numbers of significant digits of the approximate values x_i and y_i with i = 1, 2, 3, and their absolute errors and relative errors. Then test the theorem about the relation between significant digits and error.
The four operations on binary numbers are addition, subtraction, multiplication and division. The simplest arithmetic operation for binaries is addition. For example,
    0 1 1 0 1    (= 13)
+   1 0 1 1 1    (= 23)
-----------------
  1 0 0 1 0 0    (= 36)
Definition 1.3.1 (Addition) Adding two single-digit binary numbers is relatively simple, using a form of carrying:
0 + 0 → 0,  0 + 1 → 1,  1 + 0 → 1,  1 + 1 → 0, carry 1,
since 1 + 1 = 2 = (10)_2 = 0 + 1 × 2^1. Adding two "1" digits produces a digit "0", while 1 has to be added to the next column. If the addition of two single-digit binary numbers equals or exceeds the value of the radix (2), the digit to the left is incremented. This is known as carrying.
Definition 1.3.2 (Subtraction) Subtraction of binary numbers works in much the same way as addition:
0 − 0 → 0,  0 − 1 → 1, borrow 1,  1 − 0 → 1,  1 − 1 → 0.
Subtracting a "1" digit from a "0" digit produces the digit "1", while 1 has to be subtracted from the next column. This is known as borrowing; the principle is the same as for carrying. When 0 − 1 occurs, we borrow 1 from the next position to the left, which contributes the radix, 2, to the current position.
For example,
  1 1 0 1 1 1 0    (= 110)
−     1 0 1 1 1    (= 23)
-------------------
  1 0 1 0 1 1 1    (= 87)
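Both worked examples can be verified with base-2 integer parsing (a Python sketch):

```python
# Addition example: (01101)_2 + (10111)_2 = (100100)_2, i.e. 13 + 23 = 36.
a, b = int("01101", 2), int("10111", 2)
print(a + b == int("100100", 2), a + b)     # True 36

# Subtraction example: (1101110)_2 - (10111)_2 = (1010111)_2, i.e. 110 - 23 = 87.
c, d = int("1101110", 2), int("10111", 2)
print(c - d == int("1010111", 2), c - d)    # True 87
```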
Definition 1.3.3 (Multiplication) Binary multiplication proceeds as in decimal: multiply a by each digit of b and add up the shifted partial products. Since there are only two digits in binary, there are only two possible outcomes of each partial multiplication: (i) if the digit in b is 0, the partial product is also 0; (ii) if the digit in b is 1, the partial product of 1 and a is equal to a.
Example 1.3.1 Multiply the binary numbers (1011)_2 and (1010)_2.
Solution. The binary numbers (1011)_2 and (1010)_2 are multiplied as follows:
        1 0 1 1    (= a)
      × 1 0 1 0    (= b)
      ---------
        0 0 0 0
      1 0 1 1
    0 0 0 0
+ 1 0 1 1
---------------
  1 1 0 1 1 1 0
Fractional binary numbers are multiplied in the same way, keeping track of the binary point; for example, (101.101)_2 × (110.01)_2:
        1 0 1 . 1 0 1
      × 1 1 0 . 0 1
      ---------------
            1 . 0 1 1 0 1
           0 0 . 0 0 0 0
          0 0 0 . 0 0 0
        1 0 1 1 . 0 1
  +   1 0 1 1 0 . 1
  -----------------------
  = 1 0 0 0 1 1 . 0 0 1 0 1
Definition 1.3.4 (Division) Binary division is again similar to its decimal counterpart. The
procedure is illustrated by an example: compute (11011)2 ÷ (101)2 .
            1 0 1
          ---------
  1 0 1 ) 1 1 0 1 1
        − 1 0 1
          -------
            0 1 1
          − 0 0 0
          -------
            1 1 1
          − 1 0 1
          -------
              1 0
Thus, (11011)2 = (101)2 · (101)2 + (10)2 , the remainder is (10)2 ; in decimal form, it is written
as: 27 = 5 · 5 + 2.
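The multiplication and division examples check out the same way (a Python sketch; note the fractional product is exact in binary, so floating point verifies it too):

```python
# Multiplication: (1011)_2 * (1010)_2 = (1101110)_2, i.e. 11 * 10 = 110.
print(int("1011", 2) * int("1010", 2) == int("1101110", 2))   # True

# Fractional multiplication: (101.101)_2 * (110.01)_2 = (100011.00101)_2.
print(5.625 * 6.25 == 35.15625)                               # True (exact in binary)

# Division: (11011)_2 / (101)_2, i.e. 27 / 5 = quotient 5, remainder 2.
q, r = divmod(int("11011", 2), int("101", 2))
print(q == int("101", 2), r == int("10", 2))                  # True True
```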
Most real numbers have to be rounded off in order to be represented as t-digit floating point num-
bers on computers. When arithmetic operations are applied to floating-point numbers, additional
round-off errors may occur.
In this section we talk about errors in numerical computation on computers.
Theorem 1.3.1 (Absolute error under basic operations) Let a∗ and b∗ be approximate values for a and b respectively, and let e(a∗) = |a − a∗|, e(b∗) = |b − b∗|. Then, under the basic operations of addition, subtraction, multiplication and division, the errors of the results satisfy
e(a∗ ± b∗) ≤ e(a∗) + e(b∗),
e(a∗ · b∗) ≲ |a∗| e(b∗) + |b∗| e(a∗),
e(a∗ / b∗) ≲ (|b∗| e(a∗) + |a∗| e(b∗)) / |b∗|²,
where ≲ means the bound holds up to higher-order error terms.
Example 1.3.2 (Error in computation of rectangular area) The area of a rectangle is A = h · l, where h is the height and l is the width of the rectangle. Suppose that h∗ = 80 m and l∗ = 110 m are the measured values of h and l, and e(h∗) = |h − h∗| ≤ 0.1 m, e(l∗) = |l − l∗| ≤ 0.2 m. Give the error of the computed area A∗ = h∗ · l∗.
Solution.
e(A∗) = |A − A∗| = |h · l − h∗ · l∗| = |h · l − h · l∗ + h · l∗ − h∗ · l∗|
≤ |h| · |l − l∗| + |l∗| · |h − h∗| ≈ |h∗| · |l − l∗| + |l∗| · |h − h∗| ≤ 80 · 0.2 + 110 · 0.1 = 27 (m²).
Thus the absolute error of A∗ is at most approximately 27 m², and the relative error of A∗ is approximately e(A∗)/A∗ = 27/(80 · 110) ≈ 0.0031.
Definition 1.3.5 (Machine addition of floating point numbers) Machine addition consists of
lining up the binary points of the two numbers to be added so that both of the numbers have the
same exponent, adding them, and then storing the result again as a floating point number.
Example 1.3.3 Carry out the addition of 1 and 2^{−53} on a computer with double precision by using the rules for machine addition.
Solution. In terms of machine addition of floating point numbers, the addition of the two numbers would appear as follows:
  1.00 · · · 00 × 2^0
+ 0.00 · · · 01 × 2^0   (the 1 in the 53rd place after the point, = 2^{−53})
= 1.00 · · · 01 × 2^0   (the 1 in the 53rd place after the point).
In the sum, the 53rd bit is 1, all later bits are 0, and the 52nd bit is 0, so by rounding rule (c) the sum is stored as 1.0 × 2^0 = 1. From this example we see that when a very small number is added to a much larger number, the stored result can equal the larger number. This is a pitfall in machine addition.
By the addition of the two numbers in Example 1.3.3, we see that on computers with double precision the smallest number greater than 1 is 1 + 2^{−52}. The distance between 1 and 1 + 2^{−52}, denoted by ε_mach = 2^{−52}, is the machine error (machine epsilon).
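These facts are easy to observe directly (Python floats are IEEE doubles):

```python
import sys

print(1.0 + 2.0**-53 == 1.0)    # True: the sum rounds back to 1 by rule (c)
print(1.0 + 2.0**-52 > 1.0)     # True: 1 + 2^-52 is the next double after 1
print(sys.float_info.epsilon == 2.0**-52)   # True: the machine epsilon
```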
Definition 1.3.6 (Loss of significance) When two nearly equal numbers are subtracted, signif-
icant digits are lost. This phenomenon is called loss of significance.
For example, we use seven significant digits to do the subtraction: 113.4567 − 113.4566:
1 1 3 . 4 5 6 7
− 1 1 3 . 4 5 6 6
− − − − − − − − −
= 0 0 0 . 0 0 0 1
Two input numbers have seven-digit accuracy, but after subtraction the result has only one-digit
accuracy. This operation loses many significant digits. In programming and computation by a
computer, loss of significance should be avoided by restructuring the calculation and reducing
operation counts.
Note 3: In programming, equivalent mathematical expressions are selected to avoid subtrac-
tion of two nearly equal numbers.
For example:
(a) √9.01 − 3 = ((√9.01 − 3)(√9.01 + 3)) / (√9.01 + 3) = 0.01 / (√9.01 + 3);
(b) 1 − cos(0.001) = 1 − (1 − 2 sin²(0.0005)) = 2 sin²(0.0005);
(c) (1 − cos x) / sin² x = (1 − cos² x) / ((1 + cos x) sin² x) = 1 / (1 + cos x).
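The effect of a rewrite like (b) is dramatic near x = 0; a small Python experiment:

```python
import math

x = 1e-8
naive = 1.0 - math.cos(x)             # subtracts two nearly equal numbers
stable = 2.0 * math.sin(x / 2)**2     # identity (b): no cancellation

print(naive)    # 0.0: every significant digit has been lost
print(stable)   # about 5e-17, correct to full double precision
```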
Example 1.3.5 Give the solutions of the equation x² + 10^{12} x − 2 = 0, paying attention to loss of significance.
Solution. By the quadratic formula for solutions of quadratic equations, the solutions x1 and x2 are
x1 = (−10^{12} − √(10^{24} + 8)) / 2,  and  x2 = (−10^{12} + √(10^{24} + 8)) / 2.
Note that 10^{12} and √(10^{24} + 8) are nearly equal to each other, so the calculation of −10^{12} + √(10^{24} + 8) on a computer leads to loss of significance. Multiplying the numerator and denominator of x2 by −10^{12} − √(10^{24} + 8), we have
x2 = 4 / (10^{12} + √(10^{24} + 8)).
Thus the formulas to be used on computers, or developed into programs, are
x1 = (−10^{12} − √(10^{24} + 8)) / 2,  and  x2 = 4 / (10^{12} + √(10^{24} + 8)).
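A sketch of the rewritten formula in code, using the root product x1 · x2 = c/a to avoid the cancellation (Python; the function name is ours):

```python
import math

def stable_roots(a, b, c):
    """Roots of a*x^2 + b*x + c = 0 without loss of significance:
    the large root by the quadratic formula with matching signs,
    the small root from the product x1 * x2 = c / a."""
    d = math.sqrt(b * b - 4.0 * a * c)
    x1 = (-b - d) / (2.0 * a) if b >= 0 else (-b + d) / (2.0 * a)
    x2 = c / (a * x1)
    return x1, x2

x1, x2 = stable_roots(1.0, 1e12, -2.0)
print(x2)    # about 2e-12; the naive (-b + sqrt(...))/(2a) loses this accuracy
```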
In general we obtain:
Theorem 1.3.2 (Quadratic formula for machine operation) For the quadratic equation ax² + bx + c = 0, where a and/or c are very small compared with b, so that b² − 4ac is nearly equal to b²:
(a) if b is positive, the roots should be computed by
x1 = (−b − √(b² − 4ac)) / (2a),  and  x2 = −2c / (b + √(b² − 4ac));
(b) if b is negative, the roots are best computed by
x1 = (−b + √(b² − 4ac)) / (2a),  and  x2 = 2c / (−b + √(b² − 4ac)).
If we use the quadratic formula to develop a program to solve a quadratic equation, the above
expressions should be considered and used.
This requires only 4 multiplications and 4 additions, so the second way is the best.
In general, we obtain the following. A degree-four polynomial written with base points is
c1 + c2(x − r1) + c3(x − r1)(x − r2) + c4(x − r1)(x − r2)(x − r3) + c5(x − r1)(x − r2)(x − r3)(x − r4),
where r1, r2, r3 and r4 are distinct numbers called the base points (this form comes from Newton's interpolation; taking all r_i = 0 recovers the power form c1 + c2 x + c3 x² + c4 x³ + c5 x⁴). Evaluation of this polynomial uses the nested form
c1 + (x − r1)(c2 + (x − r2)(c3 + (x − r3)(c4 + (x − r4) c5))).
This method is called nested multiplication, Horner's method, or the Qin Jiushao algorithm. Implementation of Qin Jiushao's method for the degree-four polynomial is as follows:
v1 = (x − r4) c5;  →  v2 = (x − r3)(c4 + v1);
v3 = (x − r2)(c3 + v2);  →  v4 = (x − r1)(c2 + v3);
v5 = c1 + v4.
Next we give a Matlab function to implement nested multiplication for polynomials.
function y=nest(d,c,x,r)
% NEST  Evaluate a degree-d polynomial by nested multiplication.
%   d: degree; c: vector of the d+1 coefficients, constant term first;
%   x: point(s) of evaluation; r: vector of d base points (default: zeros).
if nargin < 4, r=zeros(d,1); end
y=c(d+1);
for i=d:-1:1
    y=y.*(x-r(i))+c(i);   % multiply by (x - r_i) and add the next coefficient
end
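A line-for-line Python counterpart of nest.m, for readers following along without Matlab (the argument conventions mirror the Matlab version above):

```python
def nest(d, c, x, r=None):
    """Evaluate at x the degree-d polynomial with coefficients c[0..d]
    (constant term first) and base points r[0..d-1]; all base points
    default to zero, which gives plain Horner's method."""
    if r is None:
        r = [0.0] * d
    y = c[d]
    for i in range(d - 1, -1, -1):   # mirrors the Matlab loop i = d:-1:1
        y = y * (x - r[i]) + c[i]
    return y

print(nest(3, [1, 1, -3, 6], 2))     # 1 + x - 3x^2 + 6x^3 at x = 2 -> 39
```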
When we study sequences and series, some kinds of operations will be repeated or iterated.
In these cases error magnification or stability will be considered.
Note 5: Stable algorithms should be used.
Example 1.3.8 Let {a_n}_{n=0}^∞ be the sequence of numbers defined by a_n = ∫₀¹ xⁿ e^{x−1} dx. Test the iterative formula a_n = 1 − n a_{n−1}, and then try to work out the values a_n (n = 0, 1, 2, . . . , 16).
Starting from a₀ = ∫₀¹ e^{x−1} dx = 1 − e^{−1} ≈ 0.6321 and iterating, the computed values soon become negative and grow rapidly in magnitude, even though every a_n lies in (0, 1). It appears that the computation using the formula a_n = 1 − n a_{n−1} (the formula in theory) is incorrect. Next we explain the reason.
Let a∗_n (n = 0, 1, . . .) be the values provided by the computer, satisfying the formula a∗_n = 1 − n a∗_{n−1} (the practical formula on computers). Subtracting the two formulas leads to
a_n − a∗_n = −n(a_{n−1} − a∗_{n−1}) = (−1)² n(n − 1)(a_{n−2} − a∗_{n−2}) = · · · = (−1)ⁿ n!(a₀ − a∗₀).
This error equation means that the error in the initial step is enlarged n! times (10! = 3628800) at the n-th step. This phenomenon is called instability.
On the other hand, from the mathematical formula a_n = 1 − n a_{n−1} we see that
a_{n−1} = (1 − a_n) / n.
Note that
0 < a_n = ∫₀¹ xⁿ e^{x−1} dx < ∫₀¹ xⁿ dx = 1/(n + 1),
so we may take a₂₀ ≈ 1/21. Using the formula a_{n−1} = (1 − a_n)/n and the following commands:
% To compute the sequence by a stable formula
clear; clc; a(20)=1/21;
for i=20:(-1):2
a(i-1)=1/(i)*(1-a(i));
end
a(1:11)
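The same backward recursion in Python, with a check against the exact value a₁ = e^{−1} (which follows from integration by parts):

```python
import math

# Backward (stable) evaluation of a_n = integral_0^1 x^n e^(x-1) dx:
# start from the rough guess a_20 ~ 1/21; each step a_{n-1} = (1 - a_n)/n
# divides the error of the guess by n, so the error is damped away.
a = {20: 1.0 / 21.0}
for n in range(20, 1, -1):
    a[n - 1] = (1.0 - a[n]) / n

print(abs(a[1] - math.exp(-1)) < 1e-12)   # True: a_1 = 1/e to full accuracy
```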
1. Identify for which values of x there is subtraction of nearly equal numbers, and find an
alternate form that avoids the problem.
(a) (1 − sec x) / tan² x;  (b) (1 − (1 − x)³) / x;  (c) 1/(1 + x) − 1/(1 − x).
2. (1) Explain how to most accurately compute the two roots of the equation x² + bx − 10^{−12} = 0, where b is a number greater than 100.
(2) Find the roots of the equation x² + 3x − 8^{−14} = 0 with three-digit accuracy.
3. Assume that the sequence {yn } satisfies that yn = 15yn−1 − 2, n = 1, 2, . . . , and that the
initial value y0 = 1.425. (a) Compute the values y2 , y3 , . . . , y9 ; (b) Derive the error
equation for the formula; (c) Is the algorithm or the formula stable? If not, give a stable
one to compute yi (i = 2, 3, . . . , 9).
4. The volume of a sphere is V = (4/3)πr³, where r is the radius of the sphere. Let r∗ denote the approximate value of r and e(r∗) = |r − r∗| denote the absolute error. If e(V∗) = |V − V∗|, V∗ = (4/3)π(r∗)³, and e(V∗) ≤ 0.01, then how small must e(r∗) be?
5. Let f(x) = ln(x − √(x² − 1)). (a) Compute f(50) with six significant digits and give its error; (b) alternatively, if the equivalent form f(x) = −ln(x + √(x² − 1)) is used, answer the questions in (a); (c) use Matlab and the two expressions of f(x) to compute f(10^{10}), and observe the difference.
6. Use the Matlab function nest.m to evaluate the values of the following polynomials at
points x1 = 0, x2 = 1, x3 = 12, x4 = 2.546.
(a) p(x) = 1 + x − 3x2 + 6x3 ; (b) q(x) = 2.5 + 1.2x + 4x2 − 5.6x3 + 6.8x4 − 1.2x5 ;
(c) r(x) = 1 + x + x2 + · · · + x25 .
7. Write Matlab code to solve the equation x² + 10^8 x + 1 = 0 and compare with the results given by the Matlab command roots.
8. Assume that a∗ is an approximate value for the correct value a = √20 and that the relative error of a∗ is less than 10^{−3}. How many significant digits does a∗ have?
Definition 1.4.1 (Vector space Pn [x]) For any integer n ≥ 1, let Pn [x] be the set of polynomials
with the form: an−1 xn−1 + · · · + a1 x + a0 , where an−1 , an−2 , · · · , a0 are any real numbers. Then,
Pn [x] under the addition "+" and scalar multiplication forms a linear space or vector space,
whose dimension is n, and the standard basis is 1, x, x2 , · · · , xn−1 .
Theorem 1.4.1 (Two bases of Pn[x]) For distinct real numbers x0, x1, · · · , x_{n−1}, it can be proved that (a) 1, x − x0, (x − x0)(x − x1), . . . , (x − x0)(x − x1) · · · (x − x_{n−2}), and (b) L0(x), L1(x), · · · , L_{n−1}(x) form two bases of Pn[x], where
L0(x) = ((x − x1)(x − x2) · · · (x − x_{n−1})) / ((x0 − x1)(x0 − x2) · · · (x0 − x_{n−1})) = ∏_{k=1}^{n−1} (x − x_k)/(x0 − x_k),
L_j(x) = ((x − x0) · · · (x − x_{j−1})(x − x_{j+1}) · · · (x − x_{n−1})) / ((x_j − x0) · · · (x_j − x_{j−1})(x_j − x_{j+1}) · · · (x_j − x_{n−1})) = ∏_{k=0, k≠j}^{n−1} (x − x_k)/(x_j − x_k).
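The defining property behind Theorem 1.4.1(b) is that L_j(x_k) = 1 when j = k and 0 otherwise; a quick Python check on sample nodes (the nodes are our choice):

```python
nodes = [0.0, 1.0, 2.5, 4.0]         # sample distinct points x_0, ..., x_3

def L(j, x):
    """Lagrange basis polynomial L_j evaluated at x."""
    y = 1.0
    for k, xk in enumerate(nodes):
        if k != j:
            y *= (x - xk) / (nodes[j] - xk)
    return y

print([[round(L(j, xk)) for xk in nodes] for j in range(4)])
# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
```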
In other words, the zero points of the polynomial function f (x) or solutions of the equation
f (x) = 0 are the roots of the polynomial f (x).
Remark 1.4.1 For polynomials of degree one, two, three and four, the roots can be found by formulas (here omitted). For polynomials of higher degree, the Abel-Ruffini theorem asserts that there cannot exist a general formula, involving arithmetic operations and radicals, that expresses the roots of a polynomial of degree 5 or greater in terms of its coefficients.
Theorem 1.4.3 (Intermediate value theorem) Let f (x) be a continuous function on the inter-
val [a, b], and y be a value between f (a) and f (b). Then, there exists a number c in [a, b] such
that f (c) = y.
Theorem 1.4.4 (Limits of continuous functions) Let f be continuous at x0 and let {x_n} be a sequence converging to x0. Then
lim_{n→∞} f(x_n) = f( lim_{n→∞} x_n ) = f(x0).
Theorem 1.4.5 (Mean value theorem) Let f be a continuously differentiable function on [a, b]. Then there exists a number c in (a, b) such that
f′(c) = (f(b) − f(a)) / (b − a).
Theorem 1.4.6 (Rolle's theorem) Let the function f(x) be continuously differentiable on the closed interval [a, b] and f(a) = f(b). Then there exists at least one number c in (a, b) such that f′(c) = 0.
Theorem 1.4.7 (Taylor's theorem with remainder) Let x0 and x be real numbers and let f be k + 1 times continuously differentiable on the closed interval between x0 and x. Then there exists a number c between x and x0 such that
f(x) = f(x0) + f′(x0)(x − x0) + (f″(x0)/2!)(x − x0)² + · · · + (f^{(k)}(x0)/k!)(x − x0)^k + (f^{(k+1)}(c)/(k + 1)!)(x − x0)^{k+1}.
The resulting polynomial in powers of x − x0, without the remainder term, is called the Taylor polynomial of degree k for f at x0. The final term is called the Taylor remainder.
Theorem 1.4.8 (Zero point theorem) Let f be a continuous function on [a, b], satisfying f (a) f (b) <
0. Then f has a root between a and b, that is, there exists a number r satisfying a < r < b and
f (r) = 0.
Theorem 1.4.9 (Mean value theorem for integrals) Let f be a continuous function on the closed interval [a, b], and let g be an integrable function that does not change sign on [a, b]. Then there exists a number c between a and b such that
∫_a^b f(x) g(x) dx = f(c) ∫_a^b g(x) dx.
Theorem 1.4.11 (Identities for trigonometric functions) Some identities for trigonometric functions are shown below:
sin α + sin β = 2 sin((α + β)/2) cos((α − β)/2),  sin α − sin β = 2 cos((α + β)/2) sin((α − β)/2),
cos α + cos β = 2 cos((α + β)/2) cos((α − β)/2),  cos α − cos β = −2 sin((α + β)/2) sin((α − β)/2),
tan α + tan β = sin(α + β)/(cos α cos β),  tan α − tan β = sin(α − β)/(cos α cos β).
1. For distinct numbers b1, b2, . . . , b5 prove that the two sets of polynomials {1, x, · · · , x⁴} and {1, x − b1, (x − b1)(x − b2), · · · , (x − b1)(x − b2)(x − b3)(x − b4)} form two bases of the vector space P5[x] = {a0 + a1 x + · · · + a4 x⁴}.
2. Use the Intermediate Value Theorem to prove that f(c) = 0 for some c in [0, 1], where
(a) f(x) = x³ − 4x + 1; (b) f(x) = 4 cos(πx) − 3; (c) f(x) = 8x⁴ − 8x² + 1.
3. Find c in [a, b] = [0, 1] satisfying the Mean Value Theorem for f(x) on [0, 1]:
(a) f(x) = (1/2)x²; (b) f(x) = 1/(x + 1).
4. Find c satisfying the Mean Value Theorem for Integrals with f(x), g(x) on [0, 1]:
(a) f(x) = x, g(x) = 2x; (b) f(x) = x², g(x) = x; (c) f(x) = x, g(x) = e^x.
5. Find the Taylor polynomial of degree 2 about the point x = 0 for the following:
(a) f(x) = e^{x²}; (b) f(x) = cos(4x); (c) f(x) = ln(x + 1).
6. Find the degree-5 Taylor polynomial p(x) centered at x = 0 for f(x) = cos(x), and find an upper bound for the error in approximating f(x) = cos(x) for x ∈ [−π/4, π/4] by p(x).
7. Review orthogonal polynomials and their properties.
8. Review some examples of differential equations with analytical solutions.