MAT 460/560
Fall Semester 2009-10
Homework Assignment 2 Solution
Section 1.2
3. Suppose $p^*$ must approximate $p$ with relative error at most $10^{-3}$. Find the largest interval in
which $p^*$ must lie for each value of $p$.
(a) 150
Solution We must have $|p^* - 150|/|150| \le 10^{-3}$, or $|p^* - 150| \le 0.15$, which yields the
interval $[149.85, 150.15]$.
(b) 900
Solution We must have $|p^* - 900| \le 0.9$, which yields $[899.1, 900.9]$.
(c) 1500
Solution We must have $|p^* - 1500| \le 1.5$, which yields $[1498.5, 1501.5]$.
(d) 90
Solution We must have $|p^* - 90| \le 0.09$, which yields $[89.91, 90.09]$.
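The four intervals above follow the same pattern, $[p - 10^{-3}|p|,\ p + 10^{-3}|p|]$. The following is a minimal MATLAB sketch of that computation; the variable names are illustrative and not part of the original solution.

```matlab
% Interval of admissible approximations for a given relative error bound.
p   = [150 900 1500 90];      % values of p from parts (a)-(d)
tol = 1e-3;                   % maximum relative error

lo = p - tol * abs(p);        % left endpoints
hi = p + tol * abs(p);        % right endpoints

% One row per part: p, left endpoint, right endpoint
disp([p.' lo.' hi.'])
```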
5. Use three-digit rounding arithmetic to perform the following calculations. Compute the absolute
error and relative error with the exact value determined to at least five digits.
(a) $133 + 0.921$
Solution $p = 133.921$ and $p^* = 134$, so the absolute error is 0.079 and the relative error
is $5.90 \times 10^{-4}$.
(b) $133 - 0.499$
Solution $p = 132.501$ and $p^* = 133$, so the absolute error is 0.499 and the relative error
is $3.77 \times 10^{-3}$.
(c) $(121 - 0.327) - 119$
Solution $p = 1.673$ and $p^* = 121 - 119 = 2$, so the absolute error is 0.327 and the
relative error is 0.195.
(d) $(121 - 119) - 0.327$
Solution $p = 1.673$ and $p^* = 1.67$, so the absolute error is 0.003 and the relative error
is $1.79 \times 10^{-3}$.
(e) $\dfrac{\frac{13}{14} - \frac{6}{7}}{2e - 5.4}$
Solution $p = 1.95354$ and $p^* = (0.929 - 0.857)/(5.44 - 5.4) = 1.80$, so the absolute error
is 0.154 and the relative error is 0.0786.
(f) $-10\pi + 6e - \frac{3}{62}$
Solution $p = -15.1546$ and $p^* = -31.4 + 16.3 - 0.0484 = -15.1$, so the absolute error is
0.0546 and the relative error is $3.6 \times 10^{-3}$.
(g) $\left(\frac{2}{9}\right) \cdot \left(\frac{9}{7}\right)$
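Three-digit rounding arithmetic can be imitated in MATLAB by rounding after every operation. The sketch below reproduces parts (a), (c), and (d); the helpers ex and fl3 are hypothetical names introduced here, not part of the original solution, and the sketch assumes rounding to nearest on each intermediate result.

```matlab
% Hypothetical helpers: fl3 rounds x to three significant digits.
% The (x == 0) term avoids log10(0) when x is exactly zero.
ex  = @(x) floor(log10(abs(x) + (x == 0))) - 2;
fl3 = @(x) round(x ./ 10.^ex(x)) .* 10.^ex(x);

% Part (a): 133 + 0.921
p_a     = 133 + 0.921;
pstar_a = fl3(fl3(133) + fl3(0.921));

% Part (c): (121 - 0.327) - 119, rounding after each subtraction
p_c     = (121 - 0.327) - 119;
pstar_c = fl3(fl3(121 - 0.327) - 119);

% Part (d): (121 - 119) - 0.327, same numbers in a different order
pstar_d = fl3(fl3(121 - 119) - 0.327);

% Absolute and relative errors
abs_a = abs(p_a - pstar_a);  rel_a = abs_a / abs(p_a);
abs_c = abs(p_c - pstar_c);  rel_c = abs_c / abs(p_c);
abs_d = abs(p_c - pstar_d);  rel_d = abs_d / abs(p_c);

fprintf('(a) p* = %g, abs = %.3g, rel = %.3g\n', pstar_a, abs_a, rel_a);
fprintf('(c) p* = %g, abs = %.3g, rel = %.3g\n', pstar_c, abs_c, rel_c);
fprintf('(d) p* = %g, abs = %.3g, rel = %.3g\n', pstar_d, abs_d, rel_d);
```

Parts (c) and (d) use the same operands, so comparing their errors shows how the order of operations changes the result in finite-precision arithmetic.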
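9. The first three nonzero terms of the Maclaurin series for the arctangent function are
$x - (1/3)x^3 + (1/5)x^5$. Compute the absolute error and relative error in the following
approximations of $\pi$ using the polynomial in place of the arctangent:
(a) $4\left[\arctan\left(\frac{1}{2}\right) + \arctan\left(\frac{1}{3}\right)\right]$
Solution We have
\[
\begin{aligned}
\pi &\approx 4\left[\frac{1}{2} - \frac{1}{3}\left(\frac{1}{2}\right)^3 + \frac{1}{5}\left(\frac{1}{2}\right)^5 + \frac{1}{3} - \frac{1}{3}\left(\frac{1}{3}\right)^3 + \frac{1}{5}\left(\frac{1}{3}\right)^5\right] \\
&= 4\left[\frac{1}{2} - \frac{1}{24} + \frac{1}{160} + \frac{1}{3} - \frac{1}{81} + \frac{1}{1215}\right] \\
&\approx 3.14557613168724.
\end{aligned}
\]
Since the exact value of $\pi$, to 15 significant digits, is 3.14159265358979, it follows that the
absolute error is $3.983 \times 10^{-3}$ and the relative error is $(3.983 \times 10^{-3})/\pi \approx 1.268 \times 10^{-3}$.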
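(b) $16\arctan\left(\frac{1}{5}\right) - 4\arctan\left(\frac{1}{239}\right)$
Solution We have
\[
\begin{aligned}
\pi &\approx 16\left[\frac{1}{5} - \frac{1}{3}\left(\frac{1}{5}\right)^3 + \frac{1}{5}\left(\frac{1}{5}\right)^5\right] - 4\left[\frac{1}{239} - \frac{1}{3}\left(\frac{1}{239}\right)^3 + \frac{1}{5}\left(\frac{1}{239}\right)^5\right] \\
&= 16\left[\frac{1}{5} - \frac{1}{375} + \frac{1}{15625}\right] - 4\left[\frac{1}{239} - \frac{1}{40955757} + \frac{1}{3899056325995}\right] \\
&\approx 3.14162102932503.
\end{aligned}
\]
Since the exact value of $\pi$, to 15 significant digits, is 3.14159265358979, it follows that the
absolute error is $2.838 \times 10^{-5}$ and the relative error is $(2.838 \times 10^{-5})/\pi \approx 9.032 \times 10^{-6}$.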
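Both approximations can be checked numerically. The following MATLAB sketch evaluates the three-term polynomial in double precision; the handle name P and the variable names are illustrative only.

```matlab
% Three-term Maclaurin polynomial for arctan(x)
P = @(x) x - x.^3/3 + x.^5/5;

% Part (a): 4*[arctan(1/2) + arctan(1/3)]
approx_a = 4 * (P(1/2) + P(1/3));
% Part (b): 16*arctan(1/5) - 4*arctan(1/239)
approx_b = 16 * P(1/5) - 4 * P(1/239);

% Absolute and relative errors against MATLAB's built-in pi
abs_a = abs(pi - approx_a);  rel_a = abs_a / pi;
abs_b = abs(pi - approx_b);  rel_b = abs_b / pi;

fprintf('(a) %.14f  abs = %.3e  rel = %.3e\n', approx_a, abs_a, rel_a);
fprintf('(b) %.14f  abs = %.3e  rel = %.3e\n', approx_b, abs_b, rel_b);
```

The second formula uses smaller arguments, so the truncated series loses less accuracy, which is consistent with its smaller error.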
11. Let
\[
f(x) = \frac{x\cos x - \sin x}{x - \sin x}.
\]
(a) Find $\lim_{x \to 0} f(x)$.
Solution If we substitute $x = 0$, we obtain 0/0, which is an indeterminate form. Using
l'Hospital's Rule three times, we obtain
\[
\begin{aligned}
\lim_{x \to 0} f(x) &= \lim_{x \to 0} \frac{x\cos x - \sin x}{x - \sin x} \\
&= \lim_{x \to 0} \frac{\cos x - x\sin x - \cos x}{1 - \cos x} \\
&= \lim_{x \to 0} \frac{-x\sin x}{1 - \cos x} \\
&= \lim_{x \to 0} \frac{-\sin x - x\cos x}{\sin x} \\
&= \lim_{x \to 0} \frac{-\cos x - \cos x + x\sin x}{\cos x} \\
&= -2.
\end{aligned}
\]
(c) Replace each trigonometric function with its third Maclaurin polynomial, and repeat
part (b).
Solution The third Maclaurin polynomial for $\cos x$ is $1 - \frac{1}{2}x^2$, and the third Maclaurin
polynomial for $\sin x$ is $x - \frac{1}{6}x^3$. Substituting these polynomials for $\cos x$ and $\sin x$ in
$f(x)$, we obtain the function
\[
\begin{aligned}
f_3(x) &= \frac{x\left[1 - \frac{1}{2}x^2\right] - \left[x - \frac{1}{6}x^3\right]}{x - \left[x - \frac{1}{6}x^3\right]} \\
&= \frac{x - \frac{1}{2}x^3 - x + \frac{1}{6}x^3}{\frac{1}{6}x^3} \\
&= \frac{-\frac{1}{3}x^3}{\frac{1}{6}x^3} \\
&= -2.
\end{aligned}
\]
(d) The actual value is $f(0.1) = -1.99899998$. Find the relative error for the values obtained
in parts (b) and (c).
Solution The relative error for the value obtained in part (b) is
\[
\frac{|-1.941 - (-1.99899998)|}{|-1.99899998|} = 0.029,
\]
while the relative error for the value obtained in part (c) is
\[
\frac{|-2 - (-1.99899998)|}{|-1.99899998|} = 0.0005.
\]
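The relative errors in part (d) can be reproduced in MATLAB. This is a minimal sketch that takes the values $-1.941$ and $-2$ as quoted in the solution above and uses a double-precision evaluation of $f(0.1)$ as the reference; the variable names are illustrative.

```matlab
% f(x) = (x cos x - sin x) / (x - sin x), evaluated in double precision
f = @(x) (x .* cos(x) - sin(x)) ./ (x - sin(x));

exact = f(0.1);      % reference value, approximately -1.99899998
val_b = -1.941;      % value from part (b), as quoted in part (d)
val_c = -2;          % value from part (c)

rel_b = abs(val_b - exact) / abs(exact);
rel_c = abs(val_c - exact) / abs(exact);

fprintf('f(0.1) = %.8f\n', exact);
fprintf('relative error (b): %.3f\n', rel_b);
fprintf('relative error (c): %.4f\n', rel_c);
```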
15. Use the 64-bit long real format to find the decimal equivalent of the following floating-point
machine numbers.
(a) 0 10000001010 1001001100000000000000000000000000000000000000000000
Solution The sign bit $s$ is 0, the exponent $c$ is represented by 10000001010 in binary,
which is $2^{10} + 2^3 + 2^1 = 1024 + 8 + 2 = 1034$ in decimal, and the mantissa $f$ is
\[
f = 2^{-1} + 2^{-4} + 2^{-7} + 2^{-8} = \frac{1}{2} + \frac{1}{16} + \frac{1}{128} + \frac{1}{256} = \frac{147}{256}.
\]
Therefore, the value of the floating point number, denoted by $x$, is
\[
\begin{aligned}
x &= (-1)^s\, 2^{c-1023}(1 + f) \\
&= (-1)^0\, 2^{1034-1023}\left(1 + \frac{147}{256}\right) \\
&= 2^{11} \cdot \frac{403}{256} \\
&= 2^{11} \cdot \frac{403}{2^8} \\
&= 8 \cdot 403 \\
&= 3224.
\end{aligned}
\]
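(c) 0 01111111111 0101001100000000000000000000000000000000000000000000
Solution The sign bit $s$ is 0, the exponent $c$ is given by
\[
\sum_{i=0}^{9} 2^i = \frac{2^{10} - 1}{2 - 1} = 1023,
\]
and the mantissa, read from the leading bits 01010011, is $f = 2^{-2} + 2^{-4} + 2^{-7} + 2^{-8} = \frac{83}{256}$. Therefore,
\[
\begin{aligned}
x &= (-1)^s\, 2^{c-1023}(1 + f) \\
&= (-1)^0\, 2^{1023-1023}\left(1 + \frac{83}{256}\right) \\
&= \frac{339}{256} \\
&= 1.32421875.
\end{aligned}
\]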
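The decoding steps in parts (a) and (c) can be sketched in MATLAB as follows. The handle decode is a hypothetical helper introduced for illustration; it assumes a normalized number (exponent field neither all zeros nor all ones) and uses the formula $x = (-1)^s 2^{c-1023}(1+f)$ from the solution.

```matlab
% Hypothetical helper: decode sign, exponent, and mantissa bit strings.
% fb - '0' converts the character string into a vector of 0s and 1s.
decode = @(sb, cb, fb) (-1)^bin2dec(sb) * 2^(bin2dec(cb) - 1023) ...
         * (1 + sum((fb - '0') .* 2.^(-(1:numel(fb)))));

pad = repmat('0', 1, 44);   % trailing zeros of the 52-bit mantissa

% Part (a)
x_a = decode('0', '10000001010', ['10010011' pad]);
% Part (c)
x_c = decode('0', '01111111111', ['01010011' pad]);

fprintf('(a) x = %g\n', x_a);
fprintf('(c) x = %.8f\n', x_c);
```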
17. Suppose two points $(x_0, y_0)$ and $(x_1, y_1)$ are on a straight line with $y_1 \ne y_0$. Two formulas
are available to find the $x$-intercept of the line:
\[
x = \frac{x_0 y_1 - x_1 y_0}{y_1 - y_0} \quad \text{and} \quad x = x_0 - \frac{(x_1 - x_0)y_0}{y_1 - y_0}.
\]
or
\[
x = x_0 - \frac{(x_1 - x_0)y_0}{y_1 - y_0},
\]
which is precisely the second formula. If we use a common denominator, then we obtain
\[
\begin{aligned}
x &= x_0\,\frac{y_1 - y_0}{y_1 - y_0} - \frac{(x_1 - x_0)y_0}{y_1 - y_0} \\
&= \frac{x_0(y_1 - y_0) - (x_1 - x_0)y_0}{y_1 - y_0} \\
&= \frac{(x_0 y_1 - x_0 y_0) - (x_1 y_0 - x_0 y_0)}{y_1 - y_0} \\
&= \frac{x_0 y_1 - x_1 y_0}{y_1 - y_0},
\end{aligned}
\]
which is precisely the first formula.
(b) Use the data $(x_0, y_0) = (1.31, 3.24)$ and $(x_1, y_1) = (1.93, 4.76)$ and three-digit rounding
arithmetic to compute the $x$-intercept both ways. Which method is better and why?
Solution Using the first formula, we obtain
\[
x = \frac{1.31 \cdot 4.76 - 1.93 \cdot 3.24}{4.76 - 3.24} = \frac{6.24 - 6.25}{1.52} = -0.00658.
\]
Using the second formula, we obtain
\[
x = 1.31 - \frac{(1.93 - 1.31) \cdot 3.24}{4.76 - 3.24} = 1.31 - \frac{2.01}{1.52} = 1.31 - 1.32 = -0.0100.
\]
The exact value, to three significant digits, is $-0.0116$, so clearly the second formula is
better. The first formula suffers from catastrophic cancellation in the numerator.
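The comparison of the two formulas can be reproduced with simulated three-digit rounding. The sketch below assumes each intermediate result is rounded to three significant digits; ex and fl3 are hypothetical helper names, not part of the original solution.

```matlab
% Hypothetical helpers: fl3 rounds x to three significant digits.
% The (x == 0) term avoids log10(0) when an intermediate result is zero.
ex  = @(x) floor(log10(abs(x) + (x == 0))) - 2;
fl3 = @(x) round(x ./ 10.^ex(x)) .* 10.^ex(x);

x0 = 1.31; y0 = 3.24; x1 = 1.93; y1 = 4.76;

% First formula: (x0*y1 - x1*y0) / (y1 - y0)
num1  = fl3(fl3(x0 * y1) - fl3(x1 * y0));   % 6.24 - 6.25
xint1 = fl3(num1 / fl3(y1 - y0));

% Second formula: x0 - (x1 - x0)*y0 / (y1 - y0)
quot  = fl3(fl3(fl3(x1 - x0) * y0) / fl3(y1 - y0));
xint2 = fl3(x0 - quot);

% Double-precision reference value
exact = (x0 * y1 - x1 * y0) / (y1 - y0);

fprintf('first formula : %.5f\n', xint1);
fprintf('second formula: %.5f\n', xint2);
fprintf('reference     : %.5f\n', exact);
```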
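19. The two-by-two linear system
\[
\begin{aligned}
ax + by &= e, \\
cx + dy &= f,
\end{aligned}
\]
where $a, b, c, d, e, f$ are given, can be solved for $x$ and $y$ as follows:
\[
\begin{aligned}
&\text{set } m = \frac{c}{a}, \text{ provided } a \ne 0; \\
&d_1 = d - mb; \\
&f_1 = f - me; \\
&y = \frac{f_1}{d_1}; \\
&x = \frac{e - by}{a}.
\end{aligned}
\]
Implement this algorithm using MATLAB and solve the following linear systems.
(a) $1.130x - 6.990y = 14.20$, $\quad 8.110x + 12.20y = -0.1370$
(b) $1.013x - 6.099y = 14.22$, $\quad -18.11x + 112.2y = -0.1376$
Solution The following function solves the general two-by-two linear system, given the values
of $a$, $b$, $c$, $d$, $e$ and $f$.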
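function [x,y]=hw1prob1219(a,b,c,d,e,f)
% Solve a*x+b*y=e, c*x+d*y=f by elimination.
% display error message if a is zero
if a==0
    error('a must be nonzero')
end
m=c/a;        % multiplier that eliminates x from the second equation
d1=d-m*b;     % updated coefficient of y
f1=f-m*e;     % updated right-hand side
y=f1/d1;      % solve the reduced equation for y
x=(e-b*y)/a;  % back substitution for x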
In the following MATLAB session, this function is used to solve the systems in parts (a) and
(b).
>> [x,y]=hw1prob1219(1.130,-6.990,8.110,12.20,14.20,-0.1370)
x =
2.44459190435176
y =
-1.63628199543384
>> [x,y]=hw1prob1219(1.013,-6.099,-18.11,112.2,14.22,-0.1376)
x =
4.974388755065191e+002
y =
80.28948694672958
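As a sanity check on the session above, the elimination steps can be compared with MATLAB's backslash operator. The following script is a sketch that inlines the same steps for system (a), using the coefficients from the first call in the session; the names A, rhs, z, and res are illustrative.

```matlab
% Coefficients of system (a), taken from the first call in the session
a = 1.130; b = -6.990; c = 8.110; d = 12.20; e = 14.20; f = -0.1370;

% Elimination steps from the algorithm
m  = c / a;
d1 = d - m * b;
f1 = f - m * e;
y  = f1 / d1;
x  = (e - b * y) / a;

% Independent check with the built-in linear solver
A   = [a b; c d];
rhs = [e; f];
z   = A \ rhs;

% Residual of the computed solution; it should be near machine precision
res = A * [x; y] - rhs;

fprintf('x = %.14f, y = %.14f\n', x, y);
fprintf('difference from backslash: %.2e\n', norm([x; y] - z));
fprintf('residual norm: %.2e\n', norm(res));
```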
\[
\pm 0.d_1 d_2 d_3 d_4 \times 10^n, \quad 1 \le d_1 \le 9, \quad 0 \le d_i \le 9, \quad i = 2, 3, 4, \quad |n| \le 15.
\]
What is the largest value of $m$ for which the binomial coefficient $\binom{m}{k}$ can be computed
for all $k$ by the definition without causing overflow?
Solution The largest number that can be represented in this floating-point system is
$0.9999 \times 10^{15} = 999{,}900{,}000{,}000{,}000$. Using the definition of the binomial coefficient,
overflow will occur if $m!$ is larger than this number. This is the case if $m \ge 18$, since
$18! = 6{,}402{,}373{,}705{,}728{,}000$ and $17! = 355{,}687{,}428{,}096{,}000$. Therefore the largest
value of $m$ for which the binomial coefficient can be computed without causing overflow
is 17.
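The factorial bound can be confirmed with a short MATLAB loop. This is a minimal sketch; it relies on the fact that 17! and 18! are both small enough to be represented exactly in double precision, and the name max_val is illustrative.

```matlab
% Largest representable number in the four-digit decimal system
max_val = 0.9999e15;

% Find the largest m with m! <= max_val
m = 0;
while factorial(m + 1) <= max_val
    m = m + 1;
end

fprintf('largest m with m! <= %.4e is %d\n', max_val, m);
fprintf('17! = %.0f, 18! = %.0f\n', factorial(17), factorial(18));
```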
(b) Show that $\binom{m}{k}$ can also be computed by
\[
\binom{m}{k} = \left(\frac{m}{k}\right)\left(\frac{m-1}{k-1}\right) \cdots \left(\frac{m-k+1}{1}\right).
\]
Solution We have
\[
\begin{aligned}
\binom{m}{k} &= \frac{m!}{k!\,(m-k)!} \\
&= \frac{1 \cdot 2 \cdot 3 \cdots (m-1) \cdot m}{(1 \cdot 2 \cdot 3 \cdots (k-1) \cdot k)(1 \cdot 2 \cdot 3 \cdots (m-k-1) \cdot (m-k))} \\
&= \frac{(1 \cdot 2 \cdot 3 \cdots (m-k-1) \cdot (m-k))((m-k+1) \cdots (m-1) \cdot m)}{(1 \cdot 2 \cdot 3 \cdots (k-1) \cdot k)(1 \cdot 2 \cdot 3 \cdots (m-k-1) \cdot (m-k))} \\
&= \frac{(m-k+1) \cdots (m-1) \cdot m}{1 \cdot 2 \cdot 3 \cdots (k-1) \cdot k} \\
&= \left(\frac{m}{k}\right)\left(\frac{m-1}{k-1}\right) \cdots \left(\frac{m-k+1}{1}\right).
\end{aligned}
\]
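The product formula translates directly into a one-line MATLAB expression. The sketch below compares it with the built-in nchoosek for one illustrative pair of values, $m = 10$ and $k = 4$, which are chosen here for demonstration and do not come from the original problem.

```matlab
% Illustrative values (not from the original problem)
m = 10; k = 4;

% Product of the ratios m/k, (m-1)/(k-1), ..., (m-k+1)/1
c_prod = prod((m:-1:m-k+1) ./ (k:-1:1));

% Reference value from the built-in function
c_ref = nchoosek(m, k);

fprintf('product formula: %g, nchoosek: %g\n', c_prod, c_ref);
```

Because each factor stays moderate in size, the running product never grows much beyond the final answer, which is why this form avoids the overflow seen with the factorial definition.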
(c) What is the largest value of $m$ for which the binomial coefficient $\binom{m}{3}$ can be computed
by the formula in part (b) without causing overflow?
Solution We have
\[
\binom{m}{3} = \frac{m}{3} \cdot \frac{m-1}{2} \cdot \frac{m-2}{1} = \frac{m(m-1)(m-2)}{6}.
\]
To avoid overflow, this coefficient must not exceed $0.9999 \times 10^{15}$, which implies that $m$
must satisfy
\[
m(m-1)(m-2) \le 6(0.9999) \times 10^{15} = 5.9994 \times 10^{15}.
\]
Since $m(m-1)(m-2) \le m^3$ for any nonnegative integer $m$, it follows that the largest
value of $m$ for which the binomial coefficient can be computed is not less than the
largest value of $m$ for which $m^3 \le 5.9994 \times 10^{15}$. This value is $(5.9994 \times 10^{15})^{1/3} \approx
1.81706 \times 10^5$, so $m = 181{,}706$.
If we let $m = 181{,}706$, we obtain $m(m-1)(m-2) \approx 5.9993 \times 10^{15}$. To see if $m$ can be any
larger, we try $m = 181{,}707$ and obtain $m(m-1)(m-2) \approx 5.9993998 \times 10^{15}$, so this value
is acceptable as well. However, if we try $m = 181{,}708$, we obtain $m(m-1)(m-2) \approx
5.9994988 \times 10^{15}$, which is too large, so we conclude that the largest value of $m$ is 181,707.
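The search for the largest $m$ can be sketched in MATLAB. The products involved are below $2^{53}$, so double precision represents them exactly; the names max_val and binom3 are illustrative and not part of the original solution.

```matlab
% Largest representable number in the four-digit decimal system
max_val = 0.9999e15;

% Binomial coefficient C(m,3) = m(m-1)(m-2)/6
binom3 = @(m) m .* (m - 1) .* (m - 2) / 6;

% Start from the cube-root estimate, then step upward while still in range
m = floor((6 * max_val)^(1/3));
while binom3(m + 1) <= max_val
    m = m + 1;
end

fprintf('largest m: %d\n', m);
fprintf('m(m-1)(m-2) at m     : %.7e\n', 6 * binom3(m));
fprintf('m(m-1)(m-2) at m + 1 : %.7e\n', 6 * binom3(m + 1));
```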
(d) Use the equation in part (b) and four-digit chopping arithmetic to compute the number
of possible 5-card hands in a 52-card deck. Compute the actual and relative errors.
Solution The number of possible 5-card hands in a 52-card deck is
\[
\binom{52}{5} = \frac{52}{5} \cdot \frac{51}{4} \cdot \frac{50}{3} \cdot \frac{49}{2} \cdot \frac{48}{1}.
\]
The actual value is 2,598,960, while four-digit chopping arithmetic gives 2,597,000, so the
absolute error is 1,960 and the relative error is $7.541 \times 10^{-4}$.
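The chopped computation can be simulated in MATLAB. The sketch below assumes that each quotient and each running product is chopped to four significant digits, working left to right; the original solution does not state its order of operations, so this ordering is an assumption. The helpers ex and chop4 are hypothetical names.

```matlab
% Hypothetical helpers: chop4 truncates x to four significant digits.
% The 1e-9 nudge guards against values such as 10.4 being stored slightly
% below their decimal value in binary; it is an assumption of this sketch.
ex    = @(x) floor(log10(abs(x) + (x == 0))) - 3;
chop4 = @(x) fix(x ./ 10.^ex(x) + sign(x) * 1e-9) .* 10.^ex(x);

num = 52:-1:48;     % numerators 52, 51, 50, 49, 48
den = 5:-1:1;       % denominators 5, 4, 3, 2, 1

pstar = 1;
for i = 1:5
    ratio = chop4(num(i) / den(i));   % chop each quotient
    pstar = chop4(pstar * ratio);     % chop each running product
end

p       = nchoosek(52, 5);            % exact value
abs_err = abs(p - pstar);
rel_err = abs_err / p;

fprintf('chopped value : %d\n', pstar);
fprintf('exact value   : %d\n', p);
fprintf('absolute error: %d, relative error: %.3e\n', abs_err, rel_err);
```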