Prob 3

This document contains the solutions to homework assignment 2 for a class on numerical analysis. It includes solutions to problems involving approximating values with relative error, performing calculations with three-digit rounding, using Maclaurin series to approximate values of pi, evaluating limits of functions as values approach 0, and converting floating point numbers to decimal values.
Copyright © All Rights Reserved
Jim Lambers

MAT 460/560
Fall Semester 2009-10
Homework Assignment 2 Solution

Section 1.2
3. Suppose $p^*$ must approximate $p$ with relative error at most $10^{-3}$. Find the largest interval in which $p^*$ must lie for each value of $p$.
(a) 150
Solution We must have $|p^* - 150|/|150| \le 10^{-3}$, or $|p^* - 150| \le 0.15$, which yields the interval $[149.85, 150.15]$.
(b) 900
Solution We must have $|p^* - 900| \le 0.9$, which yields $[899.1, 900.9]$.
(c) 1500
Solution We must have $|p^* - 1500| \le 1.5$, which yields $[1498.5, 1501.5]$.
(d) 90
Solution We must have $|p^* - 90| \le 0.09$, which yields $[89.91, 90.09]$.
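A minimal Python sketch of the same computation, assuming the tolerance is applied as $|p^* - p| \le 10^{-3}|p|$. The helper name `relative_error_interval` is ours, not part of the assignment.

```python
def relative_error_interval(p: float, tol: float = 1e-3) -> tuple[float, float]:
    """Return [p - tol*|p|, p + tol*|p|], the set of p* with |p* - p| / |p| <= tol."""
    if p == 0:
        raise ValueError("relative error is undefined for p = 0")
    half_width = tol * abs(p)  # largest allowed |p* - p|
    return p - half_width, p + half_width


for p in (150, 900, 1500, 90):
    lo, hi = relative_error_interval(p)
    print(f"p = {p}: [{lo:.2f}, {hi:.2f}]")
```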
5. Use three-digit rounding arithmetic to perform the following calculations. Compute the absolute error and relative error with the exact value determined to at least five digits.
(a) $133 + 0.921$
Solution $p = 133.921$ and $p^* = 134$, so the absolute error is $0.079$ and the relative error is $5.90 \times 10^{-4}$.
(b) $133 - 0.499$
Solution $p = 132.501$ and $p^* = 133$, so the absolute error is $0.499$ and the relative error is $3.77 \times 10^{-3}$.
(c) $(121 - 0.327) - 119$
Solution $p = 1.673$ and, since $121 - 0.327 = 120.673$ rounds to $121$, $p^* = 121 - 119 = 2$, so the absolute error is $0.327$ and the relative error is $0.195$.
(d) $(121 - 119) - 0.327$
Solution $p = 1.673$ and $p^* = 1.67$, so the absolute error is $0.003$ and the relative error is $1.79 \times 10^{-3}$.
(e) $\dfrac{\frac{13}{14} - \frac{6}{7}}{2e - 5.4}$
Solution $p = 1.95354$ and $p^* = (0.929 - 0.857)/(5.44 - 5.4) = 1.80$, so the absolute error is $0.154$ and the relative error is $0.0786$.

(f) $-10\pi + 6e - \frac{3}{62}$
Solution $p = -15.1546$ and, since $3/62 = 0.048387\ldots$ rounds to $0.0484$, $p^* = -31.4 + 16.3 - 0.0484 = -15.1$, so the absolute error is $0.0546$ and the relative error is $3.60 \times 10^{-3}$.
(g) $\left(\frac{2}{9}\right) \cdot \left(\frac{9}{7}\right)$
Solution $p = 2/7 = 0.285714$ and $p^* = (0.222)(1.29) = 0.286$, so the absolute error is $2.86 \times 10^{-4}$ and the relative error is $10^{-3}$.
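(h) $\dfrac{\pi - \frac{22}{7}}{\frac{1}{17}}$
Solution $p = -0.0214963$. In three-digit rounding arithmetic both $\pi$ and $22/7 = 3.142857\ldots$ round to $3.14$, and $1/17$ rounds to $0.0588$, so $p^* = (3.14 - 3.14)/0.0588 = 0$. Hence the absolute error is $0.0215$ and the relative error is $1$.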
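The rounding steps above can be replayed with Python's standard `decimal` module, used here as a stand-in for a three-digit machine: a `Context` with `prec=3` rounds the result of every `add`, `subtract`, `multiply` and `divide` to three significant digits. This is a sketch under stated assumptions: ties round away from zero (`ROUND_HALF_UP`), $\pi$ and $e$ are stored as their three-digit roundings $3.14$ and $2.72$, and the exact values $p$ are computed in double precision. The helper `fl` is a hypothetical name.

```python
import math
from decimal import ROUND_HALF_UP, Context, Decimal

# Assumption: three-digit rounding arithmetic is modeled by a decimal context
# that rounds every stored value and every operation result to 3 significant
# digits, with ties rounded away from zero.
ctx = Context(prec=3, rounding=ROUND_HALF_UP)


def fl(x) -> Decimal:
    """Hypothetical helper: round x (a float or a decimal string) to 3 significant digits."""
    return ctx.create_decimal(x)


pi3, e3 = fl(math.pi), fl(math.e)  # stored as 3.14 and 2.72

# label -> (exact value p in double precision, three-digit result p*)
cases = {
    "a": (133 + 0.921, ctx.add(133, fl("0.921"))),
    "b": (133 - 0.499, ctx.subtract(133, fl("0.499"))),
    "c": ((121 - 0.327) - 119,
          ctx.subtract(ctx.subtract(121, fl("0.327")), 119)),
    "d": ((121 - 119) - 0.327,
          ctx.subtract(ctx.subtract(121, 119), fl("0.327"))),
    "e": ((13 / 14 - 6 / 7) / (2 * math.e - 5.4),
          ctx.divide(ctx.subtract(ctx.divide(13, 14), ctx.divide(6, 7)),
                     ctx.subtract(ctx.multiply(2, e3), fl("5.4")))),
    "f": (-10 * math.pi + 6 * math.e - 3 / 62,
          ctx.subtract(ctx.add(ctx.multiply(-10, pi3), ctx.multiply(6, e3)),
                       ctx.divide(3, 62))),
    "g": ((2 / 9) * (9 / 7),
          ctx.multiply(ctx.divide(2, 9), ctx.divide(9, 7))),
    "h": ((math.pi - 22 / 7) / (1 / 17),
          ctx.divide(ctx.subtract(pi3, ctx.divide(22, 7)), ctx.divide(1, 17))),
}

for label, (p, p_star) in cases.items():
    abs_err = abs(p - float(p_star))
    print(f"({label}) p = {p:.6g}, p* = {float(p_star):g}, "
          f"absolute error = {abs_err:.3g}, relative error = {abs_err / abs(p):.3g}")
```

Using `Context` methods rather than operators keeps every intermediate result explicitly rounded, which is the point of the exercise.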

9. The first three nonzero terms of the Maclaurin series for the arctangent function are $x - (1/3)x^3 + (1/5)x^5$. Compute the absolute error and relative error in the following approximations of $\pi$ using the polynomial in place of the arctangent:

(a) $4\left[\arctan\left(\frac{1}{2}\right) + \arctan\left(\frac{1}{3}\right)\right]$

Solution We have
$$\begin{aligned}
\pi &\approx 4\left[\frac{1}{2} - \frac{1}{3}\left(\frac{1}{2}\right)^3 + \frac{1}{5}\left(\frac{1}{2}\right)^5 + \frac{1}{3} - \frac{1}{3}\left(\frac{1}{3}\right)^3 + \frac{1}{5}\left(\frac{1}{3}\right)^5\right] \\
&= 4\left[\frac{1}{2} - \frac{1}{24} + \frac{1}{160} + \frac{1}{3} - \frac{1}{81} + \frac{1}{1215}\right] \\
&\approx 3.14557613168724.
\end{aligned}$$

Since the exact value of $\pi$, to 15 significant digits, is 3.14159265358979, it follows that the absolute error is $3.983 \times 10^{-3}$ and the relative error is $(3.983 \times 10^{-3})/\pi \approx 1.268 \times 10^{-3}$.
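(b) $16\arctan\left(\frac{1}{5}\right) - 4\arctan\left(\frac{1}{239}\right)$

Solution We have
$$\begin{aligned}
\pi &\approx 16\left[\frac{1}{5} - \frac{1}{3}\left(\frac{1}{5}\right)^3 + \frac{1}{5}\left(\frac{1}{5}\right)^5\right] - 4\left[\frac{1}{239} - \frac{1}{3}\left(\frac{1}{239}\right)^3 + \frac{1}{5}\left(\frac{1}{239}\right)^5\right] \\
&= 16\left[\frac{1}{5} - \frac{1}{375} + \frac{1}{15625}\right] - 4\left[\frac{1}{239} - \frac{1}{40955757} + \frac{1}{3899056325995}\right] \\
&\approx 3.14162102932503.
\end{aligned}$$

Since the exact value of $\pi$, to 15 significant digits, is 3.14159265358979, it follows that the absolute error is $2.838 \times 10^{-5}$ and the relative error is $(2.838 \times 10^{-5})/\pi \approx 9.032 \times 10^{-6}$.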
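A short Python check of both approximations, assuming double precision is adequate for the 15-digit comparison; `arctan_p5` is a hypothetical helper name for the degree-5 polynomial $x - x^3/3 + x^5/5$.

```python
import math


def arctan_p5(x: float) -> float:
    """Degree-5 Maclaurin polynomial of arctan: x - x^3/3 + x^5/5."""
    return x - x**3 / 3 + x**5 / 5


approx_a = 4 * (arctan_p5(1 / 2) + arctan_p5(1 / 3))
approx_b = 16 * arctan_p5(1 / 5) - 4 * arctan_p5(1 / 239)

for label, approx in (("(a)", approx_a), ("(b)", approx_b)):
    abs_err = abs(approx - math.pi)
    print(f"{label} {approx:.14f}  absolute error {abs_err:.4g}  "
          f"relative error {abs_err / math.pi:.4g}")
```

The second formula is more accurate mainly because its arguments are smaller, so the neglected $x^7/7$ terms are much smaller.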

11. Let
$$f(x) = \frac{x\cos x - \sin x}{x - \sin x}.$$
(a) Find $\lim_{x \to 0} f(x)$.
Solution If we substitute $x = 0$, we obtain $0/0$, which is an indeterminate form. Using l'Hospital's Rule three times, we obtain
$$\begin{aligned}
\lim_{x \to 0} f(x) &= \lim_{x \to 0} \frac{x\cos x - \sin x}{x - \sin x} \\
&= \lim_{x \to 0} \frac{\cos x - x\sin x - \cos x}{1 - \cos x} \\
&= \lim_{x \to 0} \frac{-x\sin x}{1 - \cos x} \\
&= \lim_{x \to 0} \frac{-\sin x - x\cos x}{\sin x} \\
&= \lim_{x \to 0} \frac{-\cos x - \cos x + x\sin x}{\cos x} \\
&= -2.
\end{aligned}$$

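(b) Use four-digit rounding arithmetic to evaluate $f(0.1)$.

Solution Rounding $\cos 0.1 = 0.995004\ldots$ to $0.9950$ and $\sin 0.1 = 0.0998334\ldots$ to $0.09983$, we have
$$\begin{aligned}
f(0.1) &= \frac{(0.1)\cos 0.1 - \sin 0.1}{0.1 - \sin 0.1} \\
&\approx \frac{(0.1)(0.995) - 0.09983}{0.1 - 0.09983} \\
&= \frac{0.0995 - 0.09983}{0.00017} \\
&= \frac{-0.00033}{0.00017} \\
&\approx -1.941.
\end{aligned}$$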
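A sketch that replays this evaluation with Python's `decimal` module, assuming four-digit rounding is modeled by a `Context` with `prec=4` and `ROUND_HALF_UP`, and that $\cos 0.1$ and $\sin 0.1$ are obtained by rounding the double-precision values from `math`.

```python
import math
from decimal import ROUND_HALF_UP, Context

# Assumption: four-digit rounding arithmetic = round every stored value and
# every operation result to 4 significant digits (ties away from zero).
ctx4 = Context(prec=4, rounding=ROUND_HALF_UP)

x = ctx4.create_decimal("0.1")
cos_x = ctx4.create_decimal(math.cos(0.1))  # 0.995004... rounds to 0.9950
sin_x = ctx4.create_decimal(math.sin(0.1))  # 0.0998334... rounds to 0.09983

numerator = ctx4.subtract(ctx4.multiply(x, cos_x), sin_x)  # 0.09950 - 0.09983
denominator = ctx4.subtract(x, sin_x)                       # 0.1 - 0.09983
f_four_digit = ctx4.divide(numerator, denominator)
print(numerator, denominator, f_four_digit)
```

Both subtractions cancel the three leading digits, so numerator and denominator each keep only two significant digits.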

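(c) Replace each trigonometric function with its third Maclaurin polynomial, and repeat part (b).
Solution The third Maclaurin polynomial for $\cos x$ is $1 - \frac{1}{2}x^2$, and the third Maclaurin polynomial for $\sin x$ is $x - \frac{1}{6}x^3$. Substituting these polynomials for $\cos x$ and $\sin x$ in $f(x)$, we obtain the function
$$\begin{aligned}
f_3(x) &= \frac{x\left[1 - \frac{1}{2}x^2\right] - \left[x - \frac{1}{6}x^3\right]}{x - \left[x - \frac{1}{6}x^3\right]} \\
&= \frac{x - \frac{1}{2}x^3 - x + \frac{1}{6}x^3}{\frac{1}{6}x^3} \\
&= \frac{-\frac{1}{3}x^3}{\frac{1}{6}x^3} \\
&= -2
\end{aligned}$$
for every $x \ne 0$; in particular, $f_3(0.1) = -2$.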

(d) The actual value is $f(0.1) = -1.99899998$. Find the relative error for the values obtained in parts (b) and (c).
Solution The relative error for the value obtained in part (b) is
$$\frac{|-1.941 - (-1.99899998)|}{|-1.99899998|} \approx 0.029,$$
while the relative error for the value obtained in part (c) is
$$\frac{|-2 - (-1.99899998)|}{|-1.99899998|} \approx 0.0005.$$
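The quoted actual value can be reproduced in ordinary double precision, which carries enough digits to absorb the cancellation in $f(0.1)$ (roughly three digits are lost). A minimal sketch; the function `f` is our transcription of the formula above.

```python
import math


def f(x: float) -> float:
    """f(x) = (x cos x - sin x) / (x - sin x), evaluated in double precision."""
    return (x * math.cos(x) - math.sin(x)) / (x - math.sin(x))


actual = f(0.1)  # close to -1.99899998; cancellation costs roughly 3 of ~16 digits
for label, approx in (("(b)", -1.941), ("(c)", -2.0)):
    rel_err = abs(approx - actual) / abs(actual)
    print(f"{label} relative error = {rel_err:.2g}")
```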
15. Use the 64-bit long real format to find the decimal equivalent of the following floating-point machine numbers.
(a) 0 10000001010 1001001100000000000000000000000000000000000000000000
Solution The sign bit $s$ is 0, the exponent $c$ is represented by 10000001010 in binary, which is $2^{10} + 2^3 + 2^1 = 1024 + 8 + 2 = 1034$ in decimal, and the mantissa $f$ is
$$f = 2^{-1} + 2^{-4} + 2^{-7} + 2^{-8} = \frac{1}{2} + \frac{1}{16} + \frac{1}{128} + \frac{1}{256} = \frac{147}{256}.$$
Therefore, the value of the floating-point number, denoted by $x$, is
$$\begin{aligned}
x &= (-1)^s\, 2^{c-1023}(1 + f) \\
&= (-1)^0\, 2^{1034-1023}\left(1 + \frac{147}{256}\right) \\
&= 2^{11}\,\frac{403}{256} \\
&= 2^{11}\,\frac{403}{2^8} \\
&= 8 \cdot 403 \\
&= 3224.
\end{aligned}$$

(b) 1 10000001010 1001001100000000000000000000000000000000000000000000

Solution This number is identical to the number in part (a), except that the sign bit $s$ is 1 instead of 0, so the value is $-3224$.

(c) 0 01111111111 0101001100000000000000000000000000000000000000000000
Solution The sign bit $s$ is 0, the exponent $c$ is given by
$$c = \sum_{i=0}^{9} 2^i = \frac{2^{10} - 1}{2 - 1} = 1023,$$
and the mantissa $f$ is
$$f = 2^{-2} + 2^{-4} + 2^{-7} + 2^{-8} = \frac{1}{4} + \frac{1}{16} + \frac{1}{128} + \frac{1}{256} = \frac{83}{256}.$$
Therefore, the value of the floating-point number, denoted by $x$, is
$$\begin{aligned}
x &= (-1)^s\, 2^{c-1023}(1 + f) \\
&= (-1)^0\, 2^{1023-1023}\left(1 + \frac{83}{256}\right) \\
&= \frac{339}{256} \\
&= 1.32421875.
\end{aligned}$$

(d) 0 01111111111 0101001100000000000000000000000000000000000000000001

Solution This number is identical to the one in part (c), except that there is an additional digit in the mantissa corresponding to $2^{-52}$. It follows that the value of this number, denoted by $x$, is
$$x = 1.32421875 + 2^{-52} \approx 1.3242187500000002220446049250313.$$
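The four bit patterns can be checked in Python, assuming the "64-bit long real format" is IEEE 754 binary64, which is what the `struct` format `'>d'` decodes (big-endian byte order). The sketch compares that decoding with the formula $x = (-1)^s 2^{c-1023}(1+f)$ used above; the helper names are ours, and the formula version handles only normalized numbers.

```python
import struct


def decode_binary64(bits: str) -> float:
    """Decode 's ccccccccccc ffff...f' (64 binary digits, spaces ignored) as an IEEE 754 double."""
    bits = bits.replace(" ", "")
    if len(bits) != 64 or set(bits) - {"0", "1"}:
        raise ValueError("expected 64 binary digits")
    return struct.unpack(">d", int(bits, 2).to_bytes(8, "big"))[0]


def decode_by_formula(bits: str) -> float:
    """x = (-1)^s * 2^(c - 1023) * (1 + f); valid only for normalized numbers (0 < c < 2047)."""
    bits = bits.replace(" ", "")
    s = int(bits[0])
    c = int(bits[1:12], 2)
    f = sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(bits[12:]))
    return (-1) ** s * 2.0 ** (c - 1023) * (1 + f)


# The four machine numbers from parts (a)-(d), padded so each mantissa has 52 bits.
examples = {
    "a": "0 10000001010 " + "10010011".ljust(52, "0"),
    "b": "1 10000001010 " + "10010011".ljust(52, "0"),
    "c": "0 01111111111 " + "01010011".ljust(52, "0"),
    "d": "0 01111111111 " + "01010011".ljust(51, "0") + "1",
}
for label, bits in examples.items():
    x = decode_binary64(bits)
    assert x == decode_by_formula(bits)  # both routes agree exactly
    print(f"({label}) {x!r}")
```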

17. Suppose two points $(x_0, y_0)$ and $(x_1, y_1)$ are on a straight line with $y_1 \ne y_0$. Two formulas are available to find the $x$-intercept of the line:
$$x = \frac{x_0 y_1 - x_1 y_0}{y_1 - y_0} \quad \text{and} \quad x = x_0 - \frac{(x_1 - x_0)y_0}{y_1 - y_0}.$$

(a) Show that both formulas are algebraically correct.

Solution The equation of the line is
$$y = \frac{y_1 - y_0}{x_1 - x_0}(x - x_0) + y_0.$$
Setting $y = 0$ and solving for $x$, we obtain
$$-y_0\,\frac{x_1 - x_0}{y_1 - y_0} = x - x_0,$$
or
$$x = x_0 - \frac{(x_1 - x_0)y_0}{y_1 - y_0},$$
which is precisely the second formula. If we use a common denominator, then we obtain
$$\begin{aligned}
x &= x_0\,\frac{y_1 - y_0}{y_1 - y_0} - \frac{(x_1 - x_0)y_0}{y_1 - y_0} \\
&= \frac{x_0(y_1 - y_0) - (x_1 - x_0)y_0}{y_1 - y_0} \\
&= \frac{(x_0 y_1 - x_0 y_0) - (x_1 y_0 - x_0 y_0)}{y_1 - y_0} \\
&= \frac{x_0 y_1 - x_1 y_0}{y_1 - y_0},
\end{aligned}$$
which is precisely the first formula.
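(b) Use the data $(x_0, y_0) = (1.31, 3.24)$ and $(x_1, y_1) = (1.93, 4.76)$ and three-digit rounding arithmetic to compute the $x$-intercept both ways. Which method is better and why?
Solution Using the first formula, we obtain
$$\begin{aligned}
x &= \frac{1.31 \cdot 4.76 - 1.93 \cdot 3.24}{4.76 - 3.24} \\
&= \frac{6.24 - 6.25}{1.52} \\
&= -0.00658.
\end{aligned}$$

Using the second formula, we obtain
$$\begin{aligned}
x &= 1.31 - \frac{(1.93 - 1.31)3.24}{4.76 - 3.24} \\
&= 1.31 - \frac{0.62 \cdot 3.24}{1.52} \\
&= 1.31 - \frac{2.01}{1.52} \\
&= 1.31 - 1.32 \\
&= -0.01.
\end{aligned}$$

The exact value, to three significant digits, is $-0.0116$, so the second formula is clearly better: its error is about $0.0016$, versus about $0.0050$ for the first. The first formula suffers from catastrophic cancellation in the numerator, where the nearly equal rounded products $6.24$ and $6.25$ are subtracted, leaving a result with only one significant digit.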
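A sketch that repeats both computations with Python's `decimal` module, assuming three-digit rounding is modeled by `Context(prec=3, rounding=ROUND_HALF_UP)` and that the reference value is computed in double precision.

```python
from decimal import ROUND_HALF_UP, Context, Decimal

# Assumption: three-digit rounding arithmetic = round every operation result
# to 3 significant digits (ties away from zero).
ctx = Context(prec=3, rounding=ROUND_HALF_UP)
x0, y0 = Decimal("1.31"), Decimal("3.24")
x1, y1 = Decimal("1.93"), Decimal("4.76")

# First formula: (x0*y1 - x1*y0) / (y1 - y0)
first = ctx.divide(ctx.subtract(ctx.multiply(x0, y1), ctx.multiply(x1, y0)),
                   ctx.subtract(y1, y0))
# Second formula: x0 - (x1 - x0)*y0 / (y1 - y0)
second = ctx.subtract(x0, ctx.divide(ctx.multiply(ctx.subtract(x1, x0), y0),
                                     ctx.subtract(y1, y0)))
# Reference value in double precision.
exact = 1.31 - (1.93 - 1.31) * 3.24 / (4.76 - 3.24)
print(f"first = {first}, second = {second}, exact = {exact:.3g}")
```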
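19. The two-by-two linear system
$$\begin{aligned}
ax + by &= e, \\
cx + dy &= f,
\end{aligned}$$
where $a, b, c, d, e, f$ are given, can be solved for $x$ and $y$ as follows:
$$\begin{aligned}
m &= \frac{c}{a}, \quad \text{provided } a \ne 0; \\
d_1 &= d - mb; \\
f_1 &= f - me; \\
y &= \frac{f_1}{d_1}; \\
x &= \frac{e - by}{a}.
\end{aligned}$$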
Implement this algorithm using MATLAB and solve the following linear systems.

(a)
$$\begin{aligned}
1.130x - 6.990y &= 14.20, \\
8.110x + 12.20y &= -0.1370
\end{aligned}$$

(b)
$$\begin{aligned}
1.013x - 6.099y &= 14.22, \\
-18.11x + 112.2y &= -0.1376
\end{aligned}$$

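Solution The following function solves the general two-by-two linear system, given the values of $a$, $b$, $c$, $d$, $e$ and $f$.

```matlab
function [x,y]=hw1prob1219(a,b,c,d,e,f)
% Solve a*x + b*y = e, c*x + d*y = f by eliminating x from the second equation.
% display error message if a is zero
if a==0
    error('a must be nonzero')
end
m=c/a;        % multiplier for the second equation
d1=d-m*b;     % new coefficient of y
f1=f-m*e;     % new right-hand side
y=f1/d1;      % back substitution
x=(e-b*y)/a;
```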

In the following MATLAB session, this function is used to solve the systems in parts (a) and (b).

```
>> [x,y]=hw1prob1219(1.130,-6.990,8.110,12.20,14.20,-0.1370)

x =

   2.44459190435176

y =

  -1.63628199543384

>> [x,y]=hw1prob1219(1.013,-6.099,-18.11,112.2,14.22,-0.1376)

x =

   4.974388755065191e+002

y =

  80.28948694672958
```
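For readers without MATLAB, here is a Python transcription of the same elimination steps, with a residual check on each equation. The name `solve_2x2` is ours; since Python floats are IEEE doubles, the results should agree with the MATLAB output to within round-off.

```python
def solve_2x2(a: float, b: float, c: float, d: float,
              e: float, f: float) -> tuple[float, float]:
    """Solve ax + by = e, cx + dy = f with the elimination steps of Problem 19."""
    if a == 0:
        raise ValueError("a must be nonzero")
    m = c / a          # multiplier that eliminates x from the second equation
    d1 = d - m * b     # new coefficient of y
    f1 = f - m * e     # new right-hand side
    y = f1 / d1        # back substitution
    x = (e - b * y) / a
    return x, y


systems = {
    "(a)": (1.130, -6.990, 8.110, 12.20, 14.20, -0.1370),
    "(b)": (1.013, -6.099, -18.11, 112.2, 14.22, -0.1376),
}
for label, (a, b, c, d, e, f) in systems.items():
    x, y = solve_2x2(a, b, c, d, e, f)
    # Residuals of both equations; they should be at round-off level.
    print(label, x, y, a * x + b * y - e, c * x + d * y - f)
```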

25. The binomial coefficient
$$\binom{m}{k} = \frac{m!}{k!\,(m-k)!}$$
describes the number of ways of choosing a subset of $k$ objects from a set of $m$ elements.

(a) Suppose decimal machine numbers are of the form
$$\pm 0.d_1 d_2 d_3 d_4 \times 10^n, \quad 1 \le d_1 \le 9, \quad 0 \le d_i \le 9 \ (i = 2, 3, 4), \quad |n| \le 15.$$
What is the largest value of $m$ for which the binomial coefficient $\binom{m}{k}$ can be computed for all $k$ by the definition without causing overflow?
Solution The largest number that can be represented in this floating-point system is $0.9999 \times 10^{15}$ = 999,900,000,000,000. Using the definition of the binomial coefficient, overflow will occur if $m!$ is larger than this number. This is the case if $m \ge 18$, since $18!$ = 6,402,373,705,728,000, whereas $17!$ = 355,687,428,096,000. Therefore the largest value of $m$ for which the binomial coefficient can be computed without causing overflow is 17.
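A quick exact-integer check of the factorial bound, assuming overflow means exceeding $0.9999 \times 10^{15}$. Python integers do not overflow, so the comparison itself is exact; `LARGEST` is our name for the bound.

```python
import math

LARGEST = 9999 * 10**11  # 0.9999 x 10^15, the largest machine number, as an exact integer

# Largest m with m! <= LARGEST; the definition overflows as soon as m! exceeds it.
m = 0
while math.factorial(m + 1) <= LARGEST:
    m += 1
print(m, math.factorial(m), math.factorial(m + 1))
```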

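(b) Show that $\binom{m}{k}$ can also be computed by
$$\binom{m}{k} = \left(\frac{m}{k}\right)\left(\frac{m-1}{k-1}\right)\cdots\left(\frac{m-k+1}{1}\right).$$

Solution We have
$$\begin{aligned}
\binom{m}{k} &= \frac{m!}{k!\,(m-k)!} \\
&= \frac{1 \cdot 2 \cdot 3 \cdots (m-1) \cdot m}{\bigl(1 \cdot 2 \cdot 3 \cdots (k-1) \cdot k\bigr)\bigl(1 \cdot 2 \cdot 3 \cdots (m-k-1) \cdot (m-k)\bigr)} \\
&= \frac{\bigl(1 \cdot 2 \cdot 3 \cdots (m-k-1) \cdot (m-k)\bigr)\bigl((m-k+1) \cdots (m-1) \cdot m\bigr)}{\bigl(1 \cdot 2 \cdot 3 \cdots (k-1) \cdot k\bigr)\bigl(1 \cdot 2 \cdot 3 \cdots (m-k-1) \cdot (m-k)\bigr)} \\
&= \frac{(m-k+1) \cdots (m-1) \cdot m}{1 \cdot 2 \cdot 3 \cdots (k-1) \cdot k} \\
&= \left(\frac{m}{k}\right)\left(\frac{m-1}{k-1}\right)\cdots\left(\frac{m-k+1}{1}\right),
\end{aligned}$$
where the last step pairs the $k$ numerator factors $m, m-1, \ldots, m-k+1$ with the $k$ denominator factors $k, k-1, \ldots, 1$.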
(c) What is the largest value of $m$ for which the binomial coefficient $\binom{m}{3}$ can be computed by the formula in part (b) without causing overflow?
Solution We have
$$\binom{m}{3} = \frac{m}{3} \cdot \frac{m-1}{2} \cdot \frac{m-2}{1} = \frac{m(m-1)(m-2)}{6}.$$

The partial products $m/3$ and $m(m-1)/6$ are smaller than the final result, so only the final value can overflow. To avoid overflow, this coefficient must not exceed $0.9999 \times 10^{15}$, which implies that $m$ must satisfy
$$m(m-1)(m-2) \le 6(0.9999) \times 10^{15} = 5.9994 \times 10^{15}.$$
Since $m(m-1)(m-2) \le m^3$ for any nonnegative integer $m$, it follows that the largest value of $m$ for which the binomial coefficient can be computed is not less than the largest value of $m$ for which $m^3 \le 5.9994 \times 10^{15}$. This value is $(5.9994 \times 10^{15})^{1/3} \approx 1.81706 \times 10^5$, that is, about 181,706.
If we let $m$ = 181,706, we obtain $m(m-1)(m-2) \approx 5.9993 \times 10^{15}$. To see whether $m$ can be any larger, we try $m$ = 181,707 and obtain $m(m-1)(m-2) \approx 5.9993998 \times 10^{15}$, so this value is acceptable as well. However, if we try $m$ = 181,708, we obtain $m(m-1)(m-2) \approx 5.9994989 \times 10^{15}$, which is too large, so we conclude that the largest value of $m$ is 181,707.
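The search above can be confirmed with exact integer arithmetic. This sketch assumes, as the solution does, that only the final value $m(m-1)(m-2)/6$ can overflow, and it works in exact integers rather than four-digit machine numbers; `binom3` and `LARGEST` are hypothetical names.

```python
LARGEST = 9999 * 10**11  # 0.9999 x 10^15 as an exact integer


def binom3(m: int) -> int:
    """Final value of (m/3)((m-1)/2)((m-2)/1) = m(m-1)(m-2)/6, computed exactly."""
    return m * (m - 1) * (m - 2) // 6  # m(m-1)(m-2) is always divisible by 6


# Start near the cube-root estimate (6 * LARGEST)^(1/3) and adjust in both directions.
m = round((6 * LARGEST) ** (1 / 3))
while binom3(m + 1) <= LARGEST:
    m += 1
while binom3(m) > LARGEST:
    m -= 1
print(m, binom3(m), binom3(m + 1))
```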
(d) Use the equation in part (b) and four-digit chopping arithmetic to compute the number of possible 5-card hands in a 52-card deck. Compute the actual and relative errors.

Solution The number of possible 5-card hands in a 52-card deck is
$$\binom{52}{5} = \frac{52}{5} \cdot \frac{51}{4} \cdot \frac{50}{3} \cdot \frac{49}{2} \cdot \frac{48}{1}.$$

Using four-digit chopping arithmetic, we obtain
$$\begin{aligned}
\binom{52}{5} &\approx (10.4)(12.75)(16.66)(24.5)(48) \\
&\approx (132.6)(16.66)(24.5)(48) \\
&\approx (2209)(24.5)(48) \\
&\approx (54{,}120)(48) \\
&\approx 2{,}597{,}000.
\end{aligned}$$

The actual value is 2,598,960, so the absolute error is 1,960 and the relative error is $7.541 \times 10^{-4}$.
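A sketch of part (d) with Python's `decimal` module, assuming four-digit chopping is modeled by `ROUND_DOWN` (truncation toward zero) at `prec=4`, with each quotient and each running product chopped in the order shown above; `binom_product` is a hypothetical helper name.

```python
import math
from decimal import ROUND_DOWN, Context, Decimal

# Assumption: four-digit chopping = truncate every quotient and product to
# 4 significant digits (ROUND_DOWN truncates toward zero).
chop4 = Context(prec=4, rounding=ROUND_DOWN)


def binom_product(m: int, k: int, ctx: Context) -> Decimal:
    """Evaluate (m/k)((m-1)/(k-1))...((m-k+1)/1) left to right in the given context."""
    result = Decimal(1)
    for j in range(k):
        factor = ctx.divide(m - j, k - j)       # e.g. 50/3 is chopped to 16.66
        result = ctx.multiply(result, factor)   # running product, chopped
    return result


approx = int(binom_product(52, 5, chop4))  # the Decimal result is 2.597E+6
actual = math.comb(52, 5)
abs_err = actual - approx
print(approx, actual, abs_err, f"{abs_err / actual:.4g}")
```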
