Assign 1 MTH308 Sol

This document contains true/false questions and explanations about numerical errors and conditioning in floating-point arithmetic. It addresses topics like ill-conditioned problems, algorithm stability, precision vs accuracy, sources of error, and properties of floating-point number systems like non-associativity, cancellation, and exceptional values.


1.1. True
1.2. False
1.3. False
1.4. False
1.5. False
1.6. False
1.7. False
1.8. False
1.9. False
1.10. False
Explanation:
1.1: A problem is considered ill-conditioned if small changes in the problem data
result in large changes in the solution.
1.2: Using higher-precision arithmetic can help mitigate the effects of an ill-
conditioned problem, but it will not make the problem better conditioned.
1.3: The conditioning of a problem is independent of the algorithm used to solve
it.
1.4: A good algorithm will produce an accurate solution for a well-conditioned
problem, but it will not produce an accurate solution for an ill-conditioned
problem.
1.5: Propagated data error is the error in the solution caused by error in the input
data alone; it is determined by the problem (its conditioning), not by the algorithm,
whose choice affects only the computational (truncation and rounding) error.
1.6: A stable algorithm applied to a well-conditioned problem will produce an
accurate solution, but a stable algorithm applied to an ill-conditioned problem may
not produce an accurate solution.
1.7: The result of a real arithmetic operation on two exactly representable
floating-point numbers may not itself be representable as a floating-point number, due
to the finite precision of the floating-point system.
1.8: Floating-point numbers are not distributed uniformly throughout their range,
as the spacing between neighboring numbers increases as the absolute value of the
numbers increases.
1.9: Floating-point addition is commutative but not associative, because intermediate
results must be rounded to the finite precision of the system.
1.10: The underflow level is the smallest positive normalized number that can be
represented in the floating-point system; the smallest positive number that perturbs the
number 1 when added to it is instead on the order of the machine epsilon, which is a far
larger quantity.
1.11. False. The mantissa in IEEE double precision floating-point arithmetic is 53 bits
long (including the implicit leading bit), while the mantissa in IEEE single precision
is 24 bits long.
1.12. A well-posed problem is one for which a solution exists, the solution is unique,
and the solution depends continuously on the problem data.
1.13. Three sources of error in scientific computation are: error in the input data
(e.g., measurement error), truncation or discretization error, and rounding error.
1.14. Truncation error is the error made when an exact mathematical process is replaced
by an approximate one, for example truncating an infinite series after finitely many
terms or replacing a derivative by a finite difference, while rounding error is the
error made when a result is represented using only a finite number of digits.
1.15. Absolute error is the difference between the exact value and the
approximation, while relative error is the ratio of the absolute error to the exact
value.
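In symbols (the standard definitions, writing y for the true value and \hat{y} for the
approximation; the notation is not from the assignment):

    \text{absolute error} = \hat{y} - y, \qquad
    \text{relative error} = \frac{\hat{y} - y}{y}.

For example, approximating y = 100 by \hat{y} = 101 gives an absolute error of 1 and a
relative error of 0.01.
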
1.16. Computational error refers to errors introduced by the numerical methods and
algorithms used to solve a problem, while propagated data error refers to errors
that result from inaccuracies in the input data.
1.17. Precision refers to the number of digits used to represent a value, while
accuracy refers to how close a measurement or calculation is to the true value.
1.18. (a) The conditioning of a problem refers to how sensitive the solution of the
problem is to small changes in the problem data. (b) The conditioning of a problem is
not affected by the algorithm used to solve it. (c) The conditioning of a problem is
likewise not affected by the precision of the arithmetic used to solve it; conditioning
is a property of the problem itself.
1.19. If a computational problem has a condition number of 1, it is considered
well-conditioned as it is not sensitive to small changes in the data.
1.20. The absolute condition number is the ratio of the absolute change in the solution
to the absolute change in the input data, while the relative condition number is the
ratio of the relative change in the solution to the relative change in the input data.
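For a scalar problem y = f(x) these can be expressed as (a standard formulation,
assuming f is differentiable; the notation is not from the assignment):

    \text{cond}_{\text{abs}} = \frac{|\Delta y|}{|\Delta x|} \approx |f'(x)|, \qquad
    \text{cond}_{\text{rel}} = \frac{|\Delta y|/|y|}{|\Delta x|/|x|}
        \approx \left| \frac{x\, f'(x)}{f(x)} \right|.
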
1.21. An inverse problem is one where the goal is to determine the input (cause) that
produced a given output (effect), rather than the output produced by a given input. The
conditioning of a problem and its inverse are related in that their (relative) condition
numbers are approximately reciprocals of each other: if the forward problem is extremely
sensitive to perturbations in its input, the inverse problem is correspondingly
insensitive, and vice versa.
1.22. (a) The backward error in a computed result is the smallest change in the input
data for which the computed result would be the exact solution of the perturbed problem;
it measures how nearly the computed result solves some nearby problem, rather than how
far it is from the true solution. (b) An approximate solution to a given problem is
considered good according to backward error analysis if the backward error is small, for
example no larger than the uncertainty already present in the input data.
1.23. (a) Propagated data error is not affected by the stability of the algorithm.
(b) The accuracy of the computed result is affected by the stability of the
algorithm. (c) The conditioning of the problem is not affected by the stability of
the algorithm.
1.24. (a) Forward error is the difference between the computed approximate solution and
the exact solution of the problem, while backward error is the difference between the
actual input data and perturbed input data for which the computed solution would be the
exact solution. (b) Forward error and backward error are related quantitatively by the
condition number of the problem: roughly, the relative forward error is bounded by the
condition number times the relative backward error.
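Schematically (the usual rule of thumb from backward error analysis):

    |\text{relative forward error}| \lesssim
        \text{cond} \times |\text{relative backward error}|,

so a small backward error guarantees a small forward error only when the problem is
well-conditioned.
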
1.25. In a given floating-point number system, the machine numbers are not distributed
uniformly along the real line: they are equally spaced only between successive powers of
the base, and the absolute spacing between neighboring machine numbers grows in
proportion to the magnitude of the numbers (the relative spacing stays roughly constant).
1.26. Overflow is generally more harmful because it can lead to incorrect or
meaningless results, while an underflowed result can often reasonably be approximated by
zero, so underflow usually causes only a loss of precision.
1.27. In floating-point arithmetic, an overflow can occur not only in (c) Multiplication
but also in (a) Addition and (d) Division, whenever the magnitude of the exact result
exceeds the largest representable number; the subtraction of two positive numbers cannot
overflow, since the magnitude of the difference never exceeds the larger operand.
1.28. In floating-point arithmetic, an underflow can occur in (b) Subtraction, (c)
Multiplication, and (d) Division when the nonzero result is smaller in magnitude than
the smallest representable positive number; the addition of two positive numbers cannot
underflow, since the sum is at least as large as the larger operand.
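A quick illustration in IEEE double precision (a minimal sketch; the printed values
assume the usual 64-bit format used by Python floats):

    # Overflow: the exact result exceeds the largest double (about 1.8e308),
    # so the product becomes the exceptional value inf.
    big = 1e308
    print(big * 10)        # inf

    # Underflow: the exact result is below the smallest subnormal (about 5e-324),
    # so the quotient is flushed to zero.
    tiny = 5e-324
    print(tiny / 10)       # 0.0

    # Gradual underflow: results between roughly 2.2e-308 and 5e-324 are
    # stored as subnormal numbers with reduced precision.
    print(1e-300 * 1e-10)  # a subnormal value close to 1e-310
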
1.29. Two reasons why floating-point number systems are usually normalized are: the
representation of each nonzero number is unique, and no digits are wasted on leading
zeros, so the full precision of the significand is available (in binary there is the
further benefit that the leading 1 bit need not be stored).
1.30. In a floating-point system, the maximum relative error in representing a given
real number within the range of the system by a machine number is given by the unit
roundoff, which bounds the relative error of rounding any such number to the nearest
machine number (it is not the smallest gap between two machine numbers).
1.31. (a) In a floating-point system, "rounding toward zero" (chopping) means that a
number is replaced by the nearest machine number whose absolute value does not exceed
it, i.e., the excess digits are simply discarded. "Round to nearest" means that the
number is replaced by the closest machine number, with a tie-breaking rule such as
round-to-even. (b) "Round to nearest" is more accurate than "rounding toward zero"
because the representation error is at most half the spacing between machine numbers
rather than a full spacing, and the errors are not biased systematically toward zero.
(c) The unit roundoff for round to nearest is half that for rounding toward zero:
(1/2)*beta^(1-p) versus beta^(1-p) for a base-beta, p-digit system.
1.32. In a p-digit binary floating-point system with rounding to nearest, the value
of the unit roundoff emach is 2^-p.
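For IEEE double precision (p = 53) this gives 2^-53, which can be checked against the
value Python reports (a minimal sketch; sys.float_info.epsilon is the spacing between
1.0 and the next larger double, i.e. 2^-52, so the unit roundoff for rounding to nearest
is half of it):

    import sys

    eps = sys.float_info.epsilon     # 2**-52, the gap from 1.0 to the next double
    unit_roundoff = eps / 2          # 2**-53 for rounding to nearest
    print(unit_roundoff)             # about 1.11e-16
    # Adding anything strictly smaller than the unit roundoff to 1.0 is rounded away.
    print(1.0 + unit_roundoff / 2 == 1.0)   # True
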
1.33. In a floating-point system with gradual underflow (subnormal numbers), the
representation of each number is still unique. This is because subnormal numbers are
represented with the minimum exponent and a significand whose leading digit is zero,
so no value acquires a second representation.
1.34. In a floating-point system, the product of two machine numbers is not usually
exactly representable in the floating-point system because the product of two
numbers can have more digits than can be represented in the system, so it must be
rounded to the nearest representable number.
1.35. In a floating-point system, the quotient of two nonzero machine numbers is
not always exactly representable in the floating-point system because the quotient
of two numbers can have more digits than can be represented in the system, so it
must be rounded to the nearest representable number.
1.36. (a) An example of floating-point addition not being associative would be:
(0.1 + 0.2) + 0.3 = 0.6000000000000001 and 0.1 + (0.2 + 0.3) = 0.6, due to the fact
that 0.1, 0.2, and 0.3 cannot be exactly represented in a binary floating-point system
and the intermediate sums are rounded differently, so the operation is not associative.
(b) An example of floating-point multiplication not being associative would be:
(1e16 * 0.1) * 0.1 and 1e16 * (0.1 * 0.1), which evaluate to slightly different values
(approximately 1e14 versus 1.0000000000000002e14 in IEEE double precision), due to the
fact that the intermediate results of the operation are not exactly representable in the
floating-point system and thus the operation is not associative.
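These claims are easy to check directly (a minimal sketch; the printed digits assume
IEEE double precision, which Python floats use on essentially all platforms):

    a, b, c = 0.1, 0.2, 0.3
    print((a + b) + c)                   # 0.6000000000000001
    print(a + (b + c))                   # 0.6
    print((a + b) + c == a + (b + c))    # False: addition is not associative

    x, y, z = 1e16, 0.1, 0.1
    print((x * y) * z)                   # 100000000000000.0
    print(x * (y * z))                   # 100000000000000.02
    print((x * y) * z == x * (y * z))    # False: multiplication is not associative
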
1.37. (a) Cancellation occurs in a floating-point system when two numbers that are
nearly equal are subtracted: the leading digits cancel, leaving only a few significant
digits in the result.
(b) The occurrence of cancellation does not imply that the true result of the specific
operation causing it is not exactly representable in the floating-point system; in fact,
the difference of two nearly equal machine numbers is often exactly representable, and
the subtraction itself is then performed without rounding error.
(c) Cancellation is usually bad because the digits that survive are the trailing digits,
which carry whatever rounding error was already present in the operands, so the relative
error of the result can be very large.
1.38. An example of a number whose decimal representation is finite but whose binary
representation is not is the decimal number 0.1: its decimal representation terminates,
but its binary representation is the infinitely repeating fraction 0.000110011001100...,
so it cannot be represented exactly in a binary floating-point system.
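This can be verified by printing the stored double-precision value of 0.1 to more digits
than the default display shows (a minimal sketch):

    # The double nearest to 0.1 is slightly larger than one tenth.
    print(f"{0.1:.20f}")               # 0.10000000000000000555
    # Its exact value is a ratio with a power-of-two denominator (2**55),
    # which exposes the binary rounding.
    print((0.1).as_integer_ratio())    # (3602879701896397, 36028797018963968)
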
1.39. Examples of floating-point arithmetic operations that would produce the
exceptional value Inf: dividing a nonzero number by zero (1/0 gives Inf and -1/0 gives
-Inf), operations whose exact result overflows the representable range (e.g., multiplying
two very large numbers), and log(0), which gives -Inf. Examples of floating-point
arithmetic operations that would produce the exceptional value NaN: 0/0, Inf - Inf,
0 * Inf, sqrt(-1), and log(-1).
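Using NumPy, which follows the IEEE conventions for these operations (plain Python
raises exceptions for some of them, e.g. 1.0/0.0), the exceptional values can be
produced directly (a minimal sketch):

    import numpy as np

    with np.errstate(divide="ignore", invalid="ignore", over="ignore"):
        print(np.float64(1.0) / np.float64(0.0))      # inf
        print(np.float64(1e300) * np.float64(1e300))  # inf (overflow)
        print(np.log(np.float64(0.0)))                # -inf
        print(np.float64(0.0) / np.float64(0.0))      # nan
        print(np.sqrt(np.float64(-1.0)))              # nan
        print(np.float64("inf") - np.float64("inf"))  # nan
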
1.40. In a floating-point system with base beta, precision p, and rounding to nearest,
the maximum relative error in representing any nonzero real number within the range of
the system is (1/2)*beta^(1-p), which reduces to 2^-p in the binary case.
1.41. The cancellation that occurs when two numbers of similar magnitude are
subtracted is often bad because it can lead to a significant loss of precision in
the final result. This is due to the fact that the significant digits in the two
numbers being subtracted are likely to cancel out, leaving only the least
significant digits, which can introduce a large relative error. Even though the
result may be exactly correct for the actual operands involved, the loss of
precision can cause problems when the result is used in further calculations.
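A small numerical illustration of this effect (a minimal sketch; the function
(1 - cos x)/x^2 is a standard textbook example, not taken from the assignment):

    import math

    x = 1e-8
    # Naive formula: cos(x) rounds to exactly 1.0, so the subtraction cancels
    # every significant digit and returns 0.0 instead of a value near 0.5.
    naive = (1.0 - math.cos(x)) / x**2
    # Rewriting to avoid subtracting nearly equal numbers recovers full accuracy.
    stable = 0.5 * (math.sin(x / 2) / (x / 2)) ** 2
    print(naive)   # 0.0
    print(stable)  # approximately 0.5, the correct value for small x
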
1.44. (a) The unit roundoff, epsilon mach (emach), is the maximum relative error
incurred when a real number within the range of the system is rounded to the nearest
machine number; it measures the precision of the system. The underflow level, UFL, is
the smallest positive normalized number that can be represented in the system; it
measures how close to zero the range of the system extends. UFL is normally very much
smaller than emach.
(b) The unit roundoff is determined by the number of digits in the mantissa field
(together with the rounding rule).
(c) The underflow level is determined by the number of digits in the exponent field.
(d) The unit roundoff does depend on the rounding rule used: it is beta^(1-p) for
chopping and (1/2)*beta^(1-p) for rounding to nearest.
(e) The underflow level (the smallest positive normalized number) is not changed by
allowing subnormal numbers, although subnormals make it possible to represent even
smaller values, with gradually reduced precision.
1.45. To minimize rounding error when summing a monotonically decreasing, finite
sequence of positive numbers, the sequence should be summed in increasing order, i.e.,
from the smallest number to the largest. Adding the small numbers first lets them
accumulate into a partial sum large enough not to be swamped later, whereas if the large
numbers are added first, each subsequent small term may be too small relative to the
running sum to contribute any digits at all.
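A quick experiment showing the effect of the summation order (a minimal sketch; the
sequence 1/k^2 is just an illustrative choice, not taken from the assignment):

    import math

    # A monotonically decreasing, finite sequence of positive numbers.
    terms = [1.0 / k**2 for k in range(1, 1_000_001)]

    decreasing = sum(terms)            # largest terms added first
    increasing = sum(reversed(terms))  # smallest terms added first
    accurate = math.fsum(terms)        # correctly rounded reference sum

    print(abs(decreasing - accurate))  # typically the larger error
    print(abs(increasing - accurate))  # typically the smaller error
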
1.46. Cancellation is considered a form of rounding error because its damaging effect
stems from the finite precision of the floating-point system: the correct leading digits
cancel and the surviving trailing digits consist largely of earlier rounding error, so
the computed result is only a poor approximation of the true value.
