MTH 214 Accuracy in Numerical Calculations and Error Analysis

The document discusses the representation of numbers in computer memory, focusing on binary digits, integer and floating point representations, and their arithmetic operations. It explains how numbers are stored in binary format, including the use of sign bits and normalization in floating point representation. Additionally, it covers the processes for addition, subtraction, and multiplication of floating point numbers, highlighting potential errors such as overflow and underflow.

Uploaded by

fateemaah5

Accuracy in Numerical Calculations and Error Analysis

REPRESENTATION OF NUMBERS IN COMPUTER MEMORY

The smallest unit of information stored in memory is the binary digit, abbreviated as BIT.
It represents either “0” or “1”. The instructions given to a computer and the data to be
processed are stored as groups of bits, which are classified as follows:

NIBBLE: A string of four bits is called a NIBBLE.

BYTE: A string of eight bits is called a BYTE.

COMPUTER WORD: A string of bits whose size, called the WORD LENGTH or WORD SIZE, is
fixed for a specific computer, though it may vary from computer to computer.

WORD LENGTH: The WORD LENGTH may be 1 byte, 2 bytes, 4 bytes or even larger.

All computers are designed to use binary digits to represent numbers and other
information. The memory is organized into strings of bits called words. Computers read
decimal numbers supplied by humans but convert them automatically into binary numbers
for internal use. These binary numbers may also be expressed in the octal or hexadecimal
form. For output, the numbers are reconverted to decimal form for human use.

INTEGER REPRESENTATION

Decimal numbers are first converted into their binary equivalent and then expressed in either
integer or floating point form. In the integer representation the decimal or binary point is
always fixed to the right of the least significant digit and therefore fractions are not
included. The magnitude of the number is restricted to $2^n - 1$, where n is the word length in
bits. Negative numbers are stored by using the 2’s complement. This is done by taking the
1’s complement of the binary representation of the positive number and then adding 1 to it.

Example: Represent -13 in binary form.

Solution:

            13 = 01101

1’s complement = 10010

                +00001

2’s complement = 10011

Thus -13 = 10011

Note: The leftmost bit of the binary number is the sign bit: “0” indicates that the number
is positive and “1” indicates that the number is negative. If one bit is reserved to
represent the sign of the number, only n-1 bits remain to represent its magnitude. Thus a
16-bit word can contain numbers from

$-2^{15}$ to $2^{15} - 1$ (i.e. -32768 to 32767).

Example: Show that the number -32768 is represented in a 16-bit word as follows:

1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Solution: Write -32768 = (-32767) + (-1).

         32767 = 0111 1111 1111 1111

1’s complement = 1000 0000 0000 0000

                +0000 0000 0000 0001

        -32767 = 1000 0000 0000 0001 (x)

             1 = 0000 0000 0000 0001

1’s complement = 1111 1111 1111 1110

                +0000 0000 0000 0001

            -1 = 1111 1111 1111 1111 (y)

Adding (x) and (y) and discarding the carry out of the 16th bit gives

        -32768 = 1000 0000 0000 0000
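
The 2’s complement recipe used in the two examples above can be sketched in a few lines of Python. The function name `twos_complement` and its parameters are illustrative choices, not part of the original notes; the sketch simply inverts the bits of the magnitude and adds 1 within a fixed word length.

```python
def twos_complement(value: int, bits: int) -> str:
    """Return the 2's-complement bit pattern of `value` in a `bits`-bit word."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= value <= highest:
        raise OverflowError(f"{value} does not fit in {bits} bits")
    if value >= 0:
        return format(value, f"0{bits}b")
    # Textbook recipe: invert every bit of |value| (1's complement), then add 1.
    mask = (1 << bits) - 1
    ones_complement = ~(-value) & mask
    return format((ones_complement + 1) & mask, f"0{bits}b")


print(twos_complement(-13, 5))      # 10011
print(twos_complement(-32768, 16))  # 1000000000000000
```

The range check mirrors the note above: an n-bit word with one sign bit holds values from $-2^{n-1}$ to $2^{n-1} - 1$.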

FLOATING POINT REPRESENTATION

Consider the decimal number 345.876, which may be written as:

(i) 345.876 (ii) 0.345876 x $10^3$

Suppose a register is capable of storing 6 digits and a sign bit, and this register is split
into two parts, one containing the integral portion of the number and the other containing
the fractional portion, with the decimal point fixed between the two parts. The first
drawback of this scheme is that the range of numbers which can be represented is limited
to -999.999 to +999.999.

In the floating point representation, the number is written as a fraction multiplied by a
power of 10. The fractional part is known as the mantissa, and the power of ten which
multiplies the fraction is known as the exponent (or index, or power). For instance,
3.456 x $10^4$ in scientific form is written as 3.456E4 in the floating point format.

A floating point number is said to be normalized if the most significant digit of the
mantissa is non-zero. Shifting the decimal point so that it lies immediately to the left of
the most significant non-zero digit is called normalization, and real numbers represented in
this form are known as normalized floating point numbers. The mantissa of a normalized
floating point number satisfies the following inequality:

For positive numbers: 0.1 <= mantissa < 1.0

For negative numbers: -1.0 < mantissa <= -0.1

e.g. 0.003456E-3 is expressed as 0.3456E-5 in the normalized floating point
representation. Real numbers are stored in the computer as normalized binary floating
point numbers. A floating point number is represented in four consecutive bytes. The
exponent is stored in the most significant byte, and the lower-order three bytes hold the
sign bit followed by the mantissa.

| Biased exponent = 8 bits | Sign bit = 1 bit | Mantissa = 23 bits |

Fig.1. IEEE 32-bit format for storing real numbers in floating point format

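The actual exponent values that can be represented are -128 to +127. An 8-bit field can
hold the unsigned values 0 to 255, so 128 is added to the actual exponent to obtain the
biased exponent. All biased exponents are therefore non-negative, and this technique
eliminates the need for a separate sign bit for the exponent. If the sign bit of the
floating point number is 0, the number is positive; if it is 1, the number is negative. The
remaining 23 bits hold the bit pattern of the mantissa. Note that this layout is a
simplified teaching format: the IEEE 754 single-precision standard itself places the sign
bit first, uses an exponent bias of 127 and stores the mantissa with an implicit leading 1.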

Example: Represent 12.6875 in the IEEE 32-bit format in normalized floating point form.

Solution:

*The binary equivalent of the decimal number 12.6875 is 1100.1011.

*The normalized floating point form of 1100.1011 is 0.11001011 x $2^4$.

*The mantissa 11001011, when extended to 23 bits by adding 0’s on the right, becomes

1 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

*The exponent of the floating point number is 4. Adding the constant 128 gives the biased
exponent 132, whose binary equivalent is 10000100.

*The given number is positive, so the sign bit is 0.

*Combining the results of all the above steps (exponent, sign bit, mantissa), the final
representation of 12.6875 is

1 0 0 0 0 1 0 0 0 1 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Example: Represent -12.6875 in the IEEE 32-bit format in normalized floating point form.

Solution:

*The binary equivalent of the decimal number 12.6875 is 1100.1011.

*The normalized floating point form of 1100.1011 is 0.11001011 x $2^4$.

*The mantissa 11001011, when extended to 23 bits by adding 0’s on the right, becomes

1 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

*Taking the 2’s complement of the above 23-bit pattern, we get

0 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

*The exponent of the floating point number is 4. Adding the constant 128 gives the biased
exponent 132, whose binary equivalent is 10000100.

*The given number is negative, so the sign bit is 1.

*Combining the results of all the above steps (exponent, sign bit, mantissa), the final
representation of -12.6875 is

1 0 0 0 0 1 0 0 1 0 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
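
The encoding steps of the two examples above can be sketched as follows. This is a minimal illustration of the layout used in these notes (8-bit exponent with 128 added, then the sign bit, then a 23-bit mantissa of the form 0.1xxx in binary, with the mantissa bits 2’s-complemented for negative numbers); it is not an implementation of the IEEE 754 standard, and the name `encode_excess128` is a hypothetical choice.

```python
import math

EXP_BITS, MANT_BITS, BIAS = 8, 23, 128


def encode_excess128(x: float) -> str:
    """Encode x as: biased exponent (8 bits) | sign bit | mantissa (23 bits)."""
    if x == 0:
        return "0" * 32
    # frexp gives abs(x) = frac * 2**exp with 0.5 <= frac < 1, i.e. 0.1xxx in binary.
    frac, exp = math.frexp(abs(x))
    biased = exp + BIAS
    if not 0 <= biased < (1 << EXP_BITS):
        raise OverflowError("exponent outside the range -128..127")
    mant = int(frac * (1 << MANT_BITS))  # keep 23 mantissa bits, chop the rest
    sign = 0
    if x < 0:
        sign = 1
        # The notes store the 2's complement of the mantissa bits for negatives.
        mant = (-mant) & ((1 << MANT_BITS) - 1)
    return f"{biased:0{EXP_BITS}b}" + str(sign) + f"{mant:0{MANT_BITS}b}"


print(encode_excess128(12.6875))
print(encode_excess128(-12.6875))
```

The two printed patterns can be compared digit by digit with the final bit strings of the worked examples.
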
FLOATING POINT ARITHMETIC OPERATIONS

The arithmetic operations below are performed on numbers in normalized floating point form
with a four-digit mantissa and a two-digit exponent. Let us assume that the mantissa retained
by a hypothetical computer ranges from -0.9999 to +0.9999 and the exponent from -99 to +99.

ADDITION OPERATION

For addition it is necessary that both operands have the same exponent. If the exponents
are not equal, the exponent of the number with the smaller exponent is made equal to
the larger exponent and its mantissa is adjusted accordingly. This is done by shifting the
decimal point of that mantissa to the left by a number of places equal to the positive
difference between the two exponents; digits shifted beyond the fourth place are chopped
off. The mantissas of the two numbers are then added.

The next phase normalizes the result. If the leading digits of the resulting mantissa are
zero, normalization consists of shifting the significant digits left until the most
significant digit is non-zero. Each shift decrements the exponent and could cause an
exponent underflow. Finally, the result must be rounded off or chopped to four digits.

Since the mantissa retained by the hypothetical computer ranges from -0.9999 to 0.9999, the
mantissa of a sum (before normalization) can be at most 1.9998 in magnitude. In that case
the decimal point must be shifted to the left by one position in order to normalize it, and
the exponent of the sum is increased by 1. Because of this normalization, the exponent of
the sum may become greater than +99, so that the result exceeds the largest number which
the computer can store. This condition is called overflow, and the computer will indicate
this error.

Example: Add 0.8906E7 to 0.8761E5

Solution

The exponents differ by 2 (7 - 5), so the decimal point of the mantissa 0.8761 is shifted
2 positions to the left and the exponent is increased by 2. The aligned number becomes
0.0087E7, the digits 6 and 1 being chopped off. Now these numbers can be added as follows:

Addend 0.0087E7

Augend 0.8906E7

Sum 0.8993E7

Example: Add 0.9754E8 to 0.9871E9

Solution

The exponents differ by 1 (9 - 8), so the decimal point of the mantissa 0.9754 is shifted
1 position to the left and the exponent is increased by 1. The aligned number becomes
0.0975E9, the digit 4 being chopped off. Now these numbers can be added as follows.

Addend 0.9871E9

Augend 0.0975E9

Sum 1.0846E9

Since the mantissa of the sum is greater than 1.0, the decimal point is shifted to the left
by one position and the exponent is increased by 1. The result after normalization becomes
0.1084E10, the digit 6 being chopped off.

Example: Add 0.9896E99 to 0.9278E98.

Solution

The exponents differ by 1 (99 - 98), so the decimal point of the mantissa 0.9278 is shifted
1 position to the left and the exponent is increased by 1. The aligned number becomes
0.0927E99, the digit 8 being chopped off. Now these numbers can be added as follows:

Addend 0.9896E99

Augend 0.0927E99

Sum 1.0823E99

Since the mantissa of the sum is greater than 1.0, the decimal point is shifted to the
left by one position and the exponent is increased by 1. The result after normalization
becomes 0.1082E100, the digit 3 being chopped off. This number is greater than the largest
number which the hypothetical computer can handle, so it is a case of overflow and the
computer will indicate this error.

SUBTRACTION OPERATION

As with addition, in subtraction it is necessary that both operands have the same
exponent. If the exponents are not equal, the exponent of the number with the smaller
exponent is made equal to the larger exponent and its mantissa is adjusted accordingly.
This is done by shifting the decimal point to the left by a number of places equal to the
positive difference between the two exponents. The mantissas of the two numbers are then
subtracted.

For two operands of the same sign, the magnitude of the mantissa of the difference (before
normalization) lies in the range 0.0000 to 0.9999. These extreme values occur when the two
numbers are equal and when the mantissa of the number to be subtracted is 0.0000,
respectively. Therefore, the decimal point may have to be shifted to the right by more than
one position, and the exponent is decreased by one for each shift. Because of this
normalization, the exponent of the difference may become less than -99, so that the result
is smaller in magnitude than the smallest number which the computer can store. This
condition is called underflow, and the computer will indicate this error.

Example: Subtract 0.6434E3 from 0.4217E5

Solution:

After its exponent is raised to 5, the number 0.6434E3 becomes 0.0064E5, the digits 3 and 4
being chopped off. The subtraction then proceeds as

Minuend 0.4217E5

Subtrahend 0.0064E5

Difference 0.4153E5

Example: Subtract 0.7673E7 from 0.7678E7.

Solution:

Since the exponents are already equal, the given numbers can be directly subtracted as

Minuend 0.7678E7

Subtrahend 0.7673E7

Difference 0.0005E7

Since the three leading digits of the mantissa are zero, the number 0.0005E7 must be
normalized. This is done by shifting the decimal point to the right by three positions and
decreasing the exponent by 3. Therefore, the resultant difference becomes 0.5000E4.

Example: Subtract 0.8678E-99 from 0.8691E-99

Solution:

Since the exponents are already equal, the given numbers can be directly subtracted as:

Minuend 0.8691E-99

Subtrahend 0.8678E-99

Difference 0.0013E-99

Since the two leading digits of the mantissa are zero, the number 0.0013E-99 must be
normalized. This is done by shifting the decimal point to the right by two positions and
decreasing the exponent by 2. Therefore, the resultant difference becomes 0.1300E-101.
Since this number is smaller than the smallest number which our hypothetical computer can
handle, it is a case of underflow and the computer will indicate this error.
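
The alignment, chopping and normalization steps of the addition and subtraction examples can be sketched as below. The representation is an assumption made for illustration: a number 0.dddd x 10^e is held as the pair `(dddd, e)`, so 0.8906E7 is `(8906, 7)`. The helper names (`chop`, `normalize`, `fp_add`, `fp_sub`) and the `UnderflowError` class are hypothetical, not taken from the notes.

```python
EXP_MIN, EXP_MAX = -99, 99
DIGITS = 4


class UnderflowError(ArithmeticError):
    """Raised when a non-zero result needs an exponent below EXP_MIN."""


def chop(m: int, places: int) -> int:
    """Drop the last `places` decimal digits of m, truncating toward zero."""
    sign = -1 if m < 0 else 1
    return sign * (abs(m) // 10 ** places)


def normalize(m: int, e: int) -> tuple[int, int]:
    """Bring |m| into 1000..9999 and check the exponent range."""
    if m == 0:
        return (0, 0)
    if abs(m) >= 10 ** DIGITS:           # e.g. 1.0846E9 -> 0.1084E10
        m, e = chop(m, 1), e + 1
    while abs(m) < 10 ** (DIGITS - 1):   # e.g. 0.0005E7 -> 0.5000E4
        m, e = m * 10, e - 1
    if e > EXP_MAX:
        raise OverflowError("exponent above +99")
    if e < EXP_MIN:
        raise UnderflowError("exponent below -99")
    return (m, e)


def fp_add(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    (ma, ea), (mb, eb) = a, b
    if ma == 0:
        return b
    if mb == 0:
        return a
    if ea < eb:                          # make `a` the operand with the larger exponent
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    mb = chop(mb, ea - eb)               # align the smaller operand, chopping digits
    return normalize(ma + mb, ea)


def fp_sub(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    return fp_add(a, (-b[0], b[1]))


print(fp_add((8906, 7), (8761, 5)))   # (8993, 7), i.e. 0.8993E7
print(fp_sub((7678, 7), (7673, 7)))   # (5000, 4), i.e. 0.5000E4
```

Calling `fp_add((9896, 99), (9278, 98))` raises `OverflowError` and `fp_sub((8691, -99), (8678, -99))` raises `UnderflowError`, matching the last example of each section.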

MULTIPLICATION OPERATION

To multiply two numbers given in normalized floating point form, their mantissas are
multiplied and their exponents are added. After the mantissas are multiplied, the resulting
mantissa is normalized and the exponent is adjusted. Since each mantissa has a magnitude of
at least 0.1 and less than 1.0, the magnitude of the product of the mantissas is at least
0.01 and less than 1.0. Therefore, at most, the decimal point of the mantissa of the product
is shifted one position to the right, and the exponent is then decreased by 1. If the
exponent of the product becomes greater than +99, the result is an overflow; if it becomes
less than -99, the result is an underflow.

Example: Multiply 0.7891E8 by 0.7632E4.

Solution: 0.7891E8 x 0.7632E4 = (0.7891 x 0.7632) E (8+4) = 0.60224112E12

Since the computer retains a 4-digit mantissa, the product becomes 0.6022E12 after
truncating the mantissa to four digits.

Example: Multiply 0.1191E8 by 0.1232E-4

Solution: 0.1191E8 x 0.1232E-4 = (0.1191 x 0.1232) E (8 - 4) = 0.01467312E4

Since the first digit of the mantissa is zero, the number is normalized to 0.1467312E3. We
obtain 0.1467E3 after truncating the mantissa of the product to four digits.

Example: Multiply 0.3467E89 by 0.6789E45

Solution: 0.3467E89 x 0.6789E45 = (0.3467 x 0.6789) E (89 + 45) = 0.23537463E134

We obtain 0.2353E134 after truncating the mantissa of the product to four digits. Since
the exponent 134 exceeds +99, this number is greater than the largest number the computer
can store, so it is a case of overflow and the computer will indicate this error.

Example: Multiply 0.8965E-57 by 0.3218E-55

Solution: 0.8965E-57 x 0.3218E-55 = (0.8965 x 0.3218) E (-57-55) = 0.2884937E-112

We obtain 0.2884E-112 after truncating the mantissa of the product to four digits. Since
the exponent -112 is below -99, this number is smaller than the smallest number the computer
can store, so it is a case of underflow and the computer will indicate this error.

DIVISION OPERATION

To divide two numbers given in normalized floating point form, their mantissas are divided
and their exponents are subtracted. After the mantissas are divided, the resulting mantissa
is normalized and the exponent is suitably adjusted. Since each mantissa has a magnitude of
at least 0.1 and less than 1.0, the magnitude of the quotient of the mantissas is greater
than 0.1 and less than 10.0. Therefore, at most, the decimal point of the mantissa of the
quotient is shifted one position to the left, and the exponent of the quotient is then
increased by 1. If the exponent of the quotient becomes greater than +99, the result is an
overflow; if it becomes less than -99, the result is an underflow.

Example: Divide 0.7896E7 by 0.8532E3

Solution: 0.7896E7 / 0.8532E3 = (0.7896/0.8532) E (7 - 3) = 0.925457102E4

We obtain 0.9254E4 after truncating the mantissa of the quotient to four digits.

Example: Divide 0.9542E-18 by 0.8532E91

Solution: 0.9542E-18 / 0.8532E91 = (0.9542/0.8532) E (-18-91)

= 1.118377872E-109

The mantissa of the quotient is greater than 1.0, therefore the decimal point is
shifted one position to the left and the exponent is increased by 1. The result obtained is
0.1118377872E-108. We obtain 0.1118E-108 after truncating the mantissa of the quotient to
four digits. Since this number is smaller than the smallest number the computer can store,
it is a case of underflow and the computer will indicate this error.

Example: Divide 0.9643E21 by 0.7215E-91

Solution: 0.9643E21 / 0.7215E-91 = (0.9643/0.7215) E (21 - (-91)) = 1.336521137E112

The mantissa of the quotient is greater than 1.0, therefore the decimal point is
shifted one position to the left and the exponent is increased by 1. The result obtained is
0.1336521137E113. We obtain 0.1336E113 after truncating the mantissa of the quotient to
four digits. Since this number is larger than the largest number the computer can store,
it is a case of overflow and the computer will indicate this error.
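
The multiplication and division examples can be reproduced with a short sketch. As an assumption made for illustration, 0.dddd x 10^e is held as the pair `(dddd, e)`, so 0.7891E8 is `(7891, 8)`; the names `fp_mul`, `fp_div`, `check_range` and `UnderflowError` are hypothetical. Integer arithmetic is used so that the chopping is exact.

```python
EXP_MIN, EXP_MAX = -99, 99


class UnderflowError(ArithmeticError):
    """Raised when a non-zero result needs an exponent below EXP_MIN."""


def check_range(m: int, e: int) -> tuple[int, int]:
    if e > EXP_MAX:
        raise OverflowError("exponent above +99")
    if e < EXP_MIN:
        raise UnderflowError("exponent below -99")
    return (m, e)


def fp_mul(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    (ma, ea), (mb, eb) = a, b
    if ma == 0 or mb == 0:
        return (0, 0)
    sign = -1 if (ma < 0) != (mb < 0) else 1
    p = abs(ma) * abs(mb)        # 8-digit product: 0.pppppppp
    e = ea + eb
    if p < 10 ** 7:              # leading digit is zero: shift right once
        p, e = p * 10, e - 1
    return check_range(sign * (p // 10 ** 4), e)   # chop to four digits


def fp_div(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    (ma, ea), (mb, eb) = a, b
    if mb == 0:
        raise ZeroDivisionError("division by zero")
    if ma == 0:
        return (0, 0)
    sign = -1 if (ma < 0) != (mb < 0) else 1
    e = ea - eb
    if abs(ma) >= abs(mb):       # quotient of mantissas is >= 1.0: shift left once
        q, e = (abs(ma) * 10 ** 3) // abs(mb), e + 1
    else:
        q = (abs(ma) * 10 ** 4) // abs(mb)
    return check_range(sign * q, e)


print(fp_mul((7891, 8), (7632, 4)))    # (6022, 12), i.e. 0.6022E12
print(fp_mul((1191, 8), (1232, -4)))   # (1467, 3),  i.e. 0.1467E3
print(fp_div((7896, 7), (8532, 3)))    # (9254, 4),  i.e. 0.9254E4
```

The remaining examples raise `OverflowError` or `UnderflowError` from `check_range`, since their exponents fall outside -99 to +99.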

ERRORS IN ARITHMETIC

In integer arithmetic, while all arithmetic operations are exact, we might come across the
following two situations:

1. An operation may result in a large number that is beyond the range of the numbers that
the computer can handle.

2. An integer division may result in truncation of the remainder.

The floating point arithmetic system is prone to the following errors:

1. Error due to the inexact representation of a decimal number in binary form. E.g., consider
the decimal number 0.1. The binary equivalent of this number is 0.0001100110011... The
binary equivalent is a repeating fraction and therefore must be terminated at some
point.

2. Error due to rounding method used by the computer in order to limit the number of
significant digits.

3. Floating point subtraction may induce a special phenomenon. When two nearly equal
numbers are subtracted, the leading digits cancel and some mantissa positions in the
result are left without meaningful digits. This is known as subtractive cancellation. If
the operands themselves represent approximate values, the loss of significance is serious
since it greatly reduces the number of significant digits.

4. Overflow or underflow can occur in floating point operations when the result is outside
the limits of the floating point number system of the computer.
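
Errors 1 and 3 in the list above can be observed directly. The sketch below assumes Python's `float` is a binary double-precision number, which holds on common platforms; the helper `binary_fraction_digits` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction


def binary_fraction_digits(x: Fraction, count: int) -> str:
    """First `count` binary digits after the point of a fraction 0 <= x < 1."""
    digits = []
    for _ in range(count):
        x *= 2
        bit = int(x >= 1)
        digits.append(str(bit))
        x -= bit
    return "".join(digits)


# 1. Inexact representation: 0.1 has a repeating binary expansion.
tenth = Fraction(1, 10)
print("0." + binary_fraction_digits(tenth, 13))   # 0.0001100110011
stored = Fraction(0.1)            # exact value of the float nearest to 0.1
print(stored == tenth)            # False: the stored value is only close to 1/10

# 3. Subtractive cancellation: two nearly equal numbers are subtracted.
true_difference = 1e-15
computed = (1.0 + 1e-15) - 1.0
relative_error = abs(computed - true_difference) / true_difference
print(relative_error)             # roughly 0.1 rather than about 1e-16
```

Although each operand is accurate to about 16 significant digits, the computed difference is accurate to only about one digit.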

LAWS OF ARITHMETIC

Due to errors introduced in floating point arithmetic, the associative and distributive laws
of arithmetic are not always satisfied. That is, in general

(i) $x + (y + z) \ne (x + y) + z$
(ii) $x * (y * z) \ne (x * y) * z$
(iii) $x * (y + z) \ne (x * y) + (x * z)$

Although the failure of these laws affects relatively few computations, it can
be very critical on some occasions.
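
A small check of the three inequalities, using Python's `decimal` module with a precision of four digits and chopping (`ROUND_DOWN`) to mimic the four-digit hypothetical computer of these notes. The operand values are illustrative choices picked so that the two sides differ.

```python
from decimal import Context, Decimal, ROUND_DOWN

# Four significant digits, extra digits chopped after every operation.
ctx = Context(prec=4, rounding=ROUND_DOWN)
D = Decimal

# (i) addition is not associative
x, y, z = D("1000"), D("0.6"), D("0.6")
add_left = ctx.add(x, ctx.add(y, z))      # 1000 + 1.2  -> 1001
add_right = ctx.add(ctx.add(x, y), z)     # (1000) + 0.6 -> 1000
print(add_left, add_right)

# (ii) multiplication is not associative
x, y, z = D("7"), D("1.111"), D("1.111")
mul_left = ctx.multiply(x, ctx.multiply(y, z))    # 7 * 1.234     -> 8.638
mul_right = ctx.multiply(ctx.multiply(x, y), z)   # 7.777 * 1.111 -> 8.640
print(mul_left, mul_right)

# (iii) multiplication does not distribute over addition
x, y, z = D("3"), D("0.3333"), D("0.3334")
dist_left = ctx.multiply(x, ctx.add(y, z))                     # 3 * 0.6667 -> 2.000
dist_right = ctx.add(ctx.multiply(x, y), ctx.multiply(x, z))   # 0.9999 + 1.000 -> 1.999
print(dist_left, dist_right)
```

In each pair the two results differ in the last retained digit, purely because of where the chopping happens.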

EXACT AND APPROXIMATE NUMBERS

(i) Exact Numbers: Exact numbers are those with no uncertainty or approximation
associated with them. E.g. 5, 10, -6, 1/10, 1, 5, etc.

(ii) Approximate Numbers: Approximate numbers are those with some uncertainty
or approximation associated with them. E.g. $e$, $\sqrt{5}$, $\pi$, $\frac{1}{3}$, etc.
These numbers appear to be exact, but they cannot be expressed exactly with a finite number
of digits. They can be expressed as 2.7183, 2.236067, 3.1416, 0.3333… respectively. These
values are approximations to the true values and are called approximate numbers.
Hence an approximate number is defined as a number which approximates an exact
number.

TYPES AND SOURCES OF ERRORS

An error is defined as the difference between the exact value and the approximate value
obtained from computation or experimental observation.

Suppose X is the true value of a quantity and $X_a$ is the approximate value, then

Error = True Value - Approximate Value

$E = X - X_a$

In any numerical computation, we may come across the following types and sources
of errors:

Inherent Errors

These are errors already contained in the statement of a problem before its solution is
obtained. Such errors arise because the given data are approximate, because of the
limitations of mathematical tables, calculators or digital computers, or because of
inaccurate measurements or observations, which may be due to the limitations of the
measuring device. E.g. a screw gauge, Vernier caliper, weighing machine, etc. can measure a
quantity only up to its smallest permissible value.

Rounding Errors

These errors arise from rounding off a number during computations. If a number is to
be rounded off to n significant digits, then the following rules are observed.

1. Discard all digits to the right of the nth digit.

2. If the (n + 1)th digit is less than 5, the nth digit remains unaltered.

3. If the (n + 1)th digit is greater than 5, the nth digit is increased by one.

4. If the (n + 1)th digit is 5 and it is followed by at least one non-zero digit, the
nth digit is increased by one.

5. If the (n + 1)th digit is 5 alone or 5 followed only by zeros, the nth digit is left
unchanged if it is even.

6. If the (n + 1)th digit is 5 alone or 5 followed only by zeros, the nth digit is
increased by one if it is odd.
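
The rules above amount to "round half to even", which Python's `decimal` module provides as `ROUND_HALF_EVEN`. The sketch below applies it to n significant digits; the function name `round_sig` is an illustrative choice, and values are passed as strings so that no binary representation error enters.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_sig(value: str, n: int) -> Decimal:
    """Round a decimal string to n significant digits using the rules above."""
    d = Decimal(value)
    if d == 0:
        return d
    # Exponent of the last digit to keep: position of the leading digit minus (n - 1).
    last_kept = d.adjusted() - n + 1
    return d.quantize(Decimal(1).scaleb(last_kept), rounding=ROUND_HALF_EVEN)


print(round_sig("2.3449", 3))   # rule 2: 2.34
print(round_sig("2.3451", 3))   # rule 4: 2.35
print(round_sig("2.345", 3))    # rule 5: 2.34 (4 is even, left unchanged)
print(round_sig("2.355", 3))    # rule 6: 2.36 (5 is odd, increased by one)
```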

Analytic Errors

These are errors introduced by transforming a physical or mathematical problem into a
computational problem. E.g., since

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots$$

computing $\sin x$ from a finite number of terms of this formula leads to an error.
Similarly, the transformation of $e^x - x = 0$ into the equation

$$\left(1 + x + \frac{x^2}{2!} + \frac{x^3}{3!}\right) - x = 0$$

involves an analytic error.
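
The analytic error of the sine series can be measured by comparing a finite number of terms with the library value. This is a minimal sketch; the function name `sin_taylor` and the test point x = 0.5 are illustrative choices.

```python
import math


def sin_taylor(x: float, terms: int) -> float:
    """Sum the first `terms` terms of x - x^3/3! + x^5/5! - ..."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        # Next term: multiply by -x^2 / ((2k + 2)(2k + 3)).
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total


x = 0.5
for terms in (1, 2, 3, 4):
    error = abs(sin_taylor(x, terms) - math.sin(x))
    print(terms, error)
```

Each additional term reduces the error, but any finite number of terms leaves a non-zero analytic error.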

Truncation or Chopping off Errors

These errors are caused by leaving out the extra digits that are not required in a number,
without rounding off. The difference between a numerical value X and its truncated value
$X_T$ is called the truncation error. The following points must be taken into consideration
when truncating a numerical value:

(i) In truncation, the numerical value of a positive number is decreased and that of a
negative number is increased.

(ii) If we round off a large number of positive numbers to the same number of decimal
places, then the average error due to rounding off is zero.

(iii) If a large number of positive numbers are truncated to the same number of
decimal places, the average truncation error is one half of the place value of the last
retained digit.

(iv) If a number is rounded off and also truncated to the same number of decimal places,
then the magnitude of the truncation error is greater than or equal to that of the
round-off error.

(v) Round-off error may be positive or negative, but truncation error is always positive
for positive numbers and negative for negative numbers.

Note: The maximum error due to truncation of a number cannot exceed the place value of
the last retained digit in the number.
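
Points (ii) to (iv) and the note can be checked on a small data set. The sketch below is only an illustration under stated assumptions: it takes the 1000 values 0.000, 0.001, ..., 0.999, keeps two decimal places, and uses `ROUND_HALF_EVEN` as the rounding rule.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

step = Decimal("0.01")                                  # keep two decimal places
values = [Decimal(k) / 1000 for k in range(1000)]       # 0.000, 0.001, ..., 0.999

# Error = true value - approximate value, as defined earlier in the notes.
chop_errors = [v - v.quantize(step, rounding=ROUND_DOWN) for v in values]
round_errors = [v - v.quantize(step, rounding=ROUND_HALF_EVEN) for v in values]

mean_chop = sum(chop_errors) / len(values)
mean_round = sum(round_errors) / len(values)

print(mean_chop)          # close to half of 0.01
print(mean_round)         # zero: positive and negative errors cancel
print(max(chop_errors))   # below the place value 0.01 of the last retained digit
```

With only three decimal digits in the data the average chopping error is slightly below 0.005; it approaches one half of the place value as the data carry more digits.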

Exercise: Find the truncation error in the result of the following series for $x = \frac{1}{5}$
when we use (a) the first three terms, (b) the first four terms, (c) the first five terms:

$$1 + x + \frac{x^2}{2} + \frac{x^3}{3} + \frac{x^4}{4} + \cdots$$

Accumulated Error

In a sequence of computations, the error in one value may affect the computation of the
next value, so that the errors add up. This is called the accumulated error. The Relative
Accumulated Error is the ratio of the accumulated error to the exact value at that
iteration.

Note: It is observed that the relative accumulated error is the same for all the values.

Modelling Errors

Mathematical models are the basis for numerical solutions. They are formulated to
represent physical processes using certain parameters involved in the situations. A model
is an approximate representation of the real system under consideration. In many
situations, it is impossible to include all aspects of the real problem and therefore
certain simplifying assumptions are made.

Blunders

Blunders are errors that are caused by human imperfection. Such errors may seriously
damage the result. Since these errors are due to human mistakes, it should
be possible to avoid them to a large extent by acquiring a sound knowledge of all
aspects of the problem as well as of the numerical process.

MEASUREMENT OF ACCURACY

In numerical computation, round-off errors are difficult to estimate, so their effect on
the final result has to be reduced by following specific rules. Truncation errors, however,
can be easily estimated and can be reduced effectively. Thus in any case, we need some
measure of the accuracy of the results.

Absolute, Relative and Percentage Errors

If X is the True Value of a quantity and $X_a$ is its Approximate Value, then $|X - X_a|$ is
called the Absolute Error $E_a$. The Relative Error is defined as

$$E_r = \frac{|X - X_a|}{|X|}$$

and the Percentage Error is defined as

$$E_p = \frac{|X - X_a|}{|X|} \times 100$$

If Y is a number such that $|X - X_a| \le Y$, then Y is an upper limit on the magnitude of
the absolute error and measures the absolute accuracy.
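
The three measures translate directly into code. In this sketch the function names are illustrative, and the sample values are the quotient 0.925457102E4 from the division example together with its four-digit truncation 0.9254E4.

```python
def absolute_error(true_value: float, approx: float) -> float:
    return abs(true_value - approx)


def relative_error(true_value: float, approx: float) -> float:
    return absolute_error(true_value, approx) / abs(true_value)


def percentage_error(true_value: float, approx: float) -> float:
    return relative_error(true_value, approx) * 100


X, Xa = 9254.57102, 9254.0      # 0.925457102E4 and its truncation 0.9254E4
print(absolute_error(X, Xa))    # about 0.571
print(relative_error(X, Xa))    # about 6.2e-05
print(percentage_error(X, Xa))  # about 0.0062 (per cent)
```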

SOME IMPORTANT RULES ABOUT ERRORS

If an approximate number is written in normalized form with exponent n (i.e. as
$0.d_1 d_2 d_3 \ldots \times 10^{n}$) and is correct to k digits of the mantissa, then

1. Absolute Error due to truncation is: $|X - X_a| < 10^{n-k}$

2. Relative Error due to truncation is: $\left|\frac{X - X_a}{X}\right| < 10^{-k+1}$

3. Absolute Error due to rounding off is: $|X - X_a| < 0.5 \times 10^{n-k}$

4. Relative Error due to rounding off is: $\left|\frac{X - X_a}{X}\right| < 0.5 \times 10^{-k+1}$
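
As a worked check of rules 1 and 2, take the product from the multiplication example, $X = 0.60224112 \times 10^{12}$, truncated to $X_a = 0.6022 \times 10^{12}$. This reads the rules with n as the exponent of the normalized number (n = 12) and k as the number of mantissa digits retained (k = 4), which is an interpretation of the statement above.

```latex
\begin{aligned}
|X - X_a| &= 0.00004112 \times 10^{12} = 4.112 \times 10^{7} < 10^{\,n-k} = 10^{8},\\[4pt]
\left|\frac{X - X_a}{X}\right| &= \frac{4.112 \times 10^{7}}{6.0224112 \times 10^{11}}
  \approx 6.83 \times 10^{-5} < 10^{-k+1} = 10^{-3}.
\end{aligned}
```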

Exercise:

1. Round off the number 865230 to four significant figures and compute the absolute,
relative and percentage errors.

2. Let X = 0.0045895. Find the relative error if X is truncated to three decimal places.

3. The computed value of a problem is 7.896. The absolute error in the computed value is
less than 1%. Find the range within which the true value must lie.

ERROR PROPAGATION

Generally, the result of an experiment is obtained by doing mathematical operations on

several measurements. Obviously, the final error depends not only upon the errors in
individual measurements but also on the nature of mathematical operations. The following
are the results for combination of errors.

1. Error Propagation in Sum or Difference Operation:

Let there be two quantities A and B. If $\Delta A$ and $\Delta B$ are the corresponding
absolute errors in their measurement, then:

Measured value of A $= A \pm \Delta A$ and measured value of B $= B \pm \Delta B$

(i) Let Z denote the sum of A and B and $\Delta Z$ be the corresponding absolute error.
Clearly, $Z = A + B$ and $Z \pm \Delta Z = (A \pm \Delta A) + (B \pm \Delta B)$
$Z \pm \Delta Z = (A + B) \pm (\Delta A + \Delta B)$ or $\pm \Delta Z = \pm(\Delta A + \Delta B)$
Thus the maximum error in Z is $\Delta Z = \Delta A + \Delta B$,
i.e., the maximum error in the sum is the sum of the individual errors.

(ii) Let Z denote the difference of A and B; the corresponding $\Delta Z$ can be obtained as follows.
Clearly, $Z = A - B$ and $Z \pm \Delta Z = (A \pm \Delta A) - (B \pm \Delta B)$
$Z \pm \Delta Z = (A - B) \pm \Delta A \mp \Delta B$ or $\pm \Delta Z = \pm \Delta A \mp \Delta B$
Thus the maximum error in Z is $\Delta Z = \Delta A + \Delta B$,
i.e., the maximum error in the difference is again the sum of the individual errors.

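2. Error Propagation in a Product or Division Operation: Let there be two quantities A
and B, and let $\Delta A$ and $\Delta B$ be the corresponding absolute errors in their
measurements.

(i) Let Z denote the product of A and B and $\Delta Z$ be the corresponding absolute error.
Clearly, $Z = AB$ and $Z \pm \Delta Z = (A \pm \Delta A)(B \pm \Delta B)$, or

$$Z \pm \Delta Z = AB \pm B\,\Delta A \pm A\,\Delta B \pm \Delta A\,\Delta B$$

Dividing both sides by Z (= AB), we get

$$1 \pm \frac{\Delta Z}{Z} = 1 \pm \frac{B\,\Delta A}{AB} \pm \frac{A\,\Delta B}{AB} \pm \frac{\Delta A\,\Delta B}{AB}$$

Ignoring $\frac{\Delta A\,\Delta B}{AB}$, as it contains the product of two small quantities
$\Delta A$ and $\Delta B$,

$$1 \pm \frac{\Delta Z}{Z} = 1 \pm \frac{\Delta A}{A} \pm \frac{\Delta B}{B}$$

so that, taking the maximum,

$$\frac{\Delta Z}{Z} = \frac{\Delta A}{A} + \frac{\Delta B}{B}$$

where $\frac{\Delta Z}{Z}$, $\frac{\Delta A}{A}$ and $\frac{\Delta B}{B}$ are the relative errors in
the measurement of Z, A and B respectively.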

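(ii) Let Z denote the quotient of A and B and $\Delta Z$ be the corresponding absolute error.
Clearly, $Z = \frac{A}{B}$ and

$$Z \pm \Delta Z = \frac{A \pm \Delta A}{B \pm \Delta B} \quad \text{or} \quad Z \pm \Delta Z = \frac{A}{B}\left(1 \pm \frac{\Delta A}{A} \mp \frac{\Delta B}{B} \mp \frac{\Delta A}{A}\cdot\frac{\Delta B}{B}\right)$$

Dividing both sides by Z (= A/B) and ignoring $\frac{\Delta A\,\Delta B}{AB}$, as it contains the
product of two small quantities $\Delta A$ and $\Delta B$, we get

$$1 \pm \frac{\Delta Z}{Z} = 1 \pm \frac{\Delta A}{A} \mp \frac{\Delta B}{B}$$

$$\pm\frac{\Delta Z}{Z} = \pm\frac{\Delta A}{A} \mp \frac{\Delta B}{B} \quad \text{or, taking the maximum,} \quad \frac{\Delta Z}{Z} = \frac{\Delta A}{A} + \frac{\Delta B}{B}$$

where $\frac{\Delta Z}{Z}$, $\frac{\Delta A}{A}$ and $\frac{\Delta B}{B}$ are the relative errors in
the measurement of Z, A and B respectively.
Hence when two quantities are multiplied (or divided), the relative error in the product
(or quotient) is the sum of the relative errors in the quantities to be multiplied or divided.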
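
The results for sums, differences, products and quotients can be checked numerically by pushing both measurements to their worst-case limits. The values A = 10 ± 0.1 and B = 20 ± 0.2 are illustrative assumptions, not data from the notes.

```python
A, dA = 10.0, 0.1
B, dB = 20.0, 0.2

# Sum and difference: worst-case absolute error is dA + dB.
sum_error = ((A + dA) + (B + dB)) - (A + B)
diff_error = ((A + dA) - (B - dB)) - (A - B)

# Product and quotient: worst-case relative error is close to dA/A + dB/B.
first_order = dA / A + dB / B
prod_rel = ((A + dA) * (B + dB) - A * B) / (A * B)
quot_rel = ((A + dA) / (B - dB) - A / B) / (A / B)

print(sum_error, diff_error)           # both about 0.3 = dA + dB
print(first_order, prod_rel, quot_rel)
```

The product differs from the first-order estimate by exactly the ignored term $\Delta A\,\Delta B/(AB)$ = 0.0001, which illustrates why that term can be dropped when the errors are small.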

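3. Error Propagation due to the Power of a Measured Quantity

In general, let $Z = \frac{A^p B^q}{C^r}$, where A, B and C are measured quantities and
$\Delta A$, $\Delta B$, $\Delta C$ are the absolute errors in their measurement. From the above
discussion, relative errors are always added when quantities are multiplied or divided, i.e.

$$\frac{\Delta Z}{Z} = \left[\frac{\Delta A}{A} + \frac{\Delta A}{A} + \ldots \text{ to } p \text{ terms}\right] + \left[\frac{\Delta B}{B} + \frac{\Delta B}{B} + \ldots \text{ to } q \text{ terms}\right] + \left[\frac{\Delta C}{C} + \frac{\Delta C}{C} + \ldots \text{ to } r \text{ terms}\right]$$

$$\frac{\Delta Z}{Z} = p\,\frac{\Delta A}{A} + q\,\frac{\Delta B}{B} + r\,\frac{\Delta C}{C}$$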

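Example: A physical quantity P is related to four observables a, b, c, and d as follows:
$P = \frac{a^3 b^2}{(cd)^{1/2}}$. The percentage errors of measurement in a, b, c and d are 1%,
3%, 4% and 2% respectively. What is the percentage error in the quantity P? If the value of
P calculated using the above relation turns out to be 3.763, to what value of P should you
round off the result?

Solution: Given that $P = \frac{a^3 b^2}{(cd)^{1/2}}$ and applying the formula for the
combination of errors,

$$\frac{\Delta Z}{Z} = p\,\frac{\Delta A}{A} + q\,\frac{\Delta B}{B} + r\,\frac{\Delta C}{C}$$

we get

$$\frac{\Delta P}{P} = 3\,\frac{\Delta a}{a} + 2\,\frac{\Delta b}{b} + \frac{1}{2}\,\frac{\Delta c}{c} + \frac{1}{2}\,\frac{\Delta d}{d}$$

$$\frac{\Delta P}{P} = 3 \cdot \frac{1}{100} + 2 \cdot \frac{3}{100} + \frac{1}{2} \cdot \frac{4}{100} + \frac{1}{2} \cdot \frac{2}{100} = \frac{12}{100}$$

$$\%\ \text{error in } P = \frac{\Delta P}{P} \times 100 = \frac{12}{100} \times 100 = 12\%$$

Now if P = 3.763, then $\Delta P = P \times \frac{12}{100} = 3.763 \times 0.12 = 0.45156$.
Since the error already affects the first decimal place, P should be rounded off to 3.8.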

Example: The following observations were made during an experiment to find the value of g
using a simple pendulum: l = 90.0 cm, and the time (t) for 20 vibrations is 36.0 s. Find the
percentage error in the measurement of g, given that the time period of the pendulum is
$T = 2\pi\sqrt{\frac{l}{g}}$. Length is measured to an accuracy of 0.1 cm and time to 0.2 s.

Solution: Given that $T = 2\pi\sqrt{\frac{l}{g}}$, we have $g = \frac{4\pi^2 l}{T^2}$ or
$g = \frac{4\pi^2 l}{(t/20)^2}$.

$$\frac{\Delta g}{g} = \frac{\Delta l}{l} + 2\,\frac{\Delta t}{t} = \frac{0.1}{90} + 2 \times \frac{0.2}{36.0} = 0.01222$$

Percentage error $= 0.01222 \times 100\% = 1.222\%$
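
The power rule used in the two examples above can be written as a single helper. The function name and the `(relative_error, exponent)` pair format are illustrative choices; the helper takes the magnitude of each exponent, since relative errors add whether a quantity multiplies or divides.

```python
def relative_error_of_power_product(factors):
    """Maximum relative error of a product of powers.

    `factors` is an iterable of (relative_error, exponent) pairs, one per
    measured quantity; quantities in the denominator get a negative exponent.
    """
    return sum(abs(exponent) * rel for rel, exponent in factors)


# P = a^3 b^2 / (c d)^(1/2) with errors of 1%, 3%, 4% and 2%
p_rel = relative_error_of_power_product(
    [(0.01, 3), (0.03, 2), (0.04, -0.5), (0.02, -0.5)]
)
print(p_rel * 100)          # about 12 (per cent)
print(3.763 * p_rel)        # about 0.45, the absolute error in P

# g = 4 pi^2 l / (t/20)^2 with l = 90.0 +/- 0.1 cm and t = 36.0 +/- 0.2 s
g_rel = relative_error_of_power_product([(0.1 / 90.0, 1), (0.2 / 36.0, -2)])
print(g_rel * 100)          # about 1.222 (per cent)
```

The constant factors $4\pi^2$ and 20 carry no measurement error, so they do not appear in the list of factors.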


Page 18 of 18

