Numerical Methods For Engineers ch1
PYTHON NUMPY
NumPy
• Throughout this course, Python will be used as the programming language. Sample codes will be given in Python (MATLAB is an alternative).
• Python is an object-based programming language. Its syntax is plain and reads much like a pseudocode description of the process.
• NumPy is a library written for numerical calculations with array-based operations.
• SciPy is another library for scientific computing, which focuses on algorithms.
For NumPy examples you can follow CH4 of:
"Python for Data Analysis" by William Wesley McKinney (O'Reilly), 2012
For computational aspects you can follow CH1 of:
Svein Linge, Hans Petter Langtangen, "Programming for Computations – Python: A Gentle Introduction to Numerical Simulations with Python 3.6", Second Edition, Springer Open (2020)
For a fast start to Python
• To run the supplied codes you will need a Python installation. You can download and install Spyder (version newer than 5.0) or the Anaconda development environment.
Python code structure
• Consider a mathematical model derived from Newton's second law:
$y = v_0 t - \frac{1}{2} g t^2$
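The code shown on the original slide is not recoverable from the text, so the following is only a minimal sketch of how the formula could be written in Python. The numerical values of `v0` and `t` are assumptions chosen for illustration.

```python
# Vertical position of an object thrown upward: y = v0*t - (1/2)*g*t**2
v0 = 5.0    # initial velocity in m/s (assumed value)
g = 9.81    # gravitational acceleration in m/s^2
t = 0.6     # time in s (assumed value)

y = v0 * t - 0.5 * g * t**2
print(f"y({t}) = {y:.4f} m")
```

Each line is a plain assignment followed by one arithmetic expression, which is why the code reads almost like the formula itself.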
[Code figure: the example code annotated to show an imported library and a call to a library function]
Python NumPy
• NumPy, short for Numerical Python, is the fundamental
package required for high performance scientific computing
and data analysis.
Python NumPy:
The NumPy ndarray, Multidimensional Array Object
$arr2 = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \end{bmatrix}_{2 \times 4}, \qquad \dim(arr2) = 2$
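As a short sketch, the 2×4 array above can be created and inspected as follows. The variable name `arr2` follows the slide; the attributes `ndim`, `shape` and `dtype` are standard ndarray attributes.

```python
import numpy as np

# Build the 2x4 array from a nested list
arr2 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8]])

print(arr2.ndim)   # number of dimensions
print(arr2.shape)  # (rows, columns)
print(arr2.dtype)  # element type inferred by NumPy
```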
$arr2^{T} \times arr2$
$arr2 \cdot arr2$
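The slide's code is lost, so the mapping below is an assumption: `×` is read as the matrix product and `⋅` as the element-wise product. Under that reading, a minimal sketch is:

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8]])

# Matrix product: (4x2) @ (2x4) gives a 4x4 matrix
gram = arr2.T @ arr2

# Element-wise product: every entry is multiplied by itself
elementwise = arr2 * arr2

print(gram.shape)
print(elementwise)
```

Note that `arr2 @ arr2` would fail, because a 2×4 matrix cannot be multiplied by another 2×4 matrix. The transpose is what makes the shapes compatible.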
Initial Placeholders:
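The placeholder examples on the slides are not recoverable. The sketch below lists some common NumPy array-creation routines that serve as initial placeholders; which ones the slides actually showed is an assumption.

```python
import numpy as np

zeros = np.zeros((2, 3))          # 2x3 array filled with 0.0
ones = np.ones((2, 3))            # 2x3 array filled with 1.0
ident = np.eye(3)                 # 3x3 identity matrix
filled = np.full((2, 2), 7.0)     # 2x2 array filled with a constant
steps = np.arange(0, 10, 2)       # start, stop (exclusive), step
grid = np.linspace(0.0, 1.0, 5)   # 5 evenly spaced points, endpoints included

print(steps)
print(grid)
```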
$H_p = B \times H \times B^{-1}$
$H_p^{-1} \times H_p \rightarrow I$
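A minimal sketch of these two relations with `np.linalg.inv` is given below. The entries of `H` and `B` are assumed values, chosen only so that both matrices are invertible.

```python
import numpy as np

H = np.array([[2.0, 1.0],
              [1.0, 3.0]])   # assumed example matrix
B = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # assumed invertible matrix

# H_p = B H B^{-1}
Hp = B @ H @ np.linalg.inv(B)

# H_p^{-1} H_p should reproduce the identity up to round-off
check = np.linalg.inv(Hp) @ Hp
print(np.allclose(check, np.eye(2)))
```

The comparison uses `np.allclose` rather than `==` because the product is the identity only up to floating point round-off.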
Indexing of a 2D NumPy array:
Indexing and Slicing Operations:
• select the 3rd row
• Transposition
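The operations named above can be sketched as follows. The 3×4 array `arr` is a hypothetical example, not the array used on the slides.

```python
import numpy as np

# Hypothetical 3x4 example array with entries 1..12
arr = np.arange(1, 13).reshape(3, 4)

element = arr[1, 2]      # row index 1, column index 2 (zero-based)
third_row = arr[2]       # select the 3rd row
first_col = arr[:, 0]    # all rows, first column
block = arr[:2, 1:3]     # first two rows, columns 1 and 2
transposed = arr.T       # transposition: shape (4, 3)

print(third_row)
print(transposed.shape)
```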
EVALUATING A POLYNOMIAL
• Polynomials are a basic form for representing functions.
• A real polynomial is a real-valued function with real coefficients that is built using only certain algebraic operations: addition, multiplication, and exponentiation to a non-negative integer exponent.
$P(x) = 2x^4 + 3x^2 + x - 1$
• How do you calculate $P(0.1)$?
$P(x) = 2x^4 + 3x^2 + x - 1$
$P(0.1) = 2 \cdot 0.1 \cdot 0.1 \cdot 0.1 \cdot 0.1 + 3 \cdot 0.1 \cdot 0.1 + 0.1 - 1 = -\frac{4349}{5000}$
• First calculate the powers of 0.1
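As a sketch of the "powers first" approach, the evaluation can be done exactly with rational arithmetic and then repeated with ordinary floats. Using `fractions.Fraction` is a choice made here for illustration, not something taken from the slides.

```python
from fractions import Fraction

# Exact evaluation: compute the powers of x first, then combine them
x = Fraction(1, 10)
x2 = x * x
x4 = x2 * x2
P_exact = 2 * x4 + 3 * x2 + x - 1
print(P_exact)

# The same evaluation in floating point
xf = 0.1
P_float = 2 * xf**4 + 3 * xf**2 + xf - 1
print(P_float)
```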
• Nested form (Horner's)
$P(x) = 2x^4 + 3x^2 + x - 1$
$P(x) = (2x^3 + 3x + 1)x - 1$
$P(x) = ((2x^2 + 3)x + 1)x - 1$
$P(x) = (((2x + 0)x + 3)x + 1)x - 1$
• Then a polynomial of degree $d$ in the form
$P(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_d x^d$
can be represented in nested form as
$P(x) = c_0 + x(c_1 + x(c_2 + \cdots + x(c_{d-1} + c_d x)\cdots))$
A general degree $d$ (nested) polynomial can be evaluated in $d$ multiplications and $d$ additions.
$P(x) = c_0 + c_1 x + c_2 x^2 + \cdots + c_d x^d$
p ← c_d
do for j ← (d−1, 0, −1)
    p ← p × x + c_j
end do
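The loop above can be sketched in Python as follows. The function name `horner` and the coefficient ordering (lowest degree first) are choices made for this example.

```python
from fractions import Fraction

def horner(c, x):
    """Evaluate c[0] + c[1]*x + ... + c[d]*x**d in nested (Horner) form."""
    p = c[-1]                      # start from the leading coefficient c_d
    for cj in reversed(c[:-1]):    # j = d-1, ..., 0
        p = p * x + cj             # one multiplication and one addition per step
    return p

coeffs = [-1, 1, 3, 0, 2]          # P(x) = 2x^4 + 3x^2 + x - 1
print(horner(coeffs, Fraction(1, 10)))
print(horner(coeffs, 0.1))
```

The zero coefficient of $x^3$ must be kept in the list, otherwise the remaining coefficients shift to the wrong powers.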
• Polynomials are constructed by choosing the coefficients from (preferably) a real set, $c_k \in \mathbb{R}$, and variable representations with integer exponents from a basis set, (preferably) the monomial basis.
Coefficients: $c_0, c_1, c_2, \ldots, c_{k-1}, c_k$
Monomial basis: $1, x, x^2, \ldots, x^N$
$P(x) = \sum_{k=0}^{N} c_k x^k$
• If all $c_k$'s are obtained by algebraic operations ($\times, \div, +, -, \sqrt[k]{\cdot}, x^k$), then $P(x)$ is an algebraic polynomial. Some well-known real numbers cannot be obtained by algebraic operations.
• $\pi$ is an irrational number that cannot be obtained by these operations. It is defined not by algebraic but by geometrical relationships.
• $e$ is an irrational number that requires an infinite sum to be calculated. Numbers such as $\pi$ and $e$ are called transcendental constants.
• $\sqrt{2} = 1.414213562\ldots$ is an irrational number with infinitely many non-repeating digits after the decimal point. However, some irrational numbers, such as $\sqrt{2}$, can be obtained by algebraic operations. On the other hand, they cannot be represented by any rational number (they are not the ratio of two integers in irreducible form).
• Think about the computation of the polynomial. Which stages should be completed in order to store the result in memory?
Step  Operations
1     Store c (all elements are int)
2     Store x and p (initial values)
3     Start loop over j
4     Update p
$2^{32} - 1 = 4294967295$ (largest unsigned 32-bit integer)
$\pm(2^{31} - 1) = \pm 2147483647$ (with sign bit)
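These limits can be checked with `np.iinfo`, as sketched below. One caveat that is not on the slide: NumPy's signed integers use two's complement, so the most negative `int32` is $-2^{31}$ rather than $-(2^{31}-1)$.

```python
import numpy as np

print(np.iinfo(np.uint32).max)   # 2**32 - 1
print(np.iinfo(np.int32).max)    # 2**31 - 1
print(np.iinfo(np.int32).min)    # -2**31 (two's complement)

# Plain Python integers grow as needed, so they do not overflow
big = 2**64 + 1
print(big)
```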
Example
What are the smallest and largest numbers that can be stored in a $k = 5$ bit integer definition?
Bit weights: ± | $2^3$ | $2^2$ | $2^1$ | $2^0$
0 1 1 1 1 → $+(1 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 1 \times 2^0) = +15$
1 1 1 1 1 → $-(1 \times 2^3 + 1 \times 2^2 + 1 \times 2^1 + 1 \times 2^0) = -15$
$\pm(2^{k-1} - 1) = \pm 15$
• Result: We can store integers with arbitrary precision by increasing the memory field assigned to a single integer.
Step  Operations
1     Store c (all elements are int)
2     Store x and p (initial values)
3     Start loop over j
4     Update p
This polynomial can be stored in memory without any loss of precision (numerically exact representation) for integer x. We will not lose any significant bits.
• Think about
$P(x) = 1 + 5x + 0.2x^2 + \pi x^3$
0 0 0 0 1 . 0 0 0 1 0
• since $c_2 = \frac{1}{5} = 0.2$
𝜋 = 3.14159 26535 89793 23846 26433 83279 50288 41971 69399 37510 58209 74944 59230 78164 06286 20899 86280 34825 34211 70679 …
[Figure: the real line $x$ with the points 3 and $\pi$ marked]
• Storing $\pi$ and $\sqrt{5}$ requires much more attention.
• Real numbers fill the space in an uncountable manner.
real → count
3 → $q_0$
3.1 → $q_1$
3.1415 → $q_4$
⋮
3.1415…2643… → $q_\infty$
[Figure: the real line between 3 and $\pi$, with 3.1 marked]
• There is no direct way to store irrational numbers in memory.
FLOATING POINT REPRESENTATION OF REAL NUMBERS
Floating point representation of real numbers
• Scientific notation and its binary version
• We use powers of 10 in the expansion, which is indicated by the notation $(\cdot)_{10}$. The expansion below is called positional notation:
$(71)_{10} = 7 \times 10^1 + 1 \times 10^0$
• To sum up such numbers, we shift the decimal point to the correct position.
• Base 60 of the Babylonians
The notation rules are similar for other base systems (e.g. the Babylonians, 2000 BC):
$\sqrt{2} \approx 1 + \frac{24}{60} + \frac{51}{60^2} + \frac{10}{60^3} = 1.41421296 = (1.\,24\ 51\ 10)_{60}$
Source: https://en.wikipedia.org/wiki/Babylonian_cuneiform_numerals#/media/File:Babylonian_numerals.svg
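The base-60 expansion can be checked with exact rational arithmetic, as in the short sketch below. The list `digits` holds the three sexagesimal digits after the point.

```python
from fractions import Fraction
import math

digits = [24, 51, 10]            # sexagesimal digits after the point
value = Fraction(1)              # integer part
for i, d in enumerate(digits, start=1):
    value += Fraction(d, 60**i)  # positional weight 60^-i

print(value, float(value))
print(abs(float(value) - math.sqrt(2)))   # distance to sqrt(2)
```

With only three base-60 digits the approximation already agrees with $\sqrt{2}$ to about six decimal places.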
• Positional notation and its binary version
For computer systems, binary representations are crucial, because the electronic design of arithmetic units makes algebraic operations easy to implement in base 2. Real numbers are therefore stored as binary floating point numbers in positional notation.
• Binary representation of floating point numbers
• Some decimal fractions do not have a terminating binary representation. In these cases the fractional part has repeating (non-terminating) binary digits.
Fractional part of 0.1, multiplied by 2 repeatedly:
num × 2   fraction  int
0.1 × 2   0.2       0
0.2 × 2   0.4       0
0.4 × 2   0.8       0
0.8 × 2   0.6       1
0.6 × 2   0.2       1
0.2 × 2   0.4       0
0.4 × 2   0.8       0
0.8 × 2   0.6       1
0.6 × 2   0.2       1
⋮         ⋮         ⋮
$(0.1)_{10} = (0.0\overline{0011})_2$
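The repeated-doubling procedure in the table can be sketched as a small function. The name `fraction_bits` is hypothetical, and `Fraction` is used so that the doubling itself introduces no rounding.

```python
from fractions import Fraction

def fraction_bits(x, nbits):
    """Return the first nbits binary digits of 0 <= x < 1 by repeated doubling."""
    bits = []
    for _ in range(nbits):
        x *= 2
        bit = int(x)          # the integer part is the next binary digit
        bits.append(str(bit))
        x -= bit              # keep only the fractional part
    return "".join(bits)

print(fraction_bits(Fraction(1, 10), 12))
```

The same function applies to any fraction in $[0, 1)$. A terminating case such as `Fraction(1, 2)` produces a single 1 followed by zeros.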
Example
Find the first 4 bits in the binary representation of $\pi = 3.14159265\ldots$
Then $\pi \approx (11.00)_2$
Example
Find the binary representation of $\left(\frac{3}{7}\right)_{10}$. Use algebraic operations on this rational number.
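The worked solution for 3/7 is not recoverable from the slides. One possible algebraic route, offered here only as a sketch, writes the denominator as $2^3 - 1$ and expands a geometric series:

```latex
\frac{3}{7} = \frac{3}{2^{3}-1}
            = \frac{3 \cdot 2^{-3}}{1 - 2^{-3}}
            = 3 \sum_{j=1}^{\infty} 2^{-3j}
            = (0.011\,011\,011\ldots)_2
            = (0.\overline{011})_2
```

Each term $3 \cdot 2^{-3j}$ places the 3-bit pattern $(011)_2 = 3$ in the $j$-th block of three bits after the binary point, which gives the repeating expansion.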
IEEE-754
Floating point representation in IEEE-754
• Before the 1980s, every company or facility worked with its own floating point (FP) descriptions, which caused portability problems between systems.
• At the end of the 1970s, a committee formed by the IEEE defined a single set of procedures for handling FP numbers in computer systems. This process was largely motivated by the electronics industry.
• The model we will describe here is the so-called IEEE-754 Floating Point Standard (IEEE: Institute of Electrical and Electronics Engineers).
• In exponential notation a real number $x$ is expressed as
$x = \pm S \times 10^E$ (decimal)
$x = \pm S \times 2^E$ (binary)
$E$ is an integer chosen so that $S$ satisfies $1 \le S < 2$. The first bit of $S$ will then always be 1, since $1 \le S$. The exponent $E$ should be varied to satisfy this constraint.
Example
Represent $(0.01011)_2$ in base 10.
Example
Normalize $x = (0.00101)_2$
$x = (0.00101)_2 = (1.01)_2 \times 2^{-3}$
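Both examples can be checked numerically. The sketch below converts the bit strings to decimal and normalizes with `math.frexp`. The helper name `normalize` is hypothetical, and it rescales the result of `frexp` because that function returns a mantissa in $[0.5, 1)$ rather than $[1, 2)$.

```python
import math

def normalize(x):
    """Return (S, E) with x = S * 2**E and 1 <= S < 2, for x > 0."""
    m, e = math.frexp(x)      # x = m * 2**e with 0.5 <= m < 1
    return 2 * m, e - 1       # shift one bit so that 1 <= S < 2

x1 = int("01011", 2) / 2**5   # (0.01011)_2 as a decimal number
x2 = int("00101", 2) / 2**5   # (0.00101)_2 as a decimal number

print(x1)
print(normalize(x2))          # S = 1.25 = (1.01)_2
```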
• In a computer, to store normalized numbers in memory, the word dedicated to a real number is divided into three sections: sign, exponent, and mantissa. In the 32-bit floating point description these are 1 sign bit, 8 exponent bits, and 23 mantissa bits.
Example
Find the normalized binary expression of $x = (71)_{10}$
$x = (1.000111)_2 \times 2^6$
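To see the three sections for this example, the sketch below unpacks the 32-bit pattern of 71.0 with the standard `struct` module. The helper name `float32_fields` is hypothetical. The stored exponent carries the single-precision bias of 127, which is subtracted to recover $E$.

```python
import struct

def float32_fields(x):
    """Split x, stored in IEEE-754 single precision, into bit strings."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{word:032b}"
    return bits[0], bits[1:9], bits[9:]    # sign, exponent, mantissa

sign, exponent, mantissa = float32_fields(71.0)
print(sign, exponent, mantissa)
print(int(exponent, 2) - 127)              # unbiased exponent E
```

The leading 1 of $(1.000111)_2$ does not appear in the mantissa field. Only the bits after the binary point are stored.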
• A Toy System:
Example
If the binary allocation of $x$ in memory is
0 1 1 0 1 0 0 1
Example
Find the maximum value of $x$ in base 10 that can be represented by our Toy 8-bit FP system.
0 0 1 1 1 1 1 1
$(x)_{10} = (1.111)_2 \times 2^{(111)_2} = (1.111)_2 \times 2^7 = (1111)_2 \times 2^4 = 15 \times 2^4 = 240$
Example
Find the minimum value of $x > 0$ in base 10 that can be represented by the Toy 8-bit FP system.
0 1 1 1 1 0 0 0
No positive real number smaller than $2^{-7}$ can be represented by the Toy (8-bit) system.
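The definition of the toy format is not recoverable from the text. The sketch below assumes a layout inferred from the two worked examples: one sign bit, one exponent-sign bit, three exponent bits, and three mantissa bits with an implicit leading 1. The function name `decode_toy` is hypothetical.

```python
def decode_toy(bits):
    """Decode an 8-bit toy float given as a string of '0'/'1' characters.

    Assumed layout (inferred, not stated in the source):
    [sign][exponent sign][3 exponent bits][3 mantissa bits]
    """
    if len(bits) != 8:
        raise ValueError("expected exactly 8 bits")
    sign = -1 if bits[0] == "1" else 1
    exp_sign = -1 if bits[1] == "1" else 1
    exponent = exp_sign * int(bits[2:5], 2)
    significand = 1 + int(bits[5:], 2) / 8    # implicit leading 1
    return sign * significand * 2.0**exponent

print(decode_toy("00111111"))   # largest value
print(decode_toy("01111000"))   # smallest positive value
print(decode_toy("01101001"))   # pattern from the first example
```

Under this assumed layout the largest and smallest positive values agree with the results $240$ and $2^{-7}$ above.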
• We have seen that the exponential description of an FP number can be obtained by finding normalized expressions for $S$ and $E$. This information reproduces FP numbers, but with errors.
[Figure: number line from 0 to 1.0 with marks at 0.004 and 0.008, labeled "biased definition of 0"]
[Figure: number line from 0 to 0.008 with 0.004 marked, showing rounding and the error $\varepsilon$]
• Although the number of real numbers between two reals $p$ and $q$ is not finite, a binary expression can point to only a finite number of them.
• Any number beyond the capacity of the binary expression must be rounded to a nearby value.
APPROXIMATIONS AND ROUND-OFF ERRORS
• To remedy quantizing errors for binary numbers:
$x = 0\ 0001\ 0011110010010100$
$S = \frac{1}{2^3} + \frac{1}{2^4} + \frac{1}{2^5} + \frac{1}{2^6} + \frac{1}{2^9} + \frac{1}{2^{12}} + \frac{1}{2^{14}} = 0.23663330078125$
$S_{round} = \frac{1}{2^2} = 0.25$
$S_{chop} = \frac{1}{2^3} = 0.125$
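The values of $S$, $S_{round}$ and $S_{chop}$ can be reproduced with the sketch below. It assumes that three fractional bits are kept, which matches the two results on the slide.

```python
import math

bits = "0011110010010100"       # fractional bits of S from the slide
S = sum(int(b) * 2.0**-i for i, b in enumerate(bits, start=1))

n = 3                           # number of bits kept (assumed)
S_chop = math.floor(S * 2**n) / 2**n          # drop everything after bit n
S_round = math.floor(S * 2**n + 0.5) / 2**n   # round to the nearest multiple of 2^-n

print(S, S_chop, S_round)
```

`math.floor(... + 0.5)` is used instead of the built-in `round`, because `round` resolves ties toward the even integer.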
• Machine epsilon ($\epsilon$) is a measure of the precision of an FP number system.
• $\epsilon$ is the distance between 1 and the next larger floating point number.
$\epsilon = \frac{1}{2^3}$
[Figure: number line showing 1.0 and $1 + \epsilon$]
• What is the measure of the precision of a 32-bit FP number with 1 sign bit, 8 exponent bits, and a 23-bit mantissa?
• Since the mantissa has $m = 23$ bits, epsilon equals the contribution of the last mantissa bit: $\epsilon = 2^{-m} = 2^{-23} \approx 10^{-7}$.
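Machine epsilon can be found experimentally by halving a candidate until adding half of it to 1 no longer changes the result. The sketch below does this for Python floats (64-bit, $m = 52$) and compares with the values reported by `sys.float_info` and `np.finfo`.

```python
import sys
import numpy as np

# Halve eps until 1 + eps/2 is no longer distinguishable from 1
eps = 1.0
while 1.0 + eps / 2 > 1.0:
    eps /= 2

print(eps, sys.float_info.epsilon)        # 64-bit: 2**-52
print(float(np.finfo(np.float32).eps))    # 32-bit: 2**-23
```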
• 64-bit Floating Point Arithmetic in Computer
Example
In computer arithmetic, the real number $x$ is replaced by $fl_{round}(x)$ in memory. What is $fl_{round}(2.4)$? As discussed above, 2.4 has integer and fractional parts:
$2 = (10.0)_2$
$0.4 = (0.\overline{0110})_2$
$2.4 = (10.\overline{0110})_2 = (1.0\overline{0110})_2 \times 2^1$
+1. 00 1100 1100 1100 1100 1100 1100 1100 1100 1100 1100 1100 1100 11 | 00 1100 … × 2^1
(the bar marks the end of the 52-bit mantissa; the first bit after it is the rounding bit)
The rounding bit is zero, so nothing is added to the 52nd position. The error is the discarded tail, $(0.\overline{0011})_2 \times 2^{-52} \times 2^1 = 0.2 \times 2^{-51}$.
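The stored value of 2.4 can be inspected exactly, since `Fraction(float)` converts a double to its exact rational value. The sketch below compares it with the exact number 12/5.

```python
from fractions import Fraction

stored = Fraction(2.4)       # exact value of the double nearest to 2.4
exact = Fraction(12, 5)      # the real number 2.4
error = exact - stored       # positive, because the tail was chopped off

print((2.4).hex())           # hexadecimal mantissa and binary exponent
print(float(error))
```

The hexadecimal mantissa shows the repeating pattern 0011 as the digit 3, repeated 13 times to fill the 52-bit mantissa.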
64-bit Floating Point Rounding to the Nearest Rule for IEEE-754:
For double precision, if the 53rd bit to the right of the binary point is 0, round down (apply chopping after the 52nd bit).
1.00 … 011 | 0111 … (the 52nd bit is just left of the bar, the 53rd bit just right of it)
If the 53rd bit is 1, round up (add 1 to the 52nd bit), unless all bits to the right of the 53rd bit are 0, in which case add 1 to the 52nd bit if and only if the 52nd bit is 1.
Example
For which positive integers $k$ can the number $5 + 2^{-k}$ be represented exactly in 64 bits (with no rounding error)?
$5 = (101.0)_2 = (1.01)_2 \times 2^2$ (normalized)
$5 + 2^{-k} = (1.\underbrace{01\,0 \ldots 0}_{k+1}\,1)_2 \times 2^2$
The final 1 occupies mantissa position $(k+1)+1$, which must not exceed 52:
$(k + 1) + 1 = 52 \rightarrow k = 50$
so every $k \le 50$ works, and $k = 50$ is the largest such value.
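This result can be checked by comparing the stored double with the exact rational value, as in the sketch below. The helper name `is_exact` is hypothetical.

```python
from fractions import Fraction

def is_exact(k):
    """True if 5 + 2**-k is stored in double precision without rounding error."""
    stored = Fraction(5 + 2.0**-k)             # value actually held in memory
    exact = Fraction(5) + Fraction(1, 2**k)    # true mathematical value
    return stored == exact

largest = max(k for k in range(1, 100) if is_exact(k))
print(largest)
```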
Example
Find the largest integer $k$ for which $fl_{round}(19 + 2^{-k}) > fl_{round}(19)$ in double precision format.
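The slides do not show the solution, so the sketch below simply searches for the answer numerically, relying on Python floats being IEEE-754 doubles with round-to-nearest addition.

```python
# Search for the largest k with fl(19 + 2^-k) > fl(19)
largest_k = max(k for k in range(1, 100) if 19 + 2.0**-k > 19)
print(largest_k)

# One step further falls exactly halfway between two doubles
print(19 + 2.0**-(largest_k + 1) == 19)
```

Since $19 = (1.0011)_2 \times 2^4$, the spacing of doubles near 19 is $2^{4-52} = 2^{-48}$. Adding $2^{-49}$ produces a tie, which round-to-nearest resolves toward the value whose 52nd mantissa bit is 0, that is, back to 19.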
Problem
Find the integer $k$ which maximizes the value of $fl_{chop}(2^k + 2^{-k})$ for a $q$-bit mantissa in normalized format.
$x = (1.b_{-1} b_{-2} b_{-3} \ldots b_{-q})_2 \times 2^E$
MEASURING ERRORS
Floating point Errors in IEEE-754
• Let $\hat{x}$ be a computed quantity of $x$. Then,
True Relative Error: $\varepsilon_{T,rel} = \frac{\hat{x} - x}{x}$
if the quantity exists. If the exact quantity $x$ is not known, then, as an approximation in iterative algorithms, the current estimate can be used for the ratio:
Approximate Relative Error: $\hat{\varepsilon}_{rel} = \frac{\hat{x}_{current} - \hat{x}_{prev}}{\hat{x}_{current}}$
• Let $fl(x)$ be the IEEE machine arithmetic model of $x$. Then the relative rounding error of $x$ is no more than one-half machine epsilon:
Relative Rounding Error: $\frac{|fl_{round}(x) - x|}{|x|} \le \frac{1}{2}\epsilon_{mach}$
Relative Chopping Error: $\frac{|fl_{chop}(x) - x|}{|x|} \le \epsilon_{mach}$
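The rounding bound can be checked exactly for a few rational numbers, as sketched below. The helper name `relative_rounding_error` and the sample values are choices made for this example.

```python
from fractions import Fraction
import sys

eps_mach = Fraction(sys.float_info.epsilon)    # 2**-52 for doubles

def relative_rounding_error(num, den):
    """Exact relative error made when num/den is stored as a double."""
    x = Fraction(num, den)           # true value
    fl_x = Fraction(num / den)       # stored value (round to nearest)
    return abs(fl_x - x) / abs(x)

for num, den in [(1, 10), (12, 5), (10, 3)]:
    err = relative_rounding_error(num, den)
    print(f"{num}/{den}: error = {float(err / eps_mach):.4f} * eps_mach")
```

Every printed factor stays below 0.5, in agreement with the bound for rounding.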
Example
Let $fl(x)$ be the (round to nearest) IEEE machine arithmetic model of $x$. Calculate $fl(x) - x$ for $x = 10/3$ and check the relative rounding error for $x$ in a 64-bit representation.
$x = (11.\overline{01})_2 = (1.1\overline{01})_2 \times 2^1$ (normalized)
S = 1. 1 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 0 | 1 01 …
(the 52 bits before the bar form the mantissa)