intro-to-maths-computation
Bsc. mathematics and computer science (Jomo Kenyatta University of Agriculture and
Technology)
Duncan K. Gathungu
November 2, 2023
Content
Introduction to Mathematical Computing
Contact Hours: 45 hours
Pre-Requisites: None
Purpose of the course:
This module shows how to use scientific software to perform symbolic and numeric
computations, visualization, experimentation and much more. The module
introduces scientific computation with MATLAB/R.
Expected Learning outcomes:
Students will learn through the application of concepts and techniques covered in the
module to real data sets and models. Students will be encouraged to examine issues of
substantive interest in these studies. Successful students will be able to:
3. Work with linear models in MATLAB/R, the MATLAB/R syntax for writing
functions, iterations and conditions
Course Description
Interactive use of MATLAB/R. Basic data types. Writing scripts. Graphical facili-
ties. Writing your own functions. String processing. File input/output. Vectorization.
Numeric issues, Debugging. Introduction to Monte-Carlo methods. Reproducible re-
search. Interfacing to databases. Advanced aspects. In summary, the following topics
will be covered: Manipulation and management of data in the MATLAB/R environ-
ment, summarizing data numerically and graphically, fitting linear models in MAT-
LAB/R, functions, iterations, and conditions in MATLAB/R.
Teaching and Learning Methodology: Teaching will comprise 2 hours of formal lectures
and 3 hours of computer-based practical classes every day for five days. Students
will undertake computer-based data analysis practicals involving the use of MATLAB/R
to perform statistical analysis and modelling using real data, including graphs and fitting
linear models. Assignments from practical classes are handed in for feedback.
Course Textbooks
1. Faraway J., (2014). Linear Models with R; 2nd Edition; Chapman and Hall; ISBN:
1439887330, ISBN: 978-1439887332.
3. Brian, H. & Daniel, V. (2019). Essential MATLAB for Engineers and Scientists (7th
Ed.). London, Elsevier. ISBN-13: 9780081029978
Reference Textbooks
1. Maindonald, J. and Braun, J, (2006). Data Analysis and Graphics Using R; 2nd
Revised Edition; Cambridge University Press; ISBN: 1139460536, 9781139460538.
3. Crawley, M.J., (2005). Statistics: An Introduction Using R; 1st Edition; Wiley, New
York; ISBN: 978-0-470-02298-6.
Course Journals
Reference Journals
Introduction
Computational mathematics is the use of computers to solve mathematical problems.
It is a broad field that encompasses a wide range of topics, including numerical anal-
ysis, scientific computing, and mathematical modeling. Numerical analysis is the study
of numerical methods for solving mathematical problems. Numerical methods are
approximate methods that use computers to find solutions to problems that are too
difficult or impossible to solve analytically. Scientific computing is the use of comput-
ers to solve problems in science and engineering. Scientific computing problems often
involve the solution of differential equations, optimization problems, and statistical
problems. Mathematical modeling is the use of mathematical equations to describe a
physical system. Mathematical models can be used to predict the behavior of a sys-
tem, to design new systems, and to test the validity of existing theories.
• It can be used to solve problems that are too difficult or impossible to solve ana-
lytically.
• It can be used to solve problems that are too large or complex to be solved by
hand.
• It can be difficult to choose the right numerical method for a particular problem.
• Data analysis
• Data visualization
• Numerical computing
• Symbolic computing
• Simulation
• Algorithm development
• Software development
MATLAB also has a powerful graphical user interface (GUI) that makes it easy to create
and edit plots, animations, and simulations.
In this introduction to mathematical computing using MATLAB, we will cover the
following topics:
MATLAB Desktop
MATLAB may be started via the Start menu or by clicking on the MATLAB icon on the
desktop. Upon startup, a new window will open containing the MATLAB "desktop",
and one or more MATLAB windows will open within the MATLAB desktop, as seen in
Figure 1.
The main windows are: the Command Window, Command History, Current Folder, and
Workspace. You can customize the MATLAB windows that appear upon startup by
clicking on Layout in the Toolstrip and checking (or unchecking) the windows
that you wish to appear on the MATLAB desktop.
1. Command Window: In the Command window, you can enter commands and
data, make calculations, and print results. You can write a script in the Com-
mand window and execute the script. However, writing a script directly into the
Command window is discouraged because it will not be saved, and if an error is
made, the entire script must be retyped. By using the up arrow (↑) key on your
keyboard, the previous command can be retrieved (and edited) for re-execution.
2. Command History Window: This window lists a history of the commands that
you have executed in the Command Window. You can click on a command in
this window and it will be re-executed.
3. Current Folder Toolbar: This toolbar gives the path to the Current Folder. To run
a MATLAB script, the script needs to be in the folder listed in this toolbar.
4. Current Folder Window (on the left): This window lists all the files in the Current
Folder whose path is listed in the Current Folder Toolbar. By double clicking
on a file in this window, the file will open within MATLAB.
5. Script Window: To open this window, click on the New Script icon in the Toolstrip
in MATLAB's desktop. This will open the Script window (see Figure 2).
6. The Script window may be used to create, edit, and execute MATLAB scripts
(programs). Scripts are then saved as M-files. These files have the extension .m,
such as heat.m. To execute the script, you can click the Save and Run icon (the
green arrow) in the Script window (see Figure 2) or return to the Command window
and type in the name of the program (without the .m extension).
Example 1. If the newly created script is called heat.m, to run it in the command window, we
just type heat and press enter.
MATLAB Fundamentals
When using MATLAB for mathematical computing, the following should be considered:
1. Variable names: Must start with a letter and can contain letters, digits, and the
underscore character. Names can be of any length, but MATLAB uses only the first
namelengthmax characters (63 in current releases). Note: Do not use a variable
name that is the same as a file name, a MATLAB function name, or a self-written
function name.
3. Semicolons are usually placed after variable definitions and program statements
when you do not want the command echoed to the screen. In the absence of a
semicolon, the defined variable appears on the screen, for example, if you entered
the following assignment in the Command Window:
Alternatively, if you add the semicolon after the assignment, then your command
is entered, but there is nothing printed to the screen, and the prompt immediately
appears for you to enter your next command:
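As a brief illustration of the effect of the semicolon, the two cases described above can be sketched as follows. The variable names A, B and C and their values are illustrative assumptions, not taken from the original notes.

```matlab
% Without a semicolon, MATLAB echoes the assigned value to the Command Window.
A = 5

% With a semicolon, the assignment is carried out silently.
B = 7;

% Both variables exist in the Workspace either way.
C = A + B;
```

Typing C (without a semicolon) at the prompt afterwards displays its value.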
9. The save command saves variables or data from the Workspace to a file in the Current
Folder. The data file name will have the .mat extension.
10. User-defined functions (also called self-written functions) are also saved as M-
files.
11. Scripts and functions are saved as ASCII text files. Thus, they may be written either
in the built-in Script window or in Notepad or in any word processor (saved
as a text file). Be aware that the single quotation mark in Microsoft Word is not
the same as the one in MATLAB and will need to be changed in the MATLAB
program.
12. The basic data structure in MATLAB is a matrix. For example, the matrix
\[ A = \begin{bmatrix} 1 & 3 \\ 6 & 5 \end{bmatrix} \]
is entered as A = [1 3; 6 5], where the semicolon within the brackets indicates the
start of a new row within the matrix.
13. A specific element in the matrix can be accessed by specifying the row followed
by the column. For example from the above matrix A we can access number 3
which is in the 1st row and 2nd column as shown below.
typing
The colon in the expression A(:,1) implies all the rows in matrix A,and the 1
implies column 1. Typing
The first colon in the expression A(:,2:3) implies all the rows in A, and the
2:3 implies columns 2 and 3.
(b) The colon operator can also be used to generate a series of numbers. The syntax
is n = starting value : step size : final value. If the step size is omitted, the
default step size is one. For example n = 1 : 8 gives
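The indexing and colon-operator rules above can be sketched in a few lines. The 3-by-3 matrix M is an illustrative assumption (a matrix with at least three columns is needed for the expression M(:,2:3)); it is not the matrix A defined earlier.

```matlab
% Illustrative 3-by-3 matrix (an assumption, not the matrix A from the notes).
M = [1 2 3; 4 5 6; 7 8 9];

elem = M(1,2);       % element in row 1, column 2
col1 = M(:,1);       % all rows, column 1
cols = M(:,2:3);     % all rows, columns 2 and 3

n = 1:8;             % step size omitted, so the step is 1
k = 0:0.25:1;        % starting value : step size : final value
```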
* Multiplication
/ Division
+ Addition
- Subtraction
^ Power / Exponentiation
pi       π ≈ 3.1416
i or j   √−1
Inf      ∞
ans      The last computed unassigned result to an expression typed in the Command window
sin sine
sinh hyperbolic sine
asin inverse sine
asinh inverse hyperbolic sine
cos cosine
cosh hyperbolic cosine
acos inverse cosine
acosh inverse hyperbolic cosine
tan tangent
tanh hyperbolic tangent
atan inverse tangent
atanh inverse hyperbolic tangent
NB: x (radians) = x (degrees) × π/180.
exp exponential
log natural log
log10 common (base 10) logarithm
sqrt square root
erf error function
For example
For example
size(X)          Gives the size of a matrix X, i.e. the number of rows and the number of columns.
length(X)        For vectors, gives the number of elements in X.
linspace(X,Y,N)  Generates N linearly spaced points from X to Y.
sum(X)           For vectors, gives the sum of the elements in X.
                 For matrices, gives a row vector containing the sum of the elements in each column.
max(X)           For vectors, gives the maximum element in X.
                 For matrices, gives a row vector containing the maximum in each column.
min(X)           Same as max(X) but gives the minimum element.
sort(X)          For vectors, sorts the elements of X in ascending order.
                 For matrices, sorts each column of the matrix in ascending order.
factorial(n)     n! = 1 × 2 × 3 × . . . × n
mod(x,y)         Modulo operator; gives the remainder from the division of x by y.
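21. Sometimes it is necessary to preallocate a matrix of a given size. This can be done
by defining a matrix of all zeros or ones. For example
\[ A = \mathrm{zeros}(3) = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad
   B = \mathrm{zeros}(3,2) = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \]
\[ C = \mathrm{ones}(3) = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \qquad
   D = \mathrm{ones}(2,3) = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}. \]
To generate the identity matrix, i.e. a matrix with ones on the main diagonal, we use eye. E.g.
\[ I = \mathrm{eye}(3) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]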
1. The disp() command prints only the items that are enclosed within the parenthe-
ses which can be a variable or alphanumeric information. Alphanumeric infor-
mation must be enclosed by single quotation marks. For example
2. The fprintf command prints formatted text to the screen or to a file, for example
\n moves the cursor to a new line, and \t moves the cursor to the next tab stop along the
line. %f refers to a formatted floating-point number that is assigned to the variable
V. You can also specify the field width and the number of decimal places you
wish to display. For example, %8.3f specifies a field 8 characters wide, printed
to 3 decimal places.
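A minimal sketch of disp and fprintf in use is given below. The variable V and its value are illustrative assumptions; sprintf is used at the end only because it applies the same format rules while returning the text, which makes the effect of %8.3f easy to inspect.

```matlab
V = 12.34567;                          % illustrative value

disp('The computed volume is')         % text enclosed in single quotes
disp(V)                                % value of a variable

% %8.3f: field width of 8 characters, 3 digits after the decimal point.
fprintf('V = %8.3f\n', V);

% \t inserts a tab between the two columns.
fprintf('%i\t%f\n', 3, V);

% sprintf uses the same format rules but returns the text instead of printing it.
s = sprintf('%8.3f', V);
```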
fo = fopen('filename','w')
fprintf(fo,'format',var1,var2,...)
The format string contains the text format for var1, var2, etc.
For example
4. An existing data file can also be read into a program by using the load command,
for example
load filename.txt
x=filename(:,1);
y=filename(:,2);
Loops
Loops provide the means to repeat a series of statements with just a few lines of code.
1. for loop
The syntax for the for loop is
The step size may be omitted, in which case MATLAB takes the step size as 1.
For example, with an index variable m taking the values from 1 up to 20, the for loop
can be written as
Listing 1: Loop
for m=1:20
    for l=1:20
        fprintf(' %i %i\n', m,l);
    end
end
MATLAB sets the index m to 1, carries out the statements between the for and
end statements, then returns to the top of the loop, changes m to 2, and repeats
the process. After the process has been carried out 20 times, the program exits
the loop without further executing any of the statements within the loop. All
statements that are not to be repeated should not be within the for loop. For
example, table headings that are not to be repeated should be outside the for
loop.
Also notice that the statements within the for loop are indented for easier reading
and debugging.
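The general pattern described above (start value, optional step size, final value, with headings kept outside the loop) might look like the following sketch. The table of squares and the variable total are illustrative assumptions.

```matlab
% General form:  for index = start : step : final ... end
% The heading is printed once, outside the loop.
fprintf('   m    m^2\n');
total = 0;
for m = 1:2:9                 % m takes the values 1, 3, 5, 7, 9
    fprintf('%4i %6i\n', m, m^2);
    total = total + m^2;      % running sum of the squares
end
```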
Example 3. In order to determine the position x of a person in a roller coaster, the posi-
tion is determined by the function x in terms of t as x = x0 + v cos (θ ) t, where v=10 m/s
is the velocity of travel and θ=30◦ is the angle of motion. If the initial position x0 = 0.0,
determine the position at different times from 0 to 10 seconds and print the output.
Listing 2: Example 1
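One possible implementation of Example 3 is sketched here. The time step of 1 second and the use of cosd (cosine with the argument in degrees) are assumptions; the original listing may differ.

```matlab
x0    = 0.0;       % initial position (m)
v     = 10;        % speed (m/s)
theta = 30;        % angle of motion (degrees)

fprintf('  t (s)     x (m)\n');     % heading printed once, outside the loop
for t = 0:10
    x = x0 + v*cosd(theta)*t;       % cosd takes its argument in degrees
    fprintf('%6.1f %10.4f\n', t, x);
end
```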
2. while statement
In the while loop, MATLAB will carry out the statements between the while and
end statements as long as the condition in the while statement is satisfied. If an
index in the program is required, the use of the while loop statement (unlike the
for loop statement) requires that the program generate its own index, as shown
in the following example:
Listing 3: Example 1
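A small sketch of a while loop that maintains its own index follows. The task (adding 1 + 2 + 3 + ... until the sum exceeds 100) is an illustrative assumption, not the example from the original listing.

```matlab
n = 0;                 % the program creates and initialises its own index
s = 0;
while s <= 100         % repeat while the running sum has not passed 100
    n = n + 1;         % the index is advanced explicitly
    s = s + n;
end
fprintf('The sum first exceeds 100 after %i terms (sum = %i)\n', n, s);
```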
Conditional operators
1. if statement
The syntax is given as
Listing 4: If statements
if logical expression
    statement;

    statement;
else
    statement;

    statement;
end
If the logical expression is true, then only the upper set of statements are exe-
cuted. If the logical expression is false, then only the bottom set of statements are
executed.
a==b; a<=b;
a<b; a>=b;
4. if-elseif ladder
The syntax is given as
The if-elseif ladder works from top down. If the top logical expression is true,
the statements related to that logical expression are executed, and the program
will leave the ladder. If the top logical expression is not true, the program moves
to the next logical expression. If that logical expression is true, the program will
execute the group of statements associated with that logical expression and leave
the ladder. If that logical expression is not true, the program moves to the next
logical expression and continues the process. If none of the logical expressions are
true, the program will execute the statements associated with the else statement.
The else statement is not required. In that case, if none of the logical expressions
are true, no statements within the ladder will be executed.
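The top-down behaviour of the ladder can be seen in a short sketch. The grading thresholds and the variable names mark and grade are illustrative assumptions.

```matlab
mark = 64;                       % illustrative value
if mark >= 70
    grade = 'A';
elseif mark >= 60                % reached only if mark < 70
    grade = 'B';
elseif mark >= 50
    grade = 'C';
else
    grade = 'F';                 % none of the logical expressions was true
end
fprintf('A mark of %i gives grade %s\n', mark, grade);
```

Because the ladder is left as soon as one expression is true, a mark of 64 gives B even though it also satisfies mark >= 50.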
5. switch group
In some cases, the switch group may be used as an alternative to the if-elseif
ladder. This syntax is given as
10 end
where var takes on the possible values var1, var2, var3, etc.
If var equals var1, those statements associated with var1 are executed, and the
program leaves the switch group. If var does not equal var1, the program tests
if var equals var2, and if yes, the program executes those statements associated
with var2 and leaves the switch group. If var does not equal any of var1, var2,
etc., the program executes the statements associated with the otherwise state-
ment. If var1, var2, etc., are strings, they need to be enclosed by single quotation
marks. It should be noted that var cannot be a logical expression, such as var1 >=
80.
For example
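A minimal sketch of a switch group with string cases is shown below. The variable names shape, r and area, and the cases themselves, are illustrative assumptions.

```matlab
shape = 'circle';                % illustrative string value
r = 2;
switch shape
    case 'circle'                % string cases are enclosed in single quotes
        area = pi*r^2;
    case 'square'
        area = r^2;
    otherwise
        area = NaN;              % executed when no case matches
end
```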
MATLAB Graphics
1. Plot commands
MATLAB provides many different types of plots that can be accessed by clicking
the PLOTS tab in the desktop. For example
x = 0 : 0.01 : 1;
y = x.^2;
plot(x,y)
grid on
In general, MATLAB draws a piecewise linear function that connects the data
points; the graph will appear smooth if the spacing between the grid points is
sufficiently small.
The 'array operations' that are built into MATLAB are very useful for generating
vectors of vertical coordinates. For example, if x is a vector, then x^2 is undefined.
However, x.^2 denotes the vector that is obtained by squaring the components of
x.
Remark 4. When you use the plot command, y does not have to be a function of x;
Example 5. Graph a unit circle centred at the origin. Without the command axis('square')
the graph would be an ellipse due to different scaling of the horizontal and vertical axes.
Try running the following code.
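One way to draw the circle is to parametrize it by an angle, as in this sketch. The parametrization and the choice of 200 points are assumptions; the original listing may differ.

```matlab
theta = linspace(0, 2*pi, 200);   % parameter along the circle
x = cos(theta);
y = sin(theta);
plot(x, y)
axis('square')                    % equal-length axes so the circle looks round
grid on
```

Here y is not a function of x, which is the situation Remark 4 refers to.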
2. Multiple plots
Suppose that the vectors x1 and y1 contain horizontal and vertical coordinates for
a curve, and suppose that the vectors x2 and y2 contain the coordinates for another
curve. The command plot(x1,y1,x2,y2) plots both curves on the same graph.
The vectors x1 and x2 could be the same. This procedure can be generalized to
any number of curves.
If several curves are to be plotted simultaneously, and if they all use the same
vector of horizontal coordinates, then another method can be used to plot the
curves. Multiple curves on the same graph can be distinguished by color coding
the curves. Available color types are
black    'k'
blue     'b'
green    'g'
red      'r'
cyan     'c'
yellow   'y'
Multiple curves on the same graph can also be distinguished by using different
types of lines. The available line types are
solid        default
dashed       '--'
dashed-dot   '-.'
dotted       ':'
Alternatively you can create a marker plot of discrete points by using one of these
marker styles:
point     '.'
plus      '+'
star      '*'
circle    'o'
x-mark    'x'
diamond   'd'
For example
3. Axis control
The axis command can be used to control the ranges of x- and y-coordinates that
are plotted. (Unless you say otherwise, MATLAB will choose the ranges automatically.)
For example, the command axis([0 10 -1 1]) specifies that the graph window
will show the region 0 ≤ x ≤ 10, −1 ≤ y ≤ 1. The same effect is obtained
by the sequence of commands v = [0 10 -1 1]; axis(v).
The axis command should be invoked after the graph is plotted. In general, it is
possible to plot a graph once and then execute the axis command several times
to alter the appearance of the plot.
4. Labelling plots
Suppose that a plot is currently residing in the graphics window. Some com-
mands:
xlabel('info')     Places the character string info immediately below the x-axis
ylabel('info')     Places the character string info next to the y-axis
title('info')      Places the character string info above the graph
text(x,y,'info')   Places the lower left corner of the character string info
                   at position (x,y) in the graphics screen
gtext('info')      Same as text except the text is placed graphically
5. Screen control
To clear the contents of the graphics window, type clf.
The command hold on holds the current graph on the screen. Subsequent graph-
ing commands will add to the current plot; everything that is already in the
graphics window will be retained, and the axes will not change. The command
hold off turns off this mode.
Example 8. Suppose that the only open graphics window is Figure No. 1. The following
commands plot the graph of y = x in Figure No. 1, y = x^2 in Figure No. 2, and y = x^3
in Figure No. 3. The third figure is then printed.
Example 9. Suppose that you want to plot a function of (x, y) for 0 ≤ x ≤ 1.5 and 0 ≤
y ≤ 1, with increment 0.5 in each variable. (In practice, the increment should generally
be much smaller than this.) Arrays containing values of these variables are generated by
the following commands.
The matrix X thus contains values of x-coordinates, and matrix Y contains values
of y coordinates. In the matrix Y, the row index increases with increasing values
of y. Don’t worry about the values of y being upside down; this is taken care of
automatically by the contour plot and surface plot routines.
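The coordinate arrays of Example 9 can be produced with meshgrid, as in this sketch. The use of meshgrid is an assumption about the missing commands, and the function Z is an illustrative choice.

```matlab
x = 0:0.5:1.5;              % 4 values of x
y = 0:0.5:1;                % 3 values of y
[X, Y] = meshgrid(x, y);    % X and Y are both 3-by-4

% Illustrative function of (x, y), evaluated elementwise on the grid.
Z = X.^2 + Y;
```

Each row of X is a copy of x and each column of Y is a copy of y, so Z(i,j) is the function value at the point (x(j), y(i)).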
2. Contour plot
The MATLAB function contour produces contour plots of functions of two real
variables; the MATLAB function contour3 produces three-dimensional contour
plots, in which contours are placed on a three-dimensional surface.
Example 10. Produce a contour plot of the function $z = e^{-y} \sin x$ for 0 ≤ x ≤ π and
0 ≤ y ≤ 1.
In this example the contours are not labelled, the contour level is chosen by MAT-
LAB.
In this example the contour levels are specified explicitly and each contour is
labelled with the corresponding value of z.
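The two variants of Example 10 described above (levels chosen by MATLAB, and explicit labelled levels) might be coded as follows. The grid resolution of 50 points and the particular contour levels are assumptions.

```matlab
[X, Y] = meshgrid(linspace(0, pi, 50), linspace(0, 1, 50));
Z = exp(-Y).*sin(X);

figure
contour(X, Y, Z)                 % contour levels chosen by MATLAB
xlabel('x'), ylabel('y')

figure
levels = 0.1:0.2:0.9;            % illustrative contour levels
[C, h] = contour(X, Y, Z, levels);
clabel(C, h)                     % label each contour with its value of z
xlabel('x'), ylabel('y')
```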
axis([-1 5 -1 2])
4. Surface plots
Examples of plotting surfaces in three dimensions are as follows.
Example 13. Draw a mesh plot of the surface $z = e^{-y} \sin x$ for 0 ≤ x ≤ π and 0 ≤ y ≤ 1.
5. Parametric plots
The functions plot3 and comet3 can be used to plot parametric curves in three
dimensions. The function comet is a two dimensional analogue of comet3. The
mesh and surf functions can be used to plot surfaces for which z is not a function
of x and y. Instead, write x, y, and z as functions of two independent variables,
and plot a 'parametric' surface.
Example 14. If the axis of a torus is the z-axis then the torus can be parametrized in the
form x = ( a + b cos ψ) cos θ, y = ( a + b cos ψ) sin θ, z = b sin ψ, for 0 ≤ θ ≤ 2π,
0 ≤ ψ ≤ 2π. Here, a is the distance from the z-axis to the center of a cross-section, b is
the radius of a cross-section, θ is an angle of rotation about the z-axis, and ψ is an angle
of rotation within a cross-section. Here, we plot a torus for which a = 2 and b = 1.
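A sketch of Example 14 using surf is given below. The grid of 60 points per angle and the call to axis equal are assumptions; the original listing may have used mesh or different settings.

```matlab
a = 2;  b = 1;
% Both angles run over [0, 2*pi]; meshgrid(v) is the same as meshgrid(v, v).
[theta, psi] = meshgrid(linspace(0, 2*pi, 60));
x = (a + b*cos(psi)).*cos(theta);
y = (a + b*cos(psi)).*sin(theta);
z = b*sin(psi);
surf(x, y, z)
axis equal                        % same scale on all three axes
```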
p = 0.7;
comet(x,y,p)
For practice
function [y1,y2]=compound()
% First a table of y1 = t^2/10 and y2 = t^3/100 is created.
% To plot y1 and y2 vs. t, they need to be made vectors.
% y1 and y2 vs. t are plotted on the same graph.
clc;
t = 0:10;
y1 = zeros(size(t));    % preallocate the result vectors
y2 = zeros(size(t));
for n = 1:length(t)
    y1(n) = t(n)^2/10;
    y2(n) = t(n)^3/100;
end

fprintf(' t y1 y2 \n');
fprintf('------------------------------\n');
for n = 1:length(t)
    fprintf('%8.1f %10.2f %10.2f \n',t(n),y1(n),y2(n));
end
% Create the plot, y1 as a solid line, y2 as a dashed line.
plot(t,y1,t,y2,'--');
xlabel('t'), ylabel('y1,y2'), grid, title('y1 and y2 vs. t');

% Label the two curves.
text(6.5,2.5,'y2');
text(4.2,2.4,'y1');
end
Linear systems
Solving a system of linear equations is one of the most important tasks in technical computing.
We revisit some concepts already covered in previous classes.
Recall
p = [1 -1 -1]
r = roots(p)
produces
You can also use the Symbolic Toolbox which connects to a computer algebra system
to solve the equation without converting to a polynomial. The equation involves the
symbolic variable and a double equal sign. The solve function finds the two solutions.
The pretty function displays the results in a way that resembles typeset mathematics
For a system of equations, the solve function can be used to obtain the
solutions, even though there exist more efficient methods for systems of equations.
For example
Bisection
Suppose we would like to compute $\sqrt{2}$. This can be done by interval bisection,
which is a systematic trial and error technique. We know that $\sqrt{2}$ is between
1 and 2. Try $x = 1\tfrac{1}{2}$. Because $x^2$ is greater than 2, this x is too big.
Try $x = 1\tfrac{1}{4}$; here $x^2$ is less than 2, so this x is too small. We continue
this way and our approximations are
\[ 1\tfrac{1}{2},\; 1\tfrac{1}{4},\; 1\tfrac{3}{8},\; 1\tfrac{7}{16},\; 1\tfrac{13}{32},\; \ldots \]
This can be implemented in MATLAB as follows, where we include a counter
to record the number of iterations.
M = 2;
a = 1;
b = 2;
k = 0;
while b-a > 1.2e-10
    x = (a + b)/2;          % midpoint of the current interval
    if x^2 > M
        b = x;              % midpoint too big: keep the lower half
    else
        a = x;              % midpoint too small: keep the upper half
    end
    fprintf(['These are the values of a= %5.8f and b=%5.8f at iteration %i ', ...
             'with b-a=%5.4e and eps=%5.4e\n'], a, b, k, b-a, eps)
    k = k + 1;
end
Newton’s method
This involves solving f(x) = 0: the method draws the tangent to the graph of f(x) at a
point and determines where the tangent intersects the x-axis. The method requires one
starting value, $x_0$, and the iteration is
\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \]
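As a sketch, the iteration can be applied to the same problem as the bisection example, $f(x) = x^2 - 2$. The starting value, the tolerance of 1e-12 and the cap of 50 iterations are illustrative assumptions.

```matlab
f  = @(x) x^2 - 2;       % function whose root is sought
df = @(x) 2*x;           % its derivative
x  = 1;                  % starting value x0 (illustrative)
k  = 0;                  % iteration counter
dx = Inf;
while abs(dx) > 1e-12 && k < 50
    dx = f(x)/df(x);     % Newton correction f(x_n)/f'(x_n)
    x  = x - dx;
    k  = k + 1;
end
fprintf('x = %.12f after %i iterations\n', x, k);
```

Newton's method needs only a handful of iterations here, whereas the bisection loop above needs several dozen to reach a comparable tolerance.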
\[ \frac{dy(t)}{dt} = f(t, y(t)), \]
A numerical solution to this problem generates a sequence of values for the independent
variable, $t_0, t_1, \ldots$, and a corresponding sequence of values for the dependent
variable, $y_0, y_1, \ldots$, so that each $y_n$ approximates the solution at $t_n$:
\[ y_n \approx y(t_n), \quad n = 0, 1, \ldots \]
\[ h_n = t_{n+1} - t_n, \]
so that the estimated error in the numerical solution is controlled by a specified tolerance.
The fundamental theorem of calculus gives us an important connection between dif-
\[ y_{n+1} = y_n + h f(t_n, y_n), \]
\[ t_{n+1} = t_n + h. \]
The MATLAB code would use an initial point t0, a final point tfinal, an initial value
y0, a step size h and a function f. The primary loop would be
t = t0;
y = y0;
while t < tfinal            % stop once t has reached tfinal
    y = y + h*f(t,y);
    t = t + h;
end
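A self-contained sketch of Euler's method is given below for the test problem y' = y, y(0) = 1 on [0, 1], whose exact solution is exp(t). The test problem, the number of steps and the use of a counted for loop (instead of a test on t, which can be affected by round-off) are assumptions made for illustration.

```matlab
f      = @(t, y) y;              % test problem y' = y, exact solution exp(t)
t0     = 0;
tfinal = 1;
y0     = 1;
n      = 100;                    % number of steps
h      = (tfinal - t0)/n;        % step size

t = t0;
y = y0;
for k = 1:n                      % a counted loop avoids round-off in the test on t
    y = y + h*f(t, y);           % Euler update
    t = t + h;
end
fprintf('Euler: y(%g) = %.6f, exact = %.6f\n', tfinal, y, exp(tfinal));
```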
An improvement on this technique is called the Improved Euler's method or Heun's
method, which uses the fact that, in order to approximate f(t, y), we can take the
average of its values at $t_0$ and $t_1$. For example
\[ y(t_1) = y_0 + \int_{t_0}^{t_1} f(t, y(t))\,dt
          \approx y_0 + \int_{t_0}^{t_1} \frac{f(t_0, y_0) + f(t_1, y_1)}{2}\,dt. \]
In order to approximate f(t, y(t)) by the average of its values at $t_0$ and $t_1$, we need to
know its values at $t_0$ and $t_1$. We know the former but not the latter. We use some other
method, in this case Euler's method, to provide us with the initial approximation
$\hat{y}_1$ for $y(t_1)$; thus we take $\hat{y}_1 = y_0 + h f(t_0, y_0)$ as an approximation of y at $t_1$. We then
write
\[ y(t_1) \approx y_0 + \int_{t_0}^{t_1} \frac{f(t_0, y_0) + f(t_1, \hat{y}_1)}{2}\,dt
          = y_0 + h\,\frac{f(t_0, y_0) + f(t_1, \hat{y}_1)}{2}. \]
This general method with $\hat{y}_{n+1}$ is also called the Runge-Kutta 2nd order method. It is an
example of a predictor-corrector method: it uses Euler's method to predict and
then corrects the value.
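The predictor and corrector steps can be sketched as follows for the test problem y' = y, y(0) = 1 on [0, 1]. The test problem and the step size h = 0.1 are illustrative assumptions.

```matlab
f = @(t, y) y;           % test problem y' = y, exact solution exp(t)
h = 0.1;
n = 10;                  % 10 steps of size 0.1 take t from 0 to 1
t = 0;
y = 1;
for k = 1:n
    yhat = y + h*f(t, y);                        % predictor: Euler step
    y    = y + h*(f(t, y) + f(t + h, yhat))/2;   % corrector: average of the two slopes
    t    = t + h;
end
fprintf('Heun: y(1) = %.6f, exact = %.6f\n', y, exp(1));
```

For this problem each step multiplies y by 1 + h + h^2/2, which matches the Taylor series of exp(h) up to second order.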
The Runge-Kutta 4th order method improves on the method discussed above
for increased accuracy. In general this technique calculates the values of $k_1, k_2, k_3, k_4$
and k from the formulas
\[ k_1 = h f(t_n, y_n), \]
\[ k_2 = h f\!\left(t_n + \frac{h}{2},\; y_n + \frac{k_1}{2}\right), \]
\[ k_3 = h f\!\left(t_n + \frac{h}{2},\; y_n + \frac{k_2}{2}\right), \]
\[ k_4 = h f(t_n + h,\; y_n + k_3), \]
and
\[ k = \frac{1}{6}\left[k_1 + 2k_2 + 2k_3 + k_4\right], \]
then set $y_{n+1} = y_n + k$ and take this as the approximation at $t_{n+1} = t_n + h$.
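The formulas translate almost line by line into MATLAB. This sketch again uses the test problem y' = y, y(0) = 1 on [0, 1] with h = 0.1; both choices are illustrative assumptions.

```matlab
f = @(t, y) y;           % test problem y' = y, exact solution exp(t)
h = 0.1;
n = 10;                  % 10 steps of size 0.1 take t from 0 to 1
t = 0;
y = 1;
for i = 1:n
    k1 = h*f(t, y);
    k2 = h*f(t + h/2, y + k1/2);
    k3 = h*f(t + h/2, y + k2/2);
    k4 = h*f(t + h,   y + k3);
    y  = y + (k1 + 2*k2 + 2*k3 + k4)/6;   % weighted average of the four increments
    t  = t + h;
end
fprintf('RK4: y(1) = %.8f, exact = %.8f\n', y, exp(1));
```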
y′ ( x ) = xy.
In MATLAB we can use the built-in function called dsolve(). For this problem the syn-
tax looks like this
y = dsolve('Dy = y*x','x');
Notice in particular that MATLAB uses capital D to indicate the derivative and
requires that the entire equation appear in single quotes. MATLAB takes t to be the
independent variable by default, so here x must be explicitly specified as the indepen-
dent variable. Alternatively, if you are going to use the same equation a number of
times, you might choose to define it as a variable, say, eqn1.
To solve an IVP with, say, the initial condition y(1) = 1, we can use either of the
following structures
Now that we’ve solved the ODE, suppose we want to plot the solution to get a rough
idea of its behavior. We run immediately into two minor difficulties:
1. Our expression for y(x) isn't suited for array operations (.*, ./, .^).
The first of these obstacles is straightforward to fix, using vectorize(). For the second,
we employ the useful command eval(), which evaluates or executes text strings that
constitute valid MATLAB commands. Hence, we can use
x = linspace(0,1,20);
z = eval(vectorize(y));
plot(x,z)
Remark 15. eval() evaluates strings (character arrays), and y, as we have defined it, is a
symbolic object. However, vectorize converts symbolic objects into strings.
\[ \frac{d^2 y(x)}{dx^2} + 8\frac{dy(x)}{dx} + 2y(x) = \cos x, \quad y(0) = 0, \quad \frac{dy}{dx}(0) = 1, \]
Systems of ODEs
Suppose we want to solve and plot solutions to the system of three ordinary differential
equations
43
1 [x,y,z]=dsolve('Dx=x+2*y−z','Dy=x+z','Dz=4*x−4*y+5*z');
Remark 17. If you use MATLAB to check your work, keep in mind that its choice of constants
C1, C2, and C3 probably won't correspond with your own. For example, you might have C =
−2C1 + (1/2)C3, so that the coefficients of exp(t) in the expression for x are combined. Fortunately,
there is no such ambiguity when initial values are assigned. Notice that since no independent
variable was specified, MATLAB used its default, t.
To solve an initial value problem, we simply define a set of initial values and add them
at the end of our dsolve() command. Suppose we have x(0) = 1, y(0) = 2, and z(0) = 3.
We have, then,
inits='x(0)=1,y(0)=2,z(0)=3';
[x,y,z]=dsolve('Dx=x+2*y-z','Dy=x+z','Dz=4*x-4*y+5*z',inits);
t=linspace(0,.5,25);
xx=eval(vectorize(x));
yy=eval(vectorize(y));
zz=eval(vectorize(z));
plot(t, xx, t, yy, t, zz)
[outputs] = function_handle(inputs)
[t, state] = solver(@dstate, tspan, Initialconditions, options)
where
1. state: An array. The solution of the ODE (the values of the state at every time).
4. tspan: Vector that specifies the interval of the solution, e.g. [t0 : 5 : tf].
5. Initial conditions: A vector of the initial conditions for the system (row or col-
umn).
Different numerical methods reduce the error at different rates; for example, Euler's method,
midpoint methods and Runge-Kutta methods reduce the error at 1st, 2nd and 4th order,
respectively. The solvers are implemented differently and are suited to different
circumstances.
In summary
1. ode45: Based on explicit Runge-Kutta 4th and 5th order formula. In computing
y(tn+1 ), it needs only the solution at the immediately preceding time point, y(tn ).
Usage: Nonstiff problems, medium accuracy. Use most of the time. This should
be the first solver you try.
2. ode23: Based on explicit Runge-Kutta 2nd and 3rd order formula. It is often more
efficient than ode45 at crude tolerances and in the presence of moderate stiffness.
Usage: Nonstiff problems, low accuracy. Use for large error tolerances or moder-
ately stiff problems.
Example 18. Numerically approximate the solution of the first order differential equation
\[ \frac{dy}{dx} = x y^2 + y, \quad y(0) = 1, \]
on the interval $x \in [0, 0.5]$.
For any differential equation in the form y′ = f(x, y), we begin by defining the function
f(x, y). For single equations, we can define f(x, y) as an inline function. For this
example we can implement it as follows
f=inline('x*y^2+y');
[x,y]=ode45(f,[0 .5],1);
plot(x,y);
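The same computation can also be written with an anonymous function handle, which the current MATLAB documentation recommends in place of inline. The sketch below additionally compares the numerical result with y = 1/(1 − x); this closed form is not part of the original example, but it can be checked by substituting it into the equation and the initial condition.

```matlab
f = @(x, y) x*y^2 + y;              % right-hand side f(x, y)
[x, y] = ode45(f, [0 0.5], 1);      % integrate from x = 0 to x = 0.5 with y(0) = 1
yExact = 1./(1 - x);                % closed-form solution of this particular IVP
plot(x, y, 'o', x, yExact, '-')
xlabel('x'), ylabel('y')
```

The closed form also shows why the solution grows without bound as x approaches 1.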
Remark 19. It is important to point out here that MATLAB continues to use roughly the same
partition of values that it originally chose; the only thing that has changed is the values at which
it is printing a solution. In this way, no accuracy is lost.
Options
Several options are available for MATLAB's ode45 solver, giving the user limited control
over the algorithm. Two important options are the relative and absolute tolerance,
respectively RelTol and AbsTol in MATLAB. At each step of the ode45 algorithm, an error
is approximated for that step. If $y_k$ is the approximation of $y(x_k)$ at step k, and $e_k$ is
the approximate error at this step, then MATLAB chooses its partition to ensure
\[ e_k \le \max(\mathrm{RelTol} \cdot |y_k|,\; \mathrm{AbsTol}), \]
where the default values are RelTol = .001 and AbsTol = .000001. As an example of
when we might want to change these values, observe that if $y_k$ becomes large, then
the error $e_k$ will be allowed to grow quite large. In this case, we decrease the value of
RelTol. For the equation $y' = xy^2 + y$, with y(0) = 1, the values of y get quite large as
x nears 1. In fact, with the default error tolerances, we find that the command
f = inline('x*y^2+y');
[x,y] = ode45(f,[0,1],1);
plot(x,y);
leads to an error message, caused by the fact that the values of y are getting too large as x nears 1. (Note at the top of the column vector for y that it is multiplied by 10^14.) In order to fix this problem, we choose a smaller value for RelTol.
options = odeset('RelTol',1e-10);
[x,y] = ode45(f,[0,1],1,options);
max(y)
which returns 2.4251e+07.
Example 20. Now using functions the ODE in the previous example can be implemented as
$$\frac{dy}{dt} = \alpha y(t) - \gamma y(t)^2, \qquad y(0) = 10.$$
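A minimal sketch of how such an equation could be passed to ode45 through a function handle is given below. It reads the quadratic term as γ y(t)^2, and the parameter values and the time interval are hypothetical, chosen only so that the code runs.

```matlab
alpha = 0.5;     % hypothetical value of alpha
gamma = 0.01;    % hypothetical value of gamma
rhs = @(t, y) alpha*y - gamma*y.^2;     % right-hand side of the ODE

[t, y] = ode45(rhs, [0 20], 10);        % y(0) = 10, assumed interval [0, 20]
fprintf('y(20) is approximately %.2f\n', y(end));
```

With these values the solution levels off near alpha/gamma; plot(t,y) displays the curve.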
Example 22. Implement the following system of equations in MATLAB and plot the solutions. The system is given by
$$\frac{dy}{dt} = x, \qquad y(0) = 2,$$
$$\frac{dx}{dt} = 1000\left(1 - y^2\right)x - y, \qquad x(0) = 0.$$
This is a stiff system because the limit cycle has portions where the solution components change slowly, alternating with regions of very sharp change, so we will need ode15s. Hence we implement it as follows.
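One possible implementation is sketched below. The ordering of the state vector, u = [y; x], and the time span [0, 3000] are assumptions made here for illustration.

```matlab
% State vector u = [y; x]; right-hand side of the system in Example 22
vdp = @(t, u) [u(2); 1000*(1 - u(1)^2)*u(2) - u(1)];

tspan = [0 3000];      % assumed time span, long enough to show the oscillation
u0 = [2; 0];           % y(0) = 2, x(0) = 0

[t, u] = ode15s(vdp, tspan, u0);   % stiff solver
yvals = u(:, 1);                   % first component, y(t)
fprintf('%d steps, y ranges over [%.2f, %.2f]\n', numel(t), min(yvals), max(yvals));
```

The solution can then be displayed with plot(t, yvals). Replacing ode15s by ode45 in this sketch is a simple way to see the effect of stiffness on the run time.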
where t is time, E is the released energy (a function of the mass of the bomb), ρ is the density of the ambient air and p denotes the air pressure. From these conclusions, he deduced the formula for the radius of the shock wave as
$$R = \left(\frac{t^2 E}{\rho}\right)^{1/5}, \tag{2}$$
which describes the radius of the shock wave as a function of t and the parameters E and ρ. From the example above, given the measurement data (t, R(t)) and the value of the density ρ = 1.25 kg/m^3, it was possible to estimate E and hence the mass of the nuclear bomb. Using the following data
$$\log R = \frac{2}{5}\log t + \frac{1}{5}\log E - \frac{1}{5}\log\rho. \tag{3}$$
$$\log R = \frac{2}{5}\log t + b, \qquad \text{where } b = \frac{1}{5}\log E - \frac{1}{5}\log\rho. \tag{4}$$
• E ≈ 8.05 × 10^13 joules can be obtained by a least-squares fit of the data. Using a conversion factor of 1 kiloton = 4.186 × 10^12 joules, Taylor was able to estimate the weight of the bomb as 19.2 kilotons; it was later revealed that the actual weight was 21.1 kilotons, so Taylor's approach was quite accurate.
• Exploring the usefulness of the data above, the logarithmic representation given by (4) is equivalent to an equation of the form
y(t) = αx(t) + β,
with the new variables y = log R and x = log t, known parameter α and unknown parameter β.
ti , R(ti ) : i = 1, 2, . . .
• Taking into account the measurement data are subject to measuring errors, e.g.,
from the measurement apparatus or from other sources of errors not part of the
measurement model, then a more realistic model could be an equation of the
form
Y (t) = αX (t) + β + ϵ(t) (5)
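Because the slope α = 2/5 in (4) is known, only the intercept has to be fitted, and minimising the sum of (y_i − αx_i − β)^2 over β gives the mean of y_i − αx_i. The sketch below illustrates this on synthetic, noise-free radii generated from formula (2). The observation times are made up for illustration and are not Taylor's measurements; natural logarithms are used throughout.

```matlab
rho = 1.25;                        % air density in kg/m^3 (from the text)
E_true = 8.05e13;                  % energy used to generate the synthetic data
t = [0.1 0.5 1.0 2.0 4.0] * 1e-3;  % hypothetical observation times in seconds
R = (t.^2 * E_true / rho).^(1/5);  % synthetic radii from formula (2)

alpha = 2/5;                             % known slope in (4)
b_hat = mean(log(R) - alpha*log(t));     % least-squares intercept
E_est = rho * exp(5*b_hat);              % invert b = (log E - log rho)/5
fprintf('Estimated E = %.3e J\n', E_est);
```

With noisy data as in model (5), the same two lines for b_hat and E_est apply unchanged.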
1. Find a straight line y = ax + b as in (5) that ’best fits’ all the data points.
$$y = a_m x^m + a_{m-1} x^{m-1} + \cdots + a_1 x + a_0,$$
$$y = a_0 f_0(x) + a_1 f_1(x) + \cdots + a_m f_m(x),$$
that 'best fits' all data points. Here f_0(x), f_1(x), …, f_m(x) are given functions.
• When n > 2, there is little hope for a line to pass through more than two data points.
$$\begin{aligned} y_1 &= a_1 x_1 + a_0,\\ y_2 &= a_1 x_2 + a_0,\\ &\ \ \vdots\\ y_n &= a_1 x_n + a_0. \end{aligned}$$
We look for a best-fitting line that minimises the total error.
From the table, d(a_0, a_1) measures the total error between the data y_i and the prediction a_1 x_i + a_0 for i = 1, …, n. This is also a standard minimization problem: we look for a pair (â_0, â_1) at which the function d(a_0, a_1) is a minimum.
• The choice of the Euclidean norm for the measure of error gives rise to the term 'least squares'. It ensures that d(a_0, a_1) is a differentiable function.
for all x ∈ R^n. Here ||·|| denotes the Euclidean norm and the superscript T denotes transposition of matrices and vectors.
• The expression given by (6) is in fact the square of the Euclidean norm, ||b − Ax||^2, and we use the fact that ||b − Ax|| is minimized if and only if ||b − Ax||^2 is minimized.
for all x ∈ R^n.
Geometrical Illustration:
A least-squares solution x̂ can be found based on geometrical observations.
To set
$$\operatorname{col}(A) = \{Ax : x \in \mathbb{R}^n\},$$
$$A\hat{x} = \hat{b}. \tag{9}$$
• It is desirable to find a solution x̂ of (9) without having to find the projection b̂.
Consider the following diagram
[Figure: the vector b, its projection b̂ = Proj b onto col(A), and the difference b − b̂.]
(b − Ax̂) · (each column of A) = 0.
$$A^T(b - A\hat{x}) = 0.$$
$$A^T A\hat{x} = A^T b. \tag{10}$$
• The least-squares solution x̂ may not be unique; however, the projection b̂ is unique, and (9) and (10) have the same set of solutions.
Example 24. Find the line y = a0 + a1 x that best fits the data points (2, 1), (5, 2), (7, 3) and
(8, 3).
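In this case
$$A = \begin{bmatrix} 1 & 2\\ 1 & 5\\ 1 & 7\\ 1 & 8 \end{bmatrix}, \qquad b = \begin{bmatrix} 1\\ 2\\ 3\\ 3 \end{bmatrix}, \qquad x = \begin{bmatrix} a_0\\ a_1 \end{bmatrix}.$$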
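The normal equations (10) for this example can be solved in a few lines. The sketch below uses the backslash operator on A^T A; the variable names are chosen here for illustration.

```matlab
A = [1 2; 1 5; 1 7; 1 8];     % columns: 1 and x
b = [1; 2; 3; 3];             % observed y values

xhat = (A' * A) \ (A' * b);   % solve the normal equations A'A x = A'b
a0 = xhat(1);
a1 = xhat(2);
fprintf('y = %.4f + %.4f x\n', a0, a1);   % prints y = 0.2857 + 0.3571 x
```

Here A^T A = [4 22; 22 142] and A^T b = (9, 57)^T, so a hand computation gives a_0 = 2/7 and a_1 = 5/14. The command A\b returns the same least-squares solution directly.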
[Figure: the data points of Example 24 plotted with x on the horizontal axis and y on the vertical axis.]
Example 25. Find the quadratic curve that best fits the data points (2, 1), (−1, 5), (6, 2), (4, −1)
The quadratic curve is of the form
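$$y = a_2 x^2 + a_1 x + a_0$$
namely
$$\begin{bmatrix} 4 & 11 & 57\\ 11 & 57 & 287\\ 57 & 287 & 1569 \end{bmatrix} \begin{bmatrix} a_0\\ a_1\\ a_2 \end{bmatrix} = \begin{bmatrix} 7\\ 5\\ 65 \end{bmatrix}.$$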
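As a check, the system above can be assembled and solved numerically. The sketch below builds the design matrix with columns 1, x and x^2, forms the normal equations, and compares the result with polyfit (which returns the coefficients in decreasing powers of x).

```matlab
x = [2; -1; 6; 4];
y = [1; 5; 2; -1];

A = [ones(4,1), x, x.^2];   % columns: 1, x, x^2
N = A' * A;                 % normal-equation matrix
r = A' * y;                 % right-hand side
a = N \ r;                  % a = [a0; a1; a2]

p = polyfit(x, y, 2);       % p = [a2, a1, a0]
disp(N); disp(r');
fprintf('a0 = %.4f, a1 = %.4f, a2 = %.4f\n', a(1), a(2), a(3));
```

The displayed N and r reproduce the entries of the 3 × 3 system given in the example.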
Remark 26. In the above example, we use non-linear best-fit functions. Why do we call it a linear least-squares problem? The best-fit function takes the form of a linear combination of basis functions, and finding the best-fit function means finding the best choice of coefficients.
Example 27. Find the least-squares function of the form $x(t) = a_0 e^{a_1 t}$, t > 0, a_0 > 0, for the data points
$$(t_1, x_1), (t_2, x_2), \dots, (t_n, x_n), \qquad x_1, x_2, \dots, x_n > 0.$$
Let $y(t) = \ln x = \ln a_0 + a_1 t$,
and
$$b_0 = \ln a_0, \qquad b_1 = a_1, \qquad y_1 = \ln x_1, \ \dots, \ y_n = \ln x_n.$$
$$y(t) = b_0 + b_1 t, \qquad t > 0,$$
we obtain the least-squares solution $\hat{b}_0, \hat{b}_1$. The best-fit curve to the original problem is then given by
$$x(t) = \hat{a}_0 e^{\hat{a}_1 t}, \qquad \text{where } \hat{a}_0 = e^{\hat{b}_0}, \ \hat{a}_1 = \hat{b}_1.$$
find a parameter value θ̂ such that the curve y = f(x, θ̂) minimizes the sum of squared errors (SSE):
$$SSE(\theta) = \sum_{i=1}^{n} \left(y_i - f(x_i, \theta)\right)^2. \tag{14}$$
NB: Linear least squares will not work for this kind of problem.
The function SSE(θ) is treated as a smooth function of θ. To find its minimum in R^m, we use calculus to find a critical point:
$$\frac{\partial SSE(\theta)}{\partial \theta_j} = 0, \qquad j = 1, \dots, m. \tag{15}$$
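• Let
$$\Delta y_i = y_i - f\left(x_i, \theta^{(k)}\right), \qquad \Delta\theta_j = \theta_j^{(k+1)} - \theta_j^{(k)}.$$
$$\sum_{i=1}^{n}\sum_{s=1}^{m} J_{is} J_{ij}\, \Delta\theta_s = \sum_{i=1}^{n} J_{ij}\, \Delta y_i, \tag{18}$$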
• The following iteration scheme for the non-linear least-squares method is called the Gauss-Newton method:
There are many other methods for non-linear least-squares problems, aimed at improved efficiency and rates of convergence.
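$$f(x, \theta) = \alpha(x) \cdot \theta,$$
$$f(x_i, \theta) = \alpha(x_i) \cdot \theta = \sum_{j=1}^{m} \alpha_j(x_i)\, \theta_j, \qquad i = 1, \dots, n,$$
and
$$SSE(\theta) = \sum_{i=1}^{n} \left(y_i - \alpha(x_i) \cdot \theta\right)^2.$$
Therefore
$$\frac{\partial SSE}{\partial \theta_j} = \sum_{i=1}^{n} 2\left(y_i - \alpha(x_i) \cdot \theta\right)\left(-\alpha_j(x_i)\right) = -2\sum_{i=1}^{n} \alpha_j(x_i)\left(y_i - \alpha(x_i) \cdot \theta\right), \qquad j = 1, \dots, m. \tag{20}$$
• Let A = (α_j(x_i)); then (20), set equal to zero, can be written in matrix form as
$$A^T y - A^T A\theta = 0,$$
$$A^T A\theta = A^T y,$$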
(1, 4.6), (2, 8.82), (3, 16), (4, 31.3), (5, 58.5)
y = b0 + b1 t,
(1, 1.526) , (2, 2.177), (3, 2.773), (4, 3.444), (5, 4.069).
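The log-transformed fit can be reproduced with polyfit. The sketch below fits the straight line y = b_0 + b_1 t to the logarithms of the data listed above and transforms back to a_0 and a_1; the variable names are chosen here for illustration.

```matlab
t = (1:5)';
x = [4.6; 8.82; 16; 31.3; 58.5];   % data points listed above

p = polyfit(t, log(x), 1);         % straight-line fit to (t, ln x)
b1 = p(1);                         % slope
b0 = p(2);                         % intercept

a0 = exp(b0);                      % back-transform: a0 = e^(b0)
a1 = b1;                           % a1 = b1
fprintf('x(t) = %.3f * exp(%.3f t)\n', a0, a1);
```

Note that this minimises the squared error in ln x rather than in x, so the result generally differs slightly from a direct non-linear least-squares fit.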
$$f(t, a) = a_0 e^{a_1 t}, \qquad a = (a_0, a_1), \qquad t = (1, 2, \dots, 5)^T$$
$$\begin{bmatrix} y_1 - a_0^{(k)} e^{a_1^{(k)} t_1}\\ y_2 - a_0^{(k)} e^{a_1^{(k)} t_2}\\ y_3 - a_0^{(k)} e^{a_1^{(k)} t_3}\\ y_4 - a_0^{(k)} e^{a_1^{(k)} t_4}\\ y_5 - a_0^{(k)} e^{a_1^{(k)} t_5} \end{bmatrix}$$
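For k = 0:
$$\begin{bmatrix} a_0^{(1)}\\ a_1^{(1)} \end{bmatrix} = \begin{bmatrix} 1\\ 1 \end{bmatrix} + \begin{bmatrix} 25472.8 & 123383\\ 123383 & 602214 \end{bmatrix}^{-1} \begin{bmatrix} 2.718 & 7.389 & 20.086 & 54.598 & 148.413\\ 2.718 & 14.778 & 60.257 & 218.393 & 742.066 \end{bmatrix} \begin{bmatrix} 1.882\\ 1.431\\ -4.086\\ -23.298\\ -89.913 \end{bmatrix} = \begin{bmatrix} 1.386\\ 0.801 \end{bmatrix}.$$
For k = 1:
$$\begin{bmatrix} a_0^{(2)}\\ a_1^{(2)} \end{bmatrix} = \begin{bmatrix} 1.386\\ 0.801 \end{bmatrix} + \begin{bmatrix} 3797.53 & 24873.7\\ 24873.7 & 166018 \end{bmatrix}^{-1} \begin{bmatrix} 2.228 & 4.965 & 11.064 & 24.653 & 54.934\\ 3.089 & 13.768 & 46.017 & 136.716 & 380.807 \end{bmatrix} \begin{bmatrix} 1.511\\ 1.936\\ 0.661\\ -2.879\\ -17.66 \end{bmatrix} = \begin{bmatrix} 2.104\\ 0.651 \end{bmatrix}.$$
For k = 2:
$$\begin{bmatrix} a_0^{(3)}\\ a_1^{(3)} \end{bmatrix} = \begin{bmatrix} 2.428\\ 0.635 \end{bmatrix}.$$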
With a tolerance of three decimal places, the desired accuracy is achieved at the 4th iteration, giving
a_0 ≈ 2.431, a_1 ≈ 0.636
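The iterations above can be automated. The following is a minimal Gauss-Newton sketch for the model f(t, a) = a_0 e^(a_1 t), started from the initial guess (1, 1) used in the text. The stopping tolerance and the iteration cap are assumptions made here.

```matlab
t = (1:5)';
y = [4.6; 8.82; 16; 31.3; 58.5];
a = [1; 1];                                    % initial guess (a0, a1)

for k = 1:50                                   % iteration cap (assumed)
    r = y - a(1)*exp(a(2)*t);                  % residuals y_i - f(t_i, a)
    J = [exp(a(2)*t), a(1)*t.*exp(a(2)*t)];    % Jacobian of f w.r.t. (a0, a1)
    step = (J'*J) \ (J'*r);                    % solve J'J * step = J'r
    a = a + step;                              % update the parameters
    if norm(step) < 1e-6                       % stop when the update is small
        break;
    end
end
fprintf('a0 = %.3f, a1 = %.3f after %d iterations\n', a(1), a(2), k);
```

Printing a inside the loop reproduces the intermediate iterates shown above, up to rounding.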
[Figure: plot over t = 1, …, 5 with the vertical axis running from 10 to 70.]
$$\frac{dx}{dt} = f(x, \theta), \qquad x \in \mathbb{R}^d, \quad t \in [0, t_{\max}], \tag{21}$$
$$x(0) = x_0. \tag{22}$$
Here, θ ∈ R^m is an m-dimensional parameter and [0, t_max] is the finite time interval in which the model is considered. The data is often given at discrete observation time
$$x(t, \theta) \approx \left(x(t_1, \theta), x(t_2, \theta), \dots, x(t_p, \theta)\right)^T.$$
From the fundamental theory of differential equations we know that, if the vector field f(x, θ) is a smooth function (having continuous partial derivatives) with respect to (x, θ), then the solution x(t, θ) depends on θ with the same order of smoothness as f. In the above notation, the dependence of the solution on the initial condition x_0 is understood and suppressed. We keep x_0 fixed and discuss fitting the parameter θ. In many applications, the initial conditions are not always known and need to be fitted. Since the solution x(t, x_0, θ) is a diffeomorphism (see footnote 2) with respect to the initial conditions x_0, we can consider x_0 as part of the parameter θ. We denote the data points as
$$y = \left(g(x^{(1)}), g(x^{(2)}), \dots, g(x^{(p)})\right), \qquad x^{(i)} \in \mathbb{R}^d.$$
The sum of squared errors (SSE) between the solution and the data can be measured by
$$SSE(\theta) = d\left(g(x(t, \theta)), y\right)^2 = \sum_{i=1}^{p} \left\| g(x(t_i, \theta)) - g\left(x^{(i)}\right) \right\|^2. \tag{23}$$
Remark 29. It is important to measure the differences in quantities of the same type. Since the data is given as the observable quantities g(x), we need to compare the observable part of the model solution g(x(t, θ)) with the data. The expression ∥g(x) − g(y)∥ is the Euclidean norm of the n-dimensional vector g(x) − g(y); in its discrete form it is written as
$$\| g(x) - g(y) \|^2 = \sum_{i=1}^{n} | g_i(x) - g_i(y) |^2.$$
The main idea in using the least-squares fitting is to obtain the value θ̂ of the model
parameter θ such that the value SSE(θ ) is a minimum. Such a problem is clearly a
non-linear least squares method, since the dependence of the solution x (t, θ ) on the
2 A one-to-one continuously differentiable mapping f : M → N of a differentiable manifold M (e.g. of a domain in a Euclidean space) into a differentiable manifold N for which the inverse mapping is also continuously differentiable.
1. lsqcurvefit: This function requires the following inputs: the model equation, the initial guess for the parameters to be fitted, the time points and the data points. It then solves the non-linear least-squares problem directly.
3. fminsearch: This function needs the sum of squares of errors SSE(θ) between the model output and the data. With the initial guess θ_0, the model can be solved numerically to produce a value for SSE(θ_0). The function takes the least-squares error function and the initial guess of the parameter value, and uses a direct search routine to find the minimum value of the least-squares error. To ensure that the minimum value returned by fminsearch is not just a local minimum, the process is repeated with several choices of the initial guess.
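The restart strategy described for fminsearch can be sketched as follows. The example reuses the exponential model a_0 e^(a_1 t) and the five data points from the Gauss-Newton example; the three starting points are hypothetical.

```matlab
t = (1:5)';
y = [4.6; 8.82; 16; 31.3; 58.5];
sse = @(a) sum((y - a(1)*exp(a(2)*t)).^2);   % SSE as a function of a = [a0 a1]

starts = [1 1; 2 0.5; 3 0.7];                % hypothetical initial guesses
best = inf;
for s = 1:size(starts, 1)
    [a, fval] = fminsearch(sse, starts(s, :));   % direct search from this start
    if fval < best                               % keep the smallest SSE found
        best = fval;
        abest = a;
    end
end
fprintf('a0 = %.3f, a1 = %.3f, SSE = %.4f\n', abest(1), abest(2), best);
```

If different starting points lead to clearly different values of fval, that is a sign that some runs stopped at a local minimum.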
polyfit function
MATLAB calls curve fitting with a polynomial 'polynomial regression'. The function polyfit(x,y,m) returns a vector of (m + 1) coefficients, a_i, that represent the best-fit polynomial of degree m for the (x_i, y_i) set of data points. The coefficient order corresponds to decreasing powers of x; that is,
$$y_c = a_1 x^m + a_2 x^{m-1} + a_3 x^{m-2} + \dots + a_m x + a_{m+1}.$$
As discussed in the previous section, MATLAB measures the precision of the fit using a function named MSE, which calculates the mean squared error (MSE), defined as
$$MSE = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - y_{c,i}\right)^2, \tag{24}$$
where n is the number of data points.
Example 30. In this example we obtain the best-fit polynomial for the approximating function for orders 2, 3, 4 and 5. The data is given as x = −10 : 2 : 10 or x2 = −10 : 0.5 : 10 and
y = (−980, −620, −70, 80, 100, 90, 0, −80, −90, 10, 220)
for n = 2:5
    fprintf(' %d %8.2f \n',n,MSE(n))
end
From the values obtained for MSE, MSE decreases as the order of the fitted polynomial
is increased.
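Only the last lines of the listing appear above. A minimal sketch of how the MSE values could be computed for this data is given below; storing them in an array MSE indexed by the polynomial order is an assumption made here to match the printing loop.

```matlab
x = -10:2:10;
y = [-980 -620 -70 80 100 90 0 -80 -90 10 220];

MSE = zeros(1, 5);                 % MSE(n) holds the error of the order-n fit
for n = 2:5
    coef = polyfit(x, y, n);       % coefficients in decreasing powers of x
    yc = polyval(coef, x);         % fitted values at the data points
    MSE(n) = mean((y - yc).^2);    % mean squared error, equation (24)
end

for n = 2:5
    fprintf(' %d %8.2f \n', n, MSE(n));
end
```

Evaluating polyval on the finer grid x2 gives a smooth curve for plotting each fitted polynomial against the data.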
Cubic splines
Given a set of n data points, suppose that an mth-degree polynomial is selected as the approximating curve and that this approximating curve produces values that are not allowed. For example, suppose it is known that a particular property represented by the approximating curve (such as absolute pressure or absolute temperature) must be positive, and the approximating function produces values that are negative. In this case, the approximating function is not satisfactory. The method of cubic splines eliminates this problem. Given a set of (n + 1) data points (x_i, y_i), i = 1, 2, …, (n + 1), the method of cubic splines develops a set of n cubic functions such that y(x) is represented by a different cubic in each of the n intervals and the set of cubics passes through the (n + 1) data points. This is accomplished by forcing the slopes and the curvatures to be the same for each pair of cubics that join at a data point.
Remark 31. The curvature K is given by
$$K = \pm\frac{\dfrac{d^2y}{dx^2}}{\left[1 + \left(\dfrac{dy}{dx}\right)^2\right]^{3/2}}.$$
Consider the diagram below, showing the two adjacent intervals in a cubic spline
curve fitting scheme.
$$y(x) = A_i + B_i(x - x_i) + C_i(x - x_i)^2 + D_i(x - x_i)^3.$$
From the above equations, we have fewer equations than unknowns, hence the need to make additional assumptions. The values of $\frac{d^2y}{dx^2}$ at x_1 and at x_{n+1} must be assumed. The following alternatives exist:
In MATLAB the syntax for the cubic spline function is as follows:
yy = spline(xi,yi,xx),
where (xi, yi) is the given set of data points and yy is the value of y at xx. The spline function determines the four cubic coefficients for each section in the given data and evaluates yy by the cubic spline method.
Remark 32. Using the function interp1 in MATLAB gives the same results as spline when the 'spline' method is specified for interpolation. The syntax for interpolating by the spline method is
yi = interp1(x,y,xi,'spline')
Example 33. Consider the data given by distance=[0.52:0.3:4.12] and pressure=[165.5, 96.5, 69.0, 52.4, 37.2, 27.6, 21.4, 17.2, 13.8, 11.7, 10.3, 9.0, 7.2]. We use the two methods spline and interp1 for this set of data points and compare the results.
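A short sketch of this comparison is given below. The finer grid xx of query points is a hypothetical choice made here; the data are those of Example 33.

```matlab
distance = 0.52:0.3:4.12;
pressure = [165.5 96.5 69.0 52.4 37.2 27.6 21.4 17.2 13.8 11.7 10.3 9.0 7.2];

xx = 0.52:0.05:4.12;                                % hypothetical query points
yy1 = spline(distance, pressure, xx);               % cubic spline interpolation
yy2 = interp1(distance, pressure, xx, 'spline');    % same method via interp1

maxdiff = max(abs(yy1 - yy2));                      % largest disagreement
fprintf('largest difference between the two methods: %.2e\n', maxdiff);
```

The curve can be displayed together with the data using plot(distance, pressure, 'o', xx, yy1, '-').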
Example 34. Consider the following data on a flu infection in a school.
Day                              3    4    5    6    7    8    9   10   11   12   13   14
Number of infected individuals  25   75  227  296  258  236  192  126   71   28   11    7
Let us use the function fminsearch to fit this data to solutions of the mathematical model for flu given by the system of ODEs
$$\frac{dS}{dt} = -\beta S I,$$
$$\frac{dI}{dt} = \beta S I - \alpha I.$$
It is possible to fit the parameters α and β and the initial conditions. We can pre-estimate α from the duration of infectiousness and the two initial conditions from the data above. We take I(3) = 25 and S(3) = 738. Since the duration of infectiousness is 2-4 days, we may take α = 0.3, together with β = 0.0025, as initial values. We use MATLAB to fit the data to the solutions of the model.
Listing 26: Using fminsearch to fit the parameters of the ODE model
function ODE_model_fitting

close all
clc
Filename = 'Data';
fludata = xlsread(Filename);
format long                  % specifying higher precision
tdata = fludata(:,1);        % array with t-coordinates from the data
qdata = fludata(:,2);        % array with y-coordinates, i.e. the number of infections

tforward = 3:0.01:14;        % t mesh for the solution of the ODEs
tmeasure = (1:100:1101)';    % indices of tforward at the observation days 3,...,14

a = 0.3;
b = 0.0025;                  % initial values of parameters to be fitted

    function dy = model_1(t,y,k)    % the system of ODEs
        a = k(1);
        b = k(2);
        dy = zeros(2,1);
        dy(1) = - b * y(1) * y(2);
        dy(2) = b * y(1) * y(2) - a * y(2);
    end

    function error_in_data = moder(k)    % computing the error in the data
        [T, Y] = ode23s(@(t,y)(model_1(t,y,k)),tforward,[738.0 25.0]);
        q = Y(tmeasure(:),2);
        error_in_data = sum((q - qdata).^2);    % computes SSE
    end

k = [a b];                   % main routine; assigns initial values of parameters
[T, Y] = ode23s(@(t,y)(model_1(t,y,k)),tforward,[738.0 25.0]);

yint = Y(tmeasure(:),2);
figure(1)
subplot(1,2,1);
plot(tdata,qdata,'r*');
hold on
plot(tdata,yint,'b-');
xlabel('time in days');
ylabel('Number of cases');
title('Fitting before optimizing the parameters')
axis([3 14 0 500]);
grid on

[k,fval] = fminsearch(@moder,k);    % minimization routine
[T, Y] = ode23s(@(t,y)(model_1(t,y,k)),tforward,[738.0 25.0]);
yint = Y(tmeasure(:),2);            % computing the y-coordinates of the fitted solution
k
subplot(1,2,2)
plot(tdata,qdata,'r*');
hold on
plot(tdata,yint,'b-');
xlabel('time in days');             % plotting final fit
ylabel('Number of cases');
title('Fitting after optimizing the parameters')
axis([3 14 0 500]);
grid on
end
(i) Random Sampling: Monte Carlo methods rely on the generation of random num-
bers to simulate various scenarios. These random samples are drawn from prob-
ability distributions that represent the uncertainties or variables in the problem
being studied.
(ii) Integration: One of the primary applications of Monte Carlo methods is in ap-
proximating definite integrals of complex functions. Instead of using traditional
(iii) Importance Sampling: To improve the efficiency of Monte Carlo simulations, im-
portance sampling is often employed. This technique involves biased sampling
to focus on regions of the problem space that have a significant impact on the final
result, rather than uniformly sampling the entire space.
(iv) Markov Chain Monte Carlo (MCMC): MCMC methods are a specialized class
of Monte Carlo techniques that use Markov chains to generate correlated ran-
dom samples. These methods are particularly useful when dealing with high-
dimensional spaces and are commonly used in Bayesian statistics and machine
learning for parameter estimation and inference.
Some of the applications of Monte Carlo methods include, but are not limited to:
(i) Simulation: Monte Carlo simulations are extensively used to model and analyze
complex systems, such as financial markets, traffic flow, weather patterns, and
nuclear reactions. These simulations can help predict outcomes and assess risk in
real-world scenarios.
(iv) Uncertainty Analysis: Monte Carlo methods are essential for quantifying uncer-
tainty in complex systems. By running multiple simulations with randomly var-
ied inputs, one can assess the uncertainty and sensitivity of the model’s outputs.
(v) Gaming and Gambling: The original inspiration for Monte Carlo methods came
from games of chance, and these methods continue to be used in gambling and
casino industries for statistical analysis and predicting outcomes.
Monte Carlo methods have proven to be versatile and powerful tools for tackling com-
plex problems that defy traditional analytical approaches. However, they require care-
ful consideration of sample sizes, convergence criteria, and the underlying probability
distributions to ensure accurate results and reliable conclusions. As computational
power continues to advance, Monte Carlo methods will likely remain a crucial compo-
nent in addressing real-world challenges and refining our understanding of complex
systems.
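$$\operatorname{Var} X := E\left[(X - EX)^2\right] = E\left[X^2\right] - \left(E[X]\right)^2.$$
$$\operatorname{Cov}(X, Y) := E\left[(X - EX)(Y - EY)\right],$$
matrix as
$$\operatorname{Cov} X := E\left[(X - EX)(X - EX)^T\right] = \begin{bmatrix} \operatorname{Var} X_1 & \operatorname{Cov}(X_1, X_2) & \cdots & \operatorname{Cov}(X_1, X_n)\\ \operatorname{Cov}(X_2, X_1) & \operatorname{Var} X_2 & \cdots & \operatorname{Cov}(X_2, X_n)\\ \vdots & \vdots & \ddots & \vdots\\ \operatorname{Cov}(X_n, X_1) & \operatorname{Cov}(X_n, X_2) & \cdots & \operatorname{Var} X_n \end{bmatrix}.$$
Since Cov(X_j, X_i) = Cov(X_i, X_j), Cov X is symmetric for any X. The covariance between random vectors X ∈ R^n and Y ∈ R^m is the n × m matrix
$$\operatorname{Cov}(X, Y) := E\left[(X - EX)(Y - EY)^T\right] = \begin{bmatrix} \operatorname{Cov}(X_1, Y_1) & \operatorname{Cov}(X_1, Y_2) & \cdots & \operatorname{Cov}(X_1, Y_m)\\ \vdots & \vdots & \ddots & \vdots\\ \operatorname{Cov}(X_n, Y_1) & \operatorname{Cov}(X_n, Y_2) & \cdots & \operatorname{Cov}(X_n, Y_m) \end{bmatrix}.$$
$$\theta := E\,h(X).$$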
Remark 35. In practice, this is usually too complex to evaluate analytically. The computational costs of deterministic methods for numerical integration typically increase exponentially with the dimension m of X. By contrast, the Monte Carlo method for computing E h(X) converges at a rate that is independent of m, which makes Monte Carlo methods attractive tools for complex, high-dimensional systems.
The integrand $e^{-x^3}$ does not have a closed-form antiderivative, hence we can use numerical techniques to evaluate the integral. Following the approach of Riemann integration, we choose evenly spaced points x_1, …, x_K over the interval [0, 1], obtain the corresponding function values f(x_1), …, f(x_K) and use
$$\frac{1}{K}\sum_{i=1}^{K} f(x_i),$$
where U is a uniform random variable over the interval [0, 1]. Hence the integral is the expected value of the random variable $e^{-U^3}$, which implies that evaluating the integral is the same as estimating the expected value. So we can generate independent, identically distributed (iid) random variables U_1, …, U_K ∼ Uni[0, 1] and then compute
$$W_1 = e^{-U_1^3}, \ \dots, \ W_K = e^{-U_K^3},$$
and use the sample average
$$\bar{W}_K = \frac{1}{K}\sum_{i=1}^{K} W_i$$
as the numerical evaluation of $\int_0^1 e^{-x^3}\,dx$.
By the Law of Large Numbers (see footnote 3),
$$\bar{W}_K \xrightarrow{P} E(W_i) = E\left(e^{-U_i^3}\right) = \int_0^1 e^{-x^3}\,dx,$$
so this is an alternative numerical method that is statistically consistent. In the above example, the integral can be written as
$$I = \int f(x)\,p(x)\,dx,$$
The alternative numerical method to evaluate the above integral is to generate iid X_1, …, X_N ∼ p, that is N data points, and then use the sample average
$$\bar{I}_N = \frac{1}{N}\sum_{i=1}^{N} f(X_i).$$
This method of evaluating integrals via simulating random points is called Monte Carlo Simulation.
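Both approaches to this integral can be tried side by side. The sketch below compares a Riemann sum on evenly spaced midpoints with the Monte Carlo sample average for uniform samples; the number of points K is an arbitrary choice made here.

```matlab
K = 1e5;                        % number of points (arbitrary choice)
f = @(x) exp(-x.^3);

xmid = ((1:K) - 0.5) / K;       % evenly spaced midpoints in [0, 1]
riemann = mean(f(xmid));        % Riemann-sum approximation

U = rand(K, 1);                 % iid Uniform[0, 1] samples
W = f(U);
mc = mean(W);                   % Monte Carlo estimate (sample average)
se = std(W) / sqrt(K);          % estimated standard error of mc

fprintf('Riemann sum: %.5f\n', riemann);
fprintf('Monte Carlo: %.5f (standard error %.5f)\n', mc, se);
```

The Monte Carlo value changes from run to run, typically by an amount comparable to the printed standard error.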
Remark 36. A crucial feature of Monte Carlo simulation is that its statistical theory is rooted in the theory of the sample average, where we use the sample average as an estimator of the expected value. The bias and the variance of the estimator are the key quantities in evaluating the quality of an estimator.
Now since we are using the sample average as an estimator of the expected value, so
3 The law of large numbers states that an observed sample average from a large sample will be close
to the true population average and that it will get closer the larger the sample.
$$\operatorname{Var}\left(\bar{I}_N\right) = \frac{1}{N}\operatorname{Var}\left(f(X_1)\right) = \frac{1}{N}\Big[E\left(f^2(X_1)\right) - \underbrace{\left(E f(X_1)\right)^2}_{I^2}\Big] = \frac{1}{N}\left[\int f^2(x)\,p(x)\,dx - I^2\right].$$
Hence the variance contains two components: $\int f^2(x)\,p(x)\,dx$ and $I^2$.
The quantity I is fixed, and we need to choose the number of random points N and the sampling distribution p. If we change the sampling distribution p, the function f will also change.
For instance, we have seen how to evaluate the integral $\int_0^1 e^{-x^3}\,dx$ using uniform random variables. We can also generate iid B_1, …, B_K ∼ Beta(2, 2), that is K points from the beta distribution (see footnote 4) Beta(2, 2), whose PDF is
$$p_{\mathrm{Beta}(2,2)}(x) = 6x(1 - x).$$
Remark 37. It is important to note that different choices of p lead to different variances of the estimator. The expectation is always fixed to be I, so the second part of the variance remains the same, but the first part $\int f^2(x)\,p(x)\,dx$ depends on the choice of p and f. This brings about the issue of importance sampling.
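The effect of the sampling distribution can be explored numerically. The sketch below estimates the same integral with Beta(2, 2) samples, so the function being averaged becomes e^(-x^3) / (6x(1 − x)). Generating the samples as the median of three uniform random numbers is a choice made here to avoid toolbox functions; it relies on the fact that the middle order statistic of three uniforms follows Beta(2, 2).

```matlab
K = 1e5;                          % number of samples (arbitrary choice)
f = @(x) exp(-x.^3);              % original integrand
p = @(x) 6*x.*(1 - x);            % Beta(2,2) density

B = median(rand(3, K));           % column-wise median of 3 uniforms ~ Beta(2,2)
w = f(B) ./ p(B);                 % new function f/p evaluated at the samples
est = mean(w);                    % sample average under the Beta(2,2) sampling

fprintf('Estimate with Beta(2,2) samples: %.4f\n', est);
```

For this p the first variance component is infinite, because e^(-2x^3) / (6x(1 − x)) is not integrable near 0 and 1. The estimate is still consistent, but it fluctuates more from run to run than the uniform-sample estimate, which illustrates why the choice of p matters.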
4 The probability density function (PDF) of the beta distribution for 0 ≤ x ≤ 1 and parameters α, β > 0 is $\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$, where $B(\alpha,\beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}$ and Γ is the gamma function.
Monte-Carlo framework
STEP I: Statistical properties of the Monte-Carlo estimators
Since X_1, …, X_n have the same distribution as X, the sample average θ̄_n is an unbiased estimator of θ:
$$E\,\bar{\theta}_n = \frac{1}{n}\sum_{i=1}^{n} E\,h(X_i) = \frac{n\,E\,h(X)}{n} = \theta. \tag{26}$$
We assume that E|h(X)| < ∞, so the strong Law of Large Numbers implies that θ̄_n is a consistent estimator of θ, that is
so h(X_1), h(X_2), … is a sequence of iid random variables with finite mean θ and finite variance σ^2. Therefore the Central Limit Theorem (see footnote 5) gives the asymptotic distribution of θ̄_n:
$$\frac{\sqrt{n}\left(\bar{\theta}_n - \theta\right)}{\sigma} \Rightarrow N(0, 1) \quad \text{as } n \to \infty. \tag{27}$$
Here ⇒ denotes convergence in distribution, and N(0, 1) is a standard normal random variable.
When selecting the sample, it is important to choose it so that our Monte-Carlo estimator is as accurate as possible. Let us define the confidence level (see footnote 6) α ∈ (0, 1) and an
5 Let {X_1, …, X_n} be a sequence of iid random variables having a distribution with expected value µ and finite variance σ^2. Suppose we are interested in the sample average X̄ = (X_1 + … + X_n)/n; then by the law of large numbers, the sample averages converge almost surely to the expected value µ as n → ∞.
6 Typical values of α are 0.05 and 0.01.
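$$P\left(|\bar{\theta}_n - \theta| \le \epsilon\right) = 1 - \alpha.$$
Using (27) gives us an approximate (see footnote 7) answer, because $\frac{\sqrt{n}(\bar{\theta}_n - \theta)}{\sigma}$ is asymptotically distributed as a standard normal random variable. For large n we have
$$\begin{aligned} P\left(|\bar{\theta}_n - \theta| \le \epsilon\right) &= P\left(\frac{\sqrt{n}\,|\bar{\theta}_n - \theta|}{\sigma} \le \frac{\sqrt{n}\,\epsilon}{\sigma}\right),\\ &\approx P\left(|N(0,1)| \le \frac{\sqrt{n}\,\epsilon}{\sigma}\right),\\ &= P\left(N(0,1) \le \frac{\sqrt{n}\,\epsilon}{\sigma}\right) - P\left(N(0,1) \le -\frac{\sqrt{n}\,\epsilon}{\sigma}\right),\\ &= 2\,P\left(N(0,1) \le \frac{\sqrt{n}\,\epsilon}{\sigma}\right) - 1, \end{aligned}$$
where the last line follows from the symmetry and normalization of the standard normal PDF.
To guarantee that $P\left(|\bar{\theta}_n - \theta| \le \epsilon\right) \approx 1 - \alpha$ we choose n such that
$$2\,P\left(N(0,1) \le \frac{\sqrt{n}\,\epsilon}{\sigma}\right) - 1 = 1 - \alpha \iff \Phi\left(\frac{\sqrt{n}\,\epsilon}{\sigma}\right) = 1 - \frac{\alpha}{2},$$
where Φ is the standard normal cumulative distribution function (CDF). With the definition
$$z_{1-\alpha/2} := \Phi^{-1}\left(1 - \frac{\alpha}{2}\right),$$
the required sample size is
$$n = \left(\frac{\sigma\, z_{1-\alpha/2}}{\epsilon}\right)^2.$$
In practice, we usually do not know the variance σ^2 of h(X); it can, however, be estimated by the sample variance
7 It is approximate because the Central Limit Theorem only gives the asymptotic distribution of θ̄_n in the limit of large n. How large must n be before θ̄_n starts to 'look normal'? It depends on 'how normal' the distribution of h(X) is, but the typical rule of thumb is that n should be at least 30.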
$$\bar{\sigma}_n^2 := \frac{1}{n-1}\sum_{i=1}^{n}\left(h(X_i) - \bar{\theta}_n\right)^2.$$
This gives a two-stage procedure for deciding how large a sample size to use:
1. Choose a pilot sample size n_0 (say 50 or 100), generate X_1, …, X_{n_0} iid from the distribution of X, and estimate σ by σ̄_{n_0}.
2. Set $n = \left(\dfrac{\bar{\sigma}_{n_0}\, z_{1-\alpha/2}}{\epsilon}\right)^2$.
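The two-stage procedure can be sketched as follows for h(X) = e^(-X^3) with X uniform on [0, 1], the example used earlier. The values of α, the tolerance and the pilot size are assumptions made here, and the normal quantile is computed from erfcinv so that no toolbox function is needed.

```matlab
h = @(x) exp(-x.^3);          % example integrand, X ~ Uniform[0, 1]
alpha = 0.05;                 % assumed level (95% confidence)
epsTol = 1e-3;                % assumed tolerance epsilon

% z_{1-alpha/2} = Phi^{-1}(1 - alpha/2), written in terms of erfcinv
z = -sqrt(2) * erfcinv(2*(1 - alpha/2));

n0 = 100;                             % stage 1: pilot sample size
sigma0 = std(h(rand(n0, 1)));         % pilot estimate of sigma
n = ceil((sigma0 * z / epsTol)^2);    % stage 2: required sample size

Y = h(rand(n, 1));                    % main sample
theta = mean(Y);                      % Monte Carlo estimate
halfwidth = std(Y) * z / sqrt(n);     % confidence-interval half-width
fprintf('n = %d, estimate = %.4f +/- %.4f\n', n, theta, halfwidth);
```

The printed half-width should be close to the requested tolerance; it differs slightly because the pilot estimate of σ is itself random.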
Now, given α and n, for what tolerance ϵ > 0 can we be 100(1 − α)% confident that θ lies in the interval [θ̄_n − ϵ, θ̄_n + ϵ]?
Using the results obtained from Step II,
$$\epsilon = \frac{\sigma\, z_{1-\alpha/2}}{\sqrt{n}},$$
and because σ is usually unknown, we use σ̄_n when constructing the confidence intervals.
Convergence rate
The confidence interval half-width $\epsilon = \frac{\sigma\, z_{1-\alpha/2}}{\sqrt{n}}$ gives a measure of the Monte-Carlo convergence rate. For a fixed α, ϵ is directly proportional to the standard deviation σ of h(X), meaning Monte-Carlo works better for problems with less variability.
Challenges
(i) How to generate the random samples X_1, …, X_n iid from the distribution of X.
$$\bar{\theta}_n := \frac{1}{n}\sum_{i=1}^{n} h(X_i),$$
$$\bar{\sigma}_n^2 := \frac{1}{n-1}\sum_{i=1}^{n}\left(h(X_i) - \bar{\theta}_n\right)^2.$$
Remark 38. To address shortcomings in the sampling procedures, Markov chains are used, giving what we call Markov Chain Monte Carlo (MCMC). MCMC is powerful because Markov chains ensure that consistent samples are drawn from any given distribution. Metropolis-Hastings algorithms further ensure faster convergence and correct the biases.