FOXES TEAM
Tutorial on Numerical Analysis with Optimiz.xla
Optimization, Nonlinear Fitting and Equations Solving
TUTORIAL ON NUMERICAL ANALYSIS FOR OPTIMIZ.XLA
Index
OPTIMIZ.XLA
  Optimiz.xla installation
    How to install
    How to uninstall
Optimization
  Optimization "on site"
  Optimization strategy
  Optimization algorithms
  Algorithms Implemented In This Addin
  The starting point
  Optimization Macros
    Derivatives (Gradient)
    Optimization Macros with Derivatives
    The Input Menu Box
    Optimization Macros without Derivatives
  Examples Of Uni-variate Functions
    Example 1 (Smooth function)
    Example 2 (Many local minima)
    Example 3 (The saw-ramp)
    Example 4 (Stiff function)
    Example 5 (The orbits)
  Examples of bi-variate functions
    Example 1 (Peak and Pit)
    Example 2 (Parabolic surface)
    Example 3 (Super parabolic surface)
    Example 4 (The trap)
    Example 5 (The eye)
    Example 6 (Four Hill)
    Example 7 (Rosenbrock's parabolic valley)
    Example 8 (Nonlinear Regression with Absolute Sums)
    Example 9 (The ground fault)
    Example 10 (Brown badly scaled function)
    Example 11 (Beale function)
  Examples of multivariate functions
    Example 1 (Splitting function method)
    Example 2 (The gradient condition)
    Example 3 (Production)
    Example 4 (Paraboloid 3D)
  LP - Linear programming
    LP - Linear programming
  Optimization with Linear Constraints
    NLP with linear constraints
Credits
References
Chapter 1
About this tutorial
OPTIMIZ.XLA
OPTIMIZ for Microsoft Excel contains macros for optimizing multivariable functions. The add-in also contains several routines for nonlinear regression and nonlinear equation solving, two important tasks closely related to optimization.
The main purpose of this document is to show how to work with the Optimiz.xla add-in for solving non-linear regression and optimization problems. Of course it touches on mathematics, statistics and numerical calculus, but this is not a mathematics or statistics book. Therefore, you will rarely find theorems and proofs. You will find, on the contrary, many examples that explain, step by step, how to reach the result you need, straight and easy. And, of course, we speak about Microsoft Excel, but this is not a tutorial on Excel. Tips and tricks for general Excel use can be found at many internet sites.
Leonardo Volpi
Optimiz.xla installation
The OPTIMIZ add-in for Excel 2000/XP is a zip file composed of two files:
• OPTIMIZ.XLA Excel add-in file
• OPTIMIZ.HLP Help file
How to install
Unzip and place both files in a directory that is accessible by Excel. The best choice is the AddIns directory, which is normally found following the sequence:
Local Disk (C: or something else) > Documents and Settings > (your user name) > Application Data > Microsoft > AddIns
When loaded or saved, the add-in is contained entirely in this directory. Your system is not modified in any other way. If you want to uninstall this package, simply delete the designated files - it's as simple as that!
To install in Excel as a menu item, follow the usual procedure for installing a “*.xla” add-in
to the main menu.
1) Open Excel
2) From the Excel menu toolbar select "Tools" and then select "Add-ins".
3) If “optimization tool” does not appear in the list, Optimiz.xla has not been linked in yet. Select the Browse button on the right side of the list, and the above AddIns directory will appear. (If you loaded it into some other place, you will have to search for it.) Select Optimiz.xla.
4) Once in the Add-ins Manager list, look in the list for “optimization tool” and select it
5) Click OK
After the first installation, OPTIMIZ.xla [1] will be added to the Add-in Manager list as ‘optimization tool’. Your Add-in Manager list will appear differently from the one shown below (on the left side). The lists will be different depending on which foreign language version of Excel you are using and what other tools you are using. When Excel starts, all add-ins checked in the Add-in Manager will be automatically loaded. If you want to stop the automatic loading of OPTIMIZ.xla, simply deselect the check box next to “optimization tool” before closing Excel.
[1] This tutorial has been written for users of the English version of Excel. The illustrations of the appearance of Excel when Optimiz.xla is used are from the Italian version of Excel. These illustrations were not changed, since the version used by the author and the Foxes team is the Italian version.
If the installation is correct, you should see the welcome popup of OPTIMIZ.xla. This appears only when you switch the check box on in the Add-in Manager. When Excel loads OPTIMIZ.xla automatically, this popup remains hidden.
How to uninstall
This package never alters your system files.
If you want to uninstall this package, simply delete the file. Once you have deleted the OPTIMIZ.xla file, to remove the corresponding entry in the Add-in Manager list, follow these steps:
1) Open Excel
2) Select <Addins...> from the <Tools> menu.
3) Once in the Addins Manager, click on ‘optimization tool’.
4) Excel will inform you that the addin is missing and ask you if you want to remove it
from the list. Select "yes".
Chapter 2

Optimization
Optimization "on site"
Optimiz was developed for performing the optimization task directly on a worksheet.
This means that you can define any relationship that you want to optimize, simply by
using the standard Excel built-in functions and your equations that relate them. The
optimization macros will update directly the cells containing the parameters to be
changed and the related variables to be optimized.
Objective function. For example, if you want to search for the minimum of the two-dimensional function

f(x, y) = (x - 0.51)^2 + (y - 0.35)^2

you insert in cell E4 the formula "=(B4-0.51)^2+(C4-0.35)^2". Here the cells B4 and C4 contain the current values of the variables x and y. By changing the values of B4 or C4, the function value in E4 changes accordingly.
Gradient. Some optimization algorithms require the gradient of the function, that is, the derivative with respect to each independent variable. In that case you must also insert the gradient formulas.

grad f = (df/dx , df/dy) = ( 2(x - 0.51) , 2(y - 0.35) )
Constraints. Usually constrained variables have simple bounding constraints (for example, as follows):

Doing an optimization using worksheet cells and cell equations is slower than doing it with VBA subroutines that use the worksheet only for the input data and output parameters. The former method gives considerable flexibility, but is prone to errors. The latter method is inflexible, but errors are much reduced.
Optimization strategy

In numerical analysis the optimization of a function is not a trivial task and there is no single algorithm good for all cases. Each time, we have to study the problem to establish an optimization strategy by choosing:

• The most suitable algorithm
• The most suitable starting point

The algorithm depends strongly on the characteristics of the function that we have to optimize. The choice of the starting point depends on the local behavior of the function near the optimum.
Optimization algorithms
The best optimization algorithm, good for every case, is unknown, and this should be obvious. However, there are several good algorithms suited to large classes of common practical optimization problems, from which one can be selected.

[Example plots: here, methods with gradient are better; here, methods without gradient are better.]

Also, the local behavior near the optimum can favor one type of algorithm over the others. This happens, for example, when there is a narrow extreme point near other local extremes or, at the opposite, when the function has a large flat "valley".

[Example plots: here, methods with gradients are more efficient; here, methods without gradients are able to arrive at a global optimum without hanging up at a local optimum.]
Algorithms Implemented In This Addin
Downhill-Simplex
The Nelder–Mead downhill simplex algorithm is a popular derivative-free
optimization method. It is based on the idea of function comparisons among a
simplex of N + 1 points. Depending on the function values, the simplex is
reflected or shrunk away from the maximum point. Although there are no
theoretical results on the convergence of the algorithm, it works very well on
a wide range of practical problems. It is a good choice when a one-optimum
solution is wanted with minimum programming effort. It can also be used to
minimize functions that are not differentiable, or that we cannot differentiate.
It shows a very robust behavior and converges over a very large set of
starting points. In our experience it is the best general purpose algorithm;
solid as a rock. It's a "jack of all trades”.
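The same simplex idea can be tried outside Excel with any Nelder-Mead implementation. The following minimal Python sketch is only an illustration (it relies on SciPy; it is not the VBA code inside Optimiz.xla) and minimizes a simple two-variable function of the kind used in the worksheet examples:

    from scipy.optimize import minimize

    # Simple smooth objective with its minimum at (0.51, 0.35),
    # the same kind of function used in the worksheet examples.
    def f(v):
        x, y = v
        return (x - 0.51)**2 + (y - 0.35)**2

    # Downhill-Simplex (Nelder-Mead): derivative-free, needs only a starting point.
    res = minimize(f, x0=[0.0, 0.0], method="Nelder-Mead")
    print(res.x)   # approximately [0.51, 0.35]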
Random
This is another derivative-free algorithm. It simply "shoots" a set of random points and takes the best extreme value (max or min). Usually the accuracy is not comparable with that of the other algorithms (only about 5%), and it also requires considerable extra effort and time. On the other hand, it's absolutely insensitive to the presence of unwanted local extremes, and works with smooth and discontinuous functions as well. In this implementation, the random algorithm can increase the accuracy (0.01%) by a "resizing" strategy (under particular conditions of the objective function). On the contrary, this algorithm is not suitable for functions that have a large "flat" region near the extreme, as happens in least-squares optimization. Convergence problems do not exist because a starting point is not necessary.
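As an illustration of the idea only (not the add-in's exact implementation), a plain random search with a simple "resizing" stage can be written in a few lines of Python; the sampling box is shrunk around the best point found so far:

    import numpy as np

    def f(x, y):
        return (x - 0.51)**2 + (y - 0.35)**2    # sample objective

    rng = np.random.default_rng(0)
    lo = np.array([-2.0, -2.0])                 # constraints box, lower corner
    hi = np.array([ 2.0,  2.0])                 # constraints box, upper corner

    best_p, best_f = None, np.inf
    for _ in range(5):                          # a few "resizing" rounds
        pts = rng.uniform(lo, hi, size=(2000, 2))      # shoot random points
        vals = f(pts[:, 0], pts[:, 1])
        i = int(np.argmin(vals))                       # keep the best one
        if vals[i] < best_f:
            best_p, best_f = pts[i], vals[i]
        half = (hi - lo) / 4.0                         # shrink the box around the best point
        lo, hi = best_p - half, best_p + half
    print(best_p, best_f)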
Divide-Conquer 1D
For univariate functions only. It's another very robust, derivative free
algorithm. It is simply a modified version of the bisection algorithm. It can be
adapted to every function, smooth or discontinuous. It converges over a very
large segment of parameter space.
Parabolic 1D
For univariate functions only. This algorithm uses a parabolic interpolation to find any local extreme (maximum or minimum). It is very efficient and fast with smooth, differentiable functions. The starting point is simply a segment of parameter space bracketing the extreme (local or not) that we want to find. The condition is that the extreme must be within the stated segment.
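The key step is the vertex of the parabola through three points (a, f(a)), (b, f(b)), (c, f(c)). The short Python sketch below shows one possible successive-parabolic-interpolation loop; it only illustrates the idea and is not the add-in's own update or stopping logic:

    import math

    def parabolic_vertex(a, b, c, f):
        # Vertex of the parabola through (a, f(a)), (b, f(b)), (c, f(c)).
        fa, fb, fc = f(a), f(b), f(c)
        num = (b - a)**2 * (fb - fc) - (b - c)**2 * (fb - fa)
        den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
        return b if den == 0.0 else b - 0.5 * num / den

    f = math.cos                  # smooth test function, local minimum at x = pi
    a, b, c = 2.0, 3.0, 4.0       # three points bracketing the extreme
    for _ in range(12):
        x = parabolic_vertex(a, b, c, f)
        if abs(x - c) < 1e-12:    # no further progress
            break
        a, b, c = b, c, x         # keep the three most recent points
    print(c)                      # approximately pi = 3.14159...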
Conjugate Gradients
Also called CG. This is a very popular algorithm that should not be missing from any toolbox. It requires a gradient evaluation at each step, which can be approximated internally by the finite difference method or supplied directly by the user. The exact gradient information improves the accuracy of the final result, but in many cases the difference does not justify the extra effort. The starting point should be chosen sufficiently close to the optimum.
Davidon-Fletcher-Powell
Also known as the DFP algorithm. This is a sophisticated and efficient method for finding extremes of smooth, regular functions. It requires a gradient evaluation at each step, which can be approximated internally by the finite difference method or supplied directly by the user. The exact gradient information improves the accuracy of the final result, but in many cases the difference does not justify the extra effort. The starting point should be chosen sufficiently close to the optimum, although the region of convergence is usually larger than the one allowed for CG.
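For readers who want to experiment with gradient-based methods outside Excel, the sketch below (illustrative only) uses SciPy's conjugate-gradient and BFGS routines; SciPy does not ship DFP itself, but BFGS is a closely related quasi-Newton update. The test function is the simple parabolic surface used in a later example:

    import numpy as np
    from scipy.optimize import minimize

    # Smooth test function and its exact gradient.
    def f(v):
        x, y = v
        return 2*x**2 + 4*x*y + 8*y**2 - 12*x - 36*y + 48

    def grad(v):
        x, y = v
        return np.array([4*x + 4*y - 12.0, 4*x + 16*y - 36.0])

    x0 = [0.0, 0.0]
    for method in ("CG", "BFGS"):     # BFGS: quasi-Newton method of the same family as DFP
        res = minimize(f, x0, jac=grad, method=method)
        print(method, res.x)          # both approach the minimum at (1, 2)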
Newton-Raphson
The most popular algorithm for solving nonlinear equations. It needs the
exact gradient for approximating the Hessian matrix. It is extremely fast and
accurate but, because of its poor global convergence performance, it is used
only for refining the final result from another algorithm.
Levenberg-Marquardt

Levenberg-Marquardt is a popular alternative to the Gauss-Newton method for finding the minimum of a function that is a sum of squares of nonlinear functions. This algorithm has proved to be an efficient, fast and robust method which also has good global convergence properties. For these reasons, it has been incorporated into many good commercial packages performing non-linear regression. Finding this algorithm in the public domain is not very easy.
The starting point

How can we choose a "good" starting point? We have to say that this is the key to any optimization problem.
Before starting the optimization, we have to study the objective function, acquiring as
much information as possible about the function itself and, if possible, also about its
derivatives. We have to guess how the function grows or decays and where the
locations are (if any) of the "valleys" and "mountains" of the function itself.
The most solid method for this is function plotting. For one-dimensional functions f(x) we simply plot the function itself. By choosing a suitable scale factor and zoom window, we can be sure to bracket the location of the desired minimum and maximum.
For two-dimensional functions f(x, y) we can plot the contour-lines for several
different function values.
Example of contour-lines plots
We can also plot a 3D graph, but in our opinion it is less useful than the contour-lines method. Anyway, there are lots of good programs, including freeware, that perform this task.
An example of good 3D plots
If you like, with a little patience, you can also construct a similar graph in Excel. Alternatively you can download the freeware workbook Random_Plot.xls from our website, which automatically creates a 3D plot and a contour plot of two-dimensional functions.
Example of the graph obtained with Random_Plot.xls
This example shows that the starting point for searching for the function minimum should be taken in the half triangular domain where the hole is located. On the contrary, it would be inefficient to start with a point located behind the big mountain. Simply imagine a little ball rolling along the surface. From where do you think it can fall into the hole most quickly?
For functions having more than 2 dimensions, the difficulty increases sharply because we cannot use the plot method as it is. We have to plot the graphs of several function sections. We keep one variable (for example z) fixed at one value (for example 1) and then plot the function f(x, y, 1) as a contour-lines plot. This plot is a section (or "slice") at z = 1 of the function f(x, y, z). Repeating this for several values of z, we can map the behavior of the entire function.
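If you prefer to generate such contour "slices" outside Excel, a few lines of Python with Matplotlib are enough; the function below is purely hypothetical and the sketch is only an illustration of the slicing idea (Random_Plot.xls does a similar job inside Excel for two variables):

    import numpy as np
    import matplotlib.pyplot as plt

    def f(x, y, z):
        # Any function of three variables; this one is just a hypothetical example.
        return (x - 1)**2 + (y + 0.5)**2 + x*y*z

    x = np.linspace(-2.5, 2.5, 200)
    y = np.linspace(-2.5, 2.5, 200)
    X, Y = np.meshgrid(x, y)

    for z_fixed in (0.0, 1.0):                 # one contour plot per "slice" of z
        plt.figure()
        cs = plt.contour(X, Y, f(X, Y, z_fixed), levels=15)
        plt.clabel(cs, inline=True, fontsize=8)
        plt.title("Contour lines of the slice z = " + str(z_fixed))
    plt.show()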
Optimization Macros
In keeping with the didactic intention of this add-in, the macros are named after the algorithm's name instead of after the usual scope or action of the macros themselves. We could say that the scope is always the same: finding the optimum point of a given function. What mainly differentiates each macro is its field of action. For example, the Levenberg-Marquardt algorithm is generally the most suitable for nonlinear least-squares fitting; but many times we can see that the Downhill-Simplex algorithm is competitive. The Downhill-Simplex algorithm is sometimes superior, thanks to its robust global convergence property. In other words, there is no fixed rule, and each problem must always be studied before attempting to find the optimum.
Derivatives (Gradient)
A primary decision is the choice between algorithms using derivatives and algorithms without derivatives. One might think that the second choice is automatic when the derivative is unknown or too hard to calculate. This is not always so, because the algorithms can easily approximate the derivatives internally with sufficient accuracy.
The reason why the primary decision about derivatives is deeper comes from the basic nature of the function. When we adopt an algorithm that uses the gradient we should be sure that this information is valid and will remain valid in the working domain of the function. This is not true for all functions. Analytical, smooth functions like polynomials and exponentials fulfill this rule, and so do rational, power and logarithmic functions when the domain does not include singularities. Using derivatives can greatly improve both accuracy and convergence speed. So, in general, algorithms that use derivatives are very fast.
There are cases, on the contrary, where derivative information is not useful or will even hinder the convergence. This happens for discontinuous functions or for discontinuous derivatives. In these cases it is better to ignore the derivative information. Another case happens when the function has many local extremes near the optimum one; in this case, following the derivative information, the algorithm might fall into one of the "traps" of local extremes.
Optimization Macros with Derivatives
These macros need information about:
1. The cell containing the definition (computed result) of the function to optimize
(objective function)
2. The range of the cells containing the values (variables) to be changed (max 9
variables)
3. The range containing the constraints (minimum and maximum limits) on the
variables (constraints box)
4. Optionally the range of the values of the computed gradient functions, one for
each variable.
The locations of the cells within each of these ranges must be consistent with the
specific variables, so that there is a direct, unambiguous link to all the required
information about each variable. If the arrangement on the worksheet mixes up the
relationships, then the computation may fail or be entirely wrong.
The Input Menu Box
Maximum or Minimum Selection: The two buttons in the upper right of the menu
box switch between the minimization and maximization algorithms
Gradient: If the gradient formulas are provided (the range entered) the macro will use
them for its internal calculations. Otherwise the derivatives are approximated internally
by the finite difference central formulas.
Newton-Raphson Option: If checked, the macro will attempt to refine the final result with 2-3 extra iterations of the Newton-Raphson algorithm. This option always requires the gradient formulas, in order to evaluate the Hessian matrix with sufficient accuracy to obtain a good optimum value. This is a numerical problem, inherent in the loss of accuracy of the finite differences obtained by numerical subtraction.
Stopping Limit. In each panel there is always an input box for setting the maximum
number of iterations or the maximum number of evaluation points allowed. The macro
stops itself when this limit has been reached.
Relative Error Limit: In the "Random" macro there is also an input box for setting the
relative error limit. The other algorithms do not use the error criterion, they simply stop
when the accuracy does not increase anymore after several iterations.
Optimization Macros without Derivatives

The locations of the cells within each of these ranges must be consistent with the specific variables, so that there is a direct, unambiguous link to the specific constraint. If the arrangement on the worksheet mixes up the relationships, then the computation may fail or be entirely wrong.
Maximum or Minimum Selection: The two buttons in the upper right of the menu
box switch between the minimization and maximization algorithms
Stopping Limit. In each panel there is always an input box (Points) for setting the
maximum number of iterations or the maximum number of evaluation points allowed.
The macro stops itself when this limit has been reached.
Relative Error Limit: In the "Random" macro there is also an input box for setting the
relative error limit. The other algorithms do not use the error criterion, they simply stop
when the accuracy does not increase anymore after several iterations.
Examples Of Uni-variate Functions
Example 1 (Smooth function)
The search for an extreme in a uni-variate smooth function is quite simple and almost
all algorithms usually work. We have only to plot the function to locate immediately the
extreme.
Assume, for example, the problem of finding a local maximum and minimum of the following function in the range 0 < x < 10:

f(x) = sin(x) / sqrt(1 + x^2)
The plot below shows that there are three local extreme points within the range of 0 to
10.
Interval Extreme
0<x<2 local max that is also the absolute max
2<x<6 local min that is also the absolute min
6 < x < 10 local max
In order to approximate the extreme points we can use the parabolic interpolation
macro. This algorithm converges to the extreme within each specified interval, no
matter if it is a maximum or a minimum.
A possible worksheet arrangement may be the following, where the range A3:B3 is the constraints box, the cell C3 is the variable to change, and the cell D3 contains the function:
a b x f(x)
0 2 1.109293102 0.599522164
2 6 4.503864793 -0.21205764
6 10 7.727383943 0.127312641
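The three extreme points in the table can be cross-checked outside Excel. The short Python sketch below is only an independent illustration (not the add-in's macro); it applies SciPy's bounded scalar minimizer to each interval, negating f where a maximum is wanted:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def f(x):
        return np.sin(x) / np.sqrt(1 + x**2)

    # (interval, sign): minimize +f for a minimum, minimize -f for a maximum
    for (a, b), sign in [((0, 2), -1), ((2, 6), +1), ((6, 10), -1)]:
        res = minimize_scalar(lambda x: sign * f(x), bounds=(a, b), method="bounded")
        print(a, "< x <", b, "  x =", round(res.x, 6), "  f(x) =", round(float(f(res.x)), 6))
    # Expected: x = 1.1093 (max), x = 4.5039 (min), x = 7.7274 (max), as in the table above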
Example 2 (Many local minima)

If we want to find the absolute maximum or minimum within a given interval we must use the divide-and-conquer algorithm (a variant of the bisection algorithm).
For example, assume we have to find the maximum and minimum of the following function within the range 0 < x < 5:

f(x) = x * e^(-x) * cos(6x)
a b x f(x)
max 0 5 1.0459 0.367
min 0 5 1.5608 -0.327
As we can see the algorithm ignores the other local extremes and converges to the
true absolute maximum. But of course this is a didactic extreme case. Generally
speaking, it is always better to isolate the desired extreme within a sufficiently close
segment before attempting to find the absolute maximum (minimum). If it's impossible
and there are many local extremes, you may increase the number of points limit from
600 (default) to 1000 or 2000.
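The scan-then-refine idea behind divide-and-conquer is easy to reproduce. The Python sketch below is given only as an independent cross-check (it is not the macro's own scheme): it evaluates the function on a fine grid and then polishes the best grid points:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def f(x):
        return x * np.exp(-x) * np.cos(6 * x)

    xs = np.linspace(0, 5, 2001)            # coarse scan of the whole interval
    fs = f(xs)

    for sign, label in ((-1, "max"), (1, "min")):
        i = int(np.argmin(sign * fs))        # best grid point
        a, b = max(xs[i] - 0.01, 0.0), min(xs[i] + 0.01, 5.0)   # narrow bracket around it
        res = minimize_scalar(lambda x: sign * f(x), bounds=(a, b), method="bounded")
        print(label, round(res.x, 4), round(float(f(res.x)), 4))
    # Expected: max near x = 1.0459 (f = 0.367), min near x = 1.5608 (f = -0.327)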
Example 3 (The saw-ramp)

f(x) = |x| + 4*|int(x + 1/2) - x| + 1

The optimization macro will converge to the point (0, 1) for the minimum and to (3.5, 6.5) for the maximum.
Example 4 (Stiff function)

Given the following function for x >= 0, find the absolute max and min:

f(x) = sin(sqrt(x)) / sqrt(1 + x^2)
One plot covers the wide range 0 < x < 100 and another covers the smaller region
0 < x < 10 where the function shows a narrow maximum. (Note that there are other
local extremes within the global interval.) The absolute minimum is located within the
interval 10 < x < 30.
For finding the maximum and minimum with the best accuracy, we can use the divide-
and-conquer algorithm (robust convergence), obtaining the following result:
a b x f(x) rel.error
0 100 18.2845204 -0.049492601 6.97E-09
0 100 0.760360454 0.6094426 2.23E-08
Note that the parabolic algorithm has some convergence difficulty in finding the maximum near 0 if the interval is not sufficiently close to zero. On the contrary, there is no problem for the minimum:
a b x f(x) rel.error
0 3 6.869565509 0.07165213 6.11E+00
0 2 -1.668589413 #NUM! -
0.5 1.5 0.760360472 0.6094426 2.72E-10
10 30 18.28452054 -0.04949260 1.00E-09
This behavior for the ranges near zero can be explained by observing that the function does not have a derivative value at x = 0. The derivative is:

f'(x) = [ (1 + x^2)*cos(sqrt(x)) - 2*x^(3/2)*sin(sqrt(x)) ] / [ 2*sqrt(x)*(1 + x^2)^(3/2) ]

which is unbounded as x tends to 0.
Example 5 (The orbits)
The objective function also can have an indirect link to the parameter that we have to
change. This is illustrated in the following example:
Two satellites follow two plane elliptic orbits described by the following parametric
equations with respect to the earth.
SAT1:  x1 = 2*cos(t) + 3*sin(t) ,   y1 = 4*sin(t) - cos(t)
SAT2:  x2 = cos(t) + 2*sin(t) ,     y2 = sin(t)
We want to find when the two satellites have a minimum distance from each other (In
order to transmit messages with the lowest noise possible). We want to also find the
position of each satellite at the minimum distance. (Note that, in general, this position
does not coincide with the static minimum distance between the orbits.)
This problem can be regarded as a minimization problem having one parameter (the
time "t") and one objective function (the distance "d")
d = sqrt( (x1 - x2)^2 + (y1 - y2)^2 )
We can solve this problem directly on a worksheet as shown in this example.
Range B4:C5: The x and y coordinates of the two satellites at a given time.
Range E4: Parameter to change (time)
Range F4: Distance (objective function to be minimized)
Range E7:F7: Constraints on the time parameter t.
First of all we note that both orbits are periodic with the same period T = 2*pi, approximately 6.28, so we can study the problem for 0 < t < 6.28.
Note that when you change the parameter "t", for example giving it a set of sequential values (0, 1, 2, 3, 4, 5), we immediately get the orbit coordinates in the range B4:C5. If we plot these coordinates in Excel we get the following interesting pictures that simulate the motion.
We observe that the condition of "minimum distance" happens two times: in the
intervals (0, 3) and (3, 6).
Starting the macro "1D-Divide and Conquer" with the following constraint conditions: (tmin = 0, tmax = 3) and (tmin = 3, tmax = 6.28), returns the following values.
Can we use also the Downhill-Simplex algorithm? Of course yes. Because the Simplex
uses the starting point information, we can use it for finding the nearest minimum.
Starting the macro "Downhill-Simplex" with the starting points: (t = 0) gives the same
values shown in the first line of the above table. Starting with (t = 3), gives the same
values shown in the second line.
Improving accuracy
The optimum values of the parameter "t" were calculated with a good accuracy of about 1E-8. If we want to improve the accuracy we may try to use the parabolic algorithm. However, with the parabolic algorithm in this application, we have to pay attention to bracketing the minimum in a narrow interval around the desired minimum point. For example we can use the segment 0 < t < 0.3 for the first minimum and 3 < t < 3.5 for the second one.
Constraints
t min t max time error
0 0.3 0.231823805 7.074E-12
3 3.5 3.373416458 1.934E-12
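The same minimum-distance search can be reproduced independently. The Python sketch below is only an illustration (the worksheet does this through cells B4:C5, E4 and F4); it minimizes the distance over each of the two sub-intervals:

    import numpy as np
    from scipy.optimize import minimize_scalar

    def distance(t):
        x1, y1 = 2*np.cos(t) + 3*np.sin(t), 4*np.sin(t) - np.cos(t)   # SAT1
        x2, y2 = np.cos(t) + 2*np.sin(t), np.sin(t)                    # SAT2
        return np.hypot(x1 - x2, y1 - y2)

    for a, b in [(0, 3), (3, 6.28)]:
        res = minimize_scalar(distance, bounds=(a, b), method="bounded")
        print("t in (", a, ",", b, "):  t =", round(res.x, 6), "  d =", round(res.fun, 6))
    # Expected: t = 0.231824 and t = 3.373416 (half an orbital period later)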
Examples of bi-variate functions

Example 1 (Peak and Pit)

The 3D plot of this function is shown above, under "The starting point" in the previous chapter. In the plot we clearly observe the presence of a maximum and a minimum in the domain

-2.5 < x < 2.5 ,  -2.5 < y < 2.5

The maximum is located in the area { x, y | x > 0, y > 0 } and the minimum is located in the area { x, y | x < 0, y < 0 }. The point (0, 0) lies midway between the maximum and minimum points, so we can use it as the starting point for both searches.
Example 2 (Parabolic surface)
Assume the minimum of the following function is to be found:
f(x, y) = 2x^2 + 4xy + 8y^2 - 12x - 36y + 48
We see that the minimum is located in the region 0 < x < 2, 0 < y < 4.
Because the gradient is simple, we can also insert the derivative formulas
grad f = ( 4x + 4y - 12 ,  4x + 16y - 36 )
Repeating the minimum search we find the point (1, 2) with the following accuracy
As we can see, for smooth functions like polynomials, the exact derivatives are useful,
since with the Newton-Raphson (NR) final step, the global accuracy of the solution can
be improved.
Example 3 (Super parabolic surface)
This example shows another case in which the optimization algorithms that do not
require external derivative equations are sometimes superior to those that require
external derivative equations.
4y^3 - 6y^2 + 3y - 1/2 = 0   =>   4*( y^3 - (3/2)y^2 + (3/4)y - 1/8 ) = 0   =>   4*(y - 1/2)^3 = 0

4*(y - 1/2)^3 = 0   =>   y = 1/2 = 0.5

Note that this root of the second gradient component has a multiplicity of 3. This means that it is also a root of the 2nd derivative with respect to y.
At the point (0.3, 0.5) the determinant of H is zero. This condition reduces the
efficiency of those algorithms that use the derivative information like the Newton or
quasi-Newton methods.
Surprisingly the methods that are derivative-free like Random and Simplex, show the
best results.
This happens because they are not affected by the vanishing of the 2nd derivatives. On the contrary, the other methods, even with the NR refinement step, cannot reduce the error below 1E-3. Note that the larger error occurs in the y variable. This is not strange because, as we have shown, the 2nd derivative with respect to y vanishes at the point y = 0.5.
Example 4 (The trap)

f(x, y) = 1 - 0.5*e^-(x^2 + y^2) - e^-10*(x^2 + y^2 - 2x - 2y + 2)
The contour plot shows the presence of two extreme points: one in the center (0, 0), called A, and another one in a narrower region near the point (1, 1), called B. We try to find the minimum with all the methods, starting from the point (0, 1), in the domain -2 < x < 2 and -2 < y < 2.
Downhill-Simplex 8.38E-08 2.04E-08 - 2 sec
As we can see, all the algorithms except one fail to converge to the true minimum. They all fall into the false central minimum. Only the random algorithm escapes from the "trap", giving the true minimum with good accuracy (1E-5). Random algorithms are, in general, suitable for finding a narrow global optimum surrounded by local optima.
Convergence region
It is reasonable that for the other algorithms there will be some starting points from which the algorithm converges to the true minimum B, and other starting points from which the algorithm ends up at the false minimum (0, 0). The set of "good" starting points constitutes the convergence region.
The results are shown as 2D graphs with the following regions color coded as
indicated.
Legend
True minimum
False minimum
Convergence OK
Convergence failed
Downhill-Simplex
The random algorithm, of course, has a convergence region coincident with the black square.
As we can see, from the point of view of convergence, the most robust algorithm is the
Random, followed by the Downhill-Simplex and then by the CG and DFP
Mixed Method
But of course we could use a "mix of algorithms" to reach the best results
For example if we start with the random method, we can find a sufficiently accurate
starting point for the DFP algorithm. Following this mixed method, we can find the
optimum with a very high accuracy (2E-9), no matter what the starting point was.
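The mixed strategy is easy to express in any language. The Python sketch below is only an illustration (it uses SciPy's Nelder-Mead as the local refiner instead of DFP): it first samples the box at random and then polishes the best sample:

    import numpy as np
    from scipy.optimize import minimize

    def f(v):
        x, y = v
        # "The trap": false minimum at (0, 0), true narrow minimum near (1, 1)
        return 1 - 0.5*np.exp(-(x**2 + y**2)) - np.exp(-10*(x**2 + y**2 - 2*x - 2*y + 2))

    rng = np.random.default_rng(1)
    pts = rng.uniform(-2, 2, size=(3000, 2))          # stage 1: random sampling of the box
    best = pts[np.argmin([f(p) for p in pts])]

    res = minimize(f, best, method="Nelder-Mead")      # stage 2: local refinement
    print(res.x, res.fun)                              # close to (1, 1), f about -0.068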
Example 5 (The eye)
A derivative discontinuity can in general give problems to algorithms that use gradient information. But this is not always true.
The contour-plot takes on an "eye" pattern for the individual contours. The plot shows
that the minimum is clearly the point (2, 1). Note from the 3D plot that the gradient in
the minimum is not continuous.
Let's see how the algorithms work in the domain box 0 < x < 4 and 0 < y < 2, starting from the point (0, 0):
Algorithm x y error
Simplex 2 1 2.34E-13
CG 2.007159289 1 3.58E-03
DFP 2 1 0.00E+00
Random 1.99990227 0.999999965 4.89E-05
A good result. Only the CG algorithm has had some difficulty, but the other algorithms
have worked fine.
Example 6 (Four Hill)

Both variables appear only with even powers, so the function is symmetric with respect to both the x and y axes. This means that if the function has a maximum in the 1st quadrant { x, y | x > 0, y > 0 }, it will also have three other maxima in the other quadrants.
The optimization macros cannot give all four maximum points in one pass (within the designated region), so one of them is found somewhat arbitrarily. To avoid this little indecision we must give an initial starting point nearer to one of these points, or resize the search region.
Not too clear? Never mind. Let's look at the following plot over the symmetric region

-2 < x < 2 ,  -2 < y < 2
Contours plot 3D plot
It is clear that the function has four symmetric maxima, one in each quadrant of the selected interval. We can restrict our study to the 1st quadrant 0 < x < 2, 0 < y < 2. In this region, starting from a point like (2, 2), all algorithms work fine in reaching the true maximum (1, 1) with good accuracy.
Algorithm x y error
Simplex 0.999999993 1.000000045 2.59E-08
CG 1.000000005 1.000000005 5.27E-09
DFP 1.000000005 1.000000005 5.27E-09
Random 1.000020331 0.999983443 1.84E-05
More accurate values can be obtained only with the aid of the gradient and the
Newton-Raphson extra-step.
grad f = (  4x(1 - x^2) / (x^4 + y^4 - 2x^2 - 2y^2 + 3)^2  ,   4y(1 - y^2) / (x^4 + y^4 - 2x^2 - 2y^2 + 3)^2  )
Example 7 (Rosenbrock's parabolic valley)

f(x, y) = m*(y - x^2)^2 + (1 - x)^2
The parameter "m" changes the level of difficulty. A high m value means high difficulty
in searching for a minimum. The reason is that the minimum is located in a large flat
region with a very low slope. The following plots are obtained for m = 10
The function is always positive except at the point (1, 1), where it is 0. Taking the gradient, it is simple to demonstrate this:

grad f = 0   =>   { 4m*x^3 + 2x*(1 - 2m*y) - 2 = 0 ;   2m*(y - x^2) = 0 }

From the second equation we get:

2m*(y - x^2) = 0   =>   y = x^2

Substituting y = x^2 into the first equation leaves 2x - 2 = 0, that is x = 1. So the only extreme is at the point (1, 1), which is the absolute minimum of the function.
Starting from the point (0, 0) we obtain the following results
Note that some algorithms may reach the limit in the number of iterations in this
example.
If we repeat the test with m = 100, we have the following result:
We note a general loss of accuracy, because all algorithms seem to have difficulty in
locating the exact minimum. They seem to get "stuck in the mud" of the valley. Also
the random algorithm seems to have a greater difficulty in finding the minimum. The reason is that, when the random algorithm samples a quasi-flat area, all points have
similar heights so it has difficulty in discovering where the true minimum is located.
The only exception is the Downhill-Simplex algorithm. Its path, rolling into the valley, is
both fast and accurate. Why? I have to admit that we cannot explain it... but it works!
Example 8 (Nonlinear Regression with Absolute Sums)

f(x, a, k) = a*e^(-k*x)

The goal of the regression is to find the pair of parameter values (a, k) that minimizes the sum of the absolute values of the differences between the regression model and the given data set:

AS = sum_i | yi - f(xi, a, k) |

The objective function AS depends only on the parameters a and k. By minimizing AS with our optimization algorithms, we hope to solve the regression problem.
We hope that by changing parameters "a" and "k" in the cells E3 and F3, the objective
function in G3 goes to its minimum value. Note that the objective cell G3, being the
sum of the range D3:D13, depends indirectly on the cells E3 and F3.
Start the Downhill-Simplex and insert the appropriate range as shown in the input box.
Starting from the point (1, 0) you will see the cells changing quickly until the macro
stops itself, leaving the following "best" fitting parameter values of the regression y*
a k
1 -2
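Because the loss is a sum of absolute values rather than squares, the Levenberg-Marquardt machinery does not apply directly, and a derivative-free minimizer is a natural choice. The Python sketch below is only an illustration with made-up sample data (the worksheet's own data set is not listed in this text):

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical sample data, roughly following y = 2*exp(-0.8*x)
    x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
    y = np.array([2.05, 1.30, 0.92, 0.58, 0.41, 0.26, 0.19])

    def model(x, a, k):
        return a * np.exp(-k * x)

    def AS(p):
        return np.sum(np.abs(y - model(x, p[0], p[1])))   # sum of absolute residuals

    res = minimize(AS, x0=[1.0, 0.0], method="Nelder-Mead")   # same start (1, 0) as in the text
    print(res.x)   # best (a, k) in the least-absolute-deviations sense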
Example 9 (The ground fault)
Assume the minimum of the following function is to be found:
f(x, y) = 2 - 1/(1 + (x - y - 1)^2) - 1/(1 + (x + y - 3)^2)
The contours and 3D plots of this function are shown in the following graphs
contours-plot
3D plot
Both plots indicate clearly a narrow minimum near the point (2, 1). Nevertheless this
function may create some difficulty because the narrow minimum is hidden at the
cross of two long valleys (like a ground fault).
In order to increase the difficulty, choose a large domain box:
-10 < x < 10 and -10 < y < 10
and (0, 0) as starting point.
The results are in the following table
Example 10 (Brown badly scaled function)

f(x, y) = (x - 10^6)^2 + (y - 2*10^-6)^2 + (x*y - 2)^2

This function is always positive and it is zero only at the point (10^6, 2*10^-6). At this point, the abscissa is very high and the ordinate is very low. It is hard to generate good plots of this function, and we have no idea where the extremes are located. This situation is not very common, indeed, but when it happens the only thing that we can do is to run the Downhill-Simplex algorithm, trusting in its intrinsic robustness.
Algorithm x y rel. error
Simplex 1000000 2.00E-06 1.61E-13
Fortunately, in this case, the algorithm converges quickly to the exact minimum with a very high accuracy.

Example 11 (Beale function)

The Beale function, f(x, y) = (1.5 - x + x*y)^2 + (2.25 - x + x*y^2)^2 + (2.625 - x + x*y^3)^2, is always positive, being a sum of three squared terms. So the minimum, if it exists, must be positive or 0.
Algorithm x y error
Simplex 3 0.5 3.4E-13
Random 3.000097478 0.5000228382 1.56E-04
CG 3 0.5000000001 1.45E-10
DFP 3 0.5 9.7E-14
Note that the convergence is highly influenced by the starting point. We can verify it
simply by starting the CG algorithm from the point (0, 0). The result, after two steps,
will be
Algorithm x y error iteration
CG (1st step) 2.933979062 0.482383744 0.083637 2020
CG (2nd step) 2.999966486 0.499991634 4.19E-05 810
As we can see, the final accuracy is a thousand times worse than the previous one. Clearly the time spent choosing a suitable starting point is worthwhile (this is true in general, whenever it is possible).
Examples of multivariate functions

Example 1 (Splitting function method)

Assume the maximum and the minimum of the following function are to be found:

f(x, y, z) = x^2 + 4y^2 + 2|z|^(3/2) + x*|y|^(1/2) + x + z

First of all, we observe that the function has no maximum, so it can only have a minimum.
This function can be split into two new functions, one of 2 variables, g(x, y), and the other of 1 variable, h(z):

f(x, y, z) = g(x, y) + h(z) = ( x^2 + 4y^2 + x*|y|^(1/2) + x ) + ( 2|z|^(3/2) + z )
We can plot and study each sub-function separately
From the first contours-plot we deduce that the minimum is located in the region of −2
< x < 0 and −1 < y < 1. From the second plot we have the region −0.2 < z < 0. Now
we have a constraints box for searching for the minimum of f(x,y,z).
Let's begin the search with the aid of the random macro. The approximate values obtained by this algorithm will be used as the starting point for all the other macros. For clarity we have rounded (but it is not necessary) the values obtained by the random macro, giving the following approximate starting point:
x y z
-0.6 0.02 -0.1
The final result is:
Example 2 (The gradient condition)

f(x, y, z) = (x - z)^2 + y*(y - x) + z^2

So we have to study the minimum (if any). The gradient condition is:

grad f = ( 2x - y - 2z , 2y - x , 4z - 2x ) = 0   =>   { 2x - y - 2z = 0 ;  2y - x = 0 ;  4z - 2x = 0 }
We see that the only point for the minimum is (0, 0, 0). Starting from any point around
the origin, every algorithm will converge to the origin.
An example of a variation of this function
We have seen that this function has no upper limit. This is true if the variables are
unconstrained. But surely the maximum exists if the variables are limited by a specific
range. Assume now that each variable must be limited in the range [−2, 2].
We can restart the macro "random" searching for the max in the given box or we can
also use the CG macro starting from any internal point like for example (1, 1, 1)
Here are the results
Algorithm f x y z
Random 30.378 -2.057 2.148 2.073
CG 28 -2 2 2
So there must be another maximum point at the symmetrical point (2, −2, −2). To test
for it, simply restart the CG macro, this time choosing the starting point (2, −1, −1). It
will converge exactly to the second maximum point.
Example 3 (Production)
This example shows how to tune the production of several products to maximize profit.
The function model here is the Cobb-Douglas production function for three products.
p = x1^0.1 * x2^0.2 * x3^0.3

where x1, x2, x3 are the quantities of each product (input) and "p" is an arbitrary unit-less measure of the value of the output products.
The production cost function can be expressed as
c = c1 x1 + c2 x2 + c3 x3 + c0
where c1, c2, c3 are the production costs of each item and c0 is a fixed cost. The total
profit (our objective function) can then be expressed as g = s·p - c , where s converts
the Cobb-Douglas production function "p" value to the same units of cost.
Now let's find the best solution in the Excel worksheet given the following constant
values
c1 c2 c3 c0 s
0.3 0.1 0.2 2 2
with the constraints xi > 0, and with the following maximum limits:
A possible arrangement could be
For a starting point we can use the middle point of each range (5, 25, 25)
We then try several algorithms, obtaining the following results
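If you want to check the result outside the worksheet, the profit can be maximized with any bounded minimizer. The Python sketch below is only an illustration: the upper limits (10, 50, 50) are a guess inferred from the mid-range starting point (5, 25, 25), since the actual limits appear only in the worksheet figure:

    from scipy.optimize import minimize

    c1, c2, c3, c0, s = 0.3, 0.1, 0.2, 2.0, 2.0

    def neg_profit(x):
        x1, x2, x3 = x
        p = x1**0.1 * x2**0.2 * x3**0.3           # Cobb-Douglas production
        cost = c1*x1 + c2*x2 + c3*x3 + c0
        return -(s*p - cost)                       # minimize the negative profit g = s*p - c

    bounds = [(1e-6, 10), (1e-6, 50), (1e-6, 50)]  # assumed upper limits (hypothetical)
    res = minimize(neg_profit, x0=[5, 25, 25], bounds=bounds, method="L-BFGS-B")
    print(res.x, -res.fun)                         # optimal quantities and the corresponding profit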
Example 4 (Paraboloid 3D)

Finding the extremes is easy by solving the following linear system obtained from the gradient:

grad f = 0   =>   { fx = 4x + 6y - 2z - 6 = 0 ;   fy = 6x + 20y + 4z - 14 = 0 ;   fz = -2x + 4y + 10z - 2 = 0 }

[2] The CG algorithm was restarted 2 consecutive times.
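Since the gradient is linear, the stationary point is just the solution of a 3-by-3 linear system; as an independent check (illustrative only), a couple of lines of Python solve it directly:

    import numpy as np

    A = np.array([[ 4.0,  6.0, -2.0],
                  [ 6.0, 20.0,  4.0],
                  [-2.0,  4.0, 10.0]])
    b = np.array([6.0, 14.0, 2.0])

    print(np.linalg.solve(A, b))   # the stationary point, approximately (1.4, 0.2, 0.4)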
LP - Linear programming
A linear program represents a problem in which we have to find the optimal value (maximum or minimum) of a linear function of certain variables (the objective function) subject to linear constraints on them.
LP - Linear programming
This macro solves a linear programming problem by the Simplex algorithm. Its input parameters are:
• The coefficients vector of the linear objective function to optimize
• The coefficients matrix of the linear constraints
F = x1 + x2 + 3*x3 - 0.5*x4

subject to:
x1 + 2*x3 <= 10
2*x1 - 7*x4 <= 0
x2 - x3 + 2*x4 >= 0.5
x1 + x2 + x3 + x4 = 9

and with x1 >= 0 , x2 >= 0 , x3 >= 0 , x4 >= 0
The range B2:E2 contains the coefficients of the linear objective function
The range B4:G7 contains the coefficients matrix of the linear constraints
Note: the symbols "<" and "<=" or ">" and ">=" are numerically equivalent for this
macro.
Now select the objective function range B2:E2 and start the macro Linear
Programming from the menu Optimiz... > Optimization
The solution found, in the range B9:E9, is
x1 = 0 , x2 = 3.375 , x3 = 4.725 , x4 = 0.95
The macro returns "inf" if the feasible region is unbounded; it returns "?" if the feasible region is bounded but no solution exists. Observe that this macro does not work on site; therefore it is very fast and can solve larger problems.
Example. Find the solution of the following LP problem:

max { x1 - x2 + 2*x3 - x4 + 4*x5 }

The constraints matrix is entered in the worksheet. The solution found is:

x1 = 3.011 , x2 = 1.539 , x3 = 3.235 , x4 = 1.852 , x5 = 3.442

Note that the value of the objective function, calculated in the cell L9, is about f = 19.9.
Optimization with Linear Constraints
The linear programming seen in the previous chapter is the most commonly applied form of constrained optimization. Constrained optimization is much more difficult than unconstrained optimization: we have to find the best point of the function while respecting all the constraints, which may be equalities or inequalities. The solution (the optimum point), in fact, may not occur at the top of a peak or at the bottom of a valley.
The main elements of any constrained optimization problem are: the objective function, the variables, the constraints and sometimes the variable bounds.
When the objective function is not linear (for example, a quadratic function) and the constraints are linear, we have a so-called NLP with linear constraints.
The following graph shows the contour lines (blue) of the function F(x, y) and the linear
constraints (red).
Now let's see how to compute numerically the constrained optimum point
The following worksheet shows a possible arrangement
The cell D2 contains the function definition =B2^2+2*C2^2-2*B2-8*C2+9
The range B4:C5 is the constraints box and the range B7:E8 contains the two linear
constraints.
Note: symbols "<=" and "<" are equivalent for this macro
Select the cell D2 and start the macro Linear Constraints from the menu Optimiz... >
Optimization
Because this macro works "on site", the solution appears directly in the variables cells
B2:C2.
x = 0.727272727272713, y = 1.90909090909093
Compare with the exact solution (8/11, 21/11)
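A cross-check outside Excel is possible with any constrained minimizer. In the Python sketch below (illustrative only) the active constraint is taken as 3x + 2y <= 6; this value is not printed in the text (the two constraints live in the worksheet range B7:E8 shown only in the figure), but it is consistent with the exact solution (8/11, 21/11) quoted above:

    from scipy.optimize import minimize

    def f(v):
        x, y = v
        return x**2 + 2*y**2 - 2*x - 8*y + 9        # same objective as cell D2

    # Assumed active constraint: 3x + 2y <= 6 (hypothetical, but consistent with (8/11, 21/11))
    cons = [{"type": "ineq", "fun": lambda v: 6 - 3*v[0] - 2*v[1]}]

    res = minimize(f, x0=[0.0, 0.0], constraints=cons, method="SLSQP")
    print(res.x)    # approximately (0.72727, 1.90909) = (8/11, 21/11)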
Other settings
Maximum or Minimum Selection: The two buttons in the upper right of the menu
box switch between the minimization and maximization algorithms
Stopping Limit. In each panel there is always an input box for setting the maximum
number of iterations or the maximum number of evaluation points allowed. The macro
stops itself when this limit has been reached.
Relative Error Limit: input box for setting the relative error limit.
F(x, y) = 1 / (x^2 + y^2 - x*y - x - y + 1)
The following graph shows the contour lines (blue) of the function F(x, y) , the linear
constraints (red) and the box constraints (green).
Arrange a worksheet as follows, inserting the function definition in the cell D3.
If you select the "max" option the algorithm will find the point (1, 1), while if you select the "min" option the algorithm will approach the point (3, 2).
Observe that if you have selected the "Rnd" option, the starting point (0, 0) will be ignored by the macro. On the contrary, if you deselect it, you must provide a suitable initial point.
In this case we will see that the point (0, 0) may be good for finding the maximum point, but with (0, 0) the algorithm fails to reach the minimum point.
For the minimum searching we should start, for example, from the point (2, 1)
Chapter 3

Nonlinear Regression
Nonlinear regression is a general fitting procedure that estimates any kind of relationship between a dependent (or response) variable and a set of independent variables. In this document we focus our attention on univariate relationships (one response variable "y", one independent variable "x"):

y = f(x, p1, p2, ..., pn)

where the parameters p1, p2, ..., pn are the unknowns to be determined for the best fit.
When we investigate the relationship between two variables, we have some steps to follow:
1) Sampling. We take experimental observations of the system in order to get a dataset of n samples (xi, yi), i = 1, ..., n. The dimension n varies from a few points to thousands of samples.
2) Modelling. At the second step we have to choose a function that should best
explain the response variable of the system.
a. This task is dependent on the theoretical aspects of the problem,
prior information on the source of the data, what the resulting
function will be used for, your imagination or your experience.
b. It would be useful to first plot the points of the dataset in a scatter x-
y graph. By a simple inspection we can "smell" which model could
be a fit.
c. We have to recognize that the data set has errors of measurement,
and that we should not over-fit the model to fit these errors. A
knowledge of statistics is important here. We have to recognize that
the data is a sample, and that there are sampling errors involved.
d. Actually, we should perform several trials before finally choosing the best model.
3) Prediction. We try to estimate a set of parameters (p1, p2, ..., pn)(0) that should approximate the given experimental dataset. These parameters may have some theoretical basis, rather than being just “fitting” parameters.
4) Starting the fitting process. We try at first with some set of reasonable
parameter values as a starting point. We should try several starting points to
see if there is a dependency on the results due to different starting points.
This is common in scientific problems involving complex functions where the surfaces may have many local minima, at unknown parameter combinations.
5) Error measurement. Now, by using the fitted model function, we calculate
the response values yi* at the same point xi of the sampling. Of course the
predicted values will not exactly match the yi values obtained from the
sampling, and the differences are the residuals (yi* − yi ). We can take the
sum of the square of the residuals RSS = ∑(yi* − yi )2 as a measure of the
distance between the experimental data and our model. In other words the
RSS measures the goodness of our fit. Low RSS means a more accurate
regression fit and vice versa. The error measurement function is also called a
loss function.
6) Correction. The initial set of parameter values is changed in order to reduce
the RSS function. This is the heart of the non-linear regression process. This
task is usually performed with minimization algorithms. We could use any
algorithm that we like, but from experience we have observed that some algorithms work better than others. Because the function model is known, and therefore we also know its derivatives, we could choose an algorithm that exploits the derivative information in order to gain more accuracy.
(a) There is one very efficient algorithm, the so-called quasi-Newton method, which approximates the second-order derivatives of the loss function to guide the search for the minimum.
(b) The Levenberg-Marquardt method is a popular alternative to the Gauss-Newton method for finding the minimum of a function that is a sum of squares of nonlinear functions. The Optimiz.xla non-linear fitting process uses precisely this algorithm for minimizing the residual sum of squares.
The corrected set of parameters can now be used to repeat the error measurement and correction steps and, if the process is convergent, after some iterations we will get the "best" set of parameters (p1, p2, ..., pn). That is the set of parameters best fitting the given dataset, in the sense of the least-squares criterion.
Nonlinear Regression for general functions
Levenberg-Marquardt macro
Optimiz.xla has a macro for performing least-squares fitting of nonlinear functions directly on the worksheet with the Levenberg-Marquardt algorithm [3]. It uses the derivative information (if available) or approximates it internally by the finite difference method. It also needs the function definition cell (= f(x, p1, p2, ...)), the parameter starting values (p1, p2, ...), and of course the dataset (xi, yi).
The Levenberg-Marquardt method uses a search direction that is a cross between the Gauss-Newton direction and the steepest descent. It is a heuristic method that works extremely well in practice. It has become a virtual standard for the optimization of medium-sized nonlinear models. [4]
Layout. In order to automatically fill in the input box, the macro assumes a typical layout: first the x-column, then the y-column to its right, and then the function column. But this is not obligatory at all. You can arrange the sheet as you like.
Example: Assume that you have the following data set (xi, yi) of 7 observations and you have to perform a least-squares fit with the following exponential model having 2 parameters (a, k):

y* = a * e^(k*x)
[3] The Levenberg-Marquardt subroutine used in Optimiz.xla was developed by Luis Isaac Ramos Garcia. Thanks to his kind contribution we have the chance to publish this nice, efficient - and rare - VB routine in the public domain.
[4] The implementation details of how this works are reviewed in Numerical Recipes in C, 2nd Edition, pages 683-685.
This model has the following derivatives
dy*/da = e^(k*x) ,    dy*/dk = a * x * e^(k*x)
a= 9.76526
k= -1.98165
And the plot of the y and y* (regression function) looks like the following graph
Observe that the LM algorithm converges even from starting conditions quite distant from the optimum. This is a didactic example to show the robustness of this algorithm. Usually the initial conditions should be chosen better than the previous ones.
Re-try without derivatives. The LM will converge to the same optimal solution. From our experimentation, we have observed that the derivatives usually improve the final accuracy by 1-2 orders of magnitude.
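The same kind of fit can be reproduced with any Levenberg-Marquardt implementation. The Python sketch below is only an illustration (the 7 worksheet observations are not listed in this text, so hypothetical synthetic data are used); it also shows how the analytic derivatives are supplied as a Jacobian:

    import numpy as np
    from scipy.optimize import curve_fit

    def model(x, a, k):
        return a * np.exp(k * x)

    def jac(x, a, k):
        e = np.exp(k * x)
        return np.column_stack((e, a * x * e))    # columns: dy*/da, dy*/dk

    # Hypothetical data, roughly following y = 9.8*exp(-2*x)
    x = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0])
    y = np.array([9.7, 6.0, 3.65, 2.1, 1.35, 0.48, 0.19])

    popt, pcov = curve_fit(model, x, y, p0=[1.0, -1.0], jac=jac, method="lm")
    print(popt)    # estimated (a, k)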
The dataset is in the range A2:C12. The cells G2, G3, G4 contain, respectively, the parameters a, b, c. Each cell of the range D2:D12 contains the model definition. For example the cell D2 contains the formula: =1/($G$2*A2^2+$G$3*A2*B2+$G$4*B2).
Choose a compatible starting point - for example a = 1, b = 1, c = 1 - select the range A2:C12 and start the macro. If you have followed the above arrangement, all the input boxes will be correctly filled. Mark the check boxes "ESD" and "RSD" in order to also calculate the standard deviation of the estimates and of the regression. Press "Run".
The Standard Deviations of the Estimates are always placed adjacent to their
parameters. The Residual Standard Error is always output just two rows below the last
parameter. So be sure that these cells are empty before starting the macro with the
options ESD and RSD active.
Nonlinear Regression with a predefined model
This set of macros comes in handy when we have to perform a nonlinear regression using a predefined model. They are much faster than the general nonlinear regression macro and, in addition, you do not have to build the formula model and its derivatives.
Rational (we can set the degree of the numerator and of the denominator; max degree: 4):
  a0/(b0 + x) ,  (a0 + a1*x)/(b0 + b1*x + x^2) ,  (a0 + a1*x + a2*x^2)/(b0 + b1*x + b2*x^2 + x^3)
Exponential (from 2 to 6 parameters):
  a1*e^(k1*x) ,  a0 + a1*e^(k1*x) ,  a1*e^(k1*x) + a2*e^(k2*x) ,
  a0 + a1*e^(k1*x) + a2*e^(k2*x) ,  a1*e^(k1*x) + a2*e^(k2*x) + a3*e^(k3*x)
Power:
  a*x^k ,  a/x^k
Logarithmic:
  a*log(x) + b
Gauss:
  a*e^(-((x - mu)/sigma)^2)
Logistic:
  a*b / (a + (b - a)*e^(k*x))
Input/Output information
All these macros need as input only the data (xi, yi) to fit, which must be an array of N rows and 2 columns.
Example
The data to fit is the range A2:B8
The output parameter cell is E2. Because the selected model has 3 parameters, the
area automatically filled in will be the range E2:F4
The output regression range is C2:C9
y = b1*(1-exp[-b2*x])
Data:     y          x
       10.07E0     77.6E0
       14.73E0    114.9E0
       17.94E0    141.1E0
       23.93E0    190.8E0
       29.61E0    239.9E0
       35.18E0    289.0E0
       40.02E0    332.8E0
       44.82E0    378.4E0
       50.76E0    434.8E0
       55.05E0    477.3E0
       61.01E0    536.8E0
       66.40E0    593.1E0
       75.47E0    689.1E0
       81.78E0    760.0E0
NIST-STRD Certified Values:
       b1 = 2.3894212918E+02
       b2 = 5.5015643181E-04
Residual Sum of Squares: 1.2455138894E-01
In the range A2:A15 we insert the x-data, in the next column B2:B15 the y-data, and in the next column C2:C15 we insert the regression formula = b1*(1-EXP(-b2*x)) in each cell. The consecutive columns of data x, y and function model y* make up the standard layout for the macro.
The parameters b1, b2 are inserted in the cells F5 and F6. For the macro this would be sufficient, but we have inserted other useful functions: (a) the residual sum of squares, =SUMXMY2; (b) the relative errors between the certified values and the approximated values, =ABS((F6-F10)/F10) and =ABS((F5-F9)/F9); and (c) the Log Relative Error (LRE), = -LOG(G5) and = -LOG(G6).
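As a cross-check outside Excel, the same fit can be reproduced with any nonlinear least-squares routine. The sketch below (Python with SciPy - our own assumption, not part of the add-in) fits the Misra1a data listed above, starting from the point quoted in the text, and computes the LRE against the certified values.

import numpy as np
from scipy.optimize import curve_fit

# Misra1a data (y, x) as listed above
y = np.array([10.07, 14.73, 17.94, 23.93, 29.61, 35.18, 40.02,
              44.82, 50.76, 55.05, 61.01, 66.40, 75.47, 81.78])
x = np.array([77.6, 114.9, 141.1, 190.8, 239.9, 289.0, 332.8,
              378.4, 434.8, 477.3, 536.8, 593.1, 689.1, 760.0])

def model(x, b1, b2):
    return b1 * (1.0 - np.exp(-b2 * x))

popt, _ = curve_fit(model, x, y, p0=[200.0, 0.002])      # starting point used in the text
certified = np.array([2.3894212918e+02, 5.5015643181e-04])
lre = -np.log10(np.abs((popt - certified) / certified))   # log relative error, one per parameter
print("b1, b2 =", popt, " LRE =", lre)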
The starting point is given in the NIST Misra1a file: b1 = 200, b2 = 0.002.
Many times the non-linear regression fails to converge, because of a starting point too
far from the optimum parameter values.
Before starting the macro, select the range of the x-y data (alternatively, it is sufficient to select the first cell A2); in this way the macro will automatically recognize the complete range. If the macro cannot find the right input ranges, the input boxes will be empty and you must indicate the correct ranges yourself; in this case re-input the data range as the complete range.
The calculation takes place under your eyes because the macro works on site. Step after step the parameters converge to the optimum while the residual sum of squares becomes smaller. When this value does not decrease any more, the macro stops itself, leaving the final "optimum" parameters b1 and b2 in F5 and F6.
The macro also stops itself if something goes wrong; in that case a short message will advise you that the final result may not be right.
After a few seconds the macro will end leaving the following situation
[Plot of the data y and the fitted regression y* over the range 0 ≤ x ≤ 800]
As we can see, the parameters found differ from the certified values by less than 1E-9. The algorithm has caught 8 exact digits out of 10 total digits! Clearly this accuracy is superfluous for a normal fitting, but it is a good example that shows what the Levenberg-Marquardt algorithm can do.
We have to point out that sometimes the process cannot give such accurate results. However, by restarting the process with new starting values corresponding to the last set of calculated parameter values, we can see whether the values change to achieve a better fit. Remember that your guide for judging a result is always the magnitude of the residual sum of squares (orange cell).
Try for example a start with the parameters b1 = 100, b2 = 0.01 (this starting situation is worse than the previous one, as you can see from the plot). In this case the macro finds new parameters. Even if they seem accurate, these parameters are less accurate than those from the previous pass.
Using these parameters as a new starting point, start the macro again.
Certified values: b1 = 238.94212918 ,  b2 = 0.0005501564318
What happens? The macro has reached the best possible accuracy, catching all 10 digits exactly. Clearly this is a lucky case, but it summarizes a strategy that we should adopt in searching for the best fitting model: try and... re-try.
y = e^(-b1·x) / (b2 + b3·x)
The dataset is too long to report here, but you can download it from the NIST StRD site. A summary of the NIST certified results is:
y = exp(-b1*x)/(b2+b3*x)
b1 = 1.6657666537E-01
b2 = 5.1653291286E-03
b3 = 1.2150007096E-02
Now let's begin the regression. Select the cell A2 and start the macro "Levenberg-Marquardt". If you have made the same set-up, you will see the entire input box filled with the correct ranges. Select "Run" and after a few seconds the macro ends, leaving the following results.
This is a very accurate result, as the average relative error is less than 1E-5 and the residual error is close to the expected one.
Can we improve the results? Let's see. We restart the algorithm from these values and get a new set of parameter values:
As we can see the extra iterations have taken the global relative error to the level of
about 1E-9, which is about the minimum that we can get from this data set with the
given model.
Sometimes, however, the result will not converge to the right solution, even if the final fit is not so bad. This is a typical behaviour of the exponential class of models.
Example 3 (using derivatives)
This example explains how to use the derivatives with the Levenberg-Marquardt
algorithm.
This algorithm needs one derivative for each parameter of the model. For example, if the model has three parameters, we must provide the information for three partial derivatives:
Model: y = f(x, p1, p2, p3)      Derivatives: ∂y/∂p1 , ∂y/∂p2 , ∂y/∂p3
In this example we perform the regression for the following function model
y = arctan(a1·x^2) - arctan(a2·x^2)
∂y/∂a1 = x^2 / (a1^2·x^4 + 1) ,   ∂y/∂a2 = -x^2 / (a2^2·x^4 + 1)
The dataset to fit and its plot are shown below. We have also plotted the function (pink line) with the starting parameters a1 = 0.1 and a2 = 0.
  x      y
  0.0    0
  0.1    0.00499970835
  0.2    0.01998135315
  0.3    0.04478851234
  0.4    0.07882527647
  0.5    0.12062366858
  0.6    0.16746264235
  0.7    0.21534838043
  0.8    0.25961024656
  0.9    0.29599955237
  1.0    0.32175055440
  1.1    0.33604846902
  1.2    0.33978560977
  1.3    0.33490615246
In our worksheet, in addition to the usual information, we have to insert the derivative functions. A possible arrangement could be the following. The formulas of the function and of its derivatives must be filled into all cells below the first cell (by dragging).
Now we are ready to begin the regression. Let's select the cell A2 and start the macro. All input boxes will be filled except the derivatives one; insert the range D2:E15.
After a few iterations the parameters converge to the optimum values a1 = 1 and a2 = 0.5, so the fitted curve is y = arctan(x^2) - arctan(x^2/2).
Example 4 (Rational class)
The dataset MGH09.dat, from the NIST/ITL StRD (1981) data sets, belongs to the rational equation class of non-linear least squares regression. The problem of fitting the parameter values was found to be difficult for some very good algorithms. This dataset contains 11 observations, 1 response, 1 predictor and 4 parameters (b1 to b4) to be determined. The equation to be fitted is:
y = b1·(x^2 + b2·x) / (x^2 + b3·x + b4)
A summary of the certified NIST results and the data are:
y = b1*(x^2+x*b2) / (x^2+x*b3+b4)
b1 = 1.9280693458E-01
b2 = 1.9128232873E-01
b3 = 1.2305650693E-01
b4 = 1.3606233068E-01
Data: y x
1.957000E-01 4.000000E+00
1.947000E-01 2.000000E+00
1.735000E-01 1.000000E+00
1.600000E-01 5.000000E-01
8.440000E-02 2.500000E-01
6.270000E-02 1.670000E-01
4.560000E-02 1.250000E-01
3.420000E-02 1.000000E-01
3.230000E-02 8.330000E-02
2.350000E-02 7.140000E-02
2.460000E-02 6.250000E-02
The algorithm begins to converge. Restarting the macro and continuing for two or three more passes, we finally reach an average accuracy of about 1E-9 for the following parameter values.
The final plot with these parameters is shown in the graph below.
The data set to fit is in the range A2:D9. Insert the model in each row of the range
E2:E9. For example, the cell E2 contains the following function definition:
=$A$12*A2^2+$B$12*B2^2+$C$12*C2^2-1
Insert the starting point in the range A12:C12, for example (1, 1, 1), which corresponds to a sphere of unit radius. Select A12:C12 and press "Run". The macro will converge to the solution (2, 5, 1):
2x^2 + 5y^2 + z^2 = 1
Gaussian regression
Gaussian regression is a symmetrical exponential model useful for many applications.
y = a · e^(-((x-b)/c)^2)
a = amplitude
b = axis of symmetry
c = deviation or spread
The axis of symmetry "b" is the abscissa where the function reaches its maximum amplitude "a". The deviation "c" is the distance from "b" at which the function falls to 37% (e^-1) of its maximum value.
Usually the data are affected by several errors that "mask" the original Gaussian distribution. In that case we can use the regression method to measure how well the raw data fit the Gaussian model and to evaluate its parameters.
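As an illustration of the model, the following sketch (Python/SciPy, our own assumption, not the Gaussian macro itself) generates noisy data from a Gaussian similar to the one used in this chapter and recovers a, b, c by nonlinear least squares; the true values and the noise level are arbitrary choices made for the example.

import numpy as np
from scipy.optimize import curve_fit

def gauss(x, a, b, c):
    return a * np.exp(-((x - b) / c) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(97.5, 102.5, 60)
y = gauss(x, 1.0, 100.0, 0.7) + rng.normal(0.0, 0.03, x.size)   # "masked" Gaussian

popt, _ = curve_fit(gauss, x, y, p0=[max(y), x[np.argmax(y)], 1.0])
print("a, b, c =", popt)        # should come back close to 1, 100, 0.7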
Select the range of data (x, y), or simply select the first cell A6 of the x data, and start the macro >Gaussian< of the menu >Non-linear regression<.
The parameters found are:
a = 1.00263
b = 100.0006
c = 0.69677
If we plot the Gaussian model with these parameters over the scatter plot of the data (x ranging from about 97.5 to 102.5) we can observe a good fit.
Rational regression
Rational formulas can be used to approximate a wide variety of functions, but the main benefit comes when we want to interpolate a function near a "pole". Large, sharp oscillations of the system response can be followed better with a rational model than with other models such as polynomials or exponentials.
A rational model is a fraction of two polynomials. The maximum degree of the two polynomials gives the degree of the rational model. Usually, in the modelling of real stable systems, the denominator degree is greater than the numerator degree.
The following models have degree 1, 2 and 3 respectively.
y = a0 / (b0 + x) ,   y = (a0 + a1·x) / (b0 + b1·x + x^2) ,   y = (a0 + a1·x + a2·x^2) / (b0 + b1·x + b2·x^2 + x^3)
The macro Rational from the menu NL-Regression allows setting the numerator and denominator degrees separately.
Example
So, the best fitting rational regression model is y = 2 / (2 + x^2).
When using the rational model
The rational model is more complicated than the polynomial model. For example, a 3rd degree rational model has 6 parameters, while the polynomial model of the same degree has 4 parameters.
Far from the "pole", the rational model takes no advantage over the polynomial model. Therefore it should be used only when it is truly necessary.
The scatter plot of the dataset can help us in choosing a suitable model. A plot that increases or decreases sharply often reveals the presence of a "pole", and a 1st degree rational model could be sufficient; on the other hand, if the plot shows a narrow "peak", probably the better choice would be a 2nd degree rational model.
Usually, examining the characteristics of the system from which the samples are taken can also help us choose a suitable rational model degree.
Example. The following data was derived from a frequency response of a one-pole
system. Find the best fitting model and estimate the pole.
  x      y
  0.5    0.4877814
  0.6    0.5978195
  0.7    0.4539886
  0.8    0.4725799
  0.9    0.5267415
  1.0    0.53705
  1.1    0.7518795
  1.2    0.6314435
  1.3    0.8182817
  1.4    0.804856
  1.5    0.8759378
  1.6    1.0459817
  1.7    1.1395459
  1.8    1.1503947
  1.9    1.4471687
  2.0    1.752384
  2.1    2.0726757
  2.2    2.4904918
  2.3    3.3956681
  2.4    5.0324014
  2.5    10.056427
The given data set is interpolated with a 1st degree rational function, y = -1 / (x - 2.6), and the fitting appears good. The pole is the root of the denominator:
x - 2.6 = 0  =>  x = 2.6
Compare the accuracy and the simplicity of this rational model with the one obtained by fitting a 4th degree polynomial (dotted line in the plot), y = 5.975x^4 - 31.287x^3 + 57.945x^2 - 43.833x + 11.813.
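The same one-pole fit can be checked numerically; the sketch below (Python/SciPy, our own assumption, not the Rational macro) fits the 1st degree model y = a0/(b0 + x) to the data above and recovers the pole as -b0.

import numpy as np
from scipy.optimize import curve_fit

x = np.linspace(0.5, 2.5, 21)
y = np.array([0.4877814, 0.5978195, 0.4539886, 0.4725799, 0.5267415,
              0.53705, 0.7518795, 0.6314435, 0.8182817, 0.804856,
              0.8759378, 1.0459817, 1.1395459, 1.1503947, 1.4471687,
              1.752384, 2.0726757, 2.4904918, 3.3956681, 5.0324014,
              10.056427])

def rational1(x, a0, b0):
    return a0 / (b0 + x)

popt, _ = curve_fit(rational1, x, y, p0=[-1.0, -2.7])      # assumed starting guess
a0, b0 = popt
print("model: y = %.4f / (%.4f + x),  pole at x = %.4f" % (a0, b0, -b0))
# the text's fit y = -1/(x - 2.6) corresponds to a0 = -1, b0 = -2.6, i.e. a pole at 2.6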
The frequency response of 2nd degree systems often shows a "peak" due to a resonance condition. In this situation rational regression is the only accurate way to fit the data. Compare the regression below, obtained with a 2nd degree rational model (pink line), with the polynomial regression of 6th degree (dotted line): the superiority is evident.
y = q / (x^2 - 2x + 1 + q) ,   q = 0.005
Rational regression also gives good results when a pole is located inside the dataset range. In the following example the dataset is sampled from the function
f(x) = (x + 1) / (x^2 - 2)
There is a pole near x = 1.414, but the regression fits very well all the same.
Exponential regression
Exponential relations are very common in the real world. Usually they appear together with the oscillating circular functions sine and cosine. In this chapter we investigate the regression of a simple exponential and of sums of exponentials.
A popular quick approach is to linearize the simple model y = A·e^(k·x) by taking the logarithm of both sides:
log(yi) = log(A·e^(k·xi))  =>  log(yi) = log(A) + log(e^(k·xi))  =>  log(yi) = log(A) + k·xi
So we obtain the linear function
zi = b0 + b1·xi
Performing the linear regression for this model we get the parameters (b0, b1) and finally the original parameters of the nonlinear function (A, k) by these simple formulas:
A = e^b0 ,  k = b1
This method is quite popular, but we have to point out that it can fail. In fact this is not a true nonlinear least squares regression, but a sort of quick method to obtain an approximation of the true nonlinear least squares regression. Sometimes the parameters obtained by the linearization method are sufficiently close to those of the NL-LS (Non-Linear Least Squares) method, but sometimes not, and sometimes it can give completely different values. So a good technique, always valid to check the result, is to calculate the residuals of the regression: if the residual sum of squares is too high, the linearized regression must be rejected.
Sometimes the parameters obtained by the linearized method can be used as a starting point for the optimization algorithms performing the true NL-LS regression, like the Levenberg-Marquardt macro of the addin Optimiz.xla.
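A hedged sketch of the linearization trick (Python; the sample data and function names are ours, only for illustration): fit log(y) linearly, recover A and k, and then optionally use them as the starting point of a true nonlinear least squares fit.

import numpy as np
from scipy.optimize import curve_fit

def linearized_exp_fit(x, y):
    """Quick linearized fit of y = A*exp(k*x): regress log(y) on x."""
    b1, b0 = np.polyfit(x, np.log(y), 1)     # slope, intercept
    return np.exp(b0), b1                    # A = e^b0, k = b1

# assumed noisy sample data, only for illustration
x = np.linspace(0.0, 2.0, 20)
y = 9.8 * np.exp(-1.9 * x) + np.random.default_rng(1).normal(0.0, 0.05, x.size)
y = np.clip(y, 1e-6, None)                   # log() needs positive values

A0, k0 = linearized_exp_fit(x, y)            # rough estimate
popt, _ = curve_fit(lambda x, A, k: A * np.exp(k * x), x, y, p0=[A0, k0])
print("linearized:", A0, k0, "  true NL-LS:", popt)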
Now let's see this example of exponential regression. The parameters obtained by the linearized method are
k1 = -1.1945 ,  a1 = 6.9527
for the model y = a1·e^(k1·x).
Now perform the regression of the same data with a true NL exponential regression. Insert the data (x, y) in the range A4:B10 and the parameters a1, k1 in the range F4:F5. Assume as the starting point the values given by the above regression, k1 = -1.19, a1 = 6.95. Select the range A4:B10 and start the macro "Exponential" of the menu "NL Regression", with the model
y = a1·e^(k1·x)
After a while the macro finds the two new optimum values a1 = 9.75, k1 = -1.83, which are quite different from those obtained by the linearized method.
The NL-LS exponential
regression y* (pink-line)
fits the data set much
better than the linearized
exponential regression y^
(dotted line).
The difference between the linearized and the true NL-LS regression exists for all models (logarithm, exponential, power, etc.), but it is usually so evident only for exponential functions; for the other models the difference is generally very small. There is another reason to dedicate much attention to this important but tricky regression.
Offset
Curiously, the presence of a simple offset completely defeats the linearization method. As the offset increases with respect to the amplitude parameter, the linearized regression becomes much more inaccurate. The only way in this case is the NL-LS regression method with the model
y = a0 + a1·e^(k·x)
Let's see the following example, where the data set is generated from the function 10·e^(-x), adding the offset 30 and a bit of random noise.
[Plot over 0 ≤ x ≤ 6 of the data generated from y = 30 + 10·e^(-x) and of the linearized exponential fit y = 33.043·e^(-0.0213x)]
Multi-exponentials model
The effort needed to find a good fit with a model having two or more exponentials grows sharply. The key to success is a good starting parameter set.
Let's try with the following general two-exponential model having 5 parameters
y = a0 + a1·e^(k1·x) + a2·e^(k2·x)
The scatter plot of the data set (xi, yi) is shown as the dotted (blue) line in the following graph. A reasonable starting point may be taken with the following observation:
With these (rather drastic) assumptions we take the following starting parameters
a0 = 0.4, a1 = 1, k1 = -1, a2 = -1.4, k2 = -10
whose regression function is plotted in the above graph (continuous pink line).
Start the exponential regression macro, fill in the input box correctly and set the 5-parameter model. We get the final parameters of the exponential regression.
Good fitting is good regression?
The object of the regression is to condense and summarize the behaviour of a system from a set of measurements. Usually this is done by a mathematical function - also called "model" - that depends on adjustable parameters. The number of parameters depends on the model complexity; examples of parameters are: growth rate, concentration, time decay, pollution, frequency response, etc.
When the model parameters are close to the "true" unknown parameters of the system, the experimental data are very close to those extrapolated from the model. This good fitting represents a good agreement between the data and the model. It also seems reasonable to think that a good fitting represents a good estimation of the unknown parameters. Unfortunately this is not always true, and it happens above all with exponential models: we may have a good fitting without having a good regression model.
This means that we could use the model for predicting values but not for investigating the internal parameters of the system.
a0 a1 k1 a2 k2
3 -4 -0.2 10 -1
1.9 -33 -0.66 40 -0.77
0.22 0.25 0.33 8.5 -1.1
At first sight they are completely different: not only in the exponential amplitudes (a1 changes from -33 to 0.25, and a2 from 8.5 to 40) but also in the decay constants (k1 changes from -4 to 0.33, becoming positive).
On the other hand, if we took the 1st model as good, we would believe in good stability, the damping factors being about -1 and -4.
But where is the truth? As we can see, the conclusions about the original system, which are the main goal of the regression, could in that case be completely wrong.
Damped cosine regression
This is a very common behaviour of a 2nd order real system: the response oscillates around a final value with an amplitude that decreases with time.
y = a0 + a1·e^(k·t)·cos(ω·t + θ)
This model has 5 parameters.
The parameters found are:
offset = 1
amp.   = 1
damp.  = -1
puls.  = 6.283
phase  = 0
Power regression
The simple model of this regression is
y = a·x^k
with x ≥ 0 and a > 0.
Case k > 1. The power fitting is very close to the polynomial fitting when the exponent is positive and greater than 1 (k > 1). In the example we can see the power model y = 0.7·x^1.65 and the polynomial fitting (dotted line) y = 0.3·x^2 + 0.5·x (chart trendline label: y = 0.2992x^2 + 0.5118x).
The difficulty of the polynomial regression is located near the origin, where the derivative of the power function changes sharply and can even become infinite at x = 0. Any polynomial, having no singular point, cannot follow the curve well near this point. If the dataset were far from the singular point x = 0, the polynomial regression would also work well.
Case k < 0. When the exponent k is negative the power fitting should be used; a logarithmic regression could also be used. In the example we can see the power model y = 0.7·x^(-0.33) and the logarithmic fitting (dotted line).
Logarithmic regression
Strictly related to the power fitting is the logarithmic regression
y = a·log(x) + b
where "a" and "b" are the parameters to determine.
We perform this kind of regression when we have a dataset sampled over a very wide interval. For example, the following dataset comes from the harmonic analysis of a system: the vibration amplitude was measured at 10 different frequencies, from 0.1 kHz to about 2000 kHz.
We usually plot this kind of dataset with the help of a semi-logarithmic chart, and thus it is reasonable to assume a logarithmic regression as well.
  x (kHz)     y (dB)
     0.1       5.334
     0.3       3.791
     0.9       2.233
     2.7      -0.286
     8.1      -1.088
    24.3      -1.868
    72.9      -4.821
   218.7      -5.127
   656.1      -8.074
  1968.3     -10.027
Fitted model (shown in the semi-logarithmic chart): y = -1.5·Log(x) + 2
NIST Certification Test
The Levenberg-Marquardt macro was recently tested against the complete set of nonlinear NIST StRD datasets.
In this test we have used the approximate derivatives. For some datasets we restarted the macro 2 times: when the macro failed to converge from the 1st starting point, we started it again using the 2nd starting point provided by NIST.
The test result - the minimum LRE for all the regressors - is reported in the following graph, compared with the Solver of Excel 97.
[Bar chart of the minimum LRE obtained by the LM macro and by the Excel 97 Solver for the datasets: MGH09, MGH10, Roszman1, Eckerle4, BoxBOD, Misra1a, Gauss1, Gauss2, DanWood, Misra1b, Hahn1, Nelson, Gauss3, Misra1d, Rat42, Rat43, Thurber, Misra1c, Lanczos3, Kirby2, Lanczos1, Lanczos2, Bennett5, Chwirut2, Chwirut1]
Chapter 4
Nonlinear Equations Systems
Optimiz has a set of macros called rootfinding algorithms concerning the numerical
solution of systems of Non-Linear Equations (NLEs).
f1(x1, x2, ..., xn) = 0
f2(x1, x2, ..., xn) = 0      ⇔    F(x) = 0
...
fn(x1, x2, ..., xn) = 0
If we are interested in pursuing this solution strategy, we can adopt all the minimization algorithms contained in this addin, applied to a merit function φ (typically the sum of the squared residuals), but we must be aware that this is as likely to fail as it is to work, for the following reason: while every root of the system F(x) = 0 is a global minimum of the function φ, there also exist local minima of φ which are not roots of the system. A minimization algorithm is as likely to converge to a local minimum as to a global one, so an abundance of these local minima may render such an algorithm practically useless for finding the roots of F(x).
In addition, there is another good reason to use a dedicated rootfinding algorithm for solving the system F(x) = 0: its solution is generally more accurate than the one obtained by minimization algorithms. On the other hand, the minimization algorithms usually show better convergence behaviour; therefore the two strategies are often used together.
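A small sketch of the two strategies (Python/SciPy; the 2x2 test system is our own assumption, not one from the text): minimizing the merit function φ gives an approximate point, and a dedicated rootfinder then refines it to a much smaller residual.

import numpy as np
from scipy.optimize import minimize, fsolve

def F(v):
    x, y = v
    return np.array([x**2 + y**2 - 4.0,      # assumed example system
                     np.exp(x) + y - 1.0])

phi = lambda v: 0.5 * np.sum(F(v)**2)        # merit function

x_min = minimize(phi, x0=[1.0, 1.0]).x       # minimization strategy
x_root = fsolve(F, x0=x_min)                 # polish with a rootfinder
print("minimizer :", x_min,  " residual:", np.abs(F(x_min)).max())
print("rootfinder:", x_root, " residual:", np.abs(F(x_root)).max())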
NL Equation macros
The rootfinding algorithms contained in this addin are:
Newton-Raphson
Newton-Raphson (working on site)
This algorithm is the prototype for nearly all NLE solvers, and the method works quite well. In the general case the method converges rapidly (quadratically) towards a solution. The drawbacks of the method are that the Jacobian is expensive to calculate, and that there is no guarantee that a root will ever be found unless the starting value is close to the root. This version adopts a relaxation strategy to improve the convergence stability.
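A minimal sketch of a relaxed (damped) Newton-Raphson iteration in Python; the finite-difference Jacobian and the simple step-halving rule are our own assumptions and only illustrate the general idea, not the macro's exact relaxation strategy.

import numpy as np

def newton_relaxed(F, x0, tol=1e-12, max_iter=50, h=1e-7):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        fx = F(x)
        if np.max(np.abs(fx)) < tol:
            break
        # finite-difference Jacobian
        J = np.empty((len(fx), len(x)))
        for j in range(len(x)):
            xp = x.copy(); xp[j] += h
            J[:, j] = (F(xp) - fx) / h
        dx = np.linalg.solve(J, -fx)
        # relaxation: halve the step until the residual norm decreases
        alpha = 1.0
        while alpha > 1e-4 and np.linalg.norm(F(x + alpha * dx)) >= np.linalg.norm(fx):
            alpha *= 0.5
        x = x + alpha * dx
    return x

# small test system quoted later in this chapter: 2x + cos(y) - 2 = 0, cos(x) + 2y - 1 = 0
F = lambda v: np.array([2*v[0] + np.cos(v[1]) - 2.0, np.cos(v[0]) + 2*v[1] - 1.0])
print(newton_relaxed(F, [0.5, 0.5]))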
NLE - Newton-Raphson
Solves a system of nonlinear equations using the Newton-Raphson relaxed method
Example: assume the root of the following system is to be found
x - (y/840)^6 - 2·10^-5·y = 0
x + 1/y = 0
In order to locate the root in the plane we use the contour method. The contour plots of the two equations are shown in the following graph.
If you have followed this procedure the fields will already be filled correctly and you only have to press "run". After a few iterations the solution will appear in the cells B3 and B4.
Other settings are:
Iteration Limit. In the panel there is always an input box for setting the maximum
number of iterations allowed. The macro stops itself when this limit has been reached.
Residual Error: The input box sets the error limit of the residual error defined as: max{
|fi(x)| }.
Relax: Switches on /off the relaxation strategy. If disabled the simple traditional
Newton-Raphson algorithm is used. If enabled the macro exhibits a better global
convergence behaviour. In any case this parameter does not affect the final accuracy.
Trace: Switches on /off the trace of the root trajectory. If selected, the macro opens an
auxiliary input box requiring the cell where the output will begin
5x^2 - 6xy + 5y^2 - 1 = 0
2^(-x) - cos(π·y) = 0
            x                      y
root 1    0.299376925061096    -0.198049648152025
root 2    0.552094992004132     0.261097771471505
NLE - Broyden
Solves a system of nonlinear equations using the Broyden method
Example: assume the root of the following system is to be found
x^4 + 2y^4 - 16 = 0
x^2 + y^2 - 4 = 0
The contour plots of the two equations are shown in the following graph.
Select the range C5:D5 containing the system to solve and start the macro Broyden from the menu Optimiz... > NL Equation.
If you have followed this procedure the fields will already be filled correctly and you only have to press "run". After a few iterations the solution will appear in the cells A5 and B5.
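For reference, here is a compact sketch of Broyden's rank-one update in Python (our own illustration, not the add-in routine): the Jacobian estimate is built once by finite differences and then only corrected after each step instead of being recomputed.

import numpy as np

def fd_jacobian(F, x, h=1e-7):
    """Finite-difference Jacobian, used only to initialize Broyden."""
    fx = F(x)
    J = np.empty((len(fx), len(x)))
    for j in range(len(x)):
        xp = x.copy(); xp[j] += h
        J[:, j] = (F(xp) - fx) / h
    return J

def broyden(F, x0, tol=1e-12, max_iter=100):
    x = np.asarray(x0, dtype=float)
    B = fd_jacobian(F, x)                 # Jacobian estimate, then only rank-one updates
    fx = F(x)
    for _ in range(max_iter):
        if np.max(np.abs(fx)) < tol:
            break
        dx = np.linalg.solve(B, -fx)
        x_new, f_new = x + dx, F(x + dx)
        B += np.outer(f_new - fx - B @ dx, dx) / (dx @ dx)   # Broyden "good" update
        x, fx = x_new, f_new
    return x

# example system of this section: x^4 + 2y^4 - 16 = 0, x^2 + y^2 - 4 = 0
F = lambda v: np.array([v[0]**4 + 2*v[1]**4 - 16.0, v[0]**2 + v[1]**2 - 4.0])
print(broyden(F, [1.5, 1.5]))             # one of the roots, near (1.155, 1.633)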
Other settings
Iteration Limit. In the panel there is always an input box for setting the maximum
number of iterations allowed. The macro stops itself when this limit has been reached.
Residual Error: The input box sets the error limit of the residual error defined as: max{
|fi(x)| }.
Trace: Switches on /off the trace of the root trajectory. If selected, the macro opens an
auxiliary input box requiring the cell where the output will begin
x^1.2 + y^0.5 + xy - 1 = 0
e^(-2x) + y - 1 = 0
            x                      y
root 1    0.283157364898325     0.432386602268468
NLE - Brown
Solves a system of nonlinear equations using the Brown method
Example: assume the root of the following system is to be found
x^4 + 3y^2 - 8x + 2y - 33 = 0
x^3 + 3y^2 - 8xy - 12x^2 + x + 59 = 0
The contour plots of the two equations are shown in the following graph.
Select the range E3:E4 containing the system to solve and start the macro Brown from the menu Optimiz... > NL Equation.
If you have followed this procedure the fields will already be filled correctly and you only have to press "run". After a few iterations the solution will appear in the cells B3 and B4.
Repeating with the starting points (x0,y0) = (3, 0), (x0,y0) = (-2, 0.1), (x0,y0) = (-2, -2)
we get all the system solutions
Other settings
Iteration Limit. In the panel there is always an input box for setting the maximum
number of iterations allowed. The macro stops itself when this limit has been reached.
Residual Error: The input box sets the error limit of the residual error defined as: max{
|fi(x)| }.
Trace: Switches on /off the trace of the root trajectory. If selected, the macro opens an
auxiliary input box requiring the cell where the output will begin
2 x + cos( y ) − 2 = 0
cos( x) + 2 y − 1 = 0
NLE - Global rootfinder
This macro attempts to find all the roots of a nonlinear system in a given space range, using a random searching method plus the Newton-Maehly formula for zero suppression.
This macro works inside a specific box range and does not need any starting point.
It is quite time-expensive and, like other rootfinder algorithms, there is no guarantee that the process succeeds. If the macro takes too long, try to reduce the searching area.
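The general idea can be sketched as follows (Python/SciPy, our own illustration): sample random starting points inside the box, run a local rootfinder from each one, and keep only the solutions that lie inside the box and are not duplicates. The Newton-Maehly zero suppression used by the macro is not reproduced here; simple de-duplication stands in for it.

import numpy as np
from scipy.optimize import fsolve

def global_roots(F, lo, hi, trials=100, tol=1e-8, seed=0):
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    rng = np.random.default_rng(seed)
    roots = []
    for _ in range(trials):
        x0 = lo + rng.random(lo.size) * (hi - lo)            # random start in the box
        r, info, ok, _ = fsolve(F, x0, full_output=True)
        if ok != 1 or np.max(np.abs(F(r))) > tol:
            continue                                          # this trial failed
        if np.all(r >= lo) and np.all(r <= hi) and \
           all(np.linalg.norm(r - q) > 1e-6 for q in roots):
            roots.append(r)                                   # new root inside the box
    return roots

# example system of this section, searched in the box -4 <= x, y <= 4
F = lambda v: np.array([v[0]**2 - v[1]**2 + v[0]*v[1] - 1.0,
                        v[0]**4 + 2*v[1]**2 + v[0]*v[1] - 16.0])
for r in global_roots(F, [-4, -4], [4, 4]):
    print(r)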
Example: assume the roots of the following system are to be found
x^2 - y^2 + xy - 1 = 0
x^4 + 2y^2 + xy - 16 = 0
The contour plots of the two equations are shown in the following graph. There are 4 intersections located in the box range
-4 ≤ x ≤ 4 ;  -4 ≤ y ≤ 4
We can arrange the problem in a worksheet. In order to speed up the input filling, this macro assumes the following simple schema:
If you have followed this procedure the fields will already be filled correctly and you only have to press "run". The macro begins to search for the roots and lists each root found in the cells starting from B8. Because the algorithm proceeds randomly, the roots may appear in any order.
The macro stops when it cannot find any more roots.
Other settings
Trials: sets the maximum number of random trials allowed for the global searching. For example, if the number is 100 (default), then the algorithm samples a random starting point inside the given range at most 100 times. This means that, at most, the macro could find 100 different roots (provided every trial succeeds).
Iteration Limit: sets the maximum number of iterations allowed for each root. When the algorithm exceeds this limit the trial fails and another starting point is sampled.
Fails: sets the limit for the stopping criterion. The number 10 (default) means that the macro stops itself when it fails consecutively more than 10% of the total trials.
Data Output
The macro outputs the roots and also other interesting data related to each root.
In the above example we see that the first two roots were found consecutively at the 1st and 2nd trials, then the algorithm failed the 3rd trial and found the 3rd root at the 4th trial using 24 iterations. After that it failed trials 5, 6 and 7. Finally, it found the last (4th) root at the 8th trial using no more than 7 iterations.
Single equation
Of course this macro can also be used for finding all the roots of a single equation.
Let's see. Find all the roots (if any) of the following equation
cos(π·x) + x/4 = 0
Plotting the equation we note that the roots are located in the range -5 ≤ x ≤ 5.
The number of roots should be at least 8.
We can arrange the problem in the following way.
Solving the problem "on site", i.e. directly on the worksheet cells, is slower than solving with a VBA program but, on the other hand, it is more flexible, because we can use practically all the Excel and user-defined functions available.
Let's see the following equation.
Example. Find all the roots of the following equation, involving the Bessel functions of the 1st and 2nd kind
J1(4x)·Y0(x) - J0(x)·Y1(4x) = 0
From the graph we observe that there are infinitely many roots; the first 10 roots are located in the range 0 < x < 10. Note also that the equation is not defined for x ≤ 0.
We can arrange the problem in the following way. The function
=BESSEL.J(4*B5;1)*BESSEL.Y(B5;0)-BESSEL.J(B5;0)*BESSEL.Y(4*B5;1)
is inserted in the cell E5. Now select this cell and start the macro Global.
Set a reasonable number of trials (for example 200) and press "start".
In a few seconds the algorithm has found all ten roots. Note that the most difficult root to find was root 8 = 0.3934562: the algorithm made 9 trials (24 - 15 = 9) before finding it. This usually happens when a root is near a singular point (in this case x = 0).
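Outside Excel the same roots can be bracketed with a simple scan plus a one-dimensional rootfinder; the sketch below (Python/SciPy, our own assumption) scans 0 < x < 10 and refines each sign change with brentq.

import numpy as np
from scipy.special import jv, yv
from scipy.optimize import brentq

f = lambda x: jv(1, 4*x) * yv(0, x) - jv(0, x) * yv(1, 4*x)

xs = np.linspace(0.05, 10.0, 2000)        # start slightly above the singular point x = 0
roots = [brentq(f, a, b) for a, b in zip(xs[:-1], xs[1:]) if f(a) * f(b) < 0]
print(len(roots), "roots found; smallest root ~", roots[0])   # the text reports 0.3934562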
More variables
Solving equations with 3 or more variables may be very difficult because generally we cannot use the graphical method. As we have seen, it is necessary to locate, with sufficient precision, the space region where the roots are - or, at least, to have an idea of the limits within which the variables can move. This is necessary for setting the constraints box.
The variable bounds can often be discovered by examining the equations of the system itself.
Consider, for instance, the system defined by the worksheet formulas below:
2x + cos(y) + cos(z) = 1.9
cos(x) + 2y + cos(z) = 1.8
cos(x) + cos(y) + 2z = 1.7
Let's try to find the variable ranges. For this, we solve the 1st equation for x, the 2nd for y and the last one for z. We have:
x = (1.9 - cos y - cos z) / 2
y = (1.8 - cos x - cos z) / 2
z = (1.7 - cos x - cos y) / 2
Because the cosine function is bounded between -1 and 1, the lower and upper bounds for each variable are
-0.05 ≤ x ≤ 1.95
-0.1  ≤ y ≤ 1.9      =>   we take the box  -1 ≤ x ≤ 3 ,  -1 ≤ y ≤ 3 ,  -1 ≤ z ≤ 3
-0.15 ≤ z ≤ 1.85
Of course the bounding could be tighter, for example -0.2 ≤ x ≤ 2. But, from our experimentation, the global searching algorithm works better if the constraints box is a bit larger than the one strictly necessary.
We can arrange the problem in the following way. Insert in the cells:
cell F2 = 2*B2+COS(B3)+COS(B4)-1.9
cell F3 = COS(B2)+2*B3+COS(B4)-1.8
cell F4 = COS(B2)+COS(B3)+2*B4-1.7
The variables x, y, z are the cells B2, B3, B4; the constraints box is inserted in the range C2:D4. Select the range F2:F4 and start the macro Global.
Variables Solution
x -0.0423690502771721
y -0.0941340004413468
z -0.147337615888545
Try restarting the macro several times with the "start" button: the algorithm will output the same solution.
The same root is also confirmed by other rootfinder algorithms (Newton or Broyden, for example) starting from the point (0, 0, 0).
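The same check can also be done with any general-purpose solver; for instance this sketch (Python/SciPy, our own assumption, not one of the add-in macros) solves the system of the worksheet formulas above starting from (0, 0, 0).

import numpy as np
from scipy.optimize import fsolve

def F(v):
    x, y, z = v
    return [2*x + np.cos(y) + np.cos(z) - 1.9,
            np.cos(x) + 2*y + np.cos(z) - 1.8,
            np.cos(x) + np.cos(y) + 2*z - 1.7]

root = fsolve(F, [0.0, 0.0, 0.0])
print(root)   # expected near (-0.0424, -0.0941, -0.1473), as found by the macro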
Example: find the roots of the following system
e^(-x) - xy + y^3 = 0
3y^2 + 2z^2 - 4 = 0
x^2 - 2x + 2z^2 + xz - 6 = 0
Let's try to find the variable ranges. We observe that the equations are "weakly" coupled: each equation ties only two variables and thus defines an implicit curve that we can plot in order to estimate the limit ranges of the variables.
For convenience we plot the zero contours of the 2nd and 3rd equations.
3y^2 + 2z^2 - 4 = 0:  -2 ≤ y ≤ 2 , -2 ≤ z ≤ 2          x^2 - 2x + 2z^2 + xz - 6 = 0:  -2 ≤ x ≤ 4 , -3 ≤ z ≤ 2
From the above graphs we can choose the lower limit -2 and the upper limit 5 for each variable. This is a sort of "bracketing" of the system roots.
We can arrange the problem in the following way. Insert in the cells:
cell F2 = EXP(-B2)-B2*B3+B3^3
cell F3 = 3*B3^2+2*B4^2-4
cell F4 = B2^2-2*B2+2*B4^2+B2*B4-6
The variables x, y, z are the cell B2, B3, B4; the constraints box is in the range C2:D4.
Select the range F2:F4 and start the macro Global
After a few seconds the macro will output the following list.
Repeating the process we confirm the result: there are 2 roots in the given box, which are all the possible solutions of the given system.
Univariate Rootfinding macro
This macro solves a single nonlinear equation. The user can choose the method among a miscellany of the most popular rootfinding algorithms.
x^0.5 - sin(x) - 1 = 0
Select the cell D3 containing the equation to solve and start the macro 1D-Zerofinder misc. from the menu Optimiz... > NL Equation.
In this example, we have obtained the numerical solution with the Pegasus algorithm in about 8 iterations, with an accuracy better than 1E-15. But we may also try several other algorithms.
Other settings
Iteration Limit. In the panel there is always an input box for setting the maximum
number of iterations allowed. The macro stops itself when this limit has been reached.
Residual Error: The input box sets the error limit of the residual error defined as: max{
|fi(x)| }.
Trace: Switches on /off the trace of the root trajectory. If selected, the macro opens an
auxiliary input box requiring the cell where the output will begin. The first column
contains the root value; the second column, the residual error | F(x) |.
Algorithm: with this combo-box the user can choose the rootfinding algorithm⁵ that he likes from the following list.
⁵ For further details about these methods see "Nonlinear Equations - Iterative Methods", L. Volpi, 2006, Foxes Team.
1 "Bisection" Bisection method
2 "Pegasus" Pegasus (Dowell-Jarratt)
3 "Brent" Brent hybrid method (Wijngaardern- Dekker-Brent)
4 "Secant" Secant method
5 "Halley" Halley method
6 "Halley FD" Halley method with finite differences
7 "Secant-back" Secant back-step method
8 "Star E21" Star method (Traub E21)
9 "Parabola" Parabola method
10 "Parabola inv." Inverse parabola method
11 "Fraction" Fraction interpolation method
12 "Newton" Newton-Raphson method
13 "Regula falsi" False position method
14 "Chebychev FD" Chebychev-Householder method with finite differences
15 "Muller" Muller method
16 "Rheinboldt 2" Rheinboldt hybrid method
17 "Steffenson" Steffenson method
Algorithms 5, 12, 17 take the starting value from the cell "x". The other algorithms start using the points "xmin" and "xmax" of the constraints range.
The trace output can be useful for studying and comparing the behaviour of several algorithms.
Example: compare the error trajectories of the Secant, Newton and Halley algorithms applied to the equation
e^(-4x) + e^(-(x+3)) - x^6 = 0
2D - Zero Contour
This macro solves the bivariate equation f(x, y) = 0 in a region of the x-y plane.
As known, the solutions form a curve called the "zero-contour" or "zero-path" of the function f(x, y). Plotting these curves in a scatter graph gives much useful information about invertibility, intersections with other curves, etc.
Example: find the zero contour of the function
f(x, y) = (10x^2 + 1)·y^2 - 1
Because we have no idea where the function has its zeros, we begin with a large searching region and then successively restrict the area in order to obtain a good positioning of the plot.
Here we choose the rectangular range:
− 2 ≤ x ≤ 2 ; −1 ≤ y ≤ 1
Arrange the worksheet like the following and insert the function
=(10*B4^2 + 1)^2 *C4^2 - 1 in the cell D4.
Then select the cell D4 and start the macro 2D-Zero Path from the menu Optimiz... >
NL Equation
If you have followed the schema, all the input boxes will be correctly filled and you only have to click "Start" to begin the path finder. We can also stop the process if it takes too long.
In this case the macro has found a path of about 200 points in about 7 seconds. Observe that each point is a highly accurate numerical solution of the equation f(x, y) = 0.
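One way such a path can be traced is sketched below (Python, our own predictor-corrector scheme, not the macro's algorithm): step along the tangent of the contour, then pull the point back onto f(x, y) = 0 with a few Newton corrections along the gradient. The displayed form of the example function is used here.

import numpy as np

def trace_contour(f, grad, p0, step=0.02, n_points=200):
    """Predictor-corrector tracing of f(x,y)=0 starting from a point p0 on the curve."""
    pts = [np.asarray(p0, dtype=float)]
    for _ in range(n_points - 1):
        p = pts[-1]
        g = grad(p)
        t = np.array([-g[1], g[0]]) / np.linalg.norm(g)    # tangent direction
        q = p + step * t                                    # predictor
        for _ in range(5):                                  # corrector: Newton along the gradient
            g = grad(q)
            q = q - f(q) * g / (g @ g)
        pts.append(q)
    return np.array(pts)

# example function of this section (displayed form)
f = lambda p: (10*p[0]**2 + 1) * p[1]**2 - 1
grad = lambda p: np.array([20*p[0]*p[1]**2, 2*(10*p[0]**2 + 1)*p[1]])
path = trace_contour(f, grad, p0=[0.0, 1.0])                # (0, 1) lies on the contour
print(path[:5])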
Plotting the data of the range B9:C223 in a scatter x-y graph we get the following image.
Note that there are two different branches, symmetrical with respect to the x-axis (note also that the range B9:C223 is composed of two separate datasets: the first one in B9:C115 and the second one in B116:C223). They are just the two paths of the zero-contour.
Other settings
Trials: sets the maximum number of random trials allowed. For example, if the number is 16 (default), then the algorithm samples a random starting point inside the given range 16 times. This means that the macro could find at most 16 different branches of the zero contour.
Points: sets the maximum number of points allowed for the whole contour. Increase this value if the contour is very long or has many branches.
Step: sets the space between two consecutive points of the path. Reduce the step only for very detailed paths, but remember that the elaboration time will increase sharply. A simple rule of thumb is to take about 1% of the maximum rectangular dimension.
Other curves
|x|^(2/3) + |y|^(2/3) - 1 = 0
x^3 + y^3 - 3xy = 0
x^3 + xy^2 - x^2 + y^2 = 0
x^4 - 6x^2·y - 16x^2 + 25y^2 = 0
2D Intersection
This useful macro extrapolates the intersections between two contours
f(x, y) = 0 ,  g(x, y) = 0
given as two sets of consecutive points (see the macro 2D - Zero Contour).
First of all, we get the zero-contour of each equation with the macro 2D Zero Contour. The data sets of the two paths are in the ranges B10:C167 and D10:E104 respectively. This schema is not obligatory, but it helps in filling the input box of the macro.
Select the first cell B10 and start the macro 2D Intersection from the menu Optimiz... > NL Equation.
If you have followed the above schema, all the input boxes will be correctly filled and you only have to click "Run".
The macro has found 4 intersection points:
     xi          yi
  -0.73892    -0.30292
  -0.81216     0.56474
   0.73898     0.30312
   0.81233    -0.56521
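A sketch of the underlying operation (Python, our own illustration, not the macro code): walk over every pair of consecutive segments of the two point lists and solve the 2x2 linear system of each segment pair; where both intersection parameters fall in [0, 1] the two paths cross.

import numpy as np

def segment_intersection(p1, p2, q1, q2):
    """Return the intersection of segments p1-p2 and q1-q2, or None."""
    d1, d2 = p2 - p1, q2 - q1
    A = np.array([[d1[0], -d2[0]], [d1[1], -d2[1]]])
    if abs(np.linalg.det(A)) < 1e-14:
        return None                                   # parallel segments
    t, s = np.linalg.solve(A, q1 - p1)
    if 0.0 <= t <= 1.0 and 0.0 <= s <= 1.0:
        return p1 + t * d1
    return None

def polyline_intersections(path_f, path_g):
    """All crossing points of two zero-contours given as ordered (x, y) point arrays."""
    pts = []
    for p1, p2 in zip(path_f[:-1], path_f[1:]):
        for q1, q2 in zip(path_g[:-1], path_g[1:]):
            hit = segment_intersection(p1, p2, q1, q2)
            if hit is not None:
                pts.append(hit)
    return np.array(pts)

# toy stand-in paths, just to show the call; in practice they would be the point sets
# produced by the 2D Zero Contour macro (e.g. the ranges B10:C167 and D10:E104)
path_f = np.array([[-1.0, -1.0], [1.0, 1.0]])
path_g = np.array([[-1.0, 1.0], [1.0, -1.0]])
print(polyline_intersections(path_f, path_g))         # -> [[0. 0.]]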
Credits
Some of the VB routines contained in this addin were developed with the contribution of the following authors, who kindly gave us the permission to release them in the free public domain. Many thanks for these great contributions.
Many thanks also to D. A. Heiser for his kind contribution in testing, debugging and documentation revision.
References
Software
Sub LMNoLinearFit - "Routine to compute the least-squares fit of a model Fun(x) using the Levenberg-Marquardt algorithm", Oct. 2004, by Luis Isaac Ramos Garcia
Sub NMSimplex - "Routine to search for the minimum of a function using the Nelder-Mead algorithm", Oct. 2004, by Luis Isaac Ramos Garcia
Sub SolveLS - "Solving Linear System with scaled pivot", Oct. 2004, by Luis Isaac Ramos Garcia and Foxes Team
Documents
"Process Modeling", The National Institute of Standards and Technology (NIST) website for
Statistical Reference Datasets, (http://www.itl.nist.gov/div898/handbook/pmd/pmd)
"Metodos numericos con Matlab", J. M. Mathews et al., Prentice Hall
"Numerical Methods that usually work", F. S. Acton, The Mathematical Association of America, 1990
"An Introduction to the Conjugate Gradient Method Without the Agonizing Pain",
Jonathan Richard Shewchuk, Edition 114, August 4, 1994, School of Computer Science,
Carnegie Mellon University, Pittsburgh
"Optimization for Engineering Systems", Ralph W. Pike, 2001, Louisiana State University
(http://www.mpri.lsu.edu/bookindex)
“Advanced Excel for scientific data analysis", Robert de Levie, 2004, Oxford University Press
"Microsoft Excel 2000 and 2003 Faults, Problems, Workarounds and Fixes", David A. Heiser,
web site http://www.daheiser.info/excel/frontpage.html
Analytical Index
algorithm; 11
CG; 11
Conjugate Gradients; 11
Constraints; 9
Davidon-Fletcher-Powell; 11
DFP; 11
Divide-Conquer; 11
Downhill-Simplex; 11
Gradient; 8
Levenberg-Marquardt; 12
Newton-Raphson; 12
Nonlinear regression; 47
Object function; 8
Parabolic; 11
Random; 11
2006, by Foxes Team
ITALY
[email protected]
2nd Edition
May 2006