
MATLAB Function Reference

optimset
Create or edit optimization options parameter structure

Syntax
• options = optimset('param1',value1,'param2',value2,...)
• optimset
• options = optimset
• options = optimset(optimfun)
• options = optimset(oldopts,'param1',value1,...)
• options = optimset(oldopts,newopts)

Description

options = optimset('param1',value1,'param2',value2,...) creates an
optimization options structure called options, in which the specified parameters
(param) have specified values. Any unspecified parameters are set to [] (parameters
with value [] indicate to use the default value for that parameter when options is
passed to the optimization function). It is sufficient to type only enough leading
characters to define the parameter name uniquely. Case is ignored for parameter
names.

optimset with no input or output arguments displays a complete list of parameters
with their valid values.

options = optimset (with no input arguments) creates an options structure
options where all fields are set to [].

options = optimset(optimfun) creates an options structure options with all
parameter names and default values relevant to the optimization function optimfun.

options = optimset(oldopts,'param1',value1,...) creates a copy of
oldopts, modifying the specified parameters with the specified values.

options = optimset(oldopts,newopts) combines an existing options structure
oldopts with a new options structure newopts. Any parameters in newopts with
nonempty values overwrite the corresponding old parameters in oldopts.
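For example, a minimal sketch of combining two structures (the variable names opts1
and opts2 are illustrative only):

• opts1 = optimset('Display','iter','TolFun',1e-8);
• opts2 = optimset('TolX',1e-6);
• options = optimset(opts1,opts2);   % TolX comes from opts2; nonconflicting fields of opts1 are kept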

Parameters

Optimization parameters used by MATLAB functions and Optimization Toolbox
functions:

Parameter     Value                                  Description

Display       'off' | 'iter' | 'final' | 'notify'   Level of display. 'off' displays no output;
                                                     'iter' displays output at each iteration;
                                                     'final' displays just the final output;
                                                     'notify' displays output only if the
                                                     function does not converge.
MaxFunEvals   positive integer                       Maximum number of function evaluations
                                                     allowed.
MaxIter       positive integer                       Maximum number of iterations allowed.
TolFun        positive scalar                        Termination tolerance on the function value.
TolX          positive scalar                        Termination tolerance on x.

Optimization parameters used by Optimization Toolbox functions

Property           Value                             Description

DerivativeCheck    'on' | {'off'}                    Compare user-supplied analytic derivatives
                                                     (gradients or Jacobian) to finite differencing
                                                     derivatives.
Diagnostics        'on' | {'off'}                    Print diagnostic information about the function
                                                     to be minimized or solved.
DiffMaxChange      positive scalar | {1e-1}          Maximum change in variables for finite
                                                     difference derivatives.
DiffMinChange      positive scalar | {1e-8}          Minimum change in variables for finite
                                                     difference derivatives.
GoalsExactAchieve  positive scalar integer | {0}     Number of goals to achieve exactly (do not
                                                     over- or underachieve).
GradConstr         'on' | {'off'}                    Gradients for nonlinear constraints defined by
                                                     the user.
GradObj            'on' | {'off'}                    Gradient(s) for objective function(s) defined
                                                     by the user.
Hessian            'on' | {'off'}                    Hessian for the objective function defined by
                                                     the user.
HessMult           function | {[]}                   Hessian multiply function defined by the user.
HessPattern        sparse matrix | {sparse matrix    Sparsity pattern of the Hessian for finite
                   of all ones}                      differencing. The size of the matrix is n-by-n,
                                                     where n is the number of elements in x0, the
                                                     starting point.
HessUpdate         {'bfgs'} | 'dfp' |                Quasi-Newton updating scheme.
                   'gillmurray' | 'steepdesc'
Jacobian           'on' | {'off'}                    Jacobian for the objective function defined by
                                                     the user.
JacobMult          function | {[]}                   Jacobian multiply function defined by the user.
JacobPattern       sparse matrix | {sparse matrix    Sparsity pattern of the Jacobian for finite
                   of all ones}                      differencing. The size of the matrix is m-by-n,
                                                     where m is the number of values in the first
                                                     argument returned by the user-specified
                                                     function fun, and n is the number of elements
                                                     in x0, the starting point.
LargeScale         {'on'} | 'off'                    Use large-scale algorithm if possible.
                                                     Exception: the default for fsolve is 'off'.
LevenbergMarquardt 'on' | {'off'}                    Chooses the Levenberg-Marquardt algorithm
                                                     over the Gauss-Newton algorithm.
LineSearchType     'cubicpoly' | {'quadcubic'}       Line search algorithm choice.
MaxPCGIter         positive integer                  Maximum number of PCG iterations allowed.
                                                     The default is the greater of 1 and floor(n/2),
                                                     where n is the number of elements in x0, the
                                                     starting point.
MeritFunction      'singleobj' | {'multiobj'}        Use goal attainment/minimax merit function
                                                     (multiobjective) vs. fmincon (single objective).
MinAbsMax          positive scalar integer | {0}     Number of F(x) to minimize the worst case
                                                     absolute values.
PrecondBandWidth   positive integer | {0} | Inf      Upper bandwidth of preconditioner for PCG.
TolCon             positive scalar                   Termination tolerance on the constraint
                                                     violation.
TolPCG             positive scalar | {0.1}           Termination tolerance on the PCG iteration.
TypicalX           vector of all ones                Typical x values. The length of the vector is
                                                     equal to the number of elements in x0, the
                                                     starting point.

Examples

This statement creates an optimization options structure called options in which the
Display parameter is set to 'iter' and the TolFun parameter is set to 1e-8.
• options = optimset('Display','iter','TolFun',1e-8)

This statement makes a copy of the options structure called options, changing the
value of the TolX parameter and storing new values in optnew.

• optnew = optimset(options,'TolX',1e-4);

This statement returns an optimization options structure that contains all the
parameter names and default values relevant to the function fminbnd.

• optimset('fminbnd')

Optimization Toolbox

fminunc
Find a minimum of an unconstrained multivariable function

    min f(x)
     x

where x is a vector and f(x) is a function that returns a scalar.

Syntax
• x = fminunc(fun,x0)
• x = fminunc(fun,x0,options)
• x = fminunc(fun,x0,options,P1,P2,...)
• [x,fval] = fminunc(...)
• [x,fval,exitflag] = fminunc(...)
• [x,fval,exitflag,output] = fminunc(...)
• [x,fval,exitflag,output,grad] = fminunc(...)
• [x,fval,exitflag,output,grad,hessian] = fminunc(...)

Description

fminunc finds a minimum of a scalar function of several variables, starting at an
initial estimate. This is generally referred to as unconstrained nonlinear optimization.

x = fminunc(fun,x0) starts at the point x0 and finds a local minimum x of the
function described in fun. x0 can be a scalar, vector, or matrix.

x = fminunc(fun,x0,options) minimizes with the optimization parameters
specified in the structure options. Use optimset to set these parameters.

x = fminunc(fun,x0,options,P1,P2,...) passes the problem-dependent
parameters P1, P2, etc., directly to the function fun. Pass an empty matrix for
options to use the default values for options.
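As a brief hedged illustration of this calling form (the file myparamfun.m and the extra
argument a are hypothetical, not part of the reference text):

• function f = myparamfun(x,a)
• f = a*x(1)^2 + x(2)^2;   % 'a' is a problem-dependent parameter supplied by the caller

• x = fminunc(@myparamfun,[1;1],[],3)   % [] keeps default options; a = 3 is passed through to fun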

[x,fval] = fminunc(...) returns in fval the value of the objective function fun
at the solution x.

[x,fval,exitflag] = fminunc(...) returns a value exitflag that describes the
exit condition.

[x,fval,exitflag,output] = fminunc(...) returns a structure output that
contains information about the optimization.

[x,fval,exitflag,output,grad] = fminunc(...) returns in grad the value of
the gradient of fun at the solution x.

[x,fval,exitflag,output,grad,hessian] = fminunc(...) returns in hessian
the value of the Hessian of the objective function fun at the solution x.

Input Arguments

Function Arguments contains general descriptions of arguments passed in to fminunc.
This section provides function-specific details for fun and options:

fun The function to be minimized. fun is a function that accepts a vector x
and returns a scalar f, the objective function evaluated at x. The function
fun can be specified as a function handle.
• x = fminunc(@myfun,x0)

where myfun is a MATLAB function such as
• function f = myfun(x)
• f = ... % Compute function value at x

fun can also be an inline object.


• x = fminunc(inline('norm(x)^2'),x0);

If the gradient of fun can also be computed and the GradObj parameter is
'on', as set by
• options = optimset('GradObj','on')

then the function fun must return, in the second output argument, the
gradient value g, a vector, at x. Note that by checking the value of
nargout the function can avoid computing g when fun is called with only
one output argument (in the case where the optimization algorithm only
needs the value of f but not g).
• function [f,g] = myfun(x)
• f = ... % Compute the function value at x
• if nargout > 1 % fun called with 2 output arguments
• g = ... % Compute the gradient evaluated at x
• end

The gradient is the partial derivatives of f at the point x. That is, the
ith component of g is the partial derivative of f with respect to the ith
component of x.
If the Hessian matrix can also be computed and the Hessian parameter is
'on', i.e., options = optimset('Hessian','on'), then the function
fun must return the Hessian value H, a symmetric matrix, at x in a third
output argument. Note that by checking the value of nargout we can
avoid computing H when fun is called with only one or two output
arguments (in the case where the optimization algorithm only needs the
values of f and g but not H).
• function [f,g,H] = myfun(x)
• f = ... % Compute the objective function value at x
• if nargout > 1 % fun called with two output arguments
• g = ... % Gradient of the function evaluated at x
• if nargout > 2
• H = ... % Hessian evaluated at x
• end
• end

The Hessian matrix is the second partial derivatives matrix of f at the
point x. That is, the (i,j)th component of H is the second partial derivative
of f with respect to x_i and x_j, ∂²f/(∂x_i ∂x_j). The Hessian is by definition a
symmetric matrix.
options The Options section below provides the function-specific details for the
options parameters.

Output Arguments

Function Arguments contains general descriptions of arguments returned by fminunc.
This section provides function-specific details for exitflag and output:

exitflag Describes the exit condition:
         > 0  The function converged to a solution x.
           0  The maximum number of function evaluations or iterations was exceeded.
         < 0  The function did not converge to a solution.
output   Structure containing information about the optimization. The fields of
         the structure are:
         iterations     Number of iterations taken.
         funcCount      Number of function evaluations.
         algorithm      Algorithm used.
         cgiterations   Number of PCG iterations (large-scale algorithm only).
         stepsize       Final step size taken (medium-scale algorithm only).
         firstorderopt  Measure of first-order optimality: the norm of the gradient
                        at the solution x.
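A quick sketch of inspecting these outputs (myfun and x0 are taken from the Examples
section later on this page):

• [x,fval,exitflag,output] = fminunc(@myfun,x0);
• exitflag              % > 0 indicates convergence to a solution
• output.iterations     % number of iterations taken
• output.firstorderopt  % norm of the gradient at the solution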

Options

fminunc uses these optimization parameters. Some parameters apply to all
algorithms, some are only relevant when using the large-scale algorithm, and others
are only relevant when using the medium-scale algorithm. You can use optimset to
set or change the values of these fields in the parameters structure, options.

We start by describing the LargeScale option since it states a preference for which
algorithm to use. It is only a preference since certain conditions must be met to use
the large-scale algorithm. For fminunc, the gradient must be provided (see the
description of fun above to see how) or else the medium-scale algorithm is used:

LargeScale Use the large-scale algorithm if possible when set to 'on'. Use the
           medium-scale algorithm when set to 'off'.
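For instance, to force the medium-scale algorithm (a minimal sketch; myfun and x0 are
as in the examples below):

• options = optimset('LargeScale','off');
• [x,fval] = fminunc(@myfun,x0,options);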

Large-Scale and Medium-Scale Algorithms. These parameters are used by both
the large-scale and medium-scale algorithms:

Diagnostics  Print diagnostic information about the function to be minimized.
Display      Level of display. 'off' displays no output; 'iter' displays output
             at each iteration; 'final' (default) displays just the final output.
GradObj      Gradient for the objective function defined by the user. See the
             description of fun above to see how to define the gradient in fun.
             The gradient must be provided to use the large-scale method. It is
             optional for the medium-scale method.
MaxFunEvals  Maximum number of function evaluations allowed.
MaxIter      Maximum number of iterations allowed.
TolFun       Termination tolerance on the function value.
TolX         Termination tolerance on x.

Large-Scale Algorithm Only. These parameters are used only by the large-scale
algorithm:

Hessian           If 'on', fminunc uses a user-defined Hessian (defined in fun),
                  or Hessian information (when using HessMult), for the
                  objective function. If 'off', fminunc approximates the
                  Hessian using finite differences.
HessMult          Function handle for Hessian multiply function. For large-scale
                  structured problems, this function computes the Hessian matrix
                  product H*Y without actually forming H. The function is of the
                  form

                  • W = hmfun(Hinfo,Y,p1,p2,...)

                  where Hinfo and the additional parameters p1,p2,... contain
                  the matrices used to compute H*Y.
                  The first argument must be the same as the third argument
                  returned by the objective function fun.

                  • [f,g,Hinfo] = fun(x,p1,p2,...)

                  The parameters p1,p2,... are the same additional parameters
                  that are passed to fminunc (and to fun).

                  • fminunc(fun,...,options,p1,p2,...)

                  Y is a matrix that has the same number of rows as there are
                  dimensions in the problem. W = H*Y although H is not formed
                  explicitly. fminunc uses Hinfo to compute the preconditioner.
                  Note 'Hessian' must be set to 'on' for Hinfo to be passed from
                  fun to hmfun. (A sketch of a multiply function appears after
                  this table.)
HessPattern       Sparsity pattern of the Hessian for finite-differencing. If it is
                  not convenient to compute the sparse Hessian matrix H in fun,
                  the large-scale method in fminunc can approximate H via
                  sparse finite-differences (of the gradient) provided the sparsity
                  structure of H -- i.e., locations of the nonzeros -- is supplied as
                  the value for HessPattern. In the worst case, if the structure is
                  unknown, you can set HessPattern to be a dense matrix and a
                  full finite-difference approximation is computed at each
                  iteration (this is the default). This can be very expensive for
                  large problems, so it is usually worth the effort to determine the
                  sparsity structure. (A sketch of supplying HessPattern appears
                  after this table.)
MaxPCGIter        Maximum number of PCG (preconditioned conjugate gradient)
                  iterations (see the Algorithm section below).
PrecondBandWidth  Upper bandwidth of preconditioner for PCG. By default,
                  diagonal preconditioning is used (upper bandwidth of 0). For
                  some problems, increasing the bandwidth reduces the number
                  of PCG iterations.
TolPCG            Termination tolerance on the PCG iteration.
TypicalX          Typical x values.
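To make the HessMult mechanism concrete, here is a minimal hedged sketch in which
fun is assumed to return the sparse Hessian itself as its third output Hinfo; a real
structured problem would instead return whatever data is needed to form H*Y without
building H:

• function W = hmfun(Hinfo,Y)
• % Hinfo is assumed here to be the sparse Hessian, so the product is formed directly
• W = Hinfo*Y;

• options = optimset('GradObj','on','Hessian','on','HessMult',@hmfun);

And a hedged sketch of supplying HessPattern, assuming a problem whose Hessian is
known to be tridiagonal (the size n is illustrative):

• n = 1000;                             % number of elements in x0
• Hstr = spdiags(ones(n,3),-1:1,n,n);   % nonzeros only on the three central diagonals
• options = optimset('GradObj','on','HessPattern',Hstr);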

Medium-Scale Algorithm Only. These parameters are used only by the medium-scale
algorithm:

DerivativeCheck  Compare user-supplied derivatives (gradient) to finite-differencing
                 derivatives.
DiffMaxChange    Maximum change in variables for finite-difference gradients.
DiffMinChange    Minimum change in variables for finite-difference gradients.
LineSearchType   Line search algorithm choice.

Examples

Minimize the function f(x) = 3x1^2 + 2x1x2 + x2^2.

To use an M-file, create a file myfun.m.

• function f = myfun(x)
• f = 3*x(1)^2 + 2*x(1)*x(2) + x(2)^2; % Cost function

Then call fminunc to find a minimum of myfun near [1,1].

• x0 = [1,1];
• [x,fval] = fminunc(@myfun,x0)

After a couple of iterations, the solution, x, and the value of the function at x, fval,
are returned.

• x =
• 1.0e-008 *
• -0.7512 0.2479
• fval =
• 1.3818e-016

To minimize this function with the gradient provided, modify the M-file myfun.m so
the gradient is the second output argument

• function [f,g] = myfun(x)
• f = 3*x(1)^2 + 2*x(1)*x(2) + x(2)^2; % Cost function
• if nargout > 1
• g(1) = 6*x(1)+2*x(2);
• g(2) = 2*x(1)+2*x(2);
• end

and indicate the gradient value is available by creating an optimization options
structure with the GradObj parameter set to 'on' using optimset.

• options = optimset('GradObj','on');
• x0 = [1,1];
• [x,fval] = fminunc(@myfun,x0,options)

After several iterations the solution x and fval, the value of the function at x, are
returned.

• x =
• 1.0e-015 *
• 0.1110 -0.8882
• fval =
• 6.2862e-031

To minimize the function f(x) = sin(x) + 3 using an inline object

• f = inline('sin(x)+3');
• x = fminunc(f,4)

which returns a solution

• x =
• 4.7124

Notes

fminunc is not the preferred choice for solving problems that are sums of squares,
that is, of the form

    min f(x) = f1(x)^2 + f2(x)^2 + ... + fn(x)^2
     x
Instead use the lsqnonlin function, which has been optimized for problems of this
form.
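As a hedged illustration (the file myresid.m and the Rosenbrock-style residuals are
hypothetical, not part of the reference text), lsqnonlin takes the vector of residuals
fi(x) rather than their sum of squares:

• function F = myresid(x)
• F = [10*(x(2) - x(1)^2); 1 - x(1)];   % vector of residuals f1(x), f2(x)

• x = lsqnonlin(@myresid,[-1.2; 1])     % minimizes sum(F(x).^2)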
To use the large-scale method, the gradient must be provided in fun (and the GradObj
parameter set to 'on' using optimset). A warning is given if no gradient is provided
and the LargeScale parameter is not 'off'.

Algorithms

Large-Scale Optimization. By default fminunc chooses the large-scale algorithm if
the user supplies the gradient in fun (and the GradObj parameter is set to 'on' using
optimset). This algorithm is a subspace trust region method and is based on the
interior-reflective Newton method described in [2],[3]. Each iteration involves the
approximate solution of a large linear system using the method of preconditioned
conjugate gradients (PCG).

Medium-Scale Optimization. fminunc, with the LargeScale parameter set to
'off' with optimset, uses the BFGS Quasi-Newton method with a mixed quadratic
and cubic line search procedure. This quasi-Newton method uses the BFGS
([1],[5],[8],[9]) formula for updating the approximation of the Hessian matrix. The
DFP ([4],[6],[7]) formula, which approximates the inverse Hessian matrix, is selected
by setting the HessUpdate parameter to 'dfp' (and the LargeScale parameter to
'off'). A steepest descent method is selected by setting HessUpdate to 'steepdesc'
(and LargeScale to 'off'), although this is not recommended.

The default line search algorithm, i.e., when the LineSearchType parameter is set to
'quadcubic', is a safeguarded mixed quadratic and cubic polynomial interpolation
and extrapolation method. A safeguarded cubic polynomial method can be selected by
setting the LineSearchType parameter to 'cubicpoly'. This second method
generally requires fewer function evaluations but more gradient evaluations. Thus, if
gradients are being supplied and can be calculated inexpensively, the cubic
polynomial line search method is preferable.
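For example, a minimal sketch selecting the DFP update and the cubic polynomial line
search (myfun and x0 as in the examples above):

• options = optimset('LargeScale','off','HessUpdate','dfp','LineSearchType','cubicpoly');
• [x,fval] = fminunc(@myfun,x0,options);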

Limitations

The function to be minimized must be continuous. fminunc may only give local
solutions.

fminunc only minimizes over the real numbers, that is, x must only consist of real
numbers and f(x) must only return real numbers. When x has complex variables, they
must be split into real and imaginary parts.

Large-Scale Optimization. To use the large-scale algorithm, the user must supply
the gradient in fun (and GradObj must be set 'on' in options).

Currently, if the analytical gradient is provided in fun, the options parameter
DerivativeCheck cannot be used with the large-scale method to compare the
analytic gradient to the finite-difference gradient. Instead, use the medium-scale
method to check the derivative with options parameter MaxIter set to 0 iterations.
Then run the problem again with the large-scale method.
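A hedged sketch of that workaround (myfun supplies the gradient as described above):

• chk = optimset('LargeScale','off','GradObj','on','DerivativeCheck','on','MaxIter',0);
• fminunc(@myfun,x0,chk);             % checks the gradient only; no iterations are taken
• opts = optimset('LargeScale','on','GradObj','on');
• [x,fval] = fminunc(@myfun,x0,opts)  % solve with the large-scale method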
References

[1] Broyden, C.G., "The Convergence of a Class of Double-Rank Minimization
Algorithms," Journal Inst. Math. Applic., Vol. 6, pp. 76-90, 1970.

[2] Coleman, T.F. and Y. Li, "An Interior, Trust Region Approach for Nonlinear
Minimization Subject to Bounds," SIAM Journal on Optimization, Vol. 6, pp. 418-445, 1996.

[3] Coleman, T.F. and Y. Li, "On the Convergence of Reflective Newton Methods
for Large-Scale Nonlinear Minimization Subject to Bounds," Mathematical
Programming, Vol. 67, Number 2, pp. 189-224, 1994.

[4] Davidon, W.C., "Variable Metric Method for Minimization," A.E.C. Research
and Development Report, ANL-5990, 1959.

[5] Fletcher, R., "A New Approach to Variable Metric Algorithms," Computer
Journal, Vol. 13, pp. 317-322, 1970.

[6] Fletcher, R., "Practical Methods of Optimization," Vol. 1, Unconstrained
Optimization, John Wiley and Sons, 1980.

[7] Fletcher, R. and M.J.D. Powell, "A Rapidly Convergent Descent Method for
Minimization," Computer Journal, Vol. 6, pp. 163-168, 1963.

[8] Goldfarb, D., "A Family of Variable Metric Updates Derived by Variational
Means," Mathematics of Computing, Vol. 24, pp. 23-26, 1970.

[9] Shanno, D.F., "Conditioning of Quasi-Newton Methods for Function
Minimization," Mathematics of Computing, Vol. 24, pp. 647-656, 1970.

fmincg

fmincg is a helper function provided with the machine learning course on Coursera, unlike
fminunc, which is a built-in Octave/MATLAB function. Both can be used to minimize a
logistic regression cost function, and they differ mainly in one respect: when the number of
parameters is considerably large (whether or not it is large compared to the size of the
training set), fmincg runs faster and scales better than fminunc. fminunc remains a good
choice when the problem is unconstrained and the number of parameters is modest.

The two take roughly the same time on small problems. In a neural network, or in general
any model with a large number of weights, fminunc can give an out-of-memory error, so
fmincg is the more memory-efficient choice. The point of using fmincg is therefore not to
get a more accurate result (the cost function, and hence the fitted hypothesis, should be the
same in either case) but to carry out the minimization more efficiently for especially
complex hypotheses. Use fminunc where the hypothesis has only a few features, and
fmincg where it has hundreds.

Explanation of use of fmincg in One_vs_All:

• all_theta is a matrix, where there is a row for each of the trained thetas.

• Each call to fmincg() returns a theta vector.

• The "y == c" statement creates a vector of 0's and 1's for each value of 'c' as you
iterate from 1 to num_labels. Those are the effective 'y' values that are used for
training to detect each label.
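A hedged sketch of that loop, assuming the course's conventions: lrCostFunction(theta,
X, y, lambda) returns the cost and gradient, fmincg is called as fmincg(f, initial_theta,
options), X is m-by-(n+1) with a bias column, and num_labels and lambda are defined by
the assignment:

• options = optimset('GradObj','on','MaxIter',50);
• all_theta = zeros(num_labels, n + 1);      % one row of trained parameters per label
• for c = 1:num_labels
•     initial_theta = zeros(n + 1, 1);
•     % y == c gives the 0/1 labels for class c; fmincg returns the trained theta vector
•     theta = fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), initial_theta, options);
•     all_theta(c, :) = theta';
• end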
