MATLAB Function Reference
optimset
Create or edit optimization options parameter structure
Syntax
• options = optimset('param1',value1,'param2',value2,...)
• optimset
• options = optimset
• options = optimset(optimfun)
• options = optimset(oldopts,'param1',value1,...)
• options = optimset(oldopts,newopts)
•
Description
options = optimset('param1',value1,'param2',value2,...) creates an
optimization options structure called options, in which the specified parameters
(param) have specified values. Any unspecified parameters are set to [] (a parameter
with value [] tells the optimization function to use its default value for that parameter when
options is passed to it). It is sufficient to type only enough leading
characters to define the parameter name uniquely. Case is ignored for parameter
names.
optimset with no input or output arguments displays a complete list of parameters
with their valid values.
options = optimset (with no input arguments) creates an options structure
options where all fields are set to [].
options = optimset(optimfun) creates an options structure options with all
parameter names and default values relevant to the optimization function optimfun.
options = optimset(oldopts,'param1',value1,...) creates a copy of
oldopts, modifying the specified parameters with the specified values.
options = optimset(oldopts,newopts) combines an existing options structure
oldopts with a new options structure newopts. Any parameters in newopts with
nonempty values overwrite the corresponding old parameters in oldopts.
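For example (a minimal sketch; the parameter values below are arbitrary), nonempty parameters in newopts override the corresponding values in oldopts:
• oldopts = optimset('Display','iter','TolFun',1e-6);
• newopts = optimset('TolFun',1e-10);   % only TolFun is nonempty
• opts = optimset(oldopts,newopts);     % Display stays 'iter'; TolFun becomes 1e-10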
Parameters
Optimization parameters used by MATLAB functions and Optimization Toolbox
functions:
Parameter     Value                                  Description
Display       'off' | 'iter' | 'final' | 'notify'    Level of display. 'off' displays no output; 'iter' displays output at each iteration; 'final' displays just the final output; 'notify' displays output only if the function does not converge.
MaxFunEvals   positive integer                       Maximum number of function evaluations allowed.
MaxIter       positive integer                       Maximum number of iterations allowed.
TolFun        positive scalar                        Termination tolerance on the function value.
TolX          positive scalar                        Termination tolerance on x.
Optimization parameters used by Optimization Toolbox functions:
Property            Value                                          Description
DerivativeCheck     'on' | {'off'}                                 Compare user-supplied analytic derivatives (gradients or Jacobian) to finite-differencing derivatives.
Diagnostics         'on' | {'off'}                                 Print diagnostic information about the function to be minimized or solved.
DiffMaxChange       positive scalar | {1e-1}                       Maximum change in variables for finite-difference derivatives.
DiffMinChange       positive scalar | {1e-8}                       Minimum change in variables for finite-difference derivatives.
GoalsExactAchieve   positive scalar integer | {0}                  Number of goals to achieve exactly (do not over- or underachieve).
GradConstr          'on' | {'off'}                                 Gradients for the nonlinear constraints defined by the user.
GradObj             'on' | {'off'}                                 Gradient(s) for the objective function(s) defined by the user.
Hessian             'on' | {'off'}                                 Hessian for the objective function defined by the user.
HessMult            function | {[]}                                Hessian multiply function defined by the user.
HessPattern         sparse matrix | {sparse matrix of all ones}    Sparsity pattern of the Hessian for finite differencing. The size of the matrix is n-by-n, where n is the number of elements in x0, the starting point.
HessUpdate          {'bfgs'} | 'dfp' | 'gillmurray' | 'steepdesc'  Quasi-Newton updating scheme.
Jacobian            'on' | {'off'}                                 Jacobian for the objective function defined by the user.
JacobMult           function | {[]}                                Jacobian multiply function defined by the user.
JacobPattern        sparse matrix | {sparse matrix of all ones}    Sparsity pattern of the Jacobian for finite differencing. The size of the matrix is m-by-n, where m is the number of values in the first argument returned by the user-specified function fun, and n is the number of elements in x0, the starting point.
LargeScale          {'on'} | 'off'                                 Use large-scale algorithm if possible. Exception: the default for fsolve is 'off'.
LevenbergMarquardt  'on' | {'off'}                                 Choose the Levenberg-Marquardt algorithm over the Gauss-Newton algorithm.
LineSearchType      'cubicpoly' | {'quadcubic'}                    Line search algorithm choice.
MaxPCGIter          positive integer                               Maximum number of PCG iterations allowed. The default is the greater of 1 and floor(n/2), where n is the number of elements in x0, the starting point.
MeritFunction       'singleobj' | {'multiobj'}                     Use the goal attainment/minimax merit function (multiobjective) vs. the fmincon merit function (single objective).
MinAbsMax           positive scalar integer | {0}                  Number of F(x) to minimize the worst-case absolute values.
PrecondBandWidth    positive integer | {0} | Inf                   Upper bandwidth of the preconditioner for PCG.
TolCon              positive scalar                                Termination tolerance on the constraint violation.
TolPCG              positive scalar | {0.1}                        Termination tolerance on the PCG iteration.
TypicalX            vector of all ones                             Typical x values. The length of the vector is equal to the number of elements in x0, the starting point.
Examples
This statement creates an optimization options structure called options in which the
Display parameter is set to 'iter' and the TolFun parameter is set to 1e-8.
• options = optimset('Display','iter','TolFun',1e-8)
•
This statement makes a copy of the options structure called options, changing the
value of the TolX parameter and storing new values in optnew.
• optnew = optimset(options,'TolX',1e-4);
•
This statement returns an optimization options structure that contains all the
parameter names and default values relevant to the function fminbnd.
• optimset('fminbnd')
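The options structure returned by optimset is then passed to the optimization function itself. For instance, a minimal sketch that applies the structure from the first example to fminbnd (the interval [3,4] is arbitrary):
• options = optimset('Display','iter','TolFun',1e-8);
• [x,fval] = fminbnd(@cos,3,4,options)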
Optimization Toolbox
fminunc
Find a minimum of an unconstrained multivariable function

    min_x f(x)

where x is a vector and f(x) is a function that returns a scalar.
Syntax
• x = fminunc(fun,x0)
• x = fminunc(fun,x0,options)
• x = fminunc(fun,x0,options,P1,P2,...)
• [x,fval] = fminunc(...)
• [x,fval,exitflag] = fminunc(...)
• [x,fval,exitflag,output] = fminunc(...)
• [x,fval,exitflag,output,grad] = fminunc(...)
• [x,fval,exitflag,output,grad,hessian] = fminunc(...)
•
Description
fminunc finds a minimum of a scalar function of several variables, starting at an
initial estimate. This is generally referred to as unconstrained nonlinear optimization.
x = fminunc(fun,x0) starts at the point x0 and finds a local minimum x of the
function described in fun. x0 can be a scalar, vector, or matrix.
x = fminunc(fun,x0,options) minimizes with the optimization parameters
specified in the structure options. Use optimset to set these parameters.
x = fminunc(fun,x0,options,P1,P2,...) passes the problem-dependent
parameters P1, P2, etc., directly to the function fun. Pass an empty matrix for
options to use the default values for options.
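As a hedged sketch of this calling form (the function name myparamfun and the parameters a and b are made up for illustration), the extra arguments are appended after options and forwarded to fun following x:
• % in myparamfun.m
• function f = myparamfun(x,a,b)
• f = a*x(1)^2 + b*x(2)^2;
•
• % pass [] for options to use the default values; a and b follow
• x = fminunc(@myparamfun,[1;1],[],3,5)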
[x,fval] = fminunc(...) returns in fval the value of the objective function fun
at the solution x.
[x,fval,exitflag] = fminunc(...) returns a value exitflag that describes the
exit condition.
[x,fval,exitflag,output] = fminunc(...) returns a structure output that
contains information about the optimization.
[x,fval,exitflag,output,grad] = fminunc(...) returns in grad the value of
the gradient of fun at the solution x.
[x,fval,exitflag,output,grad,hessian] = fminunc(...) returns in hessian
the value of the Hessian of the objective function fun at the solution x.
Input Arguments
Function Arguments contains general descriptions of arguments passed in to fminunc.
This section provides function-specific details for fun and options:
fun The function to be minimized. fun is a function that accepts a vector x
and returns a scalar f, the objective function evaluated at x. The function
fun can be specified as a function handle.
• x = fminunc(@myfun,x0)
•
where myfun is a MATLAB function such as
• function f = myfun(x)
• f = ... % Compute function value at x
•
fun can also be an inline object.
• x = fminunc(inline('norm(x)^2'),x0);
•
If the gradient of fun can also be computed and the GradObj parameter is
'on', as set by
• options = optimset('GradObj','on')
•
then the function fun must return, in the second output argument, the
gradient value g, a vector, at x. Note that by checking the value of
nargout the function can avoid computing g when fun is called with only
one output argument (in the case where the optimization algorithm only
needs the value of f but not g).
• function [f,g] = myfun(x)
• f = ... % Compute the function value at x
• if nargout > 1   % fun called with 2 output arguments
•    g = ...       % Compute the gradient evaluated at x
• end
•
The gradient is the vector of partial derivatives of f at the point x. That is, the
ith component of g is the partial derivative of f with respect to the ith
component of x.
If the Hessian matrix can also be computed and the Hessian parameter is
'on', i.e., options = optimset('Hessian','on'), then the function
fun must return the Hessian value H, a symmetric matrix, at x in a third
output argument. Note that by checking the value of nargout we can
avoid computing H when fun is called with only one or two output
arguments (in the case where the optimization algorithm only needs the
values of f and g but not H).
• function [f,g,H] = myfun(x)
• f = ...          % Compute the objective function value at x
• if nargout > 1   % fun called with two output arguments
•    g = ...       % Gradient of the function evaluated at x
•    if nargout > 2
•       H = ...    % Hessian evaluated at x
•    end
• end
•
The Hessian matrix is the matrix of second partial derivatives of f at the
point x. That is, the (i,j)th component of H is the second partial derivative
of f with respect to xi and xj, ∂²f/(∂xi ∂xj). The Hessian is by definition a
symmetric matrix.
options Options provides the function-specific details for the options parameters.
Output Arguments
Function Arguments contains general descriptions of arguments returned by fminunc.
This section provides function-specific details for exitflag and output:
exitflag Describes the exit condition:
> 0 The function converged to a solution x.
0 The maximum number of function evaluations or
iterations was exceeded.
< 0 The function did not converge to a solution.
output Structure containing information about the optimization. The fields of
the structure are:
iterations Number of iterations taken.
funcCount Number of function evaluations.
algorithm Algorithm used.
cgiterations Number of PCG iterations (large-scale algorithm only).
stepsize Final step size taken (medium-scale algorithm only).
firstorderopt Measure of first-order optimality: the norm of the
gradient at the solution x.
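For example, a minimal sketch of capturing and inspecting these outputs (using the myfun objective defined in the Examples section below):
• [x,fval,exitflag,output] = fminunc(@myfun,[1,1]);
• if exitflag > 0     % converged to a solution
•    fprintf('%d iterations, %d function evaluations\n', ...
•            output.iterations, output.funcCount);
• end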
Options
fminunc uses these optimization parameters. Some parameters apply to all
algorithms, some are only relevant when using the large-scale algorithm, and others
are only relevant when using the medium-scale algorithm. You can use optimset to
set or change the values of these fields in the parameters structure, options.
We start by describing the LargeScale option since it states a preference for which
algorithm to use. It is only a preference since certain conditions must be met to use
the large-scale algorithm. For fminunc, the gradient must be provided (see the
description of fun above to see how) or else the medium-scale algorithm is used:
LargeScale Use large-scale algorithm if possible when set to 'on'. Use medium-
scale algorithm when set to 'off'.
Large-Scale and Medium-Scale Algorithms. These parameters are used by both
the large-scale and medium-scale algorithms:
Diagnostics Print diagnostic information about the function to be minimized.
Display Level of display. 'off' displays no output; 'iter' displays output
at each iteration; 'final' (default) displays just the final output.
GradObj Gradient for the objective function defined by user. See the
description of fun above to see how to define the gradient in fun.
The gradient must be provided to use the large-scale method. It is
optional for the medium-scale method.
MaxFunEvals Maximum number of function evaluations allowed.
MaxIter Maximum number of iterations allowed.
TolFun Termination tolerance on the function value.
TolX Termination tolerance on x.
Large-Scale Algorithm Only. These parameters are used only by the large-scale
algorithm:
Hessian If 'on', fminunc uses a user-defined Hessian (defined in fun),
or Hessian information (when using HessMult), for the
objective function. If 'off', fminunc approximates the
Hessian using finite differences.
HessMult Function handle for Hessian multiply function. For large-scale
structured problems, this function computes the Hessian matrix
product H*Y without actually forming H. The function is of the
form
• W = hmfun(Hinfo,Y,p1,p2,...)
•
where Hinfo and the additional parameters p1,p2,... contain
the matrices used to compute H*Y.
The first argument must be the same as the third argument
returned by the objective function fun.
• [f,g,Hinfo] = fun(x,p1,p2,...)
•
The parameters p1,p2,... are the same additional parameters
that are passed to fminunc (and to fun).
• fminunc(fun,...,options,p1,p2,...)
•
Y is a matrix that has the same number of rows as there are
dimensions in the problem. W = H*Y although H is not formed
explicitly. fminunc uses Hinfo to compute the preconditioner.
Note  'Hessian' must be set to 'on' for Hinfo to
be passed from fun to hmfun. (A sketch of such a multiply function
appears after this parameter list.)
HessPattern Sparsity pattern of the Hessian for finite-differencing. If it is
not convenient to compute the sparse Hessian matrix H in fun,
the large-scale method in fminunc can approximate H via
sparse finite-differences (of the gradient) provided the sparsity
structure of H -- i.e., locations of the nonzeros -- is supplied as
the value for HessPattern. In the worst case, if the structure is
unknown, you can set HessPattern to be a dense matrix and a
full finite-difference approximation is computed at each
iteration (this is the default). This can be very expensive for
large problems so it is usually worth the effort to determine the
sparsity structure.
MaxPCGIter Maximum number of PCG (preconditioned conjugate gradient)
iterations (see the Algorithm section below).
PrecondBandWidth Upper bandwidth of preconditioner for PCG. By default,
diagonal preconditioning is used (upper bandwidth of 0). For
some problems, increasing the bandwidth reduces the number
of PCG iterations.
TolPCG Termination tolerance on the PCG iteration.
TypicalX Typical x values.
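Referring back to the HessMult entry above, the following is a minimal sketch of a Hessian multiply function, under the assumption that the Hessian can be factored as H = B'*B and that fun returns B as its third output Hinfo (the factorization is purely illustrative):
• function W = hmfun(Hinfo,Y)
• % Hinfo is the matrix B returned as the third output of fun;
• % compute W = H*Y = (B'*B)*Y without ever forming H
• W = Hinfo'*(Hinfo*Y);
It would then be registered along with the other required parameters:
• options = optimset('GradObj','on','Hessian','on','HessMult',@hmfun);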
Medium-Scale Algorithm Only. These parameters are used only by the medium-
scale algorithm:
DerivativeCheck Compare user-supplied derivatives (gradient) to finite-
differencing derivatives.
DiffMaxChange Maximum change in variables for finite-difference gradients.
DiffMinChange Minimum change in variables for finite-difference gradients.
LineSearchType Line search algorithm choice.
Examples
Minimize the function f(x) = 3x1^2 + 2x1x2 + x2^2.
To use an M-file, create a file myfun.m.
• function f = myfun(x)
• f = 3*x(1)^2 + 2*x(1)*x(2) + x(2)^2; % Cost function
•
Then call fminunc to find a minimum of myfun near [1,1].
• x0 = [1,1];
• [x,fval] = fminunc(@myfun,x0)
•
After a couple of iterations, the solution, x, and the value of the function at x, fval,
are returned.
• x =
• 1.0e-008 *
• -0.7512 0.2479
• fval =
• 1.3818e-016
•
To minimize this function with the gradient provided, modify the M-file myfun.m so
the gradient is the second output argument
• function [f,g] = myfun(x)
• f = 3*x(1)^2 + 2*x(1)*x(2) + x(2)^2; % Cost function
• if nargout > 1
• g(1) = 6*x(1)+2*x(2);
• g(2) = 2*x(1)+2*x(2);
• end
•
and indicate the gradient value is available by creating an optimization options
structure with the GradObj parameter set to 'on' using optimset.
• options = optimset('GradObj','on');
• x0 = [1,1];
• [x,fval] = fminunc(@myfun,x0,options)
•
After several iterations the solution x and fval, the value of the function at x, are
returned.
• x =
• 1.0e-015 *
• 0.1110 -0.8882
• fval =
• 6.2862e-031
•
To minimize the function f(x) = sin(x) + 3 using an inline object
• f = inline('sin(x)+3');
• x = fminunc(f,4)
•
which returns a solution
• x =
• 4.7124
•
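Note that in more recent MATLAB and Octave releases the inline construct is deprecated; an anonymous function gives the same result:
• f = @(x) sin(x) + 3;
• x = fminunc(f,4)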
Notes
fminunc is not the preferred choice for solving problems that are sums-of-squares,
that is, of the form

    min_x  f(x) = f1(x)^2 + f2(x)^2 + ... + fn(x)^2

Instead use the lsqnonlin function, which has been optimized for problems of this
form.
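As a rough sketch of the preferred formulation (the residuals below are made up for illustration), lsqnonlin expects fun to return the vector of residuals f1(x),...,fn(x) rather than the scalar sum of squares:
• % in myresiduals.m
• function F = myresiduals(x)
• F = [x(1) - 2;          % f1(x)
•      x(2) + 1];         % f2(x)
•
• x = lsqnonlin(@myresiduals,[0;0])   % minimizes f1(x)^2 + f2(x)^2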
To use the large-scale method, the gradient must be provided in fun (and the GradObj
parameter set to 'on' using optimset). A warning is given if no gradient is provided
and the LargeScale parameter is not 'off'.
Algorithms
Large-Scale Optimization. By default fminunc chooses the large-scale algorithm if
the user supplies the gradient in fun (and the GradObj parameter is set to 'on' using
optimset). This algorithm is a subspace trust region method and is based on the
interior-reflective Newton method described in [2],[3]. Each iteration involves the
approximate solution of a large linear system using the method of preconditioned
conjugate gradients (PCG).
Medium-Scale Optimization. fminunc, with the LargeScale parameter set to
'off' with optimset, uses the BFGS Quasi-Newton method with a mixed quadratic
and cubic line search procedure. This quasi-Newton method uses the BFGS
([1],[5],[8],[9]) formula for updating the approximation of the Hessian matrix. The
DFP ([4],[6],[7]) formula, which approximates the inverse Hessian matrix, is selected
by setting the HessUpdate parameter to 'dfp' (and the LargeScale parameter to
'off'). A steepest descent method is selected by setting HessUpdate to 'steepdesc'
(and LargeScale to 'off'), although this is not recommended.
The default line search algorithm, i.e., when the LineSearchType parameter is set to
'quadcubic', is a safeguarded mixed quadratic and cubic polynomial interpolation
and extrapolation method. A safeguarded cubic polynomial method can be selected by
setting the LineSearchType parameter to 'cubicpoly'. This second method
generally requires fewer function evaluations but more gradient evaluations. Thus, if
gradients are being supplied and can be calculated inexpensively, the cubic
polynomial line search method is preferable.
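In practice these medium-scale variants are selected through optimset, for example (a sketch combining the parameters discussed in this section):
• % default medium-scale method: BFGS with 'quadcubic' line search
• opts = optimset('LargeScale','off');
•
• % DFP update together with the cubic polynomial line search
• opts = optimset('LargeScale','off','HessUpdate','dfp', ...
•                 'LineSearchType','cubicpoly');
• x = fminunc(@myfun,[1,1],opts)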
Limitations
The function to be minimized must be continuous. fminunc may only give local
solutions.
fminunc only minimizes over the real numbers, that is, x must only consist of real
numbers and f(x) must only return real numbers. When x has complex variables, they
must be split into real and imaginary parts.
Large-Scale Optimization. To use the large-scale algorithm, the user must supply
the gradient in fun (and GradObj must be set 'on' in options).
Currently, if the analytical gradient is provided in fun, the options parameter
DerivativeCheck cannot be used with the large-scale method to compare the
analytic gradient to the finite-difference gradient. Instead, use the medium-scale
method to check the derivative with options parameter MaxIter set to 0 iterations.
Then run the problem again with the large-scale method.
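The workaround described above might be sketched as follows, using the gradient-enabled myfun from the Examples section:
• x0 = [1,1];
• % Step 1: check the analytic gradient with the medium-scale method, 0 iterations
• chk = optimset('LargeScale','off','GradObj','on', ...
•                'DerivativeCheck','on','MaxIter',0);
• fminunc(@myfun,x0,chk);
•
• % Step 2: rerun with the large-scale method once the gradient checks out
• opts = optimset('LargeScale','on','GradObj','on');
• [x,fval] = fminunc(@myfun,x0,opts)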
References
[1] Broyden, C.G., "The Convergence of a Class of Double-Rank Minimization
Algorithms," Journal Inst. Math. Applic., Vol. 6, pp. 76-90, 1970.
[2] Coleman, T.F. and Y. Li, "An Interior, Trust Region Approach for Nonlinear
Minimization Subject to Bounds," SIAM Journal on Optimization, Vol. 6, pp. 418-
445, 1996.
[3] Coleman, T.F. and Y. Li, "On the Convergence of Reflective Newton Methods
for Large-Scale Nonlinear Minimization Subject to Bounds," Mathematical
Programming, Vol. 67, Number 2, pp. 189-224, 1994.
[4] Davidon, W.C., "Variable Metric Method for Minimization," A.E.C. Research
and Development Report, ANL-5990, 1959.
[5] Fletcher, R.,"A New Approach to Variable Metric Algorithms," Computer
Journal, Vol. 13, pp. 317-322, 1970.
[6] Fletcher, R., "Practical Methods of Optimization," Vol. 1, Unconstrained
Optimization, John Wiley and Sons, 1980.
[7] Fletcher, R. and M.J.D. Powell, "A Rapidly Convergent Descent Method for
Minimization," Computer Journal, Vol. 6, pp. 163-168, 1963.
[8] Goldfarb, D., "A Family of Variable Metric Updates Derived by Variational
Means," Mathematics of Computing, Vol. 24, pp. 23-26, 1970.
[9] Shanno, D.F., "Conditioning of Quasi-Newton Methods for Function
Minimization," Mathematics of Computing, Vol. 24, pp. 647-656, 1970.
fmincg
fmincg is not a built-in Octave function like fminunc; it is a helper function distributed with
the Coursera Machine Learning course. Since both are used for logistic regression, they differ
mainly in one aspect: when the number of parameters to be considered is considerably large
(whether or not it is large compared to the size of the training set), fmincg works faster and
processes more accurately than fminunc, while fminunc is preferred when there are no
restrictions (unconstrained) on the parameters being passed to it.
fmincg is more accurate than fminunc, and the time taken by both is almost the same. In a
neural network, or in general in any model with a large number of weights, fminunc can give an
out-of-memory error, so fmincg is more memory efficient.
The fmincg function is used not to get a more accurate result (your cost function should be the
same in either case, and your hypothesis will simply be more or less complex), but because it
is more efficient at doing gradient descent for especially complex hypotheses. You should use
fminunc where the hypothesis has few features, but fmincg where it has hundreds.
Explanation of use of fmincg in One_vs_All:
• all_theta is a matrix, where there is a row for each of the trained thetas.
• Each call to fmincg() returns a theta vector.
• The "y == c" statement creates a vector of 0's and 1's for each value of 'c' as you
iterate from 1 to num_labels. Those are the effective 'y' values that are used for
training to detect each label.
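Putting these pieces together, a minimal sketch of the one-vs-all training loop might look like the following; lrCostFunction is assumed to be a regularized logistic regression cost function (a course helper, not shown here) that returns the cost and its gradient, and lambda is the regularization parameter:
• % X is m-by-n (features), y is m-by-1 (labels 1..num_labels)
• n = size(X, 2);
• all_theta = zeros(num_labels, n + 1);
• X = [ones(size(X, 1), 1) X];                % add the intercept term
• options = optimset('GradObj', 'on', 'MaxIter', 50);
• for c = 1:num_labels
•     initial_theta = zeros(n + 1, 1);
•     % (y == c) gives the 0/1 labels for class c, as described above
•     theta = fmincg(@(t) lrCostFunction(t, X, (y == c), lambda), ...
•                    initial_theta, options);
•     all_theta(c, :) = theta';               % each trained theta is stored as a row
• end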