Chapter 8 Lecture Notes
• The gradient of f at x0, denoted ∇f(x0), if it is not the zero vector, is orthogonal to the tangent vector to an arbitrary smooth curve passing through x0 on the level set f(x) = c.
○ Thus, the direction of maximum rate of increase of a real-valued differentiable function at a point is orthogonal to the level set of the
function through that point.
• In other words, the gradient acts in such a direction that for a given small displacement, the function f increases more in the direction
of the gradient than in any other direction.
• To prove this statement, recall that 〈∇f(x), d〉, ||d|| = 1, is the rate of increase of f in the direction d at the point x.
• By the Cauchy-Schwarz inequality, 〈∇f(x), d〉 ≤ ||∇f(x)|| ||d|| = ||∇f(x)||, with equality exactly when d = ∇f(x)/||∇f(x)||.
• Thus, the direction in which ∇f(x) points is the direction of maximum rate of increase of f at x.
○ The direction in which – ∇f(x) points is the direction of maximum rate of decrease of f at x.
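This maximality claim can be checked numerically. Below is a minimal Python sketch (NumPy assumed; the quadratic f and the point x are hypothetical choices, not from the notes) comparing the rate of increase 〈∇f(x), d〉 over random unit directions d against ||∇f(x)||:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: f(x) = x1^2 + 3*x2^2, so grad f(x) = [2*x1, 6*x2]
x = np.array([1.0, 2.0])
grad = np.array([2 * x[0], 6 * x[1]])

# Rate of increase <grad f(x), d> over many random unit directions d
best = -np.inf
for _ in range(10_000):
    d = rng.standard_normal(2)
    d /= np.linalg.norm(d)          # normalize so that ||d|| = 1
    best = max(best, grad @ d)

# No sampled direction beats d = grad/||grad||, whose rate is ||grad f(x)||
print(best, "<=", np.linalg.norm(grad))
```

No sampled direction exceeds the rate achieved by the normalized gradient direction, matching the Cauchy-Schwarz bound above.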
• Hence, the direction of the negative gradient is a good direction to search if we want to find a function minimizer.
• Let x(0) be a starting point, and consider the point x(0) – α∇f(x(0)) with α > 0. Then, by Taylor’s theorem, we obtain
f(x(0) – α∇f(x(0))) = f(x(0)) – α||∇f(x(0))||² + o(α).
• This means that if ∇f(x(0)) ≠ 0, then for sufficiently small α > 0 the point x(0) – α∇f(x(0)) is an improvement over the point x(0) if we are searching for a minimizer.
• To formulate an algorithm that implements this idea, suppose that we are given a point x(k).
• To find the next point x(k+1), we start at x(k) and move by an amount –αk∇f(x(k)), where αk is a positive scalar called the step size.
• This procedure leads to the following iterative algorithm:
x(k+1) = x(k) – αk∇f(x(k)).
• In the method of steepest descent, the step size αk is chosen to achieve the maximum amount of decrease of f at each iteration: αk = arg minα≥0 f(x(k) – α∇f(x(k))). A sketch of this iteration in code follows.
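A minimal Python sketch of this iteration (NumPy assumed; the name gradient_descent, the constant step size alpha, and the test function are illustrative, not from the notes). With a constant step size this is plain gradient descent; steepest descent would additionally perform the line search for αk at each step:

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, eps=1e-6, max_iter=1000):
    """Iterate x(k+1) = x(k) - alpha * grad f(x(k)) with a constant step size.

    grad_f : callable returning the gradient of f at a point
    x0     : starting point x(0)
    alpha  : fixed step size (steepest descent would instead choose
             alpha_k by minimizing f along the negative gradient)
    eps    : stop once ||grad f(x(k))|| < eps
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < eps:    # gradient-norm stopping criterion (see below)
            break
        x = x - alpha * g              # step in the direction of -grad f
    return x

# Hypothetical test: f(x) = (x1 - 1)^2 + (x2 + 2)^2, grad f(x) = 2*(x - [1, -2])
print(gradient_descent(lambda x: 2 * (x - np.array([1.0, -2.0])), x0=[0.0, 0.0]))
# approximately [1, -2]
```

The loop already uses the gradient-norm stopping criterion discussed later in these notes.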
• Note: The method of steepest descent moves in orthogonal steps, as stated in the following proposition.
Proposition 1
If {x(k)}∞k=0 is a steepest descent sequence for a given function f: ℝn → ℝ, then for each k the vector x(k+1) – x(k) is orthogonal to the vector x(k+2) – x(k+1).
• The proposition above implies that ∇f(x(k)) is parallel to the tangent plane to the level set {f(x) = f(x(k+1))} at x(k+1).
○ Note that as each new point is generated by the steepest descent algorithm, the corresponding value of f decreases, as stated below.
Proposition 2
If {x(k)}∞k=0 is a steepest descent sequence for f: ℝn → ℝ and if ∇f(x(k)) ≠ 0, then f(x(k+1)) < f(x(k)).
• If for some k, we have ∇f(x(k)) = 0, then the point x(k) satisfies the FONC.
○ In this case, x(k+1) = x(k). We can use the above as the basis for a stopping (termination) criterion for the algorithm.
○ The condition ∇f(x(k+1)) = 0, however, is not directly suitable as a practical stopping criterion, because the numerical computation of the
gradient will rarely be identically equal to zero.
• A practical stopping criterion is to check whether the norm ||∇f(x(k))|| of the gradient is less than a prespecified threshold ε > 0, in which case we stop; that is, we stop when ||∇f(x(k))|| < ε.
○ Alternatively, we may compute the absolute difference |f(x(k+1)) – f(x(k))| between objective function values for every two successive iterations, and if the difference is less than some prespecified threshold, then we stop; that is, we stop when |f(x(k+1)) – f(x(k))| < ε.
○ Yet another alternative is to compute the norm ||x(k+1) – x(k)|| of the difference between two successive iterates, and we stop if the norm is less than a prespecified threshold: ||x(k+1) – x(k)|| < ε.
○ Alternatively, we may check “relative” values of the quantities above; for example, |f(x(k+1)) – f(x(k))|/|f(x(k))| < ε or ||x(k+1) – x(k)||/||x(k)|| < ε.
• Note: The two (relative) stopping criteria above are preferable to the previous (absolute) criteria because the relative criteria are “scale-independent.”
○ For example, scaling the objective function does not change the satisfaction of the criterion |f(x(k+1)) – f(x(k))|/|f(x(k))| < ε.
○ Similarly, scaling the decision variable does not change the satisfaction of the criterion ||x(k+1) – x(k)||/||x(k)|| < ε. A sketch of these criteria in code follows.
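As an illustration, here is a minimal Python sketch of the four tests above (NumPy assumed; the helper name should_stop and the tolerance eps are hypothetical):

```python
import numpy as np

def should_stop(f_new, f_old, x_new, x_old, eps=1e-8):
    """Evaluate the absolute and relative stopping criteria above.

    All four tests are shown for illustration; in practice one picks a
    single criterion. eps is a hypothetical tolerance. The relative tests
    assume f(x(k)) != 0 and x(k) != 0.
    """
    x_new = np.asarray(x_new, dtype=float)
    x_old = np.asarray(x_old, dtype=float)
    abs_f = abs(f_new - f_old) < eps                         # |f(x(k+1)) - f(x(k))| < eps
    abs_x = np.linalg.norm(x_new - x_old) < eps              # ||x(k+1) - x(k)|| < eps
    rel_f = abs(f_new - f_old) / abs(f_old) < eps            # relative version in f
    rel_x = np.linalg.norm(x_new - x_old) / np.linalg.norm(x_old) < eps  # relative in x
    return {"abs_f": abs_f, "abs_x": abs_x, "rel_f": rel_f, "rel_x": rel_x}
```

Multiplying f by a constant scales the numerator and denominator of rel_f equally, leaving that test unchanged; the same holds for rel_x when x is rescaled, which is the scale-independence noted above.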
Example 1) Use the method of steepest descent to find the minimizer of the following function
• Let us now see what the method of steepest descent does with a quadratic function of the form
f(x) = (1/2)x⊤Qx – b⊤x,
○ where Q ∈ ℝn×n is a symmetric positive definite matrix, b ∈ ℝn, and x ∈ ℝn. The unique minimizer of f can be found by setting the gradient of f to zero, where
∇f(x) = Qx – b,
○ so the minimizer x* satisfies Qx* = b; that is, x* = Q⁻¹b.
• The Hessian of f is F(x) = Q = Q⊤ > 0. To simplify the notation we write g(k) = ∇f(x(k)) = Qx(k) – b. Then, the steepest descent algorithm for the quadratic function can be represented as
x(k+1) = x(k) – αk g(k), where αk = arg minα≥0 f(x(k) – αg(k)).
○ Setting the derivative of α ↦ f(x(k) – αg(k)) to zero gives the closed-form step size
αk = (g(k)⊤g(k)) / (g(k)⊤Qg(k)).
• In summary, the method of steepest descent for the quadratic takes the form
x(k+1) = x(k) – ((g(k)⊤g(k)) / (g(k)⊤Qg(k))) g(k), where g(k) = Qx(k) – b.
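A minimal Python sketch of this quadratic case (NumPy assumed; Q, b, and the starting point are hypothetical data). As a side check it prints the inner product of successive steps, which by Proposition 1 should be approximately zero:

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, eps=1e-10, max_iter=100):
    """Steepest descent for f(x) = 0.5*x^T Q x - b^T x using the
    closed-form step size alpha_k = (g^T g)/(g^T Q g)."""
    x = np.asarray(x0, dtype=float)
    prev_step = None
    for _ in range(max_iter):
        g = Q @ x - b                      # g(k) = grad f(x(k))
        if np.linalg.norm(g) < eps:        # gradient-norm stopping criterion
            break
        alpha = (g @ g) / (g @ (Q @ g))    # exact line search step size
        step = -alpha * g
        if prev_step is not None:
            # Proposition 1: successive steps are orthogonal
            print("step orthogonality:", prev_step @ step)
        x = x + step
        prev_step = step
    return x

# Hypothetical data: a symmetric positive definite Q and some b
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_star = steepest_descent_quadratic(Q, b, x0=[0.0, 0.0])
print(x_star, np.linalg.solve(Q, b))       # iterate vs. exact minimizer Q^{-1} b
```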
Example 2) Let f(x1, x2) = x1² + x2². Find the minimizer.
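Here f corresponds to the quadratic form above with Q = 2I and b = 0, so g(0) = 2x(0) and αk = (g⊤g)/(2g⊤g) = 1/2; hence x(1) = x(0) – (1/2)·2x(0) = 0, and steepest descent reaches the minimizer x* = (0, 0) in a single step from any starting point. A self-contained numerical check (starting point chosen arbitrarily):

```python
import numpy as np

# f(x1, x2) = x1^2 + x2^2 is the quadratic with Q = 2I, b = 0
Q = 2.0 * np.eye(2)
b = np.zeros(2)
x = np.array([3.0, -4.0])            # arbitrary starting point x(0)
g = Q @ x - b                        # g(0) = 2*x(0)
alpha = (g @ g) / (g @ (Q @ g))      # = 1/2 when Q = 2I
x = x - alpha * g                    # x(1) = x(0) - (1/2)*2*x(0) = 0
print(x)                             # [0. 0.], the minimizer x*
```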