Rec 1
Descent Direction: A direction d at a point x such that moving a sufficiently small distance from x along d lowers the objective value.
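For a differentiable f, a quick way to test this is the sign of the directional derivative: as a minimal sketch (the quadratic below and the point (1, 1) are just assumed for illustration), d is a descent direction whenever ∇f(x)ᵀd < 0.

    import numpy as np

    def is_descent_direction(grad, d):
        """For differentiable f: d is a descent direction at x whenever the
        directional derivative grad(x) . d is negative."""
        return float(np.dot(grad, d)) < 0.0

    # Illustrative (assumed) example: f(x, y) = x^2 + y^2 has gradient (2, 2) at (1, 1).
    grad_at_x = np.array([2.0, 2.0])
    print(is_descent_direction(grad_at_x, np.array([-1.0, -1.0])))  # True: points downhill
    print(is_descent_direction(grad_at_x, np.array([1.0, 0.0])))    # False: objective increases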
Weierstrass Theorem
If a set F is compact, and xᵢ is a sequence within F, then there exists a subsequence converging to a point in F.
Intuitively, if you have infinitely many points in F, then they have to be clustering somewhere (F is bounded, so they can't be marching off to infinity). That somewhere must be in F, because F is closed. Thus, some of the xᵢ have to be converging to a point in F.
If a function f(x) is continuous over a compact set F, then there exists a point in F that minimizes f(x) over F.
Intuitively, pick a point x₁ ∈ F. If f(x₁) minimizes f(x) over F, we are done. Otherwise, by the definition of the infimum, there must be another point x₂ ∈ F with f(x₂) ≤ ½(f(x₁) + inf_{x∈F} f(x)). Repeat. You now have an infinite sequence of points, some subsequence of which converges to a point of F by the above; by construction their objective values converge to inf_{x∈F} f(x), so by continuity of f the limit point attains that infimum.
This tells us that when optimizing a continuous objective function over a compact set, an optimum exists, so we will never be trapped in a situation where no optimal value exists.
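As a rough illustration (the interval [0, 2] and the objective below are my own assumptions), a continuous function over a compact interval attains its minimum, so even a crude grid search homes in on it as the grid is refined:

    import numpy as np

    # Assumed continuous objective on the compact interval [0, 2].
    f = lambda x: (x - 1.3) ** 2 + np.sin(5 * x)

    # Weierstrass guarantees a minimizer exists in [0, 2], so refining the grid
    # gets arbitrarily close to it.
    for n in (11, 101, 10001):
        grid = np.linspace(0.0, 2.0, n)
        vals = f(grid)
        print(n, grid[np.argmin(vals)], vals.min())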
Necessary Conditions for Optimality (in the Unconstrained Case)
If x is a local minimum, then there are no descent directions at x, because if there were, you could do better nearby. Since the gradient gives the rate of change of the objective, it must therefore be zero; otherwise moving opposite the direction of the gradient gives you a descent direction.
Thus, if f is differentiable, ∇f(x) = 0 is necessary for x to be a local minimum. If f isn't differentiable, life is more complicated, but you still need to have no descent directions. If there are constraints, those will also complicate matters.
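A hedged sketch of checking the first-order condition numerically (the example function and candidate points are assumptions, and the gradient is approximated by central differences):

    import numpy as np

    def numerical_gradient(f, x, h=1e-6):
        """Central-difference approximation of the gradient of f at x."""
        g = np.zeros_like(x)
        for i in range(len(x)):
            e = np.zeros_like(x)
            e[i] = h
            g[i] = (f(x + e) - f(x - e)) / (2 * h)
        return g

    # Assumed example: f(x, y) = (x - 1)^2 + 2 y^2 has its only stationary point at (1, 0).
    f = lambda v: (v[0] - 1.0) ** 2 + 2.0 * v[1] ** 2
    print(numerical_gradient(f, np.array([1.0, 0.0])))  # ~[0, 0]: passes the first-order test
    print(numerical_gradient(f, np.array([0.0, 1.0])))  # clearly nonzero: cannot be a local minimum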
If the above is satisfied, there is still the possibility that you have a maximum, or what in one dimension would be a point of inflection. Thus, just as in one dimension we look for the second derivative to be non-negative, here we look for the Hessian to be positive semidefinite. If it isn't, then there is some direction d with dᵀH(x)d < 0, so even though the first-order change along d is zero, moving in that direction will make the objective go down.
Thus, if f is twice differentiable, H(x) must be symmetric positive semidefinite (SPSD) in order for x to be a local minimum. Or, looking at the maximization problem, symmetric negative semidefinite (SNSD) for a local maximum.
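One way to check the second-order condition on a concrete Hessian (the three example Hessians below are assumed, coming from x² + y², −(x² + y²), and x² − y²) is to look at the signs of its eigenvalues:

    import numpy as np

    def classify_hessian(H, tol=1e-10):
        """Classify a symmetric Hessian by the signs of its eigenvalues."""
        eig = np.linalg.eigvalsh(H)
        if np.all(eig >= -tol):
            return "positive semidefinite: consistent with a local minimum"
        if np.all(eig <= tol):
            return "negative semidefinite: consistent with a local maximum"
        return "indefinite: a saddle, some direction still decreases the objective"

    # Assumed examples: Hessians of x^2 + y^2, -(x^2 + y^2), and x^2 - y^2.
    print(classify_hessian(np.diag([2.0, 2.0])))
    print(classify_hessian(np.diag([-2.0, -2.0])))
    print(classify_hessian(np.diag([2.0, -2.0])))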
Sufficient Conditions for Optimality (in the Unconstrained Case)
In order to be a strict local minimum, it has to be the case that moving a little bit in any direction increases the objective value. We argued above that the gradient must be zero, so to a first-order approximation, moving in any direction does nothing. But to a second-order approximation, moving a small distance t in a direction d changes the objective by ½ t² dᵀH(x)d. Thus, if H(x) is positive definite, you are increasing the objective in every direction.
Thus, if at x the gradient is zero and the Hessian is positive definite, then x is a strict local minimum.
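Putting the two sufficient conditions together, a minimal sketch (the example gradient/Hessian pairs are assumed) might look like:

    import numpy as np

    def is_strict_local_min(grad, H, tol=1e-8):
        """Sufficient condition: gradient (numerically) zero and Hessian positive definite."""
        return np.linalg.norm(grad) < tol and np.all(np.linalg.eigvalsh(H) > tol)

    # Assumed example: f(x, y) = x^2 + 3 y^2 at the origin.
    print(is_strict_local_min(np.array([0.0, 0.0]), np.diag([2.0, 6.0])))   # True
    # Saddle of x^2 - y^2 at the origin: gradient is zero but the Hessian is indefinite.
    print(is_strict_local_min(np.array([0.0, 0.0]), np.diag([2.0, -2.0])))  # False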
Alternatively, we know that if f(x) is convex (and twice differentiable), then H(x) is everywhere positive semidefinite, and any point where the gradient is zero is not just a local minimum but a global one. Further, if you had a local minimum that was not global, you could consider how f behaves along the line segment from it to a better point and get a contradiction with convexity. So if f(x) is convex, every local minimum is a global minimum, and if f(x) is strictly convex, there is at most one such minimizer.
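As a small worked example (the matrix Q and vector b below are my own assumptions), a convex quadratic has the same positive semidefinite Hessian everywhere, so its one stationary point is the global minimizer:

    import numpy as np

    # Assumed convex quadratic f(x) = 0.5 x^T Q x - b^T x; its Hessian is Q everywhere.
    Q = np.array([[4.0, 1.0],
                  [1.0, 3.0]])
    b = np.array([1.0, 2.0])

    print(np.linalg.eigvalsh(Q))       # all positive: f is (strictly) convex
    x_star = np.linalg.solve(Q, b)     # unique point where the gradient Qx - b vanishes
    print(x_star, 0.5 * x_star @ Q @ x_star - b @ x_star)  # global minimizer and its value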