Lecture 11
• At each step t = 0, 1, 2, . . .:
– Choose a step size αt > 0
– Set wt+1 = wt − αt∇F(wt)
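This update rule is gradient descent. A minimal sketch of the loop in Python, assuming an illustrative objective F(w) = ∥w∥² and a fixed step size αt = 0.1 (both are hypothetical choices, not specified in the lecture):

import numpy as np

def grad_F(w):
    # Gradient of the illustrative objective F(w) = ||w||^2
    return 2 * w

w = np.array([3.0, -2.0])      # initial point w0 (arbitrary)
alpha = 0.1                    # fixed step size (hypothetical choice)
for t in range(50):
    w = w - alpha * grad_F(w)  # wt+1 = wt − αt∇F(wt)

print(w)  # approaches [0, 0], the minimizer of F

In practice the step size αt is often chosen by a line search or a schedule; a fixed value is enough to illustrate the update.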
Descent direction
• A vector d is a descent direction for f at w0 if
• f(w0 + td) < f(w0) for some t > 0
• For continuously differentiable f, if d is a descent direction then
• d⊤∇f(w0) < 0
• Argue with the help of Taylor’s theorem (use the remainder form):
• Taylor’s theorem gives f(w0 + td) = f(w0) + ∇f(w0 + t̄d)⊤(td) for some t̄ ∈ (0, t)
• Since f(w0 + td) < f(w0) for some t > 0,
• 0 > f(w0 + td) − f(w0) = ∇f(w0 + t̄d)⊤(td)
• Dividing by t > 0 gives ∇f(w0 + t̄d)⊤d < 0 for small t, and by the continuity of ∇f we get
• d⊤∇f(w0) < 0 (see the numerical sketch below)
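A minimal numerical sketch of this characterization, assuming an illustrative objective f(w) = w1² + 2w2² (my own choice, not from the lecture): a direction with d⊤∇f(w0) < 0 decreases f for a small step t, while one with a positive inner product does not.

import numpy as np

def f(w):
    # Illustrative objective (assumed for this sketch): f(w) = w1^2 + 2*w2^2
    return w[0]**2 + 2 * w[1]**2

def grad_f(w):
    return np.array([2 * w[0], 4 * w[1]])

w0 = np.array([1.0, 1.0])
t = 1e-3  # a small step
for d in (np.array([-1.0, 0.0]), np.array([1.0, 0.0])):
    slope = d @ grad_f(w0)             # d^T grad f(w0)
    decreases = f(w0 + t * d) < f(w0)  # the descent-direction definition
    print(d, slope, decreases)
# Prints slope -2.0 with True for the first direction,
# slope +2.0 with False for the second.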
Proposition
• The point w* is a local minimizer of f only if
• ∇f(w*) = 0
• If not, d = −∇f(w*) will be a descent direction (d⊤∇f(w*) = −∥∇f(w*)∥² < 0, checked numerically below)
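A one-line check of that last bullet, reusing the same illustrative gradient as in the sketch above (the point w is my own arbitrary choice):

import numpy as np

def grad_f(w):
    # Gradient of the illustrative f(w) = w1^2 + 2*w2^2 from the earlier sketch
    return np.array([2 * w[0], 4 * w[1]])

w = np.array([1.0, 1.0])  # grad f(w) = (2, 4) != 0, so w is not stationary
d = -grad_f(w)            # the negative gradient
print(d @ grad_f(w))      # -20.0 = -||grad f(w)||^2 < 0: d is a descent direction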
Proposition
• Let f : ℝᵈ → ℝ be convex and differentiable
• x* will be a global minimizer if and only if
• ∇f(x*) = 0
• f(x*) ≤ f(x) ∀x ⟺ ∇f(x*) = 0
• The “only if” direction is the previous proposition; for the “if” direction, take any x and t ∈ [0,1]. By convexity,
f(x* + t(x − x*)) = f((1 − t)x* + tx)
≤ (1 − t)f(x*) + tf(x)
so f(x* + t(x − x*)) − f(x*) ≤ t(f(x) − f(x*)), for t ∈ [0,1]. Dividing by t > 0 and letting t → 0 gives ∇f(x*)⊤(x − x*) ≤ f(x) − f(x*); with ∇f(x*) = 0 this yields f(x*) ≤ f(x) for all x.
• The first definition f(tx + (1 − t)y) ≤ tf(x) + (1 − t)f(y) means that the
function evaluated at any point between x and y stays below the line
joining f(x) and f(y)
• You can understand the difference between the above two statements by
making a graph of f(x) = x² (a small sketch follows below).
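Following the slide’s suggestion, a short sketch with f(x) = x²: the chord value (1 − t)f(x) + tf(y) sits above f((1 − t)x + ty) throughout, and the gradient vanishes at the global minimizer x* = 0 (the sample points x, y are arbitrary choices):

import numpy as np

f = lambda x: x**2   # convex and differentiable; global minimizer x* = 0
grad = lambda x: 2 * x

x, y = -1.0, 2.0     # two arbitrary sample points
for t in np.linspace(0.0, 1.0, 5):
    inside = f((1 - t) * x + t * y)    # function value between x and y
    chord = (1 - t) * f(x) + t * f(y)  # value of the line joining f(x) and f(y)
    print(f"t={t:.2f}  f={inside:.3f}  chord={chord:.3f}")  # f <= chord throughout

print(grad(0.0))     # 0.0: the gradient vanishes at the global minimizer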