Cs3491 - Aiml - Unit III - Gradient Descent
Engineering
Regulation 21
Semester: III
K.Sumithra Devi
Assistant Professor
CSE
KCG DEPARTMENT OF CSE 1
UNIT III SUPERVISED LEARNING – GRADIENT DESCENT
K3
$$\Longrightarrow \quad \nabla f(x^*)^T d \ge 0, \quad \forall d$$
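But if $x^{(t)}$ is not optimal, then we want
$$f(x^{(t)} + \epsilon d) \le f(x^{(t)}).$$
So,
$$\underbrace{\lim_{\epsilon \to 0} \frac{1}{\epsilon} \Big[ f(x^{(t)} + \epsilon d) - f(x^{(t)}) \Big]}_{\le 0,\ \text{for some } d} = \nabla f(x^{(t)})^T d$$
$$\Longrightarrow \quad \nabla f(x^{(t)})^T d \le 0.$$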
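The limit above can be checked numerically. The following is a minimal sketch, assuming a hypothetical test function f(x) = x0^2 + 3 x1^2 and a hand-picked point and direction (none of these come from the slides); it compares the finite-difference quotient with the inner product of the gradient and d.

```python
# Sketch only: f, grad_f, x, and d are illustrative assumptions.

def f(x):
    # Hypothetical smooth test function: f(x) = x0^2 + 3*x1^2
    return x[0] ** 2 + 3.0 * x[1] ** 2

def grad_f(x):
    # Analytic gradient of the test function above
    return [2.0 * x[0], 6.0 * x[1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def directional_difference(func, x, d, eps):
    # Finite-difference quotient (f(x + eps*d) - f(x)) / eps
    x_step = [xi + eps * di for xi, di in zip(x, d)]
    return (func(x_step) - func(x)) / eps

x = [1.0, -2.0]
d = [-1.0, 1.0]

exact = dot(grad_f(x), d)                        # grad f(x)^T d
approx = directional_difference(f, x, d, 1e-6)   # quotient for a small eps

print("grad^T d          :", exact)
print("difference quotient:", approx)
```

Since the inner product is negative for this d, a small step along d lowers the cost, which is the condition derived above.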
Descent Direction
Pictorial illustration:
$\nabla f(x)$ is perpendicular to the contour.
A search direction $d$ can be either on the positive side, $\nabla f(x)^T d \ge 0$, or on the negative side, $\nabla f(x)^T d < 0$.
Only those on the negative side can reduce the cost. All such $d$'s are called descent directions.
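As an illustration of the sign test, here is a small sketch that classifies a few candidate directions. The function f(x) = x0^2 + x1^2, the point x, and the candidate directions are assumptions chosen for the example, and `is_descent_direction` is a hypothetical helper name.

```python
# Sketch only: the test function, point, and candidates are assumptions.

def f(x):
    return x[0] ** 2 + x[1] ** 2

def grad_f(x):
    return [2.0 * x[0], 2.0 * x[1]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def is_descent_direction(grad, d):
    # d is on the negative side exactly when grad^T d < 0
    return dot(grad, d) < 0.0

def step(x, d, eps):
    return [xi + eps * di for xi, di in zip(x, d)]

x = [2.0, 1.0]
g = grad_f(x)

candidates = {
    "negative side": [-1.0, 0.0],   # grad^T d < 0
    "positive side": [1.0, 1.0],    # grad^T d > 0
    "along contour": [1.0, -2.0],   # grad^T d = 0, tangent to the contour
}

for name, d in candidates.items():
    reduces = f(step(x, d, 1e-3)) < f(x)
    print(name, "| descent:", is_descent_direction(g, d), "| cost reduced:", reduces)
```

In this example only the direction on the negative side lowers the cost after a small step; the direction tangent to the contour does not.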
Step Size
The algorithm:
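E.g., if $f(x) = \frac{1}{2} x^T H x + c^T x$ with $H$ symmetric positive definite, then minimizing $f(x^{(t)} + \alpha d^{(t)})$ over $\alpha$ gives
$$\alpha^{(t)} = -\frac{\nabla f(x^{(t)})^T d^{(t)}}{d^{(t)T} H d^{(t)}}.$$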
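The step-size formula for the quadratic case can be tried on a small example. This is a sketch under stated assumptions: the 2x2 matrix H (symmetric positive definite), the vector c, and the starting point are made up for illustration, and the search direction is taken to be the negative gradient.

```python
# Sketch only: H, c, and x are illustrative assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(H, v):
    return [dot(row, v) for row in H]

def quad(H, c, x):
    # f(x) = 1/2 x^T H x + c^T x
    return 0.5 * dot(x, mat_vec(H, x)) + dot(c, x)

def quad_grad(H, c, x):
    # grad f(x) = H x + c (H symmetric)
    return [hx + ci for hx, ci in zip(mat_vec(H, x), c)]

def exact_step_size(H, grad, d):
    # alpha = -(grad^T d) / (d^T H d)
    return -dot(grad, d) / dot(d, mat_vec(H, d))

H = [[3.0, 1.0], [1.0, 2.0]]
c = [-1.0, 1.0]
x = [2.0, 2.0]

g = quad_grad(H, c, x)
d = [-gi for gi in g]                 # steepest-descent direction
alpha = exact_step_size(H, g, d)
x_next = [xi + alpha * di for xi, di in zip(x, d)]

def phi(a):
    # Cost along the search line, as a function of the step size
    return quad(H, c, [xi + a * di for xi, di in zip(x, d)])

print("alpha:", alpha)
print("cost before:", quad(H, c, x), "cost after:", quad(H, c, x_next))
```

Two sanity checks follow from the formula: `phi(alpha)` should be no larger than `phi` at nearby step sizes, and the gradient at `x_next` should be orthogonal to `d`.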
$$\le \left(1 - \frac{\lambda_{\min}}{\lambda_{\max}}\right)^{2t} \Big( f(x^{(1)}) - f(x^*) \Big).$$
Thus, $f(x^{(t)}) \to f(x^*)$ as $t \to \infty$.
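The geometric decrease can be observed on a small quadratic. The sketch below rests on several assumptions that are not stated in the slides: λ_min and λ_max are read as the smallest and largest eigenvalues of H, the left-hand side is read as the cost gap after t steps of steepest descent with exact line search starting from x^(1), and H is a made-up diagonal matrix so that its eigenvalues and minimizer are known by inspection.

```python
# Sketch only: H, the start point, and the reading of the bound are assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mat_vec(H, v):
    return [dot(row, v) for row in H]

def quad(H, x):
    # f(x) = 1/2 x^T H x (c = 0, so x* = 0 and f(x*) = 0)
    return 0.5 * dot(x, mat_vec(H, x))

H = [[1.0, 0.0], [0.0, 10.0]]       # eigenvalues 1 and 10
lam_min, lam_max = 1.0, 10.0
rate = 1.0 - lam_min / lam_max      # 1 - lambda_min / lambda_max

x = [10.0, 1.0]                     # plays the role of x^(1)
gaps = [quad(H, x)]                 # gaps[k] = f after k steps, minus f(x*) = 0

for _ in range(60):
    g = mat_vec(H, x)               # gradient H x
    d = [-gi for gi in g]
    curvature = dot(d, mat_vec(H, d))
    if curvature == 0.0:            # gradient vanished: already optimal
        break
    alpha = -dot(g, d) / curvature  # exact line search step
    x = [xi + alpha * di for xi, di in zip(x, d)]
    gaps.append(quad(H, x))

# Compare the gap after t steps with rate^(2t) times the initial gap
bound_holds = all(
    gaps[t] <= rate ** (2 * t) * gaps[0] + 1e-12 for t in range(len(gaps))
)
print("bound holds:", bound_holds)
print("final gap:", gaps[-1])
```

The closer λ_min / λ_max is to 1, the smaller the rate and the faster the gap shrinks; a badly conditioned H pushes the rate toward 1.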
Understanding Convergence
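Gradient descent can be viewed as successive approximation. Approximate the function as
$$f(x^{(t)} + d) \approx f(x^{(t)}) + \nabla f(x^{(t)})^T d + \frac{1}{2\alpha} \|d\|^2.$$
We can show that the $d$ that minimizes this approximation is $d = -\alpha \nabla f(x^{(t)})$: setting its gradient with respect to $d$, namely $\nabla f(x^{(t)}) + d/\alpha$, to zero gives the result.
This suggests: use a quadratic function to locally approximate $f$.
The iteration converges when the step size $\alpha$ is not too big. The curvature of the approximating quadratic is $1/\alpha$, so a large $\alpha$ gives a flat approximation whose minimizer overshoots.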
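A one-dimensional sketch makes the role of α concrete. It assumes a hypothetical cost f(x) = 2x^2 (second derivative 4), for which the fixed-step update multiplies x by (1 - 4α) each iteration, so it contracts only when α < 0.5. The function names and the two step sizes are illustrative choices.

```python
# Sketch only: the cost function and step sizes are assumptions.

def f(x):
    return 2.0 * x * x

def grad_f(x):
    return 4.0 * x

def model(x, d, alpha):
    # Local quadratic approximation: f(x) + f'(x) d + d^2 / (2 alpha)
    return f(x) + grad_f(x) * d + d * d / (2.0 * alpha)

def model_minimizer(x, alpha):
    # Minimizer of the approximation over d: d = -alpha * f'(x)
    return -alpha * grad_f(x)

def run(x, alpha, steps):
    # Repeatedly jump to the minimizer of the local approximation
    for _ in range(steps):
        x = x + model_minimizer(x, alpha)
    return x

d_star = model_minimizer(1.0, 0.1)
small_alpha_result = run(1.0, 0.1, 50)   # per-step factor 0.6
large_alpha_result = run(1.0, 0.6, 50)   # per-step factor -1.4

print("model minimizer at x = 1:", d_star)
print("alpha = 0.1 ->", small_alpha_result)
print("alpha = 0.6 ->", large_alpha_result)
```

With the smaller step the iterates shrink toward the minimizer at 0, while with the larger step they oscillate with growing magnitude.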
Advice on Gradient Descent