6 Gradient Method

Gradient Method

Gradient
• If we look at the level sets (projections of the function at different heights), then the gradient indicates the direction along which the function increases at the maximum rate.

Gradient
The level set of a function $f: \mathbb{R}^n \to \mathbb{R}$ at level $c$ is the set of points $S = \{x : f(x) = c\}$.
Example:

$$f(x, y) = x e^{-x^2 - y^2}, \quad x \in [-0.1, 2],\ y \in [-0.1, 3]$$

• The level sets are seen as the contour lines here.

• As we approach the extremum, the level sets shrink toward the point where the gradient is zero.
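The contour plot the bullets refer to can be reproduced with a short script (a sketch; the grid resolution and number of levels are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

# Level sets of f(x, y) = x * exp(-x^2 - y^2) over the stated domain
x = np.linspace(-0.1, 2, 200)
y = np.linspace(-0.1, 3, 200)
X, Y = np.meshgrid(x, y)
Z = X * np.exp(-X**2 - Y**2)

cs = plt.contour(X, Y, Z, levels=15)   # each contour line is a level set {f = c}
plt.clabel(cs, inline=True, fontsize=8)
plt.xlabel("x"); plt.ylabel("y")
plt.show()
```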

Gradient
• If we consider a direction of increase $d$ with $\|d\| = 1$, then the rate of increase of $f(x)$ along $d$ at $x$ is
$$\nabla f(x)^T d \le \|\nabla f(x)\|$$
(Cauchy–Schwarz inequality).

• If $d = \dfrac{\nabla f(x)}{\|\nabla f(x)\|}$, i.e., the unit vector along the direction of $\nabla f(x)$, then
$$\nabla f(x)^T \frac{\nabla f(x)}{\|\nabla f(x)\|} = \|\nabla f(x)\|.$$

Hence this is the direction with the maximum rate of increase.
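A quick numerical check of this claim (a sketch; the gradient vector is an arbitrary choice): the rate of increase along random unit directions never exceeds the rate along the normalized gradient.

```python
import numpy as np

grad = np.array([1.0, 2.0])      # suppose this is grad f(x) at some point
rng = np.random.default_rng(0)

for _ in range(5):
    d = rng.standard_normal(2)
    d /= np.linalg.norm(d)       # random unit direction, ||d|| = 1
    assert grad @ d <= np.linalg.norm(grad) + 1e-12   # Cauchy-Schwarz

unit_grad = grad / np.linalg.norm(grad)
print(grad @ unit_grad, np.linalg.norm(grad))  # equal: the maximum rate
```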


• This indicates that $-\nabla f(x)$ is the direction of the maximum rate of decrease, and we will find the minimum if we move along this direction.

• Thus we will search along the direction of the negative gradient: Gradient Descent.

Gradient Descent
• Let us start at an initial approximation $x_0$; then, to find the minimum, the next point according to gradient descent will be $x_1 = x_0 - \nabla f(x_0)$.

• Here we can add a step size to avoid oscillation:
$$x_1 = x_0 - \alpha \nabla f(x_0)$$

• Thus, the update equation will be
$$x_{k+1} = x_k - \alpha_k \nabla f(x_k)$$

• Here, $\alpha_k > 0$ is called the Step Size or Learning Rate: we start at $x_k$ and move along the direction $-\alpha_k \nabla f(x_k)$.

• $\alpha_k$ is a hyper-parameter of the algorithm.
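A minimal sketch of this update loop in Python; the function, starting point, step size, and stopping tolerance below are illustrative assumptions, not part of the slides:

```python
import numpy as np

def grad_f(x):
    # Gradient of an illustrative function f(x) = (x1 - 4)^4 + (x2 - 3)^2
    return np.array([4 * (x[0] - 4)**3, 2 * (x[1] - 3)])

x = np.array([0.0, 0.0])   # initial approximation x_0
alpha = 0.01               # fixed step size (hyper-parameter)

for k in range(10000):
    g = grad_f(x)
    if np.linalg.norm(g) < 1e-8:    # stop when the gradient is small enough
        break
    x = x - alpha * g               # x_{k+1} = x_k - alpha_k * grad f(x_k)

print(x)  # approaches the minimizer (4, 3)
```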

Gradient Descent
• The gradient varies as the search proceeds, tending to zero as we approach the minimizer.
• We have the option of either taking very small steps and re-evaluating the gradient at every step, or taking large steps each time.
• The first approach results in a laborious method of reaching the minimizer, whereas the second approach may result in a more zigzag path to the minimizer.
• The advantage of the second approach is possibly fewer gradient evaluations.

Steepest Descent
• Here, the step size $\alpha_k$ is chosen to achieve the maximum reduction in the objective function at each step.
• Some methods of selecting $\alpha_k$ are:
  • Fixed learning rate
  • Line search
  • Bold driver
  • Momentum

Steepest Descent: Fixed Rate

Example: Find the minimum of
$$f(x_1, x_2, x_3) = (x_1 - 4)^4 + (x_2 - 3)^2 + 4(x_3 + 5)^4$$

Solution: Assume $x_0 = (4, 2, -1)^T$ and choose $\alpha = 0.001$.

Step 1: Calculate the gradient at the point:
$$\nabla f(x)\Big|_{x = (4, 2, -1)^T} = \begin{bmatrix} 4(x_1 - 4)^3 \\ 2(x_2 - 3) \\ 16(x_3 + 5)^3 \end{bmatrix}_{x = (4, 2, -1)^T} = \begin{bmatrix} 0 \\ -2 \\ 1024 \end{bmatrix}$$

Step 2: Update with the fixed rate:
$$x_1 = x_0 - 0.001\,\nabla f(x_0) = (4,\ 2.002,\ -2.024)^T$$
Steepest Descent: Line Search

• Step 3: Use line search to determine $\alpha_k$ and then calculate the next point $x_{k+1}$. With $\nabla f(x_0) = (0, -2, 1024)^T$ from Step 1,
$$\phi(\alpha) = f(x_0 - \alpha \nabla f(x_0)) = (2\alpha - 1)^2 + 4(4 - 1024\alpha)^4$$

• Minimize $\phi(\alpha)$ such that $\alpha \ge 0$. The solution will give you $\alpha_0$.

• We can find $\alpha_0$ using the Secant Method (a sketch follows after this list):
$$\alpha^{k+1} = \alpha^k - \phi'(\alpha^k)\,\frac{\alpha^k - \alpha^{k-1}}{\phi'(\alpha^k) - \phi'(\alpha^{k-1})} \;\Rightarrow\; \alpha_0 = 3.967 \times 10^{-3}$$

• Place this $\alpha_0$ in $x_1 = x_0 - \alpha_0 \nabla f(x_0)$.

• Go to Step 1 and use this $x_1$ to find $x_2$; keep repeating Steps 1, 2 and 3 to reach the minimum of $f(x_1, x_2, x_3)$. Each iteration will have a new $\alpha$ value.
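A sketch of the secant-method line search for this $\phi$; the starting points and tolerance are arbitrary choices:

```python
def phi_prime(a):
    # phi'(alpha) for phi(alpha) = (2a - 1)^2 + 4*(4 - 1024a)^4
    return 4 * (2 * a - 1) - 16384 * (4 - 1024 * a)**3

def secant(df, a_prev, a_curr, tol=1e-12, max_iter=100):
    # alpha^{k+1} = alpha^k - df(alpha^k) * (alpha^k - alpha^{k-1})
    #                        / (df(alpha^k) - df(alpha^{k-1}))
    for _ in range(max_iter):
        d_prev, d_curr = df(a_prev), df(a_curr)
        a_next = a_curr - d_curr * (a_curr - a_prev) / (d_curr - d_prev)
        if abs(a_next - a_curr) < tol:
            return a_next
        a_prev, a_curr = a_curr, a_next
    return a_curr

print(secant(phi_prime, 0.0, 0.001))  # approximately 3.967e-3
```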

Selection of the Stopping Criteria

• If we are working with a convex function, then stop when $\nabla f(x_k) = 0$.

• But if we are working with local minima or constraint sets, then we will have to modify the stopping criterion, for example:
  • Stop when the step size $\alpha_k$ approaches zero.
  • Stop when the error $\|x_k - x_{k-1}\| \le \epsilon$ or $|f(x_k) - f(x_{k-1})| \le \epsilon$ for $0 < \epsilon \ll 1$.
  • Stop when the gradient $\|\nabla f(x_k)\|$ is small enough.
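Inside the descent loop, these checks might look like the following sketch; the tolerance value is an assumption:

```python
import numpy as np

eps = 1e-8  # tolerance, 0 < eps << 1

def converged(x_new, x_old, f_new, f_old, grad):
    # Any of the criteria above can serve as the stopping test.
    return (np.linalg.norm(x_new - x_old) <= eps   # small step
            or abs(f_new - f_old) <= eps           # small change in f
            or np.linalg.norm(grad) <= eps)        # small gradient
```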

Examples
1. Find the solution of $h(x) = 0$ using fixed-rate steepest descent, where
$$h(x) = \begin{bmatrix} 4 + 3x_1 + 2x_2 \\ 1 + 2x_1 + 3x_2 \end{bmatrix} = 0$$

Set $h(x) = \nabla f(x)$, so that $x_{k+1} = x_k - \alpha\, h(x_k)$. The iterates converge to
$$x_1 = -1.999, \qquad x_2 = 0.999$$
(close to the exact solution $x_1 = -2$, $x_2 = 1$). A sketch of this iteration follows below.
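A sketch of this iteration; the step size and iteration count are assumptions (here the coefficient matrix [[3, 2], [2, 3]] has maximum eigenvalue 5, so a fixed $\alpha < 2/5$ converges):

```python
import numpy as np

def h(x):
    return np.array([4 + 3 * x[0] + 2 * x[1],
                     1 + 2 * x[0] + 3 * x[1]])

x = np.array([0.0, 0.0])
alpha = 0.1                 # satisfies 0 < alpha < 2/5
for _ in range(200):
    x = x - alpha * h(x)    # treat h as the gradient of some f

print(x)  # approximately (-2, 1)
```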

Steepest Descent for Quadratic Function
• For quadratic functions of the form
$$f(x) = \frac{1}{2} x^T Q x - b^T x$$
where $Q \in \mathbb{R}^{n \times n}$ is a positive definite symmetric matrix, $b \in \mathbb{R}^n$ and $x \in \mathbb{R}^n$,
$$\nabla f(x) = Qx - b = g$$

• In this case, for the update equation $x_{k+1} = x_k - \alpha_k g_k$, we can determine $\alpha_k$ in closed form.

• Since
$$\alpha_k = \arg\min_{\alpha \ge 0} f(x_k - \alpha g_k)$$
$$\Rightarrow \text{minimize } \phi(\alpha) = \frac{1}{2}(x_k - \alpha g_k)^T Q (x_k - \alpha g_k) - b^T (x_k - \alpha g_k)$$
$$\Rightarrow \phi'(\alpha) = (x_k - \alpha g_k)^T Q (-g_k) - b^T(-g_k)$$

Steepest Descent for Quadratic Function

$$\phi'(\alpha) = (x_k - \alpha g_k)^T Q(-g_k) - b^T(-g_k) = 0$$
$$\Rightarrow -x_k^T Q g_k + \alpha_k g_k^T Q g_k + b^T g_k = 0$$
$$\Rightarrow -(x_k^T Q - b^T) g_k + \alpha_k g_k^T Q g_k = 0$$
$$\Rightarrow -g_k^T g_k + \alpha_k g_k^T Q g_k = 0$$
$$\Rightarrow \alpha_k = \frac{g_k^T g_k}{g_k^T Q g_k}$$

Hence, the update equation would be
$$x_{k+1} = x_k - \frac{g_k^T g_k}{g_k^T Q g_k}\, g_k, \quad \text{where } g_k = Q x_k - b$$
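A minimal sketch of this exact-step update for a quadratic; the particular $Q$, $b$, and starting point are illustrative (they encode the same linear system as the previous example, so the minimizer is $(-2, 1)$):

```python
import numpy as np

Q = np.array([[3.0, 2.0],
              [2.0, 3.0]])   # symmetric positive definite
b = np.array([-4.0, -1.0])

x = np.zeros(2)
for _ in range(50):
    g = Q @ x - b                     # g_k = Q x_k - b
    if np.linalg.norm(g) < 1e-12:
        break
    alpha = (g @ g) / (g @ (Q @ g))   # alpha_k = g^T g / (g^T Q g)
    x = x - alpha * g

print(x)  # approximately (-2, 1)
```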

Steepest Descent for Quadratic Function

For a fixed step size $\alpha$, the gradient descent algorithm satisfies $x_k \to x^*$ for any $x_0$ if and only if
$$0 < \alpha < \frac{2}{\lambda_{max}(Q)}$$
where $\lambda_{max}(Q)$ is the maximum eigenvalue of $Q$.
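The bound can be checked numerically, e.g. with NumPy's symmetric eigensolver (the matrix is the illustrative $Q$ from above):

```python
import numpy as np

Q = np.array([[3.0, 2.0],
              [2.0, 3.0]])

lam_max = np.linalg.eigvalsh(Q).max()   # largest eigenvalue of symmetric Q
print(2.0 / lam_max)                    # fixed alpha must stay below this
```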

Home Work 1
Convert the function to the $f(x) = \frac{1}{2} x^T Q x - b^T x$ format and show steepest descent with varying $\alpha$ for 2 iterations.
$$f(x) = \frac{3}{2}(x_1^2 + x_2^2) + 2 x_1 x_2 + 3 x_1 - x_2 - 22$$
Hint: $Q$ must be symmetric and positive definite.

• $0 < \alpha < \dfrac{2}{\lambda_{max}(Q)}$

• Define $f(x)$ and $\nabla f(x) = Q x_k - b$ as functions in your script.

• Use the update equation $x_{k+1} = x_k - \alpha \nabla f(x_k)$.

• Repeat the same problem with $\alpha_k = \dfrac{g_k^T g_k}{g_k^T Q g_k}$ instead of a fixed value.

• Note down the differences, pros and cons you observe.

Home Work 2
Show the first two iterations of steepest descent with a fixed $\alpha$ for the following function:
$$f(x) = x^T \begin{bmatrix} 4 & 2\sqrt{2} \\ 0 & 5 \end{bmatrix} x + x^T \begin{bmatrix} 3 \\ 6 \end{bmatrix} + 24$$

• Hint: Rearrange the equation to make $Q$ symmetric and then check for positive definiteness.
• $Q = \begin{bmatrix} 8 & 2\sqrt{2} \\ 2\sqrt{2} & 10 \end{bmatrix}$; find the maximum eigenvalue of $Q$.
• $0 < \alpha < \dfrac{2}{\lambda_{max}(Q)}$
• Define $f(x)$ and $\nabla f(x) = Q x_k - b$ as functions in your script.
• Use the update equation $x_{k+1} = x_k - \alpha \nabla f(x_k)$.
• Repeat the same problem with $\alpha_k = \dfrac{g_k^T g_k}{g_k^T Q g_k}$ instead of a fixed value.
• Note down the differences, pros and cons you observe.
