Optimization and Gradient Descent Algorithm
What is optimization?
• minimize (total distance travelled)
• maximize (satisfied clauses)
• minimize (attacking queens)
For example:
• Timetabling
• Clustering
• Feature selection
• Game strategy planning
• Wireless sensor network optimization
• Circuit designing
• Vehicle routing
• Bioinformatics
• Watermarking, etc.
Mathematical Formulation of Optimization Problems
• Optimization can be in terms of minimization or maximization
• Optimization problems can be formulated mathematically
• For example: find x to minimize f(x)
[Figure: example objective surface with optimal point (0, 0), where A = 10]
Note: we need some mathematical formula f(x) to calculate the height from the current position x.
[Figure: height plotted against position]
Example
• Assume that the height can be computed from x using the following function:
Height = f(x) = 2x - 3
Starting from x = 6:
Step 1: df/dx = 2
Step 2: x = x - 2 = 6 - 2 = 4
Repeating the update from x = 4:
Step 2: x = x - 2 = 4 - 2 = 2
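The two updates above can be reproduced with a short loop (starting point x = 6 as on the slides):

```python
# Two gradient-descent updates for the linear example f(x) = 2x - 3,
# starting from x = 6. The derivative is constant (df/dx = 2),
# so every step moves x down by exactly 2.
def f(x):
    return 2 * x - 3

def dfdx(x):
    return 2

x = 6
trace = []
for _ in range(2):
    x = x - dfdx(x)  # step opposite the (positive) slope
    trace.append(x)
# trace == [4, 2]
```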
Example – If slope is negative
• Assume that the height can be computed from x using the following function:
Height = f(x) = -2x + 10
Starting from x = 2:
Step 1: df/dx = -2
Step 2: x = x - (-2) = 2 + 2 = 4
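The same update rule handles this case without modification, as a one-line check shows:

```python
# When the slope is negative, subtracting df/dx = -2 adds 2 to x,
# so the update x = x - df/dx moves x to the right, as on the slide.
def dfdx(x):
    return -2

x = 2
x = x - dfdx(x)  # 2 - (-2) = 4
```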
Example – If the function is non-linear
• Assume that the height can be computed from x using the following function:
Height = f(x) = x² - 2x + 1
Starting from x = 4:
Step 1: df/dx = 2x - 2
Step 2: x = x - (2x - 2) = 4 - 6 = -2
The full-derivative step jumps past the minimum at x = 1, and with steps this large the iterates oscillate between 4 and -2 without converging.
• Solution: we can control the step size by multiplying the derivative term by a small fraction before subtracting it from the value of x:
Step 2: x = x - 0.1 * (2x - 2)
= 4 - 0.6 = 3.4
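Iterating the scaled update shows why the small fraction (the learning rate) matters — the first step reproduces the 3.4 from the slide, and repeated steps home in on the minimum:

```python
# Gradient descent on f(x) = x**2 - 2*x + 1 with learning rate 0.1,
# starting from x = 4. Each step shrinks the distance to the minimum
# at x = 1 by a factor of 0.8, so the iterates converge.
def dfdx(x):
    return 2 * x - 2

lr = 0.1
x = 4.0
history = [x]
for _ in range(100):
    x = x - lr * dfdx(x)
    history.append(x)
# history[1] is 3.4 (the slide's first step); x is now very close to 1
```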
Finalized Gradient Descent Rule
x = x - learning_rate * (df/dx)
Example with two variables (using f(x, y) = 0.5x² + 0.5y², the function implied by the derivatives below; learning rate 0.2; starting at x = 5, y = 4):
Updating x:
Step 1: df/dx = 2*0.5 x
Step 2: x = x - 0.2 * (2*0.5 x) = 5 - 1 = 4
Updating y:
Step 1: df/dy = 2*0.5 y
Step 2: y = y - 0.2 * (2*0.5 y) = 4 - 0.8 = 3.2
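For a multivariate function, each variable is updated with its own partial derivative. A minimal sketch of the step above, assuming the objective f(x, y) = 0.5x² + 0.5y² implied by the partial derivatives on the slide:

```python
# One step of the finalized rule for two variables, with
# partial derivatives df/dx = 2*0.5*x and df/dy = 2*0.5*y
# and learning rate 0.2, starting at (x, y) = (5, 4).
lr = 0.2

def dfdx(x):
    return 2 * 0.5 * x

def dfdy(y):
    return 2 * 0.5 * y

x, y = 5.0, 4.0
x = x - lr * dfdx(x)  # 5 - 1.0 = 4.0
y = y - lr * dfdy(y)  # 4 - 0.8, approximately 3.2
```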
Demo: Gradient Descent for multi-variate functions
Gradient Descent Algorithm – Steps
1. Choose/randomly initialize a starting point
2. Calculate the gradient at the current point
3. Make a scaled step in the opposite direction to the gradient if minimizing (for maximization, the step is taken in the same direction as the gradient)
4. Repeat steps 2 and 3 until one of the criteria is met:
• maximum number of iterations reached
• step size is smaller than the tolerance
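The steps above can be sketched as a small routine (the function name, default parameters, and example objective here are illustrative, not the separately provided implementation):

```python
# Minimal one-dimensional gradient descent with both stopping criteria:
# a maximum number of iterations and a step-size tolerance.
def gradient_descent(grad, x0, lr=0.1, max_iters=1000, tol=1e-8):
    x = x0                        # step 1: starting point
    for _ in range(max_iters):    # stop: max iterations reached
        step = lr * grad(x)       # step 2: gradient, scaled by lr
        x = x - step              # step 3: move opposite the gradient
        if abs(step) < tol:       # stop: step smaller than tolerance
            break
    return x

# Example: f(x) = x**2 - 2*x + 1, with df/dx = 2*x - 2, minimum at x = 1.
minimum = gradient_descent(lambda x: 2 * x - 2, x0=4.0)
```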
Python code
• Implementation of the GD algorithm in Python is provided separately. The following function is used as an example:
[figure: example function not reproduced]
Problem requirements
• Gradient Descent works best when the problem/function is:
• Differentiable
• Convex
Problem requirements – Differentiability