Basic Optimization: Naufal Suryanto
Gradient Descent,
Stochastic Gradient Descent,
Convex Optimization
Naufal Suryanto
11/23/2020
Basic Optimization
Introduction
Example:
f(x) = x^2 + 1 -> the minimum value is 1, when x = 0
g(x) = -2x^2 + 3 -> the maximum value is 3, when x = 0
How about:
Basic Optimization
Introduction
f(x) = x^4 + 7x^3 + 5x^2 - 17x + 3

df(x)/dx = 4x^3 + 21x^2 + 10x - 17

Minimum or maximum? The critical points are where df(x)/dx = 0, and the sign of the second derivative decides:

d^2 f(x)/dx^2 = 12x^2 + 42x + 10

If d^2 f/dx^2 > 0 at a critical point, it is a minimum; if d^2 f/dx^2 < 0, it is a maximum. For this quartic, the three critical points are, from left to right, a minimum, a maximum, and a minimum.
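A minimal sketch of this second-derivative test, using NumPy's poly1d to represent f(x) and take its derivatives:

```python
import numpy as np

# f(x) = x^4 + 7x^3 + 5x^2 - 17x + 3 (coefficients, highest power first)
f = np.poly1d([1, 7, 5, -17, 3])
df = f.deriv()    # 4x^3 + 21x^2 + 10x - 17
d2f = df.deriv()  # 12x^2 + 42x + 10

# Critical points are the real roots of df(x) = 0
critical = df.roots[np.isreal(df.roots)].real

for x in np.sort(critical):
    kind = "minimum" if d2f(x) > 0 else "maximum"
    print(f"x = {x:+.4f}, f(x) = {f(x):+.4f} -> local {kind}")
```

Running it prints the three critical points in order as minimum, maximum, minimum, matching the classification above.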
Basic Optimization
Optimization Algorithm Classification
Optimization Algorithm
- Deterministic (guaranteed)
  - Linear Programming
  - Non-Linear Programming
  - Local Search
  - …
- Stochastic (not guaranteed)
  - Simple Heuristics
  - Meta-Heuristics
    - Population Based
    - Trajectory Based
  - …
Basic Optimization
Optimization using Gradient Descent
min_x f(x) -> where x is a multi-dimensional variable such as (x1, x2, …)

Example: Gradient Descent Algorithm (n = 2)

x^(t+1) = x^(t) - γ ∇f(x^(t))

where γ = learning rate / step size
Basic Optimization
Optimization using Gradient Descent
Initial: an objective f(x1, x2), a starting point x^(0), and γ = 0.1

Iteration 1: x^(1) = x^(0) - γ ∇f(x^(0))
Iteration 2: x^(2) = x^(1) - γ ∇f(x^(1))
Iteration 3: x^(3) = x^(2) - γ ∇f(x^(2))
…
Iteration n: x^(n) = x^(n-1) - γ ∇f(x^(n-1)), repeated until x stops changing (convergence)
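A runnable sketch of these iterations with γ = 0.1; the objective f(x1, x2) = x1^2 + x2^2 and the starting point are illustrative assumptions:

```python
import numpy as np

def grad_f(x):
    # Gradient of the assumed objective f(x1, x2) = x1^2 + x2^2
    return np.array([2 * x[0], 2 * x[1]])

gamma = 0.1                    # learning rate / step size (as above)
x = np.array([3.0, 4.0])       # assumed starting point x^(0)

for t in range(1, 11):
    x = x - gamma * grad_f(x)  # x^(t) = x^(t-1) - gamma * grad f(x^(t-1))
    print(f"Iteration {t}: x = {x}, f(x) = {x[0]**2 + x[1]**2:.6f}")
```

Each iteration shrinks both coordinates by the same factor (1 - 2γ), so the iterates move steadily toward the minimum at (0, 0).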
Basic Optimization
Effect of Learning Rate in Gradient Descent
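A small numerical sketch of the effect, on an assumed one-dimensional objective f(x) = x^2: too small a learning rate makes slow progress, a moderate one converges quickly, and too large a one overshoots and diverges.

```python
def run_gd(gamma, steps=20, x0=5.0):
    # Minimize f(x) = x^2 (gradient 2x) starting from x0
    x = x0
    for _ in range(steps):
        x = x - gamma * 2 * x
    return x

# Too small: slow; moderate: fast; too large: overshoots and diverges
for gamma in [0.01, 0.1, 0.5, 1.1]:
    print(f"gamma = {gamma}: x after 20 steps = {run_gd(gamma):.6g}")
```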
Basic Optimization
Gradient Descent with Momentum
x^(t+1) = x^(t) - γ ∇f(x^(t)) + α Δx^(t)
Δx^(t) = x^(t) - x^(t-1)

Where:
γ = learning rate / step size
α = momentum rate
Example:
f(x1, x2) = x1^2 + 20 x2^2 (a poorly conditioned quadratic, where plain gradient descent oscillates along the steep x2 direction)
Initial: choose x^(0), γ, and α, then iterate the update above.
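A runnable sketch of the momentum update on this objective; the starting point, γ, and α below are illustrative assumptions:

```python
import numpy as np

def grad_f(x):
    # Gradient of the assumed objective f(x1, x2) = x1^2 + 20*x2^2
    return np.array([2 * x[0], 40 * x[1]])

gamma, alpha = 0.01, 0.9        # assumed learning rate and momentum rate
x = np.array([10.0, 1.0])       # assumed starting point x^(0)
delta = np.zeros_like(x)        # previous update, Delta x^(t) = x^(t) - x^(t-1)

for t in range(50):
    delta = -gamma * grad_f(x) + alpha * delta  # new update with momentum
    x = x + delta

print("x after 50 iterations:", x)  # approaches the minimum at (0, 0)
```

The momentum term averages out the oscillation along x2 while accumulating speed along the shallow x1 direction.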
Basic Optimization
Stochastic Gradient Descent (SGD)
Computing the gradient can be very time-consuming and resource-intensive.
However, it is often possible to find a "cheap" approximation of the gradient.
Approximating the gradient is still useful as long as it points in roughly the same
direction as the true gradient.
For example, in machine learning, given n = 1, …, N data points, we often consider
objective functions that are the sum of the losses L_n incurred by each example n.
In mathematical notation, we have the form

L(θ) = Σ_{n=1}^{N} L_n(θ)

Rather than calculating the gradient of all data points, SGD selects one data point n,
calculates its gradient, and updates the parameters:

θ^(t+1) = θ^(t) - γ ∇L_n(θ^(t))
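A minimal SGD sketch on an assumed toy problem (fitting y ≈ 2x + 1 by minimizing the summed squared losses L_n; the data, learning rate, and epoch count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed toy dataset: y = 2*x + 1 plus noise
X = rng.uniform(-1, 1, size=100)
y = 2 * X + 1 + 0.1 * rng.normal(size=100)

theta = np.zeros(2)   # parameters (slope, intercept)
gamma = 0.1           # learning rate

for epoch in range(20):
    for n in rng.permutation(len(X)):         # one example at a time, random order
        err = theta[0] * X[n] + theta[1] - y[n]
        grad_n = np.array([err * X[n], err])  # gradient of L_n = 0.5 * err^2
        theta -= gamma * grad_n               # update from a single example
print("theta =", theta)  # approaches (2, 1)
```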
Basic Optimization
(Batch) Gradient Descent VS Stochastic Gradient Descent (SGD)
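A side-by-side sketch of the two on the same assumed toy regression data: batch gradient descent takes one exact-gradient step per pass over all N examples, while SGD takes one cheap, noisy step per example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=100)               # assumed toy data, as before
y = 2 * X + 1 + 0.1 * rng.normal(size=100)

def grad(theta, idx):
    # Gradient of 0.5 * mean squared error over the examples in idx
    err = theta[0] * X[idx] + theta[1] - y[idx]
    return np.array([np.mean(err * X[idx]), np.mean(err)])

theta_batch = np.zeros(2)
for _ in range(200):                           # one exact-gradient step per pass
    theta_batch -= 0.5 * grad(theta_batch, np.arange(len(X)))

theta_sgd = np.zeros(2)
for _ in range(20):                            # one cheap step per example
    for n in rng.permutation(len(X)):
        theta_sgd -= 0.1 * grad(theta_sgd, np.array([n]))

print("batch:", theta_batch, "sgd:", theta_sgd)  # both approach (2, 1)
```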
Basic Optimization
Convex Optimization
Convex Problem
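A problem is convex when the objective f and the feasible set are convex; f is convex if f(λx + (1 - λ)y) ≤ λ f(x) + (1 - λ) f(y) for all x, y and λ ∈ [0, 1], which guarantees that any local minimum is the global minimum. A minimal numerical spot-check of this inequality (the tested functions are illustrative assumptions):

```python
import numpy as np

def looks_convex(f, xs, lambdas=np.linspace(0, 1, 11)):
    # Spot-check f(l*x + (1-l)*y) <= l*f(x) + (1-l)*f(y) on sample points
    for x in xs:
        for y in xs:
            for lam in lambdas:
                if f(lam * x + (1 - lam) * y) > lam * f(x) + (1 - lam) * f(y) + 1e-9:
                    return False
    return True

xs = np.linspace(-3, 3, 13)
print(looks_convex(lambda x: x**2, xs))  # True: x^2 is convex
print(looks_convex(np.sin, xs))          # False: sin is not convex
```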
Basic Optimization
Convex Optimization
Linear Programming
Linear programming is a method to achieve the best outcome in a mathematical model whose
requirements are represented by linear relationships.
Example case:
A company manufactures and sells 2 types of products: 'A' and 'B'.
The cost of production of each unit of 'A' and 'B' is $200 and $150, respectively.
Each unit of 'A' yields a profit of $20 and each unit of 'B' yields a profit of $15 on selling.
The company estimates the combined monthly demand for 'A' and 'B' to be at most 300 units.
The production budget for this month is set at $50,000. How many units should the company
manufacture in order to earn the maximum profit from its monthly sales of 'A' and 'B'?
Products          | Cost of production per unit | Profit per unit | Total Demand
A                 | $200                        | $20             | 300 units
B                 | $150                        | $15             | (A and B combined)
Production Budget | $50,000                     | -               | -
Basic Optimization
Convex Optimization
Let x = number of units of A, and y = number of units of B.

Objective (total profit to maximize): Z = 20x + 15y

Budget constraint: 200x + 150y ≤ 50,000
Total demand constraint: x + y ≤ 300
Non-negativity constraints: x ≥ 0, y ≥ 0
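A sketch of solving this linear program with SciPy's linprog; linprog minimizes, so the profit objective is negated (the "highs" method is the assumed solver backend):

```python
from scipy.optimize import linprog

# Maximize Z = 20x + 15y  <=>  minimize -20x - 15y
c = [-20, -15]
A_ub = [[200, 150],   # budget: 200x + 150y <= 50,000
        [1,   1]]     # total demand: x + y <= 300
b_ub = [50_000, 300]

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None)], method="highs")
print("units of (A, B):", res.x, "maximum profit:", -res.fun)
```

Note that the solver reports one optimal vertex; as the next slide shows, this problem has two corners (and the whole edge between them) attaining the same maximum profit.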
Basic Optimization
Convex Optimization
[Figure: the feasible region satisfying all constraints, with corner points (0, 0), (250, 0), (100, 200), and (0, 300); the lines 4x + 3y = 1000 and x + y = 300 intersect at (100, 200)]

The optimum of a linear program is attained at a corner point of the feasible region, so evaluate Z = 20x + 15y at each corner:

(0, 0) -> Z = 20(0) + 15(0) = 0 (minimum)
(250, 0) -> Z = 20(250) + 15(0) = 5,000 (maximum)
(0, 300) -> Z = 20(0) + 15(300) = 4,500
(100, 200) -> Z = 20(100) + 15(200) = 5,000 (maximum)

The corner (100, 200) comes from intersecting the two constraint lines:
4x + 3y = 1000 (the budget constraint divided by 50)
x + y = 300, i.e. 3x + 3y = 900
Subtracting gives x = 100, and then (100) + y = 300 gives y = 200.

The maximum profit ($5,000) will be earned by manufacturing either
Product A: 250, or
Product A: 100 and Product B: 200
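The same corner-point evaluation in a few lines of Python:

```python
# Corner-point method: evaluate Z = 20x + 15y at every vertex of the region
corners = [(0, 0), (250, 0), (0, 300), (100, 200)]
for x, y in corners:
    print(f"({x}, {y}) -> Z = {20 * x + 15 * y}")
print("maximum profit:", max(20 * x + 15 * y for x, y in corners))
```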
Thank you
Q&A