
Basic Optimization

Gradient Descent,
Stochastic Gradient Descent,
Convex Optimization

Naufal Suryanto
11/23/2020
Basic Optimization
Introduction

• Optimization is a procedure to find the best solution (optimum value) of a given function.
• The best solution can be the minimum or maximum value of a function.

Example:
f(x) = x² + 1 -> the minimum value is 1, when x = 0
g(x) = −2x² + 3 -> the maximum value is 3, when x = 0

How about:
Basic Optimization
Introduction

f(x) = x⁴ + 7x³ + 5x² − 17x + 3
df(x)/dx = 4x³ + 21x² + 10x − 17

Minimum or maximum? Check the second derivative:
d²f(x)/dx² = 12x² + 42x + 10

At each critical point (each root of df/dx = 0), the sign of the second derivative decides: d²f/dx² > 0 gives a [minimum], d²f/dx² < 0 gives a [maximum]. Here the three critical points are, from left to right, a [minimum], a [maximum], and a [minimum].
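As a quick numeric check, the critical points can be located and classified by the sign of the second derivative. A minimal sketch (NumPy assumed; the printed values are approximate):

import numpy as np

# f(x) = x^4 + 7x^3 + 5x^2 - 17x + 3
f_prime = [4, 21, 10, -17]                          # coefficients of f'(x), highest degree first
critical_points = np.sort(np.roots(f_prime).real)   # the three real roots of f'(x) = 0

for x in critical_points:
    f_double_prime = 12 * x**2 + 42 * x + 10        # f''(x)
    kind = "minimum" if f_double_prime > 0 else "maximum"
    print(f"x = {x:+.3f}: f''(x) = {f_double_prime:+.2f} -> {kind}")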

How about functions of several variables, or functions whose critical points are hard to find analytically?
Basic Optimization
Optimization Algorithm Classification

Optimization Algorithm
• Deterministic (guaranteed)
  • Linear Programming
  • Non-Linear Programming
    • Gradient Based
    • …
• Stochastic (not guaranteed)
  • Simple Heuristics
    • Local Search
    • Greedy Algorithm
    • …
  • Meta-Heuristics
    • Trajectory Based
      • Simulated Annealing
      • …
    • Population Based
      • Evolutionary Algorithm (e.g., Genetic Algorithm)
      • Swarm Intelligence (e.g., Particle Swarm Optimization)
      • …
Basic Optimization
Optimization using Gradient Descent

minₓ f(x) -> where x is a multi-dimensional variable such as (x1, x2, …)

Example: a function of n = 2 variables, f(x1, x2)

Gradient Descent update rule:
xᵢ₊₁ = xᵢ − γ ∇f(xᵢ)
where γ = learning rate / step size

Gradient Descent Algorithm:

Step 1. Initialize a random position and the learning rate
• Example: a random starting point (x1, x2) and γ = 0.1

Step 2. Update the variables using the update rule above

Step 3. Repeat Step 2 until convergence (see the sketch below)
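A minimal Python sketch of these three steps. The slide's example function is not fully legible here, so the simple quadratic f(x1, x2) = x1² + x2² is assumed as a stand-in:

import numpy as np

def grad_f(x):
    # gradient of the assumed example f(x1, x2) = x1^2 + x2^2
    return np.array([2.0 * x[0], 2.0 * x[1]])

def gradient_descent(x0, lr=0.1, max_iters=100, tol=1e-8):
    x = np.asarray(x0, dtype=float)        # Step 1: initial position and learning rate
    for _ in range(max_iters):
        step = lr * grad_f(x)
        x = x - step                       # Step 2: x_{i+1} = x_i - gamma * grad f(x_i)
        if np.linalg.norm(step) < tol:     # Step 3: repeat until convergence
            break
    return x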

Basic Optimization
Optimization using Gradient Descent

Initial: f(x1, x2), a starting point, and γ = 0.1

Iteration 1, Iteration 2, Iteration 3, …, Iteration n: each iteration applies the update rule once, moving x closer to the minimum, until the change between iterations becomes negligible. A concrete trace is sketched below.
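A sketch of such a trace, reusing the assumed quadratic from the previous slide (the original starting point is not legible, so (2, 1) is assumed):

import numpy as np

def grad_f(x):
    return np.array([2.0 * x[0], 2.0 * x[1]])   # same assumed f(x1, x2) = x1^2 + x2^2

x = np.array([2.0, 1.0])                        # assumed starting point
for i in range(1, 4):
    x = x - 0.1 * grad_f(x)                     # gamma = 0.1, as on the slide
    print(f"Iteration {i}: x = {x}")            # each step shrinks x toward the minimum at (0, 0)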

Basic Optimization
Effect of Learning Rate in Gradient Descent

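The usual behavior: a learning rate that is too small converges slowly, a moderate one converges quickly, and one that is too large overshoots and diverges. A hedged sketch, reusing the assumed quadratic from the earlier slides:

import numpy as np

def grad_f(x):
    return np.array([2.0 * x[0], 2.0 * x[1]])   # assumed f(x1, x2) = x1^2 + x2^2

for lr in (0.01, 0.1, 1.1):                     # too small, reasonable, too large
    x = np.array([2.0, 1.0])
    for _ in range(20):
        x = x - lr * grad_f(x)
    print(f"lr = {lr}: |x| after 20 steps = {np.linalg.norm(x):.4f}")
# lr = 0.01 shrinks slowly, lr = 0.1 is almost at the minimum, lr = 1.1 diverges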
Basic Optimization
Gradient Descent with Momentum

• Gradient Descent has a problem when the scale between each axis is highly different.
• Momentum gives the update a memory of the previous step.

Gradient Descent with Momentum Algorithm:
xᵢ₊₁ = xᵢ − γ ∇f(xᵢ) + α Δxᵢ, where Δxᵢ = xᵢ − xᵢ₋₁

Where:
γ = learning rate / step size
α = momentum rate

Example:
f(x1, x2) = x1² + 20x2² (the two axes are scaled very differently)

Initial: a starting point, γ, and α
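A minimal sketch of the momentum update on this ill-conditioned example; the values of γ and α below are illustrative, not taken from the slide:

import numpy as np

def grad_f_ill(x):
    # gradient of the example f(x1, x2) = x1^2 + 20*x2^2
    return np.array([2.0 * x[0], 40.0 * x[1]])

def gd_momentum(x0, lr=0.04, alpha=0.8, n_iters=100):
    x = np.asarray(x0, dtype=float)
    delta = np.zeros_like(x)               # memory of the previous step
    for _ in range(n_iters):
        delta = -lr * grad_f_ill(x) + alpha * delta
        x = x + delta                      # x_{i+1} = x_i - gamma * grad f(x_i) + alpha * dx_i
    return x                               # approaches the minimum at (0, 0)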
Basic Optimization
Stochastic Gradient Descent (SGD)

• Computing the gradient can be very time-consuming and resource-intensive. However, it is often possible to find a "cheap" approximation of the gradient. Approximating the gradient is still useful as long as it points in roughly the same direction as the true gradient.

• Stochastic gradient descent (SGD) is a stochastic approximation of the gradient descent method for minimizing an objective function that is written as a sum of differentiable functions.

• For example, in machine learning, given n = 1, …, N data points, we often consider objective functions that are the sum of the losses Lₙ incurred by each example n. In mathematical notation, we have the form
L(θ) = Σₙ Lₙ(θ), n = 1, …, N

• Rather than calculating the gradient over all data points, SGD selects one data point, calculates its gradient, and updates the variables.
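A minimal SGD sketch, assuming a least-squares objective L(θ) = Σₙ (xₙᵀθ − yₙ)² as the sum of per-example losses; the data and parameter values are illustrative:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # N = 100 examples, 2 features
y = X @ np.array([3.0, -1.0])              # targets from a known parameter vector

theta = np.zeros(2)
lr = 0.01
for epoch in range(10):
    for n in rng.permutation(len(X)):      # visit the data points in random order
        grad_n = 2 * (X[n] @ theta - y[n]) * X[n]   # gradient of the single loss L_n
        theta -= lr * grad_n               # update using one data point only
print(theta)                               # close to the true parameters [3, -1]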

Basic Optimization
(Batch) Gradient Descent vs. Stochastic Gradient Descent (SGD)

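Batch GD computes the gradient of the full sum at every step (accurate but costly), while SGD takes one cheap, noisy step per data point. A minimal sketch on illustrative least-squares data:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([3.0, -1.0])

theta_batch = np.zeros(2)
for _ in range(50):                        # one step = one full pass over the data
    grad = 2 * X.T @ (X @ theta_batch - y) / len(X)
    theta_batch -= 0.1 * grad

theta_sgd = np.zeros(2)
for _ in range(5):                         # a few passes, one example per step
    for n in rng.permutation(len(X)):
        theta_sgd -= 0.05 * 2 * (X[n] @ theta_sgd - y[n]) * X[n]

print(theta_batch, theta_sgd)              # both approach [3, -1]; SGD's path is noisier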
Basic Optimization
Convex Optimization

Convex Problem
• A set is convex if the line segment between any two points of the set lies entirely inside the set; a convex problem minimizes a convex function over a convex set.
[Figure: examples of a convex set and a non-convex set]

Basic Optimization
Convex Optimization

Linear Programming
Linear programming is a method to achieve the best outcome in a mathematical model whose
requirements are represented by linear relationships.

Example case:
A company manufactures and sells two types of products: 'A' and 'B'.
The cost of production of each unit of 'A' and 'B' is $200 and $150, respectively.
Each unit of 'A' yields a profit of $20 and each unit of 'B' yields a profit of $15 on selling.
The company estimates the monthly demand of 'A' and 'B' to be at a maximum of 300 units in all.
The production budget for this month is set at $50,000. How many units should the company
manufacture in order to earn maximum profit from its monthly sales of 'A' and 'B'?

Product             Cost of production per unit   Profit per unit   Total demand
A                   $200                          $20               300 units (A and B combined)
B                   $150                          $15
Production budget   $50,000                       -                 -

Basic Optimization
Convex Optimization

Linear Programming - Mathematics

Product             Cost of production per unit   Profit per unit   Total demand
A                   $200                          $20               300 units (A and B combined)
B                   $150                          $15
Production budget   $50,000                       -                 -

Let x = number of units of A, and y = number of units of B.

Budget constraint: 200x + 150y ≤ 50,000
Total demand constraint: x + y ≤ 300
Non-negativity constraints: x ≥ 0, y ≥ 0

Z = total profit = 20x + 15y
Goal: maximize Z subject to the constraints above.

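A sketch of solving this program with SciPy's linprog (SciPy assumed available; linprog minimizes, so the profit is negated):

from scipy.optimize import linprog

res = linprog(
    c=[-20, -15],                          # minimize -Z = -(20x + 15y)
    A_ub=[[200, 150],                      # budget: 200x + 150y <= 50,000
          [1, 1]],                         # demand: x + y <= 300
    b_ub=[50_000, 300],
    bounds=[(0, None), (0, None)],         # x >= 0, y >= 0
)
print(res.x, -res.fun)                     # an optimal vertex, with maximum Z = 5000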
Basic Optimization
Convex Optimization

Linear Programming - Mathematics

Maximize the profit function: Z = 20x + 15y

Subject to the constraints:
200x + 150y ≤ 50,000
x + y ≤ 300
x ≥ 0, y ≥ 0

Inequality               Corresponding equation   X-intercept   Y-intercept
200x + 150y ≤ 50,000     4x + 3y = 1000           (250, 0)      (0, ~333)
x + y ≤ 300              x + y = 300              (300, 0)      (0, 300)

The two boundary lines intersect where:
4x + 3y = 1000
3x + 3y = 900    (x + y = 300, multiplied by 3)
----------------- (−)
x = 100, and (100) + y = 300 -> y = 200, so they intersect at (100, 200)

The feasible region (all values that satisfy the constraints) is the polygon with vertices (0, 0), (250, 0), (100, 200), and (0, 300). The optimum lies at a vertex, so evaluate Z at each one:
(0, 0)     -> Z = 20(0) + 15(0)     = 0      (minimum)
(250, 0)   -> Z = 20(250) + 15(0)   = 5000   (maximum)
(0, 300)   -> Z = 20(0) + 15(300)   = 4500
(100, 200) -> Z = 20(100) + 15(200) = 5000   (maximum)

The maximum profit of $5,000 is earned by selling Product A: 250 (and B: 0), or Product A: 100 and Product B: 200.
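The corner-point reasoning above as a minimal sketch (vertex list taken from the slide):

def Z(x, y):
    return 20 * x + 15 * y                 # profit function

vertices = [(0, 0), (250, 0), (100, 200), (0, 300)]   # corners of the feasible region
for v in vertices:
    print(v, "->", Z(*v))                  # (250, 0) and (100, 200) both give 5000
print("maximum profit:", max(Z(*v) for v in vertices))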

Thank You
Q&A

