Continuous Optimization

This document discusses continuous optimization techniques used in machine learning. It introduces gradient descent, which iteratively moves parameters in the direction of steepest descent to minimize an objective function. Variants like stochastic gradient descent and momentum are also covered. Convexity and convex functions are defined, and it is shown that the minimum of a convex function over a convex set is globally optimal. Constrained optimization using Lagrange multipliers is also summarized.


7 Continuous Optimization

Introduction

• Since machine learning algorithms are implemented on a computer, the
  mathematical formulations are expressed as numerical optimization problems
• Training a machine learning model: finding a good set of parameters, where
  "good" is determined by the objective function or the probabilistic model
  ⇒ use optimization algorithms



Optimization Using Gradient Descent
• We solve for the minimum of a function f : ℝ^d → ℝ
• Gradient descent exploits the fact that f(x0) decreases fastest if one moves
  from x0 in the direction of the negative gradient −(∇f(x0))^T of f at x0
• For a "good" step-size γ > 0, if
      x1 = x0 − γ(∇f(x0))^T,
  then
      f(x1) ≤ f(x0)

(Figure: gradient descent steps on the example f(x) = x² + 1)



Gradient Descent Algorithm
Algorithm.
1. Choose an initial guess x0
2. Compute xi iteratively until the stopping criterion is met, using
       xi+1 = xi − γi(∇f(xi))^T
3. Return the parameter xi   {f(xi) ≈ min_x f(x)}
For a suitable learning rate γi, the sequence f(x0) ≥ f(x1) ≥ . . . converges
to a local minimum
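The following Python sketch (my own illustration, not part of the slides) implements this loop; the gradient-norm stopping criterion and the parameter names are assumptions.

import numpy as np

def gradient_descent(grad, x0, learning_rate=0.1, max_iters=100, tol=1e-8):
    # Minimal gradient descent: x_{i+1} = x_i - gamma * grad(x_i)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # stop once the gradient is (almost) zero
            break
        x = x - learning_rate * g
    return x

# Example: minimize f(x) = x^2 + 1 from the previous slide; its gradient is 2x
x_min = gradient_descent(lambda x: 2 * x, x0=np.array([3.0]))
print(x_min)   # close to [0]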



Gradient Descent Algorithm – Ex 1
Find a local minimum of f(x) = (x³ − 3x)/(x² + 3)
Gradient (by the quotient rule)
    ∇f(x) = (x⁴ + 12x² − 9)/(x² + 3)²
x0 = 0, step-size γ = 0.05
number of iterations = 30
    xi = xi−1 − γ(∇f(xi−1))^T
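A small Python sketch of this run (my own addition); with these settings the iterates move toward the local minimizer near x ≈ 0.84.

def f(x):
    return (x**3 - 3*x) / (x**2 + 3)

def grad_f(x):
    # derivative of f obtained with the quotient rule
    return (x**4 + 12*x**2 - 9) / (x**2 + 3)**2

x = 0.0          # x0
gamma = 0.05     # step-size
for i in range(30):
    x = x - gamma * grad_f(x)

print(x, f(x))   # roughly x ≈ 0.77 after 30 steps, approaching the local minimizer near 0.84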



Learning rate (Step-size) γ
• Choosing a good step-size (learning rate) is important in GD
• If the step-size is too small ⇒ GD can be slow
• If the step-size is too large ⇒ GD can overshoot, fail to converge, or even diverge

(Figure, two panels:  γ = 0.01: slowly converges;  γ = 3: fails to converge)



Gradient Descent Algorithm – Ex 2
• Find a local minimum of z = f(x, y) = x² + 2y² + 4
  ⇒ ∇f(x, y) = [2x  4y]
• Choose X0 = (x0, y0)^T = (−1, 2)^T
• Learning rate γ = 0.1
• GD: Xi+1 = Xi − γ∇f(Xi)^T
      X1 = (−1, 2)^T − 0.1·(−2, 8)^T = (−0.8, 1.2)^T
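A short numpy sketch (my addition) reproducing this update; the first step gives X1 = (−0.8, 1.2), matching the slide.

import numpy as np

def grad_f(v):
    x, y = v
    return np.array([2 * x, 4 * y])    # gradient of f(x, y) = x^2 + 2y^2 + 4

X = np.array([-1.0, 2.0])              # X0
gamma = 0.1                            # learning rate
for i in range(20):
    X = X - gamma * grad_f(X)
    if i == 0:
        print("X1 =", X)               # [-0.8  1.2]

print("X after 20 steps =", X)         # approaches the minimizer (0, 0)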



Gradient Descent Algorithm – Ex 2
(Table of GD iterates: x, y, and z = f(x, y) at each iteration)



Too large γ

GD with γ = 0.4 ⇒ zigzag shape



How to choose a suitable learning rate γ
• Adaptive gradient methods rescale the learning rate γ at each iteration,
  depending on local properties of the function f
• If f increases after a gradient step, the learning rate γ was too large
  ⇒ undo the step and decrease the learning rate γ
• If f decreases, the step could have been larger ⇒ try to increase the
  learning rate γ
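A minimal sketch of this adaptive heuristic (my own illustration; the shrink/grow factors 0.5 and 1.1 are assumptions, not from the slides):

def adaptive_gradient_descent(f, grad, x0, gamma=0.1, max_iters=100,
                              shrink=0.5, grow=1.1):
    x, fx = x0, f(x0)
    for _ in range(max_iters):
        x_new = x - gamma * grad(x)
        f_new = f(x_new)
        if f_new > fx:            # f increased: the step was too large
            gamma *= shrink       # undo the step (keep the old x), decrease gamma
        else:                     # f decreased: accept the step
            x, fx = x_new, f_new
            gamma *= grow         # the step could have been larger: increase gamma
    return x

# Example: f(x) = x^2 + 1 with a deliberately large initial learning rate
print(adaptive_gradient_descent(lambda x: x**2 + 1, lambda x: 2*x, x0=3.0, gamma=2.0))
# prints a value near 0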



Gradient Descent With Momentum (1986)
• A method that introduces an additional term to remember what
happened in the previous iteration
• The momentum-based method remembers the update ∆xi at each
iteration i and determines the next update as a linear combination of
the current and previous gradients
    xi+1 = xi − γi(∇f(xi))^T + α∆xi
    ∆xi = xi − xi−1 = α∆xi−1 − γi−1(∇f(xi−1))^T,
where α ∈ [0, 1].
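A minimal Python sketch of this update rule (my own illustration), demonstrated on the quadratic f(x, y) = x² + 2y² + 4 used earlier:

import numpy as np

def gd_momentum(grad, x0, gamma=0.1, alpha=0.7, max_iters=100):
    # x_{i+1} = x_i - gamma * grad(x_i) + alpha * delta_x_i
    x = np.asarray(x0, dtype=float)
    delta = np.zeros_like(x)                     # delta_x_0 = 0
    for _ in range(max_iters):
        delta = alpha * delta - gamma * grad(x)  # remember the previous update
        x = x + delta
    return x

grad = lambda v: np.array([2 * v[0], 4 * v[1]])  # gradient of x^2 + 2y^2 + 4
print(gd_momentum(grad, x0=[-1.0, 2.0]))         # close to the minimizer (0, 0)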



GD with Momentum
Find a local minimum of f(x) = 12x³ − 48x² + 36x

(Figure, left: without momentum, x0 = −1, γ = 0.01; right: with momentum, α = 0.7, x0 = −1, γ = 0.01)



Different values of α

(Figure, left: with momentum, α = 0.7, x0 = 4, γ = 0.01; right: with momentum, α = 1, x0 = 4, γ = 0.01)



Stochastic Gradient Descent (SGD)
• Computing the gradient can be very time-consuming
  ⇒ find a "cheap" approximation of the gradient
• Since machine learning does not necessarily need a precise estimate of the
  minimum of the objective function, approximate gradients have been widely used
• SGD is very effective in large-scale machine learning problems such
as training deep neural networks on millions of images



Mini-batch Gradient Descent (SGD)
In ML, given N data points, consider the sum of the losses Ln incurred by each
example n:
    L(θ) = Σ_{n=1}^N Ln(θ),   where θ are the parameters
• Standard GD (a "batch" optimization method) is performed using
    θ_{i+1} = θ_i − γ_i Σ_{n=1}^N (∇Ln(θ_i))^T,
  which requires computing all N gradients ⇒ very expensive evaluations
• In contrast to batch gradient descent, which uses all Ln, we randomly choose a
  subset of the Ln for mini-batch gradient descent
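A minimal sketch of this idea (my own; the toy least-squares losses, the batch size, and the per-batch averaging are all assumptions):

import numpy as np

def minibatch_sgd(grad_Ln, theta0, data, gamma=0.05, batch_size=32, epochs=10):
    # Approximate the full gradient sum_n grad_Ln(theta) by a random mini-batch
    theta = np.asarray(theta0, dtype=float)
    N = len(data)
    for _ in range(epochs):
        idx = np.random.permutation(N)
        for start in range(0, N, batch_size):
            batch = data[idx[start:start + batch_size]]
            g = sum(grad_Ln(theta, x_n) for x_n in batch) / len(batch)
            theta = theta - gamma * g
    return theta

# Toy example: Ln(theta) = 0.5 * (theta - x_n)^2, so the minimizer of L is the data mean
data = np.random.randn(1000) + 5.0
grad_Ln = lambda theta, x_n: theta - x_n
print(minibatch_sgd(grad_Ln, theta0=0.0, data=data))   # close to 5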



Convex sets

Definition. A set C is a convex set if for any x, y ∈ C and for any scalar θ
with 0 ≤ θ ≤ 1, we have
    θx + (1 − θ)y ∈ C

(Figure, left: example of a convex set; right: example of a non-convex set)

Note. Convex sets are sets such that the straight line segment connecting any
two elements of the set lies inside the set.



Some convex sets
• In ℝ, every interval (a, b) is convex
• In ℝ², C1 = {(x, y) | x² + y² < 1} is convex, but C2 = {(x, y) | 0 < x² + y² < 1}
  is not.
• In ℝⁿ, C = {(x1, x2, …, xn) | c1x1 + c2x2 + … + cnxn ≤ b} is convex, for all
  real numbers c1, c2, …, cn, b
Theorem. The intersection of two convex sets is also convex.

(Figure: two convex sets whose intersection is again a convex set)



Convex functions
• Definition. Let C be a convex subset of ℝ^D.
  A function f : C → ℝ is called a convex function if
      f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y),   ∀x ∈ C, ∀y ∈ C, ∀θ ∈ [0, 1]
Note. A concave function is the negative of a convex function



Concave functions
Definition. Let C be a convex subset of ℝ^D.
A function f : C → ℝ is called a concave function if
    f(θx + (1 − θ)y) ≥ θf(x) + (1 − θ)f(y),   ∀x ∈ C, ∀y ∈ C, ∀θ ∈ [0, 1]

Note. A concave function is the negative of a convex function



Theorem
Suppose that C is a convex set.
• If f : C → ℝ is a convex function, then a local minimum is a global minimum
  of f over C.
• If f : C → ℝ is a concave function, then a local maximum is a global maximum
  of f over C.



Convexity test
• If a function f : ℝⁿ → ℝ is twice differentiable, then
  • f(x) is convex if and only if for any two points x, y it holds that
        f(y) ≥ f(x) + ∇f(x)^T (y − x)
  • f(x) is convex if and only if the Hessian ∇²f(x) is positive semidefinite
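As a quick numerical illustration (my addition), both conditions can be spot-checked for the quadratic f(x) = ½ xᵀQx + cᵀx, whose Hessian is Q:

import numpy as np

Q = np.array([[2.0, 0.0], [0.0, 4.0]])           # Hessian of f
c = np.array([1.0, -1.0])
f = lambda x: 0.5 * x @ Q @ x + c @ x
grad = lambda x: Q @ x + c

# Second-order test: Q is positive semidefinite iff all its eigenvalues are >= 0
print(np.all(np.linalg.eigvalsh(Q) >= 0))        # True

# First-order test at two sample points: f(y) >= f(x) + grad(x)^T (y - x)
rng = np.random.default_rng(0)
x, y = rng.standard_normal(2), rng.standard_normal(2)
print(bool(f(y) >= f(x) + grad(x) @ (y - x)))    # True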



Convex functions - Ex
• The negative entropy f(x) = x·log₂x is convex for x > 0
• In fact,
  Gradient: ∇f(x) = log₂x + x·(log₂x)′ = log₂x + log₂e
  Hessian:  ∇²f(x) = (1/x)·log₂e > 0,   for all x > 0



Some common convex functions
• ax + b on ℝ, for any a, b ∈ ℝ
• e^{ax} on ℝ, for any a ∈ ℝ
• |x|^p on ℝ, for p ≥ 1
• x·log x, for x > 0
• c^T x + b on ℝⁿ, for any c ∈ ℝⁿ, b ∈ ℝ
• Every norm on ℝⁿ
• The spectral norm of a matrix: ‖A‖₂ = σ_max(A) = [λ_max(A^T A)]^{1/2}



Sum of convex functions is convex
Theorem. If f and g are convex functions, then so is f + g.
• In fact, suppose f and g are convex functions
• Then f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)
  and g(θx + (1 − θ)y) ≤ θg(x) + (1 − θ)g(y), for any 0 ≤ θ ≤ 1
⇒ Summing up both sides,
  f(θx + (1 − θ)y) + g(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y) + θg(x) + (1 − θ)g(y)
                                      = θ(f(x) + g(x)) + (1 − θ)(f(y) + g(y))
⇒ f + g is convex



Constrained Optimization
Ex. Find the minimum value of the function f(x, y) = x² + 2y² subject to the
constraint x² + y² = 1.

(Figure: level curves of f(x, y) and the constraint curve g(x, y) = 0)



Constrained Optimization. Lagrange multipliers
To minimize f(x, y) subject to the constraint g(x, y) = 0 is to find the smallest
value of c such that the level curve f(x, y) = c intersects g(x, y) = 0.

At such a point (x0, y0) the two curves are tangent and their gradients are
parallel:
    ∇f(x0, y0) = λ∇g(x0, y0) for some scalar λ.

λ is called a Lagrange multiplier, and
    L(x, y, λ) := f(x, y) + λg(x, y)
is called the Lagrangian.

(Figure: level/contour curves of f(x, y) tangent to the constraint curve g(x, y) = 0)



Constrained Optimization. Lagrange multipliers
Ex0. Minimize f(x, y) = x² + 2y² s.t. x² + y² = 1.



Constrained Optimization. Lagrange multipliers
Ex1. Minimize f(x, y) = x² + y² s.t. x − y = 1.

Set 0 = g(x, y) = x − y − 1
and L(x, y, λ) = f(x, y) + λg(x, y) = x² + y² + λ(x − y − 1)
We find all values of (x0, y0, λ) such that
    Lx(x0, y0, λ) = 0, Ly(x0, y0, λ) = 0, and Lλ(x0, y0, λ) = 0   // partial derivatives of L
⇒ 2x0 + λ = 0, 2y0 − λ = 0, and x0 − y0 − 1 = 0
⇒ x0 = 1/2, y0 = −1/2, λ = −1
The minimum value of f s.t. x − y − 1 = 0 is f(x0, y0) = 1/2.
(Note that we can also substitute y = x − 1 into f(x, y) = x² + y².)
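For checking such computations mechanically, here is a small sympy sketch (my addition) that solves the same stationarity conditions:

import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x**2 + y**2
g = x - y - 1
L = f + lam * g                                   # Lagrangian

# Set all partial derivatives of L to zero and solve for (x, y, lam)
sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(sols)                                       # x = 1/2, y = -1/2, lam = -1
print([f.subs(s) for s in sols])                  # [1/2]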



Constrained Optimization. Lagrange multipliers
Ex2. Minimize f(x, y) = 2x + y s.t. x² + y² = 1.

Set 0 = g(x, y) = x² + y² − 1
and L(x, y, λ) = f(x, y) + λg(x, y) = 2x + y + λ(x² + y² − 1)
We find all values of (x0, y0, λ) such that
    Lx(x0, y0, λ) = 0, Ly(x0, y0, λ) = 0, and Lλ(x0, y0, λ) = 0
⇒ 2 + 2λx0 = 0, 1 + 2λy0 = 0, and x0² + y0² − 1 = 0
⇒ (x0, y0, λ) = (2/√5, 1/√5, −√5/2) or (x0, y0, λ) = (−2/√5, −1/√5, √5/2)
The minimum value of f s.t. g(x, y) = 0 is f(−2/√5, −1/√5) = −√5.



Constrained Optimization. Lagrange multipliers
For real-valued functions f : ℝ^D → ℝ, we consider the constrained optimization
problem
    min_x f(x)
    subject to gi(x) ≤ 0 for all i = 1, . . . , m

For λ = [λ1 λ2 … λm]^T with Lagrange multipliers λi ≥ 0, set
    L(x, λ) := f(x) + λ^T g(x)   // Lagrangian



Dual Lagrangian
In general, duality in optimization is the idea of converting an optimization
problem in one set of variables x (called the primal variables) into another one
in a different set of variables λ (called the dual variables).

Primal problem:    min_x f(x)
                   s.t. gi(x) ≤ 0, for all i = 1, 2, …, m

Dual Lagrangian:   D(λ) = min_x L(x, λ)

Lagrangian dual problem:   max_λ D(λ)
                           s.t. λ ≥ 0

(Lagrange multipliers are named after the French-Italian mathematician
Joseph-Louis Lagrange (1736–1813).)



Weak duality vs Strong duality: minimax ≥ maximin

(Figure: sketches of f(x) and D(λ) for the weak-duality and strong-duality cases)

Weak duality:
    min_x max_λ L(x, λ) ≥ max_λ min_x L(x, λ)
    f(·) and the gi(·) may be nonconvex

Strong duality:
    min_x max_λ L(x, λ) = max_λ min_x L(x, λ)
    f(·) and the gi(·) are convex

D(λ) = min_x L(x, λ) is concave even though f(·) and the gi(·) may be nonconvex.
The outer problem, the maximization of D(λ) over λ, is the maximum of a concave
function and can be computed efficiently.



Convex Optimization
• Convex optimization problem:
  • f(·) is a convex function,
  • the constraint sets defined by g(·) and h(·) are convex
⇒ strong duality: the optimal value of the dual problem is the same as the
  optimal value of the primal problem



Convex Optimization
Problem
    min_x f(x)
    subject to gi(x) ≤ 0 for all i = 1, . . . , m
               hj(x) = 0 for all j = 1, . . . , n,
where all functions f(x) and gi(x) are convex functions, and each set
{x | hj(x) = 0} is a convex set



Convex optimization
Ex3. Minimize f(x, y) = x² − 4y s.t. g(x, y) = y² − 2x ≤ 0

L(x, y, λ) = f(x, y) + λg(x, y) = x² − 4y + λ(y² − 2x)

Lx = 0 and Ly = 0
⇒ 2x − 2λ = 0 and −4 + 2λy = 0
⇒ x = λ and y = 2/λ
min_{(x,y)} L(x, y, λ) = −λ² − 4/λ =: D(λ)
⇒ max_{λ≥0} D(λ) = D(∛2) = −3·∛4   (since D′(λ) = −2λ + 4/λ² = 0 gives λ³ = 2)
⇒ Result = −3·∛4
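As a quick numerical sanity check (my addition), one can maximize D(λ) = −λ² − 4/λ over λ > 0 and compare with λ = ∛2 ≈ 1.26 and −3·∛4 ≈ −4.76:

from scipy.optimize import minimize_scalar

D = lambda lam: -lam**2 - 4.0 / lam            # dual Lagrangian
res = minimize_scalar(lambda lam: -D(lam), bounds=(1e-6, 10.0), method='bounded')
print(res.x, D(res.x))                         # about 1.2599 and -4.7622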



Example
• Consider the problem
    min_{x,y}  2x + 2y + 3
    s.t.  x² + y² ≤ 4
1/ Find the Lagrangian L(x, y, λ)
2/ Find the dual Lagrangian D(λ)



Linear Programming
• Consider the special case when all the preceding functions are linear, i.e.,
    min_x c^T x
    subject to Ax ≤ b,
  where A ∈ ℝ^{m×d} and b ∈ ℝ^m
• This is known as a linear program, which has d variables and m linear
  constraints



Linear program - Ex
• Consider the linear program
    min_{x ∈ ℝ²}  −[5  3] [x1; x2]        (i.e., minimize −5x1 − 3x2)

    subject to
        [  2   2 ]              [ 33 ]
        [  2  −4 ]   [x1]       [  8 ]
        [ −2   1 ]   [x2]  ≤    [  5 ]
        [  0  −1 ]              [ −1 ]
        [  0   1 ]              [  8 ]






Linear program – Exercise
• Consider the linear program

• Write the program in standard form (matrix notation).



Linear program - Lagrangian
• The Lagrangian is given by
    L(x, λ) = c^T x + λ^T (Ax − b)
            = (c + A^T λ)^T x − λ^T b
• ∂L(x, λ)/∂x = 0  ⇒  c + A^T λ = 0
• Therefore, the dual Lagrangian is
    D(λ) = min_x L(x, λ) = −λ^T b,
  and we would like to find max_{λ≥0} D(λ)



Linear program - Dual program
• The dual optimization problem is
    max_λ (−b^T λ)
    subject to c + A^T λ = 0,
               λ ∈ ℝ^m, λ ≥ 0
  This is also a linear program, but with m variables
• We have two choices:
  • Solve the primal program in d variables
  • Solve the dual program in m variables



Linear program - Lagrangian
• Lagrangian 33
• D(λ) = minx L(x, λ) = −λTb 8
 
= [-1 -2 -3 -4 -5] 5
 
 1
 8 
 D(λ) = -331 -82 -53 +4 -85



Example
• Consider the linear program
    min_{x1,x2}  2x1 + x2

    s.t.  [ 1  2 ]   [x1]       [ 1 ]
          [ 3  1 ]   [x2]  ≤    [ 4 ]
          [ 2  3 ]              [ 3 ]

• Find the dual Lagrangian D(λ)



Quadratic Programming
• Consider the problem
    min_x (1/2) x^T Q x + c^T x
    subject to Ax ≤ b,
  where A ∈ ℝ^{m×d}, b ∈ ℝ^m, c ∈ ℝ^d, and
  Q ∈ ℝ^{d×d} is positive definite (and therefore the objective function is convex)
• This is known as a quadratic program with d variables and m linear constraints



Quadratic Programming – Ex

(Figure: the optimal value must lie in the shaded region, and is indicated by the star)



Quadratic Programming – Exercise
Consider the quadratic program

Write the program in standard form (matrix notation).



Quadratic Programming - Lagrangian
• The Lagrangian is given by
    L(x, λ) = (1/2) x^T Q x + c^T x + λ^T (Ax − b)
            = (1/2) x^T Q x + (c + A^T λ)^T x − λ^T b,
  Taking the derivative of L(x, λ) with respect to x and setting it to zero gives
    Qx + (c + A^T λ) = 0
  Assuming that Q is invertible, we get
    x = −Q^{−1}(c + A^T λ)



Quadratic Programming – Dual Lagrangian
• The dual Lagrangian is
    D(λ) = −(1/2)(c + A^T λ)^T Q^{−1}(c + A^T λ) − λ^T b
  Therefore, the dual optimization problem is given by
    max_λ −(1/2)(c + A^T λ)^T Q^{−1}(c + A^T λ) − λ^T b
    subject to λ ≥ 0

We will see an application of quadratic programming in ML in Chapter 12,
Support Vector Machines.
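To make the dual concrete, here is a numpy sketch (my addition; the Q, c, A, b below are hypothetical toy data, not from the slides) that evaluates D(λ), maximizes it over λ ≥ 0 by a crude grid search, and recovers x = −Q⁻¹(c + Aᵀλ):

import numpy as np

# Hypothetical small QP: min 0.5 x^T Q x + c^T x  s.t.  A x <= b
Q = np.array([[2.0, 0.0], [0.0, 4.0]])          # positive definite
c = np.array([-2.0, -4.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

def dual(lam):
    # D(lambda) = -0.5 (c + A^T lam)^T Q^{-1} (c + A^T lam) - lam^T b
    v = c + A.T @ lam
    return -0.5 * v @ np.linalg.solve(Q, v) - lam @ b

def primal_from_dual(lam):
    # x = -Q^{-1} (c + A^T lam)
    return -np.linalg.solve(Q, c + A.T @ lam)

# Crude search over lambda >= 0 (a single constraint, so lambda is scalar here)
lams = np.linspace(0.0, 5.0, 5001).reshape(-1, 1)
values = [dual(l) for l in lams]
lam_star = lams[int(np.argmax(values))]
print(lam_star, primal_from_dual(lam_star))     # lambda* ≈ 1.333, x* ≈ [0.333, 0.667]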



Example
• Consider the quadratic program
    min_{x1,x2}  (1/2) [x1  x2] [ 2  2 ] [x1]
                                [ 2  4 ] [x2]

    s.t.  [ 2  1 ]   [x1]       [ 1 ]
          [ 3  2 ]   [x2]  ≤    [ 2 ]
          [ 1  1 ]              [ 3 ]

• Find the dual Lagrangian D(λ)



THANKS
