Optimization Notes
Optimization
Nirav Bhatt
Email: [email protected]
Office: BT 307 Block II
Biotechnology Department
Example: the binary cross-entropy (logistic regression) loss

\[
\min_{\theta} \; \frac{1}{m} \sum_{i=1}^{m} -\Big( y_i \log p_i(x, \theta) + (1 - y_i) \log\big(1 - p_i(x, \theta)\big) \Big)
\]
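A minimal NumPy sketch of evaluating this loss; it assumes the standard logistic model \( p_i(x, \theta) = \sigma(x_i^T \theta) \), which the slides do not spell out, and the data X, y here are synthetic:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(theta, X, y):
    """Mean binary cross-entropy, assuming p_i(x, theta) = sigmoid(x_i . theta)."""
    p = sigmoid(X @ theta)
    eps = 1e-12  # guard against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
print(logistic_loss(np.zeros(3), X, y))  # at theta = 0, the loss is log 2 ≈ 0.693
```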
▶ Types of constraints
  ▶ Linear constraints: \( Ax \le b \) (inequality)
  ▶ General constraints: \( g(x) \le 0 \) (inequality), \( h(x) = 0 \) (equality)
  ▶ Bounds: \( x_{lb} \le x \le x_{ub} \)
  ▶ Set membership: \( x \in S \); for example, \( S = \{-1, 0, 1\} \)
▶ Constrained optimization

\[
\begin{aligned}
\max_{x} \quad & f(x) \\
\text{s.t.} \quad & g_i(x) \le b_i, \quad i = 1, \dots, p, \\
& a_j^T x = c_j, \quad j = 1, \dots, q, \\
& x^{LB} \le x \le x^{UB}
\end{aligned}
\tag{2}
\]
\[
\max_{x(t)} \; J[x] = \int_{t=1}^{T} f\big(t, x(t), \dot{x}(t)\big)\, dt
\quad \text{s.t.} \quad x(t) \ge 0, \; x(0) = x_0
\tag{3}
\]

▶ Objective: find the function \( x(t) \) that maximizes the functional \( J[x] \)
▶ \( x^*(t) \): the optimal trajectory (a discretize-then-optimize sketch follows)
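One common numerical approach (not prescribed by the slides) is discretize-then-optimize: sample x(t) on a grid and solve the resulting finite-dimensional problem. The integrand below, f = x − ẋ², is a made-up instance chosen so the analytic optimum x(t) = t/2 − t²/4 is known:

```python
import numpy as np
from scipy.optimize import minimize

n, T, x0 = 51, 1.0, 0.0
t = np.linspace(0.0, T, n)
dt = t[1] - t[0]

def neg_J(x_free):
    """Negative of J[x] = ∫ (x - ẋ²) dt; x(0) is pinned to x0."""
    x = np.concatenate(([x0], x_free))
    xdot = np.diff(x) / dt                 # forward-difference velocity
    xmid = 0.5 * (x[:-1] + x[1:])          # midpoint values of x
    return -np.sum((xmid - xdot**2) * dt)  # minimize -J to maximize J

res = minimize(neg_J, np.zeros(n - 1), bounds=[(0.0, None)] * (n - 1))  # x(t) >= 0
err = np.max(np.abs(res.x - (t[1:] / 2 - t[1:]**2 / 4)))
print(err)  # should be small (discretization error only)
```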
▶ Linear Programming

\[
\begin{aligned}
\min_{x} \quad & c^T x \\
\text{s.t.} \quad & Ax \le b, \\
& Cx = d, \\
& x^{LB} \le x \le x^{UB}
\end{aligned}
\tag{4}
\]
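SciPy's `linprog` maps directly onto this template; the data below are illustrative:

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([1.0, 2.0])                         # objective c^T x
A, b = np.array([[1.0, 1.0]]), np.array([4.0])   # Ax <= b
C, d = np.array([[1.0, -1.0]]), np.array([0.0])  # Cx = d
bounds = [(0.0, 3.0), (0.0, 3.0)]                # x_LB <= x <= x_UB

res = linprog(c, A_ub=A, b_ub=b, A_eq=C, b_eq=d, bounds=bounds)
print(res.x, res.fun)  # optimizer and optimal value
```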
▶ Quadratic Programming

\[
\begin{aligned}
\min_{x} \quad & x^T H x + c^T x \\
\text{s.t.} \quad & Ax \le b, \\
& Cx = d, \\
& x^{LB} \le x \le x^{UB}
\end{aligned}
\tag{5}
\]

H: symmetric matrix
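SciPy has no dedicated QP routine, so one option is a general NLP solver such as SLSQP; the H, c, and constraint data here are illustrative (H is positive definite, making this instance convex):

```python
import numpy as np
from scipy.optimize import minimize

H = np.array([[2.0, 0.5], [0.5, 1.0]])  # symmetric H
c = np.array([-1.0, -1.0])

obj = lambda x: x @ H @ x + c @ x
cons = [{"type": "ineq", "fun": lambda x: 1.0 - (x[0] + x[1])},  # x1 + x2 <= 1
        {"type": "eq",   "fun": lambda x: x[0] - x[1]}]          # x1 = x2

res = minimize(obj, x0=np.zeros(2), method="SLSQP",
               bounds=[(-2.0, 2.0)] * 2, constraints=cons)
print(res.x, res.fun)  # ≈ [0.25, 0.25], -0.25 for this instance
```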
Optimization
▶ Nonlinear Programming

\[
\begin{aligned}
\min_{x} \quad & f(x) \\
\text{s.t.} \quad & g_i(x) \le 0, \quad i = 1, \dots, M, \\
& h_j(x) = 0, \quad j = 1, \dots, N, \\
& x^{LB} \le x \le x^{UB}
\end{aligned}
\tag{6}
\]
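The same SLSQP interface handles the general smooth NLP; in SciPy's dict convention "ineq" means fun(x) ≥ 0, so a constraint g(x) ≤ 0 is passed as −g. Data are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 1)**2 + (x[1] - 2)**2
cons = [{"type": "ineq", "fun": lambda x: 2.0 - x[0]**2 - x[1]**2},  # g(x) = x1²+x2²-2 <= 0
        {"type": "eq",   "fun": lambda x: x[0] - x[1]}]              # h(x) = x1 - x2 = 0

res = minimize(f, x0=np.zeros(2), method="SLSQP",
               bounds=[(-5.0, 5.0)] * 2, constraints=cons)
print(res.x)  # ≈ [1, 1] for this instance
```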
▶ Mixed-Integer Programming

\[
\begin{aligned}
\min_{x} \quad & x^T H x + c^T x \\
\text{s.t.} \quad & Ax \le b, \\
& Cx = d, \\
& \text{some } x_i \text{ integer}
\end{aligned}
\tag{7}
\]
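For the linear case of this template, `scipy.optimize.milp` (SciPy ≥ 1.9) handles the integrality requirement; note it cannot take the quadratic term \( x^T H x \) above, so this sketch keeps only \( c^T x \):

```python
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

c = np.array([-1.0, -2.0])                                       # minimize c^T x
constraints = LinearConstraint(np.array([[3.0, 2.0]]), ub=12.0)  # Ax <= b
integrality = np.array([1, 0])  # 1 = this x_i must be integer, 0 = continuous
bounds = Bounds(lb=0.0, ub=10.0)

res = milp(c=c, constraints=constraints, integrality=integrality, bounds=bounds)
print(res.x, res.fun)  # x[0] is integer at the optimum
```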
▶ Example (see the symbolic check below)
  ▶ First-order condition: \( \frac{\partial f}{\partial x}(x^*) = 0 \Rightarrow 4000 - 2x = 0 \), so \( x^* = 2000 \)
  ▶ Second-order condition: \( \frac{\partial^2 f}{\partial x^2} = -2 < 0 \), so \( x^* \) is a maximizer
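The same two conditions can be verified symbolically; the f used here is reconstructed (up to a constant) from the derivative shown above, so take it as an assumed instance:

```python
import sympy as sp

x = sp.symbols("x")
f = 4000*x - x**2                        # consistent with f'(x) = 4000 - 2x
stationary = sp.solve(sp.diff(f, x), x)  # first-order condition f'(x) = 0
print(stationary)                        # [2000]
print(sp.diff(f, x, 2))                  # -2 < 0  =>  x* = 2000 is a maximizer
```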
▶ Vector norms \( \|x\|_p \), \( p = 1, 2, 3, \dots, \infty \)
▶ Example
▶ Local minimizer: a point \( x^* \) is a local minimizer if there is a neighborhood \( \mathcal{N} \) of \( x^* \) such that \( f(x^*) \le f(x) \) for all \( x \in \mathcal{N} \).
▶ Global minimizer: a point \( x^* \) is a global minimizer if \( f(x^*) \le f(x) \) for all \( x \).
▶ \( x^* \) is a stationary point if \( \nabla f(x^*) = 0 \).
▶ Equality-constrained problem

\[
\begin{aligned}
\min_{x} \quad & f(x) \\
\text{s.t.} \quad & h_i(x) = 0, \quad i = 1, 2, \dots, m
\end{aligned}
\tag{12}
\]

▶ First-order necessary conditions: there exist multipliers \( \lambda^* \) such that

\[
\nabla f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla h_i(x^*) = 0, \qquad h_i(x^*) = 0, \quad i = 1, 2, \dots, m
\]
▶ Second-order necessary condition: \( L_{xx}(x^*, \lambda^*) \) is positive semidefinite on the tangent space \( \{ w : \nabla h_i(x^*)^T w = 0 \} \), where

\[
L_{xx}(x^*, \lambda^*) = \nabla^2 f(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla^2 h_i(x^*)
\]

If \( L_{xx}(x^*, \lambda^*) \) is positive definite on this tangent space, the condition is sufficient for a strict local minimizer. (A numerical check follows.)
[Figure: contour plot over \( (x_1, x_2) \) with stationary points marked at (1, 1) and (−1, −1); both axes run from −2 to 2]
▶ General constrained problem

\[
\begin{aligned}
\min_{x} \quad & f(x) \\
\text{s.t.} \quad & h_i(x) = 0, \quad i = 1, \dots, m, \\
& g_j(x) \le 0, \quad j = 1, \dots, n
\end{aligned}
\]

▶ Second-order necessary condition:

\[
w^T \nabla^2 L(x^*, \lambda^*, \mu^*) \, w \ge 0, \quad \forall w \in T(x^*)
\]

▶ Exercise: which sign combinations of \( \mu_1, \mu_2 \) are consistent with the KKT conditions?
(b) \( \mu_1, \mu_2 < 0 \); (c) \( \mu_1 > 0, \mu_2 < 0 \); (d) \( \mu_1, \mu_2 > 0 \)
▶ Optimization problem

\[
\begin{aligned}
\min_{x} \quad & x_1^2 + 2x_2^2 \\
\text{s.t.} \quad & x_1 + x_2 \ge 3, \\
& x_2 - x_1^2 \ge 1
\end{aligned}
\]

▶ The Lagrangian function (writing the constraints as \( 3 - x_1 - x_2 \le 0 \) and \( 1 + x_1^2 - x_2 \le 0 \)):

\[
L(x, \mu) = x_1^2 + 2x_2^2 + \mu_1 (3 - x_1 - x_2) + \mu_2 (1 + x_1^2 - x_2)
\]
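Solving this example numerically (SLSQP again, with the "ineq" convention fun(x) ≥ 0) confirms where the constraints are active:

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + 2*x[1]**2
cons = [{"type": "ineq", "fun": lambda x: x[0] + x[1] - 3.0},     # x1 + x2 >= 3
        {"type": "ineq", "fun": lambda x: x[1] - x[0]**2 - 1.0}]  # x2 - x1² >= 1

res = minimize(f, x0=np.array([1.0, 2.0]), method="SLSQP", constraints=cons)
print(res.x, res.fun)  # ≈ [1, 2], 9: both constraints active at the optimum
```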
▶ Optimization algorithms generate a sequence of iterates \( \{x_k\}_{k=0}^{\infty} \)
▶ Termination criteria:
  ▶ no more progress can be made, or
  ▶ a solution has been approximated with sufficient accuracy
▶ To generate a new point \( x_{k+1} \) from \( x_k \), use information about \( f \) at \( x_k \) and/or at previous iterates
▶ \( x_{k+1} \) must be such that \( f(x_{k+1}) < f(x_k) \)
▶ Strategies to find a new \( x_{k+1} \):
  ▶ line search
  ▶ trust region
▶ Line-search strategy: choose a direction \( p_k \) from \( x_k \) so that \( f(x_{k+1}) < f(x_k) \)
▶ \( x_{k+1} = x_k + \alpha p_k \), \( \alpha > 0 \)
▶ Choose a direction \( p_k \), then find a step length \( \alpha \)
▶ Line-search subproblem (see the backtracking sketch below): \( \min_{\alpha > 0} \phi(\alpha) = f(x_k + \alpha p_k) \)
▶ \( p_k \): provides the direction
▶ \( \alpha_k \): the step size, which drives the reduction in \( f \)
▶ Challenge: many evaluations of \( f \) (and sometimes \( \nabla f \))
▶ \( m_k \): the quadratic approximation of \( f \) used in trust-region methods
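A standard concrete instance of the line-search idea is Armijo backtracking: start from a trial step and shrink until a sufficient-decrease test passes. The parameters (rho, c) and the quadratic test function are my choices:

```python
import numpy as np

def backtracking(f, grad_f, x, p, alpha=1.0, rho=0.5, c=1e-4):
    """Shrink alpha until f(x + alpha p) <= f(x) + c*alpha*∇f(x)^T p."""
    fx, slope = f(x), grad_f(x) @ p  # p must be a descent direction (slope < 0)
    while f(x + alpha * p) > fx + c * alpha * slope:
        alpha *= rho
    return alpha

f = lambda x: x[0]**2 + 10 * x[1]**2
grad_f = lambda x: np.array([2 * x[0], 20 * x[1]])
x = np.array([1.0, 1.0])
p = -grad_f(x)                        # steepest-descent direction p_k
alpha = backtracking(f, grad_f, x, p)
print(alpha, f(x + alpha * p), f(x))  # f(x_{k+1}) < f(x_k)
```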
What should \( \Delta w \) be?

\[
\Delta w = -\eta \nabla_w \mathrm{Loss} \;\Rightarrow\; w \leftarrow w - \eta \nabla_w \mathrm{Loss}, \quad \text{where } \eta \text{ is the learning rate}
\]
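Iterating this update gives plain gradient descent; the quadratic loss below is a stand-in example:

```python
import numpy as np

def gradient_descent(grad_loss, w0, eta=0.1, steps=100):
    """Repeat w <- w - η ∇Loss(w) for a fixed number of steps."""
    w = w0.copy()
    for _ in range(steps):
        w -= eta * grad_loss(w)
    return w

# Loss (w - 3)² has gradient 2(w - 3) and minimizer w = 3
print(gradient_descent(lambda w: 2 * (w - 3.0), np.array([0.0])))  # ≈ [3.]
```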
Gradient Descent
Learning Rate
[Figure: loss function versus weight \( w \) for two learning-rate settings]
\[
w_{k+1} = w_k - \eta \nabla_w L(w_k, x_i, y_i) + \beta v_k
\]

where \( \beta v_k \) is a momentum term.
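A sketch of the same update in the equivalent velocity form v ← βv − η∇L, w ← w + v (the slide's one-line version folds v into the update); the least-squares data are illustrative:

```python
import numpy as np

def sgd_momentum(grad_loss, w0, data, eta=0.01, beta=0.9, epochs=200):
    """Per-sample SGD with a momentum (velocity) term."""
    w, v = w0.copy(), np.zeros_like(w0)
    for _ in range(epochs):
        for x_i, y_i in data:
            v = beta * v - eta * grad_loss(w, x_i, y_i)
            w = w + v
    return w

# Single-sample least-squares gradient: ∇_w (w·x - y)² = 2(w·x - y) x
grad = lambda w, x, y: 2 * (w @ x - y) * x
data = [(np.array([1.0, 0.0]), 2.0), (np.array([0.0, 1.0]), -1.0)]
print(sgd_momentum(grad, np.zeros(2), data))  # ≈ [2, -1]
```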
² Schaul et al., "No more pesky learning rates." International Conference on Machine Learning, 2013.