Introduction to Optimization
Fall Semester 2022
Roland Herzog∗
2023-03-29
∗ Interdisciplinary Center for Scientific Computing, Heidelberg University, 69120 Heidelberg, Germany
([email protected], https://scoop.iwr.uni-heidelberg.de/team/rherzog/).
Contents
§1 Introduction
§2 Unconstrained Optimization and Parameter Estimation
§ 2.1 Algorithmic Sketches for Generic Unconstrained Problems
§ 2.2 Algorithmic Sketches for Unconstrained Least-Squares Problems
§3 Convex Optimization
§4 Constrained Optimization
§5 Infinite-Dimensional Optimization
§6 Summary and Practical Advice
§7 Solutions
§ 7.1 Solution of Example 2.2 (Rosenbrock) using CasADi (Problem-Based)
§ 7.2 Solution of Example 2.2 (Rosenbrock) using CasADi (Opti Interface-Based)
§ 7.3 Solution of Example 2.3 (Airplane) using CasADi (Problem-Based)
§ 7.4 Solution of Example 4.1 (Production) using Matlab’s Optimization Toolbox (Solver-Based)
§ 7.5 Solution of Example 4.1 (Production) using Matlab’s Optimization Toolbox (Problem-Based)
§ 7.6 Solution of Example 4.1 (Production) using scipy.optimize
§ 7.7 Solution of Example 4.1 (Production) using CasADi (Problem-Based)
§ 7.8 Solution of Example 4.1 (Production) using CasADi (Opti Interface-Based)
§ 7.9 Solution of Example 4.12 (Post-Office) using CasADi (Opti Interface-Based)
§ 7.10 Solution of Example 5.1 (van der Pol) using CasADi (Problem-Based)
§1 Introduction
• the number 𝑛ineq of inequality constraints and the number 𝑛eq of equality constraints are finite (possibly zero).
More precisely, problems of type (1.1) fall into the realm of continuous optimization. In case some
or all components of 𝑥 are to be sought in a discrete set (such as the integers Z), we have a mixed
discrete-continuous problem or even a fully discrete problem, which we do not consider in this
course.
(a) What do we use as our optimization variable(s)? What is the significance of each component
of 𝑥?
(b) How do we formulate the objective function? What is its significance?
(c) Do we need to formulate constraints? How do we model them? What is the significance of
each of the constraints?
(d) Are there any constants in the problem? What are their values and what is their signifi-
cance?
The answer to these questions has an impact on the next step, the selection of a suitable solver.
The more specifically the solver fits the problem, the more efficient the solution process usually
is. For instance, a linear optimization problem (where the objective and all constraints are affine
functions of 𝑥) is best solved using a solver tailored to linear optimization problems (such as an
implementation of the simplex method). It is “overkill” to solve a linear optimization problem
using a solver capable of handling general nonlinear constrained problems, and the more general
optimization procedure will typically perform worse than specialized routines (although it
should usually succeed).
(4) Implementing the functions describing the problem for the solver.
You will need to implement the objective and constraint functions in a format suitable for the
selected solver, in the respective programming language (such as Matlab, Python, Julia, or C++).
There are also specific modeling languages for optimization problems, e. g., AMPL, which are, in
principle, independent of the solver. However, not all solvers will accept all input formats.
Finally, some solver packages such as CasADi (Python, Matlab) and JuMP (Julia), include
their own modeling paradigms which facilitate the formulation of optimization problems.
The above steps are not necessarily run through sequentially. For instance, it might turn out later in
the process that a different formulation of the problem than originally selected is more favorable for
the solver, so one needs to go back. Or we might figure out from the solution that we forgot to model
certain constraints, which we then include and run the solver again.
It is also important to understand that, given a problem in text form, the above steps do not have
a predetermined outcome. It is up to us to model the problem so that we can identify (or design
ourselves) a numerical solver that can efficiently solve the problem. Your modeling skills will get better
with practice, and this process is partially based on trial-and-error.
Example 1.1 (Choice of optimization variables). This example illustrates that there is no single correct
answer to modeling a problem. For instance, what we choose to be our optimization variables is a modeling
decision. Consider, for instance, an investor, who seeks to distribute a certain amount of money 𝑀 between
a bank account with fixed interest and a stock investment with uncertain future stock prices. The investor’s
objective may be to maximize the expected amount of their wealth at a certain time in the future.
We are not specifying the problem any further here, but with the natural choice of variables (𝑥1, the
amount invested in the bank account, and 𝑥2, the amount invested in the stock) it is likely going to
contain the constraints 𝑥1 ≥ 0, 𝑥2 ≥ 0 and 𝑥1 + 𝑥2 = 𝑀.
Alternatively, instead of determining how much money to put in either investment, the investor may
describe their optimization variable using a single decision, namely what fraction (“percentage”) of the
money 𝑀 is going to be invested in the bank account. That is, the investor may choose their optimization
variable as
𝑥1 ≔ fraction of 𝑀 to be invested in the bank account,

which is subject to the constraint 0 ≤ 𝑥1 ≤ 1. Naturally, the remaining fraction 1 − 𝑥1 of 𝑀 is going
to be invested in the stock.
We now discuss what it means to solve an optimization problem, i. e., to find and recognize minimiz-
ers.
(𝑖𝑖) The inequality 𝑔𝑖(𝑥) ≤ 0 is said to be active at the point 𝑥 in case 𝑔𝑖(𝑥) = 0. It is called inactive
in case 𝑔𝑖(𝑥) < 0. It is called violated in case 𝑔𝑖(𝑥) > 0.
(𝑖𝑣) A point 𝑥∗ ∈ 𝐹 is a local minimizer or local solution of (1.1) if there exists a (potentially small)
neighborhood1 𝑈(𝑥∗) of 𝑥∗ such that 𝑓(𝑥∗) ≤ 𝑓(𝑥) holds for all 𝑥 ∈ 𝑈(𝑥∗) ∩ 𝐹.
While Definition 1.2 states what minimizers are, the definition is not very helpful when it comes to
checking whether a given point 𝑥 ∗ is indeed, say, a local minimizer. In order to verify this on the
grounds of the definition would require us to compare the value of the objective 𝑓 (𝑥 ∗ ) with the value
of the objective 𝑓 (𝑥) at an (uncountable!) number of nearby points 𝑥! This is clearly impractical.
Therefore, we need to resort to different ways of verifying that a certain point is a minimizer, and also of
excluding certain points from the list of candidate minimizers. To this end, we can use derivative-based
optimality conditions. Optimality conditions come in two flavors:
(1) A necessary optimality condition (NC) is a statement of the following form: “Suppose that
𝑥 ∗ is a local minimizer of problem (1.1). Then (NC) holds at 𝑥 ∗ .”
(a) Finding points 𝑥∗ ∈ R𝑛 that satisfy (NC) means to compile a list of candidates for local
minimizers. Those can then be further examined by other means.
(b) A necessary optimality condition can also be read as:2 “Suppose that (NC) does not hold at
𝑥 ∗ , then 𝑥 ∗ cannot be a local minimizer.” Consequently, necessary optimality conditions
can be used to exclude points from the search for local minimizers.
(2) A sufficient optimality condition (SC) is a statement of the following form: “Suppose that
(SC) holds at 𝑥 ∗ . Then 𝑥 ∗ is a local minimizer.”
We will discuss appropriate forms of necessary and sufficient optimality conditions for the particular
type of problem in each of the following sections.
1 Note that the definition is equivalent if we were to replace “neighborhood of 𝑥 ∗ ” by “ball around 𝑥 ∗ ”, i. e., 𝐵𝜀 (𝑥 ∗ ) = {𝑥 ∈
R𝑛 | ∥𝑥 − 𝑥 ∗ ∥ 2 < 𝜀}.
2 This is called the contraposition of the original statement.
§2 Unconstrained Optimization and Parameter Estimation
As the name suggests, unconstrained optimization is about minimizing an objective in the absence of
any equality or inequality constraints. In this case, our generic problem (1.1) reduces to

Minimize 𝑓(𝑥) where 𝑥 ∈ R𝑛. (2.1)
For this problem, “derivative equals zero” is a necessary optimality condition. Points which satisfy
𝑓 ′ (𝑥) = 0 are called stationary points.
Sufficient optimality conditions are usually based on second-order derivatives, and we do not state
them here. Instead, we proceed to sketch some algorithms. Indeed, algorithms usually find stationary
points, which may or may not be local minimizers. Nevertheless, we use the term “optimization
algorithms” and continue to speak of “minimizing the objective”.
§ 2.1 Algorithmic Sketches for Generic Unconstrained Problems

Optimization algorithms proceed in a sequential way and break down problem (2.1) into a sequence of
simpler problems. In doing so, they generate a sequence 𝑥 (𝑘 ) of iterates, which (under some conditions)
converges to a stationary point 𝑥 ∗ .
The vast majority of algorithms for (2.1) proceeds as follows. At the current iterate 𝑥(𝑘), they form a
quadratic model of the objective. A quadratic model is a quadratic (i. e., second-order) polynomial
that shares some properties with the true objective in the vicinity of the current iterate 𝑥 (𝑘 ) . The
quadratic model at 𝑥 (𝑘 ) is of the form (see Figure 2.1 for an illustration in 1D)
𝑞(𝑘)(𝑥) ≔ 𝑓(𝑥(𝑘)) + 𝑓′(𝑥(𝑘)) (𝑥 − 𝑥(𝑘)) + ½ (𝑥 − 𝑥(𝑘))ᵀ 𝐻(𝑘) (𝑥 − 𝑥(𝑘)). (2.2)

Its three terms are referred to as the constant term, the linear term, and the quadratic term, respectively.
At 𝑥 = 𝑥 (𝑘 ) , the model 𝑞 (𝑘 ) shares its function value and its derivative with the true objective 𝑓 .
(Quiz 2.1: Can you show this?)
Algorithms proceed by minimizing the current quadratic model 𝑞 (𝑘 ) , or at least they look for a
stationary point of the model:
[𝑞(𝑘)]′(𝑥) = 𝑓′(𝑥(𝑘)) + (𝑥 − 𝑥(𝑘))ᵀ𝐻(𝑘) ≟ 0.
Figure 2.1: A function 𝑓 : R → R and two possible quadratic models 𝑞(𝑘) about the point 𝑥(𝑘) (black
dot). The red model is the second-order Taylor polynomial of 𝑓 at 𝑥(𝑘). In other words,
the red model uses 𝐻(𝑘) = 𝑓′′(𝑥(𝑘)). The blue model instead uses an “incorrect” Hessian
𝐻(𝑘) ≠ 𝑓′′(𝑥(𝑘)).
Transposition then yields the linear system

𝐻(𝑘) (𝑥 − 𝑥(𝑘)) = −∇𝑓(𝑥(𝑘)), (2.3)

whose solution defines the search direction 𝑑 ≔ 𝑥 − 𝑥(𝑘).
In practical algorithms, this direction is then scaled with a suitable step length 𝛼 (𝑘 ) > 0, which accounts
for the deviation of the quadratic model from the original objective and gives us the update
𝑥 (𝑘+1) B 𝑥 (𝑘 ) + 𝛼 (𝑘 ) 𝑑.
The step length 𝛼 (𝑘 ) has the purpose of making the step between consecutive iterates small when
the model does not agree well with the true objective. This is done in an effort to obtain robust
convergence properties regardless of the quality of the initial guess provided by the user. Since 𝛼 (𝑘 ) is
determined by a line search procedure, in which trial step sizes are proposed and evaluated until a
satisfactory one is found, one speaks of a line search algorithm. An alternative concept is to limit
the size of 𝑑 in the first place by constraining the norm ∥𝑑 ∥ 2 ≤ Δ (𝑘 ) when minimizing the model
(2.2). The solution of this problem is more complex than solving a linear system (2.3). This leads to
trust-region algorithms, and Δ (𝑘 ) is the trust-region radius.
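The following is a minimal sketch of one line search iteration in Python, assuming user-supplied
callables f and grad_f and a symmetric positive definite model Hessian H; the Armijo backtracking
test used here is one common way to propose and evaluate trial step sizes.

import numpy as np

def line_search_step(f, grad_f, H, x, alpha0=1.0, c=1e-4, rho=0.5):
    g = grad_f(x)
    # Solve the linear system (2.3) for the search direction d.
    d = np.linalg.solve(H, -g)
    # Backtracking: shrink alpha until a sufficient decrease is achieved.
    alpha = alpha0
    while f(x + alpha * d) > f(x) + c * alpha * (g @ d):
        alpha = rho * alpha
    # Take the step x^(k+1) = x^(k) + alpha^(k) * d.
    return x + alpha * d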
Gradient methods are simple in the sense that the linear system (2.3) solved in each iteration has a
constant coefficient matrix, which can be exploited. Gradient methods, however, typically exhibit slow
convergence. On the other hand, Newton methods converge faster once they are near a minimizer;
however, the evaluation of the objective’s second derivative matrix in every iteration is usually costly.
In practice, quasi-Newton methods are a good compromise, in which 𝐻 (𝑘 ) tries to capture some of the
properties of the objective’s second derivative without actually computing it.
We close this section by mentioning what functions a user has to implement in order to run an
algorithm for solving an unconstrained problem (2.1):
objective 𝑓 (𝑥)
derivative 𝑓 ′ (𝑥) optional (finite difference fallback)
2nd derivative 𝑓 ′′ (𝑥) or 𝑓 ′′ (𝑥) 𝑑 optional, only for Newton methods
The evaluation of the first-order derivative 𝑓 ′ (𝑥), although required by all algorithms, is usually
optional for the user. When it is not provided, the algorithm will fall back to an internal approximation
of 𝑓 ′ (𝑥) by finite differences. While this may sound convenient, it may be very time consuming and
introduce inaccuracies. If possible, the user should provide first-order derivatives. To facilitate the
task, they may use algorithmic differentiation (AD) for this purpose. This is beyond the scope of
these notes.
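For illustration, here is a sketch using scipy.optimize.minimize (a solver not otherwise discussed in
these notes): the first-order derivative is passed via the jac argument, and omitting it triggers the
finite difference fallback mentioned above.

import numpy as np
from scipy.optimize import minimize

def f(x):
    return (x[0] - 1)**2 + 4 * (x[1] + 2)**2

def fprime(x):
    return np.array([2 * (x[0] - 1), 8 * (x[1] + 2)])

result = minimize(f, x0=np.zeros(2), jac=fprime)  # user-supplied derivative
result_fd = minimize(f, x0=np.zeros(2))           # finite difference fallback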
Example 2.2 (Rosenbrock problem). The Rosenbrock problem is a classical test example in uncon-
strained optimization:

Minimize 𝑓(𝑥, 𝑦) = (𝑎 − 𝑥)² + 𝑏 (𝑦 − 𝑥²)² where (𝑥, 𝑦) ∈ R².

It has a global minimizer at (𝑥, 𝑦) = (𝑎, 𝑎²), with an optimal function value of 0. A typical choice of the
parameters is (𝑎, 𝑏) = (1, 100). A plot of the function can be found in Figure 2.2.
Figure 2.2: Contour plot of the Rosenbrock “banana” function (Example 2.2) with parameters (𝑎, 𝑏) =
(1, 100), and its global minimizer.
§ 2.2 Algorithmic Sketches for Unconstrained Least-Squares Problems

We now consider unconstrained problems whose objective is a sum of squares of residual functions
𝑟𝑖 : R𝑛 → R,

Minimize 𝑓(𝑥) ≔ ∑_{𝑖=1}^{𝑀} [𝑟𝑖(𝑥)]² = ∥𝑟(𝑥)∥₂² where 𝑥 ∈ R𝑛. (2.4)

These are called least-squares problems. The primary source of least-squares problems lies in
parameter estimation, also known as parameter identification, model calibration, curve fitting,
or data assimilation.
In parameter estimation one is given a model function 𝑚 which describes a relation 𝜂 = 𝑚(𝜉) between
independent variables (model input) 𝜉 ∈ R𝑟 and dependent variables 𝜂 ∈ R (model output). The model
in fact contains yet unknown parameters 𝑥 ∈ R𝑛 , i. e., we have a relation of the form
𝜂 = 𝑚(𝑥; 𝜉).
Given the measurement pairs (𝜉𝑖, 𝜂𝑖), we define the residuals as the model prediction minus the
measurement,

𝑟𝑖(𝑥) ≔ 𝑚(𝑥; 𝜉𝑖) − 𝜂𝑖, (2.5)
which describes the discrepancy between the value predicted by the model and the actual measure-
ment 𝜂𝑖 pertaining to the 𝑖-th input 𝜉𝑖 .
The least-squares problem (2.4) describes our desire to choose the parameters 𝑥 ∈ R𝑛 in such a way
that the model will fit the available measurement pairs (𝜉𝑖 , 𝜂𝑖 ) ∈ R𝑟 × R, 𝑖 = 1, . . . , 𝑀, in the best
possible way, in the sense of least squares.
There are statistical reasons why — at least for measurements 𝜂𝑖 perturbed by additive random
measurement errors, which are uncorrelated and identically normally distributed — the solution of
the least-squares problem (2.4) yields the best estimate of the parameters. When, more generally, the
measurement errors are still uncorrelated but may have non-identical variances 𝜎𝑖2 , we should consider
the weighted least-squares problem
Minimize 𝑓(𝑥) ≔ ∑_{𝑖=1}^{𝑀} [𝑟𝑖(𝑥)]² / 𝜎𝑖² = ∥𝑟(𝑥)∥²_{Σ⁻¹} where 𝑥 ∈ R𝑛, (2.6)

where Σ = diag(𝜎1², . . . , 𝜎𝑀²) is the measurement covariance matrix. However, we do not specifically
consider the form (2.6) here. After all, (2.6) can be brought into the form (2.4) simply by scaling the
residuals by 1/𝜎𝑖.
We could use any of the algorithmic ideas from § 2.1 to solve (2.4). However, it is usually more efficient
to use a different algorithm that exploits the particular structure of the objective in (2.4). In order to
understand this, we determine the first- and second-order derivatives of the objective from (2.4) using
the chain rule. We expect that the derivative of the residual function 𝑟 will appear, i. e., its Jacobian,
which we denote by 𝑟 ′ or by 𝐽 :
𝐽(𝑥) ≔ 𝑟′(𝑥) = ( 𝜕𝑟𝑖(𝑥)/𝜕𝑥𝑗 )𝑖𝑗 ∈ R^{𝑀×𝑛}, (2.7)

i. e., the matrix whose 𝑖-th row is the transposed gradient ∇𝑟𝑖(𝑥)ᵀ, for 𝑖 = 1, . . . , 𝑀.
For cosmetic reasons it is practical to divide the objective in (2.4) by 2. This does, of course, not
influence where the minimizers lie.
𝑓(𝑥) = ½ ∥𝑟(𝑥)∥₂² = ½ 𝑟(𝑥)ᵀ𝑟(𝑥) (objective) (2.8a)

∇𝑓(𝑥) = 𝐽(𝑥)ᵀ𝑟(𝑥) = ∑_{𝑖=1}^{𝑀} 𝑟𝑖(𝑥) ∇𝑟𝑖(𝑥) (first-order derivative) (2.8b)

𝑓′′(𝑥) = 𝐽(𝑥)ᵀ𝐽(𝑥) + ∑_{𝑖=1}^{𝑀} 𝑟𝑖(𝑥) 𝑟𝑖′′(𝑥) (second-order derivative) (2.8c)
With the help of this data one could set up a quadratic model of the objective 𝑓 at the current iterate
𝑥 (𝑘 ) , determine a search direction 𝑑 (𝑘 ) by minimizing it, then find a step length 𝛼 (𝑘 ) , take the step etc.
The additional knowledge, however, that we are dealing with a least-squares problem can be exploited
even better algorithmically. We describe this using the popular Levenberg-Marquardt method4 as
an example.
The Levenberg-Marquardt method sets up a certain quadratic model about the point 𝑥 (𝑘 ) :
𝑞_LM^(𝑘)(𝑑) ≔ 𝑓(𝑥(𝑘)) + ∇𝑓(𝑥(𝑘))ᵀ𝑑 + ½ 𝑑ᵀ𝐻_LM^(𝑘)𝑑
= ½ 𝑟(𝑥(𝑘))ᵀ𝑟(𝑥(𝑘)) + 𝑟(𝑥(𝑘))ᵀ𝐽(𝑥(𝑘)) 𝑑 + ½ 𝑑ᵀ𝐻_LM^(𝑘)𝑑 by (2.8). (2.9)
The Levenberg-Marquardt method uses
𝐻_LM^(𝑘) ≔ 𝐽(𝑥(𝑘))ᵀ𝐽(𝑥(𝑘)) + 𝜆(𝑘) Id (2.10)
as the model Hessian. Here 𝜆 (𝑘 ) > 0 is a positive number. This has the following advantages:
4 proposed by Levenberg (1944), rediscovered by Marquardt (1963)
(𝑖) One uses the “good” first part 𝐽 (𝑥) ᵀ 𝐽 (𝑥) of the true Hessian (2.8c). This part is at least positive
semidefinite and it can be computed without further effort since we need 𝐽 (𝑥) anyway for the
first-order derivative ∇𝑓 (𝑥).
(𝑖𝑖) The “bad” part ∑_{𝑖=1}^{𝑀} 𝑟𝑖(𝑥) 𝑟𝑖′′(𝑥) of the Hessian (2.8c) is omitted. Since this part depends on the
second-order derivatives 𝑟𝑖′′ (𝑥), it is expensive to evaluate, and it can potentially destroy the
positive definiteness. Moreover we hope that the residuals 𝑟𝑖 (𝑥 ∗ ) at the solution 𝑥 ∗ will be
small (near zero), i. e., we hope that we will have a good fit, and this part of the Hessian can be
disregarded.
(𝑖𝑖𝑖) Instead one adds an auxiliary term 𝜆(𝑘) Id. In view of 𝜆(𝑘) > 0, positive definiteness of the model
Hessian 𝐻_LM^(𝑘) is ensured, which implies that the linear problems that need to be solved in the
algorithm’s substeps are uniquely solvable.
As with the methods of § 2.1, one is interested in the minimizer of the quadratic model (2.9) in the 𝑘-th
iteration. This minimizer is unique and it can be calculated by solving the following linear system of
equations, compare (2.3),
𝐻_LM^(𝑘) 𝑑(𝑘) = −∇𝑓(𝑥(𝑘)). (2.11)
In more explicit terms, see (2.10) and (2.8b), we can write
( 𝐽(𝑥(𝑘))ᵀ𝐽(𝑥(𝑘)) + 𝜆(𝑘) Id ) 𝑑(𝑘) = −𝐽(𝑥(𝑘))ᵀ𝑟(𝑥(𝑘)). (2.12)
The size of the Levenberg-Marquardt parameter 𝜆 (𝑘 ) implicitly determines the length of the step 𝑑.
(Quiz 2.2: What happens in the extreme cases 𝜆 (𝑘 ) → 0 and 𝜆 (𝑘 ) → ∞, respectively?) 𝜆 (𝑘 ) itself is
controlled depending on how well the model (2.9) was able to predict the actual decrease in the values
of the objective (2.4).
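The following Python sketch (with user-supplied callables r and J for the residual and its Jacobian)
illustrates one simple control strategy for 𝜆(𝑘): accept the step and decrease 𝜆(𝑘) when the objective
decreases, otherwise reject the step and increase 𝜆(𝑘).

import numpy as np

def levenberg_marquardt(r, J, x, lam=1e-2, max_iter=100, tol=1e-8):
    fx = 0.5 * np.dot(r(x), r(x))
    for _ in range(max_iter):
        rx, Jx = r(x), J(x)
        grad = Jx.T @ rx                      # gradient, cf. (2.8b)
        if np.linalg.norm(grad) < tol:
            break
        # Solve (2.12) for the step d.
        d = np.linalg.solve(Jx.T @ Jx + lam * np.eye(x.size), -grad)
        f_new = 0.5 * np.dot(r(x + d), r(x + d))
        if f_new < fx:
            x, fx = x + d, f_new              # accept the step
            lam = lam / 10                    # trust the model more
        else:
            lam = 10 * lam                    # reject and shorten the step
    return x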
We mention what functions a user has to implement in order to run a dedicated least-squares algorithm
for solving a problem of type (2.4):
residual 𝑟 (𝑥)
Jacobian 𝐽 (𝑥) optional (finite difference fallback)
Often times, the user has a choice (through different interfaces to the algorithm) to specify the model
function 𝑚 and the set of measurement pairs (𝜉𝑖 , 𝜂𝑖 ) ∈ R𝑟 × R, 𝑖 = 1, . . . , 𝑀, and have the method itself
evaluate the residual function with components (2.5) and Jacobian
𝐽(𝑥) = 𝑟′(𝑥) = ( 𝜕𝑚(𝑥; 𝜉𝑖)/𝜕𝑥𝑗 )𝑖𝑗 ∈ R^{𝑀×𝑛}, whose 𝑖-th row is ∇𝑚(𝑥; 𝜉𝑖)ᵀ.
In this case, the user provides
model 𝑚(𝑥; 𝜉)
model gradient ∇𝑚(𝑥; 𝜉) optional (finite difference fallback)
and the set of measurement pairs (𝜉𝑖 , 𝜂𝑖 ). Again, when derivatives are not provided, most algorithms
will fall back to finite differencing.
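As a small illustration of this second interface style, here is a sketch using
scipy.optimize.least_squares with a hypothetical exponential model 𝑚(𝑥; 𝜉) = 𝑥1 exp(𝑥2 𝜉) and
made-up measurement pairs; both the model and the data are assumptions for this example only.

import numpy as np
from scipy.optimize import least_squares

xi = np.array([0.0, 0.5, 1.0, 1.5, 2.0])    # model inputs
eta = np.array([1.0, 1.7, 2.6, 4.3, 7.1])   # measurements

def residual(x):
    # r_i(x) = m(x; xi_i) - eta_i as in (2.5)
    return x[0] * np.exp(x[1] * xi) - eta

def jacobian(x):
    # columns: dr/dx1 and dr/dx2
    return np.column_stack([np.exp(x[1] * xi),
                            x[0] * xi * np.exp(x[1] * xi)])

solution = least_squares(residual, x0=[1.0, 1.0], jac=jacobian)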
Example 2.3 (Parameter estimation). The position (𝑥, 𝑦) of an airplane should be determined by radio
bearing. To this end, the directions towards several radio beacons are measured from the airplane. The
positions (𝑋𝑖 , 𝑌𝑖 ) of the beacons are known and the directions are measured as angles 𝛼𝑖 to the 𝑥-axis:
The aim is to find the position (𝑥, 𝑦) of the airplane for which the discrepancies between the predicted and
measured angles are as small as possible in the least squares sense.
In this example, we have 𝑀 = 4 individual measurements. We require a model which maps the independent
variable 𝜉 = (𝑋, 𝑌) ∈ R² to the dependent variable 𝜂 = 𝛼 ∈ R, as a function of the unknown airplane
position (𝑥, 𝑦). A model which achieves this is given by
𝑚((𝑥, 𝑦); (𝑋, 𝑌)) = atan2(𝑌 − 𝑦, 𝑋 − 𝑥) · 180°/𝜋,

where (𝑥, 𝑦) is the unknown parameter and (𝑋, 𝑌) the independent variable. This model returns the
angle of the vector (𝑋 − 𝑥, 𝑌 − 𝑦)ᵀ against the 𝑥-axis in degrees. In fact, since atan2 returns
angles in the interval (−𝜋, 𝜋], while our measurements are in 0° to 360°, we have to normalize atan2’s
output to [0, 2𝜋) beforehand; see the listing in § 7.3 for details.
The solution obtained with the code from § 7.3 is shown in Figure 2.4.
End of Class 1
Figure 2.3: Illustration of the airplane position parameter estimation problem from Example 2.3.
Figure 2.4: Solution of the airplane position parameter estimation problem from Example 2.3.
§3 Convex Optimization
In this section we consider convex problems of the generic form

Minimize 𝑓(𝑥) where 𝑥 ∈ R𝑛. (3.1)

These problems feature a convex objective 𝑓 : R𝑛 → R ∪ {∞}. Due to the possibility of 𝑓 taking the
value ∞, we can implicitly model constraints, since points with a function value 𝑓(𝑥) = ∞ are not
considered to be minimizers (we exclude the pathological case 𝑓 ≡ ∞).
(Figure: illustration of a convex function 𝑓 ; between any two points 𝑥 and 𝑦, the graph of 𝑓 lies
below the chord with values 𝛼 𝑓(𝑥) + (1 − 𝛼) 𝑓(𝑦).)
The benefit of the objective being convex is clarified by the following theorem.
(c) When 𝑓 is strictly convex, then (3.1) has at most one solution.
5 A set 𝐶 ⊆ R𝑛 is said to be convex if the line segment joining any two points in 𝐶 remains in 𝐶, i. e.,
for all 𝑥, 𝑦 ∈ 𝐶 we have 𝛼 𝑥 + (1 − 𝛼) 𝑦 ∈ 𝐶 for all 𝛼 ∈ [0, 1].
Consequently, in convex optimization one does not have to distinguish local from global minimizers.
We consider in this section optimization problems of a form more particular than the generic convex
problem (3.1). Indeed, we consider composite problems of the form
Minimize 𝑓 (𝑥) + 𝑔(𝐴𝑥) where 𝑥 ∈ R𝑛 . (3.4)
In (3.4), 𝑓 : R𝑛 → R ∪ {∞} and 𝑔 : R𝑚 → R ∪ {∞} are both convex functions, and 𝐴 ∈ R^{𝑚×𝑛} is a
matrix. Problems of type (3.4) are encountered in many practical problems, and they allow specialized
algorithms of convex optimization to be used, compared to the generic convex problem (3.1).
Example 3.3 (Image denoising). A classical problem in image denoising is as follows. Suppose we are
given a noisy image (gray-scale for simplicity), which is represented as a matrix 𝑍 ∈ R𝑛1 ×𝑛2 . The image
height (number of pixels in vertical direction) is 𝑛 1 while 𝑛 2 is the width. We wish to remove the noise
while retaining the image’s content. In particular, we wish to preserve edges, i. e., sharp variations of the
brightness values between neighboring pixels.
A classical way to approach this task (Rudin, Osher, Fatemi, 1992) is to formulate it as an optimization
problem:
Minimize ½ ∑_{𝑖=1}^{𝑛1} ∑_{𝑗=1}^{𝑛2} (𝑋𝑖,𝑗 − 𝑍𝑖,𝑗)² + 𝛽 ∑_{𝑖=1}^{𝑛1} ∑_{𝑗=1}^{𝑛2−1} |𝑋𝑖,𝑗 − 𝑋𝑖,𝑗+1| + 𝛽 ∑_{𝑖=1}^{𝑛1−1} ∑_{𝑗=1}^{𝑛2} |𝑋𝑖+1,𝑗 − 𝑋𝑖,𝑗| (3.5)

The first sum is the fidelity term; the second and third sums penalize horizontal and vertical jumps,
respectively.
Problem (3.5) can be written in the form (3.4) when we vectorize6 the matrix-valued optimization variable
to become 𝑥 = vec(𝑋) ∈ R^{𝑛1𝑛2}. Likewise, we vectorize the observed image: 𝑧 = vec(𝑍). The first term
in (3.5) then simply becomes ½ ∥𝑥 − 𝑧∥₂². To account for the remaining terms, we require a matrix 𝐴
which converts the pixel values 𝑥 to the vector of horizontal differences, followed by the vector of vertical
differences. Such a matrix 𝐴 is of dimension [𝑛 1 (𝑛 2 − 1) +𝑛 2 (𝑛 1 − 1)] × [𝑛 1 𝑛 2 ]. Each row contains precisely
one entry +1 and one entry −1, all remaining entries are zero; see Figure 3.2. Finally, 𝑔(𝑦) = 𝛽 ∥𝑦 ∥ 1 holds,
i. e., 𝑔(𝑦) is 𝛽 times the sum of the absolute values of all entries of the vector 𝑦.
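For illustration, here is a sketch of one way to assemble 𝐴 in Python with sparse Kronecker products
(assuming the column-wise vectorization described above):

import scipy.sparse as sp

n1, n2 = 256, 256  # image dimensions

def diff_matrix(n):
    # (n-1) x n matrix mapping a vector v to the differences v[i] - v[i+1]
    return sp.diags([1, -1], [0, 1], shape=(n - 1, n))

# Horizontal differences act across columns, vertical ones within columns;
# the sign of each row is irrelevant due to the absolute values in (3.5).
A_horizontal = sp.kron(diff_matrix(n2), sp.identity(n1))
A_vertical = sp.kron(sp.identity(n2), diff_matrix(n1))
A = sp.vstack([A_horizontal, A_vertical])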
The algorithm we are considering exploits the additive and composite structure of (3.5). In order to
derive it, we require the notion of the Fenchel conjugate of a function.
6Vectorization of a matrix simply means to stack the columns of the matrix on top of each other to form a long vector with
all entries of the matrix.
Figure 3.2: Illustration of the construction of the matrix 𝐴 for an image denoising problem (Example 3.3).
Its transpose 𝐴ᵀ is the incidence matrix of the directed graph obtained from connecting
neighboring pixels. For readability, zeros are written as ·.
Definition 3.4 (Fenchel conjugate). Suppose that 𝑓 : R𝑛 → R ∪ {∞} is a function. Then the Fenchel
conjugate of 𝑓 is 𝑓∗ : R𝑛 → R ∪ {±∞}, defined by

𝑓∗(𝜉) ≔ sup_{𝑥 ∈ R𝑛} { 𝜉ᵀ𝑥 − 𝑓(𝑥) }.
In view of

𝑓∗(𝜉) = inf { 𝛽 ∈ R | 𝜉ᵀ𝑥 − 𝛽 ≤ 𝑓(𝑥) for all 𝑥 ∈ R𝑛 },

we can interpret the Fenchel conjugate geometrically: we take a hyperplane

𝐻_{𝑛,𝛽} = { (𝑥, 𝑧) ∈ R𝑛 × R | 𝑛ᵀ(𝑥, 𝑧) = 𝛽 } = { (𝑥, 𝑧) ∈ R𝑛 × R | 𝑧 = 𝜉ᵀ𝑥 − 𝛽 }

in R𝑛 × R with the fixed normal vector 𝑛 = (𝜉, −1) and unknown offset 𝛽, and push it upwards against
the graph of the function 𝑓 by decreasing the offset 𝛽. Then we record the minimal offset 𝛽. We can
find the (negative) offset 𝛽 when evaluating 𝑧 = 𝜉ᵀ𝑥 − 𝛽 at 𝑥 = 0; see Figure 3.3.
The Fenchel conjugate 𝑓 ∗ of any function 𝑓 is always convex. Moreover, when 𝑓 itself is convex
and lower semicontinuous7 , then 𝑓 ∗ contains enough information to recover 𝑓 , in the sense that the
biconjugate function 𝑓 ∗∗ equals 𝑓 .
Without going into any details here, one can show the following theorem; see for instance Clason,
2017, Section 7.4.
Figure 3.3: Illustration of the construction of the Fenchel conjugate function 𝑓∗ : R → R ∪ {±∞} (right)
of the (convex) function 𝑓(𝑥) = 𝑥² + exp(−𝑥³) (left).
A point 𝑥∗ ∈ R𝑛 is a (global) minimizer for (3.4) if and only if there exists 𝑝∗ ∈ R𝑚 such that, for any
𝜎, 𝜏 ≠ 0,

𝑥∗ solves Minimize ½ ∥𝑦 − (𝑥∗ − 𝜏 𝐴ᵀ𝑝∗)∥₂² + 𝜏 𝑓(𝑦) where 𝑦 ∈ R𝑛, (3.8a)

and 𝑝∗ solves Minimize ½ ∥𝑞 − (𝑝∗ + 𝜎 𝐴𝑥∗)∥₂² + 𝜎 𝑔∗(𝑞) where 𝑞 ∈ R𝑚. (3.8b)
The above theorem provides the idea for an algorithm. As the unknown solutions 𝑥 ∗ and 𝑝 ∗ appear in
the respective objectives for both problems, we can convert (3.8) into a fixed-point algorithm.
Algorithm 3.6 (Chambolle-Pock iteration). Given initial guesses 𝑥(0), 𝑝(0) and parameters 𝜎, 𝜏 > 0,
repeat for 𝑘 = 0, 1, 2, . . .:

1: Set 𝑥(𝑘+1) to the solution of the following problem:

Minimize ½ ∥𝑦 − (𝑥(𝑘) − 𝜏 𝐴ᵀ𝑝(𝑘))∥₂² + 𝜏 𝑓(𝑦) where 𝑦 ∈ R𝑛 (3.9)

2: Set 𝑝(𝑘+1) to the solution of the following problem:

Minimize ½ ∥𝑞 − (𝑝(𝑘) + 𝜎 𝐴𝑥(𝑘+1))∥₂² + 𝜎 𝑔∗(𝑞) where 𝑞 ∈ R𝑚 (3.10)
The advantage of such an algorithm is that the solution to the original problem (3.4), involving both 𝑓
and 𝑔, is broken down into the repeated solution to two different problems (3.9) and (3.10). Problem
(3.9) requires the minimization of 𝜏 𝑓 plus a quadratic distance term. The solution to this problem is
also known as the proximity (prox) operator. More precisely, one defines

prox_{𝜏𝑓}(𝑥) ≔ unique solution of “Minimize ½ ∥𝑦 − 𝑥∥₂² + 𝜏 𝑓(𝑦) where 𝑦 ∈ R𝑛”. (3.11)
Similarly, problem (3.10) requires the minimization of 𝜎 times the conjugate 𝑔∗ plus a quadratic distance
term, i. e., the evaluation of prox𝜎 𝑔∗ .
For a number of relevant problems, the solutions to problems (3.9) and (3.10) can be explicitly
calculated.8 We consider this now for 𝑓 and 𝑔∗ from the image denoising Example 3.3. We begin with
prox𝜏 𝑓 (𝑥), i. e., the unique solution of the problem
Minimize ½ ∥𝑦 − 𝑥∥₂² + 𝜏 𝑓(𝑦) = ½ ∥𝑦 − 𝑥∥₂² + (𝜏/2) ∥𝑦 − 𝑧∥₂² where 𝑦 ∈ R^{𝑛1𝑛2}.

A short calculation shows (Quiz 3.1: Can you fill in the details?)

prox_{𝜏𝑓}(𝑥) = (𝑥 + 𝜏 𝑧) / (1 + 𝜏), (3.12)
i. e., prox𝜏 𝑓 (𝑥) is simply a weighted average between 𝑥 and 𝑧. On the other hand, we first need to
evaluate
𝑔∗(𝜉) = sup_{𝑝 ∈ R𝑚} { 𝜉ᵀ𝑝 − 𝑔(𝑝) } = sup_{𝑝 ∈ R𝑚} { 𝜉ᵀ𝑝 − 𝛽 ∥𝑝∥₁ },

which turns out to be9

𝑔∗(𝜉) = 0 if ∥𝜉∥∞ ≤ 𝛽, and 𝑔∗(𝜉) = ∞ if ∥𝜉∥∞ > 𝛽.
Therefore we have that prox_{𝜎𝑔∗}(𝑝) is the unique solution of the problem

Minimize ½ ∥𝑞 − 𝑝∥₂² + 𝜎 𝑔∗(𝑞) where 𝑞 ∈ R𝑚,

which is to say that prox_{𝜎𝑔∗}(𝑝) is the unique solution of

Minimize ½ ∥𝑞 − 𝑝∥₂² where 𝑞 ∈ R𝑚
subject to ∥𝑞∥∞ ≤ 𝛽.
A short calculation shows (Quiz 3.3: Can you fill in the details?)

[prox_{𝜎𝑔∗}(𝑝)]𝑖 = 𝛽 𝑝𝑖 / max{𝛽, |𝑝𝑖|} = proj_{[−𝛽,𝛽]}(𝑝𝑖), (3.13)
which turns out to be independent of 𝜎. This formula means that the values of 𝑝 are simply clipped
elementwise to lie inside [−𝛽, 𝛽].
8 The web site http://proximity-operator.net/ collects computable prox operators.
9 It can be shown in general that the convex conjugate of a norm is the indicator function of the unit ball w.r.t. the dual
norm.
It can be shown (see Chambolle, Pock, 2011, Theorem 1 or Clason, 2017, Theorem 7.8) that Algorithm 3.6
converges to a solution of (3.4) as long as 𝜎, 𝜏 are positive numbers satisfying 𝜎 𝜏 ∥𝐴∥ 2 < 1, where ∥𝐴∥
is the largest singular value of 𝐴.
We close this section by mentioning what functions a user has to implement in order to run the
Chambolle-Pock Algorithm 3.6 for solving (3.4):

prox operator prox_{𝜏𝑓}(𝑥)
prox operator prox_{𝜎𝑔∗}(𝑝)
matrix-vector products 𝐴𝑥 and 𝐴ᵀ𝑝

It is remarkable that the user does not need to implement the objective (3.4) or even the functions 𝑓 and 𝑔,
but needs to implement the solution operators for certain problems involving 𝑓 and the conjugate 𝑔∗.
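For illustration, here is a minimal numpy sketch of Algorithm 3.6 for the denoising problem, using the
prox formulas (3.12) and (3.13); the matrix A, the vectorized noisy image z, and the parameter beta are
assumed to be given as in Example 3.3.

import numpy as np

def chambolle_pock(A, z, beta, sigma, tau, iterations=200):
    x = z.copy()
    p = np.zeros(A.shape[0])
    for _ in range(iterations):
        # Step 1: prox of tau*f, cf. (3.12).
        x = (x - tau * (A.T @ p) + tau * z) / (1 + tau)
        # Step 2: prox of sigma*g*, cf. (3.13): clip elementwise to [-beta, beta].
        p = np.clip(p + sigma * (A @ x), -beta, beta)
    return x

With the parameters of Figure 3.4, this would be called as chambolle_pock(A, z, beta=0.08, sigma=1.0,
tau=0.1); recall the step size condition 𝜎 𝜏 ∥𝐴∥² < 1.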
Figure 3.4: Image denoising model problem (Example 3.3). Left: original image (dimensions 𝑛1 = 𝑛2 =
256). Middle: image with noise. Right: denoising result obtained using 200 iterations of the
Chambolle-Pock Algorithm 3.6 with 𝜎 = 1 and 𝜏 = 1/10 and weighting parameter 𝛽 = 0.08.
End of Class 2
§4 Constrained Optimization

We recall from § 1 that we are considering constrained optimization problems of the form

Minimize 𝑓(𝑥) where 𝑥 ∈ R𝑛
subject to 𝑔𝑖(𝑥) ≤ 0 for 𝑖 = 1, . . . , 𝑛ineq
and ℎ𝑗(𝑥) = 0 for 𝑗 = 1, . . . , 𝑛eq, (4.1)

with the feasible set

𝐹 = { 𝑥 ∈ R𝑛 | 𝑔𝑖(𝑥) ≤ 0 for all 𝑖 and ℎ𝑗(𝑥) = 0 for all 𝑗 }. (4.2)
Example 4.1 (Production example; slightly modified from John E. Beasley’s OR notes10 ).
A company makes two products (X and Y) using two machines (A and B). Each unit of X that is produced
requires 50 minutes processing time on machine A and 30 minutes processing time on machine B. Each
unit of Y that is produced requires 24 minutes processing time on machine A and 33 minutes processing
time on machine B. At the start of the current week there are 30 units of X and 90 units of Y in stock.
Available processing time on machine A is forecast to be 40 hours and on machine B is forecast to be
35 hours. The demand for X in the current week is forecast to be 70 units and for Y is forecast to be 95 units.
Company policy is to maximize the combined sum of the units of X and the units of Y in stock at the end
of the week.
Formulate this as an optimization problem. What kind of problem is it? What type of solver would you
use to solve it?
Example 4.2 (Formulating non-smooth problems as smooth ones). Try to formulate the following
optimization problems in the form (1.1) with smooth objective and smooth constraints.

Minimize −𝑥1² + 𝑥2 subject to max{𝑥1 + 𝑥2, 2𝑥1 − 𝑥2} ≤ 3 (4.3)

Minimize −𝑥1² + 𝑥2 subject to min{𝑥1 + 𝑥2, 2𝑥1 − 𝑥2} ≤ 3 (4.4)

Minimize max{−𝑥1² + 𝑥2, 𝑥1² + 2𝑥2} subject to 𝑥1 + 𝑥2 ≤ 3 (4.5)

Minimize |−𝑥1² + 𝑥2| + |𝑥1 + 𝑥2²| subject to 𝑥1 + 𝑥2 ≤ 3 (4.6)
10 https://people.brunel.ac.uk/~mastjjb/jeb/or/morelp.html
We will now look a little bit into the theory for problem (1.1) since it also tells us something about
choices we may be facing while modeling the problem.
One can easily see by example (Quiz 4.1: Can you give an example?) that the well-known condition
∇𝑓 (𝑥) = 0 is no longer a necessary condition for local optimality in case of a constrained problem. It
is one of the main conceptual difficulties in constrained optimization to come up with conditions that
replace ∇𝑓 (𝑥) = 0.
The main idea is as follows. Consider a local minimizer 𝑥 ∗ and a sequence 𝑥 (𝑘 ) of feasible points
approaching it. The elements of the sequence will eventually (i. e., for 𝑘 large enough) all remain in
the neighborhood 𝑈 (𝑥 ∗ ) of local optimality. Consequently,
𝑓 (𝑥 ∗ ) ≤ 𝑓 (𝑥 (𝑘 ) )
holds for all 𝑘 sufficiently large. Therefore, we have
0 ≤ 𝑓 (𝑥 (𝑘 ) ) − 𝑓 (𝑥 ∗ ). (4.7)
By Taylor’s theorem11, for any 𝑘 ∈ N there exists a number 𝜉(𝑘) ∈ (0, 1) such that
𝑓 (𝑥 (𝑘 ) ) = 𝑓 (𝑥 ∗ ) + ∇𝑓 (𝑥 ∗ + 𝜉 (𝑘 ) (𝑥 (𝑘 ) − 𝑥 ∗ )) ᵀ (𝑥 (𝑘 ) − 𝑥 ∗ )
holds. Plugging this into the inequality (4.7), we obtain
0 ≤ ∇𝑓 (𝑥 ∗ + 𝜉 (𝑘 ) (𝑥 (𝑘 ) − 𝑥 ∗ )) ᵀ (𝑥 (𝑘 ) − 𝑥 ∗ ) (4.8)
for sufficiently large 𝑘 ∈ N. Notice that since 𝑥 (𝑘 ) → 𝑥 ∗ and since 𝜉 (𝑘 ) is bounded, we must have
𝑥 ∗ + 𝜉 (𝑘 ) (𝑥 (𝑘 ) − 𝑥 ∗ ) → 𝑥 ∗
and therefore
∇𝑓 (𝑥 ∗ + 𝜉 (𝑘 ) (𝑥 (𝑘 ) − 𝑥 ∗ )) → ∇𝑓 (𝑥 ∗ ).
However, passing to the limit in (4.8) only yields the meaningless 0 ≤ 0 so we must do something
different.
Let us take another sequence 𝑡 (𝑘 ) which converges to zero from above: 𝑡 (𝑘 ) ↘ 0, and let us restrict
our attention to sequences 𝑥 (𝑘 ) such that the limit
𝑑 ≔ lim_{𝑘→∞} (𝑥(𝑘) − 𝑥∗) / 𝑡(𝑘)

exists. Dividing (4.8) by 𝑡(𝑘) gives

0 ≤ ∇𝑓(𝑥∗ + 𝜉(𝑘)(𝑥(𝑘) − 𝑥∗))ᵀ (𝑥(𝑘) − 𝑥∗) / 𝑡(𝑘)
and passing to the limit now yields the informative statement
0 ≤ ∇𝑓 (𝑥 ∗ ) ᵀ𝑑
for all directions 𝑑 ∈ R𝑛 constructed as above. We refer to those as tangent directions or tangent
vectors.
11 This requires that the objective 𝑓 is a continuously differentiable (𝐶 1 ) function.
Definition 4.3 (Tangent vectors, tangent cone). A vector 𝑑 ∈ R𝑛 is said to be a tangent direction or
tangent vector to the (feasible) set 𝐹 at the point 𝑥∗ ∈ 𝐹 if there exist sequences 𝑥(𝑘) → 𝑥∗ and
𝑡(𝑘) ↘ 0 such that

𝑑 = lim_{𝑘→∞} (𝑥(𝑘) − 𝑥∗) / 𝑡(𝑘)
holds. The set of all tangent vectors to 𝐹 at 𝑥 ∗ is said to be the tangent cone12 to 𝐹 at 𝑥 ∗ and we denote it
by T𝐹 (𝑥 ∗ ).
Theorem 4.4 (First-order necessary optimality condition). Suppose that 𝑥∗ is a local minimizer of (4.1).
Then ∇𝑓(𝑥∗)ᵀ𝑑 ≥ 0 holds for all 𝑑 ∈ T𝐹(𝑥∗).

The difficulty with this theorem is that the tangent cone T𝐹 (𝑥 ∗ ) is generally hard to characterize
and thus impossible to use in an optimization algorithm. Notice also that T𝐹 (𝑥 ∗ ) doesn’t even use
the description of the feasible set 𝐹 (4.2) in terms of the inequality constraints 𝑔𝑖 or the equality
constraints ℎ 𝑗 .
Definition 4.5 (Linearized feasible vector, linearizing cone). A vector 𝑑 ∈ R𝑛 is said to be a linearized
feasible direction to the (feasible) set 𝐹 described by (1.2) at the point 𝑥∗ ∈ 𝐹 if it satisfies the following
conditions:

∇𝑔𝑖(𝑥∗)ᵀ𝑑 ≤ 0 for all active indices 𝑖, and ∇ℎ𝑗(𝑥∗)ᵀ𝑑 = 0 for all 𝑗 = 1, . . . , 𝑛eq.

The set of all linearized feasible directions to 𝐹 at 𝑥∗ is said to be the linearizing cone to 𝐹 at 𝑥∗ and we
denote it by T𝐹lin(𝑥∗).
Recall that the active indices at 𝑥 ∗ are those satisfying 𝑔𝑖 (𝑥 ∗ ) = 0. Therefore, inequalities that are
inactive at 𝑥 ∗ do not play a role in the definition.
Remark 4.6 (Tangent and linearizing cones). The tangent cone T𝐹 (𝑥 ∗ ) can be viewed as a geometric
approximation to the feasible set near the point 𝑥 ∗ . By contrast, the linearizing cone T𝐹lin (𝑥 ∗ ) is an attempt
to locally describe the feasible set in algebraic terms, by means of equalities and inequalities.
12 In mathematics, a cone in R𝑛 is any set 𝐾 ⊆ R𝑛 such that 𝑥 ∈ 𝐾 implies 𝛼 𝑥 ∈ 𝐾 for all 𝛼 > 0. In other words, a point 𝑥 in
the cone entails that the entire ray from the origin through 𝑥 belongs to the cone as well.
One can show (although we do not do it here) that T𝐹 (𝑥 ∗ ) ⊆ T𝐹lin (𝑥 ∗ ) holds at any feasible point 𝑥 ∗ .
Therefore, we cannot expect that Theorem 4.4 continues to hold if 𝑑 ∈ T𝐹 (𝑥 ∗ ) were replaced by
𝑑 ∈ T𝐹lin (𝑥 ∗ ). In order for that to be possible, we require an extra condition that ensures that T𝐹 (𝑥 ∗ ) =
T𝐹lin (𝑥 ∗ ). Such a condition guarantees the agreement of the geometric and algebraic descriptions and
it is called a constraint qualification.
There are various types of constraint qualifications, and we consider only the most simple one.
Definition 4.7 (LICQ). Suppose that the (feasible) set 𝐹 is given by (1.2) and 𝑥 ∗ ∈ 𝐹 . We say that 𝑥 ∗
satisfies the linear independence constraint qualification (in short: LICQ) if the following set of
vectors is linearly independent:
{ ∇𝑔𝑖(𝑥∗) | 𝑖 ∈ A(𝑥∗) } ∪ { ∇ℎ𝑗(𝑥∗) | 𝑗 = 1, . . . , 𝑝 }.
One can show that the LICQ at 𝑥∗ implies T𝐹(𝑥∗) = T𝐹lin(𝑥∗). Provided that 𝑥∗ is also a local minimizer,
we thus get from Theorem 4.4 that

∇𝑓(𝑥∗)ᵀ𝑑 ≥ 0 holds for all 𝑑 ∈ T𝐹lin(𝑥∗). (4.11)

By a technique called the Farkas lemma we can show that condition (4.11) can be equivalently cast as
a set (4.12) of equalities and inequalities, which is amenable to numerical algorithms. We state this
result in the following lemma, without proof.
(b) There exist vectors 𝜇 ∈ R𝑚 and 𝜆 ∈ R𝑝 such that the following set of conditions holds:

∇𝑓(𝑥∗) + ∑_{𝑖=1}^{𝑚} 𝜇𝑖 ∇𝑔𝑖(𝑥∗) + ∑_{𝑗=1}^{𝑝} 𝜆𝑗 ∇ℎ𝑗(𝑥∗) = 0, (4.12a)
ℎ(𝑥∗) = 0, (4.12b)
𝜇 ≥ 0, 𝑔(𝑥∗) ≤ 0, 𝜇ᵀ𝑔(𝑥∗) = 0. (4.12c)
The system (4.12) is known as the Karush-Kuhn-Tucker conditions (in short: KKT conditions) of
problem (4.1). The vectors 𝜇 and 𝜆 are called Lagrange multipliers associated with the inequality
and equality constraints, respectively. A point 𝑥 ∗ which admits associated Lagrange multipliers 𝜇 and
𝜆 so that (4.12) holds is called a KKT point.
Condition (4.12a) is often written in terms of the Lagrangian associated with problem (4.1), which is
defined as

L(𝑥, 𝜇, 𝜆) ≔ 𝑓(𝑥) + 𝜇ᵀ𝑔(𝑥) + 𝜆ᵀℎ(𝑥). (4.13)

Indeed, (4.12a) simply states that ∇𝑥L(𝑥∗, 𝜇, 𝜆) = 0.
Theorem 4.9 (First-order necessary optimality conditions under LICQ, compare Theorem 4.4).
Suppose that 𝑥 ∗ is a local minimizer of an optimization problem with feasible set 𝐹 . Moreover, suppose
that the LICQ holds at 𝑥 ∗ . Then there exist Lagrange multipliers 𝜇 ∈ R𝑛ineq and 𝜆 ∈ R𝑛eq such that the
KKT conditions (4.12) hold. Moreover, 𝜇 and 𝜆 are unique.
The uniqueness of the multipliers follows from viewing (4.12a) as a linear system of equations for 𝜇 and
𝜆 and observing that the LICQ implies that the coefficient matrix has linearly independent columns.
(a) In the absence of inequality constraints (𝑛 ineq = 0), (4.12) is a (generally nonlinear) system of
equations only.
(b) Condition (4.12c) — which comes in through inequality constraints — is called a complemen-
tarity system. Due to the signs of 𝜇 and 𝑔(𝑥 ∗ ), we can equivalently write 𝜇 ᵀ𝑔(𝑥 ∗ ) = 0 compo-
nentwise as 𝜇𝑖 𝑔𝑖(𝑥∗) = 0 for all 𝑖 = 1, . . . , 𝑛ineq. Consequently, Lagrange multipliers pertaining to
inactive inequality constraints must be zero. On the other hand, a strictly positive multiplier
𝜇𝑖 > 0 indicates that the associated constraint must be active, i. e., 𝑔𝑖 (𝑥 ∗ ) = 0.
We can ask the question how “likely” it is for the LICQ to hold for an optimization problem at hand.
First of all, the very Definition 4.7 implies that for the LICQ to hold there cannot be too many active
inequality and equality constraints in the problem. As soon as the combined number of active inequality
and equality constraints at a point 𝑥 ∗ exceeds 𝑛 (the dimension of the optimization variable), the LICQ
cannot hold at 𝑥 ∗ .
What can we learn from this? When we are modeling an optimization problem, the observation above
tells us not to unconsciously throw in additional constraints (which may not be needed) at will. The
following example gives us additional hints about things to avoid in modeling constraints:
Example 4.10 (LICQ holding or not holding?). Consider the following formulations of a (trivial)
optimization problem with variable 𝑥 ∈ R, and decide whether or not the LICQ holds at the (obvious)
solution:
Minimize 𝑥² subject to 𝑥 = 0
Minimize 𝑥² subject to 𝑥² = 0
Minimize 𝑥² subject to 𝑥² ≤ 0
Minimize 𝑥² subject to 𝑥 ≤ 0 and 𝑥 ≥ 0
The following example shows that the failure of constraint qualifications to hold can indeed be
problematic in practice:
Example 4.11 (Minimizer where the KKT conditions fail to hold). Consider the problem
Minimize −𝑥1 where 𝑥 ∈ R²
subject to 𝑥2 + 𝑥1³ ≤ 0
and −𝑥2 ≤ 0.
Obviously, 𝑥 ∗ = (0, 0) ᵀ is the unique global minimizer, and there are no further local minimizers. Since
both constraints are active at 𝑥 ∗ , the KKT conditions (4.12) boil down to
(−1, 0)ᵀ + 𝜇1 (0, 1)ᵀ + 𝜇2 (0, −1)ᵀ = (0, 0)ᵀ.
Clearly, it is impossible to satisfy the KKT conditions with any 𝜇1, 𝜇2 ≥ 0. Hence, Lagrange multipliers do
not exist at this minimizer. In other words, the minimizer 𝑥 ∗ fails to be a KKT point.
The observation in the previous example is noteworthy since essentially all numerical algorithms for
the solution of smooth, constrained problems (4.1) are indeed looking for KKT points.
Example 4.12 (Rosenbrock’s post office problem). A simple test example for nonlinear programming is
Rosenbrock’s post office problem:
Minimize −𝑥1 𝑥2 𝑥3 where 𝑥 ∈ R³
subject to 𝑥1 + 2𝑥2 + 2𝑥3 − 72 ≤ 0
and 𝑥1 + 2𝑥2 + 2𝑥3 ≥ 0
and 0 ≤ 𝑥𝑖 ≤ 42, 𝑖 = 1, 2, 3.
This problem has only linear constraints13 but a nonlinear objective. The global minimizer is 𝑥 ∗ =
(24, 12, 12) ᵀ .
13 Problems featuring only linear constraints satisfy a constraint qualification known as Abadie constraint qualification
(ACQ) at every feasible point. Hence for problems which have only linear constraints, every local minimizer is a KKT
point. However, in contrast to LICQ, the Lagrange multipliers need not be unique.
28 https://scoop.iwr.uni-heidelberg.de/teaching/2022ws/short-course-optimization/ 2023-03-29
cbn Introduction to Optimization
Many other details need to be taken into account in order to obtain a robust SQP solver, including
potentially infeasible QP subproblems, update of the Lagrange multipliers, stopping criteria,
deterioration of superlinear convergence due to nonlinear constraints, failure of constraint
qualifications etc.
We close this section by mentioning what functions a user has to implement in order to run an SQP
method for solving a constrained problem (4.1):
objective 𝑓 (𝑥)
derivative 𝑓 ′ (𝑥) optional (finite difference fallback)
2nd derivative 𝑓 ′′ (𝑥) or 𝑓 ′′ (𝑥) 𝑑 optional, only for Newton SQP methods
constraints 𝑔(𝑥), ℎ(𝑥)
Jacobian 𝑔′ (𝑥), ℎ ′ (𝑥) optional (finite difference fallback)
2nd derivative 𝑔𝑖′′ (𝑥), ℎ ′′𝑗 (𝑥) or 𝑔𝑖′′ (𝑥) 𝑑, ℎ ′′𝑗 (𝑥) 𝑑 optional, only for Newton SQP methods
End of Class 3
§5 Infinite-Dimensional Optimization
In this section we consider problems which are infinite-dimensional, i. e., which feature infinite-
dimensional optimization variables. Most problems in this category involve a differential equation
of some sort, and the optimization variables are functions. Although for the purpose of numerical
optimization, these functions clearly need to be discretized to become finite-dimensional objects,
it is still useful to recognize the properties of the underlying undiscretized (infinite-dimensional)
problem.
Example 5.1 (Optimal control of the van der Pol oscillator). Consider the following optimal control
problem:
Minimize ∫_0^T (𝑥1² + 𝑥2²) d𝑡 + 𝛾 ∫_0^T 𝑢² d𝑡 where 𝑥 : [0,𝑇] → R², 𝑢 : [0,𝑇] → R
subject to ẋ1 = 𝜇 (1 − 𝑥2²) 𝑥1 − 𝑥2 + 𝑢
and ẋ2 = 𝑥1 (5.1)
with initial conditions 𝑥1(0) = 0, 𝑥2(0) = 1
and control constraints −1 ≤ 𝑢 ≤ 1.
The ordinary differential equation (ODE) above is known as the van der Pol oscillator. The unknown
function 𝑥 : [0,𝑇 ] → R2 is the state variable of the optimal control problem (5.1) and 𝑢 : [0,𝑇 ] → R
is the control variable. All constraints are meant in a pointwise sense on the interval [0,𝑇 ] with final
time 𝑇 > 0 given. The constant 𝜇 ≥ 0 is a given physical parameter. The objective expresses the goal to
steer the system to its equilibrium point at the origin, while the second term in the objective penalizes the
control effort. The parameter 𝛾 ≥ 0 is often called the control-cost parameter and it is used to balance both
terms in the objective.
We are going to discretize problem (5.1) using a so-called direct single shooting approach. This means
that we are going to use a numerical solver for the differential equation, which returns an approximate
solution for 𝑥 (𝑡) on a discrete grid of 𝑡-values. Therefore, the only optimization variable remaining is
the control function 𝑢. This function is discretized on a grid that is possibly coarser than the grid for
the state and we take 𝑢 to be piecewise constant.
(Figure: a piecewise constant control on the coarse grid 0, Δ𝑡, 2Δ𝑡, 3Δ𝑡, . . . , 𝑇, together with the
state on a finer grid.)
A possible solution approach to this problem is shown in § 7.10. Experiments show that off-the-shelf
solvers have difficulties solving the problem even for moderate discretization sizes. For instance, we
typically observe an increase in the iteration numbers as we refine the control discretization. This is
the case in particular for CasADi’s default solver Ipopt with default settings.
The reason for this behavior is that most solvers for nonlinear programming problems (see § 4) are
designed for genuinely finite-dimensional problems. In particular, they are unaware of the geometry
of the problem. In a function space-aware method, the user would need to choose an inner product,
which ideally should be inherited from the infinite-dimensional limit problem. This has an impact,
for instance, on how gradients are calculated, on the geometry of trust regions, on stopping criteria,
and on many other algorithmic components. Of course one may argue that, for the discretized problem
in R𝑛 , all norms are equivalent, but the equivalence constants typically deteriorate with increasing
dimension 𝑛.
A simple strategy can sometimes be successful to mitigate the difficulties of off-the-shelf solvers: one
first solves the problem on a coarse grid, then interpolates the solution to a finer grid and uses this as
an initial guess for the next round, until the desired discretization level is reached.
However, true efficiency requires an algorithm which “makes sense” also for the infinite-dimensional
limit problem. Only then can we hope for convergence results which are independent of the fine-
ness of discretization. These algorithms, however, are not generally available as software. As a
researcher working with infinite-dimensional problems, one therefore often finds oneself designing
and implementing the algorithm of choice oneself.
§6 Summary and Practical Advice

We conclude these notes with a short summary and some practical advice.
(1) Modeling an optimization problem is a skill that requires training. The choices made may have
an impact on the characteristics of a problem.
(2) For example, we saw that squaring a constraint ℎ 𝑗 (𝑥) = 0 to become [ℎ 𝑗 (𝑥)] 2 = 0 is not a good
idea.
(3) There is not a single best solver for all kinds of optimization problems.
(4) It is useful to have an overview of the different problem types in optimization in order to be able
to make informed modeling decisions and solver choices.
(5) Most algorithms rely on and exploit the smoothness of the objective and constraint functions.
They may not stall on the occasional kink, but generally we should strive to formulate our
problems in a smooth way.
(6) Depending on the type of algorithm, the user may have to implement different routines than
expected. For instance, for algorithms dedicated to least-squares problems, the user implements
the residual function (not the objective). The Chambolle-Pock method for composite convex
problems asks the user to implement prox operators for 𝜏 𝑓 and 𝜎 𝑔∗ .
End of Class 4
§7 Solutions
§ 7.1 Solution of Example 2.2 (Rosenbrock) using CasADi (Problem-Based)

CasADi provides algorithmic differentiation for user-defined functions. We demonstrate below the
description of the problem in a suitable form for CasADi’s nlpsol solver interface.
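A sketch of such a listing (assuming Ipopt as the underlying NLP solver):

# Solve the Rosenbrock problem (Example 2.2) via CasADi's nlpsol interface.
import casadi as ca

x = ca.SX.sym('x')
y = ca.SX.sym('y')
a, b = 1.0, 100.0

# Rosenbrock objective; its global minimizer is (a, a^2).
f = (a - x)**2 + b * (y - x**2)**2

nlp = {'x': ca.vertcat(x, y), 'f': f}
solver = ca.nlpsol('solver', 'ipopt', nlp)

result = solver(x0=[-1.2, 1.0])
print(result['x'])  # should be close to (1, 1)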
§ 7.2 Solution of Example 2.2 (Rosenbrock) using CasADi (Opti Interface-Based)

CasADi offers the Opti interface, which allows a simplified description and solution of optimization
problems.
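A sketch of the corresponding listing:

# Solve the Rosenbrock problem (Example 2.2) via CasADi's Opti interface.
import casadi as ca

opti = ca.Opti()
x = opti.variable()
y = opti.variable()
a, b = 1.0, 100.0

opti.minimize((a - x)**2 + b * (y - x**2)**2)
opti.set_initial(x, -1.2)
opti.set_initial(y, 1.0)

opti.solver('ipopt')
sol = opti.solve()
print(sol.value(x), sol.value(y))  # should be close to (1, 1)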
§ 7.3 Solution of Example 2.3 (Airplane) using CasADi (Problem-Based)

Apparently CasADi has no interface for solving least-squares problems. Therefore, we resort to
treating the problem as a generic unconstrained optimization problem. Consequently — in contrast to
using a dedicated least-squares solver — we have to implement the objective 𝑓(𝑥) = ½ ∑_{𝑖=1}^{𝑀} [𝑟𝑖(𝑥)]²
instead of the residual function 𝑟 (𝑥).
# This code solves a parameter estimation problem for a given model, based on a
# set of measurement pairs, using CasADi's python bindings for modeling of
# (un)constrained optimization problems.
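# (Sketch of the remaining listing. The beacon positions and measured angles
# below are hypothetical; the original course data is not reproduced here.)
import casadi as ca
import numpy as np

XY = np.array([[6.0, 6.0], [2.0, -2.0], [0.0, 0.0], [8.0, -1.0]])  # beacons
alpha = np.array([45.0, 300.0, 225.0, 330.0])                      # degrees

x = ca.SX.sym('x')
y = ca.SX.sym('y')

def model(X, Y):
    # Angle of the vector (X - x, Y - y) against the x-axis in degrees,
    # with atan2's output normalized from (-pi, pi] to [0, 2*pi).
    angle = ca.atan2(Y - y, X - x)
    angle = ca.fmod(angle + 2 * ca.pi, 2 * ca.pi)
    return angle * 180.0 / ca.pi

# Objective f(x) = 1/2 * sum of squared residuals.
f = 0
for i in range(XY.shape[0]):
    f = f + 0.5 * (model(XY[i, 0], XY[i, 1]) - alpha[i])**2

solver = ca.nlpsol('solver', 'ipopt', {'x': ca.vertcat(x, y), 'f': f})
result = solver(x0=[3.0, 2.0])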
§ 7.4 Solution of Example 4.1 (Production) using Matlab’s Optimization Toolbox (Solver-Based)

The classical use of Matlab’s optimization toolbox required the user to model their optimization
problem in a format suitable for the respective solver to be used. Since Example 4.1 is a linear
optimization problem, linprog is the solver of choice.
% This code solves the production example problem using Matlab's optimization
% toolbox. The problem is a linear optimization problem. The solver of choice is
% linprog from Matlab's optimization toolbox, see
% https://de.mathworks.com/help/optim/ug/linprog.html.
% Setup the matrix A and right hand side b for the inequality constraint
% A * x <= b.
A = [50, 24; 30, 33];
b = [2400; 2100];
% Setup the matrix Aeq and right hand side beq for the (void) equality constraint
% Aeq * x = beq.
Aeq = [];
beq = [];
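% (Sketch of the remaining steps, reconstructed from the problem statement.)
% Objective: maximize x(1) + x(2), i.e., minimize -(x(1) + x(2)).
f = [-1; -1];

% Lower bounds: production must cover demand minus initial stock,
% x(1) >= 70 - 30 and x(2) >= 95 - 90. There are no upper bounds.
lb = [40; 5];
ub = [];

% Call the solver.
[x, fval] = linprog(f, A, b, Aeq, beq, lb, ub);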
§ 7.5 Solution of Example 4.1 (Production) using Matlab’s Optimization Toolbox (Problem-Based)

With more recent versions of Matlab’s optimization toolbox, the user may formulate their problem in
a more natural way, and independently of the solver. The solver will be picked automatically depending
on the problem characteristics.
% This code solves the production example problem using Matlab's optimization
% toolbox. The problem is modeled using the problem-based approach, which
% allows for a more natural problem formulation and automatic selection of a
% suitable solver, see
% https://de.mathworks.com/help/optim/problem-based-basics.html.
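% (Sketch of the remaining listing; the variable names are illustrative.)
% Optimization variables with lower bounds covering demand minus stock.
x = optimvar('x', 'LowerBound', 40);
y = optimvar('y', 'LowerBound', 5);

% Problem with objective and machine capacity constraints (in minutes).
prob = optimproblem('ObjectiveSense', 'maximize');
prob.Objective = x + y;
prob.Constraints.machineA = 50*x + 24*y <= 2400;
prob.Constraints.machineB = 30*x + 33*y <= 2100;

% Solve the problem; a suitable solver (here linprog) is picked automatically.
sol = solve(prob);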
§ 7.6 Solution of Example 4.1 (Production) using scipy.optimize

Here we demonstrate the solution of Example 4.1 with scipy.optimize.linprog. We need to present
the problem in a suitable form to the solver.
# This code solves the production example problem using the scipy.optimize
# package. The problem is a linear optimization problem. The solver of choice is
# scipy.optimize.linprog, see
# https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.linprog.html.
# Setup the matrix A and right hand side b for the inequality constraint
# A @ x <= b.
A = [[50, 24], [30, 33]]
b = [2400, 2100]
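# (Sketch of the remaining steps.) Maximize x1 + x2 by minimizing its
# negative; the lower bounds cover demand minus initial stock.
from scipy.optimize import linprog

c = [-1, -1]
bounds = [(40, None), (5, None)]

result = linprog(c, A_ub=A, b_ub=b, bounds=bounds)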
§ 7.7 Solution of Example 4.1 (Production) using CasADi (Problem-Based)

CasADi provides algorithmic differentiation for user-defined functions (although this is not needed
for linear optimization problems). We demonstrate below the description of the problem in a suitable
form for CasADi’s nlpsol solver interface.
# This code solves the production example problem using CasADi's python
# bindings. The problem is a linear optimization problem. For lack of a linear
# optimization solver, the solver of choice is nlpsol with ipopt, see
# https://web.casadi.org/docs/#nonlinear-programming.
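# (Sketch of the elided model setup.) Maximize x1 + x2 by minimizing its
# negative; the machine capacities enter via the constraint function g.
import casadi as ca

x = ca.SX.sym('x', 2)
f = -(x[0] + x[1])
g = ca.vertcat(50*x[0] + 24*x[1], 30*x[0] + 33*x[1])
b = [2400, 2100]  # capacities of machines A and B in minutes
l = [40, 5]       # demand minus initial stock

solver = ca.nlpsol('solver', 'ipopt', {'x': x, 'f': f, 'g': g})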
# Call the solver, providing the upper bounds for the inequality constraint
# function g and the lower bound for the optimization variable.
result = solver(ubg = b, lbx = l)
§ 7.8 Solution of Example 4.1 (Production) using CasADi (Opti Interface-Based)

CasADi offers the Opti interface, which allows a simplified description and solution of optimization
problems.
# This code solves the production example problem using CasADi's python
# bindings, and CasADi's Opti interface for modeling of (un)constrained
# optimization problems.
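# (Sketch of the elided listing body.)
import casadi as ca

opti = ca.Opti()
x = opti.variable()  # units of X produced
y = opti.variable()  # units of Y produced

opti.minimize(-(x + y))
opti.subject_to(50*x + 24*y <= 2400)  # machine A capacity (minutes)
opti.subject_to(30*x + 33*y <= 2100)  # machine B capacity (minutes)
opti.subject_to(x >= 40)              # cover demand for X from stock
opti.subject_to(y >= 5)               # cover demand for Y from stock

opti.solver('ipopt')
sol = opti.solve()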
§ 7.9 Solution of Example 4.12 (Post-Office) using CasADi (Opti Interface-Based)

CasADi offers the Opti interface, which allows a simplified description and solution of optimization
problems.
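A sketch of such a listing for the post office problem (assuming Ipopt as the underlying solver):

# Solve Rosenbrock's post office problem (Example 4.12) via CasADi's Opti.
import casadi as ca

opti = ca.Opti()
x = opti.variable(3)

opti.minimize(-x[0] * x[1] * x[2])
s = x[0] + 2*x[1] + 2*x[2]
opti.subject_to(s - 72 <= 0)
opti.subject_to(s >= 0)
opti.subject_to(opti.bounded(0, x, 42))

# An interior initial guess helps the solver.
opti.set_initial(x, [10.0, 10.0, 10.0])

opti.solver('ipopt')
sol = opti.solve()
print(sol.value(x))  # expected: close to (24, 12, 12)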
§ 7.10 Solution of Example 5.1 (van der Pol) using CasADi (Problem-Based)
# This code solves a discretized optimal control problem for the van der Pol
# oscillator using CasADi's python bindings. The discretization is achieved
# using direct single shooting.
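import casadi as ca

# (Sketch of the elided setup; T, N, mu and gamma are illustrative values.)
T = 10.0              # final time
N = 20                # number of control intervals
dtu = T / N           # length of one control interval
mu, gamma = 1.0, 0.1  # model parameter and control cost parameter

# Piecewise constant control: one optimization variable per interval.
U = ca.SX.sym('U', N)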
# Define the step size for the state integrator and determine how many state
# points are associated with each control interval.
dtx_target = 0.1
states_per_control_interval = max([round(dtu / dtx_target), 1])
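dtx = dtu / states_per_control_interval

def vdp_rhs(x, u):
    # Right-hand side of the van der Pol ODE from (5.1).
    return ca.vertcat(mu * (1 - x[1]**2) * x[0] - x[1] + u, x[0])

def take_step(x, u, dt):
    # One explicit fourth-order Runge-Kutta step (an assumed integrator choice).
    k1 = vdp_rhs(x, u)
    k2 = vdp_rhs(x + dt/2 * k1, u)
    k3 = vdp_rhs(x + dt/2 * k2, u)
    k4 = vdp_rhs(x + dt * k3, u)
    return x + dt/6 * (k1 + 2*k2 + 2*k3 + k4)

# Initial state and objective accumulator for single shooting.
x = ca.DM([0.0, 1.0])
objective = 0

for step in range(N):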
    # Advance the state until the end of the current control interval and
    # update the objective.
    for substep in range(states_per_control_interval):
        x = take_step(x, U[step], dtx)
        objective = objective + dtx * (x[0]**2 + x[1]**2)
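    # Penalize the control effort on the current interval.
    objective = objective + gamma * dtu * U[step]**2

# Set up and solve the NLP; the control constraints -1 <= u <= 1 become
# simple bounds on the optimization variable.
nlp = {'x': U, 'f': objective}
solver = ca.nlpsol('solver', 'ipopt', nlp)
result = solver(x0=ca.DM.zeros(N), lbx=-1.0, ubx=1.0)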
Bibliography
Andersson, J. A. E.; J. Gillis; G. Horn; J. B. Rawlings; M. Diehl (2018). “CasADi: a software framework for
nonlinear optimization and optimal control”. Mathematical Programming Computation 11.1, pp. 1–36.
doi: 10.1007/s12532-018-0139-4.
Chambolle, A.; T. Pock (2011). “A first-order primal-dual algorithm for convex problems with appli-
cations to imaging”. Journal of Mathematical Imaging and Vision 40.1, pp. 120–145. doi:
10.1007/s10851-010-0251-1.
Clason, C. (2017). Nonsmooth analysis and optimization. arXiv: 1708.04180.
Rudin, L. I.; S. Osher; E. Fatemi (1992). “Nonlinear total variation based noise removal algorithms”.
Physica D 60.1–4, pp. 259–268. doi: 10.1016/0167-2789(92)90242-F.