Theory Note 1
Disclaimer: These notes aggregate content from several texts and have not been subjected to the usual
scrutiny deserved by formal publications. If you find errors, please bring to the notice of the Instructor.
In this lecture, we discuss what optimization essentially is and where it is used, and we look at a very
basic optimization technique called linear programming.
Broadly, optimization problems can be classified into two classes: discrete optimization problems and continuous optimization problems.
The knapsack problem is a classic discrete optimization problem: we are given a collection of objects with
specified weights and a knapsack (a backpack, for instance) that can carry at most some fixed weight.
We want to fill the knapsack up to the highest permissible weight. There is no partial inclusion
of an item in the knapsack: an item is either in the knapsack or not. No polynomial-time algorithm is known for the most
general knapsack problem. One way to solve it is to brute-force all possible combinations of items and
pick the best feasible one, but this is not efficient, since there are 2^n subsets of n items.
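A minimal brute-force sketch of this enumeration in Python (the item weights and capacity are made up for illustration):

```python
from itertools import combinations

def knapsack_brute_force(weights, capacity):
    """Enumerate every subset of items and keep the heaviest one that fits.

    There are 2^n subsets, so this is exponential-time and only practical
    for very small n.
    """
    best_weight, best_subset = 0, ()
    n = len(weights)
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            total = sum(weights[i] for i in subset)
            if total <= capacity and total > best_weight:
                best_weight, best_subset = total, subset
    return best_weight, best_subset

# Toy instance: item weights 4, 5, 6, 3 and capacity 10.
print(knapsack_brute_force([4, 5, 6, 3], capacity=10))  # (10, (0, 2))
```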
In continuous optimization problems, even though the solution space is infinite, it is not very difficult to find
the optimal solution. We will be talking about continuous optimization problems.
There is a significant trend in AI of formulating problems in terms of optimization problems.
There are two components of an optimization problem: the objective function, which we want to minimize
(or maximize; maximizing f is the same as minimizing −f), and the constraint set. Let f(x) be
the objective function and C be the constraint set. Then, the optimization problem is written as
$$\min_{x \in C} f(x).$$
Example. Minimize (x − 2)^2 subject to the constraint that x ∈ [0, 1] ∪ [4, 7].
Here, the objective function is f(x) = (x − 2)^2 and the constraint set is C = [0, 1] ∪ [4, 7]. This is a function of one
variable, and we can easily see from the plot in Figure 1.1a that x* = 1 is optimal: since 2 ∉ C, the minimizer is the feasible point closest to 2, and of the candidates x = 1 (distance 1) and x = 4 (distance 2), the former gives the smaller value f(1) = 1.
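A quick numerical sanity check of this example (a sketch using a coarse grid over the two feasible intervals):

```python
import numpy as np

# Grid-search each feasible interval separately and report the best point.
xs = np.concatenate([np.linspace(0, 1, 1001), np.linspace(4, 7, 3001)])
fs = (xs - 2) ** 2
print(xs[np.argmin(fs)], fs.min())  # prints 1.0 and 1.0, agreeing with x* = 1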
Example (Geometry). Suppose we have a map of a country with cities represented by the points y1 , . . . , yn
in two dimensions. We want to set up a supply chain that delivers to all of these cities, and we want to build a
warehouse such that the sum of the distances from it to all the cities is minimized (assuming goods are
transported along straight lines, i.e., Euclidean distances).
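Written as an optimization problem over the warehouse location x ∈ R^2, the objective is the sum of Euclidean distances:
$$\min_{x \in \mathbb{R}^2} \; \sum_{i=1}^{n} \lVert x - y_i \rVert_2 .$$
A minimal numerical sketch (the city coordinates are made up, and using a generic solver such as scipy.optimize.minimize is an assumption of convenience, not the method of these notes):

```python
import numpy as np
from scipy.optimize import minimize

# Made-up city coordinates y_1, ..., y_n in the plane.
cities = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0], [5.0, 5.0]])

def total_distance(x):
    # Sum of Euclidean distances from a candidate warehouse x to all cities.
    return np.linalg.norm(cities - x, axis=1).sum()

result = minimize(total_distance, x0=cities.mean(axis=0))
print(result.x, result.fun)  # warehouse location and minimized total distance
```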
Figure 1.1: (a) A plot of f(x) = (x − 2)^2, with the feasible intervals [0, 1] and [4, 7] on the x-axis. (b) An example of curve-fitting.
[Figure: the cities y1 , . . . , yn and the warehouse location x in the plane.]
(In formulations of this kind arising in image processing, such as image denoising, λ and k are hyperparameters; λ penalizes a large difference in intensity between adjacent pixels.)
Example (Machine learning). Suppose we have data points (xi , yi ) for i ∈ [n], and we are trying to fit a curve
through these points. Suppose we hypothesize the curve to be a polynomial hθ (x) = w0 + w1 x + w2 x^2,
where θ = (w0 , w1 , w2 ) is the vector of parameters. For each point xi , we want the predicted value
hθ (xi ) to be close to yi . Let ℓ(u, v) = (u − v)^2 be a (very simple) loss function. Our objective is to minimize
the sum of the loss over all the observed points.
The optimization problem is
$$\min_{\theta \in \mathbb{R}^3} \; \sum_{i=1}^{n} \ell\big(h_\theta(x_i), y_i\big).$$
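For this particular squared loss, the problem is ordinary least squares and has a closed-form solution; a minimal sketch with synthetic (made-up) data:

```python
import numpy as np

# Synthetic data roughly following the quadratic 1 - 0.5*x + 2*x^2 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 30)
y = 1.0 - 0.5 * x + 2.0 * x**2 + rng.normal(scale=0.3, size=x.shape)

# Design matrix for h_theta(x) = w0 + w1*x + w2*x^2.
X = np.column_stack([np.ones_like(x), x, x**2])

# Least squares minimizes sum_i (h_theta(x_i) - y_i)^2 over theta = (w0, w1, w2).
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(theta)  # close to [1.0, -0.5, 2.0]
```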
Linear programming is another type of optimization, in which we have to find a value of the vector x that
maximizes a linear function while satisfying some linear constraints. Let x be a vector of size n, and let
A (an m × n matrix), c (a vector of size n), and b (a vector of size m) be given. Then the problem is:
$$\max_{x} \; c^\top x \quad \text{subject to} \quad Ax \le b \ \text{ and } \ x \ge 0.$$
The above form is the standard form of a linear program. If x = (x1 , x2 , x3 , . . .) and c = (c1 , c2 , c3 , . . .),
then the objective function to be maximized is
$$c^\top x = c_1 x_1 + c_2 x_2 + c_3 x_3 + \cdots$$
Example (Political campaign). Suppose a political party wants to win a majority in each of three classes of the population by spending money on four issues A, B, C, and D. The table below gives the number of votes gained (possibly negative) in each class per unit of money spent on each issue, together with the population of each class and the number of votes needed for a majority.

Issue          Class 1     Class 2     Class 3
A                  −2           5           3
B                   8           2          −5
C                   0           0          10
D                  10           0           2
Population    100 000     200 000      50 000
Majority       50 000     100 000      25 000
The aim of the political party is to minimize the total amount of money it invests while still obtaining the
required majority in each class. Let x1 , x2 , x3 , x4 be the amounts of money the party invests in issues A,
B, C, and D, respectively. Then we have the following optimization problem:
$$\min_{x_1, x_2, x_3, x_4} \; x_1 + x_2 + x_3 + x_4$$
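Reading the columns of the table, with each entry interpreted as votes gained per unit of money, the majority requirements become the linear constraints
$$-2x_1 + 8x_2 + 10x_4 \ge 50\,000, \qquad 5x_1 + 2x_2 \ge 100\,000, \qquad 3x_1 - 5x_2 + 10x_3 + 2x_4 \ge 25\,000,$$
together with x1 , x2 , x3 , x4 ≥ 0.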
Let x ∈ R^n be the vector containing the variables to optimize and c ∈ R^n be the vector of constants:
$$x = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}, \qquad c = \begin{pmatrix} c_1 \\ \vdots \\ c_n \end{pmatrix}.$$
Then, we can write the standard form of a linear program as $\max_{x} c^\top x$ subject to the constraints
$$A_{m \times n}\, x_{n \times 1} \le b_{m \times 1} = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix} \quad \text{and} \quad x \ge 0.$$
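As an illustration, the political-campaign LP above can be solved numerically; this is a sketch (scipy.optimize.linprog is an assumption of convenience), and since linprog minimizes with ≤ constraints, the ≥ majority constraints are negated:

```python
import numpy as np
from scipy.optimize import linprog

# Objective: minimize x1 + x2 + x3 + x4 (total money invested).
c = np.ones(4)

# Majority constraints G @ x >= h from the table, one row per class.
G = np.array([[-2.0,  8.0,  0.0, 10.0],   # Class 1
              [ 5.0,  2.0,  0.0,  0.0],   # Class 2
              [ 3.0, -5.0, 10.0,  2.0]])  # Class 3
h = np.array([50_000.0, 100_000.0, 25_000.0])

# linprog expects A_ub @ x <= b_ub, so multiply the >= constraints by -1.
res = linprog(c, A_ub=-G, b_ub=-h, bounds=(0, None))
print(res.x, res.fun)  # optimal spending per issue and minimum total money
```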
The linear program above is commonly referred to as the primal problem, and it is accompanied by a
corresponding dual problem:
$$\text{(Primal)} \quad \max_{x} \; c^\top x \;\; \text{s.t.} \;\; Ax \le b, \; x \ge 0; \qquad\qquad \text{(Dual)} \quad \min_{y} \; b^\top y \;\; \text{s.t.} \;\; A^\top y \ge c, \; y \ge 0.$$
We will look at two famous and ubiquitous duality principles, weak duality and strong duality, and we
provide a proof of the first in this scribe.
Theorem 1.1 (Weak Duality Principle) Let x and y represent feasible solutions, i.e., solutions that
satisfy all the constraints, for the primal and dual problems, respectively. Then,
b⊤ y ≥ c⊤ x.
Proof: Since x is a feasible solution of the primal problem,
Ax ≤ b,
and hence, taking transposes,
x⊤A⊤ ≤ b⊤.
Multiplying on the right by y preserves the inequality because y ≥ 0, so
x⊤A⊤y ≤ b⊤y.
Similarly, since y is a feasible solution of the dual problem, A⊤y ≥ c; multiplying on the left by x⊤ preserves the inequality because x ≥ 0, so
x⊤A⊤y ≥ x⊤c.
Combining the two inequalities gives
b⊤y ≥ x⊤A⊤y ≥ x⊤c,
which can also be written as b⊤y ≥ c⊤x. Thus, the weak duality principle provides a relation between feasible
solutions of the primal and dual problems.
Theorem 1.2 (Strong Duality Principle) For any optimal solution x∗ of the primal problem
and any optimal solution y∗ of the dual problem, the two optimal values are equal:
b⊤y∗ = c⊤x∗.
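As a numerical illustration of the two duality principles (a sketch with a small made-up LP; scipy.optimize.linprog is an assumption of convenience), one can solve a primal and its dual and compare the optimal values:

```python
import numpy as np
from scipy.optimize import linprog

# A small made-up LP in standard form: max c^T x  s.t.  Ax <= b, x >= 0.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 2.0])

# Primal: linprog minimizes, so negate c to maximize c^T x.
primal = linprog(-c, A_ub=A, b_ub=b, bounds=(0, None))

# Dual: min b^T y  s.t.  A^T y >= c, y >= 0, i.e. (-A^T) y <= -c.
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=(0, None))

print(-primal.fun, dual.fun)  # both equal 7.2: c^T x* = b^T y* (strong duality)
```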