
CS 217: Artificial Intelligence and Machine Learning Jan 10, 2024

Lecture 1: The Basics of Optimization


Lecturer: Swaprava Nath Scribes: SG1 & SG2

Disclaimer: These notes aggregate content from several texts and have not been subjected to the usual
scrutiny deserved by formal publications. If you find errors, please bring them to the notice of the Instructor.

In this lecture, we discuss what essentially optimization is, where we use optimization, and look at a very
basic optimization technique called linear programming.
Broadly, optimization problems can be classified into two classes:

Class                      Variable      Solution space   Solution complexity
Continuous optimization    continuous    infinite         polynomial in the size of the problem
Discrete optimization      discrete      finite           exponential in the size of the problem

Table 1.1: Classes of optimization problems.

The knapsack problem is a classic discrete optimization problem: we are given a collection of objects with
specific weights and a knapsack (a backpack, for instance) that can carry at most some fixed weight.
We want to fill the knapsack up to the highest permissible weight. There is no partial inclusion of an item:
an item is either in the knapsack or not. No polynomial-time algorithm is known for the most
general knapsack problem. One way to solve it is to enumerate all possible combinations by brute force and
pick the best one, but this is not efficient.
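To make the brute-force approach concrete, here is a minimal sketch (not part of the original lecture) in Python with made-up weights; it maximizes the total packed weight subject to the capacity, as formulated above, by enumerating all 2^n subsets.

```python
from itertools import combinations

def knapsack_brute_force(weights, capacity):
    """Enumerate every subset of items and keep the heaviest one that still fits."""
    best_weight, best_subset = 0, ()
    for r in range(len(weights) + 1):
        for subset in combinations(range(len(weights)), r):
            total = sum(weights[i] for i in subset)
            if total <= capacity and total > best_weight:
                best_weight, best_subset = total, subset
    return best_weight, best_subset

# Hypothetical instance: weights of the objects and the knapsack capacity.
print(knapsack_brute_force([4, 7, 2, 9], capacity=12))  # (11, (0, 1))
```

Enumerating 2^n subsets quickly becomes infeasible as the number of objects grows, which is exactly why this method is not efficient.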
In continuous optimization problems, even though the solution space is infinite, the optimal solution can
often be found efficiently. We will focus on continuous optimization problems.
There is a significant trend in AI of formulating problems as optimization problems.

1.1 Optimization: What is it? — Motivating Applications

There are two components of an optimization problem: the objective function, which we want to minimize
(maximizing f is equivalent to minimizing −f), and the constraint set. Let f(x) be
the objective function, and C be the constraint set. Then, the optimization problem is written as

min_{x ∈ C} f(x).

Example. Minimize (x − 2)2 with the constraint that x ∈ [0, 1] ∪ [4, 7].
Here, the objective function is f (x) = (x−2)2 and the constraint set is C = [0, 1]∪[4, 7]. This is a one-variable
function, and we can easily see from the plot in figure 1.1a that x∗ = 1 is the optimal value of x.
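As a quick numerical sanity check (our own addition, using SciPy, which the notes do not prescribe), one can minimize f over each interval of C separately and keep the better of the two results:

```python
from scipy.optimize import minimize_scalar

f = lambda x: (x - 2) ** 2

# Minimize over each interval of C = [0, 1] ∪ [4, 7] and take the better result.
candidates = [minimize_scalar(f, bounds=(0, 1), method="bounded"),
              minimize_scalar(f, bounds=(4, 7), method="bounded")]
best = min(candidates, key=lambda r: r.fun)
print(best.x, best.fun)  # x* ≈ 1, f(x*) ≈ 1
```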
Example (Geometry). Suppose we have a map of a country with cities represented by the points y1 , . . . , yn
in two dimensions. We want to set up a supply chain and deliver to all these cities. We want to build a
warehouse such that the sum of the distances from it to all the cities is minimized (assuming transportation
costs are proportional to Euclidean distance).


[Figure 1.1: Examples of optimization problems. (a) A plot of f(x) = (x − 2)². (b) An example of curve-fitting. (c) Map of a country with cities at y1 , . . . , yn and a warehouse at x.]

Let the warehouse be located at x. Then, the optimization problem is

min_{x ∈ C} Σ_{i=1}^{n} ∥x − y_i∥₂,

where C is the set of all points in the country, and ∥x∥₂ = √(x₁² + x₂²) denotes the L2 norm of x = (x₁, x₂).
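For concreteness, here is a minimal sketch (not from the lecture) that solves a small instance of this warehouse-placement problem numerically. The city coordinates are made up, SciPy's general-purpose minimizer is our own choice of solver, and for simplicity we optimize over the whole plane rather than restricting x to the country C.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical city locations y_1, ..., y_n in the plane.
cities = np.array([[0.0, 0.0], [4.0, 0.0], [1.0, 3.0], [5.0, 4.0]])

def total_distance(x):
    # Sum of Euclidean distances from a candidate warehouse x to every city.
    return np.linalg.norm(cities - x, axis=1).sum()

# Start from the centroid and let a generic solver do the rest.
res = minimize(total_distance, x0=cities.mean(axis=0))
print(res.x, res.fun)  # optimal warehouse location and the minimized total distance
```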
Example (Computer vision). Image de-blurring is a common problem in computer vision. We want to
de-blur a blurred image according to some policy.
We consider grayscale images of size m × n, where each pixel has an intensity value in [0, 1]: 0 meaning black,
and 1 meaning white. Let y = [yi,j ]m×n be the blurred image that we are given. Let x = [xi,j ]m×n be the
original image and k be the blurring filter which was applied to get y. Then, to get the original (de-blurred)
image, we have to solve

min_{x ∈ [0,1]^{m×n}}  [ Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} |y_{i,j} − (k ∗ x)_{i,j}|  +  λ Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} ( (x_{i,j} − x_{i,j+1})² + (x_{i+1,j} − x_{i,j})² ) ],

where the second summation (the smoothness penalty) discourages sudden intensity changes. Here λ and k are
the hyperparameters: k models the blur, and λ controls how strongly differences in intensity between adjacent
pixels are penalized.
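To make the objective concrete, the sketch below (our own, not part of the notes) merely evaluates it for given images; the use of scipy.signal.convolve2d for k ∗ x and the "same" boundary handling are assumptions, and actually recovering x would still require handing this function to a numerical optimizer.

```python
import numpy as np
from scipy.signal import convolve2d

def deblur_objective(x, y, k, lam):
    """Data-fit term plus smoothness penalty from the de-blurring example.

    x, y: m-by-n grayscale images with values in [0, 1]; k: blur kernel; lam: lambda.
    """
    fit = np.abs(y - convolve2d(x, k, mode="same")).sum()  # sum of |y_ij - (k * x)_ij|
    dh = np.diff(x, axis=1)  # horizontal neighbour differences x_{i,j+1} - x_{i,j}
    dv = np.diff(x, axis=0)  # vertical neighbour differences x_{i+1,j} - x_{i,j}
    smooth = (dh ** 2).sum() + (dv ** 2).sum()
    return fit + lam * smooth
```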
Example (Machine learning). Suppose we have inputs (xi , yi ) for i ∈ [n]. We are trying to fit a curve
through these points. Suppose that we hypothesize the curve to be a polynomial hθ (x) = w0 + w1 x + w2 x2 ,
where θ = (w0 , w1 , w2 ) is a vector of the parameters. For each point xi , we want the hypothesized point
hθ (xi ) to be close to yi . Let ℓ(x, y) = (x − y)2 be a (very simple) loss function. Our objective is to minimize
the sum of the loss over all the observed points.
The optimization problem is
min_{θ ∈ R^3} Σ_{i=1}^{n} ℓ(h_θ(x_i), y_i).
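Because ℓ is the squared loss and h_θ is linear in the parameters θ, this particular problem has a closed-form least-squares solution. A minimal sketch with made-up data points follows (NumPy is our choice, not the lecture's):

```python
import numpy as np

# Hypothetical observed points (x_i, y_i).
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([1.2, 2.9, 7.1, 12.8, 21.0])

# Design matrix for h_theta(x) = w0 + w1*x + w2*x^2;
# least squares minimizes the summed squared loss.
X = np.column_stack([np.ones_like(xs), xs, xs ** 2])
theta, *_ = np.linalg.lstsq(X, ys, rcond=None)
print(theta)  # (w0, w1, w2)
```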

1.2 Linear Programming: A way for Optimization

Linear programming is another type of optimization, in which we seek a vector x that maximizes a linear
objective while satisfying some linear constraints. Let x be a vector of size n, and let A (of size m × n),
c (of size n), and b (of size m) be given. Then the problem is:

max_x  c⊤x
subject to  Ax ≤ b
and  x ≥ 0
The above form is the standard form of a linear program. If x = (x1 , x2 , . . . , xn ) and c = (c1 , c2 , . . . , cn ),
then the objective function to be maximized is

c⊤x = c1 · x1 + c2 · x2 + · · · + cn · xn .

For any two vectors u and v, u ≤ v means ui ≤ vi for all i.

Example (Political Winning). Investing money to win the election


Consider a political scenario where a political party P needs to invest money to win elections. There are
3 different demographic classes, class 1, class 2, and class 3, and four issues to be addressed: A, B, C, and
D. There is a specific pattern in which people from different classes respond to different issues. Table 1.2
contains the number of votes gained or lost by the political party in each class per unit money spent on an
issue, the population of each class, and the majority required by the party to win in each class.

Issues         Class 1    Class 2    Class 3
A                  −2          5          3
B                   8          2         −5
C                   0          0         10
D                  10          0         −2
Population    100 000    200 000     50 000
Majority       50 000    100 000     25 000

Table 1.2: Votes gained or lost per unit of money spent on each issue, along with the population of each class and the majority required in it.

The aim of the political party is to minimize the total amount of money it needs to invest, yet get the
required majority across each class. Let x1 , x2 , x3 , x4 be the amount of money the party invests in issues A,
B, C, and D respectively. Then, we have the following optimization problem:

min_{x1, x2, x3, x4}  x1 + x2 + x3 + x4

subject to the constraints

−2x1 + 8x2 + 0x3 + 10x4 ≥ 50 000,   (1.1)
 5x1 + 2x2 + 0x3 +  0x4 ≥ 100 000,  (1.2)
 3x1 − 5x2 + 10x3 − 2x4 ≥ 25 000,   (1.3)

and x1 , x2 , x3 , x4 ≥ 0. Let us look at the optimal solution


x1∗ = 2 050 000/111,   x2∗ = 425 000/111,   x3∗ = 0,   x4∗ = 625 000/111.

The optimal value of x1 + x2 + x3 + x4 = x∗1 + x∗2 + x∗3 + x∗4 = 3 100 000/111.


Multiplying constraints (1.1), (1.2), and (1.3) by 25/222, 46/222, and 14/222 respectively and adding them, we get

x1 + x2 + (140/222) x3 + x4 ≥ 3 100 000/111.

Since x3 ≥ 0, every feasible solution satisfies x1 + x2 + x3 + x4 ≥ x1 + x2 + (140/222) x3 + x4 ≥ 3 100 000/111,
so no feasible solution can achieve a smaller total than 3 100 000/111. Our solution attains exactly this value,
which proves that it is truly the optimal solution.
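As a numerical cross-check (our own addition), the same linear program can be handed to scipy.optimize.linprog. Since linprog minimizes and expects "≤" constraints, the "≥" majority constraints are negated:

```python
from scipy.optimize import linprog

c = [1, 1, 1, 1]                     # minimize x1 + x2 + x3 + x4
A_ub = [[ 2, -8,   0, -10],          # -(1.1)
        [-5, -2,   0,   0],          # -(1.2)
        [-3,  5, -10,   2]]          # -(1.3)
b_ub = [-50_000, -100_000, -25_000]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 4)
print(res.x, res.fun)                # res.fun ≈ 3 100 000 / 111 ≈ 27 927.93
```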

1.2.1 Standard Form of Linear Program

Let x ∈ R^n be the vector containing the variables to optimize and c ∈ R^n be the vector of constants:

x = (x1 , . . . , xn )⊤,   c = (c1 , . . . , cn )⊤.

Then, we can write the standard form of a linear program as max_x c⊤x subject to the constraints

A_{m×n} x_{n×1} ≤ b_{m×1} = (b1 , . . . , bm )⊤   and   x ≥ 0,

where ≥ represents element-wise greater than or equal to:

x ≥ 0  ⟺  xi ≥ 0 for all i.

The aforementioned problem is commonly referred to as the primal problem and it is accompanied by a
corresponding dual problem:

Primal Problem:   max_x  c⊤x   s.t.  Ax ≤ b,  x ≥ 0.
Dual Problem:     min_y  b⊤y   s.t.  A⊤y ≥ c,  y ≥ 0.

1.2.1.1 Principles of Duality — Strong & Weak Duality

We will look at two famous and ubiquitous duality principles, weak duality and strong duality, and provide
a proof of the first in these notes.

Theorem 1.1 (Weak Duality Principle) Let x and y represent feasible solutions, i.e., solutions that
satisfy all the constraints, for the primal and dual problems, respectively. Then,

b⊤ y ≥ c⊤ x.

Proof. Since x is a feasible solution of the primal problem,

Ax ≤ b,

and hence

x⊤A⊤ ≤ b⊤.

Since y ≥ 0, right-multiplying both sides by y does not change the direction of the inequality:

x⊤A⊤y ≤ b⊤y.

Since y is a feasible solution of the dual problem, A⊤y ≥ c, and because x ≥ 0, left-multiplying by x⊤
preserves this inequality: x⊤A⊤y ≥ x⊤c. Combining the two,

b⊤y ≥ x⊤A⊤y ≥ x⊤c,

which can also be written as b⊤y ≥ c⊤x. Thus, the weak duality principle shows that the objective value of
any feasible dual solution is an upper bound on the objective value of any feasible primal solution.

Theorem 1.2 (Strong Duality Principle) For any optimal solution x∗ of the primal problem and any
optimal solution y∗ of the dual problem, the two optimal values coincide:

b⊤y∗ = c⊤x∗.
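As a small numerical illustration (our own, not from the lecture), one can solve a toy primal and its dual with scipy.optimize.linprog and observe that the two optimal values coincide, as strong duality predicts, with neither exceeding the other, as weak duality guarantees:

```python
import numpy as np
from scipy.optimize import linprog

# A toy primal:  max c^T x  s.t.  A x <= b,  x >= 0.
A = np.array([[1.0, 2.0],
              [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([3.0, 2.0])

# linprog minimizes, so pass -c for the primal objective.
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)

# Dual:  min b^T y  s.t.  A^T y >= c,  y >= 0, rewritten as  -A^T y <= -c.
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2)

print(-primal.fun, dual.fun)  # both equal 7.2: strong duality holds for this instance
```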
