Optimization Lecture Notes
Resource allocation problems involve optimizing the use of limited resources to achieve a
specific objective. These problems arise across multiple fields, including manufacturing,
transportation, finance, and logistics. Linear Programming (LP) is a mathematical approach
to solving these problems.
Objective: The goal to be maximized or minimized (e.g., maximize profit, minimize cost).
Scenario:
A factory produces two types of products, Product A and Product B. The factory has the
following resources available:
Material: 18 units.
Labor: 16 hours.
Each unit of Product A requires 3 units of material and 2 hours of labor; each unit of
Product B requires 2 units of material and 4 hours of labor.
Let x1 represent the number of units of Product A produced, and x2 the number of units of
Product B produced.
Objective Function: maximize total profit, Z = c1 x1 + c2 x2 , where c1 and c2 denote the
per-unit profits of Products A and B.
Constraints:
Material:
3x1 + 2x2 ≤ 18
Labor:
2x1 + 4x2 ≤ 16
Non-negativity:
x1 , x2 ≥ 0
The line 3x1 + 2x2 = 18 intersects x1 -axis at (6, 0) and x2 -axis at (0, 9).
The line 2x1 + 4x2 = 16 intersects x1 -axis at (8, 0) and x2 -axis at (0, 4).
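These intersection points can be verified numerically. The sketch below computes the corner points of the feasible region with NumPy and, assuming illustrative per-unit profits of 5 and 4 (the actual profit figures are not given above), evaluates the objective at each vertex:

```python
import numpy as np

# Constraint lines: 3*x1 + 2*x2 = 18 (material), 2*x1 + 4*x2 = 16 (labor).
A = np.array([[3.0, 2.0],
              [2.0, 4.0]])
b = np.array([18.0, 16.0])

corner = np.linalg.solve(A, b)               # intersection of the two lines -> [5.0, 1.5]

# Candidate vertices: origin, the binding intercepts, and the intersection.
vertices = [(0.0, 0.0), (6.0, 0.0), (0.0, 4.0), tuple(corner)]

c = np.array([5.0, 4.0])                     # hypothetical per-unit profits (assumed)

for v in vertices:
    x = np.array(v)
    feasible = np.all(A @ x <= b + 1e-9) and np.all(x >= 0)
    print(v, "feasible:", feasible, "Z =", float(c @ x))
```

Under these assumed profits, the maximum occurs at the intersection vertex (5, 1.5); with different profit coefficients another vertex could win, which is exactly why the graphical method checks all of them.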
Dimensionality: The graphical method is restricted to problems with two variables (or
three variables in rare cases using 3D visualization).
Accuracy: Graphical solutions may lack precision, especially for fractional solutions.
Scalability: Real-world problems often involve hundreds or thousands of variables,
making this approach impractical.
Optimization involves finding the best solution to a problem under a given set of constraints.
The solution is optimal if it either maximizes or minimizes a specific objective function.
Objective Function: A function representing the goal of the problem (e.g., profit, cost,
distance).
Feasible Region: The set of all solutions that satisfy the constraints.
Optimization problems can be classified into different types based on the nature of the
objective function and constraints. Linear Programming (LP) is one such type where both
the objective function and constraints are linear.
The constraints are linear equations or inequalities.
Decision variables are typically non-negative (though this may vary in some cases).
A general LP can be written as:
Maximize (or Minimize) Z = c1 x1 + c2 x2 + ⋯ + cn xn
Subject to:
a set of linear constraints, and
x1 , x2 , … , xn ≥ 0,
where:
x1 , x2 , … , xn : Decision variables.
Objective Function:
A linear combination of decision variables, e.g., Z = c 1 x1 + c 2 x2 + ⋯ + c n xn .
Constraints:
Linear inequalities or equations of the form
ai1 x1 + ai2 x2 + ⋯ + ain xn ≤ bi ,  = bi ,  or ≥ bi ,  for i = 1, … , m.
Non-Negativity Condition:
xj ≥ 0 for all j .
4. Examples of Linear Programs
6. Conclusion
Lecture 3: Gaussian Elimination with Examples
Ax = b,
where A is the m × n coefficient matrix, x the vector of unknowns, and b the right-hand-side vector.
The method proceeds in two phases:
Forward Elimination: Reduce the system to upper triangular form.
Backward Substitution: Solve for the unknowns starting from the last equation.
The system is represented compactly by the augmented matrix
[A∣b]
Gaussian elimination uses three elementary row operations:
1. Swap two rows.
2. Multiply a row by a nonzero scalar.
3. Add or subtract a multiple of one row from another.
3. Step-by-Step Algorithm
Begin with the first row, and use it to eliminate the leading coefficient in all rows below.
Move to the next row and repeat the process for the next pivot column.
Continue until the matrix is in upper triangular form, where all elements below the main
diagonal are zero.
Substitute this value into the preceding equations to find other variables iteratively.
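A minimal NumPy sketch of the procedure just described (forward elimination without row swaps, then backward substitution; it assumes every pivot encountered is nonzero):

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve Ax = b by forward elimination and backward substitution."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)

    # Forward elimination: zero out the entries below each pivot.
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]          # elimination multiplier
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]

    # Backward substitution: solve from the last equation upward.
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

# The 3x3 example solved later in this lecture:
A = np.array([[1, 1, 1], [2, 3, 7], [1, 3, 1]])
b = np.array([6, 18, 10])
print(gaussian_elimination(A, b))   # -> [3.2, 2.0, 0.8]
```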
Solve:
2x1 + x2 = 5,
4x1 − 6x2 = −2.
[ 2   1 |  5 ]
[ 4  −6 | −2 ]
Eliminate the first element of the second row using the first row:
Replace Row 2 with Row 2 − 2 × Row 1.
[ 2   1 |   5 ]
[ 0  −8 | −12 ]
Divide Row 2 by −8 to obtain a leading 1:
[ 2  1 |   5 ]
[ 0  1 | 1.5 ]
From Row 2: x2 = 1.5. Substituting into Row 1:
2x1 + 1(1.5) = 5 ⟹ x1 = 1.75.
Solution: x1 = 1.75, x2 = 1.5.
Solve:
x1 + x2 + x3 = 6,
2x1 + 3x2 + 7x3 = 18,
x1 + 3x2 + x3 = 10.
[ 1  1  1 |  6 ]
[ 2  3  7 | 18 ]
[ 1  3  1 | 10 ]
Eliminate the first column below the pivot: replace Row 2 with Row 2 − 2 × Row 1 and Row 3 with Row 3 − Row 1:
[ 1  1  1 | 6 ]
[ 0  1  5 | 6 ]
[ 0  2  0 | 4 ]
Eliminate the second element of Row 3 using Row 2:
Replace Row 3 with Row 3 − 2 × Row 2:
[ 1  1   1 |  6 ]
[ 0  1   5 |  6 ]
[ 0  0 −10 | −8 ]
From Row 3: −10x3 = −8 ⟹ x3 = 0.8. Substitute x3 = 0.8 into Row 2:
x2 + 5(0.8) = 6 ⟹ x2 = 2.
x1 + 2 + 0.8 = 6 ⟹ x1 = 3.2.
The method is computationally efficient and forms the basis for more advanced
numerical algorithms.
Gaussian elimination provides a systematic approach to solve linear systems, a critical step in
studying the feasible set of a linear program.
The method transforms the augmented matrix [A∣b] into a row-echelon form (or reduced
row-echelon form in some cases) to make solving for x straightforward.
Step 1: Augmented Matrix Representation
Write the system Ax = b as the augmented matrix [A∣b]:
[ a11  a12  ⋯  a1n | b1 ]
[ a21  a22  ⋯  a2n | b2 ]
[  ⋮    ⋮        ⋮ |  ⋮ ]
[ am1  am2  ⋯  amn | bm ]
Select the pivot element akk in the k -th row, where k represents the current step.
Use the pivot row to eliminate all entries below the pivot in the same column by
subtracting appropriate multiples of the pivot row from the rows below.
3. Classification of Outcomes
Case 1: Unique Solution
If A is a square matrix (i.e., m = n) and all pivot elements are nonzero, the system has a
unique solution.
Example:
A = [ 2  1 ]    b = [ 5 ]
    [ 1  3 ],       [ 7 ].
Case 2: Infinitely Many Solutions
If one or more rows of the reduced matrix become 0 = 0, the system is consistent but
has infinitely many solutions. This occurs when the rank of the matrix A is less than the
number of variables n.
Example:
x1 + x2 + x3 = 6,
2x1 + 2x2 + 2x3 = 12.
Case 3: No Solution
If a row of the reduced matrix becomes 0 = c for some nonzero constant c, the system is inconsistent and has no solution.
4. Pivoting Strategies
Partial Pivoting:
In partial pivoting, the row with the largest absolute value of the pivot element in the
current column is swapped with the current row.
Complete Pivoting:
In complete pivoting, the largest absolute value in the entire remaining submatrix is
selected as the pivot.
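As an illustration, the elimination sketch shown earlier can be extended with a partial-pivoting step; this hypothetical helper swaps in the best available pivot row before each elimination pass:

```python
import numpy as np

def pivot_step(A, b, k):
    """Partial pivoting: swap row k with the row holding the largest
    absolute entry in column k (among rows k and below)."""
    p = k + int(np.argmax(np.abs(A[k:, k])))
    if p != k:
        A[[k, p]] = A[[p, k]]
        b[[k, p]] = b[[p, k]]

# Usage: call pivot_step(A, b, k) at the top of each elimination pass k.
```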
The time complexity of Gaussian elimination is O(n3 ), making it efficient for systems of
small to moderate size. However, for large systems, iterative methods such as Jacobi or
Gauss-Seidel, or matrix factorization methods like LU decomposition, are often preferred.
6. Summary of Gaussian Elimination
Key Points:
The method handles consistent systems (unique or infinite solutions) and identifies
inconsistent systems (no solution).
Applications:
Gaussian elimination is integral to many areas of applied mathematics, including solving linear systems, computing ranks and determinants, and matrix factorizations such as LU decomposition.
A vector space is a mathematical structure that allows us to generalize the concept of linear
combinations, which is central to understanding the solution sets of linear equations. It
consists of a set of objects called vectors, defined over a field (in this case, the field of real
numbers R), along with two operations: vector addition and scalar multiplication.
Vector spaces are essential in linear algebra as they provide a framework for discussing
linear independence, basis, dimension, and subspaces—all of which are important for
understanding the geometry of solutions to linear equations.
A set V , equipped with vector addition and scalar multiplication, is called a vector space over the real numbers R if the standard axioms are satisfied: closure under both operations, associativity and commutativity of addition, existence of a zero vector and additive inverses, and compatibility and distributivity of scalar multiplication.
Examples include Rn under componentwise operations, and the set of polynomials with real coefficients, which forms a vector space under:
Polynomial addition.
Scalar multiplication of polynomials.
4. Non-Examples
1. Set of Integers Zn
Although Zn satisfies closure under addition, it does not satisfy closure under scalar
multiplication since multiplying an integer by a non-integer scalar results in a non-
integer.
5. Visual Representation
In R2 :
The operations of addition and scalar multiplication preserve the structure of the vector
space, i.e., all results remain within the space.
6. Key Properties of Vector Spaces
2. Analyzing the null space, row space, and column space of matrices.
Vector spaces are foundational to linear algebra and optimization, enabling precise
characterization of feasible regions, constraints, and objective functions in linear
programming.
A linear operator is a function that maps vectors from one vector space to another while
preserving the operations of vector addition and scalar multiplication. Matrices are
representations of linear operators when the vector spaces involved are finite-dimensional.
Given two vector spaces V and W over the field R, a function T : V → W is a linear
operator if T (u + v) = T (u) + T (v) and T (αv) = αT (v) for all u, v ∈ V and all α ∈ R.
For finite-dimensional vector spaces, such operators can be represented using matrices, with
the action of T expressed as matrix-vector multiplication:
T (x) = Ax, where A is an m × n real matrix.
A matrix A is a representation of a linear operator and encodes how the operator transforms
basis vectors of the domain vector space Rn into vectors in the codomain Rm .
Given A as an m × n matrix:

A = [ a11  a12  ⋯  a1n ]
    [  ⋮    ⋮        ⋮ ]
    [ am1  am2  ⋯  amn ]

and a vector x = (x1 , x2 , … , xn )⊤ ∈ Rn , the matrix-vector product is

Ax = [ a11 x1 + a12 x2 + ⋯ + a1n xn ]
     [               ⋮              ]
     [ am1 x1 + am2 x2 + ⋯ + amn xn ]
3. Key Properties of Matrices as Linear Operators
(a) Rank of a Matrix
The rank of A indicates the number of dimensions in Rm that are "spanned" by the
transformed vectors.
(b) Kernel of a Matrix
The kernel (or null space) of A, denoted ker(A), is the set of all vectors x ∈ Rn satisfying
Ax = 0.
Example:
Let
A = [ 1  2 ]
    [ 3  6 ].
To find the kernel, solve:
[ 1  2 ] [ x1 ]   [ 0 ]
[ 3  6 ] [ x2 ] = [ 0 ].
This simplifies to x1 + 2x2 = 0, so ker(A) = span( (−2, 1)⊤ ).
(c) Image of a Matrix
The image (or column space) of a matrix A, denoted im(A), is the span of the columns of A.
It is the set of all vectors in Rm that can be expressed as Ax for some x ∈ Rn :
Example:
For the matrix A above, the image is spanned by (1, 3)⊤ , since the second column (2, 6)⊤ is a scalar multiple of the first:
im(A) = span( (1, 3)⊤ ).
4. Rank-Nullity Theorem
rank(A) + nullity(A) = n,
where n is the number of columns of A, rank(A) = dim im(A), and nullity(A) = dim ker(A).
This theorem provides a fundamental relationship between the row space, column space,
and null space of A.
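The theorem can be checked directly on the 2 × 2 example above; a small SymPy sketch:

```python
import sympy as sp

A = sp.Matrix([[1, 2], [3, 6]])

rank = A.rank()                 # dimension of im(A) -> 1
null_basis = A.nullspace()      # basis of ker(A) -> [Matrix([-2, 1])]
nullity = len(null_basis)

# rank + nullity equals the number of columns n = 2.
assert rank + nullity == A.cols
print(rank, null_basis, nullity)
```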
1. Geometry:
Linear operators describe transformations such as rotations, reflections, scalings, and
projections in vector spaces.
2. Linear Equations:
Solving systems of equations Ax = b involves understanding the rank, kernel, and
image of A.
3. Optimization:
Linear operators are crucial in defining constraints and objective functions in linear
programming.
4. Computer Graphics:
Matrices are used as linear operators for rendering, transforming, and manipulating
graphical objects.
6. Summary
A linear operator maps vectors between vector spaces while preserving linearity.
Matrices serve as representations of linear operators, with rank, kernel, and image as
key properties.
The rank-nullity theorem provides a critical link between the dimensions of the kernel,
image, and the domain.
Linear operators and their properties are foundational to linear algebra, forming the basis
for deeper analysis in optimization, numerical computation, and theoretical mathematics.
1. Introduction
The solution of a system of linear equations is central to linear algebra and optimization.
Given a matrix A ∈ Rm×n and vectors x ∈ Rn and b ∈ Rm , the system of equations can be
expressed in matrix form as:
Ax = b.
1. Homogeneous system: Ax = 0.
2. Non-homogeneous system: Ax = b, where b ≠ 0.
2. Homogeneous System (Ax = 0)
The homogeneous system always has at least one solution, x = 0 (trivial solution). The
complete solution depends on the rank of A and the dimension of the kernel (ker(A)).
Solution Structure:
If A has full column rank (rank(A) = n), the kernel is trivial (x = 0 only).
If rank(A) < n, there exist infinitely many solutions forming a vector subspace of Rn ,
with dimension equal to the nullity (n − rank(A)).
General Solution:
The general solution is given by:
x = c1 v1 + c2 v2 + ⋯ + ck vk , where v1 , … , vk form a basis of ker(A) and c1 , … , ck ∈ R.
Example: Let
A = [ 1  2  3 ]
    [ 4  5  6 ].
Row reduction gives:
[ 1  2  3 ]  →  [ 1  2  3 ]
[ 4  5  6 ]     [ 0 −3 −6 ].
Solving Ax = 0: from the second row, x2 = −2x3 ; substituting into the first row, x1 = x3 , with x3 free. Hence
ker(A) = span( (1, −2, 1)⊤ ).
3. Non-Homogeneous System (Ax = b)
1. Consistent system:
If b ∈ im(A), the system has at least one solution.
2. Inconsistent system:
If b ∉ im(A), the system has no solution.
General Solution:
When solutions exist, the general solution is the sum of a particular solution xp (any single solution of Axp = b) and a homogeneous solution xh ∈ ker(A). Thus,
x = xp + xh .
Example:
Let
A = [ 1  2  3 ],  b = [ 14 ]
    [ 4  5  6 ]       [ 32 ].
Row reducing the augmented matrix:
[ 1  2  3 | 14 ]  →  [ 1  2  3 |  14 ]
[ 4  5  6 | 32 ]     [ 0 −3 −6 | −24 ].
Setting the free variable x3 = 0 yields the particular solution xp = (−2, 8, 0)⊤ ,
and the general solution is: x = xp + c v, where v = (1, −2, 1)⊤ spans ker(A) and c ∈ R.
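The same particular-plus-homogeneous decomposition can be reproduced symbolically; a short SymPy sketch of this example:

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
A = sp.Matrix([[1, 2, 3], [4, 5, 6]])
b = sp.Matrix([14, 32])

# General solution: a particular solution plus the kernel direction.
sol = sp.linsolve((A, b), [x1, x2, x3])
print(sol)   # {(x3 - 2, 8 - 2*x3, x3)}, i.e. xp = (-2, 8, 0) plus x3 * (1, -2, 1)
```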
1. Homogeneous system: The solution set is ker(A), a subspace of Rn .
2. Non-homogeneous system: The solution set is a translation of the kernel subspace by
the particular solution xp .
3. Rank-nullity theorem links the dimensions of the kernel and image to the total number
of variables n.
Understanding these principles is essential for analyzing linear systems, forming the
foundation for linear programming and optimization problems.
Resource allocation involves distributing limited resources (e.g., time, money, raw materials)
to competing activities to achieve an optimal outcome, typically maximizing profit or
minimizing cost. Such problems are prevalent in industries like manufacturing, logistics,
finance, and healthcare.
2. General Formulation of Resource Allocation as LP
Maximize z = c⊤ x
subject to:
Ax ≤ b, x ≥ 0,
where x is the vector of activity levels, c the vector of per-unit profits (or costs), A the matrix of per-unit resource consumption, and b the vector of resource availabilities.
Problem Statement:
A factory produces two types of products, P1 and P2 , using three resources: labor hours,
machine hours, and raw materials. Each unit of P1 and P2 consumes these resources as shown in the table below.
Data:
Resource constraints:

| Resource      | P1 Usage | P2 Usage | Availability |
|---------------|----------|----------|--------------|
| Labor hours   | 4        | 3        | 240 hours    |
| Machine hours | 2        | 2        | 200 hours    |
| Raw materials | 1        | 2        | 120 units    |
Formulation as LP:
Define decision variables: let x1 and x2 denote the number of units of P1 and P2 produced, respectively.
Objective function (maximize total profit):
z = 50x1 + 40x2 .
Constraints:
1. Labor hours: 4x1 + 3x2 ≤ 240.
2. Machine hours: 2x1 + 2x2 ≤ 200.
3. Raw materials: x1 + 2x2 ≤ 120.
4. Non-negativity: x1 , x2 ≥ 0.
The complete LP:
Maximize z = 50x1 + 40x2
subject to:
4x1 + 3x2 ≤ 240,
2x1 + 2x2 ≤ 200,
x1 + 2x2 ≤ 120,
x1 , x2 ≥ 0.
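This LP can be solved directly; a minimal sketch using SciPy's linprog (which minimizes, so the maximization objective is negated):

```python
from scipy.optimize import linprog

# Maximize 50*x1 + 40*x2  <=>  minimize -50*x1 - 40*x2.
c = [-50, -40]
A_ub = [[4, 3],   # labor hours
        [2, 2],   # machine hours
        [1, 2]]   # raw materials
b_ub = [240, 200, 120]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)   # optimal plan (24, 48) with profit 3120
```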
4. Geometric Representation
The feasible region of the LP is a convex polytope in R2 , defined by the intersection of the
half-spaces given by the constraints. The optimal solution lies at a vertex of this polytope.
For example:
Evaluate the objective function z at each vertex of the feasible region to find the
maximum value.
1. Optimality: LP guarantees finding the best solution under given constraints if one exists.
2. Scalability: LP solvers can handle large-scale problems with many variables and
constraints.
4. Interpretability: Solutions are mathematically rigorous and provide insight into the
trade-offs between competing activities.
7. Summary
Solving such LPs helps industries achieve optimal resource usage efficiently.
LP's geometric representation and solution methods (e.g., graphical, simplex algorithm)
provide systematic approaches for solving real-world problems.
A Boolean function is a function that takes binary inputs (i.e., inputs from {0, 1}n ) and
outputs a value from {0, 1}. These functions are of central importance in areas such as logic
circuits, computational complexity, and optimization.
Such functions are defined on the Boolean hypercube {0, 1}n . For a Boolean function f : {0, 1}n → {0, 1}, its polynomial representation is a polynomial Pf (x1 , x2 , … , xn ) over F2 , where Pf can be written as a sum of monomials:
Pf (x1 , x2 , … , xn ) = ∑_{S⊆[n]} αS ∏_{i∈S} xi .
Here, αS ∈ F2 are coefficients, and each term ∏_{i∈S} xi is a monomial. For example, the XOR function f (x1 , x2 ) = x1 ⊕ x2 has the polynomial representation
Pf (x1 , x2 ) = x1 + x2 .
Often, however, an exact representation is more than we need, and it suffices for a polynomial to agree with f only in an approximate sense.
Formally, the approximate degree degϵ (f ) is defined as the smallest degree of a polynomial
P for which:
Pr_{x∈{0,1}^n} [f (x) ≠ P (x)] ≤ ϵ,
where ϵ is a small error probability. In other words, the polynomial P should agree with f on
most inputs (with a probability of at least 1 − ϵ).
In this lecture, we will define a linear program (LP) to find the approximate degree of a
Boolean function, which is a more involved example of linear programming.
To compute the approximate degree of a Boolean function, we need to find the lowest-
degree polynomial that approximates the function within an error tolerance ϵ. This can be
framed as a linear program in the following way:
1. Decision Variables:
We will introduce decision variables for the coefficients of the polynomial. If
f (x1 , x2 , … , xn ) is the Boolean function, we aim to find a polynomial of the form:
P (x1 , x2 , … , xn ) = ∑_{S⊆[n]} αS ∏_{i∈S} xi ,
where the coefficients αS are our decision variables.
2. Objective Function:
The objective is to minimize the degree of the polynomial, subject to constraints that the
polynomial approximates the Boolean function f within an error of ϵ. Thus, we are
minimizing the maximum degree of the polynomial such that the approximation error is
bounded by ϵ.
3. Constraints:
For each possible input x = (x1 , x2 , … , xn ) ∈ {0, 1}n , we need the polynomial P (x) to stay within the error tolerance of f (x):
∣f (x) − P (x)∣ ≤ ϵ,
4. Linearization of Constraints:
Although ∣f (x) − P (x)∣ ≤ ϵ involves an absolute value, it is equivalent to the pair of linear constraints P (x) − f (x) ≤ ϵ and f (x) − P (x) ≤ ϵ. Since P (x) is linear in the coefficients αS , each input x therefore contributes two linear constraints to the LP.
Let f (x1 , x2 ) = x1 ⊕ x2 . We want to approximate this function using a polynomial of low degree; a numerical sketch of the resulting LP appears below.
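The following sketch builds the LP over the reals for a fixed maximum degree d and minimizes the error ϵ (the function and variable names are illustrative, not from the original notes). For XOR it reports ϵ = 0.5 at degree 1 and ϵ = 0 at degree 2, reflecting that x1 + x2 − 2x1x2 represents XOR exactly over the reals:

```python
from itertools import combinations, product
from scipy.optimize import linprog

def min_error(f, n, d):
    """Smallest eps such that some real polynomial of degree <= d
    satisfies |f(x) - P(x)| <= eps for every x in {0,1}^n."""
    monomials = [S for k in range(d + 1) for S in combinations(range(n), k)]
    c = [0.0] * len(monomials) + [1.0]       # minimize eps (the last variable)
    A_ub, b_ub = [], []
    for x in product([0, 1], repeat=n):
        row = [1.0 if all(x[i] for i in S) else 0.0 for S in monomials]
        fx = float(f(x))
        A_ub.append(row + [-1.0]); b_ub.append(fx)                  # P(x) - eps <= f(x)
        A_ub.append([-r for r in row] + [-1.0]); b_ub.append(-fx)   # f(x) - P(x) <= eps
    bounds = [(None, None)] * len(monomials) + [(0, None)]
    return linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).fun

xor = lambda x: x[0] ^ x[1]
print(min_error(xor, n=2, d=1))   # ~0.5: no degree-1 polynomial does better
print(min_error(xor, n=2, d=2))   # ~0.0: degree 2 suffices exactly
```

Searching over d from 0 upward and returning the first d whose optimal ϵ is at most the target error gives the approximate degree.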
For a more complicated Boolean function, say a function with higher complexity or more
variables, the LP formulation will find the polynomial with the smallest degree that
approximates the function within the given error bounds. The LP solver would minimize the
degree while ensuring the constraints are satisfied.
2. Query Complexity: In decision tree and query complexity, the approximate degree
provides bounds on how efficiently a function can be approximated by a decision tree.
3. Learning Theory: Approximate degree is used to characterize the complexity of learning
Boolean functions in the framework of learning theory.
7. Summary
In linear programming (LP), problems can often be presented in various forms, with different
types of constraints and objective functions. A critical concept in LP is that many different
formulations of a problem can be equivalent, meaning that they represent the same
optimization problem and yield the same solution, but the structure of the problem may
differ.
It is often useful to convert LPs into a "standard form" because it allows us to streamline
solution methods (e.g., the simplex algorithm) and avoid dealing with multiple cases for
different types of constraints. This standardization makes solving LPs easier and more
systematic.
In this lecture, we will explore several methods for converting LPs into equivalent forms and
discuss why these transformations are important. By doing so, we will ensure that we can
always work with a "canonical" form of LP that is easier to manipulate and solve.
2. Standard Forms of LP
There are several common forms for representing an LP, with the most widely used being:
Standard (Equational) Form:
Maximize c⊤ x
subject to:
Ax = b, x ≥ 0.
In this form, all the constraints are equalities and all variables are non-negative.
Canonical (Inequality) Form:
Maximize c⊤ x
subject to:
Ax ≤ b, x ≥ 0.
In this form, the constraints are in the form of inequalities, with non-negative
variables.
Dual Form: The dual of an LP can also be represented in a standard form, which is
derived by associating the primal problem's variables with constraints in the dual
problem. We will revisit this duality in later lectures.
One of the most common types of transformation involves converting inequality constraints
into equality constraints. This is important because the simplex method, for example, works
with equality constraints.
Consider inequality constraints of the form:
Ax ≤ b.
Introducing a vector s of slack variables, one per constraint, converts this to:
Ax + s = b, s ≥ 0.
The vector s represents the "slack" or unused portion of the resource represented by the
constraint. This ensures that the equation holds while keeping the variable s non-negative.
Example:
30/212
For the constraint 2x1 + 3x2 ≤ 10, introduce a slack variable s ≥ 0 and rewrite the constraint as:
2x1 + 3x2 + s = 10.
Similarly, consider greater-than-or-equal constraints of the form:
Ax ≥ b.
Subtracting a non-negative surplus variable from each row converts this to:
Ax − t = b, t ≥ 0.
The surplus variable t represents the amount by which the left-hand side exceeds the
required bound.
Example:
For a constraint of the form a⊤ x ≥ β , introduce a surplus variable t ≥ 0 and rewrite the constraint as:
a⊤ x − t = β .
Another standard transformation is to ensure that all decision variables are non-negative. If
a variable xi is constrained by:
xi ∈ R,
i.e., xi can take any real value, we can rewrite it as two non-negative variables:
xi = xi⁺ − xi⁻ ,  xi⁺ , xi⁻ ≥ 0.
Example:
A free variable x1 is replaced by
x1 = x1⁺ − x1⁻ ,  x1⁺ , x1⁻ ≥ 0.
The canonical form described earlier applies to maximization problems. However, we may
encounter minimization problems. In this case, we can convert a minimization problem into
a maximization problem by multiplying the objective function by −1.
Minimize c⊤ x
becomes:
Maximize − c⊤ x.
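These transformations are mechanical to apply; a small NumPy sketch (illustrative, not a general-purpose converter) shows the sign flip and slack-variable steps on a toy problem:

```python
import numpy as np

# Original problem: minimize  c^T x  subject to  A x <= b,  x >= 0.
c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0],
              [2.0, 1.0]])
b = np.array([4.0, 5.0])

m, n = A.shape

# 1) Minimization -> maximization: negate the objective.
c_max = -c

# 2) Inequalities -> equalities: append one slack variable per row,
#    giving  [A | I] [x; s] = b  with  x, s >= 0.
A_eq = np.hstack([A, np.eye(m)])
c_eq = np.concatenate([c_max, np.zeros(m)])

print(A_eq)   # [[1. 1. 1. 0.], [2. 1. 0. 1.]]
print(c_eq)   # [-2. -3.  0.  0.]
```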
6. Summary of Transformations
Less than or equal to constraints can be converted into equalities by introducing slack variables.
Greater than or equal to constraints can be converted into equalities by introducing surplus variables.
Free variables can be replaced by the difference of two non-negative variables.
Minimization problems can be converted into maximization problems by negating the objective.
7. Conclusion
By converting an LP into an equivalent form, we can simplify the problem and apply standard
solution methods like the simplex algorithm more easily. The ability to manipulate the form
of an LP is essential for solving problems efficiently and is a powerful tool in both theory and
practice. By ensuring all constraints are in equality form and all variables are non-negative,
we can work with a "canonical" LP form that standardizes the problem and reduces
complexity.
These transformations not only streamline the computational process but also help in
understanding the relationships between different types of LP formulations, ensuring
consistency in optimization solutions.
In linear programming, new feasible solutions can often be formed as combinations of known
feasible solutions, where the coefficients are non-negative. These combinations, and the
geometric properties of the resulting feasible regions, give rise to the theory of convexity.
In this lecture, we will introduce fundamental concepts related to convexity, starting with
linear combinations, affine combinations, and conic combinations. These ideas will lead to
the more specific notion of convex combinations, which play a central role in optimization
and linear programming.
2. Linear Combinations
A linear combination of vectors is a sum of scalar multiples of those vectors. Given vectors
v1 , v2 , … , vk ∈ Rn , a linear combination of these vectors is expressed as:
v = α1 v1 + α2 v2 + ⋯ + αk vk ,
where α1 , α2 , … , αk are scalars, called the coefficients. Importantly, the coefficients can be any real numbers: positive, negative, or zero.
Example: Given vectors v1 = (1, 0)⊤ and v2 = (0, 1)⊤ , a linear combination of these vectors could be:
v = 3 v1 + (−2) v2 = (3, −2)⊤ .
3. Affine Combinations
An affine combination of vectors is similar to a linear combination but with the additional
constraint that the coefficients sum to 1. Formally, given vectors v1 , v2 , … , vk ∈ Rn , an
affine combination is expressed as:
v = α1 v1 + α2 v2 + ⋯ + αk vk ,  with  α1 + α2 + ⋯ + αk = 1.
Affine combinations allow us to express points that lie on the affine span of the vectors,
which is the flat affine subspace formed by the given vectors.
Example: For vectors v1 = (1, 0)⊤ and v2 = (0, 1)⊤ , an affine combination with α1 + α2 = 1 could be:
v = 0.4 v1 + 0.6 v2 = (0.4, 0.6)⊤ .
This point lies on the line segment connecting v1 and v2 , illustrating that affine combinations describe points on the line through the given vectors.
4. Conic Combinations
A conic combination is a special type of linear combination where the coefficients are non-
negative. In other words, given vectors v1 , v2 , … , vk ∈ Rn , a conic combination is
expressed as:
v = α1 v1 + α2 v2 + ⋯ + αk vk ,
where αi ≥ 0 for all i. Conic combinations are used to describe points in a cone, a set that is
closed under linear combinations with non-negative coefficients.
Example: Given the vectors v1 = (1, 0)⊤ and v2 = (0, 1)⊤ , a conic combination could be:
v = 0.5 v1 + 0.3 v2 = (0.5, 0.3)⊤ .
Here, both coefficients are non-negative, indicating that the point lies within the cone formed by the vectors v1 and v2 .
5. Convex Combinations
A convex combination is a particular type of conic combination in which the coefficients are
not only non-negative, but they also sum to 1. Formally, given vectors v1 , v2 , … , vk ∈ Rn ,
a convex combination is given by:
v = α1 v1 + α2 v2 + ⋯ + αk vk ,
where αi ≥ 0 for all i, and α1 + α2 + ⋯ + αk = 1.
Convex combinations are particularly important in the context of optimization because they
allow us to describe points inside a convex set, which is central to the theory of convex
optimization.
Example: For the vectors v1 = (1, 0)⊤ and v2 = (0, 1)⊤ , a convex combination could be:
v = 0.3 v1 + 0.7 v2 = (0.3, 0.7)⊤ .
This point lies on the segment connecting v1 and v2 (an edge of the triangle those vectors form with the origin), representing a convex combination of the two.
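The four kinds of combinations differ only in the conditions imposed on the coefficients; a small sketch making those checks explicit:

```python
import numpy as np

def combination_type(alphas, tol=1e-9):
    """Classify a coefficient vector; every such vector gives a linear combination."""
    a = np.asarray(alphas, dtype=float)
    nonneg = bool(np.all(a >= -tol))
    sums_to_one = abs(a.sum() - 1.0) <= tol
    if nonneg and sums_to_one:
        return "convex (hence also conic, affine, and linear)"
    if nonneg:
        return "conic"
    if sums_to_one:
        return "affine"
    return "linear only"

print(combination_type([3, -2]))       # linear only
print(combination_type([1.5, -0.5]))   # affine: sums to 1 but has a negative entry
print(combination_type([0.5, 0.3]))    # conic: non-negative, sums to 0.8
print(combination_type([0.3, 0.7]))    # convex: non-negative and sums to 1
```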
6. Convex Sets
A set C is called convex if, for any two points x, y ∈ C , the entire line segment connecting
them lies entirely within the set. In other words, for all t ∈ [0, 1], the point tx + (1 − t)y
must also lie in C .
Example: Consider the set C of points inside a triangle with vertices at v1 = (1, 0)⊤ , v2 = (0, 1)⊤ , and v3 = (0, 0)⊤ . Any convex combination of these points lies within the triangle, hence the set is convex.
8. Summary
A convex combination is a conic combination where the coefficients also sum to 1.
A convex set is one where the line segment between any two points in the set lies
entirely within the set.
In this lecture, we will explore different types of convex sets, affine sets, and conic sets
through concrete examples, starting with the case in two dimensions. Understanding these
sets is crucial for visualizing how convexity applies to optimization problems in linear
programming. Convex sets are important because they form the feasible regions for
optimization problems, and convex combinations of points in these sets are also contained
within the set.
We will define each of these sets and provide visual examples in 2D. Later, we will generalize
the concepts to higher dimensions, which is often necessary in optimization problems.
2. Convex Sets
A set C ⊂ Rn is called convex if, for any two points x, y ∈ C , the entire line segment
connecting these two points lies entirely within the set. This is mathematically defined as:
tx + (1 − t)y ∈ C for all x, y ∈ C and all t ∈ [0, 1].
This property implies that if you pick any two points in the set, the points between them,
formed by convex combinations, will also belong to the set.
Example in 2D:
Consider the set C defined by a disk in R2 with center (0, 0) and radius 1. This disk is a
convex set because any line segment between two points inside the disk will remain inside
the disk.
Formally,
C = {(x, y) ∈ R2 : x2 + y 2 ≤ 1}.
For any two points x = (x1 , y1 ) and y = (x2 , y2 ) inside the disk, and any t ∈ [0, 1], the point tx + (1 − t)y also lies inside the disk.
Example in 3D:
A sphere in R3 , like the disk in 2D, is a convex set. If C is the set of points inside a sphere of
radius r centered at the origin, i.e.,
C = {(x, y, z) ∈ R3 : x2 + y 2 + z 2 ≤ r2 },
then the set of all points inside the sphere forms a convex set. Any two points inside the
sphere have their connecting line segment entirely inside the sphere.
3. Affine Sets
An affine set is a set that can be expressed as an affine combination of points. More
formally, a set A is affine if for any two points x, y ∈ A and any scalar t ∈ R, the point
tx + (1 − t)y lies in A.
Every affine set is in fact convex: the line segment between two points in an affine set always lies within the set, since segments are exactly the affine combinations with t ∈ [0, 1]. Affine sets are more rigid, however, because they must contain the entire line through any two of their points, not just the segment.
Example in 2D:
Consider the set of points forming a line in R2 . A line is an affine set because any point on
the line can be expressed as an affine combination of two points on the line. For example,
the set L defined by the equation:
L = {(x, y) : y = 2x + 1}
is affine. The line is also a convex set, and any affine combination of points
on the line will also lie on the line.
Example in 3D:
The set of points forming a plane in R3 is affine. For example, the set P defined by the
equation:
P = {(x, y, z) : 2x + 3y − z = 5}
is affine. Any affine combination of points on this plane will also lie on the plane.
4. Conic Sets
A conic set is a set that is closed under non-negative scalar multiplication. In other words,
if x ∈ C and α ≥ 0, then αx ∈ C . This set is important because conic combinations, or
sums of non-negative multiples of vectors, describe the geometry of cones.
Example in 2D:
Consider the set C formed by the positive orthant in R2 , which includes all points with non-
negative coordinates. This set can be described as:
C = {(x, y) ∈ R2 : x ≥ 0, y ≥ 0}.
This is a conic set because any point in the positive orthant can be scaled by a non-negative
scalar and still remain in the set.
Example in 3D:
In R3 , the positive octant is the set of all points with non-negative coordinates:
C = {(x, y, z) ∈ R3 : x ≥ 0, y ≥ 0, z ≥ 0}.
This set is conic because any point within the positive octant, when scaled by a non-negative
scalar, will remain within the octant.
These sets can be extended into higher dimensions, although visualization becomes
increasingly difficult beyond two or three dimensions. However, the principles remain the
same:
Convexity ensures that the entire line segment between any two points within the set
lies inside the set.
Affine sets are formed by affine combinations of points and allow for geometric
representations like lines and planes.
6. Conclusion
Convex sets are central to optimization problems because they ensure that any linear
combination of points inside the set will remain inside the set, allowing efficient
optimization.
Affine sets include linear structures like lines and planes, and they provide important
geometric properties in higher-dimensional spaces.
Conic sets allow scaling by non-negative factors, which is important in many areas of
optimization and linear programming.
Understanding these basic geometric concepts is essential for working with linear programs
and convex optimization problems. They form the foundation for studying more advanced
topics in optimization theory.
The feasible region of a linear programming (LP) problem is the set of all points that satisfy
the system of constraints. These constraints are typically expressed as linear equalities and
inequalities, and the feasible region represents the space of all possible solutions that
satisfy these conditions. The geometric structure of the feasible region is crucial in
understanding the behavior of the linear program and the optimization process.
In this lecture, we will explore the geometric properties of the feasible region, which is
generally described by a set of linear inequalities and equations. This region can be
represented by polygons (in two dimensions) or polytopes (in higher dimensions), and we
will examine their characteristics, including how they are formed, their dimensional
properties, and the significance of their vertices in the context of optimization.
2. Defining the Feasible Region
Consider a linear program of the form:
Maximize cT x
subject to:
Ax ≤ b,
x ≥ 0,
where A ∈ Rm×n is the constraint matrix, b ∈ Rm the vector of resource limits, c ∈ Rn the objective coefficients, and x ∈ Rn the decision variables.
The feasible region F is the set of all points x ∈ Rn that satisfy the system of constraints:
F = {x ∈ Rn : Ax ≤ b, x ≥ 0}.
This region is defined by a set of linear inequalities and the non-negativity constraints.
Each inequality ai⊤ x ≤ bi defines a half-space of Rn .
The non-negativity constraints xi ≥ 0 restrict attention to the portion of the space where all coordinates are non-negative (the non-negative orthant).
The feasible region is the intersection of all these half-spaces (and orthants), forming a region where all constraints are satisfied simultaneously.
4. Polygons and Polytopes
The feasible region in a linear program often forms a polygon (in two dimensions) or a
polytope (in higher dimensions). A polytope is a geometric object with flat sides, defined as
the convex hull of a finite set of points (vertices), and it can be described by a system of linear
inequalities.
Examples:
In R2 , consider the constraints:
x1 + x2 ≤ 4,
x1 ≥ 0,
x2 ≥ 0.
The feasible region is a triangle with vertices at (0, 0), (4, 0), and (0, 4). This is a
polygon, and the intersection of the inequalities forms a bounded region.
In R3 , consider the constraints:
x1 + x2 + x3 ≤ 6,
x1 ≥ 0,
x2 ≥ 0,
x3 ≥ 0.
The feasible region is a tetrahedron (a 3D polytope) with vertices at (0, 0, 0), (6, 0, 0),
(0, 6, 0), and (0, 0, 6).
The geometric properties of the feasible region are important in understanding how to solve
linear programs efficiently. These properties include:
Vertices: The points at which the edges or faces of the feasible region meet. In linear
programming, the optimal solution (if one exists) is always found at one of these
vertices.
Edges: In two dimensions, the boundaries of the feasible region are line segments. In
higher dimensions, they become faces and facets, which are generalizations of edges.
Convexity: The feasible region is always convex, meaning that for any two points in the
feasible region, the line segment connecting them lies entirely within the region. This
property is crucial because linear programming problems can be solved by searching for
the optimal solution along the boundary of the feasible region, which is convex.
In higher dimensions (Rn , where n ≥ 3), the feasible region is a polytope. A polytope is a
geometric object that generalizes the concept of a polygon (in 2D) and a polyhedron (in 3D)
to any number of dimensions. A polytope is characterized by:
The set of linear inequalities that define it, which correspond to the hyperplanes that
define the boundaries of the polytope.
For example, in R3 , the feasible region might be a convex polyhedron, which could have
polygonal faces and edges. In R4 , the feasible region could be a 3-dimensional polytope, but
it is difficult to visualize directly.
Vertices and Optimality: Since the feasible region is convex, if an optimal solution exists,
it will be located at one of the vertices of the polytope. This property allows for efficient
solution algorithms by focusing on the vertices rather than exploring the entire region.
8. Conclusion
The feasible region of an LP is defined by a system of linear inequalities and is always a
convex set.
In two dimensions, the feasible region is often a polygon, and in higher dimensions, it is
a polytope.
The feasible region is bounded by linear constraints, and its properties (such as
convexity, vertices, edges, and facets) are crucial for solving the optimization problem.
The vertices of the feasible region play a key role in finding the optimal solution to the
linear programming problem.
By understanding the geometric structure of the feasible region, we gain insight into the
nature of linear programming problems and can apply efficient algorithms to solve them.
A convex cone can be described in two equivalent ways:
By a set of generators: A cone is the set of all non-negative linear combinations of a finite collection of vectors.
By a set of constraints: A cone is also the set of points satisfying a system of linear inequalities that define its boundary and structure.
This theorem shows that both representations are equivalent in the sense that they describe
the same geometric object. Weyl's Theorem is particularly important in the study of convex
geometry and linear programming, as it allows us to switch between different
characterizations of convex sets and simplifies the process of analyzing and solving
optimization problems.
We will now provide a proof of Weyl's Theorem and examine the implications of these
representations for convex cones.
1. If x ∈ C , then αx ∈ C for all α ≥ 0. This means that the set is closed under non-
negative scalar multiplication.
2. A convex cone is a cone that is also convex. That is, for any two points x, y ∈ C , the
entire line segment joining them lies within the cone. Mathematically, for t ∈ [0, 1], the
point tx + (1 − t)y ∈ C .
A polyhedral cone is a cone that can be described by a finite set of linear inequalities. It is of
particular interest in linear programming.
Formally, a polyhedral cone is a set of the form C = {x ∈ Rn : Ax ≥ 0} for some matrix A.
Weyl’s Theorem states that these two representations of a convex cone are equivalent.
Specifically, it asserts:
Theorem (Weyl's Theorem): For a convex cone C ⊂ Rn , the following two statements are
equivalent:
1. C can be described as the set of all non-negative linear combinations of a finite set of vectors v1 , … , vk , i.e.,
C = {α1 v1 + ⋯ + αk vk : αi ≥ 0}.
2. There exists a matrix A ∈ Rm×n such that C is the set of solutions to the system of
inequalities Ax ≥ 0.
This equivalence allows us to move from a description of a cone via its generators
(combinations of vectors) to a description via linear inequalities (constraints), and vice versa.
We will prove the equivalence between the two representations of a convex cone in two
steps.
Assume that C is a convex cone generated by the vectors v1 , … , vk . That is, C = {α1 v1 + ⋯ + αk vk : αi ≥ 0}. We want to show that C can also be described by a finite set of linear inequalities.
Consider the following matrix A ∈ Rk×n , where each row corresponds to one of the generating vectors:

A = [ v1⊤ ]
    [ v2⊤ ]
    [  ⋮  ]
    [ vk⊤ ]
Any x ∈ C can be written as
x = α1 v1 + ⋯ + αk vk ,  αi ≥ 0,
and then

Ax = [ v1⊤ x ]
     [ v2⊤ x ]  ≥ 0.
     [  ⋮   ]
     [ vk⊤ x ]
Thus, x satisfies the system of linear inequalities Ax ≥ 0, which represents the cone C .
Step 2: Constraint Representation Implies Combinatorial Representation
The solution set of the inequalities Ax ≥ 0 defines a convex polyhedral cone. By the
fundamental theorem of convex polyhedra, the cone is the convex hull of its extreme rays.
Each extreme ray can be described as a non-negative linear combination of the rows of A,
and these rows (or their scalar multiples) form the generators of the cone.
C = {α1 v1 + ⋯ + αk vk : αi ≥ 0},
where v1 , … , vk are the generating vectors corresponding to the extreme rays of the
polyhedral cone.
5. Conclusion
In the study of optimization, after understanding the structure of the feasible region of a
linear program, we shift our focus to the objective function. In linear programming, the
objective function is typically a linear function, which is both convex and concave. However,
in more general optimization problems, the objective function can be non-linear. To handle
these cases, we need to introduce the concept of convex functions, which play a central role
in optimization theory.
Convex functions are widely studied in optimization due to their desirable properties that
allow for efficient solution methods. The key property of convex functions is that local
minima are also global minima, which is essential for ensuring the optimality of solutions.
This property makes convex optimization problems more tractable and ensures that
algorithms like gradient descent and interior-point methods converge to the global optimum
under certain conditions.
In this lecture, we will formally define convex functions, explore their properties, and discuss
their significance in optimization.
A function f : Rn → R is convex if, for any two points x, y ∈ Rn and for any t ∈ [0, 1], the following inequality holds:
f (tx + (1 − t)y) ≤ t f (x) + (1 − t) f (y).
This condition means that the function value at any point on the line segment joining x and
y is less than or equal to the weighted average of the function values at x and y.
Geometrically, this means that the graph of a convex function lies below the straight line
joining any two points on the graph, i.e., the function is "bent" upwards or "concave up."
A convex set is a set of points such that for any two points in the set, the entire line
segment joining them lies entirely within the set.
A convex function is a function that satisfies the convexity condition defined above,
which ensures that the function has a global minimum (if the domain is convex) and that
any local minimum is a global minimum.
3. Convexity in Optimization
Convex functions are important in optimization because they guarantee that any local
minimum is also the global minimum. This is a crucial property for optimization algorithms,
as it ensures that methods such as gradient descent, Newton’s method, and other iterative
techniques will converge to the optimal solution, provided the function is convex and the
problem is properly formulated.
The objective function in a linear program is of the form cT x, which is a linear function.
Since every linear function is both convex and concave, linear programming problems
are a special case of convex optimization problems.
However, many practical optimization problems involve nonlinear convex functions, and
understanding the properties of convex functions is necessary for solving these problems.
The analysis of convexity allows us to extend optimization techniques to more general
problem types.
Linear Functions: Any linear function of the form f (x) = cT x is convex (and also concave). This follows directly from the definition of convexity because a linear function satisfies the convexity inequality with equality.
Negative Logarithm: The function f (x) = − log(x) is convex for x > 0, as its second derivative 1/x2 is positive for all x > 0 (equivalently, log(x) itself is concave).
Norm Functions: The p-norm f (x) = ∥x∥p is convex for p ≥ 1.
5.1 First-Order Condition for Convexity
A differentiable function f is convex if and only if
f (y) ≥ f (x) + ∇f (x)⊤ (y − x)  for all x, y ∈ Rn .
This inequality means that the tangent line at any point x lies below the graph of the function. It is a first-order condition for convexity and is often used in optimization algorithms.
5.2 Second-Order Condition for Convexity
A twice-differentiable function f is convex if and only if its Hessian matrix H(f )(x) is positive semidefinite at every point:
zT H(f )(x)z ≥ 0 ∀z ∈ Rn .
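Positive semidefiniteness can be verified numerically through eigenvalues; a brief sketch for a quadratic f (x) = x⊤ Qx, whose Hessian is the constant matrix 2Q:

```python
import numpy as np

# f(x) = x^T Q x has Hessian H = 2Q everywhere.
Q = np.array([[2.0, 1.0],
              [1.0, 3.0]])
H = 2 * Q

# f is convex iff all eigenvalues of the (symmetric) Hessian are >= 0.
eigenvalues = np.linalg.eigvalsh(H)
print(eigenvalues, "convex:", bool(np.all(eigenvalues >= 0)))
```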
5.3 Jensen's Inequality
Jensen’s inequality is a fundamental result that stems from the definition of convexity. It states that if f is a convex function and X is a random variable, then:
f (E[X]) ≤ E[f (X)].
This inequality is widely used in various fields, including economics, statistics, and machine learning.
Convex functions play a central role in optimization because of the following reasons:
Global Minima: A convex function on a convex domain has a unique global minimum (if
it has one), and any local minimum is also the global minimum.
7. Conclusion
In this lecture, we introduced the concept of convex functions, which are a fundamental
class of functions in optimization. A convex function satisfies a specific inequality that
ensures any local minimum is also a global minimum. We explored several examples of
convex functions, including linear, quadratic, and exponential functions, and discussed the
key properties that characterize convex functions, such as first- and second-order conditions.
Convexity plays a critical role in optimization, enabling the use of powerful algorithms that
guarantee global optimality under certain conditions.
Convex functions possess several remarkable properties that make them especially suitable
for optimization problems. These properties ensure that optimization problems involving
convex functions are well-behaved and can be solved efficiently. In this lecture, we explore
these properties in depth, with particular attention to how they relate to linear programming
(LP). Since linear functions are both convex and concave, many important results about the
objective function in LP can be derived from these properties. Additionally, we will discuss
how these properties provide the theoretical foundation for the simplex algorithm, one of
the most widely used methods for solving linear programming problems.
If x∗ is a local minimum of a convex function f , then it is also a global minimum:
f (x∗ ) ≤ f (x) ∀x ∈ Rn .
This property implies that convex functions do not have multiple local minima, which
simplifies the optimization process. For linear programming, where the objective function is
linear (and thus convex), the goal is to find the optimal solution within a feasible region that
is also convex.
Convex functions are typically continuous and, under certain conditions, differentiable. The
continuity of convex functions ensures that there are no discontinuities or jumps in the
function, which is crucial for optimization. If the convex function is also differentiable, we can
apply calculus-based optimization methods like gradient descent to find the optimal
solution.
In linear programming, the objective function is linear and hence both continuous and
differentiable. This makes LP problems well-behaved and suitable for solution by gradient-
based methods or more specialized methods like the simplex algorithm.
Affine functions, which are of the form f (x) = cT x + b, are both convex and concave. This
duality arises because affine functions are linear transformations and do not exhibit any
curvature. The convexity of the objective function in linear programming follows from the
fact that it is affine. Therefore, when we optimize a linear function subject to linear
constraints, we are essentially working with a convex optimization problem.
In the context of the simplex algorithm, which is an iterative method for solving LP problems,
the linear objective function ensures that the algorithm progresses towards the optimal
solution through successive vertices of the feasible polytope.
Jensen’s inequality is a direct consequence of the definition of convexity and provides a way
to understand the behavior of convex functions when applied to random variables or
expectations. For a convex function f and a random variable X , Jensen’s inequality states:
f (E[X]) ≤ E[f (X)],
where E[X] is the expected value of X . This inequality is important in many optimization
contexts, particularly in stochastic optimization problems, where the objective function
involves an expectation over random variables.
In linear programming, Jensen’s inequality does not have direct applicability, but it provides
insight into the behavior of convex functions in more complex optimization problems where
uncertainty or randomness is involved.
For a differentiable function f : Rn → R, the first-order condition for convexity states that the following inequality must hold for any two points x, y ∈ Rn :
f (y) ≥ f (x) + ∇f (x)⊤ (y − x),
where ∇f (x) denotes the gradient of f at x. This inequality means that the function at any
point lies above the tangent hyperplane to the function at that point. It is an essential tool in
optimization, particularly for methods like gradient descent, which use the gradient to
iteratively approach the minimum of the function.
For a twice-differentiable function f , the second-order condition requires the Hessian matrix H(f )(x) to be positive semidefinite at every point:
zT H(f )(x)z ≥ 0 ∀z ∈ Rn .
This condition ensures that the function is "curved upwards" and does not exhibit any local
maxima. If the Hessian is positive definite, the function is strictly convex, which means it has
a unique global minimum. For linear functions, the Hessian is zero, indicating that the
function is affine (neither strictly convex nor concave).
As previously noted, any linear function f (x) = cT x + b is both convex and concave. In the
case of linear programming, the objective function is linear, and the feasible region (defined
by linear constraints) is also convex. The linearity of the objective function ensures that the
optimization problem is convex, and the Simplex algorithm can be used to efficiently find the
optimal solution by navigating the vertices of the feasible polytope.
The exponential function f (x) = ex is convex because its second derivative is always positive, indicating that the function is "curved upwards" at every point.
The negative logarithm f (x) = − log(x), for x > 0, is convex (log(x) itself is concave); it arises in various optimization contexts, particularly in convex formulations of maximum likelihood estimation.
4. Connection to the Simplex Algorithm
The properties of convex functions, particularly the fact that linear functions are convex, are
integral to understanding why the simplex algorithm is effective for solving linear
programming problems. The simplex algorithm works by iterating over the vertices of the
feasible polytope defined by the linear constraints. Because the objective function in linear
programming is linear (and thus convex), the algorithm moves from one vertex to another,
always improving the objective value until it reaches the optimal vertex.
Convexity of the objective function guarantees that a vertex with no improving neighbor is globally optimal; combined with an anti-cycling pivoting rule, the simplex algorithm will always reach the optimal vertex.
Linear constraints define a convex feasible region, and the algorithm can efficiently
traverse the vertices of this region.
Despite its name, the simplex algorithm does not always follow a simplex shape in practice,
but its ability to find the optimal solution in a finite number of steps, under the assumption
of no degeneracy, is largely due to the convexity properties of the linear objective function.
5. Conclusion
In this lecture, we examined the key properties of convex functions, which play a central role
in optimization. We discussed the fundamental characteristics of convex functions, including
the relationship between local and global minima, as well as important conditions for
convexity, such as the first- and second-order conditions. These properties are essential for
understanding the theoretical foundation of linear programming and optimization
algorithms like the simplex algorithm, which is built on the convexity of linear functions and
the geometry of convex polytopes. By leveraging these properties, we can effectively solve
optimization problems with convex objective functions.
In linear programming, the optimal solution lies at one of the vertices of the feasible region.
This fact is central to methods like the simplex algorithm, which iterates over the vertices of
the feasible polytope to find the optimal solution. However, before we can apply such
algorithms, we need to understand the structure of the feasible region and how we can
define its vertices.
In this lecture, we will formally introduce the concept of Basic Feasible Solutions (BFS),
which correspond to the vertices of the feasible region of a linear program. We will also
discuss how these solutions arise from the system of linear equations that define the
constraints of the linear program.
Consider a linear program whose constraints are written in matrix form as Ax ≤ b, x ≥ 0,
where A is the matrix of coefficients, b is the vector of constants, and x is the vector of
decision variables. The feasible region is typically a convex polytope, and its vertices
represent potential optimal solutions to the linear program.
A Basic Feasible Solution (BFS) is defined as a solution that corresponds to a vertex of the
feasible region. These vertices are the solutions to the system of linear equations formed by
selecting a subset of the constraints that are active at that point (i.e., the constraints that
hold with equality there).
1. Selecting Active Constraints: Choose a set of m linearly independent constraints to hold with equality.
2. Solving the System: Solve the system of m active constraints. If the solution x∗ exists and is feasible, it is a BFS; in particular, it satisfies
Ax∗ = b, x∗ ≥ 0.
Geometrically, a BFS corresponds to a vertex of the feasible region. Each vertex of a convex
polytope is determined by the intersection of a set of m linearly independent hyperplanes
(the active constraints). Since the feasible region is convex, the optimal solution to a linear
program will always lie at one of these vertices.
In R2 , the feasible region is a polygon, and the BFS corresponds to the vertices of this
polygon.
In R3 , the feasible region is a polyhedron, and the BFS corresponds to the vertices of this
polyhedron.
For example, if the feasible region is a polytope in Rn with m constraints, each BFS
corresponds to an intersection point of the hyperplanes defined by these m constraints.
For the linear program
maximize cT x, subject to Ax ≤ b, x ≥ 0,
the BFS corresponds to a solution to the system of equations formed by the active
constraints. Let us assume that we have selected m constraints that are active at a solution
x∗ . These constraints can be written as:
Aactive x∗ = bactive ,
where Aactive is the matrix of coefficients corresponding to the selected active constraints, and bactive the corresponding right-hand-side entries.
To ensure that this solution is feasible, we must also check that x∗ ≥ 0, as the solution must
lie within the non-negative orthant.
Consider a simple linear program with two decision variables x1 and x2 , and two linear
constraints:
x1 + x2 ≤ 5,
x1 ≥ 0,
x2 ≥ 0.
The feasible region is a triangle in R2 , bounded by the axes and the line x1 + x2 = 5 .
1. At (x1 , x2 ) = (0, 0), the active constraints are x1 ≥ 0 and x2 ≥ 0 (both at equality).
2. At (x1 , x2 ) = (5, 0), the active constraints are x1 + x2 = 5 and x2 = 0.
3. At (x1 , x2 ) = (0, 5), the active constraints are x1 + x2 = 5 and x1 = 0.
Each of these vertices corresponds to a Basic Feasible Solution, as they satisfy the active
constraints and are feasible.
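Enumerating BFSs by brute force makes this concrete; the sketch below tries every pair of constraints from the example, solves for the intersection point, and keeps the feasible ones:

```python
from itertools import combinations
import numpy as np

# Constraints written as (coefficients, rhs), holding with equality when active:
#   x1 + x2 = 5,  x1 = 0,  x2 = 0.
rows = [((1.0, 1.0), 5.0), ((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0)]

for (a1, b1), (a2, b2) in combinations(rows, 2):
    A = np.array([a1, a2])
    b = np.array([b1, b2])
    if abs(np.linalg.det(A)) < 1e-12:
        continue                      # rows not linearly independent
    x = np.linalg.solve(A, b)
    feasible = x[0] + x[1] <= 5 + 1e-9 and bool(np.all(x >= -1e-9))
    print(x, "BFS" if feasible else "infeasible")
```

For this triangle, all three intersection points, (0, 5), (5, 0), and (0, 0), turn out to be feasible and are exactly the BFSs listed above.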
In general, a linear program may have multiple BFSs. However, the simplicity of the linear
program means that the set of BFSs is finite, and at least one of them is optimal. The simplex
algorithm leverages this fact by systematically exploring the BFSs to find the one that
maximizes or minimizes the objective function.
While a BFS corresponds to a vertex of the feasible region, multiple BFSs can exist at
different vertices of the polytope. The key point is that each BFS is a candidate for the
optimal solution, and the simplex algorithm moves between these vertices to find the
optimal solution.
8. Conclusion
In this lecture, we defined Basic Feasible Solutions and explored their role in linear
programming. These solutions correspond to the vertices of the feasible region, and they are
fundamental to the optimization process in linear programs. By understanding the structure
of the feasible region and how BFSs are formed, we can better understand optimization
algorithms like the simplex method, which relies on exploring these BFSs to find the optimal
solution. Basic feasible solutions serve as the cornerstone for solving linear programs and
form the basis of many computational techniques in optimization.
In the previous lecture, we introduced Basic Feasible Solutions (BFSs) and established that
they correspond to vertices of the feasible region of a linear program. This is a crucial
concept because many optimization algorithms, particularly the simplex method, explore
these vertices in search of the optimal solution.
In this lecture, we will investigate the relationship between Basic Feasible Solutions (BFSs)
and the vertices of the feasible region. Specifically, we will discuss how every vertex
corresponds to a BFS, and conversely, how each BFS defines a vertex. Additionally, we will
explore the possibility that different BFSs may correspond to the same vertex.
Consider the linear program
maximize cT x, subject to Ax ≤ b, x ≥ 0.
The feasible region of this linear program is the set of all points x ∈ Rn that satisfy the
constraints. This region is a convex polytope (or a polyhedron in higher dimensions).
Vertices of the feasible region are points where several hyperplanes (the boundaries of the
constraints) intersect. Each of these intersection points corresponds to a Basic Feasible
Solution (BFS). A BFS is defined as a solution to the system of active constraints, where a
subset of constraints are satisfied as equalities. These active constraints uniquely determine
a point in the feasible region.
3. BFS as Vertices
From the previous lecture, we know that a Basic Feasible Solution corresponds to a solution
to a system of active constraints, which are linearly independent. Each set of m active
constraints, where m is the number of variables, defines a unique point in the feasible
region, which is a vertex.
Conversely, every vertex of the feasible region is associated with a BFS. The geometrical
interpretation is that the feasible region can be viewed as a convex polytope in Rn , where
each vertex is formed by the intersection of m linearly independent constraints. These
constraints give rise to the BFS corresponding to that vertex.
This equivalence is foundational in linear programming because the search for the optimal
solution can be reduced to examining the vertices of the feasible region.
While each vertex corresponds to a BFS, it is important to note that different BFSs can
correspond to the same vertex. This can occur when different subsets of constraints define
the same point in the feasible region. In other words, while a vertex is defined by a unique
intersection of m linearly independent constraints, different combinations of active
constraints may describe the same vertex.
A vertex of the feasible region could be formed by the intersection of three active
constraints: x1 = 2, x2 = 3, and x3 = 1.
Another BFS might involve the intersection of a different set of three constraints, say
x1 + x2 = 5, x2 = 3, and x3 = 1, which still leads to the same point (x1 , x2 , x3 ) =
(2, 3, 1).
In this case, both sets of constraints describe the same vertex, but they correspond to
different BFSs. This phenomenon occurs because there are often multiple ways to select the
active constraints that uniquely define a given vertex.
5. Geometrical Insight
To illustrate this equivalence further, let’s consider a simple linear program in R2 with two
decision variables x1 and x2 , and two linear constraints:
x1 + x2 ≤ 5,
x1 ≥ 0,
x2 ≥ 0.
The feasible region is a triangle, and the three vertices of this triangle correspond to three
BFSs. These BFSs correspond to the vertices (0, 0), (5, 0), and (0, 5), each determined by a different pair of active constraints.
A vertex is the intersection point of a set of m linearly independent hyperplanes
(constraints) that uniquely determine the point in the feasible region.
A BFS is a solution to the system of active constraints at a given point, where these
constraints are linearly independent.
Each BFS corresponds to a unique point (vertex) in the feasible region, which is the
solution to a system of m active, linearly independent constraints.
Different BFSs can correspond to the same vertex if different sets of active constraints
describe the same point.
Aactive x = bactive ,
x ≥ 0,
where Aactive represents the matrix of the active constraints, and x is the vector of decision variables.
7. Conclusion
In this lecture, we have explored the important relationship between Basic Feasible
Solutions (BFSs) and the vertices of the feasible region in a linear program. We established
that every vertex of the feasible region corresponds to a BFS, and conversely that every BFS determines a vertex.
Additionally, we discussed the fact that multiple BFSs can correspond to the same vertex,
depending on the choice of active constraints. This understanding is fundamental for
optimization algorithms like the simplex method, which explores the vertices of the feasible
region to find the optimal solution. Understanding the equivalence between BFSs and
vertices allows for efficient traversal of the feasible region during optimization.
In this lecture, we will formally introduce the Simplex Algorithm, a widely used algorithm for
solving linear programming problems. The algorithm leverages the fact that the optimal
solution to a linear program lies at a vertex of the feasible region. Since every Basic Feasible
Solution (BFS) corresponds to a vertex of the feasible region, the simplex method iteratively
moves from one BFS to another, improving the objective function at each step until the
optimal solution is found.
We will first review the key ideas behind the Simplex algorithm and then walk through an
example to demonstrate its operation.
Feasible Region and Vertices: The feasible region of a linear program is a convex
polytope, and the optimal solution is found at one of its vertices. Each vertex
corresponds to a BFS, and the Simplex algorithm explores these vertices.
Objective Function: The goal is to optimize the objective function (either maximizing or
minimizing), which is typically a linear function. The Simplex method moves along the
edges of the polytope, from one vertex to another, improving the objective function at
each step.
Pivoting: At each iteration, the Simplex algorithm selects an adjacent BFS (i.e., an
adjacent vertex of the polytope) to move to. This selection is done based on the
coefficients of the objective function, and the algorithm uses a pivot operation to move
between vertices.
Termination: The algorithm terminates when the current BFS cannot be improved
further, meaning that the optimal solution has been found. This occurs when the
objective function cannot be increased (or decreased, in the case of minimization) by
moving to an adjacent vertex.
3. Steps of the Simplex Algorithm
1. Initial Basic Feasible Solution (BFS): Start with an initial BFS. If the problem is in
standard form (i.e., all constraints are inequalities, and all variables are non-negative),
this step is straightforward. Otherwise, artificial variables may be introduced to obtain a
feasible starting point (e.g., using the Big-M method or two-phase method).
2. Pivoting:
Select the entering variable: Determine which non-basic variable (a variable not
currently in the BFS) should enter the basis (i.e., which variable should increase to
improve the objective function). This is typically done by examining the coefficients
of the objective function in the tableau.
Select the leaving variable: Once the entering variable is chosen, determine which
basic variable (currently part of the BFS) should leave the basis. This is done by
checking the constraints and finding the variable that will become zero first when
the entering variable increases.
Perform a pivot operation: The pivot operation updates the tableau by replacing
the old basis with the new one. This involves solving a system of linear equations to
update the values of the variables and objective function.
3. Repeat the Process: After performing the pivot, update the BFS and repeat the process
until no further improvements to the objective function can be made.
4. Optimality Check: The algorithm checks the optimality condition: if all the coefficients of
the objective function corresponding to non-basic variables are non-negative (in the case
of maximization), the current BFS is optimal. If not, the algorithm continues to pivot to a
better BFS.
4. Example

Consider the following linear program:

maximize Z = 3x1 + 2x2

subject to x1 + x2 ≤ 4
2x1 + x2 ≤ 5
x1 ≥ 0,
x2 ≥ 0
To apply the Simplex algorithm, we first convert this problem into standard form by
introducing slack variables s1 and s2 for the constraints. This gives:
maximize Z = 3x1 + 2x2

subject to x1 + x2 + s1 = 4
2x1 + x2 + s2 = 5
x1 ≥ 0,
x2 ≥ 0,
s1 ≥ 0,
s2 ≥ 0
Now, we set up the initial tableau for the Simplex method. The initial tableau includes the
coefficients of the objective function and the constraints:

Basic Variables | x1 | x2 | s1 | s2 | RHS
s1              |  1 |  1 |  1 |  0 |  4
s2              |  2 |  1 |  0 |  1 |  5
Z               | −3 | −2 |  0 |  0 |  0
4.1 Iteration 1:
Identify the entering variable: We look at the coefficients in the objective function row.
The most negative coefficient is −3 (corresponding to x1 ), so x1 will enter the basis.
Identify the leaving variable: We perform the ratio test, dividing each RHS value by the
corresponding coefficient of the entering variable:

4/1 = 4 (s1-row), 5/2 = 2.5 (s2-row).

The smallest ratio is 2.5, so s2 leaves the basis.

Perform the pivot: The pivot element is 2 (the coefficient of x1 in the s2-row). We divide
the s2-row by 2 and then eliminate x1 from the other rows; x1 replaces s2 in the basis,
and the tableau becomes:

Basic Variables | x1 | x2   | s1 | s2   | RHS
s1              |  0 |  0.5 |  1 | −0.5 | 1.5
x1              |  1 |  0.5 |  0 |  0.5 | 2.5
Z               |  0 | −0.5 |  0 |  1.5 | 7.5
4.2 Iteration 2:
Identify the entering variable: The most negative coefficient is −0.5 (corresponding to
x2), so x2 will enter the basis.

Identify the leaving variable: The ratio test gives

1.5/0.5 = 3 (s1-row), 2.5/0.5 = 5 (x1-row),

so s1 leaves the basis.

Perform the pivot: The pivot element is 0.5 (the coefficient of x2 in the s1-row). After
performing the pivot, the tableau becomes:

Basic Variables | x1 | x2 | s1 | s2 | RHS
x2              |  0 |  1 |  2 | −1 |  3
x1              |  1 |  0 | −1 |  1 |  1
Z               |  0 |  0 |  1 |  1 |  9

At this point, all coefficients in the objective function row are non-negative, so the algorithm
terminates. The optimal solution is:

x1 = 1, x2 = 3, Z = 9.
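As a sanity check on the tableau arithmetic, the same optimum can be reproduced with an off-the-shelf solver. A minimal sketch using SciPy (assumed to be available; linprog minimizes, so the objective is negated):

```python
import numpy as np
from scipy.optimize import linprog

# Maximize 3*x1 + 2*x2  ==  minimize -(3*x1 + 2*x2)
c = np.array([-3.0, -2.0])
A_ub = np.array([[1.0, 1.0],    # x1 + x2  <= 4
                 [2.0, 1.0]])   # 2x1 + x2 <= 5
b_ub = np.array([4.0, 5.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub,
              bounds=[(0, None), (0, None)], method="highs")
print(res.x, -res.fun)  # expected: x = (1, 3) with Z = 9
```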
5. Conclusion
In this lecture, we introduced the Simplex Algorithm, which is used to solve linear
programming problems by moving from one BFS to another, improving the objective
function at each step. Through an example, we demonstrated the mechanics of the
algorithm, including how to form the initial tableau, perform the pivot operations, and
identify the entering and leaving variables. The algorithm terminates when no further
improvement in the objective function is possible, and the optimal solution is reached.
Although its worst-case running time is exponential, the Simplex algorithm is fast in practice
and is widely used to solve large-scale linear programming problems.
In this lecture, we will formalize the Simplex Algorithm and describe it step-by-step,
highlighting the mathematical formulations and operations that occur at each iteration.
Building on the informal example presented earlier, we will now provide the formal
mechanics of the algorithm using the current Basic Feasible Solution (BFS) and the
corresponding coefficients.
The goal is to provide a detailed understanding of each phase of the algorithm, focusing on
how the BFS evolves and how we update the solution at each step. The simplex algorithm
progresses through a series of pivot operations, where each operation involves selecting an
entering variable (which is not in the BFS) and a leaving variable (which is part of the BFS),
and updating the tableau accordingly.
The Simplex method operates using a tableau, which is an organized matrix that represents
the current solution, including the coefficients of the objective function and the constraints.
The structure of the Simplex tableau is as follows:
Basic Variables | x1  | x2  | … | xn  | RHS
Constraint 1    | a11 | a12 | … | a1n | b1
⋮               | ⋮   | ⋮   | ⋱ | ⋮   | ⋮
Constraint m    | am1 | am2 | … | amn | bm
Objective (Z)   | c1  | c2  | … | cn  | current value
Where:
Basic Variables: These are the variables that are currently in the BFS.
x1, x2, …, xn: The columns for the decision variables (both basic and non-basic).
RHS: Right-hand side values (the constants on the right of the equations).
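For concreteness, one way to hold this tableau in code is a single NumPy array with the slack columns and the objective row appended. A minimal sketch (the layout and sign conventions below mirror the tableau described above for a problem of the form max cT x, Ax ≤ b, but they are one choice among several):

```python
import numpy as np

def initial_tableau(A, b, c):
    """Build [A | I | b] with the objective row [-c | 0 | 0] appended.

    Rows 0..m-1 are the constraints (the slack variables form the initial
    basis); the last row is the objective (Z) row, stored as -c so that
    negative entries mark variables that can improve a maximization.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    m, n = A.shape
    T = np.zeros((m + 1, n + m + 1))
    T[:m, :n] = A
    T[:m, n:n + m] = np.eye(m)   # slack-variable columns
    T[:m, -1] = b                # RHS column
    T[-1, :n] = -c               # objective row (reduced costs)
    return T

# Running example: max 3x1 + 2x2 s.t. x1 + x2 <= 4, 2x1 + x2 <= 5
T = initial_tableau([[1, 1], [2, 1]], [4, 5], [3, 2])
```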
The Simplex algorithm proceeds through the following steps at each iteration:
3.1 Initialization
We begin with an initial BFS corresponding to a feasible solution. The initial tableau is
constructed based on the constraints and the objective function in standard form.
If the problem is not in standard form (e.g., if there are artificial variables or the problem
is not feasible), methods such as the Big-M method or Two-Phase method can be used
to obtain a feasible starting point.
3.2 Selecting the Entering Variable

At each iteration, we identify the entering variable. The entering variable is selected by
examining the coefficients in the last row of the tableau (corresponding to the objective
function). We pick the variable that will most improve the objective function (i.e., the most
negative coefficient in the case of maximization).
For a maximization problem, the entering variable is the one with the most negative
coefficient in the objective row.
For a minimization problem, the entering variable is the one with the most positive
coefficient.
This variable corresponds to a non-basic variable that will be increased from 0 to a positive
value, potentially improving the objective.
3.3 Selecting the Leaving Variable (Ratio Test)

Once the entering variable is selected, we need to determine the leaving variable. The
leaving variable is the basic variable that will be replaced by the entering variable in the BFS.
The leaving variable is selected using the minimum ratio test.
The ratio test divides each RHS value by the corresponding coefficient of the entering
variable, over the rows in which that coefficient is positive:

Ratio_i = RHS_i / a_{i, entering}, for rows with a_{i, entering} > 0.

The leaving variable is the basic variable of the row with the smallest such ratio; this
ensures that the new solution remains feasible. If no row has a positive coefficient in the
entering column, the entering variable can be increased without bound and the linear
program is unbounded.
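The two selection rules just described might look as follows in code. A sketch operating on the NumPy tableau layout introduced earlier (objective row last, RHS column last):

```python
import numpy as np

def select_pivot(T):
    """Return (row, col) of the next pivot, or None if the tableau is optimal."""
    cost = T[-1, :-1]
    if np.all(cost >= -1e-12):            # no negative reduced cost: optimal
        return None
    col = int(np.argmin(cost))            # most negative coefficient enters
    column, rhs = T[:-1, col], T[:-1, -1]
    ratios = np.full_like(rhs, np.inf)
    pos = column > 1e-12                  # ratio test only over positive entries
    ratios[pos] = rhs[pos] / column[pos]
    if np.all(np.isinf(ratios)):          # no positive entry: LP is unbounded
        raise ValueError("LP is unbounded along the entering direction")
    row = int(np.argmin(ratios))          # minimum ratio picks the leaving row
    return row, col
```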
3.4 Pivoting
The pivot operation is performed to update the tableau after selecting the entering and
leaving variables. This operation updates the tableau such that the entering variable
becomes part of the BFS, and the leaving variable is removed.
1. Identify the pivot element, which is the coefficient of the entering variable in the row of
the leaving variable.
2. Normalize the pivot row by dividing it by the pivot element.
3. Perform row operations to update the other rows, ensuring that all other coefficients in
the entering variable's column are zero.
After the pivot, the tableau is updated to reflect the new BFS. The coefficients of the objective
function row and the constraint rows are modified to incorporate the new values of the
variables.
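These three steps translate directly into elementary row operations. A sketch continuing the NumPy tableau layout used above:

```python
def pivot(T, row, col):
    """Apply one pivot in place on tableau T.

    Step 1 (identifying the pivot element T[row, col]) is assumed done by
    the selection routine; here we normalize and eliminate.
    """
    T[row] /= T[row, col]                  # step 2: normalize the pivot row
    for r in range(T.shape[0]):
        if r != row:
            T[r] -= T[r, col] * T[row]     # step 3: zero the column elsewhere
    return T
```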
4. Optimality Check
Once the tableau is updated, we check for optimality. The current solution is optimal if all
the coefficients in the objective function row (except for the RHS column) are non-negative in
the case of maximization. If this condition is met, the algorithm terminates, and the current
BFS is the optimal solution.
If there are still negative coefficients in the objective row, the algorithm proceeds to the next
iteration by selecting a new entering variable and repeating the process.
5. Example

Consider again the linear program:

maximize Z = 3x1 + 2x2

subject to x1 + x2 ≤ 4
2x1 + x2 ≤ 5
x1 ≥ 0,
x2 ≥ 0
Step 1: Convert to Standard Form

The problem is converted into standard form by introducing slack variables s1 and s2:

maximize Z = 3x1 + 2x2

subject to x1 + x2 + s1 = 4
2x1 + x2 + s2 = 5
x1 ≥ 0,
x2 ≥ 0, s1 ≥ 0, s2 ≥ 0
Step 2: Set Up the Initial Tableau

Basic Variables | x1 | x2 | s1 | s2 | RHS
s1              |  1 |  1 |  1 |  0 |  4
s2              |  2 |  1 |  0 |  1 |  5
Z               | −3 | −2 |  0 |  0 |  0
Step 3: Pivoting

The entering variable is x1 (the most negative objective-row coefficient, −3), and the ratio
test selects s2 as the leaving variable. The pivot element is 2 (the coefficient of x1 in the
s2-row). After performing the pivot, x1 replaces s2 in the basis.

Step 4: Repeat

Since the most negative coefficient in the updated objective row is −0.5, x2 will enter the
basis, and s1 will leave. The pivoting steps continue until all coefficients in the objective
row are non-negative, at which point the optimal solution x1 = 1, x2 = 3 with Z = 9 is
reached.
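Putting the earlier sketches together reproduces both pivots on this example mechanically (assuming the hypothetical helpers initial_tableau, select_pivot, and pivot from the sketches above):

```python
import numpy as np

T = initial_tableau([[1, 1], [2, 1]], [4, 5], [3, 2])
basis = [2, 3]                      # columns of s1, s2 form the initial basis

while (p := select_pivot(T)) is not None:
    row, col = p
    pivot(T, row, col)
    basis[row] = col                # entering variable replaces the leaving one

x = np.zeros(4)
x[basis] = T[:-1, -1]               # basic variables take the RHS values
print(x[:2], T[-1, -1])             # expected: [1. 3.] and Z = 9.0
```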
6. Conclusion
In this lecture, we formalized the Simplex Algorithm by breaking it down into precise
mathematical steps. The key steps involved in the algorithm include selecting the entering
and leaving variables, performing the pivot operation, and updating the tableau until the
optimal solution is reached. The algorithm terminates when no further improvement in the
objective function is possible, and the optimal BFS is found. The formal approach ensures
that the Simplex algorithm is systematic and effective for solving linear programming
problems.
In the Simplex algorithm, we need a Basic Feasible Solution (BFS) to begin the optimization
process. However, in many cases, it is not immediately obvious what the starting BFS is,
especially if the linear program is not already in a form that readily gives a feasible solution.
In this lecture, we will discuss a clever method to find the starting BFS by transforming the
original linear program into a new linear program with a different objective function. This
transformation ensures that the new linear program is easy to solve, providing us with an
initial BFS from which we can begin applying the Simplex method.
Recall that a linear program is typically given in the following standard form:
maximize cT x
subject to Ax = b
x≥0
where x ∈ Rn is the vector of decision variables, c ∈ Rn is the objective vector, A ∈ Rm×n is
the constraint matrix, and b ∈ Rm is the right-hand-side vector.
In this form, the feasible region is defined by the system of linear equations Ax = b, subject
to the non-negativity constraints x ≥ 0.
A key challenge in applying the Simplex algorithm is finding an initial BFS that satisfies these
constraints. If the linear program is not in a form that trivially provides a feasible solution (for
example, if the system has artificial variables or if it is not feasible), the starting BFS is not
obvious.
To overcome the issue of finding an initial feasible solution, we use a technique called the
Two-Phase Method. This method converts the original problem into a new problem with a
different objective function, making it easier to identify a starting BFS.
The idea behind the Two-Phase method is to introduce artificial variables that ensure a
feasible starting solution. Artificial variables are introduced in such a way that the initial
solution can be constructed easily and is guaranteed to be feasible.
The constraints are augmented to

Ax + Ia = b
where I is the identity matrix and a is the vector of artificial variables. This system
ensures that an initial BFS is feasible because the artificial variables can be set to satisfy
the equality constraints.
Once the artificial variables are introduced, the new linear program has the following
structure:
minimize ∑_{i=1}^{m} ai

subject to Ax + Ia = b

x ≥ 0, a ≥ 0
In this transformed problem, the goal is to minimize the sum of the artificial variables. The
intuition here is that if the optimal value of the objective function is 0, all the artificial
variables are 0, and the original problem has a feasible solution. If the optimal value is
greater than 0, the original problem is infeasible.
The Phase 1 linear program can be solved using the Simplex method. The solution to this
phase will either:
Achieve an optimal objective value of 0: In this case, all artificial variables are 0, and the
original system is feasible. We can then proceed to Phase 2.
Achieve a positive optimal objective value: In this case, the original system is
infeasible, and no BFS exists.
Once Phase 1 is completed, we have a feasible solution (if the objective is 0). Now, we return
to the original problem and apply the Simplex method to optimize the original objective
function.
The BFS found at the end of Phase 1 (where the artificial variables are 0) serves as the
starting BFS for the original problem in Phase 2. Since we have already established that this
solution satisfies the constraints of the original problem, we can proceed with the Simplex
method from this point.
The objective function for Phase 2 is simply the original objective function:
maximize cT x
5. Example

Consider the linear program:

maximize Z = x1 + x2
subject to x1 + 2x2 ≥ 4
x1 + x2 ≤ 5
x1 ≥ 0,
x2 ≥ 0
Converting to standard form by subtracting a surplus variable s1 from the first constraint
and adding a slack variable s2 to the second:
maximize Z = x1 + x2
subject to x1 + 2x2 − s1 = 4
x1 + x2 + s 2 = 5
x1 , x2 , s 1 , s 2 ≥ 0
To ensure feasibility, we introduce artificial variables a1 and a2 for the two constraints
(strictly speaking, the slack s2 could already serve as the initial basic variable for the second
constraint, but we add a2 for uniformity). The Phase 1 problem is:

minimize a1 + a2
subject to x1 + 2x2 − s1 + a1 = 4
x1 + x2 + s 2 + a 2 = 5
x1 , x2 , s 1 , s 2 , a 1 , a 2 ≥ 0
We solve this Phase 1 problem with the Simplex method. If it attains its optimal value of 0,
we proceed to Phase 2, where we drop the artificial variables and optimize the original
objective function Z = x1 + x2.
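Phase 1 for this example can also be checked mechanically with an LP solver. A minimal sketch with SciPy, using the variable order [x1, x2, s1, s2, a1, a2]:

```python
import numpy as np
from scipy.optimize import linprog

# Phase 1: minimize a1 + a2 subject to the equality constraints above.
c = np.array([0, 0, 0, 0, 1, 1], dtype=float)       # cost only on artificials
A_eq = np.array([[1, 2, -1, 0, 1, 0],               # x1 + 2x2 - s1 + a1 = 4
                 [1, 1,  0, 1, 0, 1]], dtype=float) # x1 +  x2 + s2 + a2 = 5
b_eq = np.array([4, 5], dtype=float)

res = linprog(c, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 6, method="highs")
print(res.fun)   # 0.0 here, so the original program is feasible
```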
6. Conclusion
In this lecture, we discussed how to find the starting BFS for the Simplex algorithm using the
Two-Phase Method. This method transforms the original linear program into a new linear
program with a different objective function. By introducing artificial variables and minimizing
their sum in Phase 1, we can ensure a feasible starting solution. Once Phase 1 completes
with an optimal objective value of 0, we move on to Phase 2 and apply the Simplex algorithm
to the original problem, starting from the BFS found in Phase 1. This approach guarantees
that we always have a valid starting point for the Simplex method.
In the Simplex algorithm, we traverse from one Basic Feasible Solution (BFS) to another by
pivoting through the feasible region. The primary goal is to improve the objective function at
each step. However, in some cases the algorithm may perform a sequence of pivots during
which the objective value does not increase; the condition underlying this behaviour is called
degeneracy.
Degeneracy occurs when, at a BFS, one or more of the basic variables take the value zero;
equivalently, more than n constraints (where n is the number of decision variables) are
active at the corresponding vertex. Several different bases then describe the same point, so
the algorithm can change basis without moving and without improving the objective
function. This can lead to cycling, where the algorithm revisits the same bases, makes no
progress, and could theoretically never terminate.
In this lecture, we will define degeneracy, explore its consequences, and discuss techniques
to avoid it during the execution of the Simplex algorithm.
2. What is Degeneracy?
A linear program is said to be degenerate at a given BFS if more than n constraints are
active there, or equivalently, if one or more of the basic variables are zero. There are then
multiple ways to choose a basis describing the same point, so the algorithm can move to a
"new" BFS without improving the objective function, causing it to "cycle" and revisit the
same bases without progress.
Degeneracy arises because, at a BFS, we may have multiple constraints that are "active" (i.e.,
satisfied as equalities), but not all of them are necessary to describe the vertex. This excess
of active constraints creates a situation where pivoting between different BFSs results in no
improvement in the objective function value, causing a cycle.
3. Geometric Interpretation

Geometrically, degeneracy occurs when the feasible region has a vertex where more
than n constraints are active. This excess of constraints does not change the point of
intersection, even if the BFS changes.
4. Example of Degeneracy
maximize Z = x1 + x2
subject to x1 + x2 = 2
x1 = 1
x2 = 1
x1 ≥ 0,
x2 ≥ 0
The only feasible point is (x1, x2) = (1, 1), and all three constraints are active there. Despite
having more than two active constraints, the feasible region consists of a single point. The
Simplex algorithm could pivot between different bases describing this point (for instance,
different choices of which variables are basic) without changing the solution or the objective
value.
5. Consequences of Degeneracy

In the presence of degeneracy, the Simplex algorithm may:

Cycle indefinitely: The algorithm might revisit the same BFS without making any
progress towards improving the objective function, resulting in an infinite loop.
Wasting computational resources: Even though the objective value does not improve,
the algorithm continues to perform unnecessary computations, potentially making the
process inefficient.
6. Bland’s Rule

Bland’s Lemma provides a simple and effective way to prevent cycling (and thus to cope
with degeneracy) in the Simplex algorithm. It states that the Simplex algorithm will not cycle
if, at each pivot step, we select the entering and leaving variables according to a
smallest-index rule.
Smallest-Index Rule: When selecting a variable to enter the basis, choose, among all
variables whose objective-row coefficient allows an improvement (a negative coefficient
in the maximization tableau used here), the one with the smallest index. Similarly, when
selecting a variable to leave the basis, choose the one with the smallest index among the
candidates that achieve the minimum ratio.
Under this rule, no basis is ever visited twice, so the Simplex algorithm is guaranteed to
terminate in a finite number of steps.
7. Why Bland’s Rule Prevents Cycling

Bland’s Lemma guarantees that any run of degenerate pivots must eventually end with the
algorithm moving to a different BFS. Specifically, it ensures that:
1. Pivot Selection: At each step, the entering variable is chosen with the smallest index
among all variables that can increase the objective function. Similarly, the leaving
variable is chosen with the smallest index among those that satisfy the ratio test.
2. Termination: With this approach, the algorithm cannot cycle because every pivot step
selects a "new" direction, and the algorithm moves forward in a systematic, non-
repetitive manner.
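In code, Bland's rule changes only the two selection steps. A sketch for the NumPy tableau layout used in earlier lectures, where basis records which variable is basic in each constraint row:

```python
import numpy as np

def select_pivot_bland(T, basis):
    """Pivot selection under Bland's smallest-index rule.

    T is the tableau (objective row last, RHS column last); basis[r] is the
    index of the variable that is basic in constraint row r.
    Returns (row, col), or None when the tableau is optimal.
    """
    cost = T[-1, :-1]
    negatives = np.flatnonzero(cost < -1e-12)
    if negatives.size == 0:
        return None                           # no improving variable: optimal
    col = int(negatives[0])                   # smallest-index entering variable
    column, rhs = T[:-1, col], T[:-1, -1]
    pos = np.flatnonzero(column > 1e-12)
    if pos.size == 0:
        raise ValueError("LP is unbounded")
    ratios = rhs[pos] / column[pos]
    ties = pos[np.isclose(ratios, ratios.min())]
    # break ratio-test ties by the smallest index of the leaving variable
    row = int(ties[np.argmin([basis[r] for r in ties])])
    return row, col
```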
8. Example

Consider the following Simplex tableau at some point in the algorithm:
Basic | x1 | x2 | x3 | x4 | RHS
x1    |  1 |  1 |  0 |  0 |  5
x2    |  0 |  1 |  1 |  0 |  2
z     | −1 |  0 |  0 |  0 |  0
At this step, suppose two or more variables tie as candidates to enter the basis, or two or
more rows tie in the ratio test. Using Bland’s rule, we resolve every such tie by selecting the
variable with the smallest index. This ensures that the algorithm does not get stuck in a
cycle.
9. Conclusion
Degeneracy is an issue in the Simplex algorithm that can lead to cycles, causing the
algorithm to revisit the same BFS without improving the objective function. However, by
following the lexicographical rule outlined in Bland’s Lemma, we can avoid cycling and
ensure that the Simplex algorithm terminates in a finite number of steps. This simple rule
provides an effective way to handle degeneracy and guarantees the correctness of the
Simplex method, even in cases where degeneracy might occur.
Duality is a fundamental concept in linear programming that connects every linear program
(referred to as the primal problem) with another linear program, called the dual problem.
The dual problem provides an alternative perspective on the original optimization problem,
and in many cases, solving the dual problem is computationally more efficient. Moreover, the
solutions to the primal and dual problems are related in an elegant way.
In this lecture, we introduce the theory of duality, beginning with basic concepts of
separation and a brief review of some key mathematical foundations. The connection
between the primal and dual problems will be explored, and we will establish the
relationships that exist between them.
maximize cT x
subject to Ax ≤ b
x≥0
where x is the vector of decision variables, c the vector of objective coefficients, A the
constraint matrix, and b the vector of resource limits (right-hand sides).
For every linear program (primal), there exists a corresponding dual linear program, which in
the case of the primal maximization problem is a minimization problem. The dual problem is
formulated as follows:
minimize bT y
subject to AT y ≥ c
y≥0
where y is the vector of dual variables, one for each primal constraint. Note that:
The objective function bT y in the dual problem represents the total "cost" of the
resources (or constraints) in the primal.
The constraints AT y ≥ c in the dual problem ensure that the dual solution respects the
objective function of the primal.
The dual of a primal maximization problem with ≤ constraints is a dual minimization
problem with ≥ constraints. The key idea is that the solution to the primal problem provides
information about the optimal values of the dual variables, and vice versa.
We introduce the concept of separation: the separation theorem provides a formal way to
understand how the primal and dual solutions are related. In the maximization case, the
objective value of any feasible dual solution is an upper bound on the primal's optimal
value. The duality gap, the difference between the dual and primal objective values, is zero
when both the primal and dual problems have optimal solutions.
This relationship leads to strong duality, which states that if both the primal and dual
problems have optimal solutions, their optimal objective values are equal.
Geometrically, duality can be viewed as a transformation from the space of decision variables
in the primal problem to a space of dual variables. In this sense:
The primal problem focuses on finding the best combination of variables to optimize the
objective function under certain constraints.
The dual problem focuses on determining the values for the constraints that give the
best (minimum) cost for achieving the optimal solution of the primal.
This geometric interpretation leads to the fundamental theorem of duality: if the primal
problem has an optimal solution, then so does the dual problem, and the objective values of
the primal and dual problems are equal.
The weak duality theorem asserts that for any feasible solution x to the primal problem and
any feasible solution y to the dual problem, the dual objective value is at least the primal
objective value:

cT x ≤ bT y.

This provides an upper bound for the primal objective and a lower bound for the dual
objective. Weak duality alone does not force the two optimal values to be equal; that
stronger statement is the content of strong duality.
The strong duality theorem states that if the primal problem has an optimal solution and
the dual problem also has an optimal solution, then the optimal objective values of the
primal and dual problems are equal:

optimal value of primal = optimal value of dual.
This result guarantees that the optimal solutions to the primal and dual problems coincide in
value, which is a powerful tool in optimization.
8. Example

Consider the primal problem:

maximize 3x1 + 2x2

subject to x1 + x2 ≤ 4

x1 + 2x2 ≤ 3

x1 , x2 ≥ 0
Its dual problem is:

minimize 4y1 + 3y2

subject to y1 + y2 ≥ 3

y1 + 2y2 ≥ 2

y1 , y2 ≥ 0
Here, the primal maximization problem involves maximizing the value of 3x1 + 2x2 subject
to two constraints. The dual problem seeks to minimize the cost 4y1 + 3y2 , subject to
constraints involving the dual variables y1 and y2 . The strong duality theorem assures us
that if both the primal and dual have optimal solutions, their objective values will be equal.
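For this primal-dual pair the two optima can be computed and compared directly. A sketch with SciPy (for the data above, both values should come out equal, illustrating strong duality):

```python
from scipy.optimize import linprog

# Primal: max 3x1 + 2x2  s.t.  x1 + x2 <= 4,  x1 + 2x2 <= 3
primal = linprog([-3, -2], A_ub=[[1, 1], [1, 2]], b_ub=[4, 3],
                 bounds=[(0, None)] * 2, method="highs")

# Dual:   min 4y1 + 3y2  s.t.  y1 + y2 >= 3,  y1 + 2y2 >= 2
# (>= constraints are passed to linprog as negated <= constraints)
dual = linprog([4, 3], A_ub=[[-1, -1], [-1, -2]], b_ub=[-3, -2],
               bounds=[(0, None)] * 2, method="highs")

print(-primal.fun, dual.fun)   # equal at optimality, as strong duality predicts
```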
9. Conclusion
Duality is a central concept in linear programming that reveals deep connections between
optimization problems. The primal problem and its dual problem provide complementary
perspectives on the same problem. The study of duality provides a powerful framework for
solving optimization problems, understanding the structure of optimal solutions, and
developing efficient algorithms.
In the next steps, we will delve deeper into the theory and practical implications of duality,
including how to construct the dual for more complex problems and apply duality to derive
bounds and solutions.
Lecture 24: Hyperplane Separation Problem
The hyperplane separation problem is a key concept in convex geometry and optimization.
It deals with the possibility of separating convex sets using hyperplanes. A hyperplane is a
flat affine subspace of one dimension less than the space in which it resides (in Rn , a
hyperplane is an (n − 1)-dimensional affine subspace). The separation theorem is
concerned with finding a hyperplane that can "separate" two distinct convex sets, or a
convex set and a point, such that one set lies entirely on one side of the hyperplane and the
other set lies entirely on the opposite side.
The separation theorem states that if we have two disjoint convex sets, then there exists a
hyperplane that separates them, meaning the hyperplane divides the space into two half-
spaces, with each set lying entirely in one of the half-spaces. Formally:

Let C1 and C2 be non-empty, disjoint convex subsets of Rn. Then there exist a vector a ∈ Rn,
a ≠ 0, and a scalar β ∈ R such that

C1 ⊆ {x : aT x ≤ β} and C2 ⊆ {x : aT x ≥ β}.
In simpler terms, there is a hyperplane that divides the space such that all points in C1 lie on
one side of the hyperplane and all points in C2 lie on the other side.
In addition to separating two disjoint convex sets, the separation theorem can also be
applied to situations where we are trying to separate a convex set from a single point.
Suppose C is a convex set and p is a point not in C . The separation theorem asserts that
there exists a hyperplane that separates the point p from the convex set C , meaning:
p ∈ {x : aT x ≤ β}, C ⊆ {x : aT x ≥ β}
where a ∈ Rn and β ∈ R.
This result is significant in convex optimization because it tells us that if a point is outside a
convex set, we can always find a hyperplane that separates them, which is used in
optimization algorithms, particularly in duality theory and constraint qualification.
Let’s consider two disjoint convex sets C1 and C2. The goal is to find a hyperplane that
separates them.

Assume, for simplicity, that C1 and C2 are non-empty and disjoint. Since both are convex,
the difference set C1 − C2 = {u − v : u ∈ C1, v ∈ C2} is also convex, and disjointness means
it does not contain the origin.

Using a hyperplane that separates the origin from C1 − C2, we show that there exists a
hyperplane such that each set lies entirely in one of the half-spaces determined by the
hyperplane.
The detailed proof relies on the supporting hyperplane theorem, which asserts that for any
convex set, there is a hyperplane that "supports" the set at any point on its boundary. This
supporting hyperplane can be extended to separate the two convex sets if they are disjoint.
For the case where we separate a convex set C from a point p ∉ C, the proof is more
direct: when C is closed, one takes the point of C nearest to p and uses the direction from
that nearest point to p as the normal of the separating hyperplane.
One common proof technique is to use supporting hyperplanes. The supporting hyperplane
theorem states that for any point p outside a convex set C , there exists a hyperplane that
"just touches" C at a boundary point and separates p from the set.
The construction of the separating hyperplane involves finding the direction in which p and
C are "most apart" and using this direction to define the hyperplane.
5. Geometric Interpretation
The separation theorem has an intuitive geometric interpretation. If two convex sets are
disjoint, we can always find a hyperplane that divides the space such that each set lies
entirely on one side of the hyperplane. This is useful in optimization problems where the
constraints form convex sets and we need to separate feasible regions or separate the
objective function from the feasible region.
Similarly, when separating a point from a convex set, the hyperplane provides a way to
define a boundary that distinguishes between the point and the set. This concept is central
to methods such as convex hull algorithms and support vector machines in machine
learning.
In the context of linear programming, the separation theorem can be used to prove Farkas'
Lemma, which provides a characterization of the solvability of systems of linear inequalities.
This lemma is closely related to the duality of linear programs.
7. Conclusion
The hyperplane separation problem is a key result in convex analysis and optimization. It
provides a way to separate convex sets using hyperplanes, a concept that has deep
implications for linear programming, convex optimization, and machine learning.
Understanding the separation of convex sets and points is crucial for the development of
efficient algorithms for solving optimization problems, particularly in the duality theory of
linear programming. In future lectures, we will explore further applications and proofs
related to convexity, including more advanced results in optimization theory.
Lecture 25: Farkas’ Lemma

Farkas’ Lemma is often viewed as a generalization of the separation theorem for convex sets
but with a key difference: it deals specifically with the separation of a point from a convex
cone and guarantees that the separating hyperplane passes through the origin. This makes
Farkas' Lemma a cornerstone in understanding the duality in optimization problems.
2. Statement of Farkas’ Lemma

Let A ∈ Rm×n be a matrix, and let b ∈ Rm be a vector. Consider the system of linear
inequalities:

Ax ≥ b, x ≥ 0.

Farkas’ Lemma states that exactly one of the following holds: either this system has a
solution, or there exists a vector y with

AT y ≤ 0, bT y > 0, y ≥ 0.

The lemma asserts that, for a given system of linear inequalities, either a solution exists (in
the form of a non-negative vector x satisfying the inequalities), or there is a certificate of
infeasibility: a hyperplane passing through the origin, with normal vector y, that separates b
from the set of values attainable by the left-hand side. The vector y defines a linear
functional that is non-positive on that attainable set yet strictly positive at b.
3. Geometric Interpretation
Geometrically, the Farkas Lemma provides insight into the relationship between the
feasibility of a system of inequalities and the geometry of cones. To interpret it:
Convex cone: The set of values attainable by the left-hand side, C = {Ax − s : x ≥ 0, s ≥ 0},
is a convex cone in Rm, and the system Ax ≥ b, x ≥ 0 is feasible exactly when b ∈ C.

Separation condition: If no solution exists, the hyperplane {z : yT z = 0} separates b
from this cone: yT z ≤ 0 for every z ∈ C while yT b > 0, effectively "cutting off" b from the
attainable values.

The hyperplane passes through the origin: This is the key aspect of Farkas’ Lemma.
Because C is a cone (it contains the origin and is closed under positive scaling), the
separating hyperplane can always be taken to pass through the origin itself; the normal
vector y thus defines a linear relationship between the cone and the certificate of
infeasibility.
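Both alternatives can be explored numerically: feasibility of the first system is itself an LP, and a certificate y for the second can be sought with another LP. A sketch (y is confined to the box [0, 1]^m so the certificate search stays bounded; any positive optimum scales to a valid certificate):

```python
import numpy as np
from scipy.optimize import linprog

def farkas_certificate(A, b):
    """Try to find y >= 0 with A^T y <= 0 and b^T y > 0 (infeasibility proof)."""
    A, b = np.asarray(A, float), np.asarray(b, float)
    m = A.shape[0]
    # maximize b^T y  s.t.  A^T y <= 0,  0 <= y <= 1
    res = linprog(-b, A_ub=A.T, b_ub=np.zeros(A.shape[1]),
                  bounds=[(0, 1)] * m, method="highs")
    return res.x if -res.fun > 1e-9 else None

# x >= 1 and -x >= 0 (i.e. x <= 0) cannot both hold for x >= 0:
y = farkas_certificate(A=[[1.0], [-1.0]], b=[1.0, 0.0])
print(y)   # a valid certificate, e.g. y = (1, 1): A^T y = 0, b^T y = 1 > 0
```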
The Farkas Lemma has several important applications, particularly in the field of
optimization:
Duality theory: Farkas’ Lemma is closely related to the dual of a linear programming
problem. In fact, it forms the basis for understanding the duality theory in linear
programming, where the dual problem can be derived from the primal problem by using
the geometric separation arguments.
Linear programming: Farkas' Lemma is often used in the proof of strong duality in
linear programming, which asserts that the optimal values of the primal and dual
problems are equal under certain conditions.
To understand the proof of Farkas' Lemma, we will break it down into steps:
5.1 The feasible case

If the system Ax ≥ b, x ≥ 0 has a solution, then the first alternative of the lemma holds,
and we are done.

5.2 The infeasible case

Otherwise, b does not belong to the closed convex cone C = {Ax − s : x ≥ 0, s ≥ 0}. By
the separation theorem for convex sets, there exists a hyperplane through the origin that
separates b from C: a vector y such that yT z ≤ 0 for all z ∈ C and yT b > 0. Taking z = Ax
for x ≥ 0 yields AT y ≤ 0, and taking z = −ei (the slack directions) yields yi ≥ 0, thus
completing the proof.
5.3 Why y ≥ 0

The condition y ≥ 0 arises because the slack directions −ei belong to the cone C: the
separating functional must be non-positive on them, which forces each component yi to be
non-negative. Geometrically, the non-negativity of y orients the hyperplane so that the
entire cone lies in one closed half-space while b lies strictly in the other.
6. Conclusion
Farkas’ Lemma is a powerful result that has broad applications in linear programming,
convex optimization, and duality theory. It provides a concrete geometric criterion for
understanding the feasibility of a system of linear inequalities and allows us to separate the
feasible set from the infeasible region via a hyperplane passing through the origin.
Understanding Farkas' Lemma is crucial for grasping the deeper properties of linear
programs and convex optimization problems, particularly in the context of duality and
constraint qualification.
1. Introduction to Duality
In this lecture, we introduce the concept of the dual of a linear programming problem.
Duality is a fundamental idea in optimization theory, particularly in linear programming, as it
provides a way to approach optimization problems from a different perspective. Instead of
directly solving the primal problem, one can solve its dual, which is often easier or more
insightful.
Duality connects two optimization problems: the primal problem and the dual problem. For
a maximization primal, every feasible dual solution provides an upper bound on the optimal
value (and, for a minimization primal, a lower bound); the dual is often easier to solve or
more insightful due to the structure it derives from the original problem.

The key insight is that every feasible solution of the primal maximization problem gives us a
lower bound on the optimal value, and we seek the tightest upper bound by formulating the
dual problem.
maximize cT x
subject to:
Ax ≤ b, x ≥ 0.
Here:
x ∈ Rn is the vector of decision variables, c ∈ Rn is the vector of objective coefficients,
A ∈ Rm×n is the constraint matrix, and b ∈ Rm is the right-hand-side vector.

The corresponding dual problem is:
minimize bT y
subject to:
AT y ≥ c, y ≥ 0.
Here, y ∈ Rm is the vector of dual variables, one for each primal constraint. Note that:
The condition AT y ≥ c ensures that the dual variables provide valid upper bounds for
the primal objective coefficients.
The dual problem provides bounds on the optimal value of the primal problem. In particular:

If the primal problem is feasible and bounded, every feasible dual solution provides an
upper bound for the primal objective.

Conversely, when the primal is a minimization problem, the dual provides lower bounds.
In geometric terms, the dual problem can be thought of as determining the tightest possible
upper bound for the primal problem's objective function. Here's how duality manifests
geometrically:
The primal problem aims to maximize a linear objective subject to constraints that form
a polytope (a bounded region). The feasible region defined by Ax ≤ b is a convex
polytope, and the objective function cT x is a linear function.
The dual problem involves finding a set of coefficients y (dual variables) that represent
how strongly the constraints in the primal problem should be weighted to provide an
optimal bound on the objective. The constraints in the dual problem are derived from
the primal problem’s coefficients, ensuring that any feasible dual solution offers an
upper bound on the primal objective.
The relationship between the feasible regions of the primal and dual problems can be
visualized as follows:
If the primal problem is feasible and bounded, the dual problem provides a guaranteed
upper bound on the primal objective value.
The dual can be viewed as a way to approach the primal problem from a different
direction, often simplifying the structure of the optimization.
A critical component of duality is the relationship between lower and upper bounds. We now
discuss how the dual provides an upper bound for the primal problem and how dual
variables can be interpreted as pricing or weight coefficients for each constraint in the primal
problem.
Primal problem: The primal maximization problem aims to find the maximum value of
cT x subject to Ax ≤ b. Any feasible solution of the primal problem gives us a lower
bound on the optimal objective value.
Dual problem: The dual problem minimizes bT y subject to the constraint AT y ≥ c. This
dual formulation gives us an upper bound on the primal objective value. The
relationship between the primal and dual objectives ensures that the optimal value of
the dual problem will provide a bound for the optimal value of the primal problem.
In the case of strong duality, if both the primal and dual problems are feasible and
bounded, the optimal values of the primal and dual problems are equal; the duality gap is
zero.
The primal and dual problems are deeply related. The key idea is that the solution of one
problem provides valuable information about the solution to the other. Here are some
important aspects of the primal-dual relationship:
Weak Duality: The optimal value of the dual problem provides a bound on the optimal
value of the primal problem. Specifically, for a maximization primal problem and its
corresponding dual minimization problem, we have the following inequality:
cT x ≤ bT y.
This is known as weak duality.
Strong Duality: In some cases (under regularity conditions, such as Slater's condition for
convex problems), the optimal values of the primal and dual problems are equal. This is
called strong duality, and it allows us to solve either problem to obtain the optimal
solution for both.
Complementary Slackness: At optimality, the primal and dual solutions are linked:

If a dual variable is positive, the corresponding primal constraint is active (it holds
with equality, (Ax)i = bi).

If a primal variable is positive, the corresponding dual constraint is active
((AT y)j = cj).
6. Example

Primal Problem:

maximize 3x1 + 2x2

subject to:

x1 + x2 ≤ 4,
x1 ≥ 0,
x2 ≥ 0.

We introduce a dual variable y1 for the constraint x1 + x2 ≤ 4; the non-negativity
conditions x1 ≥ 0 and x2 ≥ 0 do not receive dual variables of their own.

The dual problem will then minimize 4y1 , subject to the dual constraints.
Dual Problem:
minimize 4y1
subject to:
y1 ≥ 3,
y1 ≥ 2,
y1 ≥ 0.
Here, the dual problem provides an upper bound for the primal problem, and solving the
dual gives us information about the primal problem's optimal solution.
7. Conclusion
In this lecture, we introduced the dual problem and explained its formulation and
significance in linear programming. Duality provides a powerful tool for deriving bounds on
optimization problems, with the dual offering an upper bound on a maximization problem’s
objective. Understanding duality is crucial for optimization theory, as it connects two
problems and provides deeper insight into the structure of linear programming problems.
Through the use of the dual, we can gain a more complete understanding of the solution
space and the optimal values of primal and dual problems.
Before diving into the examples, let's briefly review the key concepts:
maximize cT x
subject to:
Ax ≤ b, x ≥ 0.
Dual Problem: The dual of a linear programming problem provides an alternative way of
formulating the same problem, typically minimizing or maximizing a different objective
function. The dual of the above primal maximization problem is:
minimize bT y
subject to:
AT y ≥ c, y ≥ 0.
Each primal variable corresponds to a dual constraint, and each primal constraint
corresponds to a dual variable.
Let’s start with a simple primal problem to illustrate how the dual is formed:
Primal Problem:

maximize 2x1 + 3x2

subject to:

x1 + 2x2 ≤ 6,
x1 + x2 ≤ 4,
x1 ≥ 0,
x2 ≥ 0.
The primal has two constraints (the first and second inequalities), so we introduce dual
variables y1 and y2 for these constraints, respectively. Each dual variable corresponds
to one primal constraint.
The dual constraints are derived from the primal's objective function. Each dual
constraint corresponds to a primal variable:
The primal objective involves x1 and x2 , so the dual constraints will correspond to
x1 and x2 .
y1 + y2 ≥ 2 (for x1 ), and 2y1 + y2 ≥ 3 (for x2 ).
The dual problem is therefore:

minimize 6y1 + 4y2

subject to:
y1 + y2 ≥ 2,
2y1 + y2 ≥ 3,
y1 ≥ 0,
y2 ≥ 0.
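The recipe in this example is mechanical enough to automate for the standard max/≤ form: taking the dual transposes A and swaps the roles of b and c. A minimal sketch:

```python
import numpy as np

def dual_of(c, A, b):
    """Dual of: max c^T x s.t. Ax <= b, x >= 0.

    Returns (b, A^T, c), read as: min b^T y s.t. A^T y >= c, y >= 0.
    """
    A = np.asarray(A, dtype=float)
    return np.asarray(b, dtype=float), A.T, np.asarray(c, dtype=float)

# Example 1 above: max 2x1 + 3x2 s.t. x1 + 2x2 <= 6, x1 + x2 <= 4
b_d, At, c_d = dual_of(c=[2, 3], A=[[1, 2], [1, 1]], b=[6, 4])
# read off: min 6y1 + 4y2  s.t.  y1 + y2 >= 2,  2y1 + y2 >= 3,  y >= 0
```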
Next, consider a primal minimization problem to demonstrate how the dual changes when
the objective of the primal is a minimization:
Primal Problem:

minimize 4x1 + 5x2

subject to:
x1 + x2 ≥ 2,
2x1 + x2 ≥ 3,
x1 ≥ 0, x2 ≥ 0.
The primal has two constraints, so we introduce dual variables y1 and y2 corresponding
to these constraints.
Since the primal is a minimization problem, the dual will be a maximization problem, and
the dual objective will be to maximize bT y = 2y1 + 3y2 .
The dual constraints are y1 + 2y2 ≤ 4 (for x1 ), and y1 + y2 ≤ 5 (for x2 ).
The dual problem is therefore:

maximize 2y1 + 3y2

subject to:
y1 + 2y2 ≤ 4,
y1 + y2 ≤ 5,
y1 ≥ 0, y2 ≥ 0.
Consider a primal problem where the constraints involve equalities. This case demonstrates
how the dual formulation changes when the primal problem has equality constraints.
Primal Problem:

maximize 3x1 + 4x2

subject to:
x1 + x2 = 5,
x1 − x2 = 1,
x1 ≥ 0,
x2 ≥ 0.
The primal has two equality constraints, so we introduce dual variables y1 and y2 for
them; because the constraints are equalities, these dual variables are unrestricted in sign.
The dual objective is to minimize 5y1 + y2 , derived from the right-hand side of the
equality constraints.
The dual constraints are y1 + y2 ≥ 3 (for x1 ), and y1 − y2 ≥ 4 (for x2 ).
The dual problem is therefore:

minimize 5y1 + y2

subject to:
y1 + y2 ≥ 3,
y1 − y2 ≥ 4,
y1 ∈ R,
y2 ∈ R.
Through these examples, we reinforce several key principles for formulating the dual:
Dual Objective: The dual objective is derived from the right-hand side of the primal
constraints.
Dual Constraints: The dual constraints are based on the coefficients of the primal's
objective function and the relationships between the primal variables and constraints.
6. Conclusion
By working through these examples, we have solidified our understanding of how to take
the dual of a linear programming problem. The process involves:

1. Introducing one dual variable for each primal constraint.

2. Formulating the dual objective function based on the primal’s right-hand side.

3. Deriving one dual constraint for each primal variable from the corresponding column of
the constraint matrix and the primal objective coefficient.
Understanding how to formulate the dual problem is crucial for solving linear programming
problems efficiently and gaining insights into the relationship between primal and dual
optimization problems.
1. Introduction to Duality
We begin by recalling the primal and dual problems from earlier discussions:
maximize cT x
subject to:
Ax ≤ b, x ≥ 0.
Dual Problem (Minimization):
minimize bT y
subject to:
AT y ≥ c, y ≥ 0.
The primal problem seeks to maximize the objective cT x subject to the constraints Ax ≤b
and x ≥ 0. The dual problem, in contrast, seeks to minimize bT y subject to the constraints
AT y ≥ c and y ≥ 0.
We have seen how the primal and dual variables relate to one another, but we have not yet
answered why the dual value is a good indication of the primal value, or why, under certain
conditions, the primal and dual optimal values are exactly equal. This is where strong duality
comes into play.
Before discussing strong duality, we briefly recall the weak duality theorem, which is the
starting point for strong duality:
The weak duality theorem states that for any feasible solution x to the primal problem
and any feasible solution y to the dual problem, the value of the dual objective function
provides a bound on the value of the primal objective function:
cT x ≤ bT y.
This means that the objective value of the primal is always less than or equal to the
objective value of the dual, provided that both solutions are feasible.
The weak duality theorem establishes that the primal objective cannot exceed the dual
objective, but it does not explain why or when these two values are equal at optimality. This
is the essence of strong duality.
The strong duality theorem provides the critical link between the primal and dual problems.
It states that under certain conditions, the optimal values of the primal and dual problems
are equal. Specifically:
If the primal problem has an optimal solution, then the dual problem also has an
optimal solution, and the optimal objective values of both problems are the same:
optimal value of primal = optimal value of dual.
The strong duality theorem holds under the assumption that both the primal and dual
problems are feasible. In that case, both problems attain optimal solutions, and their
optimal objective values coincide.
However, infeasibility of either the primal or dual problem can prevent strong duality from
holding. If either the primal or dual is infeasible, the strong duality theorem does not apply,
and the relationship between the primal and dual becomes more complicated.
1. Farkas' Lemma: Farkas' Lemma provides a criterion for when a system of linear
inequalities has a solution. It is a key component in proving strong duality, as it
essentially guarantees that if the primal has a feasible solution, then there is a
corresponding feasible solution to the dual.
2. Separation Theorem: The separation theorem states that if a point is outside a convex
set, there exists a hyperplane that separates the point from the set. Using this theorem,
we can prove that if the primal problem has an optimal solution, the dual problem will
also have a corresponding optimal solution, and the objective values will coincide.
3. Duality Gap: The duality gap is the difference between the objective values of the primal
and dual problems. Strong duality tells us that when both the primal and dual have
feasible solutions, this gap is zero, i.e., the values are equal.
6. Geometric Interpretation
Geometrically, strong duality can be understood through the concept of separation. If the
primal problem has an optimal solution, there exists a separating hyperplane that divides
the feasible region of the primal from the feasible region of the dual. This separation ensures
that the optimal objective values of both problems are equal.
Moreover, this relationship between the primal and dual problems is critical in understanding
the structure of optimization problems. The strong duality theorem guarantees that by
solving the dual problem, we can determine the optimal value of the primal problem, and
vice versa.
Efficiency in Optimization: Strong duality allows us to solve the dual problem to find
bounds on the primal problem. Solving the dual can often be computationally easier,
especially in large-scale problems, since dual variables might reduce the dimensionality
of the problem.
Optimality Conditions: The equality of the primal and dual optimal values provides a
useful test for optimality. If we find a feasible solution to either the primal or dual, we
can use the other to check if we have found the optimal solution.
8. Conclusion
The strong duality theorem is a central result in linear programming, asserting that under
feasibility conditions, the optimal values of the primal and dual problems are equal. This
result follows from hyperplane separation theorems and is formalized through Farkas'
Lemma. Strong duality provides a deep connection between the primal and dual
optimization problems, allowing us to solve one problem to obtain insights into the other
and ensuring that the values of the two problems coincide when optimal solutions are found.
Primal Problem:
maximize cT x
subject to:
Ax ≤ b, x ≥ 0.
Dual Problem:
minimize bT y
subject to:
AT y ≥ c, y ≥ 0.
The strong duality theorem asserts that under the assumption of feasibility for both
problems, the optimal objective values of the primal and dual problems are equal:
Before delving into the proof, recall the weak duality theorem, which we discussed in an
earlier lecture. The weak duality theorem states that if x∗ is feasible for the primal and y ∗ is
feasible for the dual, then:
c T x∗ ≤ b T y ∗ .
This establishes an upper bound for the primal objective by the dual objective. However,
weak duality alone does not guarantee that the optimal values are equal; this is where
strong duality comes into play.
To begin the proof of strong duality, we first consider the possibility that one of the
programs (either primal or dual) is infeasible.
Case 1: Primal Infeasibility: If the primal problem is infeasible, then no feasible solution
exists for the primal, and by Farkas’ Lemma the dual problem is either unbounded or
itself infeasible; in particular, the dual cannot have a finite optimal value matching a
primal optimum.

Case 2: Dual Infeasibility: Similarly, if the dual problem is infeasible, then the primal
problem is either unbounded or infeasible.

In either case, strong duality does not apply: equality of optimal values is asserted only
when both problems are feasible.
In the case where both problems are feasible, we proceed with the proof of strong duality.
Suppose that both the primal and dual programs are feasible. That is, there exist x∗ ≥ 0 and
y ∗ ≥ 0 such that:
Ax∗ ≤ b, AT y ∗ ≥ c.
Now, we look at a slightly modified version of the problem, obtained by adding an extra
dimension corresponding to the objective function.

Augmented system: For a target value γ, append the objective to the primal constraints:

Ax ≤ b, −cT x ≤ −γ, x ≥ 0.

If γ is at most the primal optimal value, this augmented system is feasible; for any γ strictly
larger, it is infeasible. In the next step, we apply Farkas' Lemma to this augmented system:
the certificate of infeasibility it produces for γ above the optimum can be read as a feasible
dual solution whose objective value matches the primal optimum, which is what proves the
equality of the primal and dual objective values.
Farkas’ Lemma provides a necessary and sufficient condition for the solvability of a system of
linear inequalities. In the form needed here: the system Ax ≤ b, x ≥ 0 has a solution if and
only if there is no y ≥ 0 such that AT y ≥ 0 and bT y < 0.

To apply Farkas' Lemma in the context of strong duality, we consider the augmented
system with the extra dimension. This allows us to argue that if feasible solutions exist for
both the primal and the dual, then the optimal objective values of the primal and dual must
coincide.
Step 1: Existence of Optimal Solutions: By Farkas' Lemma, if the primal is feasible and
its objective is bounded above, then the dual has a feasible solution; symmetrically, if the
dual is feasible and bounded below, the primal has a feasible solution.
Step 2: Optimal Values: Since both the primal and dual problems are feasible, the weak
duality theorem guarantees that the value of the primal objective is less than or equal to
the value of the dual objective. In the extended system, the introduction of the objective
function ensures that this inequality becomes an equality when both problems are
feasible.
Step 3: Equality of Objectives: Thus, the optimal value of the primal problem is equal to
the optimal value of the dual problem.
8. Conclusion
The strong duality theorem provides a powerful result in linear programming: if both the
primal and dual problems are feasible, then their optimal objective values are equal. The
proof of this theorem involves using Farkas' Lemma and extending the linear programs into
an additional dimension. By leveraging Farkas' Lemma, we establish that the optimal values
for both the primal and dual problems must coincide under feasibility conditions, thus
proving strong duality.
This result is fundamental to the theory of linear programming and has wide-reaching
implications in optimization and economic theory, where it is used to analyze resource
allocation, pricing, and more.
The strong duality theorem states that if both the primal and dual linear programs are
feasible, then the optimal values of the primal and dual objectives are equal. Formally:
Primal Problem:
maximize cT x
subject to:
Ax ≤ b, x ≥ 0.
Dual Problem:
minimize bT y
subject to:
AT y ≥ c, y ≥ 0.
If both the primal and dual are feasible, their optimal objective values are equal:
c T x∗ = b T y ∗ .
2. The Complementary Slackness Conditions

For each i, the i-th dual variable yi and the i-th primal constraint satisfy

yi (Ax − b)i = 0 for all i,

where (Ax − b)i is the i-th component of the vector Ax − b (the residual, or slack, of the
primal constraint).

Similarly, for each j, the j-th primal variable xj and the j-th dual constraint satisfy

xj (AT y − c)j = 0 for all j,

where (AT y − c)j is the j-th component of the vector AT y − c (the residual of the dual
constraint).

These two conditions together form the complementary slackness conditions: in every such
pair, at least one factor must vanish. In particular:

yi > 0 implies (Ax − b)i = 0 (a positive dual variable means the corresponding primal
constraint is tight),

xj > 0 implies (AT y − c)j = 0 (a positive primal variable means the corresponding dual
constraint is tight).
3. Intuition Behind Complementary Slackness
The complementary slackness conditions provide deep insight into how the primal and dual
solutions interact:

If a dual variable yi is positive, the corresponding primal constraint in Ax ≤ b cannot be
slack: we must have (Ax − b)i = 0, meaning the primal constraint is satisfied with
equality (it is tight, or binding).

Symmetrically, if a primal variable xj is positive, the corresponding dual constraint must
be tight.

The complementary slackness condition thus ensures that if a variable is positive, the
corresponding constraint is tight, and, contrapositively, that a slack constraint forces the
paired variable to be zero. This condition helps us understand the structure of the optimal
solution and how changes in one solution (primal or dual) affect the other.
4. Deriving Complementary Slackness

To derive complementary slackness, we start with the weak duality theorem, which states
that for any feasible solution x to the primal and y to the dual, the following inequality holds:
cT x ≤ bT y.
Now, let us assume that the primal and dual are feasible and that we have reached optimal
solutions x∗ and y∗. Chaining the feasibility conditions gives

cT x∗ ≤ (AT y∗)T x∗ = (y∗)T Ax∗ ≤ (y∗)T b = bT y∗,

where the first inequality uses x∗ ≥ 0 together with AT y∗ ≥ c, and the second uses y∗ ≥ 0
together with Ax∗ ≤ b. We know from strong duality that cT x∗ = bT y∗, so both inequalities
must hold with equality. Equality in the first gives xj∗ (AT y∗ − c)j = 0 for every j: if xj∗ > 0,
then (AT y∗ − c)j = 0, i.e., the j-th dual constraint is tight. Equality in the second gives
yi∗ (Ax∗ − b)i = 0 for every i.

Thus, the complementary slackness conditions follow directly from the equality cT x∗ = bT y∗,
which holds due to strong duality.
5. Implications of Complementary Slackness
Optimality Condition: The condition gives a practical way to check if a pair of solutions
(x∗ , y ∗ ) is optimal. If x∗ and y ∗ satisfy the complementary slackness conditions, then
they are optimal solutions for the primal and dual problems, respectively.
6. Conclusion

Complementary slackness ties optimal primal and dual solutions together pairwise: a
positive variable in one problem forces the paired constraint in the other problem to be
tight. This provides both a practical optimality test and an interpretation of the dual
variables as prices on the primal constraints.

1. Basic Concepts of Game Theory

Game theory studies strategic interactions where the outcomes depend on the actions of
multiple agents (players), each with their own objectives. The fundamental concepts in game
theory include:
Strategies: The set of actions or decisions that each player can choose.
Payoffs: The outcomes or rewards each player receives based on the strategies chosen
by all players.
A two-player game involves two players who make decisions simultaneously or in sequence,
with the goal of maximizing their own payoff while potentially minimizing the opponent's
payoff.
2. Zero-Sum Games
A zero-sum game is a special type of game where the total payoff for all players always sums
to zero. This means that one player's gain is exactly another player's loss. In mathematical
terms, if there are two players, A and B, with payoffs PA and PB , then:
PA + PB = 0
Thus, a gain for one player corresponds to an equivalent loss for the other.
The main objective in a zero-sum game is for each player to maximize their own payoff while
minimizing the opponent’s payoff. The game is typically represented in a payoff matrix.
Consider a simple two-player zero-sum game where Player 1 (Row player) and Player 2
(Column player) are involved. The game can be represented as a payoff matrix A where:
The entries Aij in the matrix represent the payoff to Player 1 when Player 1 chooses
strategy i and Player 2 chooses strategy j . The payoff to Player 2 is the negative of this
value (since the game is zero-sum).
The rows of A correspond to the pure strategies of Player 1, and the columns correspond
to the pure strategies of Player 2.
In a zero-sum game, players are typically assumed to choose mixed strategies, which are
probability distributions over their set of pure strategies. The goal of each player is to
maximize their expected payoff, considering the strategies of the opponent.
A mixed strategy for Player 1 is a vector x = (x1 , x2 , … , xm ), where xi represents the
probability that Player 1 will choose the i-th pure strategy. The vector x must satisfy the
following conditions:

∑_{i=1}^{m} xi = 1, xi ≥ 0 for all i.
Player 1’s expected payoff, given Player 2’s strategy y = (y1 , y2 , … , yn ), is:

Payoff1 (x, y) = ∑_{i=1}^{m} ∑_{j=1}^{n} Aij xi yj .
Player 1 aims to maximize this payoff by selecting an optimal mixed strategy x. For any
fixed y the payoff is linear in x, but Player 2 will respond adversarially, so Player 1 in fact
solves the max-min problem max_x min_y Payoff1 (x, y). Introducing the game value v, this
can be formulated as a linear program:

maximize v

subject to:

∑_{i=1}^{m} Aij xi ≥ v for all j = 1, …, n,

∑_{i=1}^{m} xi = 1, xi ≥ 0 for all i.
Similarly, Player 2 wants to minimize Player 1’s payoff. Let y = (y1 , y2 , … , yn ) be the mixed
strategy of Player 2, where yj represents the probability of choosing the j-th strategy. The
vector y must satisfy:

∑_{j=1}^{n} yj = 1, yj ≥ 0 for all j.
The expected payoff to Player 1 is the same bilinear expression as before,

Payoff2 (x, y) = ∑_{i=1}^{m} ∑_{j=1}^{n} Aij xi yj ,

and Player 2 aims to minimize this payoff. Thus, Player 2’s objective is to find a mixed
strategy y that minimizes the expected payoff to Player 1, subject to the constraints on y;
this min-max problem is likewise a linear program, and it is precisely the dual of Player 1’s
program.
In a two-player zero-sum game, we can use linear programming duality to relate the
optimal strategies of the two players. The primal problem is to maximize Player 1’s expected
payoff (while Player 2 minimizes this). The dual problem involves finding Player 2’s optimal
mixed strategy, and the strong duality theorem ensures that the values of the primal and
dual problems are equal, giving us a solution to the game.
The dual variables correspond to the constraints on Player 1’s and Player 2’s strategies, and
complementary slackness will ensure that the solution satisfies the necessary conditions for
optimality.
The Linear Programming approach provides a systematic way to compute the optimal
strategies for both players in zero-sum games.
Simplex algorithm or other optimization algorithms can be employed to solve the linear
programs efficiently.
Nash equilibrium for two-player zero-sum games is characterized by the optimal mixed
strategies of both players, where neither player can improve their expected payoff by
unilaterally changing their strategy.
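A sketch of Player 1's LP in code (the variables are the mixed strategy x plus the game value v; the constraints state that every pure-strategy column of the opponent pays Player 1 at least v):

```python
import numpy as np
from scipy.optimize import linprog

def solve_row_player(A):
    """Optimal mixed strategy for Player 1 in the zero-sum game with payoff A.

    Solves: max v  s.t.  sum_i A[i, j] * x_i >= v for every column j,
            sum(x) = 1, x >= 0.  Variable order: [x_1 .. x_m, v].
    """
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                    # minimize -v == maximize v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])       # v - sum_i A_ij x_i <= 0
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0                               # probabilities sum to 1
    bounds = [(0, None)] * m + [(None, None)]       # the value v may be negative
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=bounds, method="highs")
    return res.x[:m], res.x[-1]

# Matching pennies: value 0, optimal strategy (1/2, 1/2)
x, v = solve_row_player([[1, -1], [-1, 1]])
print(x, v)
```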
Zero-sum games and their solution via linear programming have widespread applications in:
Auctions and bidding strategies: Where the bidders are competing against each other
in a zero-sum context.
Military strategy and conflict resolution: In situations where one player's gain
corresponds directly to the loss of the opponent.
8. Conclusion
This lecture introduces the foundational concepts of Algorithmic Game Theory and the
application of linear programming to solve two-player zero-sum games. We have shown
how linear programs can be used to model the strategic interactions between players in such
games, and how duality theory plays a critical role in deriving optimal solutions. These
concepts lay the groundwork for more advanced topics in game theory and optimization.
As previously discussed, game theory studies strategic interactions between players, where
each player's payoff depends not only on their own actions but also on the actions of other
players. A game is defined by:
Players: The decision-makers participating in the game.
Strategies: The set of actions available to each player.
Payoffs: The rewards or costs associated with each combination of actions chosen by the
players.
In the context of two-player games, each player’s strategy influences the payoff they receive,
and the goal is to find the strategy that maximizes each player's payoff, given the strategy of
the other player.
A Nash equilibrium is a strategy profile (a combination of strategies, one for each player)
where no player can improve their payoff by unilaterally changing their own strategy,
assuming the strategies of the other players remain fixed. In other words, given the
strategies of all other players, a Nash equilibrium occurs when each player’s strategy is
optimal for them, and no player has an incentive to deviate.
Formally, for a game with n players, let each player i have a set of strategies Si and a payoff
function ui . A strategy profile (s1∗ , … , sn∗ ) is a Nash equilibrium if
ui (si∗ , s−i∗ ) ≥ ui (si , s−i∗ )
for all si ∈ Si , i.e., no player i can improve their payoff by changing their strategy, assuming
the strategies of the other players remain fixed.
For two players, this condition means that, given Player 2’s strategy, Player 1’s strategy is
optimal, and vice versa.
3. Existence of Nash Equilibrium
One of the central results in game theory is Nash’s existence theorem, which asserts that:
A Nash equilibrium (possibly in mixed strategies) always exists for any finite game, that is,
any game with finitely many players in which each player has a finite strategy set.
This means that in finite games (games with a finite number of players and strategies), there
is always at least one strategy profile where no player has an incentive to unilaterally deviate.
The Nash equilibrium can occur in pure strategies, where players choose specific actions
with certainty, or in mixed strategies, where players randomize over their available
strategies.
Nash’s theorem was proven by John Nash in 1950 and applies to both zero-sum games and
more general games, including those with non-zero-sum payoffs.
While Nash’s existence theorem guarantees the existence of a Nash equilibrium in finite
games, it does not provide a method for finding one efficiently. The computational
complexity of finding a Nash equilibrium has been a key area of research in algorithmic
game theory.
Existence vs. Computability: Although a Nash equilibrium exists in every finite game,
finding one can be computationally difficult. The problem of computing a Nash
equilibrium is hard even for seemingly simple games; for general two-player games it is
PPAD-complete.
In practical terms, finding a Nash equilibrium may require sophisticated algorithms. For
certain classes of games, like two-player zero-sum games, the equilibrium can be found
efficiently using linear programming. However, for
more general games, computing the Nash equilibrium is often a difficult problem.
Pure Strategy Nash Equilibrium: In a pure strategy equilibrium, each player chooses
one strategy with certainty. This is the most straightforward form of equilibrium.
However, pure strategy Nash equilibria do not always exist in every game.
Mixed Strategy Nash Equilibrium: In a mixed strategy equilibrium, players randomize
over their possible strategies. Even if a game does not have a pure strategy Nash
equilibrium, it may still have a mixed strategy Nash equilibrium. For example, in rock-
paper-scissors, there is no pure strategy equilibrium, but the mixed strategy equilibrium
involves each player randomizing their choices with equal probability.
In the Prisoner's Dilemma, two suspects are arrested for a crime. Each prisoner has two
choices: cooperate with the other by remaining silent, or defect by betraying the other. The
payoffs are structured so that mutual defection leads to a worse outcome than mutual
cooperation, but each player’s best response is to defect, regardless of what the other does.
(2 × 2 payoff table: each player chooses Cooperate or Defect; entries give the two players' payoffs.)
In this case, the Nash equilibrium is for both players to defect, as neither player can improve
their payoff by changing their strategy, given the strategy of the other.
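As a quick check of this reasoning, the sketch below enumerates the pure strategy profiles for one standard choice of Prisoner's Dilemma payoffs (an assumption on our part; any payoffs with the same ordering behave identically) and confirms that mutual defection is the only pure Nash equilibrium:

```python
# Payoff entries are (Player 1, Player 2); one standard instance (assumed).
payoffs = {
    ("C", "C"): (-1, -1),   # both cooperate: light sentence
    ("C", "D"): (-3,  0),   # Player 1 cooperates, Player 2 defects
    ("D", "C"): ( 0, -3),
    ("D", "D"): (-2, -2),   # both defect
}
strategies = ["C", "D"]

def is_nash(s1, s2):
    """No unilateral deviation improves the deviator's payoff."""
    p1, p2 = payoffs[(s1, s2)]
    best1 = all(payoffs[(t, s2)][0] <= p1 for t in strategies)
    best2 = all(payoffs[(s1, t)][1] <= p2 for t in strategies)
    return best1 and best2

print([(s1, s2) for s1 in strategies for s2 in strategies
       if is_nash(s1, s2)])   # [('D', 'D')] -- mutual defection
```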
In Matching Pennies, each player has two strategies: heads (H) or tails (T). Player 1 wins if
the pennies match, and Player 2 wins if they do not.
      H    T
H    +1   −1
T    −1   +1
(entries are payoffs to Player 1; Player 2 receives the negative)
The Nash equilibrium in this game is for both players to randomize between heads and tails
with equal probability (i.e., choosing H or T with probability 0.5).
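A small sketch verifying this claim numerically, using the payoff matrix above: against the 50/50 strategy, each player's pure strategies yield identical expected payoffs, so no unilateral deviation helps.

```python
import numpy as np

A = np.array([[1.0, -1.0],    # rows: Player 1 plays H, T
              [-1.0, 1.0]])   # payoff to Player 1; Player 2 gets the negative
x = np.array([0.5, 0.5])      # Player 1's mixed strategy
y = np.array([0.5, 0.5])      # Player 2's mixed strategy

print("expected payoff to Player 1:", x @ A @ y)        # 0.0 (game value)
print("Player 1's pure-strategy payoffs vs y:", A @ y)  # [0. 0.] -> indifferent
print("Player 2's pure-strategy losses vs x:", x @ A)   # [0. 0.] -> indifferent
```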
7. Applications of Nash Equilibrium
Political Science: In voting systems and coalition formation, players (politicians) must
select strategies that account for the actions of others.
Computer Science: In network routing, distributed algorithms, and auction design, Nash
equilibrium helps predict how rational agents behave.
Evolutionary Biology: In evolutionary game theory, Nash equilibrium helps explain the
stable strategies that evolve in populations over time.
8. Conclusion
In this lecture, we introduced the concept of Nash equilibrium and discussed its significance
in game theory. We established that a Nash equilibrium always exists in finite games,
allowing mixed strategies, but the challenge lies in efficiently finding it. The lecture highlighted the
differences between pure and mixed strategy equilibria and provided illustrative examples to
better understand the equilibrium concept. Finally, we touched upon the wide-ranging
applications of Nash equilibrium across disciplines.
A zero-sum game is a type of game in which the total payoff (or loss) of all players is always
zero. In a two-player zero-sum game, if one player gains a certain amount, the other player
must lose the same amount. Mathematically, if Player 1 gains x, then Player 2 loses x, and
the total payoff is x + (−x) = 0.
For a two-player zero-sum game:
Player 2's objective: Minimize Player 1's payoff (equivalently, maximize their own payoff,
since Player 2’s payoff is the negative of Player 1’s payoff).
We can represent such a game using a payoff matrix. The rows correspond to the possible
strategies of Player 1, and the columns correspond to the strategies of Player 2. The entries
in the matrix represent the payoffs to Player 1, with Player 2’s payoff being the negative of
the value in the corresponding entry.
2. Minimax Theorem
The Minimax Theorem, due to von Neumann, states that in every two-player zero-sum game:
There exists a strategy for Player 1 that maximizes their minimum possible gain, and
similarly,
There exists a strategy for Player 2 that minimizes their maximum possible loss.
Player 1's strategy: Player 1 seeks to choose a mixed strategy (a probability distribution
over the available pure strategies) that maximizes their minimum expected payoff,
given Player 2's possible strategies.
Player 2's strategy: Player 2 seeks to minimize Player 1's maximum expected payoff,
given Player 1's possible strategies.
This result establishes that both players can secure a guaranteed outcome (the value of the
game) even when facing an adversary who plays the best possible counter-strategy.
We can frame the payoffs of a zero-sum game in a linear programming framework. Consider
the following setup:
Let A be the payoff matrix for Player 1, where the entry Aij represents the payoff to
Player 1 when Player 1 chooses pure strategy i and Player 2 chooses pure strategy j .
The objective for Player 1 is to maximize their expected payoff, while the objective for Player
2 is to minimize Player 1’s payoff.
The payoff to Player 1 when they play mixed strategy x against Player 2’s mixed strategy y is
given by:
Payoff = xᵀAy
Player 1’s objective is to maximize their expected payoff, while Player 2 will minimize this.
The linear program for Player 1 can be written as:
Maximize v
subject to:
∑_{i=1}^{m} xi = 1,  xi ≥ 0  ∀i
∑_{i=1}^{m} Aij xi ≥ v  ∀j = 1, … , n
Where v represents the value of the game, which is the expected payoff for Player 1. This
formulation maximizes v subject to the condition that Player 1’s expected payoff is at least v
for all possible strategies of Player 2.
Similarly, Player 2’s objective is to minimize the expected payoff to Player 1 (equivalently, to
maximize their own payoff, which is its negative). Player 2’s linear program is:
Minimize v
subject to:
∑_{j=1}^{n} yj = 1,  yj ≥ 0  ∀j
∑_{j=1}^{n} Aij yj ≤ v  ∀i = 1, … , m
Here, v is the value of the game, and the condition ensures that Player 2 minimizes the
maximum expected payoff to Player 1.
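The following sketch, assuming NumPy and SciPy, solves both players' LPs above for an example payoff matrix (an arbitrary choice) and checks that the maximin and minimax values coincide, as the Minimax Theorem and strong duality predict:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[2.0, -1.0, 0.0],
              [1.0, 3.0, -2.0]])   # example payoff matrix to Player 1
m, n = A.shape

# Player 1: maximize v s.t. sum_i A[i,j] x_i >= v for all j, x a distribution.
res1 = linprog(
    c=np.r_[np.zeros(m), -1.0],                  # minimize -v
    A_ub=np.hstack([-A.T, np.ones((n, 1))]),     # v - (A^T x)_j <= 0
    b_ub=np.zeros(n),
    A_eq=[[1.0] * m + [0.0]], b_eq=[1.0],
    bounds=[(0, 1)] * m + [(None, None)],
)

# Player 2: minimize v s.t. sum_j A[i,j] y_j <= v for all i, y a distribution.
res2 = linprog(
    c=np.r_[np.zeros(n), 1.0],                   # minimize v
    A_ub=np.hstack([A, -np.ones((m, 1))]),       # (A y)_i - v <= 0
    b_ub=np.zeros(m),
    A_eq=[[1.0] * n + [0.0]], b_eq=[1.0],
    bounds=[(0, 1)] * n + [(None, None)],
)

print("maximin value (Player 1):", -res1.fun)
print("minimax value (Player 2):", res2.fun)     # equal, by strong duality
```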
The Minimax Theorem states that the value of the game is the same for both players, i.e.,
the maximin value for Player 1 equals the minimax value for Player 2. This result is closely
related to the concept of strong duality in linear programming.
Strong duality ensures that the value of the primal problem (Player 1’s problem) is equal
to the value of the dual problem (Player 2’s problem), provided both the primal and dual
problems are feasible.
In the context of two-player zero-sum games, the primal problem corresponds to Player
1’s linear program, and the dual problem corresponds to Player 2’s linear program.
By strong duality, we can conclude that the solutions to both linear programs yield the same
value of the game, and the strategies that achieve this value correspond to a Nash
equilibrium in the zero-sum game. Specifically, the mixed strategies x∗ and y ∗ derived from
the solutions to the linear programs will form a Nash equilibrium because:
Player 1 cannot improve their payoff by changing their strategy given Player 2’s strategy.
Player 2 cannot improve their payoff by changing their strategy given Player 1’s strategy.
Thus, through the duality theory in linear programming, we not only guarantee the existence
of a Nash equilibrium but also provide an explicit method for computing the equilibrium in
two-player zero-sum games.
7. Conclusion
In this lecture, we have demonstrated that the Minimax Theorem, which asserts the
existence of a Nash equilibrium in two-player zero-sum games, is equivalent to strong
duality in linear programming. By framing the payoffs for individual players as linear
programs, we leveraged the strong duality theorem to show that the strategies that
maximize and minimize the expected payoffs for the players lead to a Nash equilibrium. This
connection provides a powerful method for analyzing and solving two-player zero-sum
games.
The minimax theorem, which ensures the existence of a Nash equilibrium in two-player zero-
sum games, has direct implications for communication complexity. Specifically, we will see
how it can be used to analyze the communication required for certain problems.
The goal is to minimize the number of bits exchanged between Alice and Bob while ensuring
that the correct output is computed. The problem of communication complexity can be
formalized as follows:
Alice holds an input x, and Bob holds an input y .
They must compute some function f (x, y) of their inputs, where the output is decided
by a communication protocol.
The complexity of the problem is defined as the number of bits exchanged between Alice
and Bob during the computation, and it can depend on the function f (x, y) as well as the
specific communication protocol used.
For example, consider a function f (x, y) where Alice knows x, Bob knows y , and they need
to compute the output f (x, y). A communication protocol might involve Alice sending a
message to Bob, then Bob responding with another message, and possibly Alice sending a
final message. The total number of bits exchanged in this process gives the deterministic
communication complexity.
In the context of communication complexity, one way to view Alice and Bob's interaction is
through the lens of a zero-sum game. Suppose we are interested in computing a function
f (x, y). We can think of Alice and Bob as playing a game where:
Alice chooses a strategy based on her input x, and Bob chooses a strategy based on his input y .
The minimax theorem suggests that there exists an optimal strategy for each player that
minimizes the worst-case outcome for the other player. This game-theoretic perspective
allows us to model the communication complexity of certain problems by converting them
into game-theoretic formulations.
A classical example in communication complexity is the AND function. Suppose Alice’s input
x is a binary string of length n, and Bob’s input y is also a binary string of length n. The
function f (x, y) computes the AND of all the bits in x and y , i.e., f (x, y) = x1 ∧ y1 ∧ ⋯ ∧ xn ∧ yn .
The deterministic communication complexity of this function can be analyzed using the
minimax theorem. If both Alice and Bob want to minimize the number of bits exchanged
while ensuring the correct output, they can use optimal strategies based on the minimax
result. For instance, Alice could send a message encoding enough information about her bits
such that Bob can deduce the correct outcome without needing to communicate every
individual bit.
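To make this concrete, here is a minimal sketch of such a protocol for the n-bit AND function: Alice compresses her entire input into a single summary bit, so the communication is constant, independent of n.

```python
def alice_message(x: list[int]) -> int:
    """One bit: 1 iff every bit of Alice's input is 1."""
    return int(all(x))

def bob_output(msg: int, y: list[int]) -> int:
    """Bob combines Alice's bit with his own summary bit."""
    return msg & int(all(y))

x = [1, 1, 1, 0, 1]
y = [1, 1, 1, 1, 1]
print(bob_output(alice_message(x), y))  # 0, since one of Alice's bits is 0
```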
The minimax theorem plays a key role in providing upper and lower bounds on the
communication complexity of many computational problems. By converting a
communication problem into a game-theoretic setting, we can apply the minimax theorem
to analyze the minimal communication required in the worst case, offering both theoretical
insights and practical strategies for designing efficient communication protocols.
Lower bounds: By exhibiting a hard distribution over inputs, the game-theoretic view
certifies that a certain amount of communication is necessary in the worst case.
Upper bounds: By using strategies derived from the minimax framework, we can also
provide upper bounds, showing that a given amount of communication is sufficient to
solve the problem in the worst case.
Protocol design: In practice, the minimax theorem guides the construction of
communication protocols that minimize the amount of data exchanged, especially in
large-scale distributed systems.
6. Summary
The minimax theorem links two-player zero-sum games to communication complexity: by
viewing Alice and Bob's interaction as a game, it yields both upper and lower bounds on the
communication required to compute a function and guides the design of efficient protocols.
In randomized communication complexity, the parties Alice and Bob can use random bits
during their communication. This means that the protocol can make probabilistic decisions
at each step. The goal is still to compute a function f (x, y) of their inputs, but the challenge
now is to minimize the expected number of bits exchanged while ensuring that the correct
output is computed with high probability.
Randomness: Alice and Bob are allowed to flip random coins during their
communication. This randomness can influence the decisions made during the protocol.
Error Probability: The protocol is allowed to make mistakes, but only with some small
probability. That is, the protocol should compute the correct output with probability at
least 1 − ϵ, for some small error tolerance ϵ, where ϵ is typically very small (e.g., ϵ =
0.01).
The randomized communication complexity of a function f (x, y) is the minimum expected
number of bits that Alice and Bob must exchange to compute f (x, y) correctly with high
probability, using a randomized protocol.
In deterministic communication complexity, the actions of Alice and Bob are completely
determined by their inputs, and no randomness is involved. The goal is to find the protocol
that minimizes the total number of bits exchanged in the worst case, ensuring that the
protocol always computes the correct output.
In contrast, in randomized communication complexity, the protocol can use random bits,
which means that it may involve stochastic decision-making. The advantage is that this
randomness allows the protocol to be potentially more efficient in terms of communication,
since the protocol can sometimes avoid the need to send as many bits as in a deterministic
protocol.
Key Differences:
Las Vegas Protocols: These are randomized protocols that always output the correct
result but may use a variable number of communication rounds, depending on the
random choices. The expected number of bits exchanged is minimized, and the protocol
always terminates with the correct answer. These protocols are guaranteed to be correct
but may require more communication than deterministic protocols in some cases.
Monte Carlo Protocols: These are randomized protocols where the output may not
always be correct. However, the protocol is designed such that the probability of error is
very low (for example, less than 1% or ϵ). These protocols aim to minimize the expected
communication while allowing for small errors.
Suppose Alice has a set A ⊆ {1, 2, ..., n}, and Bob has a set B ⊆ {1, 2, ..., n}.
The goal is to determine if the two sets are disjoint, i.e., if A ∩ B = ∅.
In the deterministic version of this problem, Alice and Bob may need to communicate all the
information in their sets to ensure the correct answer, which could involve exchanging up to
O(n) bits.
However, when allowing randomization, Alice and Bob can use a randomized protocol that
exchanges fewer bits. One well-known protocol involves Alice sending a random subset of
her set to Bob, and Bob checks if there is any overlap with his set. This allows them to answer
whether the sets are disjoint with high probability, using fewer bits than the deterministic
approach.
The expected number of bits exchanged by such a randomized protocol can be significantly
lower than in the deterministic protocol on many instances, although, as we will see in a later
lecture, the worst-case randomized communication complexity of disjointness remains Ω(n).
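The sketch below is a toy simulation of this idea (an illustrative assumption rather than a formal protocol): Alice repeatedly sends a small random sample of her set, and Bob reports an intersection as soon as he sees one, giving a one-sided-error test of disjointness.

```python
import random

def randomized_disjointness(A: set, B: set, rounds: int = 20, sample: int = 4) -> bool:
    """Return True if the protocol believes A and B are disjoint."""
    for _ in range(rounds):
        msg = random.sample(sorted(A), min(sample, len(A)))  # Alice's message
        if any(elem in B for elem in msg):                   # Bob's check
            return False  # a witness of intersection: certainly not disjoint
    return True           # no witness found: probably disjoint

A = {1, 4, 7, 9}
B = {2, 4, 6, 8}
print(randomized_disjointness(A, B))  # very likely False (they share 4)
```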
For example, for certain problems there is no efficient deterministic communication
protocol, but a randomized protocol can use randomness to dramatically reduce the
communication complexity; the equality function is the canonical example, requiring Ω(n)
bits deterministically but only O(log n) bits with randomization.
A well-known result in this area is Yao's Minimax Principle. In the context of randomized
communication complexity, it states that the worst-case expected communication of the
best randomized protocol for a function f (x, y) equals the maximum, over distributions on
the inputs x and y , of the communication required by the best deterministic protocol
against that distribution.
Randomized communication protocols find applications in several areas, for example:
Parallel Computing: In parallel computing, minimizing the communication between
different processing units is critical for improving performance. Randomized protocols
can help reduce communication overhead in such systems.
7. Summary
In this lecture, we introduced randomized communication complexity, where Alice and Bob
are allowed to use random bits to improve the efficiency of their communication protocol.
We discussed how randomized protocols can outperform deterministic protocols in terms of
the expected number of bits exchanged, especially when the problem allows for a small error
probability.
The Las Vegas and Monte Carlo types of randomized protocols were introduced, and we
explored how these protocols can be applied to problems like the disjointness problem. We
also highlighted the challenges in proving lower bounds for randomized communication
complexity and discussed some applications of randomized communication in various fields.
2. Restating the Problem: Deterministic vs. Randomized Protocols
Deterministic Protocol: A protocol where the actions of Alice and Bob are entirely
determined by their inputs. The goal is to minimize the number of bits exchanged in the
worst case, ensuring the protocol always computes the correct answer.
Randomized Protocol: A protocol where Alice and Bob can use random bits in their
communication. The goal is to minimize the expected number of bits exchanged while
ensuring that the correct answer is computed with high probability (i.e., with error
probability less than ϵ).
In general, randomized protocols can perform better than deterministic ones in terms of
expected communication, because they can make probabilistic decisions. However, proving
that a randomized protocol performs well for all cases can be tricky.
Yao's Minimax Theorem provides a formal way to relate these two types of analysis.
Specifically, it asserts the equivalence between the following:
The expected communication of the best randomized protocol on its worst-case input, and
The expected communication of the best deterministic protocol on the worst-case
(hardest) distribution over inputs.
The theorem can be seen as a minimax result, where we minimize the expected
communication for a randomized protocol and maximize the communication complexity
over the worst-case input distribution for a deterministic protocol. It shows that these two
strategies lead to equivalent lower bounds.
Let’s assume we have a randomized protocol for some function f , and we want to prove
a lower bound on its expected communication complexity.
We can use the fact that for a randomized protocol, the expected communication
depends on the distribution of inputs that Alice and Bob receive. This input distribution
can be chosen by an adversary to make the protocol inefficient.
To establish the equivalence, we imagine transforming the randomized protocol into a
deterministic problem by fixing the distribution of inputs and constructing a
corresponding deterministic input for which the protocol must work.
The idea behind the proof is an averaging argument: a randomized protocol is a probability
distribution over deterministic protocols. Specifically, we can argue that if a randomized
protocol achieves a given expected cost on some input distribution, then at least one
deterministic protocol in its support does at least as well on that distribution; hence the
hardest distribution for deterministic protocols lower-bounds the worst-case cost of every
randomized protocol.
The theorem has significant implications for understanding the trade-offs between
randomized and deterministic protocols:
Lower Bounds for Randomized Protocols: By applying Yao's Minimax Theorem, we can
establish lower bounds for randomized communication complexity by considering the
worst-case scenario of a corresponding deterministic protocol. This is especially useful in
proving that certain problems require significant communication even when randomized
algorithms are allowed.
Let us consider the disjointness problem as an example.
Disjointness Problem: Alice has a set A ⊆ {1, 2, ..., n}, and Bob has a set B ⊆
{1, 2, ..., n}. The task is to determine whether A ∩ B = ∅.
In a deterministic protocol, Alice and Bob may need to communicate all the bits of their
sets to determine whether the sets are disjoint, leading to a communication complexity
of O(n).
Using Yao’s Minimax Theorem, we can show that the expected communication complexity
of the best randomized protocol for this problem is still Ω(n). This is because the worst-case
scenario for the disjointness problem (where the sets are large) requires O(n)
communication even in a randomized protocol, and the minimax theorem establishes that
no randomized protocol can do better than the best deterministic protocol against the hard
input distribution.
7. Summary
The theorem provides a formal framework for understanding the trade-offs between
randomization and determinism in communication complexity, ensuring that these two
approaches are closely linked in terms of their communication requirements.
To begin, recall the key idea of Yao’s Minimax Theorem: the worst-case expected
communication of the best randomized protocol equals the distributional communication
complexity of the best deterministic protocol against the hardest input distribution.
This theorem allows us to translate the analysis of randomized protocols into the analysis of
deterministic protocols, and it gives us a way to derive lower bounds on the randomized
communication complexity by analyzing corresponding deterministic settings.
Step 1: Define the Problem and Communication Model We start by specifying the
problem for which we want to derive a lower bound, typically in the context of
communication complexity. This involves describing the communication between two
parties (Alice and Bob), their inputs, and the communication protocol they use to solve
the problem.
Step 2: Choose a Probability Distribution over Inputs We fix a distribution over the
inputs of Alice and Bob, typically chosen adversarially so that the problem is as hard as
possible on average.
Step 3: Analyze the Best Deterministic Protocol Against this fixed distribution, we
bound from below the expected communication of the best deterministic protocol; this
is the distributional communication complexity.
Step 4: Apply Yao’s Minimax Theorem Once we have the worst-case deterministic
problem, we can apply Yao’s minimax theorem. This allows us to lower-bound the
expected communication complexity of the best randomized protocol by considering
the deterministic worst-case scenario. Specifically, if we can show that the deterministic
problem requires at least C bits of communication in the worst case, the randomized
protocol must also require at least C bits on average.
To illustrate how to apply Yao’s Minimax Theorem to derive lower bounds, let’s consider a
specific example: the disjointness problem.
Disjointness Problem:
Alice has a set A ⊆ {1, 2, ..., n} and Bob has a set B ⊆ {1, 2, ..., n}.
The goal is to determine if A ∩ B = ∅ (i.e., whether the two sets are disjoint).
1. Defining the Problem: The problem is a communication problem, where Alice and Bob
communicate to determine if their sets are disjoint. The sets are represented as binary
strings of length n, with A and B being the sets that Alice and Bob hold, respectively.
The communication complexity is the number of bits exchanged between Alice and Bob.
2. Choosing a Probability Distribution: We assume that the sets A and B are chosen
randomly according to some distribution. In this case, the distribution may assign a
probability to each possible pair of sets (A, B), where the sets could be either disjoint or
intersecting.
3. Constructing the Deterministic Problem: In the worst-case scenario, the sets A and B
could be very large, and Alice and Bob may need to communicate all of their bits to
determine if the sets are disjoint. This is the worst-case deterministic scenario, where the
communication complexity of the problem is O(n) (since Alice and Bob may need to
exchange all their bits in the worst case).
Thus, the randomized communication complexity of the disjointness problem is Ω(n). This
lower bound is important because it shows that no randomized protocol can solve the
problem using fewer than Ω(n) bits on average, even though randomized protocols might
seem more efficient in other contexts.
Applying the theorem thus involves: choosing an appropriate distribution over inputs,
analyzing the best deterministic protocol under that distribution, and transferring the
resulting bound to randomized protocols.
The result that the randomized communication complexity is bounded below by the
deterministic communication complexity (in the worst case) has deep implications in both
communication complexity and algorithmic game theory. It shows that, in many cases,
randomization does not provide a substantial advantage in terms of communication.
The set disjointness problem involves two parties, Alice and Bob, each holding a set of
elements. Their task is to determine whether the sets are disjoint, i.e., if A ∩ B = ∅, where:
Alice has a set A ⊆ {1, 2, … , n},
Bob has a set B ⊆ {1, 2, … , n},
The goal is to determine whether A ∩ B = ∅ (i.e., whether the two sets are disjoint).
The sets are represented as binary vectors A and B , where each entry Ai or Bi is either 0 or
1, indicating whether element i is in the set or not. Alice and Bob exchange messages to
decide whether their sets are disjoint.
The communication complexity is measured by the total number of bits exchanged between
Alice and Bob in a protocol to correctly determine whether A ∩ B = ∅.
2. Applying Yao’s Minimax Theorem
We will define a probability distribution over the inputs A and B . In the case of the set
disjointness problem, a hard distribution is one that forces any communication protocol to
be inefficient, meaning the protocol must exchange a large number of bits.
One common distribution to consider is the uniform distribution over all possible sets A
and B of size n. However, the key observation in the set disjointness problem is that some
inputs are much harder than others for communication protocols. For example, when the
sets A and B are large but intersect in at most a few elements, the protocol must
communicate many bits to decide whether they are disjoint.
In the worst-case scenario, Alice and Bob may need to exchange all of their bits to determine
whether the sets are disjoint. In this case, the deterministic communication complexity is
Ω(n), where n is the size of the sets. The reasoning is that if the sets A and B are almost
identical, each bit in the sets could potentially be required to be transmitted to check for
disjointness.
This establishes a lower bound of Ω(n) for the randomized communication complexity of
the set disjointness problem. In other words, no randomized protocol can solve the set
disjointness problem using fewer than Ω(n) bits of communication, no matter how its
randomness is used.
3. The Hard Distribution
To gain more insight into this lower bound, we consider the hard distribution over the
inputs A and B . The hard distribution is designed to make the problem particularly difficult
for communication protocols. One such distribution chooses A and B at random so that
they intersect in at most a few elements, which forces the protocol to communicate a large
number of bits to rule out an intersection.
In this case, the hard distribution ensures that any protocol must exchange nearly n
bits in the worst case.
4. Conclusion
The lower bound on the randomized communication complexity of the set disjointness
problem is Ω(n), where n is the number of elements in each set. This lower bound shows
that no randomized protocol can solve the problem more efficiently than Ω(n) bits of
communication in the worst case, and it follows directly from applying Yao’s Minimax
Theorem. The hard distribution plays a crucial role in ensuring that the protocol cannot
achieve a better bound.
This result highlights the importance of understanding the limits of randomized protocols
and how communication complexity bounds can be derived using probability distributions,
deterministic problems, and the tools of duality theory from linear programming.
Given a directed graph G = (V , E) with a capacity c(u, v) ≥ 0 on each edge (u, v) ∈ E ,
the goal is to find the maximum amount of flow that can be sent from a source vertex s to a
sink vertex t, while respecting the capacity constraints on the edges.
Formally, we define a flow f (u, v) as the amount of flow passing through edge (u, v),
subject to the following constraints:
Capacity Constraint: For every edge (u, v), the flow must not exceed the capacity of the
edge:
0 ≤ f (u, v) ≤ c(u, v)
Flow Conservation: For every vertex v except the source s and sink t, the amount of flow
entering v must equal the amount of flow leaving v :
∑_{(u,v)∈E} f (u, v) = ∑_{(v,w)∈E} f (v, w)
The objective is to maximize the total flow from the source s to the sink t, which is defined
as:
Maximize ∑_{(s,v)∈E} f (s, v)
This sum represents the total amount of flow leaving the source s.
We now turn the Max Flow Problem into a linear programming problem.
We define the following decision variables: let f (u, v) be the amount of flow on edge (u, v),
for each pair of vertices u, v ∈ V with (u, v) ∈ E .
The objective function is to maximize the total flow out of the source vertex s, which is:
Maximize ∑_{(s,v)∈E} f (s, v)
Constraints:
1. Capacity Constraints: For each edge (u, v), the flow must not exceed the capacity of the
edge:
f (u, v) ≤ c(u, v)  ∀(u, v) ∈ E
2. Flow Conservation: For every vertex v , except the source s and the sink t, the total flow
entering the vertex must equal the total flow leaving the vertex:
∑_{(u,v)∈E} f (u, v) = ∑_{(v,w)∈E} f (v, w)
3. Non-Negativity: The flow on each edge must be non-negative:
f (u, v) ≥ 0  ∀(u, v) ∈ E
Thus, the linear program (LP) for the maximum flow problem can be written as:
Maximize ∑_{(s,v)∈E} f (s, v)
subject to:
f (u, v) ≤ c(u, v)  ∀(u, v) ∈ E
∑_{(u,v)∈E} f (u, v) = ∑_{(v,w)∈E} f (v, w)  ∀v ∈ V ∖ {s, t}
f (u, v) ≥ 0  ∀(u, v) ∈ E
The objective function maximizes the flow from the source s to the sink t.
The capacity constraints ensure that the flow on each edge does not exceed its capacity.
The flow conservation constraints enforce that the amount of flow entering a vertex
(except for the source and sink) equals the amount of flow leaving the vertex.
The non-negativity constraints ensure that the flow on each edge is non-negative.
Thus, solving this LP will yield the maximum flow from the source s to the sink t in the graph.
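A minimal sketch of this LP on a tiny example network, assuming SciPy: each edge carries one flow variable, capacities become variable bounds, and flow conservation becomes equality constraints at the internal vertices.

```python
import numpy as np
from scipy.optimize import linprog

edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
cap   = [3.0, 2.0, 1.0, 2.0, 3.0]
internal = ["a", "b"]                 # every vertex except s and t

# Objective: maximize flow out of s  ->  minimize its negative.
c = np.array([-1.0 if u == "s" else 0.0 for (u, v) in edges])

# Flow conservation: (flow in) - (flow out) = 0 at each internal vertex.
A_eq = np.zeros((len(internal), len(edges)))
for r, w in enumerate(internal):
    for k, (u, v) in enumerate(edges):
        if v == w:
            A_eq[r, k] += 1.0
        if u == w:
            A_eq[r, k] -= 1.0
b_eq = np.zeros(len(internal))

bounds = [(0.0, cap[k]) for k in range(len(edges))]  # 0 <= f(u,v) <= c(u,v)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("maximum flow value:", -res.fun)   # 5.0 for this network
print("edge flows:", dict(zip(edges, res.x)))
```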
Classical combinatorial algorithms such as Ford-Fulkerson and Edmonds-Karp, while not
directly solving the LP, can be seen as solving the same problem
iteratively by adjusting the flow values on the graph until an optimal solution is found. The LP
formulation provides a mathematical foundation for understanding the problem, and its
solution is equivalent to the solution obtained by these algorithms.
5. Conclusion
The Maximum Flow Problem is a classical optimization problem that can be efficiently
formulated as a linear program. The LP formulation involves maximizing the flow from a
source to a sink in a network while respecting edge capacities and flow conservation
constraints. This formulation not only provides a mathematical way of solving the problem
but also lays the groundwork for further theoretical explorations in network optimization
and algorithmic design.
The Minimum Cut Problem is concerned with finding a cut in a directed graph that
separates the source s from the sink t, and minimizes the total capacity of the edges
crossing the cut. More formally:
Given a directed graph G = (V , E), where V is the set of vertices and E is the set of
directed edges with capacities c(u, v) for each edge (u, v) ∈ E ,
A cut is a partition of the vertices into two disjoint sets S and T such that s ∈ S and t ∈
T.
The capacity of the cut (S, T ) is defined as the sum of the capacities of the edges that
go from S to T , i.e., the edges whose tail lies in S and whose head lies in T :
c(S, T ) = ∑_{(u,v)∈E : u∈S, v∈T} c(u, v)
The minimum cut is the cut that minimizes this capacity, i.e., the smallest total capacity
among all possible cuts that separate s from t.
One of the key results in network flow theory is the Max Flow - Min Cut Theorem.
This theorem establishes that the maximum flow from the source s to the sink t is equal to
the minimum capacity of a cut that separates s from t.
This theorem is fundamental because it allows us to solve the minimum cut problem by
solving the maximum flow problem. Since we already know how to formulate and solve the
maximum flow problem using linear programming (as shown in the previous lecture), we can
use that framework to find the minimum cut.
Let G = (V , E) be a directed graph, and c(u, v) be the capacity of the edge (u, v). We
define the following decision variables:
Let x(v) be a binary variable that indicates whether a vertex v belongs to the set S (the
source side of the cut). Specifically, x(v) = 1 means v ∈ S , and x(v) = 0 means v ∈ T .
4. Objective Function
The objective of the Min Cut Problem is to minimize the capacity of the cut. The capacity of
the cut (S, T ) is the sum of the capacities of the edges (u, v) where u ∈ S and v ∈ T , or
equivalently, where x(u) = 1 and x(v) = 0. Therefore, the objective function is:
Minimize ∑_{(u,v)∈E} c(u, v) · x(u) · (1 − x(v))
5. Constraints
To ensure that the partition of vertices into sets S and T is valid, we impose the following
constraints:
1. Binary Constraints: Each vertex must be either in set S or set T , which translates to:
x(v) ∈ {0, 1} ∀v ∈ V
2. Source and Sink Constraints: The source vertex s must belong to set S and the sink
vertex t must belong to set T :
x(s) = 1,  x(t) = 0
Thus, the linear program for the minimum cut can be written as:
Minimize ∑_{(u,v)∈E} c(u, v) · x(u) · (1 − x(v))
subject to:
x(v) ∈ {0, 1} ∀v ∈ V
x(s) = 1
x(t) = 0
The LP formulation for the minimum cut problem is closely related to the LP for the
maximum flow problem. In fact, by solving the maximum flow problem, we can determine
the minimum cut as follows:
After solving the maximum flow, identify the set S of vertices that are reachable from
the source s using the residual graph.
The minimum cut is the set of edges that go from S to T , and its capacity is the total
capacity of these edges.
This relationship between the maximum flow and minimum cut makes the Max Flow - Min
Cut Theorem a powerful tool in solving both problems.
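A short sketch of this recipe using the networkx library (an external dependency assumed here): solve the maximum flow, then read off the minimum cut, and observe that the two values agree.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edge("s", "a", capacity=3)
G.add_edge("s", "b", capacity=2)
G.add_edge("a", "b", capacity=1)
G.add_edge("a", "t", capacity=2)
G.add_edge("b", "t", capacity=3)

flow_value, _ = nx.maximum_flow(G, "s", "t")
cut_value, (S, T) = nx.minimum_cut(G, "s", "t")

print(flow_value, cut_value)  # equal, by the Max Flow - Min Cut Theorem
print("source side S:", S, " sink side T:", T)
```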
7. Conclusion
The Minimum Cut Problem is a fundamental optimization problem in network flow theory. It
can be efficiently formulated as a linear program, which minimizes the capacity of a cut that
separates the source s from the sink t in a directed graph. The LP formulation of the
minimum cut problem is directly related to the LP for the maximum flow problem, and the
Max Flow - Min Cut Theorem ensures that the maximum flow equals the minimum cut.
Solving the maximum flow problem thus provides an optimal solution to the minimum cut
problem, and this duality is central to understanding many network optimization problems.
The Max Flow = Min Cut Theorem states that the maximum flow from a source s
to a sink t is equal to the capacity of the minimum cut that separates s from t. This is one of
the cornerstone results in network flow theory.
To recall:
Maximum Flow Problem: Given a directed graph G = (V , E) with source s, sink t, and
capacities c(u, v) on the edges, the objective is to find the maximum flow from s to t
that can be sent through the network, subject to the capacity constraints on the edges.
Minimum Cut Problem: Given the same graph, the objective is to find a cut (a partition
of the vertices into two sets S and T , where s ∈ S and t ∈ T ) that minimizes the total
capacity of the edges crossing the cut, i.e., the sum of the capacities of the edges (u, v)
where u ∈ S and v ∈ T .
The Max Flow = Min Cut Theorem states that the maximum value of the flow that can be
sent from s to t is equal to the capacity of the minimum cut that separates s from t.
As established in previous lectures, both the maximum flow and minimum cut problems
can be formulated as linear programs. Let us recall their formulations.
Maximum Flow LP: The maximum flow problem can be formulated as the following
linear program:
Decision Variables: f (u, v), the flow on edge (u, v), for each (u, v) ∈ E .
Objective Function:
Maximize ∑_{(s,v)∈E} f (s, v)
Constraints:
∑_{v:(u,v)∈E} f (u, v) = ∑_{v:(v,u)∈E} f (v, u)  ∀u ∈ V ∖ {s, t}
0 ≤ f (u, v) ≤ c(u, v)  ∀(u, v) ∈ E
The objective is to maximize the flow sent from the source s to the sink t, subject to the
flow conservation constraints at each vertex (except at s and t), and the capacity
constraints on the edges.
Minimum Cut LP: The minimum cut problem can be formulated as follows:
Decision Variables: a binary variable x(v) for each vertex v ∈ V , indicating whether v
lies on the source side of the cut.
Objective Function:
Minimize ∑_{(u,v)∈E} c(u, v) · x(u) · (1 − x(v))
Constraints:
x(v) ∈ {0, 1} ∀v ∈ V ,  x(s) = 1,  x(t) = 0
The objective is to minimize the capacity of the cut separating s from t, subject to
ensuring that s ∈ S and t ∈ T , and that each vertex is assigned to one of the two sets.
3. Primal-Dual Pair
We now demonstrate that the maximum flow LP and the minimum cut LP form a primal-
dual pair. To do this, we examine the relationship between the constraints and objective
functions of the two linear programs.
The maximum flow problem is a primal optimization problem, where the goal is to
maximize the flow through the network while satisfying flow conservation and capacity
constraints.
The minimum cut problem is the dual optimization problem, where the goal is to
minimize the capacity of the cut separating the source and sink while ensuring that the
cut respects the binary assignment of vertices.
By interpreting the constraints and objective functions of these two LPs, we see that the dual
of the maximum flow problem corresponds to the primal of the minimum cut problem. This
means that the solution to the maximum flow problem provides a lower bound for the
minimum cut, and the solution to the minimum cut problem provides an upper bound for
the maximum flow.
4. Strong Duality and the Proof
The strong duality theorem of linear programming states that if both the primal and dual
problems have feasible solutions, then their optimal values are equal. In the context of the
max flow and min cut problems, this means that the value of the maximum flow is equal to
the value of the minimum cut.
1. Feasibility: Both the primal (maximum flow) and dual (minimum cut) LPs have feasible
solutions, as they are based on the feasible flow and cut definitions in the network.
2. Optimality: By strong duality, the optimal values of the primal and dual LPs are equal,
i.e., the maximum flow equals the minimum cut.
Thus, we conclude that the value of the maximum flow in the network is equal to the
capacity of the minimum cut that separates the source s from the sink t.
5. Conclusion
The Max Flow = Min Cut Theorem provides a powerful connection between two central
optimization problems in network flow theory. By formulating both problems as linear
programs, we have shown that they form a primal-dual pair. Applying the strong duality
theorem of linear programming leads to the conclusion that the maximum flow in a graph is
equal to the minimum cut capacity. This result has far-reaching implications in network
optimization, communications, and combinatorial optimization.
Recall that every linear program has an associated dual problem. For a given primal linear
program (LP), the dual provides an alternative perspective on the same optimization
problem. The primal and dual LPs are related in such a way that solving one provides
insights into solving the other.
To formalize:
Primal LP: This typically represents the optimization problem we are trying to solve. It is
expressed in terms of decision variables, constraints, and an objective function.
Dual LP: The dual represents a related problem, where the decision variables
correspond to the constraints in the primal, and the constraints in the dual correspond
to the objective in the primal.
The duality theory guarantees that if both the primal and dual problems are feasible, then
their optimal solutions have the same value, as stated by strong duality.
2. Primal-Dual Methodology
The primal-dual approach refers to a simultaneous solution strategy where we try to solve
both the primal and dual problems concurrently. The key idea is to improve the current
solutions of both the primal and the dual iteratively. This technique is particularly useful
when we do not have a clear method for solving either problem directly, but we can make
progress by leveraging the relationship between the two.
Start with feasible solutions for the dual problem and the primal problem, or at least
approximate feasible solutions.
Iteratively update both the primal and dual solutions based on the current state,
adjusting them in such a way that we progress towards optimality.
In some cases, we may restrict the original primal or dual LP to a simpler form, which is
easier to solve, and then gradually improve this restricted solution.
1. Initialization:
Start with an initial feasible solution for the dual LP and the primal LP. In some
cases, we may begin with an approximate solution that satisfies the constraints to
some extent but is not necessarily optimal.
2. Iterative Improvement:
Gradually refine both the primal and dual solutions. Each step consists of adjusting
the solutions for both LPs based on the information provided by the other. For
example, if the primal solution becomes better, the dual solution is updated
accordingly, and vice versa.
3. Restricted Primal/Dual:
Often, the original primal and dual LPs may be too complex to solve efficiently. In
such cases, we solve restricted versions of these LPs, where we limit the set of
variables or constraints involved. This restricted LP may be easier to solve, and we
can progressively expand the set of variables and constraints involved as the
solution improves.
4. Termination:
The iterative process stops when we find a solution that satisfies the optimality
conditions, meaning that the current solution is optimal for both the primal and the
dual LPs.
The primal-dual approach is particularly useful in problems where the primal and dual
solutions are intertwined, and solving them together is more efficient than solving them
independently. Some well-known applications include:
Network Flow Problems: In network flow problems, like maximum flow and minimum
cut, the primal-dual approach is often used to find the flow in a network or the cut
separating two vertices.
One classic example is the primal-dual algorithm for the set cover problem, where we
iteratively select elements and sets to cover them while maintaining a feasible dual solution.
To illustrate the primal-dual approach, let us consider the minimum cut problem:
Primal Problem (Minimum Cut): We want to minimize the cut capacity between two
vertices, s and t, in a network.
Dual Problem: In the dual formulation, we are trying to maximize the flow from s to t.
The primal-dual iteration for this pair proceeds as follows:
1. Start with an initial feasible solution: We begin by setting up an initial solution where
the flow is zero and all cuts are unassigned.
2. Iteratively improve both solutions: Gradually adjust the flow and cut by augmenting
the flow along certain paths, which reduces the residual capacity of the cut.
3. Terminate when a feasible cut is found: The process stops when no further
improvements can be made, and the solution corresponds to an optimal flow and an
optimal cut.
Efficiency: The primal-dual approach can often be more efficient than solving the primal
or dual problem independently, especially in large-scale optimization problems.
7. Conclusion
The primal-dual approach is a powerful strategy for solving linear programs, especially when
the primal and dual problems are closely related. By solving both problems simultaneously,
or by iteratively refining feasible solutions to both, we can often find optimal or near-optimal
solutions more efficiently. This approach has broad applications in network flow,
combinatorial optimization, and approximation algorithms, and it is an essential tool in the
field of optimization.
The maximum flow problem is a classical optimization problem in network theory. The goal
is to find the maximum amount of flow that can be sent from a source node s to a sink node
t in a directed flow network, subject to capacity constraints on the edges.
Formally, we are given a directed graph G = (V , E), where V is the set of vertices (nodes),
and E is the set of edges. Each edge (u, v) ∈ E has a capacity c(u, v), which represents the
maximum flow that can pass through the edge. We seek to maximize the flow from the
source s to the sink t.
We begin by formulating the primal linear program (LP) for the maximum flow problem:
Variables: Let f (u, v) be the flow through the edge (u, v).
Objective: Maximize the total flow from the source s to the sink t:
Maximize ∑_{(s,v)∈E} f (s, v)
This objective represents the total flow leaving the source node s, which by flow
conservation equals the total flow entering the sink node t.
Constraints:
Flow conservation: for every vertex u ∈ V ∖ {s, t},
∑_{(u,v)∈E} f (u, v) = ∑_{(v,u)∈E} f (v, u).
This ensures that the flow into a node equals the flow out of the node.
Capacity: f (u, v) ≤ c(u, v) for every edge (u, v) ∈ E .
This ensures that the flow on an edge does not exceed its capacity.
Feasibility: The flow values must satisfy the above constraints for all edges and vertices
in the network.
The dual problem for the max-flow problem can be derived using duality theory. In the
dual, we seek to find an optimal set of cut capacities. The dual variables correspond to the
cuts in the network that separate the source from the sink.
Formally, let’s consider the cut in a flow network as a partition of the set of vertices V into
two disjoint subsets S and T = V ∖ S , where s ∈ S and t ∈ T . The capacity of a cut is the
sum of the capacities of the edges crossing the cut from S to T , i.e., the edges (u, v) where
u ∈ S and v ∈ T .
The dual problem minimizes the capacity of such a cut, subject to the following constraints:
For every node v ∈ V ∖ {s, t}, we assign a dual variable y(v), where y(v) represents
the potential of node v .
The dual objective is to minimize the total capacity of the cut while satisfying the dual
constraints.
To solve the max-flow problem using the primal-dual approach, we proceed as follows:
1. Initialization:
Start with an initial feasible solution for the primal problem. This is typically done by
setting the flow on each edge to zero initially, i.e., f (u, v) = 0 for all edges (u, v).
The dual solution can be initialized by setting the potentials y(v) = 0 for all
vertices except the source s, where y(s) = 1.
2. Iterative Process:
Augment the flow along a path from the source s to the sink t using a residual
graph. A residual graph represents the available capacity along each edge,
considering the current flow.
Update the primal solution: For each edge in the augmenting path, increase the
flow by the smallest available capacity along the path.
Update the dual solution: Adjust the dual variables (node potentials) based on the
change in flow. This step ensures that the dual constraints are maintained, and the
algorithm progresses toward an optimal solution.
3. Termination:
The algorithm terminates when no more augmenting paths can be found in the
residual graph. At this point, the flow in the network is maximal, and the cut that
separates the source from the sink provides the minimum cut, as per the max-flow
min-cut theorem.
5. Ford-Fulkerson Algorithm
The Ford-Fulkerson algorithm is one of the most well-known algorithms for solving the
maximum flow problem. It is based on the primal-dual approach, and it proceeds as follows:
1. Initialize the flow to zero on every edge: f (u, v) = 0 for all (u, v) ∈ E .
2. While there exists an augmenting path (a path from s to t in the residual graph),
augment the flow along the path.
3. Update the residual graph and repeat the process until no augmenting paths remain.
4. The final flow corresponds to the maximum flow, and the cut separating s and t
corresponds to the minimum cut.
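The following self-contained sketch implements this method with BFS-based augmenting paths (the Edmonds-Karp variant); the capacity table at the bottom is an arbitrary example.

```python
from collections import deque

def max_flow(capacity: dict, s, t) -> int:
    """capacity: dict mapping (u, v) -> capacity; returns the max flow value."""
    residual = dict(capacity)                       # residual capacities
    nodes = {u for u, v in capacity} | {v for u, v in capacity}
    for u, v in capacity:
        residual.setdefault((v, u), 0)              # reverse edges start empty
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in nodes:
                if v not in parent and residual.get((u, v), 0) > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:                         # no augmenting path left
            return flow
        # Find the bottleneck capacity along the path, then augment.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck

capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
            ("a", "t"): 2, ("b", "t"): 3}
print(max_flow(capacity, "s", "t"))  # 5
```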
6. Conclusion
The primal-dual approach provides a clear framework for solving the maximum flow
problem. By starting with an initial feasible solution and iteratively refining both the primal
flow and dual potentials, the algorithm converges to the optimal solution. This leads to the
Ford-Fulkerson algorithm, which is a cornerstone of network flow algorithms. The primal-
dual approach not only yields an efficient algorithm for the max-flow problem but also
deepens our understanding of the relationship between flow and cuts in a network.
The set cover problem is a classic optimization problem. The objective is to select the
minimum number of sets from a collection of sets such that their union covers a given
universal set.
Problem Definition:
Given a universe U of n elements and a collection of m sets S1 , S2 , … , Sm , where Si ⊆ U .
Objective: Find the smallest subcollection of {S1 , S2 , … , Sm } whose union is equal to U , i.e., a minimum-size index set C with ⋃_{i∈C} Si = U .
The set cover problem is NP-hard, meaning that finding the exact optimal solution is
computationally intractable for large inputs.
We can model the set cover problem as an Integer Linear Program (ILP).
Variables: For each set Si , let xi ∈ {0, 1} indicate whether Si is selected.
Objective:
Minimize ∑_{i=1}^{m} xi
Constraints: Every element u ∈ U must be covered by at least one selected set. For
each element u ∈ U , we can define a constraint as follows:
∑_{i : u ∈ Si} xi ≥ 1
This ensures that each element in U is covered by at least one of the selected sets.
ILP Formulation:
Minimize ∑_{i=1}^{m} xi
Subject to:
∑_{i : u ∈ Si} xi ≥ 1  for all u ∈ U
xi ∈ {0, 1}  for all i
The goal is to minimize the total number of sets selected while ensuring that every element
in U is covered by at least one of the selected sets.
Since the set cover problem is NP-hard, it is often useful to solve the LP relaxation of the ILP,
which involves relaxing the integrality constraints. Instead of requiring xi to be binary, we
allow it to take any value in the interval [0, 1].
LP Relaxation:
Minimize ∑_{i=1}^{m} xi
Subject to:
∑_{i : u ∈ Si} xi ≥ 1  for all u ∈ U
0 ≤ xi ≤ 1  for all i
By solving this relaxed problem, we obtain a fractional solution, which is typically not
integral. However, the solution provides a lower bound on the optimal integer solution, and
in many cases, it is possible to find a solution that is close to the optimal ILP solution.
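A minimal sketch of this relaxation on a small instance, assuming SciPy; the universe and sets below are an arbitrary example, and the LP value printed is a lower bound on the optimal cover size.

```python
import numpy as np
from scipy.optimize import linprog

U = [1, 2, 3, 4, 5]
sets = [{1, 2, 3}, {2, 4}, {3, 4, 5}, {1, 5}]
m = len(sets)

c = np.ones(m)                                   # minimize sum of x_i
# linprog uses <=, so write  -sum_{i: u in S_i} x_i <= -1  for each u.
A_ub = np.array([[-1.0 if u in S else 0.0 for S in sets] for u in U])
b_ub = -np.ones(len(U))
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * m)

print("fractional solution:", res.x)
print("LP lower bound on the optimal cover size:", res.fun)
```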
Since the set cover problem is NP-hard, an exact solution is difficult to obtain in polynomial
time. However, we can approximate the solution. A well-known approximation algorithm for
the set cover problem works as follows:
Greedy Algorithm (see the sketch below):
1. Start with an empty cover; all elements of U are uncovered.
2. At each step, select the set that covers the largest number of uncovered elements.
3. Add the selected set to the cover and mark its elements as covered.
4. Repeat until every element of U is covered.
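A direct sketch of these steps in Python (the instance at the bottom is an arbitrary example):

```python
def greedy_set_cover(universe: set, sets: list[set]) -> list[set]:
    uncovered = set(universe)
    cover = []
    while uncovered:
        # Step 2: pick the set covering the most uncovered elements.
        best = max(sets, key=lambda S: len(S & uncovered))
        if not best & uncovered:
            raise ValueError("the sets cannot cover the universe")
        # Step 3: add it to the cover and mark its elements covered.
        cover.append(best)
        uncovered -= best
    return cover

U = {1, 2, 3, 4, 5}
print(greedy_set_cover(U, [{1, 2, 3}, {2, 4}, {3, 4, 5}, {1, 5}]))
# [{1, 2, 3}, {3, 4, 5}] -- a cover of size 2
```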
One of the key insights of using LP relaxation is that the LP solution is often close to the ILP
solution. For the set cover problem, the LP relaxation provides a lower bound, and the
greedy algorithm provides an upper bound.
The relationship between the LP relaxation and the integer solution is essential in
understanding how close the LP solution is to the optimal integer solution. Even though the
fractional solution from the LP relaxation is not directly usable as a solution to the ILP, it
provides important insights and bounds.
6. Conclusion
The set cover problem is a classic example of an NP-hard problem that can be solved
approximately using linear programming. By relaxing the integer constraints and solving the
LP relaxation, we obtain a fractional solution that serves as a lower bound for the integer
problem. Using approximation algorithms, such as the greedy algorithm, we can find a
solution that is close to optimal.
In the context of the set cover problem, we see that even though LP and ILP solutions may
not always coincide, the LP relaxation provides a valuable tool for obtaining approximate
solutions. The relationship between the LP solution and the ILP solution highlights the power
of linear programming in approximating combinatorial optimization problems.
1. Introduction to Rounding
The LP relaxation of the set cover problem allows fractional values for the variables xi ,
where 0 ≤ xi ≤ 1. While these fractional solutions provide useful bounds, they do not
directly give us a valid integral solution (which requires xi to be either 0 or 1). Therefore, we
must round the fractional solution to an integral one, with two goals:
The integral solution is feasible, i.e., it satisfies all the constraints of the ILP.
The value of the integral solution is as close as possible to the optimal LP solution,
ensuring that the rounding does not lead to a large degradation in the objective value.
2. Rounding Techniques
Several rounding strategies exist to convert fractional solutions into integral ones. We will
discuss a few natural methods and assess their applicability to the set cover problem.
In randomized rounding, for each set Si , we decide whether to include it in the solution
probabilistically, based on its fractional value xi .
Specifically, for each Si , we include it in the solution with probability xi , i.e., set Si is
included if we choose a random value from the interval [0, 1] that is less than or equal to
xi . Otherwise, we exclude it from the solution.
Expected Outcome:
The expected value of the objective function after rounding is approximately the value of
the fractional solution. This is because each set is included with probability proportional
to its fractional value.
The feasibility of the solution is preserved on average, but the outcome may not always
be feasible for every execution of the randomized process.
In the set cover problem, randomized rounding might not guarantee that each element
of the universe is covered, since the inclusion of sets depends on probabilities. However,
with appropriate adjustments and careful analysis, it is possible to ensure that the
expected coverage of elements remains valid.
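A minimal sketch of randomized rounding; as noted above, a single pass may fail to cover every element, which the feasibility check makes visible.

```python
import random

def randomized_round(x: list[float]) -> list[int]:
    """Include set i with probability x[i]."""
    return [1 if random.random() <= xi else 0 for xi in x]

def is_cover(chosen: list[int], universe: set, sets: list[set]) -> bool:
    covered = set()
    for S, take in zip(sets, chosen):
        if take:
            covered |= S
    return universe <= covered

x = [0.5, 0.5, 0.5, 0.5]                      # an example fractional solution
sets = [{1, 2, 3}, {2, 4}, {3, 4, 5}, {1, 5}]
chosen = randomized_round(x)
print(chosen, "feasible:", is_cover(chosen, {1, 2, 3, 4, 5}, sets))
```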
In greedy rounding, we select the sets based on their fractional values, similar to the
greedy algorithm for the set cover problem. The main difference is that instead of
making decisions purely based on the coverage of elements, we choose the sets with the
highest fractional value xi and include them in the solution.
This technique can be applied iteratively, where sets with the largest values of xi are
selected first, until every element is covered.
Expected Outcome:
This method is deterministic and guarantees that each element is covered. The
challenge is that the value of the objective function might be worse than the LP solution
due to over-selection of sets.
In the case of the set cover problem, greedy rounding can be an effective way to ensure
coverage, though it may result in some redundancy (i.e., selecting more sets than
necessary). The greedy approach can be adjusted to ensure that the solution is both
feasible and close to optimal.
In threshold rounding, we fix a threshold t ∈ (0, 1) and include set Si in the solution
exactly when xi ≥ t. The threshold may depend on the structure of the problem; often
t = 1/2, or t is chosen by a heuristic that balances feasibility against the objective value.
Expected Outcome:
This method is simpler than randomized rounding and can be more effective in ensuring
a feasible solution. However, it may lead to suboptimal solutions if many xi values lie
near t.
For the set cover problem, threshold rounding can be useful, particularly when the
fractional values of xi are not too close to 0 or 1. However, a careful choice of threshold
is needed to guarantee that every element is covered.
Greedy rounding ensures that every element is covered, and the algorithm’s
approximation ratio is guaranteed to be within a logarithmic factor of the optimal
solution, i.e., O(log n), where n is the number of elements in the universe.
This approximation is similar to the greedy algorithm for set cover and can be viewed as
a deterministic alternative to randomized rounding.
Threshold rounding can also give a feasible solution that is close to optimal. By
selecting an appropriate threshold, we can avoid selecting too many unnecessary sets
while ensuring that every element is covered.
4. Conclusion
Randomized rounding preserves the LP objective in expectation, but may require
adjustments to guarantee that every element is covered.
Greedy rounding and threshold rounding are simpler and often lead to good
approximation ratios for the set cover problem.
In practice, rounding is an essential technique to bridge the gap between LP relaxation and
integer optimization, offering a way to obtain good feasible solutions for NP-hard problems
like the set cover problem.
Lovász's rounding approach is based on the LP relaxation of the set cover problem. The idea
is to treat the LP solution as a probability distribution and select sets in the cover according
to the probabilities corresponding to their LP variables.
Given an LP solution with fractional values for the sets xi ∈ [0, 1], the goal is to round these
fractional values to obtain an integral solution, i.e., a set cover, while minimizing the
increase in the size of the cover.
For each set Si , we associate a probability with it, equal to the fractional value xi from
the LP relaxation.
This probabilistic selection ensures that the expected number of times set Si is included equals xi.
One key advantage of this rounding method is that the expected size of the cover does not
increase substantially compared to the LP solution.
Expected Size: The expected number of sets selected is equal to the sum of the LP
variables:
E[size of cover] = ∑_i xi
Since the LP relaxation is solved to optimality, this expected size equals the optimal fractional cover size, which is a lower bound on the size of the optimal integral set cover.
While the method provides a solution with a guaranteed expected size, we must also ensure
that every element in the universe is covered with high probability.
The probability of coverage for each element is determined by the union of the
probabilities of sets that cover the element.
Since the sets are chosen probabilistically, there is a non-zero chance that an element
may not be covered. However, the probability of failure can be made arbitrarily small by
adjusting the rounding process or selecting more sets.
Probability of Coverage: Each element is covered by at least one of the selected sets with high probability. This can be shown by the union bound or other probabilistic arguments (a short calculation follows this list).
Set Size: The expected size of the cover obtained through this rounding method is close
to the LP value. Therefore, the approximation ratio does not increase significantly.
Feasibility with High Probability: The rounding method ensures that the probability of
covering every element is high. The number of sets chosen in the final solution is
proportional to the fractional solution of the LP, and the expected cover size is optimal
or nearly optimal.
Non-Increase in Set Cover Size: The rounding method does not cause a large increase
in the number of sets chosen. It only selects sets based on the LP solution, which
already provides a good approximation of the minimum cover.
Analysis of Set Sizes: Lovász’s method shows that rounding can lead to a cover with an
expected size that is very close to the LP relaxation’s solution, thus ensuring that the
integral solution is good in terms of both feasibility and cost.
7. Conclusion
Lovász’s rounding method is an effective technique for approximating the set cover
problem. By selecting sets probabilistically based on their fractional LP values, we achieve an
integral solution with the following key properties:
The expected size of the set cover does not increase significantly compared to the LP solution.

Every element of the universe is covered with high probability.
Thus, this method offers a practical way to obtain feasible and near-optimal solutions for the
set cover problem, leveraging the structure of linear programming and probabilistic
rounding.
In this lecture, we present a randomized algorithm for the set cover problem that covers all elements in the universe with high probability. This method builds upon the rounding technique from the
previous lecture, and enhances it by executing the rounding process multiple times, thereby
improving the chances of obtaining a feasible and near-optimal solution.
In the set cover problem, we are given a universe of elements U and a collection of sets
S1 , S2 , … , Sm such that each set Si is a subset of U . The objective is to select the fewest sets whose union covers every element of U .
We previously discussed the linear programming (LP) relaxation of the set cover problem,
where we treat the problem as minimizing the sum of the LP variables xi associated with
each set Si , subject to the constraints that each element of U is covered by at least one set
Si , with xi ∈ [0, 1]. The LP relaxation can provide a fractional solution, where the solution
values xi may not necessarily be integer, but the LP value serves as a good approximation (in fact, a lower bound) for the optimal integral solution.
In the previous lecture, we saw that by rounding the fractional LP solution, we can obtain a
solution where the sets are selected probabilistically. However, one issue that arises is that,
while the expected size of the cover is close to the optimal solution, there is a probability
that some elements of the universe may not be covered in a single rounding process. This
happens due to the probabilistic nature of the rounding, and the chance of missing coverage
increases if the selection probabilities are small.
To improve the likelihood that every element is covered, we can repeat the rounding
process multiple times. The idea is that by executing the sampling process several times,
the probability of failing to cover an element decreases exponentially.
For each round of sampling, we independently select sets Si with probability xi (the fractional value from the LP solution).
After a sufficiently large number of rounds, the probability that any element is not
covered is very small, ensuring high probability of coverage.
5. Algorithm Outline
1. Solve the LP Relaxation: Compute the fractional solution for the set cover problem
using linear programming, obtaining the values xi for each set Si .
2. Repeated Rounding:
Repeat the rounding process k times, where k is chosen based on the desired
probability of covering all elements.
3. Combine the Rounds: Combine the sets chosen in each round. The union of the sets selected across all
rounds will cover all elements of the universe with high probability.
4. Return the Result: After k rounds, the union of the selected sets forms a cover for the
universe.
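A compact sketch of the repeated-rounding algorithm outlined above; the instance, the fractional values, and the choice of k are illustrative.

```python
import random

def repeated_rounding(sets, x, universe, k, seed=0):
    """Repeat independent randomized rounding k times and return the
    union of all selected sets; the per-element failure probability
    shrinks geometrically with k."""
    rng = random.Random(seed)
    chosen = set()
    for _ in range(k):
        chosen |= {i for i in range(len(sets)) if rng.random() <= x[i]}
    covered = set().union(*(sets[i] for i in chosen)) if chosen else set()
    return sorted(chosen), covered == universe

sets = [{1, 2}, {2, 3}, {3, 4}, {1, 4}]
universe = {1, 2, 3, 4}
x = [0.5, 0.5, 0.5, 0.5]          # hypothetical fractional LP solution
print(repeated_rounding(sets, x, universe, k=8))
```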
Probability of Coverage:
In each round, the probability that an element is not covered is reduced because
more sets are selected. Specifically, if an element e is not covered in one round, the
probability of missing it in the next round becomes progressively smaller.
After k rounds, the probability of an element not being covered is at most (1 − p)^k,
where p is the probability of an element being covered in each round.
The expected size of the set cover is the sum of the LP values xi , just as in the
original rounding method. Thus, the expected size of the final set cover remains
close to the LP solution.
The number of sets selected in each round is proportional to the LP solution, and
the total number of sets selected after k rounds is still near optimal in expectation.
Time Complexity:
Solving the LP relaxation takes polynomial time, and repeating the rounding process
k times involves selecting sets multiple times, which also runs efficiently in
polynomial time. The number of rounds k can be chosen to balance between
probability of coverage and computational efficiency.
7. Conclusion
This algorithm for the set cover problem uses repeated sampling to ensure high probability
of covering all elements in the universe. The key steps involve solving the LP relaxation, repeating the randomized rounding k times, and returning the union of the sets selected across all rounds.
This technique effectively addresses the issue of incomplete coverage by the initial rounding
and ensures a feasible solution with high probability. Furthermore, the expected size of the
cover remains close to the LP solution, making this approach a practical and efficient
method for solving the set cover problem with high-probability guarantees.
Linear regression involves fitting a line to a set of data points such that the line best
represents the relationship between the dependent variable y and the independent
variable(s) x. The general form of a linear model is:
y = θ 0 + θ 1 x1 + θ 2 x2 + ⋯ + θ n xn
where:

y is the dependent variable (the quantity being predicted),

x1 , x2 , … , xn are the independent variables (features), and

θ0 , θ1 , … , θn are the model parameters to be learned.
In simple linear regression, we have only one independent variable x, and the model
reduces to:
y = θ0 + θ1 x
150/212
The goal is to find the best values for the parameters θ0 and θ1 , which minimize the error between the predicted and observed values.
In regression, the error typically refers to the difference between the predicted and actual
values. There are various ways to measure the error, but one common method is to use the
sum of absolute deviations (L1 norm) or sum of squared errors (L2 norm). In this lecture,
we focus on the L1 norm, which measures the absolute error for each data point:
Error = ∑_{i=1}^{m} |yi − ŷi|
where m is the number of data points, and ŷi = θ0 + θ1 xi is the predicted value for the i-th data point.
Minimizing this error function means finding the parameters θ0 and θ1 such that the total absolute error is as small as possible.
We now frame the linear regression problem as a linear program. Given a set of data points
(x1 , y1 ), (x2 , y2 ), … , (xm , ym ), our goal is to minimize the total absolute error between the observed values and the model's predictions.
1. Decision Variables:
Let ei represent the error for the i-th data point, i.e., ei = |yi − (θ0 + θ1 xi)|.
2. Objective Function:
The objective is to minimize the sum of the absolute errors across all data points:
min ∑_{i=1}^{m} ei
3. Constraints:
For each data point, the absolute error ei can be modeled using two linear
constraints:
ei ≥ yi − (θ0 + θ1 xi )
ei ≥ (θ0 + θ1 xi ) − yi
151/212
These constraints ensure that ei is at least as large as the absolute difference between the observed and predicted values. The complete linear program is:

min ∑_{i=1}^{m} ei

subject to:

ei ≥ yi − (θ0 + θ1 xi)  ∀i

ei ≥ (θ0 + θ1 xi) − yi  ∀i
The linear program seeks to find the values of θ0 and θ1 that minimize the total absolute
error ∑ ei , subject to the constraints that each error ei is large enough to capture the
absolute difference between the observed and predicted values. Once the LP is solved, the
optimal values of θ0 and θ1 provide the best-fit line in terms of minimizing the absolute
errors.
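The LP above can be assembled directly with a generic solver. The sketch below uses scipy.optimize.linprog; the stacked variable layout [θ0, θ1, e1, …, em] and the toy data are choices made here for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def l1_regression(x, y):
    """Least-absolute-deviations fit via LP.  Variables are
    [theta0, theta1, e_1, ..., e_m]; minimize the sum of e_i subject
    to e_i >= |y_i - (theta0 + theta1 x_i)| as two inequalities."""
    m = len(x)
    c = np.concatenate([[0.0, 0.0], np.ones(m)])
    I = np.eye(m)
    ones, xs = np.ones((m, 1)), np.asarray(x, float).reshape(m, 1)
    # -theta0 - theta1*x_i - e_i <= -y_i  and  theta0 + theta1*x_i - e_i <= y_i
    A_ub = np.vstack([np.hstack([-ones, -xs, -I]),
                      np.hstack([ ones,  xs, -I])])
    b_ub = np.concatenate([-np.asarray(y, float), np.asarray(y, float)])
    bounds = [(None, None), (None, None)] + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[0], res.x[1]

# y = 1 + 2x with one outlier; the L1 fit shrugs the outlier off.
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 25]
print(l1_regression(x, y))  # approximately (1.0, 2.0)
```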
In contrast to the L1 norm approach we used here (absolute error), the least squares
method minimizes the sum of squared errors (L2 norm), which results in a different
objective function:
min ∑_{i=1}^{m} (yi − (θ0 + θ1 xi))^2
This method leads to a closed-form solution for θ0 and θ1 , but the L1 norm approach, while
computationally more intensive (due to the linear programming), is more robust to outliers
in the data.
Using linear programming for linear regression has practical applications, especially in cases
where you want to minimize errors in a more robust way (using the L1 norm) or when
dealing with large datasets that may not fit well under least squares. Additionally, LP
formulations for regression can be extended to more complex models, such as support
152/212
vector machines (SVMs), and can be adapted for regularized regression (e.g., Lasso
regression).
7. Conclusion

In this lecture, we showed how minimizing the sum of absolute deviations can be cast as a linear program, yielding a regression method that is more robust to outliers than least squares.
Linear classification involves partitioning a dataset into two classes using a linear boundary.
In a two-dimensional space, this boundary is a straight line, while in higher dimensions, it
becomes a hyperplane. The task is to find a hyperplane that separates the points of the two
classes in such a way that points on one side of the hyperplane belong to one class, and
points on the other side belong to the other class.
For simplicity, let's consider the case where we have a set of two classes C1 and C2 , and we
wish to find a hyperplane that separates these classes perfectly (i.e., no data points are
misclassified).

Given a dataset of m labeled points, where:
xi ∈ Rn represents the feature vector of the i-th data point (an n-dimensional vector),
yi ∈ {+1, −1} represents the label of the i-th data point (with two possible classes, +1
and −1),
the goal is to find a hyperplane w ⋅ x + b = 0 that separates the two classes. Here:
w ∈ Rn is a vector normal to the hyperplane,
b is the bias term (scalar), and
w ⋅ x denotes the dot product between the weight vector w and the feature vector x.
For the dataset to be linearly separable, the following constraints must hold:
For each data point (xi , yi ), the label yi determines on which side of the hyperplane the
point xi lies:
If yi = +1, we want w ⋅ xi + b ≥ 1,

If yi = −1, we want w ⋅ xi + b ≤ −1.
Thus, the linear classification problem can be expressed as the following set of constraints:
yi (w ⋅ xi + b) ≥ 1
∀i = 1, 2, … , m
To solve the linear classification problem using linear programming, we aim to find the
optimal hyperplane that separates the classes. The objective of the linear program is to
maximize the margin between the two classes while satisfying the constraints.
Margin: The margin of a hyperplane is the distance between the closest data point (from
either class) and the hyperplane. Maximizing this margin ensures that the classifier is as
far as possible from the nearest data points, leading to better generalization.
The constraints ensure that all data points are correctly classified, and the objective is to maximize the margin, which is proportional to 1/∥w∥ (the inverse of the magnitude of the weight vector).
min ∥w∥
subject to:
yi (w ⋅ xi + b) ≥ 1
∀i = 1, 2, … , m
This is a convex optimization problem. With the Euclidean norm it is a quadratic program; choosing the L1 or L∞ norm for ∥w∥ yields a problem solvable by linear programming.

1. Objective: Minimizing ∥w∥ maximizes the margin 1/∥w∥ between the two classes.
2. Constraints: The constraints yi (w ⋅ xi + b) ≥ 1 ensure that all data points are classified
correctly and lie on the correct side of the hyperplane. If a data point is from class C1 ,
the constraint requires that w ⋅ xi + b is greater than or equal to 1, and if the point is
from class C2 , the constraint requires that w ⋅ xi + b is less than or equal to -1.
3. Solution: Solving this linear program gives the optimal values for w and b, which define
the hyperplane that separates the data points of the two classes with the maximum
margin.
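As a concrete instance, the sketch below keeps the problem a pure linear program by minimizing the L1 norm of w (splitting w = w⁺ − w⁻ to linearize it); this is one possible norm choice for illustration, not the standard Euclidean-norm SVM.

```python
import numpy as np
from scipy.optimize import linprog

def l1_margin_classifier(X, y):
    """Hard-margin linear classifier via LP: minimize ||w||_1 subject
    to y_i (w . x_i + b) >= 1.  Variables: [w_plus, w_minus, b], with
    w = w_plus - w_minus.  Assumes the data are linearly separable."""
    m, n = X.shape
    c = np.concatenate([np.ones(2 * n), [0.0]])
    # y_i (w . x_i + b) >= 1  ->  -y_i x_i.(w+ - w-) - y_i b <= -1
    A_ub = np.hstack([-(y[:, None] * X), (y[:, None] * X), -y[:, None]])
    b_ub = -np.ones(m)
    bounds = [(0, None)] * (2 * n) + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not res.success:
        raise ValueError("LP infeasible: data may not be linearly separable")
    return res.x[:n] - res.x[n:2 * n], res.x[-1]

# Toy separable data: class +1 above the line x2 = x1, class -1 below.
X = np.array([[0.0, 2.0], [1.0, 3.0], [2.0, 0.0], [3.0, 1.0]])
y = np.array([1, 1, -1, -1])
w, b = l1_margin_classifier(X, y)
print("w =", w, "b =", b)  # signs of w . x + b reproduce the labels
```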
The linear program described above is closely related to the Support Vector Machine (SVM),
a widely used machine learning algorithm. SVM aims to find the optimal hyperplane that
separates the data with the largest possible margin. The formulation above is the basis for
the hard-margin SVM, which assumes that the data is linearly separable.
For non-linearly separable data, SVM introduces slack variables to allow some
misclassification while still maximizing the margin. The resulting optimization problem
becomes a soft-margin SVM, which can also be solved using linear programming with
additional constraints.
Linear classifiers are widely used in machine learning for classification tasks, including spam filtering, text categorization, and image recognition.
By formulating the problem as a linear program, these classification tasks can be efficiently
solved using LP solvers, making them applicable to large-scale datasets.
7. Conclusion
In this lecture, we demonstrated how linear programming can be applied to solve linear
classification problems. By formulating the task of finding a separating hyperplane as a
linear program, we can efficiently find the optimal solution that maximizes the margin
between the two classes. This approach forms the foundation for Support Vector Machines
(SVMs) and provides a powerful method for solving classification problems in machine
learning.
In this lecture, we examine how changes in the parameters of a linear program (LP) affect the optimal solution. Specifically, we will look at how modifications
to the coefficients in the objective function or constraints influence the optimal solution and
its value.
Sensitivity analysis in linear programming examines the effect of small changes in the
parameters of the problem (such as coefficients in the objective function or the right-hand
side values in constraints) on the optimal solution. It helps answer key questions like:
How does a change in the coefficients of the objective function affect the optimal value?
How does a change in the right-hand side of a constraint impact the solution?
The goal of sensitivity analysis is to understand the robustness of the optimal solution,
identify critical parameters, and provide insights for decision-making under uncertainty.
In the context of a linear program, the following parameters are typically analyzed for
sensitivity:
Objective Function Coefficients: How sensitive is the optimal value to changes in the
coefficients of the objective function?
Right-hand Side of Constraints: How does a change in the constraint bounds (the right-
hand side values) affect the optimal solution?
Feasibility and Optimality Conditions: What are the boundaries for changes that still
keep the solution feasible and optimal?
Maximize cT x
subject to:
Ax ≤ b
x≥0
Where c is the vector of coefficients in the objective function, A is the matrix of coefficients
in the constraints, and b is the right-hand side vector.
If the optimal solution to this linear program is x∗ , sensitivity analysis involves determining
how changes in the vector c affect x∗ and the optimal value.
Allowable Range for Objective Coefficients: This refers to how much a given objective
function coefficient can change without altering the optimal basis of the solution.
Shadow Price: The shadow price (or dual value) of a constraint measures the change in
the objective function's optimal value per unit increase in the right-hand side of the
constraint. Sensitivity analysis on the right-hand side of constraints helps in
understanding how feasible solutions change when the resources (or limits) in the
constraints change.
Next, let's examine the sensitivity of the solution with respect to changes in the right-hand
side vector b, which represents the resource limits in the constraints.
Maximize cT x
subject to:
Ax ≤ b
x≥0
Feasibility Region: As b changes, the feasible region of the LP also changes. Sensitivity
analysis helps us understand how much we can increase or decrease the values of b
without violating the feasibility of the current solution.
Shadow Price: The shadow price provides valuable information in this context. If the
right-hand side of a constraint increases, the shadow price tells us how much the
objective function will improve per unit increase. Similarly, it indicates the cost of
relaxing the constraint.
There are several common types of sensitivity analysis used in linear programming:
1. Range of Optimality: This analysis focuses on the allowable range of coefficients in the
objective function for which the current optimal solution remains unchanged. If the
coefficient of a variable in the objective function is modified, the solution may still be
optimal within a specific range of values.
2. Range of Feasibility: This type of analysis examines the effect of changes in the right-
hand side of the constraints on the feasible region. It determines the limits within which
the current solution remains feasible.
3. Dual Prices or Shadow Prices: This refers to the change in the objective function value
resulting from a one-unit increase in the right-hand side of a constraint, assuming the
rest of the problem remains unchanged.
Example 1:

Maximize z = 3x1 + 4x2

subject to:
x1 + x2 ≤ 5
2x1 + x2 ≤ 6
x1 , x2 ≥ 0
1. Initial Optimal Solution: Solve this linear program using the Simplex method or any LP solver; the optimal solution is x1 = 0, x2 = 5, with objective value z = 20.
2. Sensitivity to Changes in Objective Coefficients:
Suppose the coefficient of x1 is changed from 3 to 3.5. We would check whether the current solution remains optimal under the new objective. If the change in the objective function coefficient does not change the optimal solution, it is within the allowable range (here, x1 = 0 remains optimal as long as the coefficient of x1 stays at or below 4).
3. Sensitivity to Changes in the Right-hand Side:

Suppose the right-hand side of the first constraint is changed from 5 to 6. Sensitivity analysis tells us whether this change will cause the current solution to become infeasible. If the solution is still feasible, we calculate the shadow price to see how much the objective function would improve per unit increase in the right-hand side (here the shadow price of the first constraint is 4, so z rises from 20 to 24).
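Shadow prices for Example 1 can be read off directly from an LP solver's dual values. A sketch with scipy.optimize.linprog (the HiGHS backend exposes the duals as ineqlin.marginals):

```python
from scipy.optimize import linprog

# Maximize z = 3*x1 + 4*x2  ->  minimize -z (linprog minimizes)
c = [-3, -4]
A_ub = [[1, 1], [2, 1]]
b_ub = [5, 6]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2, method="highs")
print("optimal x:", res.x, "z =", -res.fun)      # x = (0, 5), z = 20

# Duals measure d(fun)/d(b_ub); negate for the maximization problem.
shadow = -res.ineqlin.marginals
print("shadow prices:", shadow)                   # (4, 0)

# Re-solve with b1 = 6 to confirm the predicted improvement of 4.
res2 = linprog(c, A_ub=A_ub, b_ub=[6, 6], bounds=[(0, None)] * 2,
               method="highs")
print("new z =", -res2.fun)                       # 24
```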
Example 2:
Consider an LP for a transportation problem:

Minimize z = c1 x1 + c2 x2 + c3 x3 (for given unit costs c1 , c2 , c3 )

subject to:
x1 + x2 + x3 = 20
x1 + 2x2 = 15
x2 + x3 = 10
x1 , x2 , x3 ≥ 0
Sensitivity analysis for this problem would examine:

The effect of changing the supply or demand on the optimal transportation plan.
How much the cost coefficients can change before the optimal solution changes.
7. Conclusion
Sensitivity analysis is an important tool for understanding the stability and robustness of
solutions in linear programming. By evaluating how small changes in the problem's
parameters affect the solution, decision-makers can better handle uncertainties and make
more informed decisions. Through sensitivity analysis, we gain insights into which
parameters are most influential, enabling more flexible and adaptive planning in various
applications, such as resource allocation, optimization, and game theory.
Linear programming involves optimization problems where all decision variables are
continuous, meaning they can take any real values within specified bounds. A typical LP
formulation is:

Maximize cT x

subject to:
Ax ≤ b
x≥0
In contrast, Integer Programming (IP) is a class of optimization problems where some or all
decision variables are restricted to be integer values. An IP problem is typically formulated
as:

Maximize cT x

subject to:
Ax ≤ b
x ∈ Zn or x ∈ {0, 1}n
Where x ∈ Zn requires all variables to take integer values, and x ∈ {0, 1}n restricts them to binary values. The key differences between LP and IP are:
Variable Types: In LP, the variables are continuous, whereas in IP, the variables are
restricted to integer values (which may even be binary in certain problems).
Feasibility and Solution Space: LP has a continuous feasible region that is often convex
and can be solved efficiently using algorithms like the Simplex method or Interior Point
methods. In contrast, IP has a discrete feasible region, and the solution space is not
convex. This makes IP problems significantly harder to solve.
Solving Methods: While LP problems can be solved in polynomial time using algorithms
like Simplex or Interior Point methods, Integer Programming is much harder. Most IP
problems are NP-hard, meaning they do not have polynomial-time algorithms for
solving them in the general case.
Computational Difficulty: Integer programming is computationally challenging due to
the discrete nature of the variables. While LPs are solvable in polynomial time, integer
programming problems typically require more advanced techniques like branch-and-
bound, branch-and-cut, or cutting-plane methods, all of which can be computationally
expensive.
Optimality and Approximation: Unlike LPs, where the optimal solution is guaranteed to
be found at a vertex of the feasible region, IPs may require exhaustive search techniques
like backtracking or dynamic programming, and finding an optimal integer solution can
take longer.
Several approaches exist to solve integer programming problems. The most common techniques include branch-and-bound, which partitions the solution space and bounds subproblems using LP relaxations; cutting-plane methods, which iteratively add valid inequalities to tighten the relaxation; and branch-and-cut, which combines the two.
5. Applications of Integer Programming
Scheduling Problems: Scheduling involves assigning tasks to resources over time, and
since both tasks and resources are usually discrete, integer programming is often used
for these types of problems.
Cutting Stock and Knapsack Problems: These problems involve optimizing cutting or
packing operations and often have natural integer constraints, requiring IP to model the
problem.
6. Example: The 0-1 Knapsack Problem

Maximize ∑_{i=1}^{n} vi xi

subject to:

∑_{i=1}^{n} wi xi ≤ W

xi ∈ {0, 1} for all i

where vi is the value of item i, wi is its weight, and W is the capacity of the knapsack.
Linear Programming Relaxation: If we relax the integrality constraint and allow xi to
take fractional values between 0 and 1, this becomes a typical LP problem, and we can
solve it efficiently using methods like Simplex or Interior Point methods.
Integer Programming Solution: To obtain the optimal 0-1 solution, we must solve the
problem as an integer program, which involves more computational effort and may
require techniques like branch-and-bound.
7. Conclusion
Integer programming provides a powerful framework for solving optimization problems with
discrete variables, but it comes with significant computational challenges compared to linear
programming. While LP can be solved efficiently in polynomial time, IP often requires more
sophisticated techniques like branch-and-bound or cutting planes, which can be
computationally expensive. Despite these challenges, IP has a wide range of practical
applications across industries, particularly in areas like logistics, scheduling, resource
allocation, and project selection. Understanding the differences between LP and IP, as well as
the techniques used to solve IP problems, is crucial for tackling real-world optimization
problems involving integer variables.
Integer programming problems are typically formulated in the following general structure:

Maximize cT x

subject to:
Ax ≤ b
x ∈ Zn or x ∈ {0, 1}n
Where:
A is a matrix representing the coefficients of the constraints.
b is the vector representing the right-hand side of the constraints.
cT is the vector representing the objective coefficients.
The decision variables are constrained to integer values, and the problem seeks to maximize
(or minimize) the objective function cT x subject to the constraints.
2. Example Problem
We have a knapsack with a maximum weight capacity W = 50, and we want to maximize
the total value of items that can be placed in the knapsack. Each item has a weight and a
value associated with it. The objective is to decide which items to include in the knapsack,
subject to the weight constraint.
Item  Weight  Value
1     10      60
2     20      100
3     30      120
We want to maximize the total value of the selected items while ensuring the total weight
does not exceed the knapsack capacity. The problem can be formulated as:

Maximize 60x1 + 100x2 + 120x3

subject to:

10x1 + 20x2 + 30x3 ≤ 50

x1 , x2 , x3 ∈ {0, 1}
Where:
The objective function maximizes the total value of the items selected.
The constraint ensures the total weight of the selected items does not exceed the
capacity of the knapsack.
4. Solving the Problem Using LP Relaxation
To start solving, we often begin by relaxing the integer constraints, allowing the decision
variables x1 , x2 , x3 to take any values between 0 and 1. This converts the problem into a linear program:

Maximize 60x1 + 100x2 + 120x3

subject to:

10x1 + 20x2 + 30x3 ≤ 50

0 ≤ x1 , x2 , x3 ≤ 1
This is now a typical LP problem, which can be solved using methods such as Simplex or
Interior Point algorithms. The solution provides a fractional solution, meaning that the
values of x1 , x2 , x3 may not be integers. For this instance, the optimal relaxed solution is x1 = 1, x2 = 1, and x3 = 2/3, with objective value 240.
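A quick check of this relaxation with scipy.optimize.linprog (negating the values because linprog minimizes):

```python
from scipy.optimize import linprog

# LP relaxation of the knapsack instance above.
values = [60, 100, 120]
weights = [10, 20, 30]
W = 50
res = linprog(c=[-v for v in values],
              A_ub=[weights], b_ub=[W],
              bounds=[(0, 1)] * 3, method="highs")
print(res.x, -res.fun)   # -> [1. 1. 0.667], 240.0
```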
5. Rounding or Branching
If the LP relaxation results in fractional values for the decision variables, we need to find a
way to obtain integer solutions. There are two main methods for this:
Rounding: This method involves rounding the fractional values to the nearest integers
(either 0 or 1). While this method is simple, it does not always yield an optimal solution.
Branch-and-Bound: If the rounding does not give a feasible solution, we use branch-
and-bound, which is an iterative procedure for systematically solving the integer
program by breaking the problem into smaller subproblems. The method divides the
feasible region into subproblems and bounds the solution space based on the relaxation
of integer constraints.
6. Branch-and-Bound Algorithm
1. Relaxation: Solve the LP relaxation of the IP problem, which provides a bound on the
optimal solution.
2. Branching: If the solution is fractional, branch by dividing the problem into two
subproblems, one where a decision variable is forced to be 0 and another where it is
forced to be 1.
3. Bounding: Compute upper and lower bounds for each subproblem to determine if
further branching is needed.
4. Pruning: Discard subproblems that cannot lead to a better solution than the current
best solution (this is called pruning).
The process continues until all subproblems are either solved or pruned.
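A minimal branch-and-bound sketch for the 0-1 knapsack, using the greedy LP-relaxation bound to prune; the recursive structure and the value-density ordering are implementation choices made here, not the only way to organize the search.

```python
def knapsack_bb(values, weights, W):
    """Branch-and-bound for the 0-1 knapsack.  Each node is bounded by
    the LP relaxation of the remaining items (greedy fill with one
    fractional item); nodes whose bound cannot beat the incumbent are
    pruned.  Returned indices are 0-based."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)
    v = [values[i] for i in order]
    w = [weights[i] for i in order]
    best = [0, []]                      # [best value, chosen indices]

    def bound(k, cap, val):
        # Greedy LP bound on items k..n-1 with remaining capacity cap.
        for i in range(k, n):
            if w[i] <= cap:
                cap -= w[i]
                val += v[i]
            else:
                return val + v[i] * cap / w[i]   # fractional last item
        return val

    def branch(k, cap, val, chosen):
        if val > best[0]:
            best[0], best[1] = val, chosen[:]
        if k == n or bound(k, cap, val) <= best[0]:
            return                       # leaf reached, or prune
        if w[k] <= cap:                  # branch: take item k
            branch(k + 1, cap - w[k], val + v[k], chosen + [order[k]])
        branch(k + 1, cap, val, chosen)  # branch: skip item k

    branch(0, W, 0, [])
    return best[0], sorted(best[1])

# Instance from this lecture: optimal choice is items 2 and 3 (value 220).
print(knapsack_bb([60, 100, 120], [10, 20, 30], 50))  # -> (220, [1, 2])
```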
7. Cutting Planes
Another technique used to solve integer programming problems is cutting planes. The idea
is to iteratively add linear inequalities (cuts) to the LP relaxation to exclude fractional
solutions without excluding any feasible integer solutions. This process refines the solution
space and can eventually lead to an integer solution.
Once the integer programming problem is solved, we interpret the values of the decision
variables:
If the values of x1 , x2 , x3 are integers (0 or 1), then the solution is directly interpretable
If the values are fractional, we may apply rounding or use branching methods to refine
the solution to integer values.
If the problem was solved using branch-and-bound or cutting planes, the final solution
provides the optimal set of items for the knapsack.
9. Summary
In this lecture, we learned how to approach solving an integer programming problem, using
the Knapsack Problem as a case study. We:

Formulated the 0-1 knapsack problem as an integer program.

Solved its LP relaxation to obtain a fractional bound on the optimal value.
Discussed methods like rounding, branch-and-bound, and cutting planes for handling
fractional solutions.
Integer programming is a powerful tool for solving combinatorial optimization problems, but
its computational complexity requires sophisticated techniques to find the optimal solution
efficiently.
Lecture 53: Transportation and Assignment Models
In this lecture, we discuss two important applications of Linear Programming (LP): the
Transportation Problem and the Assignment Problem. Both of these are classical
optimization problems with a variety of real-world applications, including logistics, resource
allocation, and operations research.
1. Transportation Problem
Problem Formulation:
We have m suppliers, each with a certain supply si (the amount of goods available).
We have n consumers, each with a certain demand dj (the amount of goods required).
The cost of transporting one unit of goods from supplier i to consumer j is given by cij .
The objective is to determine the amount of goods xij to transport from supplier i to
consumer j , such that the total transportation cost is minimized while satisfying all supply
and demand constraints.
Mathematical Formulation:
Minimize Z = ∑_{i=1}^{m} ∑_{j=1}^{n} cij xij
subject to:
Supply Constraints: The total amount shipped from each supplier equals its supply (we assume a balanced problem, where total supply equals total demand):

∑_{j=1}^{n} xij = si  for each i = 1, 2, … , m
Demand Constraints: The total amount received by each consumer must equal its
demand:
∑_{i=1}^{m} xij = dj  for each j = 1, 2, … , n
Non-negativity Constraints: The amount of goods transported between any pair must
be non-negative:
xij ≥ 0
for each i, j
Solution Approach:
The transportation problem can be solved using Linear Programming (LP) techniques, but it
can also be efficiently solved using specialized algorithms, such as the Transportation
Simplex Method or the Modified Distribution Method (MODI Method). These algorithms
exploit the structure of the transportation problem to find the optimal solution faster than
general LP solvers.
2. Assignment Problem
The Assignment Problem is a special case of the transportation problem where the number
of suppliers equals the number of consumers (i.e., m = n). The goal is to assign n workers
to n tasks in such a way that the total cost is minimized (or profit is maximized).
Problem Formulation:
The objective is to assign workers to tasks in such a way that the total cost is minimized while
ensuring that each worker is assigned exactly one task and each task is assigned to exactly
one worker.
Mathematical Formulation:
Minimize Z = ∑_{i=1}^{n} ∑_{j=1}^{n} cij xij
subject to:
∑_{j=1}^{n} xij = 1  for each i = 1, 2, … , n

∑_{i=1}^{n} xij = 1  for each j = 1, 2, … , n

where xij = 1 if worker i is assigned to task j, and xij = 0 otherwise.
Solution Approach:
The assignment problem is a special type of integer programming (IP) problem. However,
because of its specific structure (binary decision variables and a square matrix of costs), it
can be solved more efficiently using algorithms such as:

The Hungarian Method: A combinatorial algorithm that solves the assignment problem in polynomial time.

LP Relaxation: The binary variables xij are allowed to take continuous values between 0 and 1. The relaxed LP solution can be converted back into an integer solution using rounding or other techniques.
Cost Matrix Structure: The transportation problem deals with a rectangular cost matrix,
while the assignment problem deals with a square cost matrix.
Objective: Both problems aim to minimize costs, but the transportation problem also
considers supply and demand constraints for each supplier and consumer, while the
assignment problem ensures each worker is assigned exactly one task and vice versa.
4. Applications
Transportation Problem:
Logistics and supply chain management: Optimizing the distribution of goods from
warehouses to retail locations.
Assignment Problem:

Workforce planning: Assigning workers to jobs or machines to tasks so that the total cost is minimized.
5. Summary
Introduced the Transportation Problem, its LP formulation, and its supply and demand constraints.

Discussed the Assignment Problem, its specific structure, and how it can be solved
efficiently.
Both the transportation and assignment problems are essential in operations research, and
solving them efficiently can lead to significant cost savings and optimization in various fields.
1. Problem Recap

We have m suppliers with supplies si and n consumers with demands dj . The cost of transporting one unit of goods from supplier i to consumer j is cij .
The objective is to determine how much to transport from each supplier to each consumer (represented by the decision variables xij ) such that the transportation cost is minimized while all supply and demand constraints are satisfied.
Mathematical Formulation:
Minimize Z = ∑_{i=1}^{m} ∑_{j=1}^{n} cij xij
subject to:
Supply Constraints: The total amount shipped from each supplier equals its supply:

∑_{j=1}^{n} xij = si  for each i = 1, 2, … , m
Demand Constraints: The total amount received by each consumer must equal its
demand:
∑_{i=1}^{m} xij = dj  for each j = 1, 2, … , n
Non-negativity Constraints: The amount of goods transported between any pair must
be non-negative:
xij ≥ 0
for each i, j
The transportation problem can be solved using various methods, some of which include:
The Simplex Method: Although a general linear programming method, it is often used
to solve the transportation problem directly.
The Northwest Corner Rule: A heuristic method to find an initial feasible solution.
The Least Cost Method: Another heuristic to start with an initial feasible solution.
171/212
The Vogel Approximation Method (VAM): A more sophisticated heuristic that often
provides better initial solutions.
In this lecture, we will explore a couple of these methods for solving the transportation
problem.
To begin solving the transportation problem, we first need to find an initial feasible solution.
Several methods can be used to construct an initial basic feasible solution (BFS).
a. The Northwest Corner Rule

1. Start at the top-left corner of the transportation tableau (representing the first supplier
and the first consumer).
2. Allocate as much as possible to the first cell (x11 ), which is the minimum min(s1 , d1 ) of the supply of the first supplier and the demand of the first consumer.
3. If the supply s1 is exhausted, move to the next consumer in the same row (i.e.,
x12 , x13 , …). If the demand d1 is met, move to the next supplier in the same column.
4. Continue this process until all supplies and demands are satisfied.
b. The Least Cost Method

This method selects the cell with the lowest cost for allocation.

1. Identify the unallocated cell with the lowest transportation cost.

2. Allocate as much as possible to that cell (the minimum of the supply and demand for
that cell).
3. After the allocation, adjust the supply and demand and cross out the row or column that
has been completely satisfied.
4. Repeat the process until all supplies and demands are satisfied.
c. Vogel's Approximation Method (VAM)

VAM is a more sophisticated approach that generally yields better initial solutions.
1. For each row and column, calculate the penalty cost, which is the difference between the
smallest and second smallest costs in that row or column.
2. Select the row or column with the largest penalty and allocate as much as possible to the
cell with the lowest cost in that row or column.
3. Update the supply and demand, and repeat the process until all supplies and demands
are satisfied.
Once an initial feasible solution is found using any of the methods above, we need to check if
it is optimal. If the solution is not optimal, we need to improve it.
The Transportation Simplex Method is typically used to iteratively improve the solution. It
works by pivoting between basic feasible solutions, much like the regular Simplex method
for linear programming.
The basic idea is to start with the initial feasible solution and look for a way to decrease the total transportation cost. This can be done by checking the reduced costs of the non-basic
variables (the cells not included in the current solution) and identifying whether any of them
can be added to the solution to improve the cost.
Adding a non-basic variable to the basis forms a cycle in the tableau; allocations are shifted around this cycle to reduce the cost. When no non-basic variable can improve the solution (all reduced costs are non-negative), the solution is optimal and the algorithm terminates.
5. Example
Consider a simple transportation problem with three suppliers and three consumers.
            Consumer 1  Consumer 2  Consumer 3  Supply
Supplier 1  4           6           8           20
Supplier 2  2           4           6           30
Supplier 3  5           7           3           25
Demand      15          25          35
We need to minimize the transportation cost by allocating the supplies to meet the
demands. Using the Northwest Corner Rule, we start by allocating to x11 , then move
through the tableau until the total supply and demand are satisfied.
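A short sketch of the Northwest Corner Rule applied to this instance; it produces an initial basic feasible solution, not the optimal plan.

```python
def northwest_corner(supply, demand):
    """Initial BFS for a balanced transportation problem via the
    Northwest Corner Rule.  Returns the allocation matrix."""
    supply, demand = supply[:], demand[:]   # copy; we mutate below
    m, n = len(supply), len(demand)
    alloc = [[0] * n for _ in range(m)]
    i = j = 0
    while i < m and j < n:
        qty = min(supply[i], demand[j])     # allocate as much as possible
        alloc[i][j] = qty
        supply[i] -= qty
        demand[j] -= qty
        if supply[i] == 0:
            i += 1                          # supplier exhausted: move down
        else:
            j += 1                          # demand met: move right
    return alloc

# Instance from the example above.
for row in northwest_corner([20, 30, 25], [15, 25, 35]):
    print(row)
# [15, 5, 0], [0, 20, 10], [0, 0, 25]
```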
6. Summary
Explored different methods to find initial feasible solutions, including the Northwest
Corner Rule, Least Cost Method, and Vogel's Approximation Method (VAM).
Discussed the Transportation Simplex Method as a way to improve the initial solution
and find the optimal solution.
The transportation problem is widely applicable in logistics, supply chain management, and
various industrial settings. Solving it efficiently can lead to significant cost savings and
optimization.
1. Problem Definition
Given:
n agents.
n tasks.
A cost matrix C = [cij ], where cij represents the cost of assigning agent i to task j .
The objective is to assign exactly one task to each agent, such that the total assignment cost
is minimized. The cost for the entire assignment is the sum of the individual costs cij over all assigned agent-task pairs.
Mathematical Formulation:
Minimize Z = ∑_{i=1}^{n} ∑_{j=1}^{n} cij xij
subject to:
Assignment Constraints: Each agent is assigned exactly one task, and each task is assigned to exactly one agent:

∑_{j=1}^{n} xij = 1  for each i = 1, 2, … , n

∑_{i=1}^{n} xij = 1  for each j = 1, 2, … , n
Binary Constraints: The decision variable xij must be binary (either 0 or 1):

xij ∈ {0, 1}  for each i, j
2. Solution Methods

a. The Hungarian Method

1. Step 1: Subtract the smallest value in each row from every element in that row.
2. Step 2: Subtract the smallest value in each column from every element in that column.
3. Step 3: Cover all the zeros in the matrix with a minimum number of horizontal and
vertical lines.
4. Step 4: Create new zeros by reducing the uncovered elements by the smallest uncovered value and adding it to the elements covered by two lines.

5. Step 5: Repeat Steps 3-4 until the zeros cannot be covered with fewer than n lines; an optimal assignment can then be read off from the zero entries.
This method guarantees an optimal solution in O(n3 ) time, making it efficient for large-scale
assignment problems.
b. Linear Programming

The assignment problem is a special case of the Integer Linear Programming (ILP) problem,
where the decision variables xij are binary. We can solve the assignment problem using the
Simplex algorithm or other LP-based methods designed for binary variables (such as
branch-and-bound or cutting planes). However, LP methods are typically less efficient than
the Hungarian method for solving the assignment problem due to the combinatorial nature
of the problem.
c. Network Flow and Shortest Path Methods

The assignment problem can also be modeled as a minimum cost flow problem or shortest
path problem in a graph. This is particularly useful when the assignment problem is large,
and the constraints need to be modeled as flow conservation equations in a network graph.
This approach is often used in cases where the structure of the problem allows for easy
representation as a network flow.
3. Example
Consider a simple assignment problem with 4 agents and 4 tasks. The cost matrix is as
follows:
         Task 1  Task 2  Task 3  Task 4
Agent 1  4       2       7       3
Agent 2  8       5       6       1
Agent 3  3       4       2       5
Agent 4  6       9       1       4
The goal is to assign each agent to a task such that the total cost is minimized.
1. Subtract the smallest value in each row:

Row 1: [4 − 2, 2 − 2, 7 − 2, 3 − 2] = [2, 0, 5, 1]
Row 2: [8 − 1, 5 − 1, 6 − 1, 1 − 1] = [7, 4, 5, 0]
Row 3: [3 − 2, 4 − 2, 2 − 2, 5 − 2] = [1, 2, 0, 3]
Row 4: [6 − 1, 9 − 1, 1 − 1, 4 − 1] = [5, 8, 0, 3]
2. Subtract the smallest value in each column:
Column 1: [2 − 1, 7 − 1, 1 − 1, 5 − 1] = [1, 6, 0, 4]
Column 2: [0 − 0, 4 − 0, 2 − 0, 8 − 0] = [0, 4, 2, 8]
Column 3: [5 − 0, 5 − 0, 0 − 0, 0 − 0] = [5, 5, 0, 0]
Column 4: [1 − 0, 0 − 0, 3 − 0, 3 − 0] = [1, 0, 3, 3]
3. Cover the zeros and perform necessary operations to identify the optimal assignment.
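The full covering procedure is tedious by hand, but SciPy ships an implementation of this algorithm (scipy.optimize.linear_sum_assignment) that can confirm the optimal assignment for this cost matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Cost matrix from the example above (rows: agents, columns: tasks).
C = np.array([[4, 2, 7, 3],
              [8, 5, 6, 1],
              [3, 4, 2, 5],
              [6, 9, 1, 4]])
rows, cols = linear_sum_assignment(C)
print(list(zip(rows, cols)), C[rows, cols].sum())
# Total cost 7: Agent 1 -> Task 2, 2 -> 4, 3 -> 1, 4 -> 3.
```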
4. Summary

In this lecture, we covered:

The Assignment Problem, its mathematical formulation, and the objective of minimizing
the total cost of assigning agents to tasks.
Various methods for solving the assignment problem, including the Hungarian method,
Linear Programming, and the Shortest Path approach.
Both PERT and CPM are used to manage projects efficiently by determining the critical tasks,
the shortest time to complete a project, and the relationship between tasks.
PERT (Program Evaluation and Review Technique): PERT is primarily used in projects
with uncertain activity durations. It uses statistical techniques to account for uncertainty
in activity durations. PERT is often used in research and development projects where the
time estimates are uncertain.
CPM (Critical Path Method): CPM is used for projects where the activity durations are
known and deterministic. It focuses on identifying the critical path, i.e., the longest
sequence of dependent activities that determines the shortest time in which the project
can be completed.
Aspect         PERT                                       CPM
Project Type   Research and development (R&D) projects    Construction, manufacturing, and production projects
Critical Path  The longest time path through the network  The longest time path, but with exact durations
The diagrammatic representation of both PERT and CPM is crucial in visualizing the tasks,
their dependencies, and the overall project timeline.
Both PERT and CPM use network diagrams to represent tasks and their relationships. These
diagrams consist of nodes (representing tasks or milestones) and arcs (representing the
relationships or dependencies between tasks).
Tasks (Nodes): Each node in the diagram represents an individual task or activity.
Dependencies (Arcs): Arcs represent the relationships between tasks. If task A must be
completed before task B, an arc is drawn from node A to node B.
AON (Activity on Node): In this type of diagram, nodes represent activities, and arcs
represent the dependencies between them. Most modern project scheduling techniques
(including PERT and CPM) use AON diagrams.
AOA (Activity on Arc): In this older diagram type, arcs represent activities, and nodes
represent the events or milestones of the project.
Task  Description  Duration (days)  Predecessors
A     Task A       4                None
B     Task B       3                A
C     Task C       2                A
D     Task D       5                B, C
E     Task E       3                D
1. Nodes represent tasks A, B, C, D, and E.

2. Arcs represent the dependencies: B and C depend on A (arcs from A to B and from A to C); Task D depends on both B and C (arc from B to D and arc from C to D); E depends on D (arc from D to E).
The critical path is the longest path through the network, which determines the shortest
time to complete the entire project. In the example above, the critical path can be
determined by summing the durations of tasks along the various paths and selecting the
one with the longest total duration.
Path 1: A → B → D → E = 4 + 3 + 5 + 3 = 15 days
Path 2: A → C → D → E = 4 + 2 + 5 + 3 = 14 days
Thus, the critical path is A → B → D → E, and the project will take 15 days to complete.
Slack time refers to the amount of time that a task can be delayed without affecting the
project’s overall completion time. Tasks on the critical path have zero slack, meaning any
delay in these tasks will delay the entire project. Other tasks may have some slack, which can
be used to absorb delays without affecting the project timeline.
To create a PERT/CPM diagram:

1. List all the tasks involved in the project and determine their duration.
2. Identify task dependencies: Determine which tasks must be completed before others
can start.
3. Draw the network diagram: Create nodes for tasks and draw arcs to represent the
dependencies between tasks.
4. Determine the critical path: Identify the longest path through the network and
calculate the project duration.
5. Calculate slack time: For tasks not on the critical path, calculate the slack time.
Advantages of PERT and CPM:

Visual Representation: Both methods provide a clear visual representation of the
project schedule and task dependencies, making it easier to manage and track progress.
Time Management: They help in identifying the critical tasks and managing time
effectively, ensuring that the project is completed on schedule.
Resource Allocation: The diagrams help in resource planning by ensuring that tasks are
scheduled in the most efficient way.
Applications:

Project Scheduling: Used for scheduling and organizing complex projects with multiple
tasks and dependencies.
Research and Development: PERT is often used in R&D projects where time estimates
are uncertain and need probabilistic estimation.
7. Conclusion
In this lecture, we covered:

The concepts of PERT and CPM and their applications in project management.
The critical path, slack time, and the steps involved in creating PERT/CPM diagrams.
The importance of these techniques in managing time, costs, and resources in large and
complex projects.
By using PERT and CPM, project managers can efficiently plan, schedule, and control
projects, ensuring that tasks are completed on time and within budget.
The critical path in a project is the longest sequence of dependent activities that determines
the shortest time in which the project can be completed. Any delay in the tasks on the critical
path will directly lead to a delay in the project's overall completion time.
To find the critical path:

1. List all the tasks in the project, along with their durations and dependencies.

2. Draw the project network diagram, representing the tasks as nodes and the dependencies as directed arcs (edges).

3. Estimate the duration of each task.

4. Perform a forward pass to calculate the earliest start (ES) and earliest finish (EF) times for each task.

5. Perform a backward pass to calculate the latest start (LS) and latest finish (LF) times for each task.

6. Calculate the slack time for each task.

7. Determine the critical path by identifying the tasks with zero slack time.
3. Forward Pass
The forward pass is used to determine the earliest start time (ES) and the earliest finish
time (EF) for each task. It starts from the first task and works its way through the network.
Earliest Start (ES): The earliest time a task can start, given the completion of its
predecessor tasks.
Earliest Finish (EF): The earliest time a task can finish, calculated as EF = ES +
Duration of Task.
Example:
Task  Duration (days)  Predecessors
A     4                None
B     3                A
C     2                A
D     5                B, C
E     3                D
To calculate the forward pass:

Task A: ES = 0, EF = 0 + 4 = 4.

Task B: ES = 4, EF = 4 + 3 = 7.

Task C: ES = 4, EF = 4 + 2 = 6.

Task D: ES = max(7, 6) = 7, EF = 7 + 5 = 12.

Task E: ES = 12, EF = 12 + 3 = 15.
4. Backward Pass
The backward pass is used to determine the latest start (LS) and latest finish (LF) for each
task. It starts from the last task and works backward through the network.
Latest Finish (LF): The latest time a task can finish without delaying the project.
Latest Start (LS): The latest time a task can start without delaying the project, calculated
as LS = LF − Duration of Task.
Task E is the last task. The LF for E = EF of E = 15. Thus, LS for E = 15 - 3 = 12.

Continuing backward: Task D: LF = 12, LS = 7. Task C: LF = 7, LS = 5. Task B: LF = 7, LS = 4. Task A: LF = 4, LS = 0.
5. Slack Time

Slack Time = LS − ES = LF − EF

Tasks with zero slack are on the critical path, which in this case are A, B, D, E.
6. Critical Path

The critical path is the path that includes all the tasks with zero slack. From the example, the critical path is A → B → D → E.
The total duration of the project is the duration of the critical path: 4 + 3 + 5 + 3 = 15
days.
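The forward and backward passes are mechanical enough to script. A sketch for the example above (the dict-based task representation is an illustrative choice):

```python
def critical_path(tasks):
    """Forward/backward pass for CPM.  `tasks` maps name ->
    (duration, list of predecessors); assumes names appear in
    topological order, as in the example above."""
    ES, EF = {}, {}
    for t, (dur, preds) in tasks.items():                  # forward pass
        ES[t] = max((EF[p] for p in preds), default=0)
        EF[t] = ES[t] + dur
    project_end = max(EF.values())
    LF, LS = {}, {}
    for t, (dur, preds) in reversed(list(tasks.items())):  # backward pass
        succs = [s for s, (_, ps) in tasks.items() if t in ps]
        LF[t] = min((LS[s] for s in succs), default=project_end)
        LS[t] = LF[t] - dur
    slack = {t: LS[t] - ES[t] for t in tasks}
    critical = [t for t in tasks if slack[t] == 0]
    return project_end, slack, critical

tasks = {"A": (4, []), "B": (3, ["A"]), "C": (2, ["A"]),
         "D": (5, ["B", "C"]), "E": (3, ["D"])}
print(critical_path(tasks))
# -> (15, {'A': 0, 'B': 0, 'C': 1, 'D': 0, 'E': 0}, ['A', 'B', 'D', 'E'])
```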
7. Summary
Critical Path: The longest path in the project network, determining the project's
duration.
Forward Pass: Determines the earliest start and finish times for each task.
Backward Pass: Determines the latest start and finish times for each task.
Slack Time: The amount of time a task can be delayed without affecting the project's
duration. Tasks on the critical path have zero slack.
Critical Path Calculation: The tasks with zero slack form the critical path, which
determines the overall project duration.
8. Conclusion
In this lecture, we:

Discussed the steps to calculate the critical path using forward pass and backward
pass.
Calculated the slack time and identified tasks on the critical path.
Used an example to illustrate how to calculate the critical path, slack time, and project
duration.
This method ensures that projects are completed in the shortest possible time, helping
managers focus on critical tasks and avoid delays.
1. Introduction to Resource Levelling
Resource levelling aims to balance the resource usage across the duration of the project. The
primary goal is to avoid the situation where resources are over-allocated at certain points in
time, which can lead to delays or inefficiencies.
Over-allocation occurs when the demand for a resource exceeds its availability at a
given time.
Under-utilization occurs when the resources are not being fully used during certain
periods.
By applying resource levelling, project managers try to smooth the resource usage to ensure
that resources are distributed evenly across the entire project schedule.
2. When to Use Resource Levelling

Resource levelling is typically applied:

When tasks have flexible start and finish times (i.e., the project has slack).
When the project deadline is flexible, and task completion times can be adjusted.
Resource levelling is generally not suitable when the project deadline is fixed, as it can delay
the overall project completion.
3. Objectives of Resource Levelling

Minimize idle time: To minimize resource downtime by making sure that resources are
constantly utilized without overburdening them.
Maintain the project schedule: To adjust the schedule while still achieving the project’s
objectives, even though the timeline might be extended.
4. Methods of Resource Levelling

There are various methods to perform resource levelling, and the choice of method depends
on the project requirements, available resources, and task relationships.
a. Delaying Tasks
This method involves shifting tasks in the schedule to later dates to balance the resource
usage. Tasks that are not on the critical path and have slack can be delayed without affecting
the overall project duration.
b. Splitting Tasks
In this method, tasks are split into smaller parts and spread across the project duration to
avoid resource peaks. This technique is particularly useful when tasks can be broken into
smaller sub-tasks that require less resource allocation at any given time.
c. Resource Smoothing
Resource smoothing involves adjusting the start and finish times of tasks within the existing
project timeline, without affecting the project’s overall duration. The goal is to ensure that
resource usage is as consistent as possible while still meeting the project deadline.
d. Adding Resources

In cases where delays due to resource levelling cannot be avoided, additional resources can
be added to complete tasks on time. This might involve hiring temporary staff or utilizing
overtime. However, this method increases project costs and may not always be feasible.
5. Steps in Resource Levelling

1. Identify Resource Over-Allocations: Review the project schedule and identify periods
where the demand for resources exceeds the available supply.
2. Determine Resource Constraints: Identify how many resources are available and at
which times, considering the project’s resource allocation constraints.
3. Reschedule Tasks: If possible, reschedule tasks that are non-critical or have slack to
avoid overlapping with other tasks that need the same resources.
4. Apply Resource Smoothing: Adjust tasks within the project timeline without affecting
the overall project duration. For example, tasks can be extended without extending the
project deadline.
5. Assess Impact on Project Timeline: Check if the resource levelling process causes any
delays. If the deadline is fixed, some tasks may need to be shortened or optimized.
6. Example of Resource Levelling

Consider the following simplified project with two resources, Person A and Person B, and
four tasks:
Task Duration Resource Requirement Earliest Start Earliest Finish
1. Identify Over-allocation:
Person A is over-allocated between Days 1-4 because they are assigned to both T1 and
T2.
Person B is over-allocated between Days 3-5 because they are assigned to both T1 and
T3.
2. Adjust Task Schedule: We can shift T2 to start on Day 6, which would remove the overlap
for Person A. Similarly, T3 can be shifted to start on Day 6, removing the overlap for
Person B.
By shifting tasks T2 and T3, we ensure that both Person A and Person B are not over-
allocated, and the project now has a smoother resource distribution. However, the project
duration has increased by 4 days (from 10 days to 14 days), which is a trade-off in resource
levelling.
7. Advantages and Disadvantages of Resource Levelling

Advantages:

Reduces resource fatigue: Ensures that resources are not overloaded during critical
phases.
Disadvantages:
Project delay: The most common disadvantage is that resource levelling can increase
the overall duration of the project if deadlines are not flexible.
Complexity in scheduling: When the project has many tasks and resources, levelling can
become a complex task.
Increased cost: In some cases, to avoid delays, additional resources or overtime may be
needed, increasing the project cost.
8. Conclusion
In this lecture, we covered the concept of resource levelling, its significance in managing
resource allocation, and the methods for balancing resources across the project duration.
Resource levelling ensures that resources are optimally used without overloading them,
though it may lead to an increase in the overall project duration. By applying resource
levelling techniques, project managers can ensure smoother project execution while
effectively managing resource constraints.
1. Introduction

Project scheduling traditionally focuses on completing tasks on time and within the allocated
resources. However, cost plays an equally critical role in scheduling. A well-constructed
project schedule must not only ensure that tasks are completed efficiently but also be
mindful of the costs associated with each task, resource usage, and potential trade-offs
between time and cost.
2. Components of Project Cost

Cost considerations in project scheduling involve multiple components that contribute to the
total project cost. These components include:
Direct Costs: These are costs directly associated with performing project tasks. Examples include labor wages, materials, and equipment used on specific tasks.
Indirect Costs: These are overhead costs that are not directly tied to a particular task but are necessary for the overall project operation. Examples include administrative expenses, utilities, and insurance.
Fixed Costs: These costs do not change regardless of the scale or duration of the project.
For example, renting office space or long-term equipment leases.
Variable Costs: These costs vary depending on the project's scope, time, and resource
requirements. For example, if the project duration is extended, labor costs and material
costs may increase.
Crash Costs: These costs are associated with accelerating the project. They are incurred
when resources are added to a task or when a task is completed in a shorter time frame
than originally planned, often by working overtime or using more expensive resources.
3. The Time-Cost Trade-off

A key aspect of project scheduling is the trade-off between time and cost. In many projects,
there is a direct relationship between the time required to complete a task and the cost
incurred. For example:

Crashing the Schedule: Completing a task in less time usually requires additional resources, such as overtime or extra workers, which increases direct costs.
Delaying the Schedule: On the other hand, delaying tasks or extending their duration
may lead to cost savings, as it might reduce the need for overtime or additional
resources. However, this also risks project delays and can have downstream effects on
subsequent tasks or project deadlines.
To balance cost and time effectively, project managers must determine how much time to
save, and the associated costs that justify the time saved.
4. Cost-Time Trade-off Analysis

Cost-time trade-off analysis helps project managers to make informed decisions regarding
schedule changes and resource allocation. The process involves identifying the critical path,
determining the possible duration reductions for each task, and evaluating the cost
implications of these reductions.
1. Identify Critical Path: Determine which tasks directly affect the project’s overall duration
and are critical to meeting the project deadline.
2. Assess Time Reduction Potential: Analyze how much the time for each task can be
reduced. Tasks that are not on the critical path may not require attention, as reducing
their duration will not affect the project completion date.
3. Calculate Crash Costs: For each task on the critical path, calculate the costs associated
with speeding up the task. These might include hiring additional labor, using faster
equipment, or working overtime.
4. Evaluate the Impact of Cost and Time: Compare the costs of reducing the time with the
benefits of completing the project earlier (e.g., improved cash flow, earlier delivery to the
client, etc.).
5. Select the Best Trade-off: Choose the optimal set of tasks to crash, balancing the
project’s time and cost constraints.
5. Methods and Tools for Managing Cost

Several methods and tools can be used to manage cost considerations in project scheduling:
a. Cost Aggregation
Cost aggregation involves summing up all the costs associated with each activity and
resource in the project to get the total cost. This method helps identify where the most
significant cost items are and helps with forecasting and controlling costs.
b. Cost Smoothing
Cost smoothing aims to achieve a more consistent cost expenditure throughout the project
timeline by redistributing resources and adjusting the schedule. This method ensures that
the project doesn’t face large cost spikes or underutilization of resources.
c. Resource Leveling with Cost Constraints
When applying resource leveling in a project, it’s important to consider the associated costs.
Resources may be leveled by adjusting their allocation to reduce peak demand and avoid
overuse, which could lead to higher costs. For example, balancing the use of labor and
machinery to ensure that costly resources are not under- or over-utilized.
d. Earned Value Management (EVM)

EVM is a project performance management tool used to monitor project costs. By comparing
the planned progress and cost with actual progress and cost, EVM helps identify cost
overruns early. Key components of EVM include:
Planned Value (PV): The budgeted cost for the work scheduled.
Earned Value (EV): The budgeted cost for the work actually performed.
Actual Cost (AC): The actual cost incurred for the work performed.
Cost Performance Index (CPI): A measure of cost efficiency, calculated as CPI = EV / AC.
6. Example: Crashing a Task

The project duration is the sum of the durations of all tasks, or 14 days.
If the project manager needs to reduce the duration, they can crash certain tasks. For
example, reducing the duration of T1 from 5 days to 3 days might increase its daily cost
to $1500.
After crashing:
T1 duration is 3 days with a cost of $1500/day, leading to a total cost of $4500 for T1.
This change reduces the overall project duration by 2 days (from 14 days to 12 days) but
increases the total cost due to the higher cost per day for crashing the task.
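Since the lecture does not state T1's normal daily cost, the snippet below assumes $800/day purely for illustration; it computes the crash cost per day saved, the quantity a manager would weigh against the value of finishing earlier.

```python
# Hypothetical figures: T1 normally runs 5 days at an assumed $800/day;
# crashed, it runs 3 days at the $1500/day rate quoted above.
normal_days, normal_rate = 5, 800
crash_days, crash_rate = 3, 1500

normal_cost = normal_days * normal_rate          # $4000
crash_cost = crash_days * crash_rate             # $4500
days_saved = normal_days - crash_days            # 2 days
cost_per_day_saved = (crash_cost - normal_cost) / days_saved
print(cost_per_day_saved)  # $250: crash only if a day is worth more than this
```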
7. Conclusion
Cost consideration is a critical aspect of project scheduling, as it ensures that the project
remains within budget while meeting its deadlines. By understanding the relationship
between time and cost, project managers can make informed decisions to balance both
aspects effectively. Techniques such as cost-time trade-off analysis, cost smoothing, resource
leveling, and earned value management are invaluable tools in managing project costs.
Ultimately, the goal is to deliver a project that satisfies both time constraints and budgetary
limits, ensuring optimal efficiency and success.
Semidefinite programming (SDP) optimizes over a symmetric matrix variable X. The standard form is:

minimize Tr(CX)
subject to X ⪰ 0, Ai(X) = bi, i = 1, 2, ..., m.
Tr(CX): The objective is to minimize the trace of a matrix expression involving a matrix C
and the variable matrix X . The trace of a matrix is the sum of its diagonal elements.
X ⪰ 0: This constraint means that X must be positive semidefinite, i.e., all of its eigenvalues must be non-negative.
Ai(X) = bi: These are affine constraints on the matrix X, where the Ai are linear maps (commonly Ai(X) = Tr(AiX) for given symmetric matrices Ai) and the bi are given scalars.
This type of problem generalizes linear programming and quadratic programming. SDP has
many applications in optimization problems involving quadratic forms and matrix variables.
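As an illustration, a small random instance of this standard form can be solved with an off-the-shelf solver. The sketch below uses the CVXPY modeling library (assuming an SDP-capable solver such as SCS is installed) and assumes the affine constraints take the common form Ai(X) = Tr(AiX); the data are generated so the problem is feasible and bounded.

    import cvxpy as cp
    import numpy as np

    n, m = 4, 2
    rng = np.random.default_rng(0)

    G = rng.standard_normal((n, n))
    C = G @ G.T                                    # PSD cost matrix keeps the problem bounded
    A = [rng.standard_normal((n, n)) for _ in range(m)]
    A = [(Ai + Ai.T) / 2 for Ai in A]              # symmetric constraint matrices

    H = rng.standard_normal((n, n))
    X0 = H @ H.T                                   # a known PSD matrix ...
    b = [float(np.trace(A[i] @ X0)) for i in range(m)]  # ... makes the constraints feasible

    X = cp.Variable((n, n), symmetric=True)
    constraints = [X >> 0]                         # X must be positive semidefinite
    constraints += [cp.trace(A[i] @ X) == b[i] for i in range(m)]
    prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
    prob.solve()
    print("optimal value:", prob.value)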
Semidefinite programming, like linear programming, admits a duality theory: to every primal optimization problem corresponds a dual problem. The concept of duality provides valuable insight into the relationship between the primal problem and its dual.
The primal problem is:

minimize Tr(CX)
subject to X ⪰ 0, Ai(X) = bi, i = 1, 2, ..., m.
The dual of a semidefinite program involves the introduction of Lagrange multipliers for the
affine constraints. By forming the Lagrangian, the dual problem is obtained by maximizing a
dual objective function under dual constraints. The dual formulation of SDP is given by:
maximize ∑_{i=1}^{m} bi λi
subject to ∑_{i=1}^{m} λi Ai ⪯ C,

where the dual variables λi are the Lagrange multipliers associated with the affine constraints Ai(X) = bi.
Matrix Inequality (∑_{i=1}^{m} λi Ai ⪯ C): This condition expresses that the weighted sum of the matrices Ai must be less than or equal to the matrix C in the semidefinite ordering, i.e., C − ∑_{i=1}^{m} λi Ai ⪰ 0.
The dual problem provides a different perspective on the original (primal) problem and is
often useful for obtaining bounds or simplifying the solution.
Just like linear programming, semidefinite programming exhibits strong duality under certain conditions. The strong duality theorem states that, when these conditions hold, the optimal values of the primal and dual problems are equal, so there is no duality gap.
In the context of SDP, strong duality holds under regularity conditions such as Slater's condition: if the primal problem has a strictly feasible point (an X ≻ 0 satisfying the affine constraints), then the duality gap is zero.
Control Theory: SDP is widely used in control theory to design stable systems, optimize
system performance, and solve problems like robust control and optimal control.
Combinatorial Optimization: SDP can be used in problems like the maximum cut
problem, where the objective is to partition a graph into two sets such that the number
of edges between the sets is maximized. The SDP relaxation of the max-cut problem
provides a powerful tool for approximating solutions to NP-hard problems.
Machine Learning: SDP has been used in machine learning for tasks such as kernel
methods, support vector machines (SVMs), and optimization of classification problems.
6. Conclusion
Semidefinite programming generalizes linear programming to matrix variables: it minimizes a linear (trace) objective subject to affine constraints and a positive semidefiniteness constraint, admits a dual problem with strong duality under regularity conditions, and has wide-ranging applications in control theory, combinatorial optimization, and machine learning.
In this lecture, we will discuss how semidefinite programming (SDP) can be applied to the
Maximum Cut problem. We will explore the relaxation of the problem using SDP and discuss
its approximation guarantees.
maximize ∑_{ {u,v} ∈ E } x_{u,v}
subject to x_{u,v} ∈ {0, 1} for every edge {u, v} ∈ E,

where x_{u,v} = 1 if the edge {u, v} is cut (i.e., the vertices u and v are in different sets) and x_{u,v} = 0 otherwise.
This is an NP-hard problem, and exact algorithms become inefficient for large graphs. Thus,
approximations are needed.
The Maximum Cut problem can be relaxed using semidefinite programming. Instead of
looking for an exact cut, we will relax the binary variables xu,v to continuous variables that
represent the "fraction" of the cut. This relaxation leads to an SDP formulation.
The SDP relaxation works by considering the cut as a vector in a higher-dimensional space.
Here's how we approach this:
Cut Definition: The idea is to define a cut using a vector representation of the vertices. If
two vertices are separated, their dot product is negative, and if they are in the same set,
their dot product is positive.
The SDP formulation of the Max-Cut problem involves finding a set of vectors such that the cut is maximized. The relaxed problem can be written as:

maximize (1/2) ∑_{ {u,v} ∈ E } (1 − x_u · x_v)
subject to x_v · x_v = 1 for all v ∈ V,

where x_v ∈ R^n is a vector corresponding to vertex v, and x_u · x_v is the dot product between the vectors x_u and x_v.
Objective Function: The objective (1/2) ∑_{ {u,v} ∈ E } (1 − x_u · x_v) maximizes the cut size. The term 1 − x_u · x_v measures the degree to which vertices u and v are separated by the cut: the larger the dot product, the smaller the separation between the two vertices. The objective therefore rewards separation along the edges of the graph.
Constraint: The constraint x_v · x_v = 1 requires each vector x_v to lie on the unit sphere, so every vertex is mapped to a unit vector in a higher-dimensional space.
The SDP relaxation of the Maximum Cut problem provides an approximation to the original
combinatorial problem. The relaxation is often much easier to solve because it is a convex
problem and can be efficiently solved using standard SDP solvers.
The key observation is that the SDP solution provides a fractional cut, but we are interested
in finding an integral (binary) solution. To recover an integral solution, we can use
randomized rounding. In randomized rounding, we round the fractional vectors to binary
values with a certain probability, which leads to an approximate cut.
a. Randomized Rounding
1. Solve the SDP relaxation to get the vectors xv for each vertex.
2. For each vertex v, consider the sign of the dot product x_v · r, where r is a random vector drawn uniformly from the unit sphere in R^n (equivalently, a normalized vector of independent Gaussian entries).
3. Assign each vertex to one of two sets based on the sign of this dot product.
This process yields a cut that is expected to be close to the optimal cut, with a performance
guarantee.
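Given the solution vectors, the rounding step itself is only a few lines. A minimal NumPy sketch, assuming the vectors x_v have already been obtained from an SDP solver:

    import numpy as np

    def round_cut(vectors, rng=None):
        """Randomized hyperplane rounding.
        vectors: (num_vertices, d) array whose rows are the unit vectors x_v."""
        if rng is None:
            rng = np.random.default_rng()
        r = rng.standard_normal(vectors.shape[1])  # Gaussian vector: direction uniform on the sphere
        return (vectors @ r >= 0).astype(int)      # which side of the hyperplane each vertex falls on

    def cut_size(edges, sides):
        """Count the edges whose endpoints land on opposite sides."""
        return sum(1 for u, v in edges if sides[u] != sides[v])

Repeating the rounding several times and keeping the best cut is a common practical refinement, since each trial is cheap.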
b. Approximation Guarantee
Using randomized rounding, it can be shown that the expected value of the cut generated from the SDP relaxation is at least 1/2 times the size of the optimal cut, and the guarantee can be sharpened considerably, as the next lecture shows. Therefore, the SDP-based approach yields a constant-factor approximation algorithm.
5. Conclusion
The SDP relaxation of the Maximum Cut problem provides a powerful way to approximate
the optimal solution for large graphs. By relaxing the binary constraints to continuous ones,
we transform the combinatorial problem into a convex optimization problem that can be
solved efficiently. Randomized rounding then recovers a feasible solution that approximates
the optimal cut.
The SDP relaxation provides a convex formulation of the Max-Cut problem that is much
easier to solve.
The randomized rounding technique allows us to convert the fractional solution back to
an integral one.
The resulting algorithm comes with a provable constant-factor approximation.
This approach has applications in combinatorial optimization and is particularly useful for
large-scale problems where exact solutions are computationally infeasible.
As discussed in the previous lecture, the Maximum Cut problem involves partitioning the
vertices of a graph G = (V , E) into two disjoint sets such that the number of edges
between the sets is maximized. This is an NP-hard problem, and solving it exactly for large
graphs is computationally infeasible.
Recall from the previous lecture that the Max-Cut problem can be relaxed using semidefinite
programming. In this relaxation, we represent each vertex v ∈ V by a vector x_v ∈ R^n, and
the goal is to maximize the sum of the cut values across all edges.
The relaxed problem is as follows:

maximize (1/2) ∑_{ {u,v} ∈ E } (1 − x_u · x_v)
subject to x_v · x_v = 1 for all v ∈ V.
This SDP relaxation replaces the binary decision variables (indicating whether an edge is cut)
with continuous vectors, which are constrained to lie on the unit sphere.
The core idea behind the Goemans and Williamson algorithm is to solve the SDP relaxation,
which provides fractional solutions, and then use randomized rounding to recover an
integral solution (a cut) from these fractional vectors. The algorithm proceeds as follows:
1. Solve the SDP: First, solve the SDP relaxation to obtain the vectors x_v for each vertex v ∈ V.
2. Round with a Random Hyperplane: For each vertex v, compute the sign of x_v · r, where r is a randomly chosen vector from the unit sphere in R^n. Based on this sign, vertex v is assigned to one of the two sets in the cut.
3. Construct the Cut: After rounding, the vertices are assigned to two sets, and the edges
that cross between these sets form the final cut.
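A compact end-to-end sketch of this pipeline is shown below. It solves the relaxation in its Gram-matrix form (a matrix variable X with X[u, v] = x_u · x_v, diag(X) = 1, X ⪰ 0) using CVXPY, recovers the vectors by an eigendecomposition, and applies hyperplane rounding; the small example graph is made up for illustration.

    import cvxpy as cp
    import numpy as np

    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # a small example graph on 4 vertices
    n = 4

    # Max-Cut SDP in Gram-matrix form: X[u, v] stands for the inner product x_u . x_v.
    X = cp.Variable((n, n), symmetric=True)
    objective = cp.Maximize(sum(0.5 * (1 - X[u, v]) for u, v in edges))
    prob = cp.Problem(objective, [X >> 0, cp.diag(X) == 1])
    prob.solve()

    # Recover unit vectors x_v from X = V V^T via an eigendecomposition.
    w, U = np.linalg.eigh(X.value)
    V = U @ np.diag(np.sqrt(np.clip(w, 0, None)))     # rows of V are the vectors x_v

    # Randomized hyperplane rounding.
    r = np.random.default_rng(0).standard_normal(n)
    sides = V @ r >= 0
    cut = sum(1 for u, v in edges if sides[u] != sides[v])
    print("SDP upper bound:", prob.value, "| rounded cut size:", cut)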
The primary innovation of this algorithm is that it uses a clever randomized procedure to
ensure that the resulting cut is close to the optimal solution while maintaining an
approximation guarantee.
The key to the analysis is understanding how well the randomized rounding performs. The
analysis shows that the expected value of the cut generated by this algorithm is within a
constant factor of the optimal cut. The main results are as follows:
1. Approximation Guarantee: The algorithm achieves an expected 0.878-approximation for the Maximum Cut problem. This result rests on the fact that the relaxation provides a high-quality fractional solution and the rounding technique preserves most of the cut value.
2. Geometric Interpretation of Rounding: The vector x_v for each vertex v represents the vertex as a point on the unit sphere. A random hyperplane through the origin cuts an edge {u, v} with probability θ/π, where θ is the angle between x_u and x_v. Comparing this probability with the edge's contribution to the SDP objective yields the 0.878 approximation factor for the Maximum Cut problem, which is a significant improvement over the naive 1/2-approximation that would result from simply picking the cut randomly.
SDP Relaxation: The SDP relaxation of the Max-Cut problem is a convex problem that
can be solved efficiently using standard SDP solvers. This step provides a high-quality
fractional solution.
7. Conclusion
The Goemans and Williamson algorithm combines an SDP relaxation with randomized rounding to recover a provably good solution. This algorithm is one of the best-known approximation algorithms in combinatorial
optimization, with applications in network design, circuit partitioning, and other areas where
large-scale graph cuts are required.
Randomized Rounding provides a way to recover an integral solution from the fractional
solution of the SDP.
This approach has broader implications in optimization theory and algorithm design,
especially in combinatorial optimization and approximation algorithms.
A Boolean function is a mathematical function that takes one or more binary inputs (i.e.,
values in {0, 1}) and produces a binary output. Boolean functions are fundamental in
computer science because they are the building blocks of digital circuits, decision-making
processes, and logical operations.
For example, a simple Boolean function might take two inputs x and y , and output their
logical AND, OR, or XOR: f(x, y) = x ∧ y, f(x, y) = x ∨ y, or f(x, y) = x ⊕ y, respectively.
Each of these functions can be represented in different ways, depending on the context and
application.
One of the most straightforward ways to represent a Boolean function is by using a truth
table. A truth table lists all possible combinations of input values and the corresponding
output for the function.
For example, the AND function for two inputs x and y can be represented by the following
truth table:
x y f (x, y) = x ∧ y
0 0 0
0 1 0
1 0 0
1 1 1
In this table, every combination of the inputs x and y is listed, and the output is calculated
according to the AND operation. Truth tables are helpful for small functions, but they
become less practical for functions with many inputs, as the number of rows grows
exponentially with the number of variables.
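Truth tables are also easy to generate programmatically. A short sketch that reproduces the AND table above:

    from itertools import product

    def truth_table(f, n):
        """Yield (inputs, output) rows for an n-variable Boolean function f."""
        for bits in product((0, 1), repeat=n):
            yield bits, f(*bits)

    for (x, y), out in truth_table(lambda x, y: x & y, 2):
        print(x, y, out)   # prints the four rows of the AND table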
3. Boolean Expressions
A Boolean function can also be written as an algebraic expression built from variables and the logical operators AND (∧), OR (∨), and NOT (¬). For example:
f(x, y) = (x ∧ y) ∨ (¬x ∧ ¬y), which outputs 1 exactly when x and y agree.
4. Algebraic Normal Form (ANF)
The Algebraic Normal Form is another representation of Boolean functions, especially in the context of Boolean polynomials. In ANF, a Boolean function is expressed as a polynomial in which monomials (AND-products of variables) are combined using XOR (denoted ⊕):
f(x1, x2, ..., xn) = c0 ⊕ c1 x1 ⊕ c2 x2 ⊕ c3 x1 x2 ⊕ …, where each coefficient ci ∈ {0, 1}.
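The ANF coefficients can be computed from a truth table with the Möbius (XOR) transform. A minimal sketch, where the table is indexed so that bit i of the index gives the value of the (i+1)-th variable:

    def anf_coefficients(table):
        """Moebius transform: truth table (length 2**n, entries 0/1) -> ANF coefficients.
        coeff[s] is the coefficient of the monomial over the variables in bitmask s."""
        coeff = list(table)
        n = len(table).bit_length() - 1
        for i in range(n):                    # fold each variable in turn
            for x in range(len(table)):
                if x & (1 << i):
                    coeff[x] ^= coeff[x ^ (1 << i)]
        return coeff

    # Truth table of f(x1, x2) = x1 XOR x2; the result [0, 1, 1, 0] reads as
    # c0 = 0, c1 = 1 (monomial x1), c2 = 1 (monomial x2), c3 = 0 (monomial x1 x2).
    print(anf_coefficients([0, 1, 1, 0]))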
5. Canonical Forms
Boolean functions can also be represented in canonical forms that express them in a standardized manner. Two important canonical forms are:
Disjunctive Normal Form (DNF): an OR (disjunction) of AND-terms over literals.
Conjunctive Normal Form (CNF): an AND (conjunction) of OR-clauses over literals.
These forms are often used in satisfiability problems, where the goal is to determine if a
formula is satisfiable — that is, if there exists an assignment of truth values to the variables
that makes the entire expression true.
6. Circuit Representations
A Boolean function can also be realized as a circuit built from logic gates such as AND, OR, and NOT.
For example, a simple Boolean function like f (x, y) = x ∧ y can be represented by a circuit
with a single AND gate. More complex functions, such as XOR or combinations of AND, OR,
and NOT, require more complicated circuits. The complexity of Boolean circuits is an
important factor in the study of computational complexity, as the size of the circuit can affect
the time required for evaluation.
7. Applications
Boolean functions and their representations are used in a variety of fields, including:
Digital Circuit Design: Boolean functions form the foundation of digital logic circuits
used in computer processors, memory units, and other hardware components.
Optimization: Boolean functions are key in problems like SAT (satisfiability), which has
important applications in verification and optimization.
Artificial Intelligence: Boolean functions are used in machine learning algorithms and
decision trees for classification and regression tasks.
8. Representation and Complexity
The choice of representation for a Boolean function has a significant impact on the computational complexity of evaluating and manipulating the function.
Truth Tables: For n variables, the truth table has 2^n rows, making it exponential in size.
Boolean Expressions: A Boolean expression can be simplified, but the size of the
expression depends on the complexity of the function.
Canonical Forms (DNF, CNF): Both DNF and CNF can be exponentially large in the worst case; nevertheless, CNF is the standard input format for satisfiability (SAT) solvers.
9. Conclusion
Boolean functions admit many representations (truth tables, Boolean expressions, ANF, canonical forms, and circuits), and the choice of representation strongly affects how efficiently a function can be evaluated, simplified, and analyzed.
The Fourier transform, typically associated with continuous functions, can be applied to
Boolean functions (discrete functions) as well. In the case of Boolean functions, this is done
using the Fourier expansion over the Boolean cube. This allows us to express a Boolean
function as a sum of weighted Fourier characters (also called Fourier basis functions), which
can be analyzed in terms of their frequency components.
Let f : {0,1}^n → {0,1} be a Boolean function that takes n binary variables as input. The
Fourier representation expresses f as a sum of terms corresponding to different subsets of
the input variables.
f(x1, x2, …, xn) = ∑_{S ⊆ {1,2,…,n}} f̂(S) ∏_{i∈S} (−1)^{x_i}

where S ranges over the subsets of the indices {1, 2, …, n}, and the term ∏_{i∈S} (−1)^{x_i} (the Fourier character associated with S) depends only on the variables x_i with i ∈ S. The coefficients f̂(S) are the Fourier coefficients, which represent how strongly f correlates with each character.
3. Fourier Coefficients
The Fourier coefficients are computed as

f̂(S) = (1/2^n) ∑_{x ∈ {0,1}^n} f(x) ∏_{i∈S} (−1)^{x_i},

where the sum ranges over all inputs x ∈ {0,1}^n. This formula computes the correlation between the function f and the Fourier basis function ∏_{i∈S} (−1)^{x_i}.
The Fourier coefficients f^(S) give a measure of how much the Boolean function f is
influenced by the subset S of its input variables. Intuitively, the larger the Fourier coefficient
f^(S), the more the function depends on the subset of variables corresponding to S .
The domain of the Boolean function f is the Boolean cube {0,1}^n, which consists of all
possible input vectors of n binary variables. The Fourier transform decomposes this function
into components that correspond to different frequencies, allowing for a deeper
understanding of its structure.
The Fourier expansion decomposes the function into these components (called Fourier
characters), which are characterized by different subsets of the variables. Each Fourier
character ∏_{i∈S} (−1)^{x_i} can be seen as a frequency component, where S denotes the set of variables on which that component depends.
5. Properties of Fourier Coefficients
Several important properties and results hold for the Fourier coefficients of Boolean
functions:
Parseval's Theorem: The sum of the squares of the Fourier coefficients equals the expected value of the square of the function f. In other words:

∑_{S ⊆ {1,2,…,n}} f̂(S)^2 = E_x[f(x)^2].
This implies that the Fourier coefficients f^(S) capture all the information about the
function f .
Bias of a Boolean Function: The bias measures how far the function is from being balanced (i.e., from having equal numbers of 0s and 1s in its output). It is governed by the empty-set coefficient: since f̂(∅) = E_x[f(x)], the bias can be written as

Bias(f) = |f̂(∅) − 1/2|.
Fourier Spectrum: The Fourier spectrum of a Boolean function is the collection of its Fourier coefficients f̂(S). Analyzing the spectrum helps in understanding the complexity of the function, for example how much of its weight lies on low-degree coefficients.

6. Applications of Fourier Analysis
Learning Theory: In machine learning, particularly in the study of Boolean functions, the Fourier expansion enables efficient learning algorithms, such as low-degree algorithms that estimate the significant Fourier coefficients from samples.
Complexity Theory: Fourier analysis provides insight into the complexity of Boolean
functions, helping to classify functions based on their Fourier coefficients. This is
relevant for understanding the computational complexity of Boolean function evaluation
and related problems like satisfiability.
Cryptography: Fourier analysis is used to evaluate cryptographic properties of Boolean functions, such as their correlation with small subsets of inputs and their complexity.
7. Example: The Majority Function
Let's consider a simple Boolean function: the majority function, which outputs 1 if the
majority of the inputs are 1, and 0 otherwise. The Fourier expansion of this function can be
derived, and its Fourier coefficients help in understanding how much each variable and its
combinations contribute to the overall majority behavior.
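For concreteness, the coefficients of the three-bit majority function can be computed by brute force directly from the formula in Section 3; a small sketch:

    from itertools import product

    def fourier_coefficient(f, n, S):
        """f-hat(S) = 2**(-n) * sum_x f(x) * prod_{i in S} (-1)**x_i for f: {0,1}^n -> {0,1}."""
        total = 0
        for x in product((0, 1), repeat=n):
            chi = (-1) ** sum(x[i] for i in S)   # the Fourier character for the subset S
            total += f(x) * chi
        return total / 2 ** n

    maj3 = lambda x: int(sum(x) >= 2)            # majority of three bits
    for S in [(), (0,), (0, 1), (0, 1, 2)]:
        print(S, fourier_coefficient(maj3, 3, S))
    # -> () 0.5, (0,) -0.25, (0, 1) 0.0, (0, 1, 2) 0.25

By symmetry, every singleton has coefficient −0.25 and every pair has coefficient 0, so the majority behavior is carried entirely by the individual variables and the full triple.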
8. Conclusion
The Fourier representation of Boolean functions provides a rich and powerful framework for
analyzing the structure and complexity of Boolean functions. By decomposing a function into
its Fourier coefficients, we can better understand its behavior, optimize algorithms, and
apply this understanding to fields like machine learning, cryptography, and complexity
theory. The ability to express Boolean functions in terms of simpler oscillatory components is
a key tool for solving a wide range of computational problems.
The approximate degree of a Boolean function f : {0,1}^n → {0,1} is defined as the
smallest degree d such that there exists a real-valued polynomial p(x1 , x2 , … , xn ) of
degree at most d that approximates the Boolean function f within a certain error margin.
Mathematically, the approximate degree deg*(f) of a Boolean function f is given by:

deg*(f) = min { deg(p) : |p(x) − f(x)| ≤ 1/3 for all x ∈ {0,1}^n },

where p is a polynomial with real coefficients, and deg(p) denotes the highest degree of any term in the polynomial p. (The constant 1/3 is conventional; any fixed error below 1/2 gives the same notion up to constant factors.) The goal is to find a polynomial that approximates the function f as closely as possible, with the minimum degree required.
The approximate degree is a measure of how "complex" a Boolean function is in terms of its
polynomial representation. A Boolean function that is highly "nonlinear" or "complicated"
may require a higher-degree polynomial to approximate it well. Conversely, functions that
are "simple" or closer to being linear may have a low-degree approximation.
For example, the OR function can be approximated with degree O(√n), far below its exact degree of n, whereas the Parity function cannot be approximated by any polynomial of degree less than n.
The approximate degree of a Boolean function has important implications in various fields:
Circuit Complexity: The approximate degree yields lower bounds in circuit complexity. A function whose approximate degree is high cannot be computed, or even approximated, by certain restricted classes of circuits, so a high approximate degree indicates a more complex function, while a low approximate degree is consistent with simpler circuits.
Boolean Function Analysis: The approximate degree provides insights into the structure
of Boolean functions. For example, a high approximate degree suggests that the
function has intricate dependencies between the input variables, while a low
approximate degree indicates that the function can be described by simpler interactions
among the variables.
The exact degree of a Boolean function f, denoted deg(f), is the degree of the unique multilinear polynomial that represents the function exactly. The approximate degree is a relaxation of this concept, where we allow some error in the approximation.
deg∗ (f ) ≤ deg(f )
In some cases, the approximate degree may be much smaller than the exact degree. This is
particularly relevant in the context of approximation algorithms, where the goal is often to
find efficient approximations of functions.
The approximate degree is related to other complexity measures for Boolean functions, such
as:
AC0 Circuit Complexity: Functions computed by small AC0 circuits (constant depth, unbounded fan-in) can be approximated on average by polynomials of polylogarithmic degree, in the spirit of the Linial-Mansour-Nisan results. Consequently, degree-based arguments can be used to analyze the power of constant-depth circuits.
Majority: The Majority function has an approximate degree of Ω(n), which grows linearly with the number of variables.
Parity: The Parity function, which outputs 1 if the number of 1s in the input is odd and 0 otherwise, has an approximate degree of exactly n.
7. Techniques for Bounding Approximate Degree
Several techniques are used to bound the approximate degree of a Boolean function, including:
Symmetrization: reducing a multivariate approximating polynomial to a univariate one and applying classical bounds such as Markov's inequality.
Chebyshev Polynomials: constructing explicit low-degree approximations, which yield upper bounds.
Dual Witnesses: exhibiting a dual object via linear-programming duality to certify lower bounds, developed in the next lecture.
8. Conclusion
The concept of approximate degree provides an essential tool for understanding the
complexity of Boolean functions and their approximations. By measuring the degree of
polynomials that approximate Boolean functions, we can gain insights into their circuit
complexity, learnability, and computational hardness. Approximate degree plays a crucial
role in fields such as computational complexity, machine learning, and quantum computing,
and it remains a key concept in the study of Boolean functions and their applications.
In this lecture, we study the approximate degree of the OR function. We will also introduce the idea of a dual witness in the context of approximate degree, which will help us establish lower bounds for the approximate degree of the OR function.
1. The OR Function
OR_n(x1, x2, …, xn) = 1 if at least one of x1, x2, …, xn is 1, and 0 otherwise.
This function is a classic Boolean function, and its complexity can be characterized by the
approximate degree, which is the degree of the lowest-degree polynomial that
approximates the function well.
First, it is important to understand the exact degree of the OR function, since it provides the baseline against which polynomial approximations are measured.
The exact degree of the OR function is n. The function's unique multilinear representation is

OR_n(x1, x2, …, xn) = 1 − ∏_{i=1}^{n} (1 − x_i),

and expanding the product produces the monomial x1 x2 ⋯ xn of degree n. Representing OR exactly therefore requires the maximum possible degree.
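This degree count is easy to verify symbolically. A small sketch using SymPy (assuming it is available), for n = 4:

    import sympy as sp

    xs = sp.symbols("x1:5")                          # x1, x2, x3, x4
    or4 = sp.expand(1 - sp.prod([1 - x for x in xs]))  # multilinear form of OR_4
    print(or4)                                       # includes the monomial x1*x2*x3*x4
    print(sp.total_degree(or4))                      # -> 4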
The approximate degree is more nuanced. To understand the approximate degree of the OR function, we look for a polynomial that approximates it within some error bound (say, pointwise error 1/3).
A key result, due to Nisan and Szegedy, is that the approximate degree of the OR function is

deg*(OR_n) = Θ(√n).

This tells us that, although representing OR exactly requires degree n, it can be approximated by a polynomial of degree only O(√n), built from Chebyshev polynomials; conversely, no polynomial of degree o(√n) suffices, so the approximate degree still grows with the input size.
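One way to observe the √n scaling numerically is, for each degree d, to solve the linear program that finds the best univariate approximation on the symmetrized points 0, 1, …, n; this mirrors the symmetrization step in the lower-bound argument below. A sketch, assuming NumPy and SciPy are available:

    import numpy as np
    from scipy.optimize import linprog

    def best_error(n, d):
        """Least worst-case error of a degree-d univariate polynomial q with
        q(0) close to 0 and q(k) close to 1 for k = 1..n (symmetrized OR)."""
        ks = np.arange(n + 1) / n                  # rescale points to [0, 1] for stability
        targets = (np.arange(n + 1) >= 1).astype(float)
        V = np.vander(ks, d + 1, increasing=True)  # row k: (1, k, k**2, ..., k**d)
        # Variables [c_0, ..., c_d, t]: minimize t subject to |V c - targets| <= t.
        A = np.block([[V, -np.ones((n + 1, 1))],
                      [-V, -np.ones((n + 1, 1))]])
        b = np.concatenate([targets, -targets])
        cost = np.zeros(d + 2)
        cost[-1] = 1.0
        res = linprog(cost, A_ub=A, b_ub=b, bounds=[(None, None)] * (d + 2))
        return res.fun

    for n in (16, 64):
        d = next(d for d in range(1, n + 1) if best_error(n, d) <= 1 / 3)
        print(f"n = {n}: smallest degree with error <= 1/3 is {d} (sqrt(n) = {n ** 0.5:.1f})")

The minimal degree reported grows on the order of √n rather than n, in line with the theorem.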
To prove the lower bound on the approximate degree of the OR function, we introduce the concept of a dual witness. This idea comes from linear-programming duality: finding the best degree-d approximation is itself a linear program, and a feasible solution to its dual certifies that no good approximation of that degree exists.
A dual witness is essentially a method to prove that no polynomial of degree smaller than a certain value can approximate the Boolean function within a given error bound. Concretely, it is a function on the inputs that correlates noticeably with f yet is orthogonal to every polynomial of lower degree; its existence rules out any low-degree approximation.
For the OR function, the dual witness argument shows that any polynomial of degree o(√n) must fail to approximate OR_n within the required error. Intuitively, OR jumps from 0 on the all-zeros input to 1 as soon as a single coordinate is set to 1, and a bounded low-degree polynomial cannot reproduce this sharp transition.
The lower-bound argument can be sketched as follows.
Suppose p is a polynomial of degree d that approximates the OR function.
By symmetrization, p can be averaged into a univariate polynomial q of degree at most d such that q(k) approximates the value of OR on inputs with exactly k ones: q(0) ≈ 0 and q(k) ≈ 1 for all 1 ≤ k ≤ n, while q stays bounded on [0, n].
The OR function flips from 0 to 1 as soon as a single input is set to 1; a bounded polynomial that makes this jump between consecutive integers must have a large derivative somewhere, and Markov's inequality for polynomials then forces d = Ω(√n).
From this, we conclude that the approximate degree of the OR function satisfies deg*(OR_n) = Ω(√n). Combined with the Chebyshev-based upper bound of O(√n), this gives deg*(OR_n) = Θ(√n).
The fact that the approximate degree of the OR function grows as √n has several important consequences:
Quantum Query Complexity: The approximate degree is a lower bound on quantum query complexity (up to a constant factor). Hence deg*(OR_n) = Ω(√n) implies that any quantum algorithm for unstructured search must make Ω(√n) queries, showing that Grover's algorithm is optimal.
7. Conclusion
The approximate degree of the OR function provides a fascinating example of how Boolean
functions can be analyzed in terms of polynomial approximations. While the exact degree of the OR function is n, its approximate degree is only Θ(√n), and the gap between the two is critical for understanding the power and limits of polynomial approximation and
has broad implications in fields like circuit complexity, communication complexity, and
machine learning.
By utilizing the concept of a dual witness, we can rigorously prove lower bounds for the
approximate degree of Boolean functions, offering deeper insights into their computational
complexity and the limitations of polynomial representations.