
Linear Programming and Optimization

Lecture 1: Linear Programming – An Example

1. Introduction to Resource Allocation Problems

Resource allocation problems involve optimizing the use of limited resources to achieve a
specific objective. These problems arise across multiple fields, including manufacturing,
transportation, finance, and logistics. Linear Programming (LP) is a mathematical approach
to solving these problems.

Key components of resource allocation problems include:

Objective: The goal to be maximized or minimized (e.g., maximize profit, minimize cost).

Decision Variables: Quantities to be determined (e.g., number of products to produce).

Constraints: Limitations or restrictions on resources (e.g., material availability, labor hours).

2. Real-World Problem: Resource Allocation Example

Scenario:
A factory produces two types of products, Product A and Product B. Each product requires
the following resources:

Material: 3 units for A, 2 units for B.

Labor: 2 hours for A, 4 hours for B.

The factory has the following resources available:

Material: 18 units.

Labor: 16 hours.

The profit per unit is $4 for Product A and $6 for Product B.

Objective: Maximize total profit.


Constraints: The production of A and B must not exceed resource availability.

3. Formulating the Problem Mathematically

Let x1 represent the number of units of Product A produced, and x2 the number of units of Product B produced.

Objective Function:

Maximize Z = 4x1 + 6x2

Constraints:
Material:

3x1 + 2x2 ≤ 18

Labor:

2x1 + 4x2 ≤ 16

Non-negativity:

x1, x2 ≥ 0

4. Solving Using Graphical Method (High-School Approach)

Step 1: Plot the constraints.

Rewrite inequalities as equations:


3x1 + 2x2 = 18 and 2x1 + 4x2 = 16

Plot these lines on the x1x2-plane.

The line 3x1 + 2x2 = 18 intersects the x1-axis at (6, 0) and the x2-axis at (0, 9).

The line 2x1 + 4x2 = 16 intersects the x1-axis at (8, 0) and the x2-axis at (0, 4).

Step 2: Identify the feasible region.

The feasible region is the intersection of all constraints, including non-negativity.

Step 3: Evaluate the objective function at vertices of the feasible region.

Calculate Z at each vertex to find the maximum profit.
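As a sanity check on the graphical method, the same LP can be handed to an off-the-shelf solver. The following is a minimal sketch (assuming Python with SciPy, which the lectures do not prescribe); linprog minimizes, so the profit coefficients are negated:

    # Sketch: solve the Lecture 1 example LP with scipy.optimize.linprog.
    import numpy as np
    from scipy.optimize import linprog

    c = np.array([-4.0, -6.0])     # negate to maximize Z = 4x1 + 6x2
    A_ub = np.array([[3.0, 2.0],   # material: 3x1 + 2x2 <= 18
                     [2.0, 4.0]])  # labor:    2x1 + 4x2 <= 16
    b_ub = np.array([18.0, 16.0])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
    print(res.x, -res.fun)  # expected: x = (5.0, 1.5), Z = 29.0

The solver reports x1 = 5, x2 = 1.5 with Z = 29, which is exactly the vertex where the material and labor lines intersect.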

5. Limitations of the Graphical Method

Dimensionality: The graphical method is restricted to problems with two variables (or
three variables in rare cases using 3D visualization).

Accuracy: Graphical solutions may lack precision, especially for fractional solutions.

Scalability: Real-world problems often involve hundreds or thousands of variables,
making this approach impractical.

6. Transition to Linear Programming

Linear Programming provides a systematic and scalable approach to solving resource allocation problems. Techniques such as the Simplex Method and software tools can handle high-dimensional problems efficiently.

This example demonstrates how a real-world problem can be formulated as a linear program. Subsequent lectures will delve into solving such problems using advanced LP techniques.

Lecture 2: Introduction to Linear Programming

1. Optimization Problems: A General Overview

Optimization involves finding the best solution to a problem under a given set of constraints.
The solution is optimal if it either maximizes or minimizes a specific objective function.

Elements of Optimization Problems:

Objective Function: A function representing the goal of the problem (e.g., profit, cost,
distance).

Decision Variables: Variables representing the quantities to be determined in the problem.

Constraints: Equations or inequalities representing restrictions on the decision variables.

Feasible Region: The set of all solutions that satisfy the constraints.

Optimization problems can be classified into different types based on the nature of the
objective function and constraints. Linear Programming (LP) is one such type where both
the objective function and constraints are linear.

2. Definition of a Linear Program

A linear program is a mathematical optimization problem that satisfies the following properties:

The objective function is a linear expression of the decision variables.

The constraints are linear equations or inequalities.

Decision variables are typically non-negative (though this may vary in some cases).

Formally, a linear program can be expressed in standard form as:

Maximize (or Minimize) Z = c1x1 + c2x2 + ⋯ + cnxn

Subject to:

a11x1 + a12x2 + ⋯ + a1nxn ≤ b1
a21x1 + a22x2 + ⋯ + a2nxn ≤ b2
⋮
am1x1 + am2x2 + ⋯ + amnxn ≤ bm

x1, x2, …, xn ≥ 0

where:

x1, x2, …, xn: Decision variables.

Z: Objective function (to be maximized or minimized).

c1, c2, …, cn: Coefficients of the objective function.

aij: Coefficients of the constraints.

b1, b2, …, bm: Right-hand side values of the constraints.

3. General Structure of a Linear Program

Objective Function:
A linear combination of decision variables, e.g., Z = c1x1 + c2x2 + ⋯ + cnxn.

Constraints:
Linear inequalities or equations of the form ∑_{j=1}^{n} aij xj ≤ bi, ∑_{j=1}^{n} aij xj = bi, or ∑_{j=1}^{n} aij xj ≥ bi.

Non-Negativity Condition:
xj ≥ 0 for all j.

4. Examples of Linear Programs

Example 1: Production Problem


A factory produces two products, A and B, to maximize profit. The profit is given by Z = 4x1 + 6x2, subject to constraints:

3x1 + 2x2 ≤ 18,  2x1 + 4x2 ≤ 16,  x1, x2 ≥ 0.

Example 2: Diet Problem


Minimize the cost of a diet that meets nutritional requirements. Let x1 and x2 represent quantities of two food items:

Minimize Z = 3x1 + 5x2

Subject to:

2x1 + 4x2 ≥ 20,  x1 + 2x2 ≥ 10,  x1, x2 ≥ 0.
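A minimal sketch of Example 2 in Python (assuming SciPy; linprog accepts only ≤ rows, so each ≥ constraint is negated):

    # Sketch: the diet problem above, solved with scipy.optimize.linprog.
    import numpy as np
    from scipy.optimize import linprog

    c = np.array([3.0, 5.0])         # minimize Z = 3x1 + 5x2
    A_ub = np.array([[-2.0, -4.0],   # 2x1 + 4x2 >= 20, negated
                     [-1.0, -2.0]])  # x1 + 2x2 >= 10, negated
    b_ub = np.array([-20.0, -10.0])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
    print(res.x, res.fun)  # expected: x = (0, 5), Z = 25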

5. Applications of Linear Programming

Linear programming has applications in numerous fields:

Business and Economics: Profit maximization, cost minimization, supply chain optimization.

Operations Research: Resource allocation, scheduling, transportation problems.

Engineering: Network flow optimization, energy management.

Health Sciences: Diet optimization, medical resource allocation.

6. Conclusion

Linear programming provides a formal framework for addressing optimization problems in which all relationships are linear. Its versatility and efficiency have made it a cornerstone of operations research and applied mathematics.

In subsequent lectures, we will explore specific techniques to solve linear programs systematically and efficiently.

Lecture 3: Gaussian Elimination with Examples

1. Introduction to Solving Linear Equations

A system of linear equations is a collection of equations in which each equation is linear in the unknown variables. Such systems arise frequently in linear programming when analyzing the feasible set of solutions defined by constraints.

A system of linear equations can be written in matrix form as:

Ax = b

where:

A is the m × n coefficient matrix.


x is the column vector of n variables.
b is the m × 1 column vector of constants.

Objective: Solve for x such that Ax = b.

2. Gaussian Elimination: Overview

Gaussian elimination is an algorithmic method used to solve systems of linear equations by systematically reducing the augmented matrix to a simpler form. The method involves two main stages:

Forward Elimination: Transform the matrix into an upper triangular form.

Backward Substitution: Solve for the unknowns starting from the last equation.

The augmented matrix for a system Ax = b is written as:

[A∣b]

Gaussian elimination proceeds by performing row operations to simplify this matrix.

Permissible Row Operations:

1. Swap two rows.

2. Multiply a row by a nonzero scalar.

3. Add or subtract a multiple of one row from another.

3. Step-by-Step Algorithm

Step 1: Forward Elimination

Begin with the first row, and use it to eliminate the leading coefficient in all rows below.

Move to the next row and repeat the process for the next pivot column.

Continue until the matrix is in upper triangular form, where all elements below the main
diagonal are zero.

Step 2: Backward Substitution

Solve for the last variable using the last equation.

Substitute this value into the preceding equations to find other variables iteratively.

4. Example 1: Solving a 2 × 2 System

Solve:

2x1 + x2 = 5,  4x1 − 6x2 = −2.

Step 1: Write the augmented matrix

[ 2  1 |  5 ]
[ 4 −6 | −2 ].

Step 2: Forward elimination

Eliminate the first element of the second row using the first row:
Replace Row 2 with Row 2 − 2 × Row 1.

[ 2  1 |   5 ]
[ 0 −8 | −12 ].

Step 3: Normalize the pivot

Divide Row 2 by −8:

[ 2  1 |   5 ]
[ 0  1 | 1.5 ].

Step 4: Backward substitution

Use Row 2 (x2 = 1.5) in Row 1:

2x1 + 1(1.5) = 5 ⟹ x1 = 1.75.

Solution: x1 = 1.75, x2 = 1.5.

5. Example 2: Solving a 3 × 3 System

Solve:

x1 + x2 + x3 = 6,
2x1 + 3x2 + 7x3 = 18,
x1 + 3x2 + x3 = 10.

Step 1: Write the augmented matrix

[ 1 1 1 |  6 ]
[ 2 3 7 | 18 ]
[ 1 3 1 | 10 ].

Step 2: Forward elimination

Eliminate the first element of Rows 2 and 3 using Row 1:

Replace Row 2 with Row 2 − 2 × Row 1.
Replace Row 3 with Row 3 − Row 1.

[ 1 1 1 | 6 ]
[ 0 1 5 | 6 ]
[ 0 2 0 | 4 ].

Eliminate the second element of Row 3 using Row 2:
Replace Row 3 with Row 3 − 2 × Row 2.

[ 1 1   1 |  6 ]
[ 0 1   5 |  6 ]
[ 0 0 −10 | −8 ].

Step 3: Backward substitution

Solve for x3: −10x3 = −8 ⟹ x3 = 0.8.

Substitute x3 = 0.8 into Row 2:

x2 + 5(0.8) = 6 ⟹ x2 = 2.

Substitute x2 = 2 and x3 = 0.8 into Row 1:

x1 + 2 + 0.8 = 6 ⟹ x1 = 3.2.

Solution: x1 = 3.2, x2 = 2, x3 = 0.8.

6. Significance of Gaussian Elimination in Linear Programming

Gaussian elimination is a foundational technique for solving systems of linear equations, which frequently arise in linear programming when analyzing equality constraints or pivoting operations in the Simplex Method.

The method is computationally efficient and forms the basis for more advanced
numerical algorithms.

Gaussian elimination provides a systematic approach to solve linear systems, a critical step in
studying the feasible set of a linear program.
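To make the two stages concrete, here is a minimal sketch in Python (an assumption; the lectures prescribe no language) applied to Example 2, without the pivoting safeguards discussed in the next lecture:

    # Sketch: Gaussian elimination (no pivoting) for the 3x3 example above.
    import numpy as np

    def gaussian_elimination(A, b):
        """Solve Ax = b by forward elimination + backward substitution."""
        M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
        n = len(b)
        # Forward elimination: zero out entries below each pivot.
        for k in range(n - 1):
            for i in range(k + 1, n):
                factor = M[i, k] / M[k, k]
                M[i, k:] -= factor * M[k, k:]
        # Backward substitution: solve from the last row upward.
        x = np.zeros(n)
        for i in range(n - 1, -1, -1):
            x[i] = (M[i, -1] - M[i, i + 1:n] @ x[i + 1:]) / M[i, i]
        return x

    A = np.array([[1, 1, 1], [2, 3, 7], [1, 3, 1]])
    b = np.array([6, 18, 10])
    print(gaussian_elimination(A, b))  # expected: [3.2, 2.0, 0.8]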

Lecture 4: Summary of Gaussian Elimination

1. Formalization of Gaussian Elimination

Gaussian elimination is a systematic process for solving a system of linear equations represented in the form Ax = b, where:

A is an m × n matrix of coefficients.
x is a column vector of n variables.
b is a column vector of constants.

The method transforms the augmented matrix [A∣b] into a row-echelon form (or reduced
row-echelon form in some cases) to make solving for x straightforward.

2. Steps of Gaussian Elimination

Step 1: Augmented Matrix Representation
Write the system Ax = b as the augmented matrix [A∣b]:

[ a11 a12 ⋯ a1n | b1 ]
[ a21 a22 ⋯ a2n | b2 ]
[  ⋮   ⋮       ⋮ |  ⋮ ]
[ am1 am2 ⋯ amn | bm ].

Step 2: Forward Elimination


The objective is to eliminate all entries below the diagonal in the augmented matrix by
applying row operations to convert the matrix into an upper triangular form.

Select the pivot element akk in the k -th row, where k represents the current step.

Scale the pivot row if necessary to normalize the pivot.

Use the pivot row to eliminate all entries below the pivot in the same column by
subtracting appropriate multiples of the pivot row from the rows below.

Step 3: Backward Substitution


Once the matrix is in upper triangular form, solve for the variables starting from the last
equation:

xn = (last element in augmented column) / (last diagonal element).

Substitute the value of xn into preceding equations to find xn−1, xn−2, …, x1.

3. Classification of Outcomes

During Gaussian elimination, the following cases might arise:

Case 1: Unique Solution

If A is a square matrix (i.e., m = n) and all pivot elements are nonzero, the system has a
unique solution.

Example:

A = [ 2 1 ]    b = [ 5 ]
    [ 1 3 ],       [ 7 ].

Case 2: Infinite Solutions

If one or more rows of the reduced matrix become 0 = 0, the system is consistent but
has infinitely many solutions. This occurs when the rank of the matrix A is less than the
number of variables n.

Example:
x1 + x2 + x3 = 6,  2x1 + 2x2 + 2x3 = 12.

Case 3: No Solution

If one or more rows reduce to a contradictory statement (e.g., 0 = c, where c ≠ 0), the system is inconsistent and has no solution. This occurs when the augmented matrix [A∣b] has a higher rank than A.

Example:
x1 + x2 = 3,  x1 + x2 = 5.

4. Pivoting Strategies

Partial Pivoting:

In partial pivoting, the row with the largest absolute value of the pivot element in the
current column is swapped with the current row.

This reduces numerical errors and ensures stability in computation.

Complete Pivoting:

In complete pivoting, the largest absolute value in the entire remaining submatrix is
selected as the pivot.

This is computationally more expensive but provides maximum numerical stability.
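A minimal sketch of the partial-pivoting modification (assuming NumPy; the function name is mine): before eliminating column k, swap up the row whose entry in that column has the largest absolute value.

    # Sketch: forward elimination with partial pivoting.
    import numpy as np

    def forward_eliminate_partial_pivot(A, b):
        M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
        n = len(b)
        for k in range(n - 1):
            # Partial pivoting: pick the row with max |M[i, k]| for i >= k.
            p = k + np.argmax(np.abs(M[k:, k]))
            M[[k, p]] = M[[p, k]]  # swap rows k and p
            for i in range(k + 1, n):
                M[i, k:] -= (M[i, k] / M[k, k]) * M[k, k:]
        return M  # upper triangular augmented matrix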

5. Efficiency of Gaussian Elimination

The time complexity of Gaussian elimination is O(n³), making it efficient for systems of small to moderate size. However, for large systems, iterative methods such as Jacobi or Gauss-Seidel, or matrix factorization methods like LU decomposition, are often preferred.

6. Summary of Gaussian Elimination

Key Points:

Gaussian elimination is a fundamental method for solving systems of linear equations.

It systematically reduces a matrix to upper triangular form using row operations.

The method handles consistent systems (unique or infinite solutions) and identifies
inconsistent systems (no solution).

Applications:
Gaussian elimination is integral to many areas of applied mathematics, including:

Analyzing feasible sets in linear programming.

Computing inverse matrices.

Solving network flow problems.

Understanding Gaussian elimination is a prerequisite for advanced techniques like LU decomposition and the Simplex Method, which are foundational to optimization and linear programming.

Lecture 5: Vector Space Over Real Numbers

1. Introduction to Vector Spaces

A vector space is a mathematical structure that allows us to generalize the concept of linear
combinations, which is central to understanding the solution sets of linear equations. It
consists of a set of objects called vectors, defined over a field (in this case, the field of real
numbers R), along with two operations: vector addition and scalar multiplication.

Vector spaces are essential in linear algebra as they provide a framework for discussing
linear independence, basis, dimension, and subspaces—all of which are important for
understanding the geometry of solutions to linear equations.

2. Definition of a Vector Space

A set V , along with two operations:

1. Addition: + : V × V → V (vector addition), and


2. Scalar Multiplication: ⋅ : R × V → V,

is called a vector space over the real numbers R if the following axioms are satisfied:

1. Closure under Addition:


For all u, v ∈ V, u + v ∈ V.
2. Associativity of Addition:
For all u, v, w ∈ V , (u + v) + w = u + (v + w).
3. Existence of Additive Identity:
There exists an element 0 ∈ V (the zero vector) such that u + 0 = u for all u ∈ V .
4. Existence of Additive Inverse:
For every u ∈ V, there exists −u ∈ V such that u + (−u) = 0.
5. Closure under Scalar Multiplication:
For all c ∈ R and u ∈ V , c ⋅ u ∈ V .
6. Distributive Property for Scalars:
For all c, d ∈ R and u ∈ V , (c + d) ⋅ u = c ⋅ u + d ⋅ u.
7. Distributive Property for Vectors:
For all c ∈ R and u, v ∈ V , c ⋅ (u + v) = c ⋅ u + c ⋅ v.
8. Compatibility of Scalar Multiplication:
For all c, d ∈ R and u ∈ V , (cd) ⋅ u = c ⋅ (d ⋅ u).
9. Multiplicative Identity:
For all u ∈ V , 1 ⋅ u = u.

3. Examples of Vector Spaces Over R

Example 1: Rn (Euclidean Space)


The set of all n-tuples of real numbers, Rn = {(x1, x2, …, xn) ∣ xi ∈ R}, forms a vector space under:

Addition: Component-wise addition.

(x1, x2, …, xn) + (y1, y2, …, yn) = (x1 + y1, x2 + y2, …, xn + yn).

Scalar Multiplication: Component-wise scaling.

c ⋅ (x1, x2, …, xn) = (cx1, cx2, …, cxn),  c ∈ R.

Example 2: Set of Polynomials


The set Pn of all polynomials of degree at most n is a vector space under:

Polynomial addition.

Scalar multiplication of polynomials.

Example 3: Set of Continuous Functions


The set C[a, b] of all continuous real-valued functions defined on an interval [a, b] forms a
vector space under:

Function addition: (f + g)(x) = f (x) + g(x).


Scalar multiplication: (c ⋅ f )(x) = c ⋅ f (x).

Example 4: Null Space of a Matrix


The null space of a matrix A, defined as Null(A) = {x ∣ Ax = 0}, is a vector space
because it satisfies all the vector space axioms.
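A small numeric illustration of Example 4 (a sketch assuming SciPy's null_space helper): vectors in Null(A) stay in Null(A) under addition and scaling.

    # Sketch: the null space of A is closed under linear combinations.
    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])   # rank 1, so nullity = 3 - 1 = 2

    N = null_space(A)                 # orthonormal basis, shape (3, 2)
    u, v = N[:, 0], N[:, 1]

    # Any combination of basis vectors stays in Null(A): A(2u - 5v) = 0.
    print(np.allclose(A @ (2 * u - 5 * v), 0))  # True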

4. Non-Examples

1. Set of Integers Zn
Although Zn satisfies closure under addition, it does not satisfy closure under scalar
multiplication since multiplying an integer by a non-integer scalar results in a non-
integer.

2. Vectors Without Additive Identity


A set that does not include a zero vector cannot form a vector space.

5. Visual Representation

In R2 :

A vector space can be visualized as a collection of arrows (vectors) in a 2D plane originating from the origin.

The operations of addition and scalar multiplication preserve the structure of the vector
space, i.e., all results remain within the space.

6. Key Properties of Vector Spaces

1. Uniqueness of Additive Identity:


The zero vector 0 is unique in any vector space.

2. Uniqueness of Additive Inverse:


For each u ∈ V , the additive inverse −u is unique.
3. Zero Scalar Multiplication:
For any u ∈ V , 0 ⋅ u = 0.
4. Zero Vector Multiplication:
For any c ∈ R, c ⋅ 0 = 0 .
5. Cancellation Law:
For u, v, w ∈ V , if u + w = v + w, then u = v.

7. Relevance to Linear Equations

Vector spaces provide the framework for:

1. Understanding solution sets to Ax = b:

Solutions form an affine subspace (shifted vector space).

2. Analyzing the null space, row space, and column space of matrices:

These spaces help characterize the structure of solutions to systems of linear equations.

3. Formulating optimization problems in terms of subspaces.

Vector spaces are foundational to linear algebra and optimization, enabling precise
characterization of feasible regions, constraints, and objective functions in linear
programming.

Lecture 6: Linear Operators

1. Introduction to Linear Operators

A linear operator is a function that maps vectors from one vector space to another while
preserving the operations of vector addition and scalar multiplication. Matrices are
representations of linear operators when the vector spaces involved are finite-dimensional.

Given two vector spaces V and W over the field R, a function T : V → W is a linear
operator if:

1. T (u + v) = T (u) + T (v), for all u, v ∈ V .


2. T (cu) = cT (u), for all c ∈ R and u ∈ V .

For finite-dimensional vector spaces, such operators can be represented using matrices, with
the action of T expressed as matrix-vector multiplication:

T (x) = Ax,

where A is an m × n matrix, x is a vector in Rn , and T (x) lies in Rm .

2. Matrix Representation of Linear Operators

A matrix A is a representation of a linear operator and encodes how the operator transforms
basis vectors of the domain vector space Rn into vectors in the codomain Rm .

Given A as an m × n matrix:

A = [ a11 a12 ⋯ a1n ]
    [ a21 a22 ⋯ a2n ]
    [  ⋮   ⋮       ⋮ ]
    [ am1 am2 ⋯ amn ],

and x as a column vector in Rn:

x = [ x1 ]
    [ x2 ]
    [  ⋮ ]
    [ xn ],

the product Ax gives the transformed vector in Rm:

Ax = [ a11x1 + a12x2 + ⋯ + a1nxn ]
     [ a21x1 + a22x2 + ⋯ + a2nxn ]
     [              ⋮             ]
     [ am1x1 + am2x2 + ⋯ + amnxn ].

3. Key Properties of Matrices as Linear Operators

(a) Rank of a Matrix


The rank of a matrix A, denoted rank(A), is the dimension of the image (or column space)
of A. It is the maximum number of linearly independent columns of A, and equivalently, the
maximum number of linearly independent rows.

The rank of A indicates the number of dimensions in Rm that are "spanned" by the
transformed vectors.

Computed using Gaussian elimination by reducing A to row-echelon form.

(b) Kernel of a Matrix


The kernel (or null space) of a matrix A, denoted ker(A), is the set of all vectors x ∈ Rn
such that:

Ax = 0.

Geometrically, ker(A) represents the subspace of Rn that is "collapsed" to the zero vector in Rm by A.

The dimension of the kernel is called the nullity of A.

Example:
Let

A = [ 1 2 ]
    [ 3 6 ].

The kernel is found by solving Ax = 0:

[ 1 2 ] [ x1 ]   [ 0 ]
[ 3 6 ] [ x2 ] = [ 0 ].

This simplifies to x1 + 2x2 = 0, so ker(A) = span{(−2, 1)}.

(c) Image of a Matrix
The image (or column space) of a matrix A, denoted im(A), is the span of the columns of A.
It is the set of all vectors in Rm that can be expressed as Ax for some x ∈ Rn :

im(A) = {y ∈ Rm ∣ y = Ax for some x ∈ Rn }.

The image represents the subspace of Rm that A "maps onto."

The dimension of the image is equal to the rank of A.

Example:
For the matrix A above, the image is spanned by (1, 3), as the second column is a scalar multiple of the first. Thus,

im(A) = span{(1, 3)}.

4. Rank-Nullity Theorem

For an m × n matrix A, the rank-nullity theorem states:

rank(A) + nullity(A) = n,

where:

rank(A) is the dimension of the image of A.


nullity(A) is the dimension of the kernel of A.

This theorem provides a fundamental relationship between the row space, column space,
and null space of A.
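A quick numeric check of the theorem (a sketch assuming NumPy and SciPy) on the matrix from the kernel and image examples above:

    # Sketch: verifying rank(A) + nullity(A) = n numerically.
    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 2.0],
                  [3.0, 6.0]])

    rank = np.linalg.matrix_rank(A)    # dimension of im(A)
    nullity = null_space(A).shape[1]   # dimension of ker(A)
    n = A.shape[1]

    print(rank, nullity, rank + nullity == n)  # 1, 1, True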

5. Applications of Linear Operators

1. Geometry:
Linear operators describe transformations such as rotations, reflections, scalings, and
projections in vector spaces.

2. Linear Equations:
Solving systems of equations Ax = b involves understanding the rank, kernel, and
image of A.

3. Optimization:
Linear operators are crucial in defining constraints and objective functions in linear
programming.

4. Computer Graphics:
Matrices are used as linear operators for rendering, transforming, and manipulating
graphical objects.

6. Summary

A linear operator maps vectors between vector spaces while preserving linearity.

Matrices serve as representations of linear operators, with rank, kernel, and image as
key properties.

The rank-nullity theorem provides a critical link between the dimensions of the kernel,
image, and the domain.

Linear operators and their properties are foundational to linear algebra, forming the basis
for deeper analysis in optimization, numerical computation, and theoretical mathematics.

Lecture 7: Solution of Linear Equations

1. Introduction

The solution of a system of linear equations is central to linear algebra and optimization.
Given a matrix A ∈ Rm×n and vectors x ∈ Rn and b ∈ Rm , the system of equations can be
expressed in matrix form as:

Ax = b.

We analyze two key cases:

1. Homogeneous system: Ax = 0.
2. Non-homogeneous system: Ax = b, where b ≠ 0.

2. Homogeneous System (Ax = 0)
The homogeneous system always has at least one solution, x = 0 (trivial solution). The
complete solution depends on the rank of A and the dimension of the kernel (ker(A)).

Solution Structure:

If A has full column rank (rank(A) = n), the kernel is trivial (x = 0 only).
If rank(A) < n, there exist infinitely many solutions forming a vector subspace of Rn ,
with dimension equal to the nullity (n − rank(A)).

General Solution:
The general solution is given by:

x = c1v1 + c2v2 + ⋯ + ckvk,

where {v1, v2, …, vk} is a basis for ker(A), and c1, c2, …, ck ∈ R.
Example:
Let

A = [ 1 2 3 ]
    [ 4 5 6 ].

To solve Ax = 0, augment and reduce A:

[ 1 2 3 ]    [ 1  2  3 ]
[ 4 5 6 ] →  [ 0 −3 −6 ].

The solution is:

x1 = x3,  x2 = −2x3,  x3 free.

The kernel is:

ker(A) = span{(1, −2, 1)}.

3. Non-Homogeneous System (Ax = b)


For the system Ax = b, a solution exists if and only if b lies in the image (im(A)).
Case Analysis:

1. Consistent system:
If b ∈ im(A), the system has at least one solution.
2. Inconsistent system:
If b ∉ im(A), the system has no solution.

General Solution:
When solutions exist, the general solution is the sum of:

1. A particular solution xp satisfying Axp = b.

2. The homogeneous solution xh, i.e., solutions to Ax = 0.

Thus,

x = xp + xh.

Example:
Let

A = [ 1 2 3 ]    b = [ 14 ]
    [ 4 5 6 ],       [ 32 ].

Find a particular solution using Gaussian elimination:

[ 1 2 3 | 14 ]    [ 1  2  3 |  14 ]
[ 4 5 6 | 32 ] →  [ 0 −3 −6 | −24 ].

From the second row: −3x2 − 6x3 = −24, or x2 + 2x3 = 8.

Set x3 = t; then x2 = 8 − 2t and x1 = 14 − 2x2 − 3x3 = 14 − 2(8 − 2t) − 3t = t − 2.

The particular solution (taking t = 0) is:

xp = (−2, 8, 0),  and the general solution is:  x = xp + c·v,

where v = (1, −2, 1) is the basis vector of ker(A).
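A small numeric check of this structure (a sketch assuming NumPy and SciPy; not part of the lecture):

    # Sketch: every xp + c*v solves Ax = b.
    import numpy as np
    from scipy.linalg import null_space

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
    b = np.array([14.0, 32.0])

    xp = np.array([-2.0, 8.0, 0.0])  # particular solution from the example
    v = null_space(A)[:, 0]          # kernel basis, proportional to (1, -2, 1)

    for c in (-3.0, 0.0, 7.5):
        print(np.allclose(A @ (xp + c * v), b))  # True, True, True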

4. Geometric Interpretation of Solutions

1. Homogeneous system: The solution set forms a subspace of Rn .

2. Non-homogeneous system: The solution set is a translation of the kernel subspace by
the particular solution xp .

For example:

For Ax = b with A a 2 × 3 matrix of rank 2, the solution set of a consistent system is a line in R3 (a translate of the one-dimensional kernel).

5. Summary of Key Concepts

1. Homogeneous systems (Ax = 0):


Always have the trivial solution.

General solution is a subspace of Rn .

2. Non-homogeneous systems (Ax = b):


Have a solution if b ∈ im(A).
General solution is the sum of a particular solution and the homogeneous solution.

3. Rank-nullity theorem links the dimensions of the kernel and image to the total number
of variables n.

Understanding these principles is essential for analyzing linear systems, forming the
foundation for linear programming and optimization problems.


Lecture 8: Resource Allocation as Linear Programming (LP)

1. Introduction to Resource Allocation

Resource allocation involves distributing limited resources (e.g., time, money, raw materials)
to competing activities to achieve an optimal outcome, typically maximizing profit or
minimizing cost. Such problems are prevalent in industries like manufacturing, logistics,
finance, and healthcare.

A resource allocation problem can often be mathematically modeled as a linear program (LP) since both the objective function and the constraints are linear.

2. General Formulation of Resource Allocation as LP

A resource allocation problem typically consists of:

1. Decision variables: Represent the quantities to be determined (e.g., units of products to manufacture, resources to allocate).

2. Objective function: A linear function of the decision variables to be maximized or minimized (e.g., profit, cost, time).

3. Constraints: Linear inequalities or equalities representing the limitations on resources or other restrictions.

The general LP formulation is:

Maximize (or Minimize) z = c⊤ x,

subject to:

Ax ≤ b, x ≥ 0,

where:

x = [x1, x2, …, xn]⊤: vector of decision variables.

c: coefficients of the objective function (e.g., profit or cost per unit).


A: constraint matrix representing resource requirements.
b: available resources (right-hand side of constraints).

3. Example: Resource Allocation in Manufacturing

Problem Statement:
A factory produces two types of products, P1 and P2, using three resources: labor hours, machine hours, and raw materials. Each unit of P1 and P2 consumes these resources differently. The goal is to maximize profit.

Data:

Each unit of P1 yields a profit of $50, and P2 yields $40.

Resource constraints:

Labor hours available: 240 hours.

Machine hours available: 200 hours.

Raw material available: 120 units.

Resource usage per product:

Resource         P1 Usage   P2 Usage   Availability
Labor hours      4          3          240
Machine hours    2          2          200
Raw material     1          2          120

Formulation as LP:
Define decision variables:

x1 = number of units of P1 produced,  x2 = number of units of P2 produced.

Objective Function: Maximize profit:

z = 50x1 + 40x2.

Constraints:

1. Labor hours: 4x1 + 3x2 ≤ 240.

2. Machine hours: 2x1 + 2x2 ≤ 200.

3. Raw materials: x1 + 2x2 ≤ 120.

4. Non-negativity: x1, x2 ≥ 0.

The LP formulation is:

Maximize z = 50x1 + 40x2,

subject to:

4x1 + 3x2 ≤ 240,
2x1 + 2x2 ≤ 200,
x1 + 2x2 ≤ 120,
x1, x2 ≥ 0.
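A minimal sketch solving this LP (assuming SciPy; linprog minimizes, so the profit coefficients are negated):

    # Sketch: the manufacturing LP above, via scipy.optimize.linprog.
    import numpy as np
    from scipy.optimize import linprog

    c = np.array([-50.0, -40.0])
    A_ub = np.array([[4.0, 3.0],    # labor hours
                     [2.0, 2.0],    # machine hours
                     [1.0, 2.0]])   # raw material
    b_ub = np.array([240.0, 200.0, 120.0])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
    print(res.x, -res.fun)  # expected: x = (24, 48), z = 3120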

4. Geometric Representation

The feasible region of the LP is a convex polytope in R2 , defined by the intersection of the
half-spaces given by the constraints. The optimal solution lies at a vertex of this polytope.

For example:

Plot the constraints on a graph with x1 and x2 axes.

Identify the region satisfying all constraints.

Evaluate the objective function z at each vertex of the feasible region to find the
maximum value.

5. Applications of Resource Allocation in Various Industries

1. Manufacturing: Optimizing production schedules to maximize profit or minimize costs under labor, machine, and material constraints.

2. Logistics: Allocating vehicles or routes to minimize delivery costs or maximize efficiency.

3. Finance: Portfolio optimization to maximize returns while limiting risk.

4. Healthcare: Allocating medical staff or resources to maximize patient care.

6. Advantages of Using Linear Programming for Resource Allocation

1. Optimality: LP guarantees finding the best solution under given constraints if one exists.

2. Scalability: LP solvers can handle large-scale problems with many variables and
constraints.

3. Versatility: LP can model a wide range of real-world resource allocation problems.

4. Interpretability: Solutions are mathematically rigorous and provide insight into the
trade-offs between competing activities.

7. Summary

Resource allocation problems involve distributing limited resources to maximize or minimize an objective.

These problems can be formulated as LPs, characterized by a linear objective function and linear constraints.

Solving such LPs helps industries achieve optimal resource usage efficiently.

LP's geometric representation and solution methods (e.g., graphical, simplex algorithm)
provide systematic approaches for solving real-world problems.

By generalizing resource allocation as a linear program, we establish a robust framework applicable to diverse fields, laying the foundation for more advanced methods in optimization.

Lecture 9: Approximate Degree as Linear Programming (LP)

1. Introduction to Boolean Functions and Polynomials

A Boolean function is a function that takes binary inputs (i.e., inputs from {0, 1}n ) and
outputs a value from {0, 1}. These functions are of central importance in areas such as logic
circuits, computational complexity, and optimization.

A polynomial representation of a Boolean function is a polynomial over the finite field F2 = {0, 1}, where the polynomial evaluates to either 0 or 1 for all input vectors from {0, 1}n. For a Boolean function f : {0, 1}n → {0, 1}, its polynomial representation is a polynomial Pf(x1, x2, …, xn) over F2, where Pf can be written as a sum of monomials:

Pf(x1, x2, …, xn) = ∑_{S⊆[n]} αS ∏_{i∈S} xi.

Here, αS ∈ F2 are coefficients, and each term ∏_{i∈S} xi is a monomial. The polynomial evaluates to f(x1, x2, …, xn) for all possible inputs (x1, x2, …, xn).

For example, the Boolean function f(x1, x2) = x1 ⊕ x2 (the XOR of x1 and x2) can be represented by the polynomial:

Pf(x1, x2) = x1 + x2.

2. Approximate Degree of a Boolean Function

The approximate degree of a Boolean function f is the degree of the lowest-degree polynomial that approximates f in a certain sense. This is defined as the smallest degree of a polynomial P such that the difference between f(x) and P(x) is small in some sense.

Formally, the approximate degree deg_ϵ(f) is defined as the smallest degree of a polynomial P for which:

Pr_{x∈{0,1}n} [f(x) ≠ P(x)] ≤ ϵ,

where ϵ is a small error probability. In other words, the polynomial P should agree with f on most inputs (with a probability of at least 1 − ϵ).

3. Context and Motivation

Why do we care about the approximate degree of a Boolean function?

1. Computational Complexity: Approximate degree gives insight into the minimal complexity required to approximate a Boolean function using polynomials, which is useful in circuit complexity and query complexity.

2. Optimization Problems: Approximate degree is used in optimization algorithms to determine how well a function can be approximated, impacting performance guarantees and efficiency.

In this lecture, we will define a linear program (LP) to find the approximate degree of a
Boolean function, which is a more involved example of linear programming.

4. Formulating Approximate Degree as an LP

To compute the approximate degree of a Boolean function, we need to find the lowest-
degree polynomial that approximates the function within an error tolerance ϵ. This can be
framed as a linear program in the following way:

1. Decision Variables:
We will introduce decision variables for the coefficients of the polynomial. If f(x1, x2, …, xn) is the Boolean function, we aim to find a polynomial of the form:

P(x1, x2, …, xn) = ∑_{S⊆[n]} αS ∏_{i∈S} xi,

where the coefficients αS are our decision variables.

2. Objective Function:
The objective is to minimize the degree of the polynomial, subject to constraints that the polynomial approximates the Boolean function f within an error of ϵ. Since the degree itself is not a linear function of the coefficients, in practice one fixes a candidate degree d and solves an LP to test whether error ϵ is achievable, increasing d until it is.

3. Constraints:
For each possible input x = (x1, x2, …, xn) ∈ {0, 1}n, we need the polynomial P(x) to be close to the value of f(x). This leads to constraints of the form:

|f(x) − P(x)| ≤ ϵ,

for all x ∈ {0, 1}n. These constraints ensure that the polynomial P is a good approximation of f with a small error.

4. Linearization of Constraints:
The constraint |f(x) − P(x)| ≤ ϵ involves an absolute value, but because P(x) is linear in the coefficients αS, it can be written as the pair of linear constraints P(x) − f(x) ≤ ϵ and f(x) − P(x) ≤ ϵ, one pair for each input x.
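A minimal sketch of this formulation (assuming SciPy and real-valued coefficients, as the absolute-value constraints above require; the helper name is mine). For a fixed degree d it minimizes the worst-case error ϵ; raising d until ϵ is small locates the approximate degree:

    # Sketch: min-epsilon LP for approximating a Boolean function
    # with a real polynomial of degree <= d.
    import itertools
    import numpy as np
    from scipy.optimize import linprog

    def best_eps(f, n, d):
        monomials = [S for k in range(d + 1)
                     for S in itertools.combinations(range(n), k)]
        A_ub, b_ub = [], []
        for x in itertools.product([0, 1], repeat=n):
            row = [float(np.prod([x[i] for i in S])) for S in monomials]
            # P(x) - f(x) <= eps   and   f(x) - P(x) <= eps
            A_ub.append(row + [-1.0]); b_ub.append(float(f(x)))
            A_ub.append([-r for r in row] + [-1.0]); b_ub.append(-float(f(x)))
        c = [0.0] * len(monomials) + [1.0]               # minimize eps
        bounds = [(None, None)] * len(monomials) + [(0, None)]
        return linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).fun

    xor = lambda x: x[0] ^ x[1]
    print(best_eps(xor, 2, 1))  # 0.5: no degree-1 real polynomial does better
    print(best_eps(xor, 2, 2))  # 0.0: x1 + x2 - 2*x1*x2 represents XOR exactly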

5. Example of Approximate Degree Calculation

Let f(x1, x2) = x1 ⊕ x2. We want to approximate this function using a polynomial of minimal degree. A candidate polynomial could be P(x1, x2) = x1 + x2 (over F2), which agrees with f(x1, x2) on all inputs.

For a more complicated Boolean function, say a function with higher complexity or more
variables, the LP formulation will find the polynomial with the smallest degree that
approximates the function within the given error bounds. The LP solver would minimize the
degree while ensuring the constraints are satisfied.

6. Applications of Approximate Degree

The approximate degree of a Boolean function has significant applications in various domains:

1. Circuit Complexity: Understanding the approximate degree of a Boolean function helps in determining the complexity of Boolean circuits that approximate the function.

2. Query Complexity: In decision tree and query complexity, the approximate degree
provides bounds on how efficiently a function can be approximated by a decision tree.

3. Learning Theory: Approximate degree is used to characterize the complexity of learning
Boolean functions in the framework of learning theory.

7. Summary

A Boolean function can be represented by a polynomial over F2.

The approximate degree of a Boolean function is the degree of the lowest-degree polynomial that approximates the function with an error probability ϵ.

This problem can be formulated as a linear program by introducing decision variables for the polynomial coefficients and imposing constraints to ensure the approximation error is within ϵ.

Linear programming offers a structured approach to solving optimization problems like the calculation of approximate degree, providing a powerful tool for both theoretical and applied research in areas like circuit complexity and learning theory.

This framework generalizes the concept of polynomial approximations to Boolean functions and uses linear programming to find efficient solutions.

Lecture 10: Equivalent Linear Programs (LPs)

1. Introduction to Equivalent LPs

In linear programming (LP), problems can often be presented in various forms, with different
types of constraints and objective functions. A critical concept in LP is that many different
formulations of a problem can be equivalent, meaning that they represent the same
optimization problem and yield the same solution, but the structure of the problem may
differ.

It is often useful to convert LPs into a "standard form" because it allows us to streamline
solution methods (e.g., the simplex algorithm) and avoid dealing with multiple cases for
different types of constraints. This standardization makes solving LPs easier and more
systematic.

In this lecture, we will explore several methods for converting LPs into equivalent forms and
discuss why these transformations are important. By doing so, we will ensure that we can
always work with a "canonical" form of LP that is easier to manipulate and solve.

2. Standard Forms of LP

There are several common forms for representing an LP, with the most widely used being:

Canonical (or Standard) Form: The problem is written as:

Maximize c⊤ x

subject to:

Ax = b, x ≥ 0.

In this form, all the constraints are equalities and all variables are non-negative.

Inequality Form (Primal form): The problem is written as:

Maximize c⊤ x

subject to:

Ax ≤ b, x ≥ 0.

In this form, the constraints are in the form of inequalities, with non-negative
variables.

Dual Form: The dual of an LP can also be represented in a standard form, which is
derived by associating the primal problem's variables with constraints in the dual
problem. We will revisit this duality in later lectures.

3. Converting Between Inequalities and Equalities

One of the most common types of transformation involves converting inequality constraints
into equality constraints. This is important because the simplex method, for example, works
with equality constraints.

3.1 Converting "Less Than or Equal" to Equalities

Consider an inequality constraint of the form:

Ax ≤ b.

To convert this into an equality, we introduce a slack variable s ≥ 0 such that:

Ax + s = b, s ≥ 0.

The vector s represents the "slack" or unused portion of the resource represented by the
constraint. This ensures that the equation holds while keeping the variable s non-negative.

Example:

For the constraint 2x1 + 3x2 ≤ 10, introduce a slack variable s ≥ 0 and rewrite the constraint as:

2x1 + 3x2 + s = 10,  s ≥ 0.

3.2 Converting "Greater Than or Equal" to Equalities

Consider a constraint of the form:

Ax ≥ b.

To convert this into an equality, we introduce a surplus variable t ≥ 0 such that:

Ax − t = b, t ≥ 0.

The surplus variable t represents the amount by which the left-hand side exceeds the
required bound.

Example:

For the constraint 2x1 + 3x2 ≥ 10, introduce a surplus variable t ≥ 0 and rewrite the constraint as:

2x1 + 3x2 − t = 10,  t ≥ 0.

4. Converting Variable Constraints to Non-Negative Variables

Another standard transformation is to ensure that all decision variables are non-negative. If
a variable xi is constrained by:

xi ∈ R,

i.e., xi can take any real value, we can rewrite it as two non-negative variables:

xi = xi+ − xi−,  xi+, xi− ≥ 0.

This transformation allows us to work exclusively with non-negative variables, which is a standard requirement in many solution methods.

Example:

For the variable x1 ∈ R, we introduce x1+ and x1− as non-negative variables:

x1 = x1+ − x1−,  x1+, x1− ≥ 0.

5. Standard Form for Maximization and Minimization

The canonical form described earlier applies to maximization problems. However, we may
encounter minimization problems. In this case, we can convert a minimization problem into
a maximization problem by multiplying the objective function by −1.

For example, if we have:

Minimize c⊤ x,

we can rewrite it as:

Maximize − c⊤ x.
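Putting the slack-variable recipe into code, a minimal sketch (assuming NumPy; the helper name is mine) that converts max c⊤x subject to Ax ≤ b, x ≥ 0 into the canonical equality form:

    # Sketch: convert Ax <= b into [A | I] z = b by appending one slack
    # variable per row; slacks contribute nothing to the objective.
    import numpy as np

    def to_equality_form(c, A, b):
        m, n = A.shape
        A_eq = np.hstack([A, np.eye(m)])         # [A | I]: slack columns
        c_eq = np.concatenate([c, np.zeros(m)])  # slacks earn zero profit
        return c_eq, A_eq, b

    c = np.array([4.0, 6.0])
    A = np.array([[3.0, 2.0], [2.0, 4.0]])  # the Lecture 1 example
    b = np.array([18.0, 16.0])
    c_eq, A_eq, b_eq = to_equality_form(c, A, b)
    print(A_eq)  # [[3. 2. 1. 0.], [2. 4. 0. 1.]]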

6. Summary of Transformations

Less than or equal to constraints can be converted into equalities by introducing slack
variables.

Greater than or equal to constraints can be converted into equalities by introducing surplus variables.

Variables with no sign restrictions can be converted into non-negative variables by splitting them into positive and negative components.

Minimization problems can be transformed into maximization problems by negating the objective function.

7. Conclusion

By converting an LP into an equivalent form, we can simplify the problem and apply standard
solution methods like the simplex algorithm more easily. The ability to manipulate the form
of an LP is essential for solving problems efficiently and is a powerful tool in both theory and
practice. By ensuring all constraints are in equality form and all variables are non-negative,
we can work with a "canonical" LP form that standardizes the problem and reduces
complexity.

These transformations not only streamline the computational process but also help in
understanding the relationships between different types of LP formulations, ensuring
consistency in optimization solutions.

Lecture 11: Introduction to Convexity

1. Introduction to Convexity in Linear Programming

In linear programming (LP), constraints such as non-negativity (x ≥ 0) restrict the feasible region to certain subsets of the space. This constraint forces us to consider combinations of feasible solutions, where the coefficients are non-negative. These combinations, and the geometric properties of the resulting feasible regions, give rise to the theory of convexity.

In this lecture, we will introduce fundamental concepts related to convexity, starting with
linear combinations, affine combinations, and conic combinations. These ideas will lead to
the more specific notion of convex combinations, which play a central role in optimization
and linear programming.

2. Linear Combinations

A linear combination of vectors is a sum of scalar multiples of those vectors. Given vectors v1, v2, …, vk ∈ Rn, a linear combination of these vectors is expressed as:

v = ∑_{i=1}^{k} αi vi,

where α1, α2, …, αk are scalars, called the coefficients. Importantly, the coefficients can be any real numbers, not necessarily non-negative.

Example: Given vectors v1 = (1, 0) and v2 = (0, 1), a linear combination of these vectors could be:

v = 3(1, 0) + (−2)(0, 1) = (3, −2).

3. Affine Combinations

An affine combination of vectors is similar to a linear combination but with the additional constraint that the coefficients sum to 1. Formally, given vectors v1, v2, …, vk ∈ Rn, an affine combination is expressed as:

v = ∑_{i=1}^{k} αi vi,

where the coefficients satisfy the condition:

∑_{i=1}^{k} αi = 1.

Affine combinations allow us to express points that lie on the affine span of the vectors, which is the flat affine subspace formed by the given vectors.

Example: For vectors v1 = (1, 0) and v2 = (0, 1), an affine combination with α1 + α2 = 1 could be:

v = 0.4(1, 0) + 0.6(0, 1) = (0.4, 0.6).

This point lies on the line segment connecting v1 and v2, illustrating that affine combinations preserve the geometric relationships between points.

4. Conic Combinations

A conic combination is a special type of linear combination where the coefficients are non-negative. In other words, given vectors v1, v2, …, vk ∈ Rn, a conic combination is expressed as:

v = ∑_{i=1}^{k} αi vi,

where αi ≥ 0 for all i. Conic combinations are used to describe points in a cone, a set that is closed under linear combinations with non-negative coefficients.

Example: Given the vectors v1 = (1, 0) and v2 = (0, 1), a conic combination could be:

v = 0.5(1, 0) + 0.3(0, 1) = (0.5, 0.3).

Here, both coefficients are non-negative, indicating that the point lies within the cone formed by the vectors v1 and v2.

5. Convex Combinations

A convex combination is a particular type of conic combination in which the coefficients are not only non-negative, but they also sum to 1. Formally, given vectors v1, v2, …, vk ∈ Rn, a convex combination is given by:

v = ∑_{i=1}^{k} αi vi,

where the coefficients satisfy the conditions:

αi ≥ 0 for all i, and ∑_{i=1}^{k} αi = 1.

Convex combinations are particularly important in the context of optimization because they allow us to describe points inside a convex set, which is central to the theory of convex optimization.

Example: For the vectors v1 = (1, 0) and v2 = (0, 1), a convex combination could be:

v = 0.3(1, 0) + 0.7(0, 1) = (0.3, 0.7).

This point lies on the line segment connecting v1 and v2, representing a convex combination of those points.

6. Convex Sets

A set C is called convex if, for any two points x, y ∈ C , the entire line segment connecting
them lies entirely within the set. In other words, for all t ∈ [0, 1], the point tx + (1 − t)y
must also lie in C .

Mathematically, this is expressed as:

If x, y ∈ C, then tx + (1 − t)y ∈ C for all t ∈ [0, 1].

Example: Consider the set C of points inside a triangle with vertices at v1 = (1, 0), v2 = (0, 1), v3 = (0, 0). Any convex combination of these points lies within the triangle, hence the triangle is a convex set.

7. Convexity in Linear Programming

The concept of convex combinations and convex sets is fundamental in linear programming. The feasible region of an LP, defined by linear constraints, is always a convex set. This property ensures that local optima are global optima in convex optimization problems, making LPs particularly efficient to solve.

8. Summary

A linear combination allows for arbitrary real coefficients.

An affine combination restricts the sum of coefficients to 1.

A conic combination involves non-negative coefficients.

A convex combination is a conic combination where the coefficients also sum to 1.

A convex set is one where the line segment between any two points in the set lies
entirely within the set.

The theory of convexity is central to optimization problems, particularly in linear programming, where feasible regions are convex, ensuring the efficiency and reliability of solution methods.

Lecture 12: Different Kinds of Convex Sets

1. Introduction to Convex Sets

In this lecture, we will explore different types of convex sets, affine sets, and conic sets
through concrete examples, starting with the case in two dimensions. Understanding these
sets is crucial for visualizing how convexity applies to optimization problems in linear
programming. Convex sets are important because they form the feasible regions for
optimization problems, and convex combinations of points in these sets are also contained
within the set.

We will define each of these sets and provide visual examples in 2D. Later, we will generalize
the concepts to higher dimensions, which is often necessary in optimization problems.

2. Convex Sets

A set C ⊂ Rn is called convex if, for any two points x, y ∈ C , the entire line segment
connecting these two points lies entirely within the set. This is mathematically defined as:

∀x, y ∈ C, ∀t ∈ [0, 1], tx + (1 − t)y ∈ C.

This property implies that if you pick any two points in the set, the points between them,
formed by convex combinations, will also belong to the set.

Example in 2D:

Consider the set C defined by a disk in R2 with center (0, 0) and radius 1. This disk is a
convex set because any line segment between two points inside the disk will remain inside
the disk.

In mathematical terms, the disk can be described as:

C = {(x, y) ∈ R2 : x² + y² ≤ 1}.

For any two points x = (x1, y1) and y = (x2, y2) inside the disk, and any t ∈ [0, 1], the point tx + (1 − t)y will also lie inside the disk.

Example in 3D:

A sphere in R3 , like the disk in 2D, is a convex set. If C is the set of points inside a sphere of
radius r centered at the origin, i.e.,

C = {(x, y, z) ∈ R3 : x² + y² + z² ≤ r²},

then the set of all points inside the sphere forms a convex set. Any two points inside the
sphere have their connecting line segment entirely inside the sphere.

3. Affine Sets

An affine set is a set that can be expressed as an affine combination of points. More
formally, a set A is affine if for any two points x, y ∈ A and any scalar t ∈ R, the point
tx + (1 − t)y lies in A.
Every affine set is in fact convex: the line segment between two points in an affine set always lies within the set, since affine combinations include all convex combinations. Affine sets are more rigid, however, because they must contain the entire line through any two of their points, not just the segment.

Example in 2D:

Consider the set of points forming a line in R2. A line is an affine set because any point on the line can be expressed as an affine combination of two points on the line. For example, the set L defined by the equation:

L = {(x, y) : y = 2x + 1}

is affine. The line is also convex, and any affine combination of points on the line will also lie on the line.

Example in 3D:

The set of points forming a plane in R3 is affine. For example, the set P defined by the
equation:

P = {(x, y, z) : 2x + 3y − z = 5}

is affine. Any affine combination of points on this plane will also lie on the plane.

4. Conic Sets

A conic set is a set that is closed under non-negative scalar multiplication. In other words,
if x ∈ C and α ≥ 0, then αx ∈ C . This set is important because conic combinations, or
sums of non-negative multiples of vectors, describe the geometry of cones.

Example in 2D:

Consider the set C formed by the positive orthant in R2 , which includes all points with non-
negative coordinates. This set can be described as:

C = {(x, y) ∈ R2 : x ≥ 0, y ≥ 0}.

This is a conic set because any point in the positive orthant can be scaled by a non-negative
scalar and still remain in the set.

Example in 3D:

In R3 , the positive octant is the set of all points with non-negative coordinates:

C = {(x, y, z) ∈ R3 : x ≥ 0, y ≥ 0, z ≥ 0}.

This set is conic because any point within the positive octant, when scaled by a non-negative
scalar, will remain within the octant.

5. Visualizing Convex, Affine, and Conic Sets

In 2D, convex sets can be easily visualized. For example:

Convex Set: A circle or ellipse in R2 .

Affine Set: A straight line or a flat plane.

Conic Set: The positive quadrant of the plane or a cone.

These sets can be extended into higher dimensions, although visualization becomes
increasingly difficult beyond two or three dimensions. However, the principles remain the
same:

Convexity ensures that the entire line segment between any two points within the set
lies inside the set.

Affine sets are formed by affine combinations of points and allow for geometric
representations like lines and planes.

Conic sets include points that can be scaled by non-negative coefficients.

6. Conclusion

Convex sets are central to optimization problems because they ensure that any linear
combination of points inside the set will remain inside the set, allowing efficient
optimization.

Affine sets include linear structures like lines and planes, and they provide important
geometric properties in higher-dimensional spaces.

Conic sets allow scaling by non-negative factors, which is important in many areas of
optimization and linear programming.

Understanding these basic geometric concepts is essential for working with linear programs
and convex optimization problems. They form the foundation for studying more advanced
topics in optimization theory.

Lecture 13: Feasible Region of Linear Programming

1. Introduction to the Feasible Region

The feasible region of a linear programming (LP) problem is the set of all points that satisfy
the system of constraints. These constraints are typically expressed as linear equalities and
inequalities, and the feasible region represents the space of all possible solutions that
satisfy these conditions. The geometric structure of the feasible region is crucial in
understanding the behavior of the linear program and the optimization process.

In this lecture, we will explore the geometric properties of the feasible region, which is
generally described by a set of linear inequalities and equations. This region can be
represented by polygons (in two dimensions) or polytopes (in higher dimensions), and we
will examine their characteristics, including how they are formed, their dimensional
properties, and the significance of their vertices in the context of optimization.

2. Defining the Feasible Region

Consider a standard linear programming problem in the form:

Maximize cT x

subject to:

Ax ≤ b,

x ≥ 0,

where:

x ∈ Rn is the vector of decision variables,


A ∈ Rm×n is the matrix of coefficients for the inequalities,
b ∈ Rm is the vector of constants in the inequalities,
c ∈ Rn is the vector of coefficients for the objective function.

The feasible region F is the set of all points x ∈ Rn that satisfy the system of constraints:

F = {x ∈ Rn : Ax ≤ b, x ≥ 0}.

This region is defined by a set of linear inequalities and the non-negativity constraints.

3. Geometric Interpretation of Constraints

Each constraint in the linear programming problem can be interpreted geometrically:

A linear inequality such as a1x1 + a2x2 ≤ b represents a half-space in R2 (or a half-space in higher dimensions for more variables).

The non-negativity constraints xi ≥ 0 represent the portion of the space where all variables are non-negative, i.e., the first orthant in Rn.

The feasible region is the intersection of all these half-spaces (and orthants), forming a
region where all constraints are satisfied simultaneously.

4. Polygons and Polytopes

The feasible region in a linear program often forms a polygon (in two dimensions) or a
polytope (in higher dimensions). A polytope is a geometric object with flat sides, defined as
the convex hull of a finite set of points (vertices), and it can be described by a system of linear
inequalities.

In R2, the feasible region is often a polygon. A polygon is a closed, bounded region defined by a finite number of straight edges (linear inequalities).

In Rn (with n ≥ 3), the feasible region is a polytope, a generalization of a polygon to higher dimensions.

Examples:

1. In R2: Consider a system of linear inequalities:

x1 + x2 ≤ 4,  x1 ≥ 0,  x2 ≥ 0.

The feasible region is a triangle with vertices at (0, 0), (4, 0), and (0, 4). This is a polygon, and the intersection of the inequalities forms a bounded region.

2. In R3 : Consider the system:

x1 + x2 + x3 ≤ 6,  x1 ≥ 0,  x2 ≥ 0,  x3 ≥ 0.

The feasible region is a tetrahedron (a 3D polytope) with vertices at (0, 0, 0), (6, 0, 0),
(0, 6, 0), and (0, 0, 6).

5. Properties of Polygons and Polytopes

The geometric properties of the feasible region are important in understanding how to solve
linear programs efficiently. These properties include:

Vertices: The points at which the edges or faces of the feasible region meet. In linear
programming, the optimal solution (if one exists) is always found at one of these
vertices.

Edges: In two dimensions, the boundaries of the feasible region are line segments. In
higher dimensions, they become faces and facets, which are generalizations of edges.

Convexity: The feasible region is always convex, meaning that for any two points in the
feasible region, the line segment connecting them lies entirely within the region. This

property is crucial because linear programming problems can be solved by searching for
the optimal solution along the boundary of the feasible region, which is convex.

6. Understanding the Geometry in Higher Dimensions

In higher dimensions (Rn , where n ≥ 3), the feasible region is a polytope. A polytope is a
geometric object that generalizes the concept of a polygon (in 2D) and a polyhedron (in 3D)
to any number of dimensions. A polytope is characterized by:

A finite number of vertices.

A finite number of edges (in higher dimensions, faces, facets, etc.).

The set of linear inequalities that define it, which correspond to the hyperplanes that
define the boundaries of the polytope.

For example, in R3, the feasible region might be a convex polyhedron, which could have polygonal faces and edges. In R4, the feasible region can be a 4-dimensional polytope, which is difficult to visualize directly.

7. Role of the Feasible Region in Linear Programming

The feasible region is central to the solution of a linear program:

Optimization: The objective function cT x is maximized or minimized over the feasible region. Linear programming algorithms (such as the Simplex method or interior-point methods) explore the boundary of this region to find the optimal solution.

Vertices and Optimality: Since the feasible region is convex, if an optimal solution exists,
it will be located at one of the vertices of the polytope. This property allows for efficient
solution algorithms by focusing on the vertices rather than exploring the entire region.
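A short numerical illustration of this vertex property, using SciPy's general-purpose LP solver with an illustrative objective over the triangle from Section 4: the solver's optimum lands exactly on a vertex.

```python
from scipy.optimize import linprog

# Maximize 3x1 + 5x2 over the triangle x1 + x2 <= 4, x >= 0
# (an illustrative objective).  linprog minimizes, so negate c.
res = linprog(c=[-3.0, -5.0],
              A_ub=[[1.0, 1.0]], b_ub=[4.0],
              bounds=[(0, None), (0, None)])

print(res.x)     # [0. 4.] -- the optimum sits at the vertex (0, 4)
print(-res.fun)  # 20.0
```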

8. Conclusion

The feasible region of an LP is defined by a system of linear inequalities and is always a
convex set.

In two dimensions, the feasible region is often a polygon, and in higher dimensions, it is
a polytope.

The feasible region is bounded by linear constraints, and its properties (such as
convexity, vertices, edges, and facets) are crucial for solving the optimization problem.

The vertices of the feasible region play a key role in finding the optimal solution to the
linear programming problem.

By understanding the geometric structure of the feasible region, we gain insight into the
nature of linear programming problems and can apply efficient algorithms to solve them.

Lecture 14: Proof of Weyl's Theorem

1. Introduction to Weyl's Theorem

Weyl’s Theorem establishes an important equivalence between two different representations of convex cones. Specifically, it demonstrates that a convex cone can be described both:

By a set of combinations: A cone is the set of all non-negative linear combinations of a finite set of vectors (a generating set).

By a set of constraints: A cone is also the set of points satisfying a system of linear
inequalities that define its boundary and structure.

This theorem shows that both representations are equivalent in the sense that they describe
the same geometric object. Weyl's Theorem is particularly important in the study of convex
geometry and linear programming, as it allows us to switch between different
characterizations of convex sets and simplifies the process of analyzing and solving
optimization problems.

We will now provide a proof of Weyl's Theorem and examine the implications of these
representations for convex cones.

2. Convex Cones: Basic Definitions

A cone C ⊂ Rn is a subset of Rn such that:

1. If x ∈ C , then αx ∈ C for all α ≥ 0. This means that the set is closed under non-
negative scalar multiplication.

2. A convex cone is a cone that is also convex. That is, for any two points x, y ∈ C , the
entire line segment joining them lies within the cone. Mathematically, for t ∈ [0, 1], the
point tx + (1 − t)y ∈ C .

A polyhedral cone is a cone that can be described by a finite set of linear inequalities. It is of
particular interest in linear programming.

We now describe two ways to represent a convex cone:

1. Combinatorial Representation: A cone C can be expressed as the set of all non-negative linear combinations of a finite set of vectors. Specifically, if C is generated by vectors v1, …, vk, then:

C = {α1 v1 + ⋯ + αk vk : αi ≥ 0 for all i}.

The finite set {v1, …, vk} is called a generating set of the cone.

2. Constraint Representation: A cone can also be described by a set of linear inequalities, such that it consists of all points x satisfying:

Ax ≥ 0,

where A ∈ Rm×n is a matrix of constraints that define the cone’s boundaries.

3. Weyl’s Theorem: Statement

Weyl’s Theorem states that these two representations of a convex cone are equivalent.
Specifically, it asserts:

Theorem (Weyl's Theorem): For a convex cone C ⊂ Rn, the following two statements are equivalent:

1. C can be described as the set of all non-negative linear combinations of a finite set of vectors v1, …, vk, i.e.,

C = {α1 v1 + ⋯ + αk vk : αi ≥ 0 for all i}.

2. There exists a matrix A ∈ Rm×n such that C is the set of solutions to the system of inequalities Ax ≥ 0.

This equivalence allows us to move from a description of a cone via its generators
(combinations of vectors) to a description via linear inequalities (constraints), and vice versa.
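For concreteness, here is a small worked example (the generators are an illustrative choice). Consider the cone in R2 generated by v1 = (1, 0) and v2 = (1, 1). Every point of the cone has the form

α1 (1, 0) + α2 (1, 1) = (α1 + α2, α2) with α1, α2 ≥ 0,

so x2 = α2 ≥ 0 and x1 − x2 = α1 ≥ 0; conversely, any point satisfying these two inequalities arises this way. Hence the same cone admits both descriptions:

C = {α1 v1 + α2 v2 : α1, α2 ≥ 0} = {x ∈ R2 : x2 ≥ 0, x1 − x2 ≥ 0}.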

4. Proof of Weyl’s Theorem

We will prove the equivalence between the two representations of a convex cone in two
steps.

Step 1: Combinatorial Representation Implies Constraint Representation

Assume that C is a convex cone generated by the vectors v1, …, vk, that is, C = {α1 v1 + ⋯ + αk vk : αi ≥ 0}. We aim to show that this cone can be represented by a system of linear inequalities.

Consider the set of pairs (x, α) ∈ Rn × Rk satisfying

x = α1 v1 + ⋯ + αk vk, α ≥ 0.

This set is described by finitely many linear equations and inequalities (each equation can be split into two inequalities), and C is exactly its projection onto the x-coordinates. By Fourier-Motzkin elimination, projecting a set described by finitely many linear inequalities onto a subset of its coordinates again yields a set described by finitely many linear inequalities. Eliminating the auxiliary variables α1, …, αk one at a time therefore produces a matrix A such that

C = {x ∈ Rn : Ax ≥ 0}.

Since all the defining inequalities are homogeneous, the inequalities produced by the elimination are homogeneous as well, so the right-hand sides are zero, as required for a cone.
Step 2: Constraint Representation Implies Combinatorial Representation

Next, assume that C is described by the system of inequalities Ax ≥ 0, where A ∈ Rm×n. We aim to show that C can be described as the set of non-negative linear combinations of a finite set of vectors.

The solution set of the inequalities Ax ≥ 0 is a polyhedral cone. By the finite-basis theorem for polyhedral cones (this converse direction is classically attributed to Minkowski), every polyhedral cone is finitely generated: it is the set of non-negative combinations of finitely many vectors, which can be chosen as representatives of its extreme rays, together with a spanning set of its lineality space when the cone is not pointed. One way to prove this is to apply Step 1 to the polar (dual) cone and use the fact that taking the polar twice returns the original closed convex cone.

Thus, the cone C can be written as:

C = {α1 v1 + ⋯ + αk vk : αi ≥ 0},

where v1, …, vk are generating vectors of the polyhedral cone.

5. Conclusion

Weyl's Theorem provides a powerful result connecting two different representations of convex cones. It states that a convex cone can be described either by a set of generators (as
non-negative linear combinations of a finite set of vectors) or by a system of linear
inequalities (constraints). This equivalence simplifies the study of convex geometry and is
central to many optimization techniques, such as linear programming, where convex cones
often define the feasible regions of optimization problems.

Lecture 15: Definition of Convex Functions

1. Introduction to Convex Functions

In the study of optimization, after understanding the structure of the feasible region of a
linear program, we shift our focus to the objective function. In linear programming, the
objective function is typically a linear function, which is both convex and concave. However,
in more general optimization problems, the objective function can be non-linear. To handle
these cases, we need to introduce the concept of convex functions, which play a central role
in optimization theory.

Convex functions are widely studied in optimization due to their desirable properties that
allow for efficient solution methods. The key property of convex functions is that local
minima are also global minima, which is essential for ensuring the optimality of solutions.
This property makes convex optimization problems more tractable and ensures that
algorithms like gradient descent and interior-point methods converge to the global optimum
under certain conditions.

In this lecture, we will formally define convex functions, explore their properties, and discuss
their significance in optimization.

2. Definition of Convex Function

A function f : Rn → R is convex if, for any two points x, y ∈ Rn and for any t ∈ [0, 1], the
following inequality holds:

f (tx + (1 − t)y) ≤ tf (x) + (1 − t)f (y).

This condition means that the function value at any point on the line segment joining x and
y is less than or equal to the weighted average of the function values at x and y.
Geometrically, this means that the graph of a convex function lies below the straight line
joining any two points on the graph, i.e., the function is "bent" upwards or "concave up."
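A quick numerical spot-check of this inequality for the convex function f(x) = x² (a minimal sketch; the sampling scheme is an arbitrary choice):

```python
import numpy as np

# Spot-check f(t*x + (1-t)*y) <= t*f(x) + (1-t)*f(y) for f(x) = x**2
# at 1000 randomly sampled triples (x, y, t).
f = lambda x: x ** 2
rng = np.random.default_rng(0)

ok = all(
    f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-12
    for x, y, t in zip(rng.uniform(-10, 10, 1000),
                       rng.uniform(-10, 10, 1000),
                       rng.uniform(0, 1, 1000))
)
print(ok)  # True: the inequality holds at every sampled triple
```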

2.1 Convex Set vs Convex Function

It is important to distinguish between a convex set and a convex function:

A convex set is a set of points such that for any two points in the set, the entire line
segment joining them lies entirely within the set.

A convex function is a function that satisfies the convexity condition defined above; on a convex domain, this ensures that any local minimum is a global minimum.

3. Convexity in Optimization

Convex functions are important in optimization because they guarantee that any local
minimum is also the global minimum. This is a crucial property for optimization algorithms,
as it ensures that methods such as gradient descent, Newton’s method, and other iterative
techniques will converge to the optimal solution, provided the function is convex and the
problem is properly formulated.

Consider the following example in the context of linear programming:

The objective function in a linear program is of the form cT x, which is a linear function.
Since every linear function is both convex and concave, linear programming problems are a special case of convex optimization problems.

However, many practical optimization problems involve nonlinear convex functions, and
understanding the properties of convex functions is necessary for solving these problems.
The analysis of convexity allows us to extend optimization techniques to more general
problem types.

4. Examples of Convex Functions

Linear Functions: Any linear function of the form f (x) = cT x is convex (and also
concave). This follows directly from the definition of convexity because a linear function
satisfies the convexity inequality with equality.

Quadratic Functions: A function of the form f(x) = xT Ax + bT x + c, where A is a positive semidefinite matrix, is convex. If A is positive definite, the function is strictly convex.

Exponential Functions: The function f(x) = ex is convex. This can be verified by checking that its second derivative is non-negative for all x.

Logarithmic Functions: The function f(x) = log(x) is concave for x > 0, since its second derivative −1/x² is negative there; equivalently, f(x) = −log(x) is convex for x > 0.

Norm Functions: The p-norm f(x) = ∥x∥p is convex for p ≥ 1.
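For the quadratic case, convexity can be tested numerically by checking that the symmetric matrix A has no negative eigenvalues; a minimal NumPy sketch with an illustrative matrix:

```python
import numpy as np

# f(x) = x^T A x + b^T x + c is convex iff the symmetric matrix A is
# positive semidefinite, i.e. all of its eigenvalues are >= 0.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # illustrative symmetric matrix

eigvals = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix
print(eigvals)                    # [1. 3.]
print(np.all(eigvals >= 0))       # True: convex (here even strictly convex)
```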

5. Properties of Convex Functions

Several important properties of convex functions will be crucial for optimization:

5.1 First-Order Condition for Convexity

A differentiable function f : Rn → R is convex if and only if, for any two points x, y ∈ Rn, the following inequality holds:

f(y) ≥ f(x) + ∇f(x)T (y − x).

This inequality means that the tangent line at any point x lies below the graph of the
function. It is a first-order condition for convexity and is often used in optimization
algorithms.

5.2 Second-Order Condition for Convexity

A twice-differentiable function f : Rn → R is convex if and only if its Hessian matrix H(f)(x) is positive semidefinite for all x ∈ Rn. That is:

zT H(f)(x) z ≥ 0 for all z ∈ Rn.

If the Hessian is positive definite, the function is strictly convex.

5.3 Convexity and Jensen’s Inequality

Jensen’s inequality is a fundamental result that stems from the definition of convexity. It
states that if f is a convex function and X is a random variable, then:

f (E[X]) ≤ E[f (X)].

This inequality is widely used in various fields, including economics, statistics, and machine
learning.
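A small Monte-Carlo illustration of Jensen's inequality (the distribution and sample size are arbitrary choices):

```python
import numpy as np

# Monte-Carlo illustration of f(E[X]) <= E[f(X)] for the convex
# function f(x) = exp(x) and X ~ Uniform(0, 1).
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, 100_000)

print(np.exp(X.mean()))    # f(E[X]) ~ e^0.5 ~ 1.6487
print(np.mean(np.exp(X)))  # E[f(X)] ~ e - 1 ~ 1.7183, the larger of the two
```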

6. Importance of Convex Functions in Optimization

Convex functions play a central role in optimization because of the following reasons:

Global Minima: For a convex function on a convex domain, any local minimum is also a global minimum; if the function is strictly convex, the minimizer (when one exists) is unique.

Efficient Optimization Algorithms: Convexity enables the development of efficient algorithms for solving optimization problems. For example, gradient-based methods and interior-point methods are designed to exploit the properties of convex functions to find optimal solutions efficiently.

Applications: Convex functions appear in many real-world optimization problems, including in areas such as machine learning (e.g., convex loss functions), economics (e.g., cost functions), and finance (e.g., portfolio optimization).

7. Conclusion

In this lecture, we introduced the concept of convex functions, which are a fundamental
class of functions in optimization. A convex function satisfies a specific inequality that ensures any local minimum is also a global minimum. We explored several examples of
convex functions, including linear, quadratic, and exponential functions, and discussed the
key properties that characterize convex functions, such as first- and second-order conditions.
Convexity plays a critical role in optimization, enabling the use of powerful algorithms that
guarantee global optimality under certain conditions.

Lecture 16: Properties of Convex Functions and Examples

1. Introduction to the Properties of Convex Functions

Convex functions possess several remarkable properties that make them especially suitable
for optimization problems. These properties ensure that optimization problems involving
convex functions are well-behaved and can be solved efficiently. In this lecture, we explore
these properties in depth, with particular attention to how they relate to linear programming
(LP). Since linear functions are both convex and concave, many important results about the
objective function in LP can be derived from these properties. Additionally, we will discuss
how these properties provide the theoretical foundation for the simplex algorithm, one of
the most widely used methods for solving linear programming problems.

2. Key Properties of Convex Functions

2.1 Convexity and Local Minima

A fundamental property of convex functions is that every local minimum is a global minimum. This is a key characteristic that distinguishes convex functions from non-convex ones. Specifically, if f : Rn → R is convex and x∗ is a local minimum, then:

f(x∗) ≤ f(x) for all x ∈ Rn.

This property implies that convex functions do not have multiple local minima, which
simplifies the optimization process. For linear programming, where the objective function is
linear (and thus convex), the goal is to find the optimal solution within a feasible region that
is also convex.

2.2 Convexity and Continuity of the Objective Function

Convex functions are typically continuous and, under certain conditions, differentiable. The
continuity of convex functions ensures that there are no discontinuities or jumps in the function, which is crucial for optimization. If the convex function is also differentiable, we can
apply calculus-based optimization methods like gradient descent to find the optimal
solution.

In linear programming, the objective function is linear and hence both continuous and
differentiable. This makes LP problems well-behaved and suitable for solution by gradient-
based methods or more specialized methods like the simplex algorithm.

2.3 Affine Functions and Their Role

Affine functions, which are of the form f (x) = cT x + b, are both convex and concave. This
duality arises because affine functions are linear transformations and do not exhibit any
curvature. The convexity of the objective function in linear programming follows from the
fact that it is affine. Therefore, when we optimize a linear function subject to linear
constraints, we are essentially working with a convex optimization problem.

In the context of the simplex algorithm, which is an iterative method for solving LP problems,
the linear objective function ensures that the algorithm progresses towards the optimal
solution through successive vertices of the feasible polytope.

2.4 Jensen’s Inequality

Jensen’s inequality is a direct consequence of the definition of convexity and provides a way
to understand the behavior of convex functions when applied to random variables or
expectations. For a convex function f and a random variable X , Jensen’s inequality states:

f (E[X]) ≤ E[f (X)],

where E[X] is the expected value of X . This inequality is important in many optimization
contexts, particularly in stochastic optimization problems, where the objective function
involves an expectation over random variables.

In linear programming, Jensen’s inequality does not have direct applicability, but it provides
insight into the behavior of convex functions in more complex optimization problems where
uncertainty or randomness is involved.

2.5 First-Order Condition for Convexity

For a differentiable function f: Rn → R, the first-order condition for convexity states that
the following inequality must hold for any two points x, y ∈ Rn :

f (y) ≥ f (x) + ∇f (x)T (y − x),

where ∇f (x) denotes the gradient of f at x. This inequality means that the function at any
point lies above the tangent hyperplane to the function at that point. It is an essential tool in optimization, particularly for methods like gradient descent, which use the gradient to
iteratively approach the minimum of the function.
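A minimal numerical spot-check of the first-order condition for the convex function f(x) = ∥x∥², whose gradient is 2x (the sampling is illustrative):

```python
import numpy as np

# Spot-check f(y) >= f(x) + grad f(x)^T (y - x) for the convex
# function f(x) = ||x||^2, whose gradient is 2x.
f = lambda x: float(x @ x)
grad = lambda x: 2 * x

rng = np.random.default_rng(2)
ok = True
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    ok &= f(y) >= f(x) + grad(x) @ (y - x) - 1e-9
print(ok)  # True: the tangent plane never rises above the graph
```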

2.6 Second-Order Condition for Convexity

For a twice-differentiable function f : Rn → R, the second-order condition for convexity involves the Hessian matrix H(f)(x). A function is convex if and only if its Hessian matrix is positive semidefinite at every point in its domain:

zT H(f )(x)z ≥ 0 ∀z ∈ Rn .

This condition ensures that the function is "curved upwards" and does not exhibit any local
maxima. If the Hessian is positive definite, the function is strictly convex, which means it has
a unique global minimum. For linear functions, the Hessian is zero, indicating that the
function is affine (neither strictly convex nor concave).

3. Examples of Convex Functions

3.1 Linear Functions

As previously noted, any linear function f (x) = cT x + b is both convex and concave. In the
case of linear programming, the objective function is linear, and the feasible region (defined
by linear constraints) is also convex. The linearity of the objective function ensures that the
optimization problem is convex, and the Simplex algorithm can be used to efficiently find the
optimal solution by navigating the vertices of the feasible polytope.

3.2 Quadratic Functions

A quadratic function f(x) = xT Ax + bT x + c is convex if the matrix A is positive semidefinite. If A is positive definite, the function is strictly convex. Quadratic functions arise in many optimization problems, including quadratic programming, which generalizes linear programming by allowing quadratic terms in the objective function.

3.3 Exponential and Logarithmic Functions

The exponential function f(x) = ex is convex because its second derivative is always positive, indicating that the function is "curved upwards" at every point.

The logarithmic function f(x) = log(x), for x > 0, is concave; equivalently, −log(x) is convex. The convex function −log(x) is used in various optimization contexts, particularly in barrier methods and maximum likelihood estimation.

4. Connection to the Simplex Algorithm

The properties of convex functions, particularly the fact that linear functions are convex, are
integral to understanding why the simplex algorithm is effective for solving linear
programming problems. The simplex algorithm works by iterating over the vertices of the
feasible polytope defined by the linear constraints. Because the objective function in linear
programming is linear (and thus convex), the algorithm moves from one vertex to another,
always improving the objective value until it reaches the optimal vertex.

Convexity guarantees that a vertex at which no adjacent vertex improves the objective is globally optimal, so the simplex algorithm can stop as soon as no neighboring vertex yields a better objective value. (Ruling out cycling under degeneracy requires an additional pivoting rule, discussed in a later lecture.)

Linear constraints define a convex feasible region, and the algorithm can efficiently traverse the vertices of this region.

Despite its name, the simplex algorithm does not generally operate on a simplex-shaped region in practice, but its ability to find the optimal solution in a finite number of steps, under the assumption of no degeneracy, rests on the convexity of the feasible region and the linearity of the objective function.

5. Conclusion

In this lecture, we examined the key properties of convex functions, which play a central role
in optimization. We discussed the fundamental characteristics of convex functions, including
the relationship between local and global minima, as well as important conditions for
convexity, such as the first- and second-order conditions. These properties are essential for
understanding the theoretical foundation of linear programming and optimization
algorithms like the simplex algorithm, which is built on the convexity of linear functions and
the geometry of convex polytopes. By leveraging these properties, we can effectively solve
optimization problems with convex objective functions.

Lecture 17: Basic Feasible Solution

1. Introduction to Basic Feasible Solutions

In linear programming, an optimal solution (when one exists) can be found at one of the vertices of the feasible region. This fact is central to methods like the simplex algorithm, which iterates over the vertices of the feasible polytope to find the optimal solution. However, before we can apply such
algorithms, we need to understand the structure of the feasible region and how we can
define its vertices.

In this lecture, we will formally introduce the concept of Basic Feasible Solutions (BFS),
which correspond to the vertices of the feasible region of a linear program. We will also
discuss how these solutions arise from the system of linear equations that define the
constraints of the linear program.

2. Feasible Region and Constraints

A linear program consists of an objective function to be optimized subject to a set of linear constraints. These constraints define a feasible region, which is a polytope (a convex set) in Rn. The feasible region is formed by the intersection of half-spaces, each corresponding to one of the linear inequalities in the constraints.

The feasible region can be expressed as:

Feasible Region = {x ∈ Rn ∣ Ax ≤ b, x ≥ 0},

where A is the matrix of coefficients, b is the vector of constants, and x is the vector of
decision variables. The feasible region is typically a convex polytope, and its vertices
represent potential optimal solutions to the linear program.

3. Definition of Basic Feasible Solution

A Basic Feasible Solution (BFS) is defined as a solution that corresponds to a vertex of the feasible region. These vertices are the solutions to the system of linear equations formed by selecting a subset of the constraints that are active at that point (i.e., satisfied with equality).

To define a BFS more precisely:

1. Selecting Active Constraints: Given the system of linear inequalities, select n linearly independent constraints (where n is the number of variables in the linear program). These n constraints are called active constraints, as they are satisfied with equality at the solution.

2. Solving the System: Solve the system of n active constraints, treated as equalities. If the resulting point also satisfies all the remaining constraints, it is a BFS.

Formally, a Basic Feasible Solution is a point x∗ ∈ Rn at which n linearly independent constraints hold with equality and all the constraints of the program (including x∗ ≥ 0) are satisfied. The solution is called basic because the n active, linearly independent constraints uniquely determine it, and feasible because it lies within the feasible region defined by all the constraints.

4. Geometric Interpretation of BFS

Geometrically, a BFS corresponds to a vertex of the feasible region. Each vertex of a convex polytope is determined by the intersection of n linearly independent hyperplanes (the active constraints). Since the feasible region is convex, an optimal solution to a linear program can always be found at one of these vertices.

In R2, the feasible region is a polygon, and the BFSs correspond to the vertices of this polygon.

In R3, the feasible region is a polyhedron, and the BFSs correspond to the vertices of this polyhedron.

For example, if the feasible region is a polytope in Rn, each BFS corresponds to a point where n of the constraint hyperplanes intersect.

5. BFS and Linear Systems

Given a linear program:

maximize cT x, subject to Ax ≤ b, x ≥ 0,

a BFS corresponds to a solution of the system of equations formed by the active constraints. Suppose we have selected n constraints that are active at a solution x∗. These constraints can be written as:

A_active x∗ = b_active,

where A_active is the matrix of coefficients corresponding to the selected active constraints, and b_active is the vector of corresponding constants.

To ensure that this solution is feasible, we must also check that x∗ ≥ 0 and that the remaining inequality constraints hold, so that the point lies within the feasible region.

6. Examples of Basic Feasible Solutions

Consider a simple linear program with two decision variables x1 and x2, and the constraints:

x1 + x2 ≤ 5, x1 ≥ 0, x2 ≥ 0.

The feasible region is a triangle in R2, bounded by the axes and the line x1 + x2 = 5.

To find the BFS:

Select the active constraints at each vertex:

1. At (x1, x2) = (0, 0), the active constraints are x1 = 0 and x2 = 0.

2. At (x1, x2) = (5, 0), the active constraints are x1 + x2 = 5 and x2 = 0.

3. At (x1, x2) = (0, 5), the active constraints are x1 + x2 = 5 and x1 = 0.

Each of these vertices corresponds to a Basic Feasible Solution: the active constraints determine the point uniquely, and the point satisfies all the constraints.
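Computationally, the same three BFSs can be enumerated from the standard equality form x1 + x2 + s = 5 (adding a slack variable s): choose a set of basic columns, solve for the basic variables, and keep the non-negative solutions. A minimal sketch, assuming this one-constraint example:

```python
import numpy as np
from itertools import combinations

# Standard equality form of the example: x1 + x2 + s = 5, all variables >= 0.
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([5.0])
m, n = A.shape

for basis in combinations(range(n), m):   # choose m = 1 basic column
    B = A[:, list(basis)]
    if abs(np.linalg.det(B)) < 1e-12:
        continue                          # not an invertible basis
    xB = np.linalg.solve(B, b)            # values of the basic variables
    if np.all(xB >= -1e-9):               # keep only feasible solutions
        x = np.zeros(n)
        x[list(basis)] = xB
        print(basis, x)                   # one BFS per line: (5,0,0), (0,5,0), (0,0,5)
```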

7. Uniqueness of Basic Feasible Solutions

In general, a linear program may have multiple BFSs. The key fact is that the set of BFSs is finite and, whenever the program has an optimal solution, at least one BFS is optimal. The simplex algorithm leverages this fact by systematically exploring the BFSs to find the one that maximizes or minimizes the objective function.

While a BFS corresponds to a vertex of the feasible region, multiple BFSs can exist at
different vertices of the polytope. The key point is that each BFS is a candidate for the optimal solution, and the simplex algorithm moves between these vertices to find the
optimal solution.

8. Conclusion

In this lecture, we defined Basic Feasible Solutions and explored their role in linear
programming. These solutions correspond to the vertices of the feasible region, and they are
fundamental to the optimization process in linear programs. By understanding the structure
of the feasible region and how BFSs are formed, we can better understand optimization
algorithms like the simplex method, which relies on exploring these BFSs to find the optimal
solution. Basic feasible solutions serve as the cornerstone for solving linear programs and
form the basis of many computational techniques in optimization.

Lecture 18: BFS and Vertices

1. Introduction to BFS and Vertices

In the previous lecture, we introduced Basic Feasible Solutions (BFSs) and established that
they correspond to vertices of the feasible region of a linear program. This is a crucial
concept because many optimization algorithms, particularly the simplex method, explore
these vertices in search of the optimal solution.

In this lecture, we will investigate the relationship between Basic Feasible Solutions (BFSs)
and the vertices of the feasible region. Specifically, we will discuss how every vertex
corresponds to a BFS, and conversely, how each BFS defines a vertex. Additionally, we will
explore the possibility that different BFSs may correspond to the same vertex.

2. Vertices and Basic Feasible Solutions

Consider a linear program defined by the following system of inequalities:

maximize cT x, subject to Ax ≤ b, x ≥ 0.

The feasible region of this linear program is the set of all points x ∈ Rn that satisfy the
constraints. This region is a convex polytope (or a polyhedron in higher dimensions).

Vertices of the feasible region are points where several hyperplanes (the boundaries of the
constraints) intersect. Each of these intersection points corresponds to a Basic Feasible
Solution (BFS). A BFS is defined as a solution to the system of active constraints, where a
subset of constraints are satisfied as equalities. These active constraints uniquely determine
a point in the feasible region.

3. BFS as Vertices

From the previous lecture, we know that a Basic Feasible Solution corresponds to a solution of a system of linearly independent active constraints. Each set of n linearly independent active constraints, where n is the number of variables, determines a unique point; when this point is feasible, it is a vertex of the feasible region.

Conversely, every vertex of the feasible region is associated with a BFS. The geometrical interpretation is that the feasible region can be viewed as a convex polytope in Rn, where each vertex is formed by the intersection of n linearly independent constraints. These constraints give rise to the BFS corresponding to that vertex.

Thus, we can conclude that:

Every vertex of the feasible region is a BFS.

Every BFS corresponds to a vertex of the feasible region.

This equivalence is foundational in linear programming because the search for the optimal
solution can be reduced to examining the vertices of the feasible region.

4. Multiple BFSs Corresponding to the Same Vertex

While each vertex corresponds to a BFS, it is important to note that different BFSs can correspond to the same vertex. This can occur when different subsets of constraints define the same point in the feasible region. In other words, while a vertex is determined by some set of n linearly independent active constraints, different combinations of active constraints may describe the same vertex.

For example, consider the following scenario:

A vertex of the feasible region could be formed by the intersection of three active constraints: x1 = 2, x2 = 3, and x3 = 1.

Another BFS might involve the intersection of a different set of three constraints, say x1 + x2 = 5, x2 = 3, and x3 = 1, which still leads to the same point (x1, x2, x3) = (2, 3, 1).

In this case, both sets of constraints describe the same vertex, but they correspond to
different BFSs. This phenomenon occurs because there are often multiple ways to select the
active constraints that uniquely define a given vertex.

5. Geometrical Insight

To illustrate this equivalence further, let’s consider a simple linear program in R2 with two decision variables x1 and x2, and the constraints:

x1 + x2 ≤ 5, x1 ≥ 0, x2 ≥ 0.

The feasible region is a triangle, and the three vertices of this triangle correspond to three
BFSs. These BFSs correspond to:

1. x1 = 0, x2 = 0 (vertex at the origin).

2. x1 = 5, x2 = 0 (vertex on the x1-axis).

3. x1 = 0, x2 = 5 (vertex on the x2-axis).

In this case, each vertex corresponds to a unique BFS. However, in higher-dimensional problems, particularly when the feasible region is more complex, different BFSs may correspond to the same vertex. This equivalence between BFSs and vertices is crucial for understanding how optimization algorithms, like the simplex method, explore the feasible region.

6. Mathematical Formulation of Equivalence

Let’s formalize the equivalence between BFSs and vertices:

A vertex is the intersection point of a set of n linearly independent hyperplanes (constraints) that uniquely determine the point in the feasible region.

A BFS is a solution to the system of active constraints at a given point, where these constraints are linearly independent.

Thus, the relationship between BFSs and vertices is as follows:

Each BFS corresponds to a unique point (vertex) in the feasible region, which is the solution to a system of n active, linearly independent constraints.

Different BFSs can correspond to the same vertex if different sets of active constraints describe the same point.

Formally, a vertex is characterized by:

A_active x = b_active, x ≥ 0,

where A_active represents the matrix of the active constraints, and x is the vector of decision variables at the vertex.

7. Conclusion

In this lecture, we have explored the important relationship between Basic Feasible
Solutions (BFSs) and the vertices of the feasible region in a linear program. We established
that:

Every vertex of the feasible region is a BFS.

Every BFS corresponds to a vertex of the feasible region.

Additionally, we discussed the fact that multiple BFSs can correspond to the same vertex,
depending on the choice of active constraints. This understanding is fundamental for
optimization algorithms like the simplex method, which explores the vertices of the feasible
region to find the optimal solution. Understanding the equivalence between BFSs and
vertices allows for efficient traversal of the feasible region during optimization.

Lecture 19: Simplex Algorithm

1. Introduction to the Simplex Algorithm

In this lecture, we will formally introduce the Simplex Algorithm, a widely used algorithm for
solving linear programming problems. The algorithm leverages the fact that the optimal
solution to a linear program lies at a vertex of the feasible region. Since every Basic Feasible
Solution (BFS) corresponds to a vertex of the feasible region, the simplex method iteratively
moves from one BFS to another, improving the objective function at each step until the
optimal solution is found.

We will first review the key ideas behind the Simplex algorithm and then walk through an
example to demonstrate its operation.

2. Key Concepts Behind the Simplex Algorithm

The Simplex algorithm works on the following principles:

Feasible Region and Vertices: The feasible region of a linear program is a convex
polytope, and the optimal solution is found at one of its vertices. Each vertex
corresponds to a BFS, and the Simplex algorithm explores these vertices.

Objective Function: The goal is to optimize the objective function (either maximizing or
minimizing), which is typically a linear function. The Simplex method moves along the
edges of the polytope, from one vertex to another, improving the objective function at
each step.

Pivoting: At each iteration, the Simplex algorithm selects an adjacent BFS (i.e., an
adjacent vertex of the polytope) to move to. This selection is done based on the
coefficients of the objective function, and the algorithm uses a pivot operation to move
between vertices.

Termination: The algorithm terminates when the current BFS cannot be improved
further, meaning that the optimal solution has been found. This occurs when the
objective function cannot be increased (or decreased, in the case of minimization) by
moving to an adjacent vertex.

3. Steps in the Simplex Algorithm

The Simplex algorithm proceeds in the following steps:

1. Initial Basic Feasible Solution (BFS): Start with an initial BFS. If the problem is in
standard form (i.e., all constraints are inequalities, and all variables are non-negative),
this step is straightforward. Otherwise, artificial variables may be introduced to obtain a
feasible starting point (e.g., using the Big-M method or two-phase method).

2. Pivoting:

Select the entering variable: Determine which non-basic variable (a variable not
currently in the BFS) should enter the basis (i.e., which variable should increase to
improve the objective function). This is typically done by examining the coefficients
of the objective function in the tableau.

Select the leaving variable: Once the entering variable is chosen, determine which
basic variable (currently part of the BFS) should leave the basis. This is done by
checking the constraints and finding the variable that will become zero first when
the entering variable increases.

Perform a pivot operation: The pivot operation updates the tableau by replacing
the old basis with the new one. This involves solving a system of linear equations to
update the values of the variables and objective function.

3. Repeat the Process: After performing the pivot, update the BFS and repeat the process
until no further improvements to the objective function can be made.

4. Optimality Check: The algorithm checks the optimality condition: if all the coefficients of
the objective function corresponding to non-basic variables are non-negative (in the case
of maximization), the current BFS is optimal. If not, the algorithm continues to pivot to a
better BFS.

4. Example of Solving a Linear Program Using the Simplex Method

Consider the following linear program:

maximize Z = 3x1 + 2x2
subject to x1 + x2 ≤ 4
2x1 + x2 ≤ 5
x1 ≥ 0, x2 ≥ 0

To apply the Simplex algorithm, we first convert this problem into standard form by introducing slack variables s1 and s2 for the constraints. This gives:

maximize Z = 3x1 + 2x2
subject to x1 + x2 + s1 = 4
2x1 + x2 + s2 = 5
x1 ≥ 0, x2 ≥ 0, s1 ≥ 0, s2 ≥ 0

Now, we set up the initial tableau for the Simplex method. The initial tableau includes the
coefficients of the objective function and the constraints.

Basic Variable | x1 | x2 | s1 | s2 | RHS
s1             |  1 |  1 |  1 |  0 |   4
s2             |  2 |  1 |  0 |  1 |   5
Z              | −3 | −2 |  0 |  0 |   0

4.1 Iteration 1:

Identify the entering variable: We look at the coefficients in the objective function row.
The most negative coefficient is −3 (corresponding to x1 ), so x1 will enter the basis. ​ ​

Identify the leaving variable: We perform the ratio test on the RHS values divided by the corresponding coefficients of the entering variable:

4/1 = 4, 5/2 = 2.5.

Since 2.5 is smaller, s2 will leave the basis.

Perform the pivot: The pivot element is 2 (the coefficient of x1 in the s2-row). We now perform the pivot operation to update the tableau.

After performing the pivot, we get the following updated tableau:

Basic Variable | x1 |   x2 | s1 |   s2 | RHS
s1             |  0 |  0.5 |  1 | −0.5 | 1.5
x1             |  1 |  0.5 |  0 |  0.5 | 2.5
Z              |  0 | −0.5 |  0 |  1.5 | 7.5

4.2 Iteration 2:

Identify the entering variable: The most negative coefficient is −0.5 (corresponding to x2), so x2 will enter the basis.

Identify the leaving variable: We perform the ratio test again:

1.5/0.5 = 3, 2.5/0.5 = 5.

The smallest ratio is 3, so s1 will leave the basis.
Perform the pivot: The pivot element is 0.5 (the coefficient of x2 in the s1-row). After performing the pivot, we obtain the final tableau:

Basic Variable | x1 | x2 | s1 | s2 | RHS
x2             |  0 |  1 |  2 | −1 |   3
x1             |  1 |  0 | −1 |  1 |   1
Z              |  0 |  0 |  1 |  1 |   9

At this point, all coefficients in the objective function row are non-negative, so the algorithm terminates. The optimal solution is:

x1 = 1, x2 = 3, Z = 9.
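As a sanity check on the tableau arithmetic, the same program can be handed to SciPy's off-the-shelf LP solver (a verification sketch, separate from the simplex mechanics above):

```python
from scipy.optimize import linprog

# Maximize 3x1 + 2x2 s.t. x1 + x2 <= 4, 2x1 + x2 <= 5, x >= 0
# (linprog minimizes, so the objective is negated).
res = linprog(c=[-3.0, -2.0],
              A_ub=[[1.0, 1.0], [2.0, 1.0]],
              b_ub=[4.0, 5.0],
              bounds=[(0, None), (0, None)])

print(res.x)     # [1. 3.]
print(-res.fun)  # 9.0 -- matches the optimal value from the tableau
```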

5. Conclusion

In this lecture, we introduced the Simplex Algorithm, which is used to solve linear
programming problems by moving from one BFS to another, improving the objective
function at each step. Through an example, we demonstrated the mechanics of the
algorithm, including how to form the initial tableau, perform the pivot operations, and
identify the entering and leaving variables. The algorithm terminates when no further
improvement in the objective function is possible, and the optimal solution is reached. The
Simplex algorithm is efficient and widely used in practice to solve large-scale linear
programming problems.

Lecture 20: Details of Simplex Algorithm

1. Introduction to the Formalization of Simplex Algorithm

In this lecture, we will formalize the Simplex Algorithm and describe it step-by-step,
highlighting the mathematical formulations and operations that occur at each iteration.
Building on the informal example presented earlier, we will now provide the formal
mechanics of the algorithm using the current Basic Feasible Solution (BFS) and the
corresponding coefficients.

The goal is to provide a detailed understanding of each phase of the algorithm, focusing on
how the BFS evolves and how we update the solution at each step. The simplex algorithm progresses through a series of pivot operations, where each operation involves selecting an
entering variable (which is not in the BFS) and a leaving variable (which is part of the BFS),
and updating the tableau accordingly.

2. The Simplex Tableau

The Simplex method operates using a tableau, which is an organized matrix that represents
the current solution, including the coefficients of the objective function and the constraints.
The structure of the Simplex tableau is as follows:

Basic Variables | x1  | x2  | … | xn  | RHS
Constraint 1    | a11 | a12 | … | a1n | b1
Constraint 2    | a21 | a22 | … | a2n | b2
⋮               | ⋮   | ⋮   | ⋱ | ⋮   | ⋮
Constraint m    | am1 | am2 | … | amn | bm
Z               | −c1 | −c2 | … | −cn | Z0

Where:

Basic Variables: These are the variables that are currently in the BFS.

x₁, x₂, ..., xn: These are the decision variables (the columns of the tableau); the non-basic ones are currently set to zero.

RHS: Right-hand side values (the constants on the right of the equations).

c₁, c₂, ..., cn: Coefficients of the objective function.

Z₀: The current value of the objective function at the BFS.

3. Steps of the Simplex Algorithm

The Simplex algorithm proceeds through the following steps at each iteration:

3.1 Initialization

We begin with an initial BFS corresponding to a feasible solution. The initial tableau is
constructed based on the constraints and the objective function in standard form.

If the problem is not in standard form (e.g., if there are artificial variables or the problem
is not feasible), methods such as the Big-M method or the Two-Phase method can be used to obtain a feasible starting point.

3.2 Identifying the Entering Variable

At each iteration, we identify the entering variable. The entering variable is selected by
examining the coefficients in the last row of the tableau (corresponding to the objective
function). We pick the variable that will most improve the objective function (i.e., the most
negative coefficient in the case of maximization).

For a maximization problem, the entering variable is the one with the most negative
coefficient in the objective row.

For a minimization problem, the entering variable is the one with the most positive
coefficient.

This variable corresponds to a non-basic variable that will be increased from 0 to a positive
value, potentially improving the objective.

3.3 Identifying the Leaving Variable

Once the entering variable is selected, we need to determine the leaving variable. The
leaving variable is the basic variable that will be replaced by the entering variable in the BFS.
The leaving variable is selected using the minimum ratio test.

The ratio test involves dividing the RHS values by the corresponding coefficients of the entering variable in each constraint row. The leaving variable is the one whose ratio is the smallest positive value, ensuring that the new solution remains feasible.

The ratio test is formulated as:

Ratio_i = RHS_i / (coefficient of entering variable in row i),

computed over the rows in which the coefficient of the entering variable is positive. The leaving variable is the basic variable of the row with the smallest such ratio. If no row has a positive coefficient in the entering column, the entering variable can be increased without bound, and the problem is unbounded.
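A minimal sketch of the minimum ratio test (the function name ratio_test is illustrative; the sample data correspond to the first pivot of the example in Section 5):

```python
import numpy as np

# Minimum ratio test: given the entering column and the current RHS,
# return the index of the leaving row, or None if the LP is unbounded.
def ratio_test(col, rhs, tol=1e-12):
    ratios = np.full(len(rhs), np.inf)
    positive = col > tol                    # only rows with a positive coefficient qualify
    ratios[positive] = rhs[positive] / col[positive]
    row = int(np.argmin(ratios))
    return None if np.isinf(ratios[row]) else row

# Entering column and RHS from the first pivot of the example in Section 5:
print(ratio_test(np.array([1.0, 2.0]), np.array([4.0, 5.0])))  # 1 -> s2 leaves
```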

3.4 Pivoting

The pivot operation is performed to update the tableau after selecting the entering and
leaving variables. This operation updates the tableau such that the entering variable
becomes part of the BFS, and the leaving variable is removed.

The basic steps of pivoting are:

1. Identify the pivot element, which is the coefficient of the entering variable in the row of
the leaving variable.

2. Normalize the pivot row by dividing it by the pivot element.

3. Perform row operations to update the other rows, ensuring that all other coefficients in
the entering variable's column are zero.
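A minimal NumPy sketch of a single pivot operation implementing steps 2 and 3 above (the function name pivot and the tableau layout are illustrative; the sample tableau is the initial tableau of the example in Section 5, pivoting on the x1 column of the s2 row, with 0-based indices in the code):

```python
import numpy as np

# One pivot on tableau T at (pivot_row, pivot_col), 0-based:
# normalize the pivot row, then eliminate the column from all other rows.
def pivot(T, pivot_row, pivot_col):
    T = T.astype(float).copy()
    T[pivot_row] /= T[pivot_row, pivot_col]            # step 2: normalize
    for r in range(T.shape[0]):
        if r != pivot_row:
            T[r] -= T[r, pivot_col] * T[pivot_row]     # step 3: row operations
    return T

# Initial tableau of the example in Section 5 (rows s1, s2, Z):
T = np.array([[ 1.0,  1.0, 1.0, 0.0, 4.0],
              [ 2.0,  1.0, 0.0, 1.0, 5.0],
              [-3.0, -2.0, 0.0, 0.0, 0.0]])
print(pivot(T, 1, 0))  # reproduces the updated tableau shown in Section 5
```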

3.5 Updating the Tableau

After the pivot, the tableau is updated to reflect the new BFS. The coefficients of the objective
function row and the constraint rows are modified to incorporate the new values of the
variables.

4. Optimality Check

Once the tableau is updated, we check for optimality. The current solution is optimal if all
the coefficients in the objective function row (except for the RHS column) are non-negative in
the case of maximization. If this condition is met, the algorithm terminates, and the current
BFS is the optimal solution.

If there are still negative coefficients in the objective row, the algorithm proceeds to the next
iteration by selecting a new entering variable and repeating the process.

5. Example of Formal Simplex Steps

Consider the following linear program:

maximize Z = 3x1 + 2x2
subject to x1 + x2 ≤ 4
2x1 + x2 ≤ 5
x1 ≥ 0, x2 ≥ 0

The problem is converted into standard form by introducing slack variables s1 and s2:

maximize Z = 3x1 + 2x2
subject to x1 + x2 + s1 = 4
2x1 + x2 + s2 = 5
x1 ≥ 0, x2 ≥ 0, s1 ≥ 0, s2 ≥ 0

The initial tableau is:

Basic Variables | x1 | x2 | s1 | s2 | RHS
s1              |  1 |  1 |  1 |  0 |   4
s2              |  2 |  1 |  0 |  1 |   5
Z               | −3 | −2 |  0 |  0 |   0

Step 1: Choose the entering variable

The most negative coefficient in the objective function row is −3 (corresponding to x1 ), ​

so x1 enters the basis.


Step 2: Choose the leaving variable

We perform the ratio test:

4/1 = 4, 5/2 = 2.5.

The smallest ratio is 2.5, so s2 leaves the basis.


Step 3: Pivoting

The pivot element is 2 (the coefficient of x1 in the s2-row). After performing the pivot, we get the following updated tableau:

Basic Variables | x1 |   x2 | s1 |   s2 | RHS
s1              |  0 |  0.5 |  1 | −0.5 | 1.5
x1              |  1 |  0.5 |  0 |  0.5 | 2.5
Z               |  0 | −0.5 |  0 |  1.5 | 7.5

Step 4: Repeat

Since the most negative coefficient in the objective row is −0.5, x2 will enter the basis, ​

and s1 will leave. The pivoting steps will continue until all coefficients in the objective

row are non-negative.

6. Conclusion

In this lecture, we formalized the Simplex Algorithm by breaking it down into precise
mathematical steps. The key steps involved in the algorithm include selecting the entering
and leaving variables, performing the pivot operation, and updating the tableau until the optimal solution is reached. The algorithm terminates when no further improvement in the
objective function is possible, and the optimal BFS is found. The formal approach ensures
that the Simplex algorithm is systematic and effective for solving linear programming
problems.

Lecture 21: Starting BFS

1. Introduction to the Problem of Finding the Starting BFS

In the Simplex algorithm, we need a Basic Feasible Solution (BFS) to begin the optimization
process. However, in many cases, it is not immediately obvious what the starting BFS is,
especially if the linear program is not already in a form that readily gives a feasible solution.
In this lecture, we will discuss a clever method to find the starting BFS by transforming the
original linear program into a new linear program with a different objective function. This
transformation ensures that the new linear program is easy to solve, providing us with an
initial BFS from which we can begin applying the Simplex method.

2. Standard Form of a Linear Program

Recall that a linear program is typically given in the following standard form:

maximize cT x

subject to Ax = b

x≥0

Where:

x ∈ Rn is the vector of decision variables.


A ∈ Rm×n is the matrix of coefficients for the constraints.
b ∈ Rm is the vector of constants on the right-hand side of the constraints.
c ∈ Rn is the vector of coefficients of the objective function.

In this form, the feasible region is defined by the system of linear equations Ax = b, subject
to the non-negativity constraints x ≥ 0.
A key challenge in applying the Simplex algorithm is finding an initial BFS that satisfies these
constraints. If the linear program is not in a form that trivially provides a feasible solution (for
example, if the system has artificial variables or if it is not feasible), the starting BFS is not
obvious.

3. Two-Phase Method for Finding the Starting BFS

To overcome the issue of finding an initial feasible solution, we use a technique called the
Two-Phase Method. This method converts the original problem into a new problem with a
different objective function, making it easier to identify a starting BFS.

3.1 Phase 1: Transforming the Original Problem

The idea behind the Two-Phase method is to introduce artificial variables that ensure a
feasible starting solution. Artificial variables are introduced in such a way that the initial
solution can be constructed easily and is guaranteed to be feasible.

If no starting BFS for the system Ax = b, x ≥ 0 is immediately apparent, we introduce artificial variables a1, a2, …, am such that:

Ax + Ia = b

where I is the identity matrix and a is the vector of artificial variables (rows with a negative right-hand side are first multiplied by −1 so that b ≥ 0). This system has an obvious initial BFS: set x = 0 and a = b, so the artificial variables absorb the equality constraints.

3.2 New Objective Function for Phase 1

Once the artificial variables are introduced, the new linear program has the following
structure:
minimize a1 + a2 + ⋯ + am

subject to Ax + Ia = b

x ≥ 0, a ≥ 0

In this transformed problem, the goal is to minimize the sum of the artificial variables. The
intuition here is that if the optimal value of the objective function is 0, all the artificial
variables are 0, and the original problem has a feasible solution. If the optimal value is
greater than 0, the original problem is infeasible.

3.3 Phase 1 Solution

The Phase 1 linear program can be solved using the Simplex method. The solution to this
phase will either:

Achieve an optimal objective value of 0: In this case, all artificial variables are 0, and the
original system is feasible. We can then proceed to Phase 2.

Achieve a positive optimal objective value: In this case, the original system is
infeasible, and no BFS exists.

4. Phase 2: Returning to the Original Problem

Once Phase 1 is completed, we have a feasible solution (if the objective is 0). Now, we return
to the original problem and apply the Simplex method to optimize the original objective
function.

4.1 Starting BFS in Phase 2

The BFS found at the end of Phase 1 (where the artificial variables are 0) serves as the
starting BFS for the original problem in Phase 2. Since we have already established that this
solution satisfies the constraints of the original problem, we can proceed with the Simplex
method from this point.

4.2 Phase 2 Objective Function

The objective function for Phase 2 is simply the original objective function:

maximize cT x

subject to the constraints Ax = b and x ≥ 0. The solution to this problem is obtained by


applying the Simplex algorithm, starting from the BFS found in Phase 1.

5. Example of the Two-Phase Method

Consider the following linear program:

maximize Z = x1 + x2
subject to x1 + 2x2 ≥ 4
x1 + x2 ≤ 5
x1 ≥ 0, x2 ≥ 0

To convert this into standard form, we need to:

Convert x1 + 2x2 ≥ 4 into x1 + 2x2 − s1 = 4, where s1 is a surplus variable.

Convert x1 + x2 ≤ 5 into x1 + x2 + s2 = 5, where s2 is a slack variable.

Now, the system becomes:

maximize Z = x1 + x2
subject to x1 + 2x2 − s1 = 4
x1 + x2 + s2 = 5
x1, x2, s1, s2 ≥ 0
​ ​ ​ ​

To ensure feasibility, we introduce artificial variables a1 and a2 for the two constraints (strictly speaking, the second constraint already has the slack variable s2 available as a starting basic variable, so a2 could be omitted; it is included here for uniformity):

minimize a1 + a2

subject to x1 + 2x2 − s1 + a1 = 4

x1 + x2 + s2 + a2 = 5

x1, x2, s1, s2, a1, a2 ≥ 0

Phase 1 of the Simplex method is then used to minimize a1 + a2. If we can achieve an optimal value of 0, we proceed to Phase 2, where we drop the artificial variables and optimize the original objective function Z = x1 + x2.
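For completeness, Phase 1 for this example can be checked numerically with SciPy's LP solver (a verification sketch; the variable ordering is an assumption of this snippet):

```python
import numpy as np
from scipy.optimize import linprog

# Phase 1 for the example above; variables ordered (x1, x2, s1, s2, a1, a2).
c = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0])       # minimize a1 + a2
A_eq = np.array([[1.0, 2.0, -1.0, 0.0, 1.0, 0.0],
                 [1.0, 1.0,  0.0, 1.0, 0.0, 1.0]])
b_eq = np.array([4.0, 5.0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 6)
print(res.fun)  # 0.0: the artificial variables vanish, so the original LP is feasible
print(res.x)    # the first four entries give a feasible starting point for Phase 2
```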

6. Conclusion

In this lecture, we discussed how to find the starting BFS for the Simplex algorithm using the
Two-Phase Method. This method transforms the original linear program into a new linear
program with a different objective function. By introducing artificial variables and minimizing
their sum in Phase 1, we can ensure a feasible starting solution. Once Phase 1 completes
with an optimal objective value of 0, we move on to Phase 2 and apply the Simplex algorithm
to the original problem, starting from the BFS found in Phase 1. This approach guarantees
that we always have a valid starting point for the Simplex method.

Lecture 22: Degeneracy

1. Introduction to Degeneracy in the Simplex Algorithm

In the Simplex algorithm, we traverse from one Basic Feasible Solution (BFS) to another by
pivoting through the feasible region. The primary goal is to improve the objective function at
each step. However, in some cases, the algorithm might enter a cycle of pivots, where the
objective value does not increase. This phenomenon is called degeneracy.

Degeneracy occurs when more constraints are active at a BFS than are needed to determine the corresponding vertex; equivalently, in standard equality form, some basic variable takes the value zero. A pivot can then change the basis without actually moving to a new vertex, so the objective value does not improve and the algorithm may revisit the same basis. This can lead to cycling, where the algorithm does not make progress and could theoretically never terminate.

In this lecture, we will define degeneracy, explore its consequences, and discuss techniques
to avoid it during the execution of the Simplex algorithm.

2. What is Degeneracy?

A linear program is said to be degenerate at a given BFS if more than n constraints are active at the BFS (where n is the number of decision variables), or equivalently, in standard equality form, if at least one basic variable equals zero. This means that several different bases describe the same vertex, so the algorithm can pivot to a "new" BFS without improving the objective function, causing it to "cycle" and revisit the same BFS without progress.

3. Why Does Degeneracy Happen?

Degeneracy arises because, at a BFS, we may have multiple constraints that are "active" (i.e.,
satisfied as equalities), but not all of them are necessary to describe the vertex. This excess
of active constraints creates a situation where pivoting between different BFSs results in no
improvement in the objective function value, causing a cycle.

Geometrically, degeneracy occurs when the feasible region has a vertex where more
than n constraints are active. This excess of constraints does not change the point of
intersection, even if the BFS changes.

4. Example of Degeneracy

Consider the following linear program:

maximize Z = x1 + x2
subject to x1 + x2 = 2
x1 = 1
x2 = 1
x1 ≥ 0, x2 ≥ 0

The constraints x1 + x2 = 2, x1 = 1, and x2 = 1 form a degenerate BFS at (1, 1), because all three constraints are active at this point even though only two of them are needed to determine it. Despite having more than two active constraints, the feasible region consists of this single point. The Simplex algorithm could pivot between different bases (different choices of which constraints are treated as defining the vertex) without changing the objective value, leading to a cycle of solutions.

5. Consequences of Degeneracy

Degeneracy can cause the Simplex algorithm to:

Cycle indefinitely: The algorithm might revisit the same BFS without making any
progress towards improving the objective function, resulting in an infinite loop.

Wasting computational resources: Even though the objective value does not improve,
the algorithm continues to perform unnecessary computations, potentially making the
process inefficient.

6. How to Avoid Degeneracy: The Rule of Bland’s Lemma

Bland's Lemma (often called Bland's rule) provides a simple and effective way to prevent cycling in the Simplex algorithm. It states that the Simplex algorithm will not cycle if, at each pivot step, we select the entering and leaving variables by a consistent smallest-index rule.

Smallest-Index Rule: When selecting a variable to enter the basis, choose, among all variables whose objective-row coefficient indicates that the objective can still be improved, the one with the smallest index. Similarly, when selecting a variable to leave the basis, choose the one with the smallest index among the candidates that tie in the minimum ratio test.

Under this rule no basis is ever repeated, so the Simplex algorithm is guaranteed to terminate in a finite number of steps.
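A minimal sketch of the entering-variable selection under this rule, for a maximization tableau in which negative objective-row coefficients signal possible improvement (the function name and data are illustrative):

```python
import numpy as np

# Bland-style entering-variable choice: among all improving columns
# (negative objective-row coefficient), take the one with the smallest index.
def blands_entering(z_row, tol=1e-12):
    improving = np.where(z_row < -tol)[0]
    return int(improving[0]) if improving.size else None  # None -> optimal

print(blands_entering(np.array([-3.0, -2.0, 0.0, 0.0])))  # 0: the first column enters
print(blands_entering(np.array([ 0.0,  0.0, 1.0, 1.0])))  # None: already optimal
```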

7. Formalizing Bland’s Lemma

Bland’s Lemma guarantees that under the smallest-index rule no sequence of degenerate
pivots can return to a basis that was already visited. Specifically, it ensures that:

1. Pivot Selection: At each step, the entering variable is chosen with the smallest index
among all variables that can improve the objective function. Similarly, the leaving
variable is chosen with the smallest index among those that satisfy the ratio test.

2. Termination: With this approach, the algorithm cannot cycle, because a cycle would
have to repeat a basis, and one can show that the smallest-index choices make this
impossible; the algorithm therefore moves forward in a systematic, non-repetitive manner.

8. Example of Applying Bland’s Lemma

Consider the following Simplex tableau at some point in the algorithm:

Basic   x1    x2    x3    x4    RHS
x1      1     1     0     0     5
x2      0     1     1     0     2
z       −1    0     0     0     0

At this step we must select an entering variable. Even if some other column has a larger
positive coefficient in the objective row, Bland's rule instructs us to choose the candidate
with the smallest index; likewise, if the ratio test produces ties among candidate leaving
variables, we select the one with the smallest index. This ensures that the
algorithm does not get stuck in a cycle.
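To make the smallest-index rule concrete, here is a minimal Python sketch of the pivot
selection. The tableau layout and all names (blands_pivot, reduced_costs, tableau, rhs) are
illustrative assumptions for this sketch, not the lecture's notation or a library API.

```python
# A minimal sketch of Bland's smallest-index pivot selection (maximization).
# Assumed layout: reduced_costs[j] > 0 means column j can improve the
# objective; tableau[i][j] is the coefficient of column j in constraint row i.

def blands_pivot(reduced_costs, tableau, rhs, eps=1e-12):
    """Return (entering_col, leaving_row), or None if the basis is optimal."""
    # Entering variable: the SMALLEST index with a positive reduced cost
    # (not the most positive coefficient, as in Dantzig's classic rule).
    entering = next((j for j, c in enumerate(reduced_costs) if c > eps), None)
    if entering is None:
        return None  # no improving column: current basis is optimal

    # Ratio test over rows with a positive entry in the entering column.
    ratios = [(rhs[i] / tableau[i][entering], i)
              for i in range(len(rhs)) if tableau[i][entering] > eps]
    if not ratios:
        raise ValueError("LP is unbounded along the entering column")
    min_ratio = min(r for r, _ in ratios)
    # Among ties at the minimum ratio, leave with the smallest index; this
    # tie-break is what rules out cycling on degenerate vertices.
    leaving = min(i for r, i in ratios if r <= min_ratio + eps)
    return entering, leaving
```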

9. Conclusion

Degeneracy is an issue in the Simplex algorithm that can lead to cycles, causing the
algorithm to revisit the same BFS without improving the objective function. However, by
following the lexicographical rule outlined in Bland’s Lemma, we can avoid cycling and
ensure that the Simplex algorithm terminates in a finite number of steps. This simple rule
provides an effective way to handle degeneracy and guarantees the correctness of the
Simplex method, even in cases where degeneracy might occur.

Lecture 23: Introduction to Duality

1. Overview of Duality in Linear Programming

Duality is a fundamental concept in linear programming that connects every linear program
(referred to as the primal problem) with another linear program, called the dual problem.
The dual problem provides an alternative perspective on the original optimization problem,
and in many cases, solving the dual problem is computationally more efficient. Moreover, the
solutions to the primal and dual problems are related in an elegant way.

In this lecture, we introduce the theory of duality, beginning with basic concepts of
separation and a brief review of some key mathematical foundations. The connection
between the primal and dual problems will be explored, and we will establish the
relationships that exist between them.

2. Primal Problem (Maximization)

We begin by considering a standard linear programming problem in maximization form.


The primal problem is given by:

maximize cT x

subject to Ax ≤ b

x≥0

Where:

x ∈ Rn is the vector of decision variables.


A ∈ Rm×n is the matrix of coefficients for the constraints.
b ∈ Rm is the vector of constraint bounds.
c ∈ Rn is the vector of coefficients in the objective function.
x ≥ 0 represents the non-negativity constraints on the decision variables.

The objective is to maximize cT x, subject to the constraints Ax ≤ b and x ≥ 0.

3. Dual Problem (Minimization)

For every linear program (primal), there exists a corresponding dual linear program, which in
the case of the primal maximization problem is a minimization problem. The dual problem is
formulated as follows:

minimize bT y

subject to AT y ≥ c

y≥0

Where:

y ∈ Rm is the vector of decision variables in the dual problem.


bT y is the objective function in the dual problem, which we aim to minimize.
AT y ≥ c are the constraints in the dual problem.
y ≥ 0 represents the non-negativity constraints for the dual variables.

Interpretation of the Dual Problem:

Each dual variable yi corresponds to a constraint in the primal problem.


The objective function bT y in the dual problem represents the total "cost" of the
resources (or constraints) in the primal.

The constraints AT y ≥ c in the dual problem ensure that the dual solution respects the
objective function of the primal.

4. Relation Between Primal and Dual

The dual of a primal maximization problem with ≤ constraints is a dual minimization
problem with ≥ constraints. The key idea is that the solution to the primal problem provides
information about the optimal values of the dual variables, and vice versa.

We introduce the concept of separation: the separation theorem provides a formal way to
understand how the primal and dual solutions are related. The optimal value of the dual is
always an upper bound on the primal's optimal value in the maximization case. The duality
gap, the difference between the dual and primal objective values, is zero at optimality
whenever both the primal and dual problems have feasible solutions.

This relationship leads to strong duality, which states that if both the primal and dual
problems have optimal solutions, their optimal objective values are equal.

5. Geometric Interpretation of Duality

Geometrically, duality can be viewed as a transformation from the space of decision variables
in the primal problem to a space of dual variables. In this sense:

The primal problem focuses on finding the best combination of variables to optimize the
objective function under certain constraints.

The dual problem focuses on determining the values for the constraints that give the
best (minimum) cost for achieving the optimal solution of the primal.

This geometric interpretation leads to the fundamental theorem of duality: if the primal
problem has an optimal solution, then so does the dual problem, and the objective values of
the primal and dual problems are equal.

6. Weak Duality Theorem

The weak duality theorem asserts that for any feasible solution x to the primal problem and
y to the dual problem, the value of the objective function for the dual is always greater than
or equal to the value of the objective function for the primal. In mathematical terms:

cT x ≤ bT y for all feasible x and y.

This theorem provides an upper bound for the primal objective and a lower bound for the
dual objective, via the one-line chain cT x ≤ (AT y)T x = y T Ax ≤ y T b, which uses dual
feasibility together with x ≥ 0 and primal feasibility together with y ≥ 0. Weak duality alone
does not imply that the optimal values are equal; that stronger statement is the content of
the strong duality theorem.

7. Strong Duality Theorem

The strong duality theorem states that if the primal problem has an optimal solution and
the dual problem also has an optimal solution, then the optimal objective values of the

primal and dual problems are equal:

Optimal value of the primal = Optimal value of the dual.

This result guarantees that the optimal solutions to the primal and dual problems coincide in
value, which is a powerful tool in optimization.

8. Example: Primal-Dual Pair

Consider the following simple linear program (primal):

maximize 3x1 + 2x2

subject to x1 + x2 ≤ 4

x1 ≤ 3

x1, x2 ≥ 0

The corresponding dual problem is:

minimize 4y1 + 3y2

subject to y1 + y2 ≥ 3

y1 ≥ 2

y1, y2 ≥ 0

Here, the primal maximization problem involves maximizing the value of 3x1 + 2x2 subject
to two constraints. The dual problem seeks to minimize the cost 4y1 + 3y2, subject to one
dual constraint per primal variable: y1 + y2 ≥ 3 comes from x1, and y1 ≥ 2 comes from x2
(which appears only in the first primal constraint). The strong duality theorem assures us
that if both the primal and dual have optimal solutions, their objective values will be equal.
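As a quick sanity check, the sketch below solves both programs numerically. It assumes
scipy is available (scipy.optimize.linprog); any LP solver would serve equally well.

```python
# Numerically verifying the primal-dual pair above with an off-the-shelf
# LP solver (assumes scipy is installed).
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0],    # x1 + x2 <= 4
              [1.0, 0.0]])   # x1      <= 3
b = np.array([4.0, 3.0])
c = np.array([3.0, 2.0])

# Primal: maximize c^T x  ==  minimize -c^T x, subject to A x <= b, x >= 0.
primal = linprog(-c, A_ub=A, b_ub=b, bounds=[(0, None)] * 2)

# Dual: minimize b^T y, subject to A^T y >= c (written as -A^T y <= -c), y >= 0.
dual = linprog(b, A_ub=-A.T, b_ub=-c, bounds=[(0, None)] * 2)

print(-primal.fun, primal.x)  # 11.0 at x = (3, 1)
print(dual.fun, dual.x)       # 11.0 at y = (2, 1)
```

Both optimal values come out equal (11), as strong duality predicts.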

9. Conclusion

Duality is a central concept in linear programming that reveals deep connections between
optimization problems. The primal problem and its dual problem provide complementary
perspectives on the same problem. The study of duality provides a powerful framework for
solving optimization problems, understanding the structure of optimal solutions, and
developing efficient algorithms.

In the next steps, we will delve deeper into the theory and practical implications of duality,
including how to construct the dual for more complex problems and apply duality to derive
bounds and solutions.

Lecture 24: Hyperplane Separation Problem

1. Introduction to Separation Theorems

The hyperplane separation problem is a key concept in convex geometry and optimization.
It deals with the possibility of separating convex sets using hyperplanes. A hyperplane is a
flat affine subspace of one dimension less than the space in which it resides (in Rn , a
hyperplane is an (n − 1)-dimensional affine subspace). The separation theorem is
concerned with finding a hyperplane that can "separate" two distinct convex sets, or a
convex set and a point, such that one set lies entirely on one side of the hyperplane and the
other set lies entirely on the opposite side.

This problem is fundamental to optimization, as it provides the mathematical foundation for


understanding how convex optimization problems are structured and solved.

2. The Basic Hyperplane Separation Theorem

The separation theorem states that if we have two disjoint convex sets, then there exists a
hyperplane that separates them, meaning the hyperplane divides the space into two half-
spaces, with each set lying entirely in one of the half-spaces. Formally:

Let C1 and C2 be two convex sets with C1 ∩ C2 = ∅. Then there exists a hyperplane H
such that:

C1 ⊆ {x : aT x ≤ β},  C2 ⊆ {x : aT x ≥ β}

for some nonzero vector a ∈ Rn and scalar β.

In simpler terms, there is a hyperplane that divides the space such that all points in C1 lie on

one side of the hyperplane and all points in C2 lie on the other side.

3. Application to Convex Sets and Points

In addition to separating two disjoint convex sets, the separation theorem can also be
applied to situations where we are trying to separate a convex set from a single point.
Suppose C is a convex set and p is a point not in C . The separation theorem asserts that
there exists a hyperplane that separates the point p from the convex set C , meaning:

There exists a hyperplane H such that:

p ∈ {x : aT x ≤ β}, C ⊆ {x : aT x ≥ β}

where a ∈ Rn and β ∈ R.

This result is significant in convex optimization because it tells us that if a point is outside a
convex set, we can always find a hyperplane that separates them, which is used in
optimization algorithms, particularly in duality theory and constraint qualification.

4. Proofs of Separation Theorems

4.1 Separation Between Two Convex Sets

Let’s consider two disjoint convex sets C1 and C2. The goal is to find a hyperplane that
separates them. We will sketch a basic outline of the proof.

Assume that C1 and C2 are non-empty and disjoint. A standard reduction considers the
difference set D = C1 − C2 = {u − v : u ∈ C1, v ∈ C2}, which is convex and, because
the two sets are disjoint, does not contain the origin.

Separating the origin from D yields a vector a with aT u ≤ aT v for all u ∈ C1 and
v ∈ C2; choosing β between sup{aT u : u ∈ C1} and inf{aT v : v ∈ C2} gives a
hyperplane such that each set lies entirely in one of the half-spaces it determines.

The detailed proof relies on the supporting hyperplane theorem, which asserts that for any
convex set, there is a hyperplane that "supports" the set at any point on its boundary.
Applied to D at (or arbitrarily near) the origin, this supporting hyperplane separates the
two convex sets when they are disjoint.

4.2 Separation Between a Point and a Convex Set

For the case where we separate a convex set C from a point p, the proof is more
straightforward. Suppose C is closed and the point p is not in C, and let q be the point of
C closest to p; such a point exists, and it is unique because C is convex.

The direction a = p − q serves as the normal of a separating hyperplane: convexity of C
implies aT x ≤ aT q for every x ∈ C, while aT p = aT q + ∥a∥² > aT q.

The construction of the separating hyperplane thus amounts to finding the direction in which p and
C are "most apart" (the direction from the closest point of C to p) and using this direction,
with an offset between aT q and aT p, to define the hyperplane.
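As a concrete illustration of the closest-point construction, the sketch below separates a
point from an axis-aligned box, chosen because projection onto a box is a simple
componentwise clamp. All names and the specific set are illustrative assumptions.

```python
# Separating a point from a convex set (here the box [0,1]^2) by taking
# the normal a = p - q, where q is the closest point of the set to p.
import numpy as np

lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])   # the box [0, 1]^2
p = np.array([2.0, 3.0])                               # a point outside the box

q = np.clip(p, lo, hi)            # closest point of the box to p
a = p - q                         # normal direction of the hyperplane
beta = a @ (p + q) / 2            # pass the hyperplane through the midpoint

print(a @ p > beta)               # True: p is strictly on one side
verts = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(np.all(verts @ a < beta))   # True: the whole box is on the other side
```

Checking only the vertices suffices here because a linear function over a polytope attains
its maximum at a vertex.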

5. Geometric Interpretation

The separation theorem has an intuitive geometric interpretation. If two convex sets are
disjoint, we can always find a hyperplane that divides the space such that each set lies
entirely on one side of the hyperplane. This is useful in optimization problems where the
constraints form convex sets and we need to separate feasible regions or separate the
objective function from the feasible region.

Similarly, when separating a point from a convex set, the hyperplane provides a way to
define a boundary that distinguishes between the point and the set. This concept is central

to methods such as convex hull algorithms and support vector machines in machine
learning.

6. Applications in Linear Programming

The separation theorem is instrumental in understanding the duality theory in linear


programming. In particular, the idea that the primal problem can be transformed into the
dual problem relies on separation arguments. In duality, the hyperplane separation helps
define the relationships between the constraints in the primal and the objective function in
the dual.

In the context of linear programming, the separation theorem can be used to prove Farkas'
Lemma, which provides a characterization of the solvability of systems of linear inequalities.
This lemma is closely related to the duality of linear programs.

7. Conclusion

The hyperplane separation problem is a key result in convex analysis and optimization. It
provides a way to separate convex sets using hyperplanes, a concept that has deep
implications for linear programming, convex optimization, and machine learning.
Understanding the separation of convex sets and points is crucial for the development of
efficient algorithms for solving optimization problems, particularly in the duality theory of
linear programming. In future lectures, we will explore further applications and proofs
related to convexity, including more advanced results in optimization theory.

Lecture 25: Farkas Lemma

1. Introduction to Farkas Lemma

Farkas' Lemma is a fundamental result in convex analysis and linear programming,


specifically in the context of separating a point from a convex cone. It provides a
characterization of the solvability of systems of linear inequalities and plays a crucial role in
the theory of linear programming, duality, and optimization.

The lemma is especially important because it allows us to understand conditions under


which a linear system has a solution by leveraging separation theorems for convex sets,
particularly focusing on the geometry of cones.

Farkas’ Lemma is often viewed as a generalization of the separation theorem for convex sets
but with a key difference: it deals specifically with the separation of a point from a convex
cone and guarantees that the separating hyperplane passes through the origin. This makes
Farkas' Lemma a cornerstone in understanding the duality in optimization problems.

2. Statement of Farkas Lemma

Farkas' Lemma can be stated as follows:

Let A ∈ Rm×n be a matrix, and let b ∈ Rm be a vector. Consider the system of linear
inequalities:

Ax ≥ b, x ≥ 0.

Then exactly one of the following two conditions holds:

1. Existence of a solution: There exists a vector x ≥ 0 such that Ax ≥ b.

2. Existence of a separating hyperplane: There exists a vector y ∈ Rm such that:

y T A ≤ 0, y T b > 0, y ≥ 0.

The lemma asserts that, for a given system of linear inequalities, either a solution exists (in
the form of a non-negative vector x satisfying the inequalities), or there exists a certificate of
infeasibility. The two alternatives exclude each other: if both x and y existed, then applying
y ≥ 0 to Ax ≥ b would give y T Ax ≥ y T b > 0, while y T A ≤ 0 and x ≥ 0 force
y T Ax ≤ 0, a contradiction. The vector y defines a linear functional that is non-positive on
the cone {Ax : x ≥ 0} yet positive at b, i.e., a separating hyperplane passing through the
origin.
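The dichotomy can be checked numerically. The sketch below (assuming scipy is available)
tests both alternatives for a small infeasible system; the normalization trick for the strict
inequality is our own illustrative choice.

```python
# Illustrating the two alternatives of Farkas' Lemma numerically. For the
# infeasible system x1 >= 1 and -x1 >= 0, a certificate y = (1, 1) exists:
# y^T A = 0 <= 0 and y^T b = 1 > 0.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0], [-1.0]])
b = np.array([1.0, 0.0])
m, n = A.shape

# Alternative 1: is there x >= 0 with A x >= b?  (rewritten as -A x <= -b)
sys1 = linprog(np.zeros(n), A_ub=-A, b_ub=-b, bounds=[(0, None)] * n)

# Alternative 2: is there y >= 0 with y^T A <= 0 and y^T b > 0?
# Normalizing y^T b = 1 turns the strict inequality into a linear constraint.
sys2 = linprog(np.zeros(m), A_ub=A.T, b_ub=np.zeros(n),
               A_eq=b.reshape(1, -1), b_eq=[1.0], bounds=[(0, None)] * m)

print("A x >= b, x >= 0 feasible:      ", sys1.success)  # False
print("infeasibility certificate found:", sys2.success)  # True, e.g. y = (1, 1)
```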

3. Geometric Interpretation

Geometrically, the Farkas Lemma provides insight into the relationship between the
feasibility of a system of inequalities and the geometry of cones. To interpret it:

Convex cone: The set of vectors reachable by the system, K = {Ax : x ≥ 0}, is a convex
cone in Rm generated by the columns of A; feasibility of Ax ≥ b, x ≥ 0 means that some
point of K dominates b componentwise.

Separation condition: If no solution exists, the hyperplane {z : y T z = 0} separates b
from the cone K : the condition y T A ≤ 0 makes y T z ≤ 0 for every z ∈ K , while
y T b > 0 places b strictly on the other side, effectively "cutting off" b from the reachable
set.

The hyperplane passes through the origin: This is the key aspect of Farkas' Lemma. It
tells us that the separating hyperplane defined by the vector y not only separates b from
the cone, but also passes through the origin itself; this is possible precisely because K is
a cone, so any separation can be achieved by a hyperplane through the apex of the cone.

4. Applications of Farkas Lemma

The Farkas Lemma has several important applications, particularly in the field of
optimization:

Duality theory: Farkas’ Lemma is closely related to the dual of a linear programming
problem. In fact, it forms the basis for understanding the duality theory in linear
programming, where the dual problem can be derived from the primal problem by using
the geometric separation arguments.

Characterization of feasible solutions: In optimization, Farkas’ Lemma provides a way


to check the feasibility of a system of linear constraints. It tells us that either there exists
a feasible solution or there exists a hyperplane separating the feasible region from the
origin, implying infeasibility.

Linear programming: Farkas' Lemma is often used in the proof of strong duality in
linear programming, which asserts that the optimal values of the primal and dual
problems are equal under certain conditions.

5. Proof Outline of Farkas Lemma

To understand the proof of Farkas' Lemma, we will break it down into steps:

5.1 Existence of a Solution (Direct Case)

If the system Ax ≥ b has a solution, then the first part of the lemma is trivially true. The
solution x is a vector in the non-negative orthant that satisfies the system of
inequalities.

5.2 Existence of a Separating Hyperplane (Contradiction)

If the system Ax ≥ b, x ≥ 0 has no solution, we construct the certificate y .

Consider the set D = {Ax − s : x ≥ 0, s ≥ 0} ⊆ Rm , the set of all vectors lying
componentwise below some point of the cone {Ax : x ≥ 0}. The set D is a closed convex
cone (it is finitely generated), and infeasibility of the system means exactly that b ∉ D.

By the separation theorem, there must exist a hyperplane separating b from D; since D is
a cone containing the origin, the hyperplane can be taken through the origin. That is, there
is a vector y such that y T z ≤ 0 for all z ∈ D and y T b > 0. Taking z = Ax with x ≥ 0
gives y T A ≤ 0, and taking z = −s with s ≥ 0 gives y ≥ 0, thus completing the proof.

5.3 Why y ≥0

The condition y ≥ 0 reflects the direction of the inequalities: only non-negative weights
can aggregate the constraints Ax ≥ b into the valid consequence y T Ax ≥ y T b used
above. In the separation argument, y ≥ 0 appears because the hyperplane must also
separate against the slack directions −s contained in D , which orients the separating
hyperplane correctly relative to the cone.

6. Conclusion

Farkas’ Lemma is a powerful result that has broad applications in linear programming,
convex optimization, and duality theory. It provides a concrete geometric criterion for
understanding the feasibility of a system of linear inequalities and allows us to separate the
feasible set from the infeasible region via a hyperplane passing through the origin.
Understanding Farkas' Lemma is crucial for grasping the deeper properties of linear
programs and convex optimization problems, particularly in the context of duality and
constraint qualification.

Lecture 26: How to Take Dual

1. Introduction to Duality

In this lecture, we introduce the concept of the dual of a linear programming problem.
Duality is a fundamental idea in optimization theory, particularly in linear programming, as it
provides a way to approach optimization problems from a different perspective. Instead of
directly solving the primal problem, one can solve its dual, which is often easier or more
insightful.

Duality connects two optimization problems: the primal problem and the dual problem. The
dual problem provides an upper bound for a maximization problem (or a lower bound for a
minimization problem), and it is often easier to solve due to the structure it derives from the
original problem.

The key insight is that every feasible solution of the primal maximization problem gives us a
lower bound on the optimal value, and we seek to find an upper bound by formulating the
dual problem.

2. Primal Problem and Dual Problem: Overview

Consider a primal linear programming problem in standard form:

maximize cT x

subject to:

Ax ≤ b, x ≥ 0.

Here:

c ∈ Rn is the objective vector,


A ∈ Rm×n is the matrix of constraints,
b ∈ Rm is the right-hand side of the inequalities,

x ∈ Rn is the vector of decision variables.

The goal is to maximize cT x subject to the constraints Ax ≤ b and x ≥ 0.


The dual of this primal problem is a new linear program that is related to the primal
problem, but typically involves minimizing instead of maximizing.

The dual problem is formulated as follows:

minimize bT y

subject to:

AT y ≥ c, y ≥ 0.

Here:

y ∈ Rm represents the vector of dual variables corresponding to the constraints Ax ≤


b,
bT y is the objective function of the dual, which involves minimizing the dot product of b
and the dual variables y ,

The condition AT y ≥ c ensures that the dual variables provide valid upper bounds for
the primal objective coefficients.

The dual problem provides bounds on the optimal value of the primal problem. In particular:

Every feasible solution of the dual provides an upper bound on the objective of the
maximization primal, so the dual optimum is the tightest such upper bound.

Conversely, when the primal is a minimization problem, its dual is a maximization
problem whose feasible solutions provide lower bounds.
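Mechanically, taking the dual of the standard form above is a pure transposition of the
problem data. The sketch below makes this explicit; the function name dual_of and the
tuple convention are our own illustrative choices, not a library routine.

```python
# Forming the dual of  maximize c^T x  s.t.  A x <= b, x >= 0  is a purely
# mechanical transposition: the dual is  minimize b^T y  s.t.  A^T y >= c, y >= 0.
import numpy as np

def dual_of(c, A, b):
    """Return (obj, lhs, rhs) for the dual: minimize obj^T y with lhs @ y >= rhs."""
    A = np.asarray(A, dtype=float)
    return np.asarray(b, dtype=float), A.T, np.asarray(c, dtype=float)

# Example: one constraint x1 + x2 <= 4 with objective 3*x1 + 2*x2 yields
# the dual: minimize 4*y1 subject to y1 >= 3 and y1 >= 2 (and y1 >= 0).
obj, lhs, rhs = dual_of([3.0, 2.0], [[1.0, 1.0]], [4.0])
```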

3. Geometric Interpretation of Duality

In geometric terms, the dual problem can be thought of as determining the tightest possible
upper bound for the primal problem's objective function. Here's how duality manifests
geometrically:

The primal problem aims to maximize a linear objective subject to constraints that form
a polytope (a bounded region). The feasible region defined by Ax ≤ b is a convex
polytope, and the objective function cT x is a linear function.

The dual problem involves finding a set of coefficients y (dual variables) that represent
how strongly the constraints in the primal problem should be weighted to provide an
optimal bound on the objective. The constraints in the dual problem are derived from

the primal problem’s coefficients, ensuring that any feasible dual solution offers an
upper bound on the primal objective.

The relationship between the feasible regions of the primal and dual problems can be
visualized as follows:

If the primal problem is feasible and bounded, every dual feasible solution provides a
guaranteed upper bound on the primal objective value, and the dual optimum is where
this bound is tightest.

The dual can be viewed as a way to approach the primal problem from a different
direction, often simplifying the structure of the optimization.

4. Lower and Upper Bounds in Duality

A critical component of duality is the relationship between lower and upper bounds. We now
discuss how the dual provides an upper bound for the primal problem and how dual
variables can be interpreted as pricing or weight coefficients for each constraint in the primal
problem.

Primal problem: The primal maximization problem aims to find the maximum value of
cT x subject to Ax ≤ b. Any feasible solution of the primal problem gives us a lower
bound on the optimal objective value.

Dual problem: The dual problem minimizes bT y subject to the constraint AT y ≥ c. This
dual formulation gives us an upper bound on the primal objective value. The
relationship between the primal and dual objectives ensures that the optimal value of
the dual problem will provide a bound for the optimal value of the primal problem.

In the case of strong duality, if both the primal and dual problems are feasible and
bounded, the optimal values of the primal and dual problems are equal; in this case the
duality gap is zero.

5. Primal-Dual Relationship in Optimization

The primal and dual problems are deeply related. The key idea is that the solution of one
problem provides valuable information about the solution to the other. Here are some
important aspects of the primal-dual relationship:

Weak Duality: The optimal value of the dual problem provides a bound on the optimal
value of the primal problem. Specifically, for a maximization primal problem and its
corresponding dual minimization problem, we have the following inequality:

primal objective value ≤ dual objective value.

This is known as weak duality.

Strong Duality: In some cases (under regularity conditions, such as Slater's condition for
convex problems), the optimal values of the primal and dual problems are equal. This is
called strong duality, and it allows us to solve either problem to obtain the optimal
solution for both.

Complementary Slackness: Complementary slackness provides a relationship between
the optimal solutions of the primal and dual problems. Specifically, for each constraint
and its associated variable, at least one of the pair must be "tight":

If a dual variable is strictly positive, then the corresponding primal constraint is active
(holds with equality, (Ax)i = bi ); equivalently, a primal constraint with slack forces
its dual variable to zero.

If a primal variable is strictly positive, then the corresponding dual constraint is active
((AT y)j = cj ); equivalently, a dual constraint with slack forces its primal variable
to zero.

This condition is useful for verifying the optimality of a solution.

6. Example: Formulating the Dual Problem

To illustrate how the dual is formed, let’s consider a simple example:

Primal Problem:

maximize 3x1 + 2x2

subject to:

x1 + x2 ≤ 4, x1 ≥ 0, x2 ≥ 0.

To formulate the dual problem:

The single inequality constraint x1 + x2 ≤ 4 receives one dual variable y1. (The
non-negativity conditions x1 ≥ 0, x2 ≥ 0 are not ordinary constraints in this standard
form; they determine the direction ≥ of the dual constraints rather than introducing
dual variables.)

The dual problem will then minimize 4y1, subject to one dual constraint per primal
variable.

Dual Problem:

minimize 4y1

subject to:

y1 ≥ 3,

y1 ≥ 2,

y1 ≥ 0.

Here, the dual problem provides an upper bound for the primal problem, and solving the
dual gives us information about the primal problem's optimal solution.
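For a numerical sanity check of this example, the sketch below solves both problems with
scipy.optimize.linprog (an assumed dependency, not part of the lecture).

```python
# Verifying that the primal and dual above attain the same optimal value.
from scipy.optimize import linprog

# Primal: maximize 3*x1 + 2*x2  s.t.  x1 + x2 <= 4, x >= 0.
primal = linprog([-3.0, -2.0], A_ub=[[1.0, 1.0]], b_ub=[4.0],
                 bounds=[(0, None)] * 2)
# Dual: minimize 4*y1  s.t.  y1 >= 3, y1 >= 2, y1 >= 0  (i.e., y1 >= 3).
dual = linprog([4.0], bounds=[(3, None)])

print(-primal.fun, dual.fun)  # 12.0 12.0: the duality gap is zero
```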

7. Conclusion

In this lecture, we introduced the dual problem and explained its formulation and
significance in linear programming. Duality provides a powerful tool for deriving bounds on
optimization problems, with the dual offering an upper bound on a maximization problem’s
objective. Understanding duality is crucial for optimization theory, as it connects two
problems and provides deeper insight into the structure of linear programming problems.
Through the use of the dual, we can gain a more complete understanding of the solution
space and the optimal values of primal and dual problems.

Lecture 27: Examples of Taking Dual


In this lecture, we will work through several examples to reinforce the process of formulating
the dual of a linear programming (LP) problem. The goal is to solidify the understanding that
the constraints of the primal correspond to the variables of the dual and vice versa.

1. Review of Primal-Dual Relationship

Before diving into the examples, let's briefly review the key concepts:

Primal Problem: A linear program typically seeks to maximize or minimize a linear


objective function subject to linear constraints. The general form of a primal
maximization problem is:

maximize cT x

subject to:

Ax ≤ b, x ≥ 0.
Dual Problem: The dual of a linear programming problem provides an alternative way of
formulating the same problem, typically minimizing or maximizing a different objective
function. The dual of the above primal maximization problem is:

minimize bT y

subject to:

AT y ≥ c, y ≥ 0.

In this dual formulation:

Each primal variable xj corresponds to a dual constraint (the j -th row of AT y ≥ c).

Each primal constraint (a row of Ax ≤ b) corresponds to a dual variable yi .

2. Example 1: Simple Primal Problem

Let’s start with a simple primal problem to illustrate how the dual is formed:

Primal Problem:

maximize 2x1 + 3x2

subject to:

x1 + 2x2 ≤ 6,

x1 + x2 ≤ 4,

x1 ≥ 0, x2 ≥ 0.

Step 1: Identify the Dual Variables

The primal has two constraints (the first and second inequalities), so we introduce dual
variables y1 and y2 for these constraints, respectively. The dual variables correspond to
the constraints in the primal.

Step 2: Formulate the Dual Objective

The dual objective function is to minimize bT y = 6y1 + 4y2 .

Step 3: Formulate the Dual Constraints

The dual constraints are derived from the primal's objective function. Each dual
constraint corresponds to a primal variable:

The primal objective involves x1 and x2 , so the dual constraints will correspond to
x1 and x2 .

For the dual constraints to hold, we need:

y1 + y2 ≥ 2 (for x1 ),

2y1 + y2 ≥ 3 (for x2 ).

Step 4: Formulate the Dual Problem

Combining the objective and constraints, the dual problem is:

minimize 6y1 + 4y2

subject to:

y1 + y2 ≥ 2,

2y1 + y2 ≥ 3,

y1 ≥ 0, y2 ≥ 0.
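To verify Example 1 numerically, the following sketch (assuming scipy is available) solves
both programs and confirms a zero duality gap.

```python
# Numerical check for Example 1: both problems attain the value 10.
from scipy.optimize import linprog

primal = linprog([-2.0, -3.0],                      # maximize 2x1 + 3x2
                 A_ub=[[1.0, 2.0], [1.0, 1.0]], b_ub=[6.0, 4.0],
                 bounds=[(0, None)] * 2)
dual = linprog([6.0, 4.0],                          # minimize 6y1 + 4y2
               A_ub=[[-1.0, -1.0], [-2.0, -1.0]], b_ub=[-2.0, -3.0],
               bounds=[(0, None)] * 2)

print(-primal.fun, dual.fun)  # 10.0 10.0 (x* = (2, 2), y* = (1, 1))
```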

3. Example 2: Minimization Problem

Next, consider a primal minimization problem to demonstrate how the dual changes when
the objective of the primal is a minimization:

Primal Problem:

minimize 4x1 + 5x2

subject to:

x1 + x2 ≥ 2,

2x1 + x2 ≥ 3,

x1 ≥ 0, x2 ≥ 0.

Step 1: Identify the Dual Variables

The primal has two constraints, so we introduce dual variables y1 and y2 corresponding
to these constraints.

Step 2: Formulate the Dual Objective

Since the primal is a minimization problem, the dual will be a maximization problem, and
the dual objective will be to maximize bT y = 2y1 + 3y2 .

Step 3: Formulate the Dual Constraints

The dual constraints are:

y1 + 2y2 ≤ 4 (for x1 ),

y1 + y2 ≤ 5 (for x2 ).

Step 4: Formulate the Dual Problem

Combining the objective and constraints, the dual problem is:

maximize 2y1 + 3y2

subject to:

y1 + 2y2 ≤ 4,

y1 + y2 ≤ 5,

y1 ≥ 0, y2 ≥ 0.

4. Example 3: Primal with Equality Constraints

Consider a primal problem where the constraints involve equalities. This case demonstrates
how the dual formulation changes when the primal problem has equality constraints.

Primal Problem:

maximize 3x1 + 4x2

subject to:

x1 + x2 = 5,

x1 − x2 = 1,

x1 ≥ 0, x2 ≥ 0.

Step 1: Identify the Dual Variables

The primal has two equality constraints, so we introduce dual variables y1 and y2 for
these equality constraints. Because the primal constraints are equalities, these dual
variables are unrestricted in sign (free), rather than non-negative.

Step 2: Formulate the Dual Objective

The dual objective is to minimize 5y1 + y2 , derived from the right-hand side of the
equality constraints.

Step 3: Formulate the Dual Constraints

The dual constraints are:

y1 + y2 ≥ 3 (for x1 ),

y1 − y2 ≥ 4 (for x2 ).

Step 4: Formulate the Dual Problem

The dual problem is:

minimize 5y1 + y2

subject to:

y1 + y2 ≥ 3,

y1 − y2 ≥ 4,

y1 ∈ R, y2 ∈ R.
​ y2 ∈ R.

5. Key Points in Taking the Dual

Through these examples, we reinforce several key principles for formulating the dual:

Dual Variables Correspond to Primal Constraints: Each constraint in the primal


problem corresponds to a dual variable.

Dual Objective: The dual objective is derived from the right-hand side of the primal
constraints.

Dual Constraints: The dual constraints are based on the coefficients of the primal's
objective function and the relationships between the primal variables and constraints.

6. Conclusion

By working through these examples, we have solidified our understanding of how to take
the dual of a linear programming problem. The process involves:

1. Identifying dual variables for each primal constraint.

2. Formulating the dual objective function based on the primal’s right-hand side.

3. Writing the dual constraints based on the primal's objective function.

Understanding how to formulate the dual problem is crucial for solving linear programming
problems efficiently and gaining insights into the relationship between primal and dual
optimization problems.

Lecture 28: Strong Duality


In this lecture, we discuss strong duality, which is a key concept in the theory of linear
programming. Strong duality explains why the value of the dual linear program gives a
reliable bound on the primal linear program’s optimal value, and in fact, the values of the
primal and dual optimal solutions are equal under certain conditions. This result is
foundational to the understanding of duality theory in optimization.

1. Introduction to Duality

We begin by recalling the primal and dual problems from earlier discussions:

Primal Problem (Maximization):

maximize cT x

subject to:

Ax ≤ b, x ≥ 0.
Dual Problem (Minimization):

minimize bT y

subject to:

AT y ≥ c, y ≥ 0.

The primal problem seeks to maximize the objective cT x subject to the constraints Ax ≤b
and x ≥ 0. The dual problem, in contrast, seeks to minimize bT y subject to the constraints
AT y ≥ c and y ≥ 0.
We have seen how the primal and dual variables relate to one another, but we have not yet
answered why the dual value is a good indication of the primal value, or why, under certain
conditions, the primal and dual optimal values are exactly equal. This is where strong duality
comes into play.

2. Weak Duality Theorem

Before discussing strong duality, we briefly recall the weak duality theorem, which is the
starting point for strong duality:

The weak duality theorem states that for any feasible solution x to the primal problem
and any feasible solution y to the dual problem, the value of the dual objective function
provides a bound on the value of the primal objective function:

cT x ≤ bT y.

This means that the objective value of the primal is always less than or equal to the
objective value of the dual, provided that both solutions are feasible.

The weak duality theorem establishes that the primal objective cannot exceed the dual
objective, but it does not explain why or when these two values are equal at optimality. This
is the essence of strong duality.

3. Statement of Strong Duality

The strong duality theorem provides the critical link between the primal and dual problems.
It states that under certain conditions, the optimal values of the primal and dual problems
are equal. Specifically:

If the primal problem has an optimal solution, then the dual problem also has an
optimal solution, and the optimal objective values of both problems are the same:

optimal value of primal = optimal value of dual.

This result is what we refer to as strong duality.

4. The Role of Feasibility

The strong duality theorem holds under the assumption that both the primal and dual
problems are feasible. More specifically:

Primal Feasibility: There exists an x∗ such that Ax∗ ≤ b and x∗ ≥ 0.


Dual Feasibility: There exists a y ∗ such that AT y ∗ ≥ c and y ∗ ≥ 0.

However, infeasibility of either the primal or dual problem can prevent strong duality from
holding. If either the primal or dual is infeasible, the strong duality theorem does not apply,
and the relationship between the primal and dual becomes more complicated.

5. Proof Outline of Strong Duality (Using Farkas’ Lemma)

To prove strong duality, we use a combination of Farkas' Lemma (which we discussed in a


previous lecture) and hyperplane separation theorems. The intuition behind the proof
involves showing that if the primal has an optimal solution, the dual must also have one, and
their values will be equal.

1. Farkas' Lemma: Farkas' Lemma provides a criterion for when a system of linear
inequalities has a solution. It is a key component in proving strong duality: the
non-existence of any primal solution with objective value above the optimum is
converted, via the lemma, into the existence of a feasible dual solution attaining that
value.

2. Separation Theorem: The separation theorem states that if a point is outside a convex
set, there exists a hyperplane that separates the point from the set. Using this theorem,
we can prove that if the primal problem has an optimal solution, the dual problem will
also have a corresponding optimal solution, and the objective values will coincide.

3. Duality Gap: The duality gap is the difference between the objective values of the primal
and dual problems. Strong duality tells us that when both the primal and dual have
feasible solutions, this gap is zero, i.e., the values are equal.

6. Geometric Interpretation

Geometrically, strong duality can be understood through the concept of separation. At a
primal optimal solution, a hyperplane supports the primal feasible region in the direction of
the objective, and the multipliers that express this supporting hyperplane in terms of the
active constraints form an optimal dual solution. This correspondence ensures that the
optimal objective values of both problems are equal.

Moreover, this relationship between the primal and dual problems is critical in understanding
the structure of optimization problems. The strong duality theorem guarantees that by
solving the dual problem, we can determine the optimal value of the primal problem, and
vice versa.

7. Implications of Strong Duality

Efficiency in Optimization: Strong duality allows us to solve the dual problem to find
bounds on the primal problem. Solving the dual can often be computationally easier,
especially in large-scale problems, since dual variables might reduce the dimensionality
of the problem.

Optimality Conditions: The equality of the primal and dual optimal values provides a
useful test for optimality. If we find a feasible solution to either the primal or dual, we
can use the other to check if we have found the optimal solution.

8. Conclusion

The strong duality theorem is a central result in linear programming, asserting that under
feasibility conditions, the optimal values of the primal and dual problems are equal. This
result follows from hyperplane separation theorems and is formalized through Farkas'
Lemma. Strong duality provides a deep connection between the primal and dual
optimization problems, allowing us to solve one problem to obtain insights into the other
and ensuring that the values of the two problems coincide when optimal solutions are found.

Lecture 29: Proof of Strong Duality


In this lecture, we formally prove the strong duality theorem for linear programming. The
strong duality theorem states that if both the primal and dual linear programs are feasible,
then the optimal values of the primal and dual objectives are equal. To prove this, we use
Farkas' Lemma, a powerful result from convex analysis, to handle the boundary cases where
one of the programs is infeasible. We also explore the additional dimension (corresponding
to the objective function) and apply Farkas’ Lemma to establish the equality of the optimal
values for the primal and dual problems.

1. Restating Strong Duality

Recall the primal and dual linear programs:

Primal Problem:

maximize cT x

subject to:

Ax ≤ b, x ≥ 0.
Dual Problem:

minimize bT y

subject to:

AT y ≥ c, y ≥ 0.

The strong duality theorem asserts that under the assumption of feasibility for both
problems, the optimal objective values of the primal and dual problems are equal:

optimal value of primal = optimal value of dual.

In this lecture, we provide the formal proof of this result.

2. Weak Duality Recap

Before delving into the proof, recall the weak duality theorem, which we discussed in an
earlier lecture. The weak duality theorem states that if x∗ is feasible for the primal and y ∗ is
feasible for the dual, then:

c T x∗ ≤ b T y ∗ .

This establishes an upper bound for the primal objective by the dual objective. However,
weak duality alone does not guarantee that the optimal values are equal; this is where
strong duality comes into play.

3. Handling Boundary Cases (Infeasibility)

To begin the proof of strong duality, we first consider the possibility that one of the
programs (either primal or dual) is infeasible.

Case 1: Primal Infeasibility: If the primal problem is infeasible, then the dual problem is
either infeasible as well or unbounded: a Farkas certificate of primal infeasibility gives a
ray along which any dual feasible solution can be moved while decreasing bT y without
limit. In either case, the dual cannot have a finite optimal value.

Case 2: Dual Infeasibility: Similarly, if the dual problem is infeasible, then the primal
problem is either infeasible or unbounded, and cannot have a finite optimal value.

Thus, if either problem is infeasible, neither problem has a finite optimal value, and the
statement of strong duality concerns the remaining case.

In the case where both problems are feasible, we proceed with the proof of strong duality.

4. Feasibility for Both Programs

Suppose that both the primal and dual programs are feasible. That is, there exist x∗ ≥ 0 and
y ∗ ≥ 0 such that:

Ax∗ ≤ b, AT y ∗ ≥ c.

Now, we look at a slightly modified version of the primal and dual problems by adding an
extra dimension corresponding to the objective function.

5. Introducing an Extra Dimension

Consider the following extended linear program, which incorporates the objective function
into the constraints:

Primal (Extended):
Maximize cT x subject to:

Ax ≤ b, x ≥ 0.
Dual (Extended):
Minimize bT y subject to:

AT y ≥ c, y ≥ 0.

In the next step, we introduce the augmented system. This system involves an extra
dimension that includes the objective function. We will apply Farkas' Lemma to this
extended system to prove the equality of the primal and dual objective values.

6. Applying Farkas' Lemma

Farkas’ Lemma provides a necessary and sufficient condition for the solvability of a system of
linear inequalities. In the form used here, the system Ax ≤ b, x ≥ 0 has a solution if and
only if there is no y ≥ 0 such that AT y ≥ 0 and bT y < 0.

To apply Farkas' Lemma in the context of strong duality, we consider the extended linear
program with the extra dimension. This allows us to argue that if a feasible solution exists for
both the primal and dual, then the optimal objective values of the primal and dual must
coincide.

7. Proving Strong Duality

We now conclude the proof:

Step 1: Existence of Optimal Solutions: Since both problems are feasible, weak duality
bounds the primal objective above by any dual feasible value and the dual objective below
by any primal feasible value, so both problems have finite optimal values.

Step 2: Applying Farkas' Lemma: Let γ be the optimal value of the primal, and consider
the augmented system that asks for a primal feasible x with cT x ≥ γ + ε, for ε > 0. This
system is infeasible by the definition of γ , and Farkas' Lemma converts its infeasibility
into the existence of a dual feasible y with bT y ≤ γ + ε. (The case in which the certificate
places zero weight on the objective row is ruled out by primal feasibility.) Since this holds
for every ε > 0, the dual optimal value is at most γ .

Step 3: Equality of Objectives: Combined with weak duality, which gives bT y ≥ γ for
every dual feasible y , the optimal value of the dual problem equals γ : the optimal value
of the primal problem is equal to the optimal value of the dual problem.

This completes the proof of the strong duality theorem.

8. Conclusion

The strong duality theorem provides a powerful result in linear programming: if both the
primal and dual problems are feasible, then their optimal objective values are equal. The
proof of this theorem involves using Farkas' Lemma and extending the linear programs into
an additional dimension. By leveraging Farkas' Lemma, we establish that the optimal values
for both the primal and dual problems must coincide under feasibility conditions, thus
proving strong duality.

This result is fundamental to the theory of linear programming and has wide-reaching
implications in optimization and economic theory, where it is used to analyze resource
allocation, pricing, and more.

Lecture 30: Complementary Slackness


In this lecture, we explore complementary slackness, a key result derived from strong
duality in linear programming. Complementary slackness provides a relationship between
the primal variables and the dual constraints, as well as between the dual variables and the
primal constraints. This relationship is fundamental in optimization, particularly in
understanding how the primal and dual solutions interact.

1. Recap of Strong Duality

The strong duality theorem states that if both the primal and dual linear programs are
feasible, then the optimal values of the primal and dual objectives are equal. Formally:

Primal Problem:

maximize cT x

subject to:

Ax ≤ b, x ≥ 0.
Dual Problem:

minimize bT y

subject to:

AT y ≥ c, y ≥ 0.

If both the primal and dual are feasible, their objective values are equal, i.e.,:

c T x∗ = b T y ∗ .

2. Complementary Slackness Condition

Complementary slackness is a condition that must be satisfied by the optimal solutions of
the primal and dual problems. It provides a powerful insight into the structure of the optimal
solutions. For feasible solutions x and y , the condition can be stated as follows:

For each j , the j -th primal variable xj and the j -th dual constraint (the j -th row of
AT y ≥ c) satisfy:

xj (AT y − c)j = 0 for all j,

where (AT y − c)j is the j -th component of the vector AT y − c (the slack of the dual
constraint).

Similarly, for each i, the i-th dual variable yi and the i-th primal constraint (the i-th row of
Ax ≤ b) satisfy:

yi (b − Ax)i = 0 for all i,

where (b − Ax)i is the i-th component of the vector b − Ax (the slack of the primal
constraint).

These two conditions together form the complementary slackness condition, which states
that in every pair of a variable and its associated constraint, at least one of the two factors
must vanish:

xj > 0 implies (AT y)j = cj (a positive primal variable means the corresponding dual
constraint is tight), and a slack dual constraint, (AT y)j > cj , implies xj = 0;

yi > 0 implies (Ax)i = bi (a positive dual variable means the corresponding primal
constraint is tight), and a slack primal constraint, (Ax)i < bi , implies yi = 0.

3. Intuition Behind Complementary Slackness

The complementary slackness conditions provide deep insight into how the primal and dual
solutions interact:

If a primal variable xj is positive (i.e., xj > 0), then the corresponding dual constraint
must hold with equality: (AT y − c)j = 0, meaning the j -th dual constraint is tight.

If a dual variable yi is positive (i.e., yi > 0), then the corresponding primal constraint
must hold with equality: (b − Ax)i = 0, meaning the i-th primal constraint is tight.

The complementary slackness condition thus ensures that if a variable is positive, the
constraint associated with it is tight, and, conversely, a constraint with slack forces its
associated variable to zero. This helps us understand the structure of the optimal solution
and how changes in one solution (primal or dual) affect the other.

4. Derivation of Complementary Slackness

To derive complementary slackness, we start with the weak duality chain. For any feasible x
(primal) and y (dual):

cT x ≤ (AT y)T x = y T Ax ≤ y T b,

where the first inequality uses AT y ≥ c together with x ≥ 0, and the second uses Ax ≤ b
together with y ≥ 0.

Now, let us assume that the primal and dual are feasible and that we have reached optimal
solutions x∗ and y ∗ . We know from strong duality that:

c T x∗ = b T y ∗ .

For this equality to hold, both inequalities in the chain must hold with equality, i.e.:

(AT y ∗ − c)T x∗ = 0 and (y ∗ )T (b − Ax∗ ) = 0.

Each left-hand side is a sum of products of non-negative numbers, so every term must vanish
individually:

x∗j (AT y ∗ − c)j = 0 for all j, and yi∗ (b − Ax∗ )i = 0 for all i.

This means that if x∗j > 0, then (AT y ∗ − c)j = 0, i.e., the j -th dual constraint is tight;
conversely, if (AT y ∗ − c)j > 0, then x∗j = 0. Similarly, if yi∗ > 0, then (b − Ax∗ )i = 0, i.e.,
the i-th primal constraint is tight; and if (b − Ax∗ )i > 0, then yi∗ = 0.

Thus, the complementary slackness condition follows directly from the equality cT x∗ = bT y ∗
, which holds due to strong duality.
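These conditions are easy to check numerically. The sketch below verifies them on the
primal-dual pair from Lecture 23, whose optimal solutions x∗ = (3, 1) and y∗ = (2, 1) we
take as given.

```python
# Checking complementary slackness for the Lecture 23 primal-dual pair.
import numpy as np

A = np.array([[1.0, 1.0], [1.0, 0.0]])
b = np.array([4.0, 3.0])
c = np.array([3.0, 2.0])
x_star = np.array([3.0, 1.0])   # optimal primal solution (assumed known)
y_star = np.array([2.0, 1.0])   # optimal dual solution (assumed known)

print(x_star * (A.T @ y_star - c))   # [0. 0.]  dual slack vanishes where x > 0
print(y_star * (b - A @ x_star))     # [0. 0.]  primal slack vanishes where y > 0
```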

5. Implications of Complementary Slackness

Complementary slackness provides several important insights:

Optimality Condition: The condition gives a practical way to check if a pair of feasible
solutions (x∗ , y ∗ ) is optimal. If feasible x∗ and y ∗ satisfy the complementary slackness
conditions, then they are optimal solutions for the primal and dual problems, respectively.

Dual Variables Interpretation: If the i-th primal constraint is not binding at the optimum,
complementary slackness forces yi∗ = 0: a resource with slack has no marginal value.
Conversely, if yi∗ > 0, the corresponding primal constraint is tight (binding), and the
value of yi∗ reflects the shadow price of the constraint in the primal problem.

Sensitivity Analysis: Complementary slackness plays a key role in sensitivity analysis. It


tells us how sensitive the optimal solution is to changes in the constraints or objective
function. The primal and dual variables’ relationship gives us a way to predict the effects
of perturbations in the system.

6. Conclusion

Complementary slackness is a fundamental concept in linear programming that describes


the strong relationship between the primal variables and dual constraints. It provides a
direct method to check optimality and gives deep insights into the interaction between the
primal and dual solutions. By ensuring that at least one of each pair of primal and dual
variables is zero, complementary slackness reveals the structure of the optimal solution and
is essential for both theory and practical applications of linear programming.

Lecture 31: Introduction to Algorithmic Game Theory


In this lecture, we introduce Algorithmic Game Theory, focusing on its application to linear
programming and duality theory. Specifically, we begin with an exploration of two-player
zero-sum games, which provide a natural setting to apply duality concepts and optimization
techniques such as linear programming. The key goal is to understand the structure of these
games and how they can be modeled and solved using linear programming methods.

1. Game Theory Overview

Game theory studies strategic interactions where the outcomes depend on the actions of
multiple agents (players), each with their own objectives. The fundamental concepts in game
theory include:

Players: The decision-makers in the game.

Strategies: The set of actions or decisions that each player can choose.

Payoffs: The outcomes or rewards each player receives based on the strategies chosen
by all players.

Games: The framework of interactions between players, defined by their available


strategies and payoffs.

A two-player game involves two players who make decisions simultaneously or in sequence,
with the goal of maximizing their own payoff while potentially minimizing the opponent's
payoff.

2. Zero-Sum Games

A zero-sum game is a special type of game where the total payoff for all players always sums
to zero. This means that one player's gain is exactly another player's loss. In mathematical
terms, if there are two players, A and B, with payoffs PA and PB , then:
​ ​

PA + PB = 0
​ ​

Thus, a gain for one player corresponds to an equivalent loss for the other.

The main objective in a zero-sum game is for each player to maximize their own payoff while
minimizing the opponent’s payoff. The game is typically represented in a payoff matrix.

3. Two-Player Zero-Sum Game Setup

Consider a simple two-player zero-sum game where Player 1 (Row player) and Player 2
(Column player) are involved. The game can be represented as a payoff matrix A where:

The rows correspond to the strategies of Player 1.

The columns correspond to the strategies of Player 2.

The entries Aij in the matrix represent the payoff to Player 1 when Player 1 chooses

strategy i and Player 2 chooses strategy j . The payoff to Player 2 is the negative of this
value (since the game is zero-sum).

Let us index the strategies of Player 1 by i = 1, … , m and the strategies of Player 2 by
j = 1, … , n, and write:

x = (x1 , x2 , … , xm ) for the mixed strategy (probability distribution over
strategies) of Player 1,

y = (y1 , y2 , … , yn ) for the mixed strategy of Player 2.

4. Mixed Strategy and Linear Programming Formulation

In a zero-sum game, players are typically assumed to choose mixed strategies, which are
probability distributions over their set of pure strategies. The goal of each player is to
maximize their expected payoff, considering the strategies of the opponent.

4.1 Player 1’s Objective: Maximizing Expected Payoff

Let the mixed strategy of Player 1 be denoted by a vector x = (x1 , x2 , … , xm ), where xi
represents the probability that Player 1 will choose the i-th pure strategy. The vector x must
satisfy the following conditions:

∑_{i=1}^{m} xi = 1,  xi ≥ 0 for all i.

Player 1's expected payoff given Player 2's strategy y = (y1 , y2 , … , yn ) is:

Payoff1 (x, y) = ∑_{i=1}^{m} ∑_{j=1}^{n} Aij xi yj .

Player 1 aims to maximize this payoff by selecting an optimal mixed strategy x. For any
fixed strategy y of Player 2, this is a linear program in x:

Maximize ∑_{i=1}^{m} ∑_{j=1}^{n} Aij xi yj

subject to:

∑_{i=1}^{m} xi = 1,  xi ≥ 0 for all i.

4.2 Player 2’s Objective: Minimizing Expected Payoff

Similarly, Player 2 wants to minimize Player 1’s payoff. Let y = (y1 , y2 , … , yn ) be the mixed
strategy of Player 2, where yj represents the probability of choosing the j -th strategy. The
mixed strategy y must also satisfy:

∑_{j=1}^{n} yj = 1,  yj ≥ 0 for all j.

Because the game is zero-sum, Player 2's own payoff is the negative of Player 1's, so Player 2
controls the same expected quantity:

Payoff1 (x, y) = ∑_{i=1}^{m} ∑_{j=1}^{n} Aij xi yj ,

and aims to minimize it. Thus, Player 2’s objective is to find a mixed
strategy y that minimizes the expected payoff to Player 1, subject to the constraints on y .
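As a small illustration, the expected payoff is just the bilinear form xT Ay; the matrix below
is an arbitrary example, not one fixed by the lecture.

```python
# The expected payoff of mixed strategies x and y is the bilinear form
# x^T A y = sum_ij A[i, j] * x[i] * y[j].
import numpy as np

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])      # an example 2x2 payoff matrix
x = np.array([0.5, 0.5])         # Player 1's mixed strategy
y = np.array([0.5, 0.5])         # Player 2's mixed strategy

print(x @ A @ y)                 # 0.0: the expected payoff to Player 1
```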

5. Linear Programming Duality in Zero-Sum Games

In a two-player zero-sum game, we can use linear programming duality to relate the
optimal strategies of the two players. The primal problem is to maximize Player 1’s expected
payoff (while Player 2 minimizes this). The dual problem involves finding Player 2’s optimal
mixed strategy, and the strong duality theorem ensures that the values of the primal and
dual problems are equal, giving us a solution to the game.

The dual variables correspond to the constraints on Player 1’s and Player 2’s strategies, and
complementary slackness will ensure that the solution satisfies the necessary conditions for
optimality.

6. Algorithmic Approach: Solving Zero-Sum Games

The Linear Programming approach provides a systematic way to compute the optimal
strategies for both players in zero-sum games.

Simplex algorithm or other optimization algorithms can be employed to solve the linear
programs efficiently.

Nash equilibrium for two-player zero-sum games is characterized by the optimal mixed
strategies of both players, where neither player can improve their expected payoff by
unilaterally changing their strategy.
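The following sketch (assuming scipy is available) computes an optimal strategy for the row
player via the standard value-variable formulation: maximize v subject to (AT x)j ≥ v for
every column j , ∑ xi = 1, x ≥ 0. This particular formulation is stated here as an
illustration; the lecture reaches the same solution through the duality view.

```python
# Solving a two-player zero-sum game by linear programming.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -1.0],       # an example payoff matrix for the row player
              [-1.0, 1.0]])
m, n = A.shape

# Decision variables: (x1, ..., xm, v). Objective: minimize -v (maximize v).
obj = np.concatenate([np.zeros(m), [-1.0]])
# Constraints: v - (A^T x)_j <= 0 for each column j of the payoff matrix.
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)
# Probability constraint: sum_i x_i = 1 (v has coefficient 0 here).
A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * m + [(None, None)])

x, v = res.x[:m], res.x[m]
print("Player 1's optimal mixed strategy:", x)  # [0.5 0.5]
print("value of the game:", v)                  # 0.0
```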

7. Applications of Zero-Sum Games

Zero-sum games and their solution via linear programming have widespread applications in:

Economic modeling: Competitive markets where one’s gain is another’s loss.

Cryptography: Secure communications often rely on game-theoretic models.

Auctions and bidding strategies: Where the bidders are competing against each other
in a zero-sum context.

Military strategy and conflict resolution: In situations where one player's gain
corresponds directly to the loss of the opponent.

8. Conclusion

This lecture introduces the foundational concepts of Algorithmic Game Theory and the
application of linear programming to solve two-player zero-sum games. We have shown
how linear programs can be used to model the strategic interactions between players in such

games, and how duality theory plays a critical role in deriving optimal solutions. These
concepts lay the groundwork for more advanced topics in game theory and optimization.

Lecture 32: Nash Equilibrium


In this lecture, we introduce the concept of Nash equilibrium within the context of game
theory. We explore its definition, significance, and address the computational question of
whether a Nash equilibrium exists for every game.

1. Game Theory Recap

As previously discussed, game theory studies strategic interactions between players, where
each player's payoff depends not only on their own actions but also on the actions of other
players. A game is defined by:

Players: The decision-makers in the game.

Strategies: The actions available to each player.

Payoffs: The rewards or costs associated with each combination of actions chosen by the
players.

In the context of two-player games, each player’s strategy influences the payoff they receive,
and the goal is to find the strategy that maximizes each player's payoff, given the strategy of
the other player.

2. Nash Equilibrium: Definition

A Nash equilibrium is a strategy profile (a combination of strategies, one for each player)
where no player can improve their payoff by unilaterally changing their own strategy,
assuming the strategies of the other players remain fixed. In other words, given the
strategies of all other players, a Nash equilibrium occurs when each player’s strategy is
optimal for them, and no player has an incentive to deviate.

Formally, for a game with n players, let each player i have a set of strategies Si and a payoff
function ui (s1 , s2 , … , sn ), where sj denotes the strategy of player j . A Nash equilibrium is
a strategy profile (s∗1 , s∗2 , … , s∗n ) such that:

ui (s∗1 , s∗2 , … , s∗n ) ≥ ui (s∗1 , … , s∗i−1 , si , s∗i+1 , … , s∗n )

for all si ∈ Si , i.e., no player i can improve their payoff by changing their strategy, assuming
the other players stick to their strategies.
the other players stick to their strategies.

For two players, this condition means that, given Player 2’s strategy, Player 1’s strategy is
optimal, and vice versa.

3. Existence of Nash Equilibrium

One of the central results in game theory is Nash’s existence theorem, which asserts that:

A Nash equilibrium (possibly in mixed strategies) always exists for any finite game, that
is, any game with finitely many players, each having a finite strategy set.

This means that in finite games there is always at least one strategy profile where no player
has an incentive to unilaterally deviate.

The Nash equilibrium can occur in pure strategies, where players choose specific actions
with certainty, or in mixed strategies, where players randomize over their available
strategies.

Nash’s theorem was proven by John Nash in 1950 and applies to both zero-sum games and
more general games, including those with non-zero-sum payoffs.

4. Computational Question: Does a Nash Equilibrium Exist for Every Game?

While Nash’s existence theorem guarantees the existence of a Nash equilibrium in finite
games, it does not provide a method for finding one efficiently. The computational
complexity of finding a Nash equilibrium has been a key area of research in algorithmic
game theory.

Existence vs. Computability: Although a Nash equilibrium exists in every finite game,
finding one can be computationally difficult. The problem of computing a Nash
equilibrium, even for seemingly simple games, can be quite hard.

Algorithmic Challenges: In some cases, finding a Nash equilibrium may require


checking many different strategy profiles to see if they satisfy the equilibrium condition.
This can be very computationally expensive, particularly as the number of players and
strategies increases.

In practical terms, finding a Nash equilibrium may require sophisticated algorithms. For
certain classes of games, like two-player zero-sum games, the equilibrium can be found
efficiently using linear programming. However, for more general games, computing a Nash
equilibrium is often a difficult problem.

5. Pure vs Mixed Strategies

Pure Strategy Nash Equilibrium: In a pure strategy equilibrium, each player chooses
one strategy with certainty. This is the most straightforward form of equilibrium.
However, pure strategy Nash equilibria do not always exist in every game.

Mixed Strategy Nash Equilibrium: In a mixed strategy equilibrium, players randomize
over their possible strategies. Even if a game does not have a pure strategy Nash
equilibrium, it may still have a mixed strategy Nash equilibrium. For example, in rock-
paper-scissors, there is no pure strategy equilibrium, but the mixed strategy equilibrium
involves each player randomizing their choices with equal probability.

6. Examples of Nash Equilibrium

6.1 Example: The Prisoner's Dilemma

In the Prisoner's Dilemma, two suspects are arrested for a crime. Each prisoner has two
choices: cooperate with the other by remaining silent, or defect by betraying the other. The
payoffs are structured so that mutual defection leads to a worse outcome than mutual
cooperation, but each player’s best response is to defect, regardless of what the other does.

The game's payoff matrix, with rows for Player 1's choice, columns for Player 2's choice,
and entries (Player 1's payoff, Player 2's payoff), is:

                 Cooperate    Defect
    Cooperate    (-1, -1)     (-3, 0)
    Defect       (0, -3)      (-2, -2)

In this case, the Nash equilibrium is for both players to defect, as neither player can improve
their payoff by changing their strategy, given the strategy of the other.

6.2 Example: Matching Pennies

In Matching Pennies, each player has two strategies: heads (H) or tails (T). Player 1 wins if
the pennies match, and Player 2 wins if they do not.

The payoff matrix, with rows for Player 1 and columns for Player 2, is:

             H          T
    H     (1, -1)    (-1, 1)
    T     (-1, 1)    (1, -1)

The Nash equilibrium in this game is for both players to randomize between heads and tails
with equal probability (i.e., choosing H or T with probability 0.5).
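
To make the equilibrium condition concrete, the following is a minimal sketch (in Python with NumPy; the notes prescribe no language, so this is purely illustrative) that enumerates all pure strategy profiles of a two-player game and reports those satisfying the Nash condition. On the two games above it finds (Defect, Defect) for the Prisoner's Dilemma and no pure equilibrium for Matching Pennies.

```python
# A minimal sketch: enumerate pure strategy profiles and keep those where
# neither player can gain by a unilateral deviation (the Nash condition).
import numpy as np

def pure_nash(P1, P2):
    """Return all pure Nash equilibria of a bimatrix game, where P1[i, j]
    and P2[i, j] are the payoffs to Players 1 and 2 at profile (i, j)."""
    equilibria = []
    for i in range(P1.shape[0]):
        for j in range(P1.shape[1]):
            best_for_1 = P1[i, j] >= P1[:, j].max()  # no better row deviation
            best_for_2 = P2[i, j] >= P2[i, :].max()  # no better column deviation
            if best_for_1 and best_for_2:
                equilibria.append((i, j))
    return equilibria

# Prisoner's Dilemma (0 = Cooperate, 1 = Defect): prints [(1, 1)].
P1 = np.array([[-1, -3], [0, -2]])
P2 = np.array([[-1, 0], [-3, -2]])
print(pure_nash(P1, P2))

# Matching Pennies (0 = H, 1 = T): prints [] -- no pure equilibrium exists.
M1 = np.array([[1, -1], [-1, 1]])
print(pure_nash(M1, -M1))
```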

7. Applications of Nash Equilibrium

Nash equilibrium has widespread applications across various fields:

Economics: In oligopoly markets, firms choose strategies (pricing, production quantities,
etc.) in competition with one another.
Political Science: In voting systems and coalition formation, players (politicians) must
select strategies that account for the actions of others.

Computer Science: In network routing, distributed algorithms, and auction design, Nash
equilibrium helps predict how rational agents behave.

Evolutionary Biology: In evolutionary game theory, Nash equilibrium helps explain the
stable strategies that evolve in populations over time.

8. Conclusion

In this lecture, we introduced the concept of Nash equilibrium and discussed its significance
in game theory. We established that a Nash equilibrium always exists in finite games with
continuous payoffs, but the challenge lies in efficiently finding it. The lecture highlighted the
differences between pure and mixed strategy equilibria and provided illustrative examples to
better understand the equilibrium concept. Finally, we touched upon the wide-ranging
applications of Nash equilibrium across disciplines.

Lecture 33: Minimax and Nash Equilibrium


In this lecture, we prove a fundamental result in game theory: there exists a Nash
equilibrium for any two-player zero-sum game. This result is equivalent to the concept of
strong duality in linear programming. We will frame the payoffs for individual players as a
linear program, and through the application of strong duality, we will demonstrate the
existence of a Nash equilibrium.

1. Two-Player Zero-Sum Games

A zero-sum game is a type of game in which the total payoff (or loss) of all players is always
zero. In a two-player zero-sum game, if one player gains a certain amount, the other player
must lose the same amount. Mathematically, if Player 1 gains x, then Player 2 loses x, and
the total payoff is x + (−x) = 0.
For a two-player zero-sum game:

Player 1's objective: Maximize their payoff.

Player 2's objective: Minimize Player 1's payoff (equivalently, maximize their own payoff,
since Player 2’s payoff is the negative of Player 1’s payoff).

We can represent such a game using a payoff matrix. The rows correspond to the possible
strategies of Player 1, and the columns correspond to the strategies of Player 2. The entries
in the matrix represent the payoffs to Player 1, with Player 2’s payoff being the negative of
the value in the corresponding entry.

2. Minimax Theorem

The Minimax Theorem provides a solution to zero-sum games by asserting that:

There exists a strategy for Player 1 that maximizes their minimum possible gain, and
similarly,

There exists a strategy for Player 2 that minimizes Player 1's maximum possible gain.

This theorem can be formulated as follows:

Player 1's strategy: Player 1 seeks to choose a mixed strategy (a probability distribution
over the available pure strategies) that maximizes their minimum expected payoff,
given Player 2's possible strategies.

Player 2's strategy: Player 2 seeks to minimize Player 1's maximum expected payoff,
given Player 1's possible strategies.

This result establishes that both players can secure a guaranteed outcome (the value of the
game) even when facing an adversary who plays optimally against them.

3. Linear Programming Formulation of Zero-Sum Games

We can frame the payoffs of a zero-sum game in a linear programming framework. Consider
the following setup:

Let $A$ be the payoff matrix for Player 1, where the entry $A_{ij}$ represents the payoff to
Player 1 when Player 1 chooses pure strategy $i$ and Player 2 chooses pure strategy $j$.

Player 1 chooses a mixed strategy $x = (x_1, x_2, \ldots, x_m)$, where $x_i$ is the probability of
choosing pure strategy $i$.

Player 2 chooses a mixed strategy $y = (y_1, y_2, \ldots, y_n)$, where $y_j$ is the probability of
choosing pure strategy $j$.

The objective for Player 1 is to maximize their expected payoff, while the objective for Player
2 is to minimize Player 1’s payoff.

4. Player 1’s Linear Program

The payoff to Player 1 when they play mixed strategy x against Player 2’s mixed strategy y is
given by:

$$\text{Payoff} = x^{T} A y$$

Player 1’s objective is to maximize their expected payoff, while Player 2 will minimize this.
The linear program for Player 1 can be written as:

$$\begin{aligned}
\text{Maximize} \quad & v \\
\text{subject to} \quad & \sum_{i=1}^{m} x_i = 1, \qquad x_i \ge 0 \quad \forall i \\
& \sum_{i=1}^{m} A_{ij}\, x_i \;\ge\; v \qquad \forall j
\end{aligned}$$

Here $v$ represents the value of the game, i.e., the expected payoff for Player 1. This
formulation maximizes $v$ subject to the condition that Player 1's expected payoff is at least $v$
against every pure strategy $j$ of Player 2, and hence against every mixed strategy of Player 2.

5. Player 2’s Linear Program

Similarly, Player 2’s objective is to minimize the expected payoff to Player 1 (Player 2's own
payoff is its negative). Player 2’s linear program is:

$$\begin{aligned}
\text{Minimize} \quad & v \\
\text{subject to} \quad & \sum_{j=1}^{n} y_j = 1, \qquad y_j \ge 0 \quad \forall j \\
& \sum_{j=1}^{n} A_{ij}\, y_j \;\le\; v \qquad \forall i
\end{aligned}$$

Here, v is the value of the game, and the condition ensures that Player 2 minimizes the
maximum expected payoff to Player 1.
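
To see both programs in action, here is a minimal sketch (in Python with NumPy and SciPy, both assumptions on my part) that builds and solves the two LPs for the Matching Pennies payoff matrix from Lecture 32 and checks that their optimal values coincide, exactly as the Minimax Theorem predicts.

```python
# A minimal sketch: solve both players' LPs for a zero-sum game and verify
# that the two game values agree (strong duality / the Minimax Theorem).
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])  # Matching Pennies payoffs to Player 1
m, n = A.shape

# Player 1: maximize v s.t. sum_i A[i][j] x[i] >= v for all j, sum_i x[i] = 1.
# Variables are (x_1, ..., x_m, v); linprog minimizes, so the objective is -v.
c1 = np.append(np.zeros(m), -1.0)
A_ub1 = np.hstack([-A.T, np.ones((n, 1))])  # row j encodes v - A[:,j]^T x <= 0
A_eq1 = np.append(np.ones(m), 0.0).reshape(1, -1)
res1 = linprog(c1, A_ub=A_ub1, b_ub=np.zeros(n), A_eq=A_eq1, b_eq=[1.0],
               bounds=[(0, None)] * m + [(None, None)])

# Player 2: minimize v s.t. sum_j A[i][j] y[j] <= v for all i, sum_j y[j] = 1.
c2 = np.append(np.zeros(n), 1.0)
A_ub2 = np.hstack([A, -np.ones((m, 1))])    # row i encodes A[i,:] y - v <= 0
A_eq2 = np.append(np.ones(n), 0.0).reshape(1, -1)
res2 = linprog(c2, A_ub=A_ub2, b_ub=np.zeros(m), A_eq=A_eq2, b_eq=[1.0],
               bounds=[(0, None)] * n + [(None, None)])

x_star, v1 = res1.x[:m], -res1.fun
y_star, v2 = res2.x[:n], res2.fun
print("Player 1 mixes", x_star, "value", v1)   # [0.5 0.5], value 0.0
print("Player 2 mixes", y_star, "value", v2)   # [0.5 0.5], value 0.0
assert abs(v1 - v2) < 1e-8  # the two optimal values coincide
```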

6. Strong Duality and Existence of Nash Equilibrium

The Minimax Theorem states that the value of the game is the same for both players, i.e.,
the maximin value for Player 1 equals the minimax value for Player 2. This result is closely
related to the concept of strong duality in linear programming.

Strong duality ensures that the value of the primal problem (Player 1’s problem) is equal
to the value of the dual problem (Player 2’s problem), provided both the primal and dual
problems are feasible.

In the context of two-player zero-sum games, the primal problem corresponds to Player
1’s linear program, and the dual problem corresponds to Player 2’s linear program.

By strong duality, we can conclude that the solutions to both linear programs yield the same
value of the game, and the strategies that achieve this value correspond to a Nash
equilibrium in the zero-sum game. Specifically, the mixed strategies x∗ and y ∗ derived from
the solutions to the linear programs will form a Nash equilibrium because:

Player 1 cannot improve their payoff by changing their strategy given Player 2’s strategy.

Player 2 cannot improve their payoff by changing their strategy given Player 1’s strategy.

Thus, through the duality theory in linear programming, we not only guarantee the existence
of a Nash equilibrium but also provide an explicit method for computing the equilibrium in
two-player zero-sum games.

7. Conclusion

In this lecture, we have demonstrated that the Minimax Theorem, which asserts the
existence of a Nash equilibrium in two-player zero-sum games, is equivalent to strong
duality in linear programming. By framing the payoffs for individual players as linear
programs, we leveraged the strong duality theorem to show that the strategies that
maximize and minimize the expected payoffs for the players lead to a Nash equilibrium. This
connection provides a powerful method for analyzing and solving two-player zero-sum
games.

Lecture 34: Deterministic Communication Complexity


In this lecture, we explore an important application of the minimax theorem by connecting
it to communication complexity, particularly the deterministic communication complexity.
This field studies the amount of communication required for two parties to compute a
function or solve a problem collaboratively, given that they only have access to partial
information.

The minimax theorem, which ensures the existence of a Nash equilibrium in two-player zero-
sum games, has direct implications for communication complexity. Specifically, we will see
how it can be used to analyze the communication required for certain problems.

1. Introduction to Communication Complexity

Communication complexity deals with the problem of determining how much


communication is needed between two parties, typically Alice and Bob, to compute a
function. In this setup, Alice and Bob each have some private inputs, and they communicate
via messages to compute an agreed-upon output.

The goal is to minimize the number of bits exchanged between Alice and Bob while ensuring
that the correct output is computed. The problem of communication complexity can be
formalized as follows:

Alice has input x from some domain X ,

Bob has input y from some domain Y ,

They must compute some function f (x, y) of their inputs, where the output is decided
by a communication protocol.

The complexity of the problem is defined as the number of bits exchanged between Alice
and Bob during the computation, and it can depend on the function f (x, y) as well as the
specific communication protocol used.

2. Deterministic Communication Complexity

In deterministic communication complexity, the parties must follow a predefined


communication protocol, and there is no randomness involved. That is, at each step, the
actions of Alice and Bob are completely determined by their respective inputs and previous
messages.

The deterministic communication complexity of a function f (x, y) is the minimum number


of bits Alice and Bob need to exchange in the worst case to compute f (x, y) correctly, under
the assumption that the protocol must always work deterministically for all possible input
pairs (x, y).

For example, consider a function f (x, y) where Alice knows x, Bob knows y , and they need
to compute the output f (x, y). A communication protocol might involve Alice sending a
message to Bob, then Bob responding with another message, and possibly Alice sending a
final message. The total number of bits exchanged in this process gives the deterministic
communication complexity.

3. Connection to Minimax Theorem

The minimax theorem is crucial in understanding communication complexity because it


provides a formal connection between the game-theoretic view of communication protocols
and the deterministic communication cost. In particular, the minimax theorem guarantees
that certain strategies for Alice and Bob (in a two-player zero-sum game) will lead to a Nash
equilibrium, providing a framework for optimizing communication.

In the context of communication complexity, one way to view Alice and Bob's interaction is
through the lens of a zero-sum game. Suppose we are interested in computing a function
f (x, y). We can think of Alice and Bob as playing a game where:
Alice chooses a strategy based on her input x,

Bob chooses a strategy based on his input y ,

They must agree on the outcome f (x, y).

The minimax theorem suggests that there exists an optimal strategy for each player that
minimizes the worst-case outcome for the other player. This game-theoretic perspective
allows us to model the communication complexity of certain problems by converting them
into game-theoretic formulations.

4. Example: Communication Complexity of AND Function

A classical example in communication complexity is the AND function. Suppose Alice’s input
x is a binary string of length n, and Bob’s input y is also a binary string of length n. The
function $f(x, y)$ computes the AND of all the bits in $x$ and $y$, i.e.,
$f(x, y) = x_1 \wedge y_1 \wedge \cdots \wedge x_n \wedge y_n$.

The deterministic communication complexity of this function can be analyzed using the
minimax theorem. If both Alice and Bob want to minimize the number of bits exchanged
while ensuring the correct output, they can use optimal strategies based on the minimax
result. For instance, Alice could send a message encoding enough information about her bits
such that Bob can deduce the correct outcome without needing to communicate every
individual bit.
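
For this particular function, Alice indeed never needs to send her whole string: a single bit recording whether all of her bits equal 1 suffices, since $f$ is the AND of that bit with the AND of Bob's bits. A minimal sketch (in Python; the helper names are illustrative, not a fixed protocol formalism):

```python
# A minimal sketch of a one-bit deterministic protocol for
# f(x, y) = (x_1 AND ... AND x_n) AND (y_1 AND ... AND y_n).
def alice_message(x):
    return int(all(x))        # 1 bit: "all of my bits are 1"

def bob_output(msg, y):
    return msg & int(all(y))  # Bob combines Alice's bit with his own AND

x = [1, 1, 0, 1]
y = [1, 1, 1, 1]
print(bob_output(alice_message(x), y))  # 0, since x contains a 0 bit
# Total communication: 1 bit from Alice (plus 1 bit if Bob must announce the
# answer), independent of n -- far below the trivial n-bit protocol.
```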

5. Applications of Minimax Theorem in Communication Complexity

The minimax theorem plays a key role in providing upper and lower bounds on the
communication complexity of many computational problems. By converting a
communication problem into a game-theoretic setting, we can apply the minimax theorem
to analyze the minimal communication required in the worst case, offering both theoretical
insights and practical strategies for designing efficient communication protocols.

Key applications of the minimax theorem in communication complexity include:

Lower bounds for deterministic communication complexity: The minimax theorem


helps in proving that certain problems require a minimum amount of communication,
even with the best strategy.

Upper bounds: By using strategies derived from the minimax framework, we can also
provide upper bounds, showing that a given amount of communication is sufficient to
solve the problem in the worst case.

Protocol design: In practice, the minimax theorem guides the construction of
communication protocols that minimize the amount of data exchanged, especially in
large-scale distributed systems.

6. Summary

In this lecture, we introduced the concept of deterministic communication complexity,


where the goal is to minimize the number of bits exchanged between Alice and Bob to
compute a function f (x, y). We explored the connection to the minimax theorem and how
this game-theoretic result can be used to optimize communication strategies.

The minimax theorem allows us to analyze communication problems as two-player zero-sum


games, helping to derive lower and upper bounds on communication complexity.
Understanding this connection provides insights into designing efficient communication
protocols for various computational problems.

Lecture 35: Randomized Communication Complexity


In this lecture, we extend our discussion of communication complexity by considering
randomized communication protocols. When randomization is allowed, communication
protocols can use random bits in addition to the inputs x and y to decide the messages
exchanged between Alice and Bob. This introduces a new level of flexibility and potential
efficiency compared to deterministic protocols.

1. Introduction to Randomized Communication Complexity

In randomized communication complexity, the parties Alice and Bob can use random bits
during their communication. This means that the protocol can make probabilistic decisions
at each step. The goal is still to compute a function f (x, y) of their inputs, but the challenge
now is to minimize the expected number of bits exchanged while ensuring that the correct
output is computed with high probability.

A randomized communication protocol has the following characteristics:

Randomness: Alice and Bob are allowed to flip random coins during their
communication. This randomness can influence the decisions made during the protocol.

Error Probability: The protocol is allowed to make mistakes, but only with some small
probability. That is, the protocol should compute the correct output with probability at
least 1 − ϵ, for some small error tolerance ϵ, where ϵ is typically very small (e.g., ϵ =
0.01).

The randomized communication complexity of a function f (x, y) is the minimum expected
number of bits that Alice and Bob must exchange to compute f (x, y) correctly with high
probability, using a randomized protocol.

2. Difference Between Deterministic and Randomized Communication Complexity

In deterministic communication complexity, the actions of Alice and Bob are completely
determined by their inputs, and no randomness is involved. The goal is to find the protocol
that minimizes the total number of bits exchanged in the worst case, ensuring that the
protocol always computes the correct output.

In contrast, in randomized communication complexity, the protocol can use random bits,
which means that it may involve stochastic decision-making. The advantage is that this
randomness allows the protocol to be potentially more efficient in terms of communication,
since the protocol can sometimes avoid the need to send as many bits as in a deterministic
protocol.

Key Differences:

Error Tolerance: Randomized protocols allow a controlled probability of error, whereas


deterministic protocols always compute the correct result.

Efficiency: Randomized protocols can sometimes achieve lower communication


complexity (in terms of the expected number of bits exchanged) compared to
deterministic protocols.

3. Types of Randomized Communication Protocols

There are two common types of randomized protocols in communication complexity:

Las Vegas Protocols: These are randomized protocols that always output the correct
result but may use a variable number of communication rounds, depending on the
random choices. The expected number of bits exchanged is minimized, and the protocol
always terminates with the correct answer. These protocols are guaranteed to be correct
but may require more communication than deterministic protocols in some cases.

Monte Carlo Protocols: These are randomized protocols where the output may not
always be correct. However, the protocol is designed so that the probability of error is at
most some small tolerance ϵ (for example, ϵ = 0.01, i.e., less than 1%). These protocols aim
to minimize the expected communication while allowing for small errors.

4. Example: The Disjointness Problem

A classic example in randomized communication complexity is the disjointness problem.

Suppose Alice has a set A ⊆ {1, 2, ..., n}, and Bob has a set B ⊆ {1, 2, ..., n}.
The goal is to determine if the two sets are disjoint, i.e., if A ∩ B = ∅.

In the deterministic setting, Alice and Bob may need to communicate essentially all the
information in their sets to guarantee the correct answer, which requires exchanging Ω(n)
bits in the worst case.

One might hope that randomization helps substantially here, for instance by having Alice
send a short random fingerprint of her set instead of the set itself. Fingerprinting does yield
dramatic savings for the closely related equality problem (deciding whether x = y): O(log n)
bits suffice with randomization, versus Ω(n) bits deterministically. For disjointness itself,
however, randomization helps far less: as we will see in Lecture 38, even randomized
protocols require Ω(n) bits in the worst case, making disjointness the standard benchmark
separating what randomization can and cannot buy. A sketch of the fingerprinting idea for
equality follows.
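
Below is a minimal sketch of the fingerprinting protocol for equality (in Python; the use of sympy's randprime and the particular prime range are illustrative assumptions). Alice sends the pair (p, x mod p), which is O(log n) bits, and Bob compares it against y mod p; when x ≠ y the test can only fail if the random prime p happens to divide |x − y|, which occurs with probability roughly O(1/n) for primes of this size.

```python
# A minimal sketch (assuming sympy is available) of a Monte Carlo
# fingerprinting protocol for EQUALITY -- not for disjointness.
from sympy import randprime

def equality_protocol(x_bits, y_bits):
    """One-round randomized protocol: Alice sends (p, x mod p) to Bob."""
    n = len(x_bits)
    x, y = int(x_bits, 2), int(y_bits, 2)
    p = randprime(n ** 2, 2 * n ** 2)  # random prime; O(log n) bits to send
    return x % p == y % p              # always right when x == y; errs with
                                       # probability O(1/n) when x != y

print(equality_protocol("1011" * 16, "1011" * 16))           # True
print(equality_protocol("1011" * 16, "1011" * 15 + "1111"))  # False (w.h.p.)
```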

5. Lower Bounds on Randomized Communication Complexity

It is generally more challenging to establish lower bounds on randomized communication


complexity than on deterministic complexity. This is because randomization can exploit the
inherent uncertainty in the problem, allowing for more efficient protocols in some cases.

For example, for certain problems, there might be no deterministic communication protocol
that can solve the problem efficiently, but a randomized protocol can use randomness to
dramatically reduce the communication complexity.

A well-known result in this area is Yao's Minimax Principle. Informally, it states that the
worst-case expected communication of the best randomized protocol for a function f (x, y)
equals the average communication needed by the best deterministic protocol when the
inputs (x, y) are drawn from the hardest possible distribution. We examine this principle in
detail in the next lecture.

6. Applications of Randomized Communication Complexity

Randomized communication protocols have wide applications, particularly in distributed


computing, parallel computing, and cryptography. By minimizing communication,
randomized protocols can speed up computations in large-scale systems where bandwidth is
a limiting factor.

Some applications include:

Distributed Algorithms: Randomized communication can be used in distributed


algorithms where different nodes must exchange information in order to compute a
global result.

Parallel Computing: In parallel computing, minimizing the communication between
different processing units is critical for improving performance. Randomized protocols
can help reduce communication overhead in such systems.

Cryptography: Many cryptographic protocols rely on communication complexity, and


randomized protocols can be used to ensure both efficiency and security.

7. Summary

In this lecture, we introduced randomized communication complexity, where Alice and Bob
are allowed to use random bits to improve the efficiency of their communication protocol.
We discussed how randomized protocols can outperform deterministic protocols in expected
communication for some problems, especially when a small error probability is allowed, and
why the savings are limited for others.

The Las Vegas and Monte Carlo types of randomized protocols were introduced, and we
used the disjointness and equality problems to illustrate where randomization does and does
not help. We also highlighted the challenges in proving lower bounds for randomized
communication complexity and discussed some applications of randomized communication
in various fields.

Lecture 36: Yao's Minimax Theorem


In this lecture, we discuss Yao's Minimax Theorem, a central result in the field of
communication complexity and algorithmic game theory. This theorem provides a way to
relate the performance of randomized and deterministic communication protocols by
showing a connection between average-case and worst-case analysis.

1. Introduction to Yao's Minimax Theorem

Yao’s Minimax Theorem is a powerful tool in the analysis of randomized algorithms. It


essentially tells us that the average-case performance of a randomized protocol can be
bounded by considering a corresponding deterministic problem. This result establishes an
equivalence between two seemingly different types of analysis: the worst-case analysis of
deterministic protocols and the average-case analysis of randomized protocols.

Formally, the theorem applies to randomized communication complexity and states:

Minimax Theorem: For any function f , the worst-case expected number of bits
exchanged by the best randomized protocol is at least the average number of bits
required by the best deterministic protocol when the inputs are drawn from any fixed
distribution; for the hardest such distribution, the two quantities are equal.
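
In the notation standard in the literature (the notation itself is an assumption here, since these notes have not fixed one): writing $R_\epsilon(f)$ for the worst-case expected communication of the best randomized protocol with error at most $\epsilon$, and $D^{\mu}_{\epsilon}(f)$ for the cost of the best deterministic protocol that errs with probability at most $\epsilon$ over inputs drawn from distribution $\mu$, the theorem (for public-coin protocols) reads:

$$R_\epsilon(f) \;=\; \max_{\mu}\, D^{\mu}_{\epsilon}(f)$$

The $\ge$ direction is the one used for proving lower bounds; the matching $\le$ direction follows from LP duality, i.e., from the minimax theorem itself.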

2. Restating the Problem: Deterministic vs. Randomized Protocols

To understand the theorem in context, let us recall the following setup:

Deterministic Protocol: A protocol where the actions of Alice and Bob are entirely
determined by their inputs. The goal is to minimize the number of bits exchanged in the
worst case, ensuring the protocol always computes the correct answer.

Randomized Protocol: A protocol where Alice and Bob can use random bits in their
communication. The goal is to minimize the expected number of bits exchanged while
ensuring that the correct answer is computed with high probability (i.e., with error
probability less than ϵ).

In general, randomized protocols can perform better than deterministic ones in terms of
expected communication, because they can make probabilistic decisions. However, proving
that a randomized protocol performs well for all cases can be tricky.

3. Relating Average-Case to Worst-Case Analysis

Yao's Minimax Theorem provides a formal way to relate these two types of analysis.
Specifically, it asserts the equivalence between the following:

Distributional deterministic complexity: The average communication cost of the most
efficient deterministic protocol for a given problem, where the inputs are drawn from a
fixed, adversarially chosen distribution.

Worst-case randomized complexity: The expected communication cost of the best
possible randomized protocol on its worst-case input.

The theorem can be seen as a minimax result: we minimize the expected communication
over randomized protocols, and we maximize the distributional cost of the best deterministic
protocol over input distributions. It shows that these two optimizations meet at the same
value.

4. Proof Outline of Yao's Minimax Theorem

To understand how Yao's Minimax Theorem works, consider the following:

Let’s assume we have a randomized protocol for some function f , and we want to prove
a lower bound on its expected communication complexity.

We can use the fact that for a randomized protocol, the expected communication
depends on the distribution of inputs that Alice and Bob receive. This input distribution
can be chosen by an adversary to make the protocol inefficient.

To establish the equivalence, we imagine transforming the randomized protocol into a
deterministic problem by fixing the distribution of inputs and constructing a
corresponding deterministic input for which the protocol must work.

The idea behind the proof is to show that if every deterministic protocol is expensive on
some fixed input distribution, then no randomized protocol can be cheap on every input.
Specifically, we can argue that:

1. Adversarial Input Generation: We generate an adversarial worst-case input distribution


for which the deterministic protocol must perform poorly, based on the average-case
performance of the randomized protocol.

2. Equivalence of Worst-Case and Average-Case Performance: We show that the worst-case
expected communication of any randomized protocol is at least the average cost of the best
deterministic protocol on inputs drawn from that adversarial distribution. This establishes
the minimax property of the problem.

5. Implications of Yao's Minimax Theorem

The theorem has significant implications for understanding the trade-offs between
randomized and deterministic protocols:

Equivalence in Bounds: The theorem shows that, in terms of communication
complexity, a randomized protocol cannot beat the best deterministic protocol once
the inputs are averaged over the hardest distribution. This means that the distributional
(average-case) cost of the best deterministic protocol under a hard input distribution gives
a lower bound on the worst-case expected cost of any randomized protocol.

Lower Bounds for Randomized Protocols: By applying Yao's Minimax Theorem, we can
establish lower bounds on randomized communication complexity by analyzing the best
deterministic protocol under a suitably hard input distribution. This is especially useful in
proving that certain problems require significant communication even when randomized
algorithms are allowed.

Deriving Lower Bounds in Communication Complexity: The theorem is commonly used


in communication complexity to derive lower bounds for randomized protocols by
constructing a hard distribution on the inputs that forces the randomized protocol to
use a large number of bits, similar to how a worst-case deterministic protocol would
behave.

6. Example: Applying Yao's Minimax Theorem to a Problem

Let us consider the disjointness problem as an example.

Disjointness Problem: Alice has a set A ⊆ {1, 2, ..., n}, and Bob has a set B ⊆
{1, 2, ..., n}. The task is to determine whether A ∩ B = ∅.
In a deterministic protocol, Alice and Bob may need to communicate all the bits of their
sets to determine whether the sets are disjoint, leading to a communication complexity
of O(n).

Randomized Protocol: One might try to save communication by having Alice send a
random sketch of her set to Bob, who checks it against his own set; such tricks, however,
give at best limited savings for disjointness.

Using Yao’s Minimax Theorem, one can show that the expected communication complexity
of the best randomized protocol for this problem is still Ω(n). The reason is that under a
suitably hard input distribution (where the sets are large and barely overlap), every
deterministic protocol requires Ω(n) communication on average, and the minimax theorem
establishes that no randomized protocol can perform better than the best deterministic
protocol on that distribution.

7. Summary

Yao's Minimax Theorem establishes a strong relationship between the worst-case
performance of randomized protocols and the average-case (distributional) performance of
deterministic protocols. Specifically, it proves that no randomized protocol can outperform
the best deterministic protocol once the inputs are drawn from the hardest distribution. This
equivalence is valuable in the study of communication complexity and algorithmic game
theory, as it allows us to reason about the lower bounds of randomized algorithms by
analyzing deterministic protocols.

The theorem provides a formal framework for understanding the trade-offs between
randomization and determinism in communication complexity, ensuring that these two
approaches are closely linked in terms of their communication requirements.

Lecture 37: Lower Bounds using Yao's Minimax


In this lecture, we discuss how to apply Yao's Minimax Theorem to derive lower bounds on
the randomized communication complexity of problems. The core idea is that by utilizing
the equivalence between worst-case and average-case analysis, we can establish a lower
bound on the performance of randomized protocols based on their worst-case performance
in deterministic settings.

1. Review of Yao's Minimax Theorem

To begin, recall the key idea of Yao’s Minimax Theorem:

Minimax Theorem: There is an equivalence between the distributional performance of
deterministic protocols and the worst-case expected performance of randomized
protocols. More precisely, for any function f and any distribution over inputs, the average
communication of the best deterministic protocol under that distribution is at most the
worst-case expected communication of the best randomized protocol, with equality for
the hardest distribution.

This theorem allows us to translate the analysis of randomized protocols into the analysis of
deterministic protocols, and it gives us a way to derive lower bounds on the randomized
communication complexity by analyzing corresponding deterministic settings.

2. Establishing Lower Bounds using Yao’s Minimax Theorem

The process of deriving lower bounds on randomized communication complexity involves


the following steps:

Step 1: Define the Problem and Communication Model We start by specifying the
problem for which we want to derive a lower bound, typically in the context of
communication complexity. This involves describing the communication between two
parties (Alice and Bob), their inputs, and the communication protocol they use to solve
the problem.

Step 2: Choose a Probability Distribution over Inputs In Yao’s minimax framework, we


choose a probability distribution D over the inputs to the problem. The key idea is that
the inputs Alice and Bob receive are not fixed, but instead are drawn from this
distribution.

Step 3: Construct the Distributional Deterministic Problem We then analyze

deterministic protocols on inputs drawn from the chosen distribution D. The goal is to
compute (or lower-bound) the average communication that any deterministic protocol
needs under this distribution. This requires analyzing how much communication is
required to solve the problem correctly when the inputs follow D.

Step 4: Apply Yao’s Minimax Theorem Once we have the distributional deterministic
problem, we can apply Yao’s minimax theorem. This allows us to lower-bound the
worst-case expected communication of the best randomized protocol by the deterministic
cost under the chosen distribution. Specifically, if we can show that every deterministic
protocol requires at least C bits on average over inputs drawn from D, then every
randomized protocol must also use at least C bits in expectation on some input.
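
In the notation of Lecture 36, this recipe rests on the easy direction of Yao's principle: for every input distribution $\mu$,

$$R_\epsilon(f) \;\ge\; D^{\mu}_{\epsilon}(f)$$

so exhibiting even one distribution under which every deterministic protocol must communicate at least $C$ bits on average already proves that every randomized protocol needs at least $C$ bits in expectation on its worst-case input.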

3. Example of Lower Bound Derivation

To illustrate how to apply Yao’s Minimax Theorem to derive lower bounds, let’s consider a
specific example: the disjointness problem.

Disjointness Problem:

Alice has a set A ⊆ {1, 2, ..., n} and Bob has a set B ⊆ {1, 2, ..., n}.
The goal is to determine if A ∩ B = ∅ (i.e., whether the two sets are disjoint).

Step-by-Step Application of Yao’s Minimax Theorem:

1. Defining the Problem: The problem is a communication problem, where Alice and Bob
communicate to determine if their sets are disjoint. The sets are represented as binary
strings of length n, with A and B being the sets that Alice and Bob hold, respectively.
The communication complexity is the number of bits exchanged between Alice and Bob.

2. Choosing a Probability Distribution: We assume that the sets A and B are chosen
randomly according to some distribution. In this case, the distribution may assign a
probability to each possible pair of sets (A, B), where the sets could be either disjoint or
intersecting.

3. Constructing the Distributional Deterministic Problem: Under a suitably hard distribution,
the sets A and B are large and intersect in at most a few elements, so a deterministic
protocol essentially has to account for the sets element by element. One can show that
every deterministic protocol requires Ω(n) bits on average over such a distribution.

4. Applying Yao’s Minimax Theorem: By Yao’s minimax theorem, the worst-case expected

communication of the best randomized protocol is at least the average-case deterministic
communication complexity under the chosen distribution. Since the latter is Ω(n), the
expected communication complexity of any randomized protocol is also Ω(n).

Thus, the randomized communication complexity of the disjointness problem is Ω(n). This
lower bound is important because it shows that no randomized protocol can solve the
problem using fewer than Ω(n) bits on average, even though randomized protocols might
seem more efficient in other contexts.

4. Generalizing to Other Problems

Yao’s Minimax Theorem can be applied to a wide range of problems in communication


complexity. For each problem, the process involves:

Choosing an appropriate distribution over inputs.

Constructing the worst-case deterministic version of the problem.

Applying Yao’s minimax theorem to relate the performance of randomized protocols to


deterministic protocols.

5. Conclusion and Implications

Using Yao’s Minimax Theorem to derive lower bounds on randomized communication


complexity is a powerful technique. It allows us to establish fundamental limitations on the
efficiency of randomized protocols by relating them to the performance of deterministic
protocols. This method provides a rigorous framework for proving that certain problems
require significant communication, even when randomization is allowed.

The result that randomized communication complexity is bounded below by distributional
deterministic communication complexity has deep implications in both communication
complexity and algorithmic game theory. It shows that, for many problems, randomization
does not provide a substantial advantage in terms of communication.

Lecture 38: Set Disjointness Problem


In this lecture, we will apply the tools from Yao’s Minimax Theorem to derive a lower bound
for the randomized communication complexity of the set disjointness problem. This
problem is central in communication complexity, and its lower bound will help us understand
the limits of efficiency for randomized protocols.

1. Review of Set Disjointness Problem

The set disjointness problem involves two parties, Alice and Bob, each holding a set of
elements. Their task is to determine whether the sets are disjoint, i.e., if A ∩ B = ∅, where:
Alice has a set A ⊆ {1, 2, … , n},
Bob has a set B ⊆ {1, 2, … , n},
The goal is to determine whether A ∩ B = ∅ (i.e., whether the two sets are disjoint).

The sets are represented as binary vectors, where each entry $A_i$ or $B_i$ is either 0 or 1,
indicating whether element $i$ is in the set or not. Alice and Bob exchange messages to
decide whether their sets are disjoint.

The communication complexity is measured by the total number of bits exchanged between
Alice and Bob in a protocol to correctly determine whether A ∩ B = ∅.

2. Applying Yao’s Minimax Theorem

To derive a lower bound on the randomized communication complexity of the set


disjointness problem, we use Yao’s Minimax Theorem. This theorem allows us to relate the
deterministic and randomized communication complexities by converting the problem into
a worst-case deterministic problem.

Step 1: Define a Probability Distribution over Inputs

We will define a probability distribution over the inputs A and B . In the case of the set
disjointness problem, a hard distribution is one that forces any communication protocol to
be inefficient, meaning the protocol must exchange a large number of bits.

One natural candidate is the uniform distribution over all pairs of subsets A and B of
{1, 2, ..., n}. However, the key observation in the set disjointness problem is that some
inputs are much harder than others for communication protocols: the difficult instances are
those where A and B are large and intersect in at most one element, since the protocol
must then rule out an intersection at essentially every position.

Step 2: Construct the Distributional Deterministic Problem

Under such a hard distribution, any deterministic protocol may need to account for essentially
all of the bits to determine whether the sets are disjoint. One can show that the average-case
deterministic communication complexity under this distribution is Ω(n), where n is the size of
the universe: when the sets are large and may share at most one element, essentially every
position could be the one that certifies an intersection.

Step 3: Apply Yao’s Minimax Theorem

By Yao’s Minimax Theorem, the worst-case expected communication of the best

randomized protocol is at least as large as the average-case deterministic communication
complexity under the chosen probability distribution.

Since the distributional deterministic communication complexity of the set disjointness

problem is Ω(n), the expected communication complexity of the best randomized protocol
must also be Ω(n).

This establishes a lower bound of Ω(n) for the randomized communication complexity of
the set disjointness problem. In other words, no randomized protocol can solve set
disjointness by exchanging fewer than Ω(n) bits in the worst case.

3. Hard Distribution and Lower Bound Insight

To gain more insight into this lower bound, we consider the hard distribution over the
inputs A and B. The hard distribution is designed to make the problem particularly difficult
for communication protocols. A standard choice makes A and B large random sets that are
either disjoint or intersect in exactly one element, which forces the protocol to
communicate a large number of bits to verify disjointness.

When A and B are large (e.g., each of size roughly n/4) and may share at most a single

element, there are many elements to check, and essentially every position in the sets may
need to be accounted for to ensure that no element of A appears in B.

In this case, the hard distribution ensures that any correct protocol must exchange nearly n
bits on average.

4. Conclusion

The lower bound on the randomized communication complexity of the set disjointness
problem is Ω(n), where n is the size of the universe from which the sets are drawn. This
lower bound shows that no randomized protocol can solve the problem with fewer than
Ω(n) bits of communication in the worst case, and it follows from applying Yao’s Minimax
Theorem to a suitable hard distribution. The hard distribution plays a crucial role in ensuring
that no protocol can achieve a better bound.

This result highlights the importance of understanding the limits of randomized protocols
and how communication complexity bounds can be derived using probability distributions,
deterministic problems, and the tools of duality theory from linear programming.

Lecture 39: LP for Max Flow Problem


In this lecture, we explore the Maximum Flow Problem in the context of network flow
algorithms, and demonstrate how it can be formulated as a Linear Program (LP). The Max
Flow Problem is a fundamental problem in combinatorial optimization and network theory,
with wide applications in various fields, including transportation, communication, and
logistics.

1. Introduction to the Max Flow Problem

The Maximum Flow Problem involves a directed graph G = (V , E), where:


V is the set of vertices (nodes),
E is the set of directed edges between vertices,
Each edge (u, v) ∈ E has a capacity c(u, v), which represents the maximum amount of
flow that can pass through that edge.

The goal is to find the maximum amount of flow that can be sent from a source vertex s to a
sink vertex t, while respecting the capacity constraints on the edges.

Formally, we define a flow f (u, v) as the amount of flow passing through edge (u, v),
subject to the following constraints:

Capacity Constraint: For every edge (u, v), the flow must not exceed the capacity of the
edge:

0 ≤ f (u, v) ≤ c(u, v)
Flow Conservation: For every vertex v except the source s and sink t, the amount of flow
entering v must equal the amount of flow leaving v :

$$\sum_{(u,v) \in E} f(u, v) \;=\; \sum_{(v,w) \in E} f(v, w)$$

This ensures that flow is conserved at all vertices except s and t.

The objective is to maximize the total flow from the source s to the sink t, which is defined
as:

$$\text{Maximize} \quad \sum_{(s,v) \in E} f(s, v)$$

This sum represents the total amount of flow leaving the source s.

2. Formulating the Max Flow Problem as a Linear Program

We now turn the Max Flow Problem into a linear programming problem.

Let f (u, v) be the flow on the edge (u, v). We define the following decision variables:

Let f (u, v) be the amount of flow on edge (u, v) where u and v are vertices in V .

The objective function is to maximize the total flow out of the source vertex s, which is:

$$\text{Maximize} \quad \sum_{(s,v) \in E} f(s, v)$$

Constraints:

1. Capacity Constraints: For each edge (u, v), the flow must not exceed the capacity of the
edge:

0 ≤ f (u, v) ≤ c(u, v) ∀(u, v) ∈ E

2. Flow Conservation: For every vertex v , except the source s and the sink t, the total flow
entering the vertex must equal the total flow leaving the vertex:

$$\sum_{(u,v) \in E} f(u, v) \;=\; \sum_{(v,w) \in E} f(v, w) \qquad \forall v \in V \setminus \{s, t\}$$

3. Non-Negativity: The flow on any edge must be non-negative:

f (u, v) ≥ 0 ∀(u, v) ∈ E

Thus, the linear program (LP) for the maximum flow problem can be written as:

$$\begin{aligned}
\text{Maximize} \quad & \sum_{(s,v) \in E} f(s, v) \\
\text{subject to} \quad & 0 \le f(u, v) \le c(u, v) && \forall (u, v) \in E \\
& \sum_{(u,v) \in E} f(u, v) = \sum_{(v,w) \in E} f(v, w) && \forall v \in V \setminus \{s, t\} \\
& f(u, v) \ge 0 && \forall (u, v) \in E
\end{aligned}$$

3. Interpretation of the Linear Program

The objective function maximizes the flow from the source s to the sink t.

The capacity constraints ensure that the flow on each edge does not exceed its capacity.

The flow conservation constraints enforce that the amount of flow entering a vertex
(except for the source and sink) equals the amount of flow leaving the vertex.

The non-negativity constraints ensure that the flow on each edge is non-negative.

Thus, solving this LP will yield the maximum flow from the source s to the sink t in the graph.
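
As a small worked instance, the following minimal sketch (in Python with SciPy; the graph, node names, and capacities are illustrative assumptions) builds exactly the LP above for a four-node network and hands it to a generic LP solver.

```python
# A minimal sketch: the max-flow LP for a small illustrative network,
# solved with a general-purpose LP solver.
from scipy.optimize import linprog

nodes = ["s", "a", "b", "t"]
edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
cap   = [3.0, 2.0, 1.0, 2.0, 3.0]

# Objective: maximize total flow out of s (linprog minimizes, so negate).
c = [-1.0 if u == "s" else 0.0 for (u, v) in edges]

# Flow conservation at every vertex except s and t: inflow - outflow = 0.
A_eq, b_eq = [], []
for w in nodes:
    if w in ("s", "t"):
        continue
    A_eq.append([(1.0 if v == w else 0.0) - (1.0 if u == w else 0.0)
                 for (u, v) in edges])
    b_eq.append(0.0)

# Capacity and non-negativity constraints become variable bounds 0 <= f <= c.
res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0.0, ci) for ci in cap])
print("max flow value:", -res.fun)  # 5.0 for this network
for (u, v), f in zip(edges, res.x):
    print(f"f({u},{v}) = {f:.1f}")
```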

4. Connection to Max Flow Algorithms

The formulation of the maximum flow problem as an LP is of theoretical interest. However, in


practice, we typically solve the problem using flow algorithms like the Ford-Fulkerson
method, Edmonds-Karp algorithm, or Push-Relabel algorithm.

These algorithms, while not directly solving the LP, can be seen as solving the same problem
iteratively by adjusting the flow values on the graph until an optimal solution is found. The LP
formulation provides a mathematical foundation for understanding the problem, and its
solution is equivalent to the solution obtained by these algorithms.

5. Conclusion

The Maximum Flow Problem is a classical optimization problem that can be efficiently
formulated as a linear program. The LP formulation involves maximizing the flow from a
source to a sink in a network while respecting edge capacities and flow conservation
constraints. This formulation not only provides a mathematical way of solving the problem
but also lays the groundwork for further theoretical explorations in network optimization
and algorithmic design.

Lecture 40: LP for Min Cut Problem


In this lecture, we discuss the Minimum Cut Problem, which is closely related to the
Maximum Flow Problem. We will explore the problem's definition, motivation, and how it
can be formulated as a Linear Program (LP). The relationship between maximum flow and
minimum cut is a central theme, and we will examine how these two problems are dual to
each other.

1. Introduction to the Min Cut Problem

The Minimum Cut Problem is concerned with finding a cut in a directed graph that
separates the source s from the sink t, and minimizes the total capacity of the edges
crossing the cut. More formally:

Given a directed graph G = (V , E), where V is the set of vertices and E is the set of
directed edges with capacities c(u, v) for each edge (u, v) ∈ E ,

A cut is a partition of the vertices into two disjoint sets S and T such that s ∈ S and t ∈
T.
The capacity of the cut (S, T ) is defined as the sum of the capacities of the edges that
go from S to T , i.e., the edges that have one endpoint in S and the other endpoint in T :

$$\text{Capacity of the cut} \;=\; \sum_{(u,v) \in E,\; u \in S,\; v \in T} c(u, v)$$

The minimum cut is the cut that minimizes this capacity, i.e., the smallest total capacity
among all possible cuts that separate s from t.

2. Max Flow - Min Cut Theorem

One of the key results in network flow theory is the Max Flow - Min Cut Theorem, which
states that:

Maximum Flow = Minimum Cut

This theorem establishes that the maximum flow from the source s to the sink t is equal to
the minimum capacity of a cut that separates s from t.

This theorem is fundamental because it allows us to solve the minimum cut problem by
solving the maximum flow problem. Since we already know how to formulate and solve the
maximum flow problem using linear programming (as shown in the previous lecture), we can
use that framework to find the minimum cut.

3. Formulating the Min Cut Problem as a Linear Program

We now turn to formulate the Minimum Cut Problem as a linear program.

Let G = (V , E) be a directed graph, and c(u, v) be the capacity of the edge (u, v). We
define the following decision variables:

Let x(v) be a binary variable that indicates whether a vertex v belongs to the set S (the
source side of the cut). Specifically, x(v) = 1 means v ∈ S , and x(v) = 0 means v ∈ T .

4. Objective Function

The objective of the Min Cut Problem is to minimize the capacity of the cut. The capacity of
the cut $(S, T)$ is the sum of the capacities of the edges $(u, v)$ where $u \in S$ and $v \in T$, or
equivalently, where $x(u) = 1$ and $x(v) = 0$. Therefore, the objective function is:

$$\text{Minimize} \quad \sum_{(u,v) \in E} c(u, v) \cdot \max(x(u) - x(v),\, 0)$$

Here, $\max(x(u) - x(v), 0)$ takes the value:

1 if $u \in S$ and $v \in T$ (i.e., the edge $(u, v)$ crosses the cut from $S$ to $T$),

0 otherwise (i.e., if both endpoints lie on the same side, or if the edge goes from $T$ to $S$;
the positive part is needed because $x(u) - x(v) = -1$ for such backward edges, which must
not reduce the objective).

To obtain a linear objective, the positive part is standardly linearized with an auxiliary
variable $y(u, v)$ per edge, constrained by $y(u, v) \ge x(u) - x(v)$ and $y(u, v) \ge 0$.

5. Constraints

To ensure that the partition of vertices into sets S and T is valid, we impose the following
constraints:

1. Binary Constraints: Each vertex must be either in set S or set T , which translates to:

x(v) ∈ {0, 1} ∀v ∈ V
2. Source and Sink Constraints: The source vertex s must belong to set S and the sink
vertex t must belong to set T :

x(s) = 1 (source is in set S)

x(t) = 0 (sink is in set T )

Thus, the (integer) linear program for the minimum cut can be written as:

$$\begin{aligned}
\text{Minimize} \quad & \sum_{(u,v) \in E} c(u, v)\, y(u, v) \\
\text{subject to} \quad & y(u, v) \ge x(u) - x(v), \quad y(u, v) \ge 0 && \forall (u, v) \in E \\
& x(v) \in \{0, 1\} && \forall v \in V \\
& x(s) = 1, \quad x(t) = 0
\end{aligned}$$

Relaxing $x(v) \in \{0, 1\}$ to $0 \le x(v) \le 1$ yields a genuine linear program whose optimal
value still equals the minimum cut capacity, and this relaxation is the form solved in practice.
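
To complement the formulation above, here is a minimal sketch (in Python with NumPy and SciPy; the network is the same illustrative four-node example used in the previous lecture's sketch) of the relaxed min-cut LP with the auxiliary edge variables $y(u, v)$. On this instance the optimum is 5.0, matching the maximum flow, and the relaxation happens to return an integral assignment.

```python
# A minimal sketch: the LP relaxation of the min-cut program, with variables
# x(v) per vertex and y(u, v) per edge, solved by a generic LP solver.
import numpy as np
from scipy.optimize import linprog

nodes = ["s", "a", "b", "t"]
edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
cap   = [3.0, 2.0, 1.0, 2.0, 3.0]
ni = {v: i for i, v in enumerate(nodes)}
nV, nE = len(nodes), len(edges)

# Variables: x(v) for each vertex, then y(u, v) for each edge.
c = np.concatenate([np.zeros(nV), cap])  # minimize sum of c(u,v) * y(u,v)

# x(u) - x(v) - y(u,v) <= 0 for every edge (u, v).
A_ub = np.zeros((nE, nV + nE))
for k, (u, v) in enumerate(edges):
    A_ub[k, ni[u]] = 1.0
    A_ub[k, ni[v]] = -1.0
    A_ub[k, nV + k] = -1.0

# Fix x(s) = 1 and x(t) = 0; relax the binary constraint to 0 <= x <= 1.
A_eq = np.zeros((2, nV + nE))
A_eq[0, ni["s"]] = 1.0
A_eq[1, ni["t"]] = 1.0
bounds = [(0.0, 1.0)] * nV + [(0.0, None)] * nE

res = linprog(c, A_ub=A_ub, b_ub=np.zeros(nE), A_eq=A_eq, b_eq=[1.0, 0.0],
              bounds=bounds)
print("min cut capacity:", res.fun)    # 5.0, equal to the max flow value
print("vertex labels x:", res.x[:nV])  # integral here: the s-side has x = 1
```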

6. Relation to Maximum Flow

The LP formulation for the minimum cut problem is closely related to the LP for the
maximum flow problem. In fact, by solving the maximum flow problem, we can determine
the minimum cut as follows:

After solving the maximum flow, identify the set S of vertices that are reachable from
the source s using the residual graph.

The set T consists of all the remaining vertices.

The minimum cut is the set of edges that go from S to T , and its capacity is the total
capacity of these edges.

This relationship between the maximum flow and minimum cut makes the Max Flow - Min
Cut Theorem a powerful tool in solving both problems.
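
The residual-graph recipe just described can be sketched in a few lines (in Python; the network and the particular maximum flow are the same illustrative example as before, so the concrete numbers are assumptions rather than a general claim).

```python
# A minimal sketch: recover a minimum cut from a maximum flow by computing
# which vertices remain reachable from s in the residual graph.
from collections import deque

edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
cap   = dict(zip(edges, [3.0, 2.0, 1.0, 2.0, 3.0]))
flow  = dict(zip(edges, [3.0, 2.0, 1.0, 2.0, 3.0]))  # a maximum flow (value 5)

# Residual edges: forward if spare capacity remains, backward if flow > 0.
residual = {v: [] for e in edges for v in e}
for (u, v) in edges:
    if cap[(u, v)] - flow[(u, v)] > 1e-9:
        residual[u].append(v)
    if flow[(u, v)] > 1e-9:
        residual[v].append(u)

# S = vertices reachable from s in the residual graph (breadth-first search).
S, queue = {"s"}, deque(["s"])
while queue:
    u = queue.popleft()
    for v in residual[u]:
        if v not in S:
            S.add(v)
            queue.append(v)

cut = [(u, v) for (u, v) in edges if u in S and v not in S]
print("S =", S, "cut edges:", cut, "capacity:", sum(cap[e] for e in cut))
# Prints S = {'s'} with cut edges (s,a), (s,b): capacity 5.0 = max flow.
```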

7. Conclusion

The Minimum Cut Problem is a fundamental optimization problem in network flow theory. It
can be efficiently formulated as a linear program, which minimizes the capacity of a cut that
separates the source s from the sink t in a directed graph. The LP formulation of the
minimum cut problem is directly related to the LP for the maximum flow problem, and the
Max Flow - Min Cut Theorem ensures that the maximum flow equals the minimum cut.
Solving the maximum flow problem thus provides an optimal solution to the minimum cut
problem, and this duality is central to understanding many network optimization problems.

Lecture 41: Max Flow = Min Cut


In this lecture, we present an alternate proof of the well-known Max Flow = Min Cut
Theorem. The theorem asserts that the maximum flow in a directed graph from a source s
to a sink t is equal to the capacity of the minimum cut that separates s from t. This is one of
the cornerstone results in network flow theory.

1. Review of the Max Flow and Min Cut Problems

To recall:

Maximum Flow Problem: Given a directed graph G = (V , E) with source s, sink t, and
capacities c(u, v) on the edges, the objective is to find the maximum flow from s to t
that can be sent through the network, subject to the capacity constraints on the edges.

Minimum Cut Problem: Given the same graph, the objective is to find a cut (a partition
of the vertices into two sets S and T , where s ∈ S and t ∈ T ) that minimizes the total
capacity of the edges crossing the cut, i.e., the sum of the capacities of the edges (u, v)
where u ∈ S and v ∈ T .

The Max Flow = Min Cut Theorem states that the maximum value of the flow that can be
sent from s to t is equal to the capacity of the minimum cut that separates s from t.

Maximum Flow = Minimum Cut

2. Linear Program Formulation

As established in previous lectures, both the maximum flow and minimum cut problems
can be formulated as linear programs. Let us recall their formulations.

Maximum Flow LP: The maximum flow problem can be formulated as the following
linear program:

Decision Variables:

f (u, v) (flow on edge (u, v))

Objective Function:

$$\text{Maximize} \quad \sum_{(s,v) \in E} f(s, v)$$

Constraints:

$$\begin{aligned}
& f(u, v) \le c(u, v) && \forall (u, v) \in E \\
& \sum_{v : (u,v) \in E} f(u, v) \;=\; \sum_{v : (v,u) \in E} f(v, u) && \forall u \in V \setminus \{s, t\} \\
& f(u, v) \ge 0 && \forall (u, v) \in E
\end{aligned}$$

The objective is to maximize the flow sent from the source s to the sink t, subject to the
flow conservation constraints at each vertex (except at s and t), and the capacity
constraints on the edges.

Minimum Cut LP: The minimum cut problem can be formulated as follows:

Decision Variables:

x(v) (binary variable indicating whether vertex v belongs to set S)

Objective Function:

$$\text{Minimize} \quad \sum_{(u,v) \in E} c(u, v) \cdot \max(x(u) - x(v),\, 0)$$

Constraints:

x(v) ∈ {0, 1} ∀v ∈ V

x(s) = 1 (source is in set S)

x(t) = 0 (sink is in set T )

The objective is to minimize the capacity of the cut separating s from t, subject to
ensuring that s ∈ S and t ∈ T , and that each vertex is assigned to one of the two sets.

3. Primal-Dual Pair

We now demonstrate that the maximum flow LP and the minimum cut LP form a primal-
dual pair. To do this, we examine the relationship between the constraints and objective
functions of the two linear programs.

The maximum flow problem is a primal optimization problem, where the goal is to
maximize the flow through the network while satisfying flow conservation and capacity
constraints.

The minimum cut problem is the dual optimization problem, where the goal is to
minimize the capacity of the cut separating the source and sink while ensuring that the
cut respects the binary assignment of vertices.

By interpreting the constraints and objective functions of these two LPs, we see that the dual
of the maximum flow problem corresponds to the primal of the minimum cut problem. This
means that the solution to the maximum flow problem provides a lower bound for the
minimum cut, and the solution to the minimum cut problem provides an upper bound for
the maximum flow.
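
This bounding relationship is precisely weak duality, and it can be stated directly in terms of flows and cuts (a standard fact, recorded here for completeness): for every feasible flow $f$ and every $s$-$t$ cut $(S, T)$,

$$|f| \;=\; \sum_{(s,v) \in E} f(s, v) \;\le\; \sum_{(u,v) \in E,\; u \in S,\; v \in T} c(u, v) \;=\; c(S, T)$$

since every unit of flow from $s$ to $t$ must cross some edge from $S$ to $T$, and those crossing edges can carry at most their total capacity.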

4. Strong Duality and the Proof

The strong duality theorem of linear programming states that if both the primal and dual
problems have feasible solutions, then their optimal values are equal. In the context of the
max flow and min cut problems, this means that the value of the maximum flow is equal to
the value of the minimum cut.

To prove this, we consider the following steps:

1. Feasibility: Both the primal (maximum flow) and dual (minimum cut) LPs have feasible
solutions, as they are based on the feasible flow and cut definitions in the network.

2. Optimality: By strong duality, the optimal values of the primal and dual LPs are equal,
i.e., the maximum flow equals the minimum cut.

Thus, we conclude that the value of the maximum flow in the network is equal to the
capacity of the minimum cut that separates the source s from the sink t.

5. Conclusion

The Max Flow = Min Cut Theorem provides a powerful connection between two central
optimization problems in network flow theory. By formulating both problems as linear
programs, we have shown that they form a primal-dual pair. Applying the strong duality
theorem of linear programming leads to the conclusion that the maximum flow in a graph is
equal to the minimum cut capacity. This result has far-reaching implications in network
optimization, communications, and combinatorial optimization.

Lecture 42: Primal-Dual Approach


In this lecture, we explore the primal-dual approach to solving linear programs, particularly
when dealing with a primal-dual pair of linear programs. This approach is a powerful
technique in optimization, especially for problems like network flow and minimum cut, and
has applications in combinatorial optimization.

1. Review of Primal and Dual LPs

Recall that every linear program has an associated dual problem. For a given primal linear
program (LP), the dual provides an alternative perspective on the same optimization
problem. The primal and dual LPs are related in such a way that solving one provides
insights into solving the other.

To formalize:

Primal LP: This typically represents the optimization problem we are trying to solve. It is
expressed in terms of decision variables, constraints, and an objective function.

Dual LP: The dual represents a related problem, where the decision variables
correspond to the constraints in the primal, and the constraints in the dual correspond
to the objective in the primal.

The duality theory guarantees that if both the primal and dual problems are feasible, then
their optimal solutions have the same value, as stated by strong duality.

2. Primal-Dual Methodology

The primal-dual approach refers to a simultaneous solution strategy where we try to solve
both the primal and dual problems concurrently. The key idea is to improve the current
solutions of both the primal and the dual iteratively. This technique is particularly useful
when we do not have a clear method for solving either problem directly, but we can make
progress by leveraging the relationship between the two.

The approach involves the following steps:

Start with feasible solutions for the dual problem and the primal problem, or at least
approximate feasible solutions.

Iteratively update both the primal and dual solutions based on the current state,
adjusting them in such a way that we progress towards optimality.

In some cases, we may restrict the original primal or dual LP to a simpler form, which is
easier to solve, and then gradually improve this restricted solution.

3. The General Strategy

To implement the primal-dual approach, we follow these broad steps:

1. Initialization:

Start with an initial feasible solution for the dual LP and the primal LP. In some
cases, we may begin with an approximate solution that satisfies the constraints to
some extent but is not necessarily optimal.

2. Iterative Improvement:

Gradually refine both the primal and dual solutions. Each step consists of adjusting
the solutions for both LPs based on the information provided by the other. For
example, if the primal solution becomes better, the dual solution is updated
accordingly, and vice versa.

3. Restricted Primal/Dual:

Often, the original primal and dual LPs may be too complex to solve efficiently. In
such cases, we solve restricted versions of these LPs, where we limit the set of
variables or constraints involved. This restricted LP may be easier to solve, and we
can progressively expand the set of variables and constraints involved as the
solution improves.

4. Termination:

The iterative process stops when we find a solution that satisfies the optimality
conditions, meaning that the current solution is optimal for both the primal and the
dual LPs.

4. Applications of the Primal-Dual Approach

The primal-dual approach is particularly useful in problems where the primal and dual
solutions are intertwined, and solving them together is more efficient than solving them
independently. Some well-known applications include:

Network Flow Problems: In network flow problems, like maximum flow and minimum
cut, the primal-dual approach is often used to find the flow in a network or the cut
separating two vertices.

Combinatorial Optimization Problems: Problems like minimum spanning tree and
shortest path are well suited to the primal-dual method.

Approximation Algorithms: The primal-dual approach is frequently used to design
approximation algorithms for NP-hard problems. The approach can often be used to
find near-optimal solutions in polynomial time, even if exact solutions are
computationally infeasible.

One classic example is the primal-dual algorithm for the set cover problem, where we
iteratively select elements and sets to cover them while maintaining a feasible dual solution.

5. Detailed Example: Minimum Cut Problem

To illustrate the primal-dual approach, let us consider the minimum cut problem:

Primal Problem (Minimum Cut): We want to minimize the cut capacity between two
vertices, s and t, in a network.

Dual Problem: In the dual formulation, we are trying to maximize the flow from s to t.

Primal-Dual Strategy for Minimum Cut:

1. Start with an initial feasible solution: We begin by setting up an initial solution where
the flow is zero and all cuts are unassigned.

2. Iteratively improve both solutions: Gradually adjust the flow and cut by augmenting
the flow along certain paths, which reduces the residual capacity of the cut.

3. Terminate at optimality: The process stops when no further improvements can be
made; at that point the current flow is maximum and the corresponding saturated cut is
a minimum cut.

6. Advantages of the Primal-Dual Approach

Efficiency: The primal-dual approach can often be more efficient than solving the primal
or dual problem independently, especially in large-scale optimization problems.

Flexibility: The method can be adapted to a wide variety of optimization problems,
especially in combinatorial optimization and network flow.

Approximation: In some cases, the primal-dual approach can be used to find
approximate solutions, providing guarantees on the quality of the solution compared to
the optimal.

7. Conclusion

The primal-dual approach is a powerful strategy for solving linear programs, especially when
the primal and dual problems are closely related. By solving both problems simultaneously,
or by iteratively refining feasible solutions to both, we can often find optimal or near-optimal
solutions more efficiently. This approach has broad applications in network flow,
combinatorial optimization, and approximation algorithms, and it is an essential tool in the
field of optimization.

Lecture 43: Primal-Dual for Max-Flow


In this lecture, we will apply the primal-dual approach to the maximum flow problem. This
method will provide us with a detailed understanding of the algorithm used to compute the
maximum flow in a flow network. Specifically, we will derive the well-known Ford-Fulkerson
algorithm using the primal-dual approach.

1. Overview of the Max-Flow Problem

The maximum flow problem is a classical optimization problem in network theory. The goal
is to find the maximum amount of flow that can be sent from a source node s to a sink node
t in a directed flow network, subject to capacity constraints on the edges.

Formally, we are given a directed graph G = (V , E), where V is the set of vertices (nodes),
and E is the set of edges. Each edge (u, v) ∈ E has a capacity c(u, v), which represents the
maximum flow that can pass through the edge. We seek to maximize the flow from the
source s to the sink t.

2. Formulation of the Primal Problem (Max Flow)

We begin by formulating the primal linear program (LP) for the maximum flow problem:

Variables: Let f (u, v) be the flow through the edge (u, v).

Objective: Maximize the total flow from the source s to the sink t:

Maximize ∑_{(s,v)∈E} f(s, v) − ∑_{(v,s)∈E} f(v, s)

This objective is the net flow out of the source node s; by flow conservation, it equals
the net flow into the sink node t.

Constraints:

1. Flow conservation at each node except s and t:

∑_{u:(u,v)∈E} f(u, v) = ∑_{u:(v,u)∈E} f(v, u) for all v ∈ V ∖ {s, t}

This ensures that the flow into a node equals the flow out of the node.

2. Capacity constraint for each edge:

0 ≤ f (u, v) ≤ c(u, v) for all (u, v) ∈ E

This ensures that the flow on an edge does not exceed its capacity.

Feasibility: The flow values must satisfy the above constraints for all edges and vertices
in the network.
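To make the formulation concrete, the following is a minimal sketch that solves this LP with scipy.optimize.linprog on a small made-up network with edges (s,a), (s,b), (a,b), (a,t), (b,t). Since there are no edges into s here, the objective is simply the flow on the two source edges, negated because linprog minimizes:

```python
from scipy.optimize import linprog

# Edge order: (s,a), (s,b), (a,b), (a,t), (b,t); capacities are illustrative.
caps = [3, 2, 1, 2, 3]
c = [-1, -1, 0, 0, 0]            # maximize f(s,a) + f(s,b) by minimizing the negative
A_eq = [
    [1, 0, -1, -1, 0],           # conservation at a: f(s,a) = f(a,b) + f(a,t)
    [0, 1,  1,  0, -1],          # conservation at b: f(s,b) + f(a,b) = f(b,t)
]
b_eq = [0, 0]
bounds = [(0, u) for u in caps]  # capacity constraints: 0 <= f(u,v) <= c(u,v)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print(-res.fun)                  # maximum flow value: 5.0
```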

3. Dual Problem for Max-Flow

The dual problem for the max-flow problem can be derived using duality theory. In the
dual, we seek to find an optimal set of cut capacities. The dual variables correspond to the
cuts in the network that separate the source from the sink.

Formally, let’s consider the cut in a flow network as a partition of the set of vertices V into
two disjoint subsets S and T = V ∖ S , where s ∈ S and t ∈ T . The capacity of a cut is the
sum of the capacities of the edges crossing the cut from S to T , i.e., the edges (u, v) where
u ∈ S and v ∈ T .

The dual problem minimizes the capacity of such a cut, subject to the following constraints:

For every node v ∈ V ∖ {s, t}, we assign a dual variable y(v), where y(v) represents
the potential of node v.

The dual objective is to minimize the capacity of the cut separating s from t while
satisfying the dual constraints.

4. Primal-Dual Algorithm for Max-Flow (Ford-Fulkerson)

To solve the max-flow problem using the primal-dual approach, we proceed as follows:

1. Initialization:

Start with an initial feasible solution for the primal problem. This is typically done by
setting the flow on each edge to zero initially, i.e., f(u, v) = 0 for all edges (u, v).
The dual solution can be initialized by setting the potentials y(v) = 0 for all
vertices, except for the source s, where y(s) = 1.
2. Iterative Process:

Augment the flow along a path from the source s to the sink t using a residual
graph. A residual graph represents the available capacity along each edge,
considering the current flow.

Update the primal solution: For each edge in the augmenting path, increase the
flow by the smallest available capacity along the path.

Update the dual solution: Adjust the dual variables (node potentials) based on the
change in flow. This step ensures that the dual constraints are maintained, and the
algorithm progresses toward an optimal solution.

3. Termination:

The algorithm terminates when no more augmenting paths can be found in the
residual graph. At this point, the flow in the network is maximal, and the cut that
separates the source from the sink provides the minimum cut, as per the max-flow
min-cut theorem.

5. Ford-Fulkerson Algorithm

The Ford-Fulkerson algorithm is one of the most well-known algorithms for solving the
maximum flow problem. It is based on the primal-dual approach, and it proceeds as follows:

1. Start with an initial flow of zero.

2. While there exists an augmenting path (a path from s to t in the residual graph),
augment the flow along the path.

3. Update the residual graph and repeat the process until no augmenting paths remain.

4. The final flow corresponds to the maximum flow, and the cut separating s and t
corresponds to the minimum cut.

The Ford-Fulkerson algorithm is guaranteed to find the maximum flow whenever it
terminates. However, it may fail to terminate if the capacities are irrational. With integer
capacities it always terminates, but the number of augmentations can grow with the value
of the maximum flow, so the running time is only pseudo-polynomial; choosing shortest
augmenting paths (the Edmonds-Karp variant) yields a genuinely polynomial bound of
O(V E²).
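The following is a self-contained sketch of this scheme in Python, using breadth-first search to pick augmenting paths (the Edmonds-Karp rule); the graph in the last two lines is a made-up example:

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Maximum flow via shortest augmenting paths in the residual graph."""
    residual = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u in list(residual):                 # add zero-capacity reverse edges
        for v in list(residual[u]):
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        parent = {s: None}                   # BFS for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:                  # no augmenting path: flow is maximum
            return flow
        path, v = [], t                      # recover the path from s to t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        delta = min(residual[u][v] for u, v in path)   # bottleneck capacity
        for u, v in path:                    # augment and update residuals
            residual[u][v] -= delta
            residual[v][u] += delta
        flow += delta

caps = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}
print(edmonds_karp(caps, "s", "t"))          # 5
```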

6. Conclusion

The primal-dual approach provides a clear framework for solving the maximum flow
problem. By starting with an initial feasible solution and iteratively refining both the primal
flow and dual potentials, the algorithm converges to the optimal solution. This leads to the
Ford-Fulkerson algorithm, which is a cornerstone of network flow algorithms. The primal-
dual approach not only yields an efficient algorithm for the max-flow problem but also
deepens our understanding of the relationship between flow and cuts in a network.

Lecture 44: Set Cover Problem


In this lecture, we transition to another application of linear programming (LP): the set
cover problem. In the case of the min-cut problem, we saw that the values of the linear
programming (LP) relaxation and the integer linear programming (ILP) formulation were
the same. However, this is not always the case, and in general, the LP relaxation may provide
a solution that is a lower bound for the integer problem. The goal in this lecture is to show
that the value of the LP solution can be close to the ILP solution, which is the optimal
solution to the problem.

1. Introduction to the Set Cover Problem

The set cover problem is a classic optimization problem. The objective is to select the
minimum number of sets from a collection of sets such that their union covers a given
universal set.

Problem Definition:

Input: A universe U of n elements, and a collection of subsets S1, S2, …, Sm of U, where
Si ⊆ U.

Objective: Find the smallest subcollection of {S1, S2, …, Sm} whose union is equal to U,
i.e., every element in U is contained in at least one of the chosen sets.

The set cover problem is NP-hard, meaning that finding the exact optimal solution is
computationally intractable for large inputs.

2. Integer Linear Programming Formulation (ILP)

We can model the set cover problem as an Integer Linear Program (ILP).

Variables: Let xi be a binary variable associated with each set Si: xi = 1 if set Si is
selected in the cover, and xi = 0 otherwise.

Objective: Minimize the total number of selected sets:

Minimize ∑_{i=1}^{m} xi

Constraints: Every element u ∈ U must be covered by at least one selected set. For
each element u ∈ U, we define the constraint:

∑_{i : Si covers u} xi ≥ 1

This ensures that each element in U is covered by at least one of the selected sets.

ILP Formulation:

Minimize ∑_{i=1}^{m} xi

Subject to:

∑_{i : Si covers u} xi ≥ 1 for all u ∈ U

xi ∈ {0, 1} for all i

The goal is to minimize the total number of sets selected while ensuring that every element
in U is covered by at least one of the selected sets.

3. Linear Programming Relaxation (LP Relaxation)

Since the set cover problem is NP-hard, it is often useful to solve the LP relaxation of the ILP,
which involves relaxing the integrality constraints. Instead of requiring each xi to be binary,
we allow xi to be a real number between 0 and 1.

LP Relaxation:

Minimize ∑_{i=1}^{m} xi

Subject to:

∑_{i : Si covers u} xi ≥ 1 for all u ∈ U

0 ≤ xi ≤ 1 for all i

By solving this relaxed problem, we obtain a fractional solution, which is typically not
integral. However, the solution provides a lower bound on the optimal integer solution, and
in many cases, it is possible to find a solution that is close to the optimal ILP solution.
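As a minimal sketch, the LP relaxation can be solved with scipy.optimize.linprog; the instance below (a four-element universe and four sets) is made up, and the covering constraints are rewritten in the ≤ form that linprog expects:

```python
from scipy.optimize import linprog

universe = range(4)                       # made-up instance
sets = [{0, 1}, {1, 2}, {2, 3}, {0, 3}]

c = [1.0] * len(sets)                     # minimize the sum of the x_i
# Covering constraint for element u: sum over {i : u in S_i} of x_i >= 1,
# rewritten as the equivalent constraint -sum x_i <= -1.
A_ub = [[-1.0 if u in s else 0.0 for s in sets] for u in universe]
b_ub = [-1.0] * len(universe)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * len(sets))
print(res.fun, res.x)                     # LP value; x may be fractional in general
```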

4. Approximation Algorithm for Set Cover

Since the set cover problem is NP-hard, an exact solution is difficult to obtain in polynomial
time. However, we can approximate the solution. A well-known approximation algorithm for
the set cover problem works as follows:

Greedy Algorithm:

1. Start with an empty cover.

2. At each step, select the set that covers the largest number of uncovered elements.

3. Add the selected set to the cover and mark the covered elements as covered.

4. Repeat until all elements are covered.

This greedy approach guarantees an approximation factor of ln n, meaning that the
solution found by the algorithm will be at most ln n times the size of the optimal solution,
where n is the number of elements in the universe.
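A minimal sketch of this greedy rule (the instance in the last line is made up):

```python
def greedy_set_cover(universe, sets):
    """Repeatedly pick the set covering the most still-uncovered elements."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

print(greedy_set_cover(range(4), [{0, 1}, {1, 2}, {2, 3}, {0, 3}]))  # e.g. [0, 2]
```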

5. LP Relaxation and Approximation Quality

One of the key insights of using LP relaxation is that the LP solution is often close to the ILP
solution. For the set cover problem, the LP relaxation provides a lower bound, and the
greedy algorithm provides an upper bound.

Approximation Guarantee: The value of the LP solution provides a good approximation
to the ILP solution. Specifically, the greedy algorithm achieves an approximation ratio of
ln n, and the value of the LP solution is typically close to the value of the greedy solution.

The relationship between the LP relaxation and the integer solution is essential in
understanding how close the LP solution is to the optimal integer solution. Even though the
fractional solution from the LP relaxation is not directly usable as a solution to the ILP, it
provides important insights and bounds.

6. Conclusion

The set cover problem is a classic example of an NP-hard problem that can be solved
approximately using linear programming. By relaxing the integer constraints and solving the
LP relaxation, we obtain a fractional solution that serves as a lower bound for the integer
problem. Using approximation algorithms, such as the greedy algorithm, we can find a
solution that is close to optimal.

In the context of the set cover problem, we see that even though LP and ILP solutions may
not always coincide, the LP relaxation provides a valuable tool for obtaining approximate
solutions. The relationship between the LP solution and the ILP solution highlights the power
of linear programming in approximating combinatorial optimization problems.

Lecture 45: Rounding for Set Cover


In this lecture, we discuss how to convert the fractional solution obtained from the LP
relaxation of the set cover problem into an integral solution that satisfies the requirements
of the ILP. This process of converting a fractional solution to an integral one is called
rounding. We explore a few natural rounding techniques and analyze their effectiveness in
obtaining a feasible solution for the set cover problem.

1. Introduction to Rounding

The LP relaxation of the set cover problem allows fractional values for the variables xi,
where 0 ≤ xi ≤ 1. While these fractional solutions provide useful bounds, they do not
directly give us a valid integral solution (which requires each xi to be either 0 or 1).
Therefore, we need to round the fractional values in a way that:

The integral solution is feasible, i.e., it satisfies all the constraints of the ILP.

The value of the integral solution is as close as possible to the optimal LP solution,
ensuring that the rounding does not lead to a large degradation in the objective value.

2. Rounding Techniques

Several rounding strategies exist to convert fractional solutions into integral ones. We will
discuss a few natural methods and assess their applicability to the set cover problem.

(a) Randomized Rounding

In randomized rounding, for each set Si, we decide whether to include it in the solution
based on its fractional value xi (which is between 0 and 1).

Specifically, for each Si, we include it in the solution with probability xi: set Si is
included if a value drawn uniformly at random from the interval [0, 1] is less than or
equal to xi, and excluded otherwise.

Expected Outcome:

The expected value of the objective function after rounding is approximately the value of
the fractional solution. This is because each set is included with probability proportional
to its fractional value.

The feasibility of the solution is preserved on average, but the outcome may not always
be feasible for every execution of the randomized process.

Applicability to Set Cover:

In the set cover problem, randomized rounding might not guarantee that each element
of the universe is covered, since the inclusion of sets depends on probabilities. However,
with appropriate adjustments and careful analysis, it is possible to ensure that the
expected coverage of elements remains valid.
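A minimal sketch of randomized rounding, with a retry loop to handle draws that fail to cover every element; the fractional vector x below is a made-up stand-in for an LP optimum:

```python
import random

def randomized_rounding(x, sets, universe, trials=20):
    """Include set i with probability x[i]; retry until the draw covers U."""
    for _ in range(trials):
        chosen = [i for i, xi in enumerate(x) if random.random() <= xi]
        covered = set().union(*(sets[i] for i in chosen)) if chosen else set()
        if covered >= set(universe):
            return chosen
    return None  # no feasible cover found within the given number of trials

x = [0.5, 0.5, 0.5, 0.5]  # pretend LP solution
print(randomized_rounding(x, [{0, 1}, {1, 2}, {2, 3}, {0, 3}], range(4)))
```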

(b) Greedy Rounding

In greedy rounding, we select the sets based on their fractional values, similar to the
greedy algorithm for the set cover problem. The main difference is that instead of
making decisions purely based on the coverage of elements, we choose the sets with the
highest fractional values xi and include them in the solution.

This technique can be applied iteratively, where sets with the largest values of xi are
included until all elements of the universe are covered.

Expected Outcome:

This method is deterministic and guarantees that each element is covered. The
challenge is that the value of the objective function might be worse than the LP solution
due to over-selection of sets.

Applicability to Set Cover:

In the case of the set cover problem, greedy rounding can be an effective way to ensure
coverage, though it may result in some redundancy (i.e., selecting more sets than
necessary). The greedy approach can be adjusted to ensure that the solution is both
feasible and close to optimal.

(c) Threshold Rounding

Threshold rounding involves setting a threshold t such that if xi ≥ t, we include set Si
in the solution, and if xi < t, we exclude Si. The threshold t is typically chosen based on
the structure of the problem, often t = 1/2, or based on a heuristic that balances
between feasibility and the objective value.

Expected Outcome:

This method is simpler than randomized rounding and can be more effective in ensuring
a feasible solution. However, it may lead to suboptimal solutions if xi values near t are

excluded or included based on the threshold.

Applicability to Set Cover:

For the set cover problem, threshold rounding can be useful, particularly when the
fractional values of xi are not too close to 0 or 1. However, a careful choice of threshold

is required to balance the coverage and the number of sets selected.

3. Analysis of Rounding Methods

We now assess the effectiveness of rounding methods in terms of their approximation
guarantees for the set cover problem.

(a) Approximation Ratio for Randomized Rounding

Randomized rounding typically leads to an approximation ratio that is close to the
optimal LP solution, but it requires a probabilistic argument to guarantee that the
expected objective value of the integral solution is close to that of the LP solution.

In practice, randomized rounding is useful when we are interested in obtaining a
solution with good expected performance, even though the specific outcome might vary.

(b) Approximation Ratio for Greedy Rounding

Greedy rounding ensures that every element is covered, and the algorithm’s
approximation ratio is guaranteed to be within a logarithmic factor of the optimal
solution, i.e., O(log n), where n is the number of elements in the universe.

This approximation is similar to the greedy algorithm for set cover and can be viewed as
a deterministic alternative to randomized rounding.

(c) Approximation Ratio for Threshold Rounding

Threshold rounding can also give a feasible solution that is close to optimal. By
selecting an appropriate threshold, we can avoid selecting too many unnecessary sets
while ensuring that every element is covered.

The performance of threshold rounding is typically similar to that of the greedy
algorithm, although it may have a higher approximation factor in some cases.

4. Conclusion

The process of rounding allows us to convert the fractional solution of an LP relaxation of
the set cover problem into an integral solution that is feasible for the ILP. While there are
several rounding methods, such as randomized rounding, greedy rounding, and threshold
rounding, each has its trade-offs in terms of approximation guarantees and computational
complexity.

Greedy rounding and threshold rounding are simpler and often lead to good
approximation ratios for the set cover problem.

Randomized rounding provides an expected outcome but requires probabilistic analysis
to ensure feasibility and approximation quality.

In practice, rounding is an essential technique to bridge the gap between LP relaxation and
integer optimization, offering a way to obtain good feasible solutions for NP-hard problems
like the set cover problem.

Lecture 46: Analysis of Rounding (Lovász's Method)


In this lecture, we delve deeper into rounding techniques, specifically focusing on a well-
known approach by Lovász. This rounding method involves selecting sets in a cover with
probabilities proportional to the values of the corresponding variables in the LP relaxation.
We analyze this method to understand its effectiveness in the set cover problem,
particularly in terms of the expected size of the set cover and the probability of obtaining a
feasible solution.

1. Overview of Lovász’s Rounding Method

Lovász's rounding approach is based on the LP relaxation of the set cover problem. The idea
is to treat the LP solution as a probability distribution and select sets in the cover according
to the probabilities corresponding to their LP variables.

Given an LP solution with fractional values xi ∈ [0, 1] for the sets, the goal is to round these
fractional values to obtain an integral solution, i.e., a set cover, while minimizing the
increase in the size of the cover.

2. Rounding with Probabilities

In Lovász’s method, the process of rounding is as follows:

For each set Si, we associate a probability with it, equal to the fractional value xi from
the LP relaxation.

Specifically, we include set Si in the cover with probability xi.

This probabilistic selection ensures that the expected number of times set Si is included
in the final cover is exactly xi, matching the LP solution.


3. Expected Size of the Set Cover

One key advantage of this rounding method is that the expected size of the cover does not
increase substantially compared to the LP solution.

Expected Size: By linearity of expectation, the expected number of sets selected is equal
to the sum of the LP variables:

E[size of cover] = ∑_i xi

Since the LP relaxation is solved to optimality, this expected size equals the optimal
fractional cover size, which is a lower bound on the size of any integral cover.

4. Feasibility and High Probability of Coverage

While the method provides a solution with a guaranteed expected size, we must also ensure
that every element in the universe is covered with high probability.

The probability that a given element is covered is determined by the sets that contain
it: element e is missed with probability ∏_{i : e ∈ Si} (1 − xi) ≤ e^{−∑ xi} ≤ 1/e, since
the LP constraint guarantees ∑_{i : e ∈ Si} xi ≥ 1.

Since the sets are chosen probabilistically, there is a non-zero chance that an element
may not be covered. However, the probability of failure can be made arbitrarily small by
adjusting the rounding process or selecting more sets.

5. Analysis of Rounding with Lovász’s Method

Probability of Coverage: Each element is covered by at least one of the selected sets
with high probability. This can be shown by the union bound or other probabilistic
arguments.

Set Size: The expected size of the cover obtained through this rounding method is close
to the LP value. Therefore, the approximation ratio does not increase significantly.

Approximation Guarantee: Lovász's rounding method is a powerful tool because it
provides an approximation guarantee that is close to optimal. The expected size of the
set cover obtained through this method is almost the same as the optimal LP solution.

6. Key Insights from the Analysis

Feasibility with High Probability: The rounding method ensures that the probability of
covering every element is high. The number of sets chosen in the final solution is
proportional to the fractional solution of the LP, and the expected cover size is optimal
or nearly optimal.

Non-Increase in Set Cover Size: The rounding method does not cause a large increase
in the number of sets chosen. It only selects sets based on the LP solution, which
already provides a good approximation of the minimum cover.

Analysis of Set Sizes: Lovász’s method shows that rounding can lead to a cover with an
expected size that is very close to the LP relaxation’s solution, thus ensuring that the
integral solution is good in terms of both feasibility and cost.

7. Conclusion

Lovász’s rounding method is an effective technique for approximating the set cover
problem. By selecting sets probabilistically based on their fractional LP values, we achieve an
integral solution with the following key properties:

The expected size of the set cover does not increase significantly compared to the LP
solution.

Every element is covered with high probability.

The rounding method provides a good approximation of the optimal cover.

Thus, this method offers a practical way to obtain feasible and near-optimal solutions for the
set cover problem, leveraging the structure of linear programming and probabilistic
rounding.

Lecture 47: Algorithm for Set Cover


In this lecture, we discuss an improved algorithm for the set cover problem that uses
repeated sampling to address the issue of ensuring high probability of coverage for all

elements in the universe. This method builds upon the rounding technique from the
previous lecture, and enhances it by executing the rounding process multiple times, thereby
improving the chances of obtaining a feasible and near-optimal solution.

1. Problem Recap: Set Cover

In the set cover problem, we are given a universe of elements U and a collection of sets
S1, S2, …, Sm such that each set Si is a subset of U. The objective is to select the fewest
number of sets whose union covers all elements in U.

2. LP Relaxation for Set Cover

We previously discussed the linear programming (LP) relaxation of the set cover problem,
where we minimize the sum of the LP variables xi associated with the sets Si, subject to the
constraint that each element of U is covered (the xi of the sets containing it sum to at least
1), with xi ∈ [0, 1]. The LP relaxation can provide a fractional solution, where the values xi
are not necessarily integer, but the LP value serves as a good approximation (and a lower
bound) for the optimal integer solution.

3. The Issue of Incomplete Coverage

In the previous lecture, we saw that by rounding the fractional LP solution, we can obtain a
solution where the sets are selected probabilistically. However, one issue that arises is that,
while the expected size of the cover is close to the optimal solution, there is a probability
that some elements of the universe may not be covered in a single rounding process. This
happens due to the probabilistic nature of the rounding, and the chance of missing coverage
increases if the selection probabilities are small.

4. Repeated Sampling for High Probability

To improve the likelihood that every element is covered, we can repeat the rounding
process multiple times. The idea is that by executing the sampling process several times,
the probability of failing to cover an element decreases exponentially.

How Repeated Sampling Works:

For each round of sampling, we independently select sets Si with probability xi (the
value of the fractional solution from the LP).

If an element e ∈ U is not covered in one round, the probability of it remaining
uncovered across all rounds shrinks with each additional round.

After a sufficiently large number of rounds, the probability that any element is not
covered is very small, ensuring high probability of coverage.

5. Algorithm Outline

The algorithm proceeds as follows:

1. Solve the LP Relaxation: Compute the fractional solution for the set cover problem
using linear programming, obtaining the values xi for each set Si.

2. Repeated Rounding:

Repeat the rounding process k times, where k is chosen based on the desired
probability of covering all elements.

In each round, select each set Si with probability xi.

3. Combine the Results:

Combine the sets chosen in each round. The union of the sets selected across all
rounds will cover all elements of the universe with high probability.

4. Return the Result: After k rounds, the union of the selected sets forms a cover for the
universe.
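A minimal sketch of the whole pipeline, reusing scipy.optimize.linprog for step 1; the instance is made up, and the number of rounds is fixed arbitrarily (the analysis below suggests k = O(log n) rounds suffice):

```python
import random
from scipy.optimize import linprog

def set_cover_repeated_rounding(universe, sets, rounds=8):
    """LP relaxation, then take the union of k independent rounding passes."""
    c = [1.0] * len(sets)
    A_ub = [[-1.0 if u in s else 0.0 for s in sets] for u in universe]
    b_ub = [-1.0] * len(universe)
    x = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, 1)] * len(sets)).x

    chosen = set()
    for _ in range(rounds):                       # step 2: repeated rounding
        chosen |= {i for i, xi in enumerate(x) if random.random() <= xi}
    covered = set().union(*(sets[i] for i in chosen)) if chosen else set()
    return chosen if covered >= set(universe) else None   # steps 3 and 4

print(set_cover_repeated_rounding(range(4), [{0, 1}, {1, 2}, {2, 3}, {0, 3}]))
```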

6. Analysis of the Algorithm

Probability of Coverage:

In each round, each element is covered with some probability p; by the LP covering
constraint, a given element is covered in a single round with probability at least
1 − 1/e. Because the rounds are independent, after k rounds the probability of an
element not being covered is at most (1 − p)^k, where p is the probability of the
element being covered in each round.

By choosing k sufficiently large (k = O(log n) suffices, using a union bound over all
n elements), the probability of missing coverage becomes arbitrarily small.

Expected Size of the Cover:

In each round, the expected number of sets selected is the sum of the LP values xi,
just as in the original rounding method. Over k rounds the expected total is at most
k · ∑_i xi, so with k = O(log n) rounds the expected size of the final cover is within
an O(log n) factor of the LP solution, and hence of the optimal integer cover.

Time Complexity:

Solving the LP relaxation takes polynomial time, and repeating the rounding process
k times involves selecting sets multiple times, which also runs efficiently in
polynomial time. The number of rounds k can be chosen to balance between
probability of coverage and computational efficiency.

7. Conclusion

This algorithm for the set cover problem uses repeated sampling to ensure high probability
of covering all elements in the universe. The key steps involve:

Solving the LP relaxation for the set cover problem.

Repeatedly rounding the fractional solution to obtain an integral set cover.

Combining the results to ensure complete coverage with high probability.

This technique effectively addresses the issue of incomplete coverage by the initial rounding
and ensures a feasible solution with high probability. Furthermore, the expected size of the
cover remains close to the LP solution, making this approach a practical and efficient
method for solving the set cover problem with high-probability guarantees.

Lecture 48: Linear Regression through LP


In this lecture, we explore how linear programming can be applied to the linear regression
problem, which is a fundamental technique in machine learning. Linear regression aims to
find the best-fit line for a given set of data points. We will show how this problem can be
framed as a linear program (LP) under a specific notion of error.

1. Overview of Linear Regression

Linear regression involves fitting a line to a set of data points such that the line best
represents the relationship between the dependent variable y and the independent
variable(s) x. The general form of a linear model is:

y = θ0 + θ1 x1 + θ2 x2 + ⋯ + θn xn

where:

y is the output (dependent variable),

x1, x2, …, xn are the input features (independent variables),

θ0, θ1, …, θn are the parameters (coefficients) of the model.

In simple linear regression, we have only one independent variable x, and the model
reduces to:

y = θ0 + θ1 x

The goal is to find the best values for the parameters θ0 and θ1, which minimize the error
between the predicted values ŷ and the actual observed values y.

2. Error Measurement in Linear Regression

In regression, the error typically refers to the difference between the predicted and actual
values. There are various ways to measure the error, but one common method is to use the
sum of absolute deviations (L1 norm) or sum of squared errors (L2 norm). In this lecture,
we focus on the L1 norm, which measures the absolute error for each data point:
Error = ∑_{i=1}^{m} |yi − ŷi|

where m is the number of data points, and ŷi = θ0 + θ1 xi is the predicted value for the i-th
data point.

Minimizing this error function means finding the parameters θ0 and θ1 such that the total
absolute error is as small as possible.

3. Formulating Linear Regression as an LP

We now frame the linear regression problem as a linear program. Given a set of data points
(x1, y1), (x2, y2), …, (xm, ym), our goal is to minimize the total absolute error between the
observed values yi and the predicted values ŷi = θ0 + θ1 xi. This can be formulated as an LP
with the following steps:

1. Decision Variables:

Let ei represent the error for the i-th data point, i.e., ei = |yi − (θ0 + θ1 xi)|.

2. Objective Function:

The objective is to minimize the sum of the absolute errors across all data points:

min ∑_{i=1}^{m} ei

3. Constraints:

For each data point, the absolute error ei can be modeled using two linear
constraints:

ei ≥ yi − (θ0 + θ1 xi)

ei ≥ (θ0 + θ1 xi) − yi

These constraints ensure that ei is at least as large as the absolute difference between
the predicted value ŷi and the actual value yi; since each ei is being minimized, at the
optimum it equals that absolute difference.

Hence, the linear program is formulated as:

min ∑_{i=1}^{m} ei

subject to:

ei ≥ yi − (θ0 + θ1 xi) ∀i

ei ≥ (θ0 + θ1 xi) − yi ∀i

The decision variables are θ0, θ1, and ei for each i.

4. Interpretation and Solving the LP

The linear program seeks to find the values of θ0 and θ1 that minimize the total absolute
error ∑ ei, subject to the constraints that each error ei is large enough to capture the
absolute difference between the observed and predicted values. Once the LP is solved, the
optimal values of θ0 and θ1 provide the best-fit line in terms of minimizing the absolute
errors.
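A minimal sketch of this LP with scipy.optimize.linprog, stacking the variables as [θ0, θ1, e1, …, em]; the five data points are made up to lie roughly on the line y = 1 + 2x:

```python
import numpy as np
from scipy.optimize import linprog

xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])      # made-up data
ys = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
m = len(xs)

# Variable order: [theta0, theta1, e_1, ..., e_m]; minimize the sum of the e_i.
c = np.concatenate([[0.0, 0.0], np.ones(m)])
A_ub, b_ub = [], []
for i in range(m):
    e_row = np.zeros(m)
    e_row[i] = -1.0
    # e_i >= y_i - (theta0 + theta1*x_i)  <=>  -theta0 - theta1*x_i - e_i <= -y_i
    A_ub.append(np.concatenate([[-1.0, -xs[i]], e_row]))
    b_ub.append(-ys[i])
    # e_i >= (theta0 + theta1*x_i) - y_i  <=>   theta0 + theta1*x_i - e_i <=  y_i
    A_ub.append(np.concatenate([[1.0, xs[i]], e_row]))
    b_ub.append(ys[i])

bounds = [(None, None)] * 2 + [(0, None)] * m  # thetas free, errors nonnegative
res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds)
print(res.x[:2])                               # theta0, theta1: close to 1 and 2
```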

5. Comparison with Least Squares Method

In contrast to the L1 norm approach we used here (absolute error), the least squares
method minimizes the sum of squared errors (L2 norm), which results in a different
objective function:

min ∑_{i=1}^{m} (yi − (θ0 + θ1 xi))²

This method leads to a closed-form solution for θ0 and θ1, but the L1 norm approach, while
computationally more intensive (since it requires solving a linear program), is more robust
to outliers in the data.

6. Applications of Linear Regression via LP

Using linear programming for linear regression has practical applications, especially in cases
where you want to minimize errors in a more robust way (using the L1 norm) or when
dealing with large datasets that may not fit well under least squares. Additionally, LP
formulations for regression can be extended to more complex models, such as support
vector machines (SVMs), and can be adapted for regularized regression (e.g., Lasso
regression).

7. Conclusion

In this lecture, we demonstrated how linear regression can be formulated as a linear
program using the L1 norm (absolute error). This approach provides a robust method for
fitting a line to data, especially in the presence of outliers. Linear programming offers an
effective and flexible framework for solving regression problems, with applications in
machine learning, optimization, and data analysis.

Lecture 49: Linear Classifiers through LP


In this lecture, we explore the application of linear programming to linear classification
problems. Given a set of data points with two distinct labels, the goal is to find a hyperplane
that separates the points belonging to each class. Linear programming can be employed to
solve this classification problem by formulating it as a linear program (LP).

1. Overview of Linear Classification

Linear classification involves partitioning a dataset into two classes using a linear boundary.
In a two-dimensional space, this boundary is a straight line, while in higher dimensions, it
becomes a hyperplane. The task is to find a hyperplane that separates the points of the two
classes in such a way that points on one side of the hyperplane belong to one class, and
points on the other side belong to the other class.

For simplicity, let's consider the case where we have two classes C1 and C2, and we
wish to find a hyperplane that separates these classes perfectly (i.e., no data points are
misclassified).

2. Mathematical Formulation of Linear Classification

Given a dataset {(x1, y1), (x2, y2), …, (xm, ym)}, where:

xi ∈ R^n represents the feature vector of the i-th data point (an n-dimensional vector),

yi ∈ {+1, −1} represents the label of the i-th data point (with two possible classes, +1
and −1),

the goal is to find a hyperplane w ⋅ x + b = 0 that separates the two classes. Here:

w ∈ R^n is a vector normal to the hyperplane,

b is the bias term (scalar), and

w ⋅ x denotes the dot product between the weight vector w and the feature vector x.

For the dataset to be linearly separable, the following constraints must hold:

For each data point (xi, yi), the label yi determines on which side of the hyperplane the
point xi lies:

If yi = +1, we want w ⋅ xi + b ≥ 1,

If yi = −1, we want w ⋅ xi + b ≤ −1.

Thus, the linear classification problem can be expressed as the following set of constraints:

yi (w ⋅ xi + b) ≥ 1 ∀i = 1, 2, …, m

3. Linear Program for Linear Classification

To solve the linear classification problem using linear programming, we aim to find the
optimal hyperplane that separates the classes. The objective of the linear program is to
maximize the margin between the two classes while satisfying the constraints.

Margin: The margin of a hyperplane is the distance between the closest data point (from
either class) and the hyperplane. Maximizing this margin ensures that the classifier is as
far as possible from the nearest data points, leading to better generalization.

The constraints ensure that all data points are correctly classified, and the objective is to
maximize the margin, which is proportional to 1/∥w∥ (the inverse of the magnitude of the
weight vector).

Thus, the optimization problem can be formulated as:

min ∥w∥

subject to:

yi (w ⋅ xi + b) ≥ 1 ∀i = 1, 2, …, m

This is a convex optimization problem. Strictly speaking, minimizing the Euclidean norm
∥w∥₂ makes it a quadratic program (the classic SVM formulation); if we instead minimize
the L1 or L∞ norm of w, the problem can be rewritten as a genuine linear program and
solved with LP methods.
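A minimal sketch of the LP variant just described, minimizing the L1 norm of w with auxiliary variables tj ≥ |wj|; the four two-dimensional points are made up and linearly separable:

```python
import numpy as np
from scipy.optimize import linprog

X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])  # made-up data
y = np.array([1, 1, -1, -1])
m, n = X.shape

# Variables: [w_1..w_n, b, t_1..t_n] with t_j >= |w_j|; minimize sum of the t_j.
c = np.concatenate([np.zeros(n + 1), np.ones(n)])
A_ub, b_ub = [], []
for i in range(m):
    # y_i (w . x_i + b) >= 1  <=>  -y_i * x_i . w - y_i * b <= -1
    A_ub.append(np.concatenate([-y[i] * X[i], [-y[i]], np.zeros(n)]))
    b_ub.append(-1.0)
for j in range(n):                      # linearize t_j >= |w_j|
    row = np.zeros(2 * n + 1)
    row[j], row[n + 1 + j] = 1.0, -1.0  #  w_j - t_j <= 0
    A_ub.append(row.copy())
    b_ub.append(0.0)
    row[j] = -1.0                       # -w_j - t_j <= 0
    A_ub.append(row)
    b_ub.append(0.0)

bounds = [(None, None)] * (n + 1) + [(0, None)] * n
res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds)
print(res.x[:n], res.x[n])              # separating w and b with margin >= 1
```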

4. Understanding the Optimization Problem

1. Objective: The objective function ∥w∥ is minimized, which corresponds to maximizing
the margin. The smaller the magnitude of w, the larger the margin.

2. Constraints: The constraints yi (w ⋅ xi + b) ≥ 1 ensure that all data points are classified
correctly and lie on the correct side of the hyperplane. If a data point is from class C1,
the constraint requires that w ⋅ xi + b is greater than or equal to 1, and if the point is
from class C2, the constraint requires that w ⋅ xi + b is less than or equal to −1.

3. Solution: Solving this optimization problem gives the optimal values for w and b, which define
the hyperplane that separates the data points of the two classes with the maximum
margin.

5. Support Vector Machine (SVM) and LP

The linear program described above is closely related to the Support Vector Machine (SVM),
a widely used machine learning algorithm. SVM aims to find the optimal hyperplane that
separates the data with the largest possible margin. The formulation above is the basis for
the hard-margin SVM, which assumes that the data is linearly separable.

For non-linearly separable data, SVM introduces slack variables to allow some
misclassification while still maximizing the margin. The resulting optimization problem
becomes a soft-margin SVM, which can also be solved using linear programming with
additional constraints.

6. Applications of Linear Classifiers

Linear classifiers are widely used in machine learning for classification tasks, including:

Text classification (e.g., spam detection),

Image recognition (e.g., face recognition),

Medical diagnosis (e.g., classifying diseases based on symptoms or test results).

By formulating the problem as a linear program, these classification tasks can be efficiently
solved using LP solvers, making them applicable to large-scale datasets.

7. Conclusion

In this lecture, we demonstrated how linear programming can be applied to solve linear
classification problems. By formulating the task of finding a separating hyperplane as a
linear program, we can efficiently find the optimal solution that maximizes the margin
between the two classes. This approach forms the foundation for Support Vector Machines
(SVMs) and provides a powerful method for solving classification problems in machine
learning.

Lecture 50: Sensitivity Analysis with Examples


In this lecture, we delve into sensitivity analysis within the context of linear programming.
Sensitivity analysis is a crucial tool for understanding how changes in the parameters of a
linear program (LP) affect the optimal solution. Specifically, we will look at how modifications
to the coefficients in the objective function or constraints influence the optimal solution and
its value.

1. Introduction to Sensitivity Analysis

Sensitivity analysis in linear programming examines the effect of small changes in the
parameters of the problem (such as coefficients in the objective function or the right-hand
side values in constraints) on the optimal solution. It helps answer key questions like:

How does a change in the coefficients of the objective function affect the optimal value?

How does a change in the right-hand side of a constraint impact the solution?

How stable is the solution to small perturbations in the problem?

The goal of sensitivity analysis is to understand the robustness of the optimal solution,
identify critical parameters, and provide insights for decision-making under uncertainty.

2. Key Concepts in Sensitivity Analysis

In the context of a linear program, the following parameters are typically analyzed for
sensitivity:

Objective Function Coefficients: How sensitive is the optimal value to changes in the
coefficients of the objective function?

Right-hand Side of Constraints: How does a change in the constraint bounds (the right-
hand side values) affect the optimal solution?

Feasibility and Optimality Conditions: What are the boundaries for changes that still
keep the solution feasible and optimal?

3. Sensitivity to Changes in Objective Function Coefficients

Consider a standard linear program in the following form:

Maximize cᵀx

subject to:

Ax ≤ b

x≥0

Where c is the vector of coefficients in the objective function, A is the matrix of coefficients
in the constraints, and b is the right-hand side vector.

If the optimal solution to this linear program is x∗ , sensitivity analysis involves determining
how changes in the vector c affect x∗ and the optimal value.

Allowable Range for Objective Coefficients: This refers to how much a given objective
function coefficient can change without altering the optimal basis of the solution.

Shadow Price: The shadow price (or dual value) of a constraint measures the change in
the objective function's optimal value per unit increase in the right-hand side of the
constraint. Sensitivity analysis on the right-hand side of constraints helps in
understanding how feasible solutions change when the resources (or limits) in the
constraints change.

4. Sensitivity to Changes in Right-hand Side of Constraints

Next, let's examine the sensitivity of the solution with respect to changes in the right-hand
side vector b, which represents the resource limits in the constraints.

Consider the LP:

Maximize cᵀx

subject to:

Ax ≤ b

x≥0

Feasibility Region: As b changes, the feasible region of the LP also changes. Sensitivity
analysis helps us understand how much we can increase or decrease the values of b
without violating the feasibility of the current solution.

Shadow Price: The shadow price provides valuable information in this context. If the
right-hand side of a constraint increases, the shadow price tells us how much the
objective function will improve per unit increase. Similarly, it indicates the cost of
relaxing the constraint.

5. Types of Sensitivity Analysis

There are several common types of sensitivity analysis used in linear programming:

1. Range of Optimality: This analysis focuses on the allowable range of coefficients in the
objective function for which the current optimal solution remains unchanged. If the
coefficient of a variable in the objective function is modified, the solution may still be
optimal within a specific range of values.

2. Range of Feasibility: This type of analysis examines the effect of changes in the right-
hand side of the constraints on the feasible region. It determines the limits within which
the current solution remains feasible.

3. Dual Prices or Shadow Prices: This refers to the change in the objective function value
resulting from a one-unit increase in the right-hand side of a constraint, assuming the
rest of the problem remains unchanged.

6. Examples of Sensitivity Analysis

Let’s walk through an example to demonstrate sensitivity analysis.

Example 1:

Consider the following linear program:

Maximize z = 3x1 + 4x2

subject to:

x1 + x2 ≤ 5

2x1 + x2 ≤ 6

x1, x2 ≥ 0

1. Initial Optimal Solution: Solve this linear program using the Simplex method or any LP
solver; the optimal solution is x1 = 0, x2 = 5, with the objective value z = 20.

2. Sensitivity to Changes in Objective Coefficients:

Suppose the coefficient of x1 is changed from 3 to 3.5. We would check whether the
current solution (x1 = 0, x2 = 5) still remains optimal. If the new objective
function coefficient does not change the optimal solution, it is within the allowable
range.

3. Sensitivity to Changes in Right-hand Side:

Suppose the right-hand side of the first constraint is changed from 5 to 6. Sensitivity
analysis tells us whether the current basis remains optimal after this change; within
the allowable range of feasibility, the shadow price tells us how much the objective
function improves per unit increase in the right-hand side.
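A minimal numerical check of Example 1 using scipy.optimize.linprog; recent scipy versions (with the HiGHS backend) expose the dual values of the inequality constraints through res.ineqlin.marginals, with signs following scipy's minimization convention:

```python
from scipy.optimize import linprog

c = [-3.0, -4.0]                  # linprog minimizes, so negate to maximize
A_ub = [[1.0, 1.0], [2.0, 1.0]]
b_ub = [5.0, 6.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
print(res.x, -res.fun)            # optimal point and value: [0, 5] and 20

# Duals on the <= constraints; the maximized objective improves by
# -marginal per unit increase of the corresponding right-hand side.
print(res.ineqlin.marginals)

res2 = linprog(c, A_ub=A_ub, b_ub=[6.0, 6.0], bounds=[(0, None)] * 2)
print(-res2.fun)                  # 24: the value rose by the shadow price, 4
```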

Example 2:

Consider an LP for a transportation problem:

Minimize z = 8x1 + 10x2 + 6x3

subject to:

x1 + x2 + x3 = 20

x1 + 2x2 = 15

x2 + x3 = 10

x1, x2, x3 ≥ 0

Here, we could perform sensitivity analysis to examine:

The effect of changing the supply or demand on the optimal transportation plan.

How much the cost coefficients can change before the optimal solution changes.

7. Conclusion

Sensitivity analysis is an important tool for understanding the stability and robustness of
solutions in linear programming. By evaluating how small changes in the problem's
parameters affect the solution, decision-makers can better handle uncertainties and make
more informed decisions. Through sensitivity analysis, we gain insights into which
parameters are most influential, enabling more flexible and adaptive planning in various
applications, such as resource allocation, optimization, and game theory.

Lecture 51: Integer Programming: A Comparative Study


In this lecture, we explore Integer Programming (IP), comparing it with Linear
Programming (LP). Integer programming is a special type of optimization problem where
some or all of the decision variables are constrained to take integer values. This lecture will
cover the essential differences between LP and IP, challenges in solving integer programs,
and when integer programming becomes relevant in practice.

1. Overview of Linear Programming vs. Integer Programming

Linear programming involves optimization problems where all decision variables are
continuous, meaning they can take any real values within specified bounds. A typical LP
formulation is:

Maximize (or Minimize) cᵀx

subject to:

Ax ≤ b

x≥0

In contrast, Integer Programming (IP) is a class of optimization problems where some or all
decision variables are restricted to be integer values. An IP problem is typically formulated
as:

Maximize (or Minimize) cᵀx

subject to:

Ax ≤ b

x ∈ Z^n or x ∈ {0, 1}^n

Where:

x ∈ Z^n means that the variables are integers.

x ∈ {0, 1}^n represents a binary integer programming problem (often referred to as 0-1
IP), where the variables can only take values 0 or 1.

2. Key Differences Between LP and IP

Variable Types: In LP, the variables are continuous, whereas in IP, the variables are
restricted to integer values (which may even be binary in certain problems).

Feasibility and Solution Space: LP has a continuous feasible region that is often convex
and can be solved efficiently using algorithms like the Simplex method or Interior Point
methods. In contrast, IP has a discrete feasible region, and the solution space is not
convex. This makes IP problems significantly harder to solve.

Solving Methods: While LP problems can be solved in polynomial time using algorithms
like Simplex or Interior Point methods, Integer Programming is much harder. Most IP
problems are NP-hard, meaning they do not have polynomial-time algorithms for
solving them in the general case.

Complexity: The complexity of LP problems is relatively low compared to IP, which is
computationally more intensive. The solution process for integer programming involves
additional steps like branch-and-bound, branch-and-cut, or cutting planes, all of which
can significantly increase the solution time.

3. Challenges in Solving Integer Programming Problems

Computational Difficulty: Integer programming is computationally challenging due to
the discrete nature of the variables. While LPs are solvable in polynomial time, integer
programming problems typically require more advanced techniques like branch-and-
bound, branch-and-cut, or cutting-plane methods, all of which can be computationally
expensive.

NP-Hardness: The majority of integer programming problems are NP-hard, meaning
they cannot be solved in polynomial time unless P = NP. Even simple-looking IP
problems, such as the 0-1 knapsack problem, can be computationally intense.

Optimality and Approximation: Unlike LPs, where the optimal solution is guaranteed to
be found at a vertex of the feasible region, IPs may require exhaustive search techniques
like backtracking or dynamic programming, and finding an optimal integer solution can
take longer.

4. Solving Integer Programming Problems

Several approaches exist to solve integer programming problems. Here are the most
common techniques:

Branch-and-Bound: This method systematically explores branches of a decision tree,
where each node represents a subset of the feasible solution space. At each step, the
algorithm either narrows down the feasible space or discards non-promising branches
based on bounds, eventually converging to an optimal solution. It is widely used for
solving general integer programming problems.

Branch-and-Cut: This is an extension of the branch-and-bound method, specifically
designed for solving Mixed-Integer Linear Programming (MILP) problems. The
algorithm combines branching with cutting planes, where cuts are added to the LP
relaxation to improve the solution.

Cutting Planes: This method iteratively refines the LP relaxation of an integer
programming problem by adding linear inequalities (cuts) that exclude infeasible
solutions without eliminating any feasible integer solutions. This technique helps tighten
the LP relaxation, leading to a better approximation of the integer solution.

Relaxation Methods: In some cases, the integer constraints can be relaxed to
continuous constraints (by allowing the variables to take fractional values). This creates a
linear program (LP relaxation), which can be solved efficiently. The solution to the LP
relaxation provides a bound on the optimal integer solution, and further techniques like
rounding or branching can be applied to obtain the integer solution.

5. Applications of Integer Programming

Despite its computational complexity, integer programming has widespread applications in
various industries and fields, such as:

Supply Chain Optimization: Integer programming is used to solve problems like
warehouse location, vehicle routing, and facility planning, where decision variables need
to take integer values (e.g., number of trucks, facilities).

Resource Allocation: IP is used in various resource allocation problems where the
resources (such as machines, workers, or time slots) are limited and must be allocated in
integer amounts.

Scheduling Problems: Scheduling involves assigning tasks to resources over time, and
since both tasks and resources are usually discrete, integer programming is often used
for these types of problems.

Project Selection and Investment: In capital budgeting, integer programming can be
used to select projects or investments that maximize return subject to budget and
resource constraints, where decisions are typically binary (select or not).

Cutting Stock and Knapsack Problems: These problems involve optimizing cutting or
packing operations and often have natural integer constraints, requiring IP to model the
problem.

6. Comparing LP and IP for Specific Problems

Let’s compare LP and IP with a classic example:

Example 1: Knapsack Problem. The 0-1 Knapsack Problem is a famous combinatorial
optimization problem where the objective is to maximize the total value of items placed into
a knapsack, subject to a weight constraint. The problem can be modeled as:

Maximize ∑_{i=1}^{n} vi xi

subject to:

∑_{i=1}^{n} wi xi ≤ W

xi ∈ {0, 1} for all i

Here, xi is a binary decision variable representing whether item i is included in the
knapsack.

Linear Programming Relaxation: If we relax the integrality constraint and allow xi to
take fractional values between 0 and 1, this becomes a typical LP problem, and we can
solve it efficiently using methods like Simplex or Interior Point methods.

Integer Programming Solution: To obtain the optimal 0-1 solution, we must solve the
problem as an integer program, which involves more computational effort and may
require techniques like branch-and-bound.

7. Conclusion

Integer programming provides a powerful framework for solving optimization problems with
discrete variables, but it comes with significant computational challenges compared to linear
programming. While LP can be solved efficiently in polynomial time, IP often requires more
sophisticated techniques like branch-and-bound or cutting planes, which can be
computationally expensive. Despite these challenges, IP has a wide range of practical
applications across industries, particularly in areas like logistics, scheduling, resource
allocation, and project selection. Understanding the differences between LP and IP, as well as
the techniques used to solve IP problems, is crucial for tackling real-world optimization
problems involving integer variables.

Lecture 52: Solving an Integer Programming Problem


In this lecture, we focus on solving a specific Integer Programming (IP) problem. We will go
through the process of formulating, solving, and interpreting an integer programming
problem, demonstrating the various methods and techniques used to find an optimal
solution. This lecture aims to bridge the theory with practical implementation, showing how
integer programming can be used to solve real-world optimization problems.

1. Formulating an Integer Programming Problem

Integer programming problems are typically formulated in the following general structure:

Maximize (or Minimize)  c^T x

subject to:

Ax ≤ b

x ∈ Z^n  or  x ∈ {0, 1}^n

Where:

x is the vector of decision variables (which are either integer or binary).

A is the matrix of constraint coefficients.

b is the vector representing the right-hand side of the constraints.

c is the vector of objective coefficients.

The decision variables are constrained to integer values, and the problem seeks to maximize (or minimize) the objective function c^T x subject to the constraints.

2. Example Problem

Let’s work through a concrete example of an integer programming problem:

Example: The Knapsack Problem

We have a knapsack with a maximum weight capacity W = 50, and we want to maximize
the total value of items that can be placed in the knapsack. Each item has a weight and a
value associated with it. The objective is to decide which items to include in the knapsack,
subject to the weight constraint.

| Item | Weight | Value |
| ---- | ------ | ----- |
| 1    | 10     | 60    |
| 2    | 20     | 100   |
| 3    | 30     | 120   |

3. Formulation of the Knapsack Problem as an IP

We want to maximize the total value of the selected items while ensuring the total weight
does not exceed the knapsack capacity. The problem can be formulated as:

Maximize  60x_1 + 100x_2 + 120x_3

subject to:

10x_1 + 20x_2 + 30x_3 ≤ 50

x_1, x_2, x_3 ∈ {0, 1}

Where:

x_1, x_2, x_3 are binary decision variables representing whether each item is included in the knapsack (1 if included, 0 if not).

The objective function maximizes the total value of the items selected.

The constraint ensures the total weight of the selected items does not exceed the capacity of the knapsack.
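To make this concrete, here is a minimal Python sketch (not from the lecture itself) that solves this three-item instance exactly by enumerating all 2^n item subsets; brute force is only viable for tiny n, but it verifies the optimal selection.

```python
from itertools import product

# Data from the table above: weights, values, and knapsack capacity
weights = [10, 20, 30]
values = [60, 100, 120]
W = 50

best_value, best_choice = 0, None
# Enumerate every binary vector (x1, x2, x3) in {0,1}^3
for x in product([0, 1], repeat=len(weights)):
    total_weight = sum(w * xi for w, xi in zip(weights, x))
    total_value = sum(v * xi for v, xi in zip(values, x))
    if total_weight <= W and total_value > best_value:
        best_value, best_choice = total_value, x

print(best_choice, best_value)  # (0, 1, 1) with value 220: items 2 and 3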

4. Solving the Problem Using LP Relaxation

To start solving, we often begin by relaxing the integer constraints, allowing the decision variables x_1, x_2, x_3 to take any values between 0 and 1. This converts the problem into a Linear Programming (LP) problem.

Maximize  60x_1 + 100x_2 + 120x_3

subject to:

10x_1 + 20x_2 + 30x_3 ≤ 50

0 ≤ x_1, x_2, x_3 ≤ 1

This is now a typical LP problem, which can be solved using methods such as Simplex or Interior Point algorithms. The solution may be fractional, meaning the values of x_1, x_2, x_3 need not be integers. For this instance the LP optimum is x_1 = 1, x_2 = 1, x_3 = 2/3, giving objective value 240; because relaxing constraints can only improve the optimum, 240 is an upper bound on the best achievable integer value (which turns out to be 220).

5. Rounding or Branching

If the LP relaxation results in fractional values for the decision variables, we need to find a
way to obtain integer solutions. There are two main methods for this:

Rounding: This method involves rounding the fractional values to the nearest integers
(either 0 or 1). While this method is simple, it does not always yield an optimal solution.

Branch-and-Bound: If the rounding does not give a feasible solution, we use branch-
and-bound, which is an iterative procedure for systematically solving the integer
program by breaking the problem into smaller subproblems. The method divides the
feasible region into subproblems and bounds the solution space based on the relaxation
of integer constraints.

6. Branch-and-Bound Algorithm

The Branch-and-Bound algorithm involves the following steps:

1. Relaxation: Solve the LP relaxation of the IP problem, which provides a bound on the
optimal solution.

2. Branching: If the solution is fractional, branch by dividing the problem into two
subproblems, one where a decision variable is forced to be 0 and another where it is
forced to be 1.

3. Bounding: Compute upper and lower bounds for each subproblem to determine if
further branching is needed.

4. Pruning: Discard subproblems that cannot lead to a better solution than the current
best solution (this is called pruning).

The process continues until all subproblems are either solved or pruned.
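The following minimal sketch shows branch-and-bound on the 0-1 knapsack, using the fractional (LP-relaxation) value as the bound at each node; the function names and structure are illustrative, not a standard library API.

```python
def lp_bound(values, weights, W, i, value, weight):
    """Upper bound at a node: fill the remaining capacity fractionally,
    best value/weight ratio first (items are assumed pre-sorted by ratio)."""
    bound, cap = value, W - weight
    for j in range(i, len(values)):
        if weights[j] <= cap:
            cap -= weights[j]
            bound += values[j]
        else:
            bound += values[j] * cap / weights[j]  # fractional piece
            break
    return bound

def branch_and_bound(values, weights, W):
    # Sort items by value/weight ratio so the LP bound above is valid
    order = sorted(range(len(values)), key=lambda j: values[j] / weights[j], reverse=True)
    v = [values[j] for j in order]
    w = [weights[j] for j in order]
    best = 0

    def explore(i, value, weight):
        nonlocal best
        if weight > W:
            return                      # infeasible branch
        if value > best:
            best = value                # update the incumbent
        if i == len(v):
            return
        if lp_bound(v, w, W, i, value, weight) <= best:
            return                      # prune: bound cannot beat incumbent
        explore(i + 1, value + v[i], weight + w[i])  # branch x_i = 1
        explore(i + 1, value, weight)                # branch x_i = 0

    explore(0, 0, 0)
    return best

print(branch_and_bound([60, 100, 120], [10, 20, 30], 50))  # 220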

7. Cutting Planes

Another technique used to solve integer programming problems is cutting planes. The idea
is to iteratively add linear inequalities (cuts) to the LP relaxation to exclude fractional
solutions without excluding any feasible integer solutions. This process refines the solution
space and can eventually lead to an integer solution.

8. Interpretation of the Solution

Once the integer programming problem is solved, we interpret the values of the decision
variables:

If the values of x_1, x_2, x_3 are integers (0 or 1), then the solution is directly interpretable as a set of items to include in the knapsack.

If the values are fractional, we may apply rounding or use branching methods to refine
the solution to integer values.

If the problem was solved using branch-and-bound or cutting planes, the final solution
provides the optimal set of items for the knapsack.

9. Summary

In this lecture, we learned how to approach solving an integer programming problem, using
the Knapsack Problem as a case study. We:

Formulated the problem as an integer program.

Relaxed the integer constraints and solved the LP relaxation.

Discussed methods like rounding, branch-and-bound, and cutting planes for handling
fractional solutions.

Understood the importance of finding an optimal solution in integer programming and the computational methods involved.

Integer programming is a powerful tool for solving combinatorial optimization problems, but
its computational complexity requires sophisticated techniques to find the optimal solution
efficiently.

Lecture 53: Transportation and Assignment Models
In this lecture, we discuss two important applications of Linear Programming (LP): the
Transportation Problem and the Assignment Problem. Both of these are classical
optimization problems with a variety of real-world applications, including logistics, resource
allocation, and operations research.

1. Transportation Problem

The Transportation Problem is a type of optimization problem where goods must be transported from several suppliers to several consumers in a way that minimizes transportation costs while satisfying supply and demand constraints.

Problem Formulation:

Consider the following:

We have m suppliers, each with a certain supply s_i (the amount of goods available).

We have n consumers, each with a certain demand d_j (the amount of goods required).

The cost of transporting one unit of goods from supplier i to consumer j is given by c_ij.

The objective is to determine the amount of goods x_ij to transport from supplier i to consumer j, such that the total transportation cost is minimized while satisfying all supply and demand constraints.

Mathematical Formulation:

Minimize the total transportation cost:

Minimize  Z = ∑_{i=1}^{m} ∑_{j=1}^{n} c_ij x_ij

subject to:

Supply Constraints: in the balanced problem considered here, the total amount shipped by each supplier equals its supply:

∑_{j=1}^{n} x_ij = s_i  for each i = 1, 2, …, m

Demand Constraints: The total amount received by each consumer must equal its demand:

∑_{i=1}^{m} x_ij = d_j  for each j = 1, 2, …, n

Non-negativity Constraints: The amount of goods transported between any pair must be non-negative:

x_ij ≥ 0  for each i, j

Solution Approach:

The transportation problem can be solved using Linear Programming (LP) techniques, but it
can also be efficiently solved using specialized algorithms, such as the Transportation
Simplex Method or the Modified Distribution Method (MODI Method). These algorithms
exploit the structure of the transportation problem to find the optimal solution faster than
general LP solvers.
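For illustration, a balanced transportation problem can also be passed to a general LP solver by flattening the m × n shipment matrix into a vector; the sketch below uses scipy.optimize.linprog with equality constraints on illustrative sample data, though the specialized methods named above exploit the problem's structure more efficiently.

```python
import numpy as np
from scipy.optimize import linprog

# Unit costs c[i][j], supplies s[i], demands d[j] (balanced: sum(s) == sum(d))
cost = np.array([[4, 6, 8],
                 [2, 4, 6],
                 [5, 7, 3]])
supply = [20, 30, 25]
demand = [15, 25, 35]
m, n = cost.shape

# Each row of A_eq encodes one supply or demand equality over the flattened x
A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1   # sum_j x_ij = s_i
for j in range(n):
    A_eq[m + j, j::n] = 1            # sum_i x_ij = d_j
b_eq = supply + demand

res = linprog(cost.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
print(res.x.reshape(m, n))
print(res.fun)  # minimum total cost for this instance: 305.0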

2. Assignment Problem

The Assignment Problem is a special case of the transportation problem where the number
of suppliers equals the number of consumers (i.e., m = n). The goal is to assign n workers
to n tasks in such a way that the total cost is minimized (or profit is maximized).

Problem Formulation:

Consider the following:

We have n workers and n tasks.

The cost of assigning worker i to task j is c_ij.

The objective is to assign workers to tasks in such a way that the total cost is minimized while
ensuring that each worker is assigned exactly one task and each task is assigned to exactly
one worker.

Mathematical Formulation:

Minimize the total assignment cost:

Minimize  Z = ∑_{i=1}^{n} ∑_{j=1}^{n} c_ij x_ij

subject to:

Worker Constraints: Each worker is assigned exactly one task:

∑_{j=1}^{n} x_ij = 1  for each i = 1, 2, …, n

Task Constraints: Each task is assigned to exactly one worker:

∑_{i=1}^{n} x_ij = 1  for each j = 1, 2, …, n

Binary Constraints: Each x_ij is either 0 or 1, indicating whether worker i is assigned to task j:

x_ij ∈ {0, 1}  for each i, j

Solution Approach:

The assignment problem is a special type of integer programming (IP) problem. However,
because of its specific structure (binary decision variables and a square matrix of costs), it
can be solved more efficiently using algorithms such as:

The Hungarian Method (Kuhn-Munkres Algorithm): This is a polynomial-time algorithm that solves the assignment problem in O(n³) time.

Linear Programming Relaxation: Though the assignment problem is an integer program, it can also be solved as a relaxed LP, where the decision variables x_ij are allowed to take continuous values between 0 and 1. Because the constraint matrix of the assignment problem is totally unimodular, every basic optimal solution of the relaxed LP is automatically integral, so in principle no rounding is needed to recover a valid assignment.
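In practice, a single call to SciPy's linear_sum_assignment solves the problem optimally; a minimal sketch with an illustrative 3 × 3 cost matrix:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Illustrative cost matrix: cost[i][j] = cost of assigning worker i to task j
cost = np.array([[9, 2, 7],
                 [6, 4, 3],
                 [5, 8, 1]])

rows, cols = linear_sum_assignment(cost)   # optimal one-to-one matching
print([(int(i), int(j)) for i, j in zip(rows, cols)])  # [(0, 1), (1, 0), (2, 2)]
print(int(cost[rows, cols].sum()))                     # minimum cost: 2 + 6 + 1 = 9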

3. Key Differences Between Transportation and Assignment Problems

Number of Suppliers and Consumers: In the transportation problem, the number of suppliers does not necessarily equal the number of consumers, whereas in the assignment problem, they are always equal.

Cost Matrix Structure: The transportation problem deals with a rectangular cost matrix,
while the assignment problem deals with a square cost matrix.

Objective: Both problems aim to minimize costs, but the transportation problem also
considers supply and demand constraints for each supplier and consumer, while the
assignment problem ensures each worker is assigned exactly one task and vice versa.

4. Applications

Transportation Problem:

Logistics and supply chain management: Optimizing the distribution of goods from
warehouses to retail locations.

Resource allocation: Minimizing transportation costs in industrial settings.

Assignment Problem:

Job scheduling: Assigning workers to tasks in a way that minimizes cost or maximizes efficiency.

Airport gate assignments: Assigning planes to gates in an airport to minimize the cost or maximize the operational efficiency.

5. Summary

In this lecture, we:

Learned the Transportation Problem, its formulation, and solution techniques.

Discussed the Assignment Problem, its specific structure, and how it can be solved
efficiently.

Compared and contrasted the transportation and assignment models in terms of formulation, constraints, and solution methods.

Explored real-world applications of both models in logistics, resource allocation, and scheduling.

Both the transportation and assignment problems are essential in operations research, and
solving them efficiently can lead to significant cost savings and optimization in various fields.

Lecture 54: Solving a Transport Problem


In this lecture, we will focus on solving a Transportation Problem using practical methods.
The transportation problem is a type of Linear Programming (LP) problem that deals with
finding the optimal way to transport goods from multiple suppliers to multiple consumers,
minimizing the transportation costs while satisfying supply and demand constraints.

1. Problem Recap

Consider the following:

There are m suppliers, each with a certain supply s_i.

There are n consumers, each with a certain demand d_j.

The cost of transporting one unit of goods from supplier i to consumer j is c_ij.

The objective is to determine how much to transport from each supplier to each consumer (represented by the decision variables x_ij) such that the transportation cost is minimized and the supply and demand constraints are satisfied.

Mathematical Formulation:

Minimize the total transportation cost:

Minimize  Z = ∑_{i=1}^{m} ∑_{j=1}^{n} c_ij x_ij

subject to:

Supply Constraints: in the balanced problem, the total amount shipped by each supplier equals its supply:

∑_{j=1}^{n} x_ij = s_i  for each i = 1, 2, …, m

Demand Constraints: The total amount received by each consumer must equal its demand:

∑_{i=1}^{m} x_ij = d_j  for each j = 1, 2, …, n

Non-negativity Constraints: The amount of goods transported between any pair must be non-negative:

x_ij ≥ 0  for each i, j

2. Methods for Solving the Transportation Problem

The transportation problem can be solved using various methods, some of which include:

The Simplex Method: Although a general linear programming method, it is often used
to solve the transportation problem directly.

The Transportation Simplex Method: A specialized version of the Simplex method designed for the transportation problem. It works efficiently for transportation-specific problems.

The Northwest Corner Rule: A heuristic method to find an initial feasible solution.

The Least Cost Method: Another heuristic to start with an initial feasible solution.

The Vogel Approximation Method (VAM): A more sophisticated heuristic that often
provides better initial solutions.

In this lecture, we will explore a couple of these methods for solving the transportation
problem.

3. Initial Feasible Solution Methods

To begin solving the transportation problem, we first need to find an initial feasible solution.
Several methods can be used to construct an initial basic feasible solution (BFS).

(a) Northwest Corner Rule

This is a simple and intuitive method to construct an initial feasible solution.

1. Start at the top-left corner of the transportation tableau (representing the first supplier and the first consumer).

2. Allocate as much as possible to the first cell (x_11), which is the minimum of the supply of supplier 1, s_1, and the demand of consumer 1, d_1. Set x_11 = min(s_1, d_1).

3. If the demand d_1 is met, move to the next consumer in the same row (i.e., x_12, x_13, …). If the supply s_1 is exhausted, move down to the next supplier in the same column.

4. Continue this process until all supplies and demands are satisfied.

(b) Least Cost Method

This method selects the cell with the lowest cost for allocation.

1. Identify the cell with the least transportation cost.

2. Allocate as much as possible to that cell (the minimum of the supply and demand for
that cell).

3. After the allocation, adjust the supply and demand and cross out the row or column that
has been completely satisfied.

4. Repeat the process until all supplies and demands are satisfied.

(c) Vogel's Approximation Method (VAM)

VAM is a more sophisticated approach that generally yields better initial solutions.

1. For each row and column, calculate the penalty cost, which is the difference between the
smallest and second smallest costs in that row or column.

2. Select the row or column with the largest penalty and allocate as much as possible to the
cell with the lowest cost in that row or column.

3. Update the supply and demand, and repeat the process until all supplies and demands
are satisfied.

4. Optimality Check and Improvement

Once an initial feasible solution is found using any of the methods above, we need to check if
it is optimal. If the solution is not optimal, we need to improve it.

The Transportation Simplex Method is typically used to iteratively improve the solution. It
works by pivoting between basic feasible solutions, much like the regular Simplex method
for linear programming.

The basic idea is to start with the initial feasible solution and look for a way to decrease the total transportation cost. This is done by checking the reduced costs of the non-basic variables (the cells not included in the current solution) and identifying whether any of them can be brought into the solution to improve the cost.

Entering a non-basic variable creates a unique cycle in the tableau, and allocations are shifted around this cycle to lower the cost. When no non-basic variable has a negative reduced cost, no further improvement is possible, the solution is optimal, and the algorithm terminates.

5. Example

Consider a simple transportation problem with three suppliers and three consumers.

| Supplier / Consumer | Consumer 1 | Consumer 2 | Consumer 3 | Supply |
| ------------------- | ---------- | ---------- | ---------- | ------ |
| Supplier 1          | 4          | 6          | 8          | 20     |
| Supplier 2          | 2          | 4          | 6          | 30     |
| Supplier 3          | 5          | 7          | 3          | 25     |
| Demand              | 15         | 25         | 35         |        |

We need to minimize the transportation cost by allocating the supplies to meet the demands. Using the Northwest Corner Rule, we start by allocating to x_11, then move through the tableau until the total supply and demand are satisfied; the sketch below traces this procedure.
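A minimal Python sketch of the Northwest Corner Rule on this tableau (data taken from the table above); for this particular instance the resulting initial solution happens to already be optimal, which a MODI optimality check confirms.

```python
# Northwest Corner Rule: march from the top-left cell, exhausting
# demands (move right) and supplies (move down) as we go.
cost = [[4, 6, 8], [2, 4, 6], [5, 7, 3]]
supply = [20, 30, 25]
demand = [15, 25, 35]

alloc = [[0] * len(demand) for _ in supply]
i = j = 0
while i < len(supply) and j < len(demand):
    q = min(supply[i], demand[j])   # allocate as much as possible
    alloc[i][j] = q
    supply[i] -= q
    demand[j] -= q
    if supply[i] == 0:
        i += 1                      # supplier exhausted: move down
    else:
        j += 1                      # demand satisfied: move right

total = sum(cost[r][c] * alloc[r][c] for r in range(3) for c in range(3))
print(alloc)   # [[15, 5, 0], [0, 20, 10], [0, 0, 25]]
print(total)   # 305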

6. Summary

In this lecture, we:

Reviewed the transportation problem, its mathematical formulation, and constraints.

Explored different methods to find initial feasible solutions, including the Northwest
Corner Rule, Least Cost Method, and Vogel's Approximation Method (VAM).

Discussed the Transportation Simplex Method as a way to improve the initial solution
and find the optimal solution.

Applied these methods to a practical example to illustrate how they work.

The transportation problem is widely applicable in logistics, supply chain management, and
various industrial settings. Solving it efficiently can lead to significant cost savings and
optimization.

Lecture 55: Solving an Assignment Problem


In this lecture, we focus on solving the Assignment Problem using linear programming and
combinatorial optimization techniques. The assignment problem is a classic optimization
problem where the goal is to assign n tasks to n agents in such a way that the total cost is
minimized, subject to certain constraints.

1. Problem Definition

Given:

n agents.

n tasks.

A cost matrix C = [c_ij], where c_ij represents the cost of assigning agent i to task j.

The objective is to assign exactly one task to each agent, such that the total assignment cost is minimized. The cost for the entire assignment is the sum of the individual costs c_ij for each agent-task pair that is chosen.

Mathematical Formulation:

Minimize the total assignment cost:

Minimize  Z = ∑_{i=1}^{n} ∑_{j=1}^{n} c_ij x_ij

subject to:

Assignment Constraints: Each agent is assigned exactly one task:

∑_{j=1}^{n} x_ij = 1  for each i = 1, 2, …, n

Task Constraints: Each task is assigned to exactly one agent:

∑_{i=1}^{n} x_ij = 1  for each j = 1, 2, …, n

Binary Constraints: The decision variable x_ij must be binary (either 0 or 1):

x_ij ∈ {0, 1}  for each i, j

2. Methods for Solving the Assignment Problem

There are several well-known methods to solve the assignment problem:

(a) Hungarian Method (Kuhn-Munkres Algorithm)

The Hungarian method is a combinatorial optimization algorithm designed to solve the assignment problem in polynomial time. The algorithm is based on the principle of finding a minimum-cost matching in a bipartite graph. It works by reducing the cost matrix to a simpler form through a series of operations, including subtracting row minima, column minima, and performing line reductions to cover zeros in the matrix. The steps are as follows:

1. Step 1: Subtract the smallest value in each row from every element in that row.

2. Step 2: Subtract the smallest value in each column from every element in that column.

3. Step 3: Cover all the zeros in the matrix with a minimum number of horizontal and
vertical lines.

4. Step 4: Create new zeros by reducing the uncovered elements by the smallest uncovered
value and adding it to the elements covered by two lines.

5. Step 5: Repeat steps 3 and 4 until an optimal assignment is found.

This method guarantees an optimal solution in O(n³) time, making it efficient for large-scale assignment problems.

(b) Linear Programming (LP) Formulation

The assignment problem is a special case of the Integer Linear Programming (ILP) problem, where the decision variables x_ij are binary. We can solve the assignment problem using the Simplex algorithm or other LP-based methods designed for binary variables (such as branch-and-bound or cutting planes). However, LP methods are typically less efficient than the Hungarian method for solving the assignment problem due to the combinatorial nature of the problem.

(c) Shortest Path Approach

The assignment problem can also be modeled as a minimum cost flow problem or shortest
path problem in a graph. This is particularly useful when the assignment problem is large,
and the constraints need to be modeled as flow conservation equations in a network graph.
This approach is often used in cases where the structure of the problem allows for easy
representation as a network flow.

3. Example

Consider a simple assignment problem with 4 agents and 4 tasks. The cost matrix is as
follows:

| Agent / Task | Task 1 | Task 2 | Task 3 | Task 4 |
| ------------ | ------ | ------ | ------ | ------ |
| Agent 1      | 4      | 2      | 7      | 3      |
| Agent 2      | 8      | 5      | 6      | 1      |
| Agent 3      | 3      | 4      | 2      | 5      |
| Agent 4      | 6      | 9      | 1      | 4      |

The goal is to assign each agent to a task such that the total cost is minimized.

Solution using the Hungarian Method:

1. Subtract the smallest value in each row:

   Row 1: [4 − 2, 2 − 2, 7 − 2, 3 − 2] = [2, 0, 5, 1]
   Row 2: [8 − 1, 5 − 1, 6 − 1, 1 − 1] = [7, 4, 5, 0]
   Row 3: [3 − 2, 4 − 2, 2 − 2, 5 − 2] = [1, 2, 0, 3]
   Row 4: [6 − 1, 9 − 1, 1 − 1, 4 − 1] = [5, 8, 0, 3]

2. Subtract the smallest value in each column of the reduced matrix:

   Column 1: [2 − 1, 7 − 1, 1 − 1, 5 − 1] = [1, 6, 0, 4]
   Column 2: [0 − 0, 4 − 0, 2 − 0, 8 − 0] = [0, 4, 2, 8]
   Column 3: [5 − 0, 5 − 0, 0 − 0, 0 − 0] = [5, 5, 0, 0]
   Column 4: [1 − 0, 0 − 0, 3 − 0, 3 − 0] = [1, 0, 3, 3]
3. Cover the zeros and perform necessary operations to identify the optimal assignment.
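As a cross-check on the hand computation, SciPy's linear_sum_assignment (a Hungarian-style O(n³) solver) applied to this cost matrix returns the optimal assignment directly; a minimal sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[4, 2, 7, 3],
                 [8, 5, 6, 1],
                 [3, 4, 2, 5],
                 [6, 9, 1, 4]])

rows, cols = linear_sum_assignment(cost)
print([(int(i) + 1, int(j) + 1) for i, j in zip(rows, cols)])
# [(1, 2), (2, 4), (3, 1), (4, 3)]: Agent 1 -> Task 2, Agent 2 -> Task 4, ...
print(int(cost[rows, cols].sum()))  # minimum total cost: 2 + 1 + 3 + 1 = 7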

4. Summary

In this lecture, we discussed:

The Assignment Problem, its mathematical formulation, and the objective of minimizing
the total cost of assigning agents to tasks.

Various methods for solving the assignment problem, including the Hungarian method,
Linear Programming, and the Shortest Path approach.

An example demonstrating the application of the Hungarian method to solve a small assignment problem.

The Hungarian Method is a highly efficient combinatorial optimization algorithm specifically designed for solving the assignment problem. For larger problems, alternative approaches, such as network flow algorithms or integer programming techniques, can be used.

Lecture 56: PERT-CPM: Diagram Representation


In this lecture, we will explore PERT (Program Evaluation and Review Technique) and CPM
(Critical Path Method), two project management techniques used to schedule, organize, and
manage tasks within a project. We focus on the diagrammatic representation of these
methods, which visually illustrate the sequence of tasks, their dependencies, and timelines.

1. Introduction to PERT and CPM

Both PERT and CPM are used to manage projects efficiently by determining the critical tasks,
the shortest time to complete a project, and the relationship between tasks.

PERT (Program Evaluation and Review Technique): PERT is primarily used in projects
with uncertain activity durations. It uses statistical techniques to account for uncertainty
in activity durations. PERT is often used in research and development projects where the
time estimates are uncertain.

CPM (Critical Path Method): CPM is used for projects where the activity durations are
known and deterministic. It focuses on identifying the critical path, i.e., the longest
sequence of dependent activities that determines the shortest time in which the project
can be completed.

2. Key Differences Between PERT and CPM

| Aspect             | PERT                                                                     | CPM                                                  |
| ------------------ | ------------------------------------------------------------------------ | ---------------------------------------------------- |
| Duration Estimates | Probabilistic (uses optimistic, pessimistic, and most likely estimates)  | Deterministic (precise time estimates)               |
| Focus              | Managing uncertainty in project scheduling                               | Managing time and cost efficiently                   |
| Project Type       | Research and development (R&D) projects                                  | Construction, manufacturing, and production projects |
| Critical Path      | The longest time path through the network                                | The longest time path, but with exact durations      |

3. Diagram Representation of PERT and CPM

The diagrammatic representation of both PERT and CPM is crucial in visualizing the tasks,
their dependencies, and the overall project timeline.

(a) Network Diagram Representation

Both PERT and CPM use network diagrams to represent tasks and their relationships. These
diagrams consist of nodes (representing tasks or milestones) and arcs (representing the
relationships or dependencies between tasks).

Tasks (Nodes): Each node in the diagram represents an individual task or activity.

Dependencies (Arcs): Arcs represent the relationships between tasks. If task A must be
completed before task B, an arc is drawn from node A to node B.

(b) Types of Network Diagrams

AON (Activity on Node): In this type of diagram, nodes represent activities, and arcs
represent the dependencies between them. Most modern project scheduling techniques
(including PERT and CPM) use AON diagrams.

AOA (Activity on Arc): In this older diagram type, arcs represent activities, and nodes
represent the events or milestones of the project.

(c) Example Network Diagram (PERT/CPM)

Consider a project with the following tasks:

| Task | Description | Duration (in days) | Predecessor(s) |
| ---- | ----------- | ------------------ | -------------- |
| A    | Task A      | 4                  | None           |
| B    | Task B      | 3                  | A              |
| C    | Task C      | 2                  | A              |
| D    | Task D      | 5                  | B, C           |
| E    | Task E      | 3                  | D              |

This example can be represented in the network diagram as follows:

1. Nodes represent tasks A, B, C, D, and E.

2. Arcs represent the dependency between tasks:

Task B and Task C depend on Task A (arcs from A to B and A to C).

Task D depends on both B and C (arc from B to D and arc from C to D).

Task E depends on D (arc from D to E).

(d) Critical Path

The critical path is the longest path through the network, which determines the shortest
time to complete the entire project. In the example above, the critical path can be
determined by summing the durations of tasks along the various paths and selecting the
one with the longest total duration.

For the given tasks:

Path 1: A → B → D → E = 4 + 3 + 5 + 3 = 15 days

Path 2: A → C → D → E = 4 + 2 + 5 + 3 = 14 days

Thus, the critical path is A → B → D → E, and the project will take 15 days to complete.

(e) Slack Time

Slack time refers to the amount of time that a task can be delayed without affecting the
project’s overall completion time. Tasks on the critical path have zero slack, meaning any
delay in these tasks will delay the entire project. Other tasks may have some slack, which can
be used to absorb delays without affecting the project timeline.

4. Steps in Creating PERT/CPM Diagrams

1. List all the tasks involved in the project and determine their duration.

2. Identify task dependencies: Determine which tasks must be completed before others
can start.

3. Draw the network diagram: Create nodes for tasks and draw arcs to represent the
dependencies between tasks.

4. Determine the critical path: Identify the longest path through the network and
calculate the project duration.

5. Calculate slack time: For tasks not on the critical path, calculate the slack time.

5. Advantages of PERT and CPM

Visual Representation: Both methods provide a clear visual representation of the
project schedule and task dependencies, making it easier to manage and track progress.

Time Management: They help in identifying the critical tasks and managing time
effectively, ensuring that the project is completed on schedule.

Resource Allocation: The diagrams help in resource planning by ensuring that tasks are
scheduled in the most efficient way.

6. Applications of PERT and CPM

Project Scheduling: Used for scheduling and organizing complex projects with multiple
tasks and dependencies.

Construction Projects: Especially useful in construction and manufacturing projects where time management is crucial.

Research and Development: PERT is often used in R&D projects where time estimates
are uncertain and need probabilistic estimation.

7. Conclusion

In this lecture, we discussed:

The concepts of PERT and CPM and their applications in project management.

How to represent projects using network diagrams.

The critical path, slack time, and the steps involved in creating PERT/CPM diagrams.

The importance of these techniques in managing time, costs, and resources in large and
complex projects.

By using PERT and CPM, project managers can efficiently plan, schedule, and control
projects, ensuring that tasks are completed on time and within budget.

Lecture 57: Details of Critical Path Calculations (Examples)


In this lecture, we will focus on the critical path method (CPM) in detail, particularly how to
calculate the critical path and understand its significance in project scheduling. We will go
over the process of identifying the critical path through concrete examples, including
calculations for task durations, slack time, and project duration. This lecture will also cover
the forward pass and backward pass methods used in CPM for critical path determination.

1. Understanding the Critical Path

The critical path in a project is the longest sequence of dependent activities that determines
the shortest time in which the project can be completed. Any delay in the tasks on the critical
path will directly lead to a delay in the project's overall completion time.

2. Steps for Calculating the Critical Path

The steps for calculating the critical path are as follows:

1. List all the tasks and their durations.

2. Draw the project network diagram, representing the tasks as nodes and the
dependencies as directed arcs (edges).

3. Identify the start and end points of the project.

4. Perform a forward pass to calculate the earliest start (ES) and earliest finish (EF) times
for each task.

5. Perform a backward pass to calculate the latest start (LS) and latest finish (LF) times
for each task.

6. Calculate slack time for each task to identify non-critical tasks.

7. Determine the critical path by identifying the tasks with zero slack time.

3. Forward Pass

The forward pass is used to determine the earliest start time (ES) and the earliest finish
time (EF) for each task. It starts from the first task and works its way through the network.

Earliest Start (ES): The earliest time a task can start, given the completion of its
predecessor tasks.

Earliest Finish (EF): The earliest time a task can finish, calculated as EF = ES + Duration of Task.

Example:

Consider the following tasks and dependencies:

| Task | Duration (days) | Predecessor(s) |
| ---- | --------------- | -------------- |
| A    | 4               | None           |
| B    | 3               | A              |
| C    | 2               | A              |
| D    | 5               | B, C           |
| E    | 3               | D              |

To calculate the forward pass:

Task A has no predecessors, so its ES = 0. Its EF = 0 + 4 = 4.

Task B depends on A. The ES for B = EF of A = 4. Thus, EF of B = 4 + 3 = 7.

Task C also depends on A. The ES for C = EF of A = 4. Thus, EF of C = 4 + 2 = 6.

Task D depends on both B and C. The ES for D = max(EF of B, EF of C) = max(7, 6) = 7. Thus, EF of D = 7 + 5 = 12.

Task E depends on D. The ES for E = EF of D = 12. Thus, EF of E = 12 + 3 = 15.

4. Backward Pass

The backward pass is used to determine the latest start (LS) and latest finish (LF) for each
task. It starts from the last task and works backward through the network.

Latest Finish (LF): The latest time a task can finish without delaying the project.

Latest Start (LS): The latest time a task can start without delaying the project, calculated as LS = LF − Duration of Task.

To calculate the backward pass:

Task E is the last task. The LF for E = EF of E = 15. Thus, LS for E = 15 - 3 = 12.

Task D's only successor is E. The LF for D = LS of E = 12. Thus, LS for D = 12 - 5 = 7.

Task B's only successor is D. The LF for B = LS of D = 7. Thus, LS for B = 7 - 3 = 4.

Task C's only successor is also D. The LF for C = LS of D = 7. Thus, LS for C = 7 - 2 = 5.

Task A has successors B and C. The LF for A = min(LS of B, LS of C) = min(4, 5) = 4. Thus, LS for A = 4 - 4 = 0.

5. Slack Time Calculation

The slack time for each task is calculated as:

Slack Time = LS − ES = LF − EF

For Task A: Slack = LS − ES = 0 − 0 = 0.
For Task B: Slack = LS − ES = 4 − 4 = 0.
For Task C: Slack = LS − ES = 5 − 4 = 1.
For Task D: Slack = LS − ES = 7 − 7 = 0.
For Task E: Slack = LS − ES = 12 − 12 = 0.

Tasks with zero slack are on the critical path, which in this case are A, B, D, E.

6. Determining the Critical Path

The critical path is the path that includes all the tasks with zero slack. From the example:

The critical path is: A → B → D → E.

The total duration of the project is the duration of the critical path: 4 + 3 + 5 + 3 = 15
days.
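The forward pass, backward pass, and slack computation are easy to automate. Here is a minimal sketch for the example above (task data as in the table), processing tasks in a topological order:

```python
# Tasks: duration and predecessors, as in the example above
tasks = {"A": (4, []), "B": (3, ["A"]), "C": (2, ["A"]),
         "D": (5, ["B", "C"]), "E": (3, ["D"])}
order = ["A", "B", "C", "D", "E"]          # a topological order

ES, EF = {}, {}
for t in order:                            # forward pass
    dur, preds = tasks[t]
    ES[t] = max((EF[p] for p in preds), default=0)
    EF[t] = ES[t] + dur

project_duration = max(EF.values())        # 15

LS, LF = {}, {}
for t in reversed(order):                  # backward pass
    dur, _ = tasks[t]
    succs = [s for s in order if t in tasks[s][1]]
    LF[t] = min((LS[s] for s in succs), default=project_duration)
    LS[t] = LF[t] - dur

slack = {t: LS[t] - ES[t] for t in order}
critical = [t for t in order if slack[t] == 0]
print(project_duration, slack, critical)   # 15, C has slack 1, path A-B-D-E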

7. Summary

Critical Path: The longest path in the project network, determining the project's
duration.

Forward Pass: Determines the earliest start and finish times for each task.

Backward Pass: Determines the latest start and finish times for each task.

Slack Time: The amount of time a task can be delayed without affecting the project's
duration. Tasks on the critical path have zero slack.

Critical Path Calculation: The tasks with zero slack form the critical path, which
determines the overall project duration.

8. Conclusion

In this lecture, we:

Discussed the steps to calculate the critical path using forward pass and backward
pass.

Calculated the slack time and identified tasks on the critical path.

Used an example to illustrate how to calculate the critical path, slack time, and project
duration.

This method ensures that projects are completed in the shortest possible time, helping
managers focus on critical tasks and avoid delays.

Lecture 58: Resource Levelling


Resource levelling is a project management technique used to resolve over-allocation of
resources and to ensure that the available resources are optimally distributed across all
tasks. In this lecture, we will explore the concept of resource levelling, its importance,
methods of implementation, and the trade-offs involved.

1. Introduction to Resource Levelling

Resource levelling aims to balance the resource usage across the duration of the project. The
primary goal is to avoid the situation where resources are over-allocated at certain points in
time, which can lead to delays or inefficiencies.

Over-allocation occurs when the demand for a resource exceeds its availability at a
given time.

Under-utilization occurs when the resources are not being fully used during certain
periods.

By applying resource levelling, project managers try to smooth the resource usage to ensure
that resources are distributed evenly across the entire project schedule.

2. When to Use Resource Levelling

Resource levelling is typically applied in the following scenarios:

When there are limited resources (e.g., personnel, equipment, or budget).

When tasks have flexible start and finish times (i.e., the project has slack).

When the project deadline is flexible, and task completion times can be adjusted.

Resource levelling is generally not suitable when the project deadline is fixed, as it can delay
the overall project completion.

3. Objectives of Resource Levelling

The primary objectives of resource levelling are:

Ensure resource availability: To avoid overloading resources and to ensure their availability throughout the project.

Minimize idle time: To minimize resource downtime by making sure that they are
constantly utilized without overburdening them.

Maintain the project schedule: To adjust the schedule while still achieving the project’s
objectives, even though the timeline might be extended.

4. Methods of Resource Levelling

There are various methods to perform resource levelling, and the choice of method depends
on the project requirements, available resources, and task relationships.

a. Delaying Tasks

This method involves shifting tasks in the schedule to later dates to balance the resource
usage. Tasks that are not on the critical path and have slack can be delayed without affecting
the overall project duration.

b. Splitting Tasks

In this method, tasks are split into smaller parts and spread across the project duration to
avoid resource peaks. This technique is particularly useful when tasks can be broken into
smaller sub-tasks that require less resource allocation at any given time.

c. Resource Smoothing

Resource smoothing involves adjusting the start and finish times of tasks within the existing
project timeline, without affecting the project’s overall duration. The goal is to ensure that
resource usage is as consistent as possible while still meeting the project deadline.

d. Adding Resources (Overtime or Extra Personnel)

In cases where delays due to resource levelling cannot be avoided, additional resources can
be added to complete tasks on time. This might involve hiring temporary staff or utilizing
overtime. However, this method increases project costs and may not always be feasible.

5. Steps for Resource Levelling

1. Identify Resource Over-Allocations: Review the project schedule and identify periods
where the demand for resources exceeds the available supply.

2. Determine Resource Constraints: Identify how many resources are available and at
which times, considering the project’s resource allocation constraints.

3. Reschedule Tasks: If possible, reschedule tasks that are non-critical or have slack to
avoid overlapping with other tasks that need the same resources.

4. Apply Resource Smoothing: Adjust tasks within the project timeline without affecting
the overall project duration. For example, tasks can be extended without extending the
project deadline.

5. Assess Impact on Project Timeline: Check if the resource levelling process causes any
delays. If the deadline is fixed, some tasks may need to be shortened or optimized.

6. Example of Resource Levelling

Consider the following simplified project with two resources, Person A and Person B, and
four tasks:

| Task | Duration | Resource Requirement       | Earliest Start | Earliest Finish |
| ---- | -------- | -------------------------- | -------------- | --------------- |
| T1   | 5 days   | Person A (1), Person B (1) | Day 1          | Day 5           |
| T2   | 4 days   | Person A (1)               | Day 1          | Day 4           |
| T3   | 3 days   | Person B (1)               | Day 3          | Day 5           |
| T4   | 6 days   | Person A (1), Person B (1) | Day 5          | Day 10          |

Over-allocation:

Person A is over-allocated between Days 1-4 because they are assigned to both T1 and
T2.

Person B is over-allocated between Days 3-5 because they are assigned to both T1 and
T3.

Resource Levelling Process:

1. Identify Over-Allocations: Person A is over-allocated on Days 1-4, and Person B is over-allocated on Days 3-5.

2. Adjust Task Schedule: We can shift T2 to start on Day 6, which would remove the overlap for Person A. Similarly, T3 can be shifted to start on Day 6, removing the overlap for Person B.

After levelling, the schedule might look like this:

| Task | Duration | Resource Requirement       | Earliest Start | Earliest Finish |
| ---- | -------- | -------------------------- | -------------- | --------------- |
| T1   | 5 days   | Person A (1), Person B (1) | Day 1          | Day 5           |
| T2   | 4 days   | Person A (1)               | Day 6          | Day 9           |
| T3   | 3 days   | Person B (1)               | Day 6          | Day 8           |
| T4   | 6 days   | Person A (1), Person B (1) | Day 9          | Day 14          |

By shifting tasks T2 and T3, we ensure that both Person A and Person B are not over-
allocated, and the project now has a smoother resource distribution. However, the project
duration has increased by 4 days (from 10 days to 14 days), which is a trade-off in resource
levelling.
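Over-allocation of this kind can be detected mechanically by building a day-by-day usage profile per resource. A small sketch for the original schedule follows (days counted inclusively; T4 is omitted since, under the table's day conventions, it begins as T1 ends):

```python
from collections import defaultdict

# (task, resources, start_day, finish_day) from the original table above
schedule = [("T1", ["Person A", "Person B"], 1, 5),
            ("T2", ["Person A"], 1, 4),
            ("T3", ["Person B"], 3, 5)]
CAPACITY = 1  # each person can work on one task per day

usage = defaultdict(int)  # (resource, day) -> units demanded
for task, resources, start, finish in schedule:
    for day in range(start, finish + 1):
        for r in resources:
            usage[(r, day)] += 1

overloaded = sorted(key for key, u in usage.items() if u > CAPACITY)
print(overloaded)
# [('Person A', 1), ('Person A', 2), ('Person A', 3), ('Person A', 4),
#  ('Person B', 3), ('Person B', 4), ('Person B', 5)]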

7. Advantages and Disadvantages of Resource Levelling

Advantages:

Balanced resource utilization: Prevents overuse and underuse of resources, ensuring efficient deployment.

Reduces resource fatigue: Ensures that resources are not overloaded during critical
phases.

Improved project planning: Helps in identifying resource bottlenecks early, leading to better planning.

Disadvantages:

Project delay: The most common disadvantage is that resource levelling can increase
the overall duration of the project if deadlines are not flexible.

Complexity in scheduling: When the project has many tasks and resources, levelling can
become a complex task.

Increased cost: In some cases, to avoid delays, additional resources or overtime may be
needed, increasing the project cost.

8. Conclusion

In this lecture, we covered the concept of resource levelling, its significance in managing
resource allocation, and the methods for balancing resources across the project duration.
Resource levelling ensures that resources are optimally used without overloading them,
though it may lead to an increase in the overall project duration. By applying resource
levelling techniques, project managers can ensure smoother project execution while
effectively managing resource constraints.

Lecture 59: Cost Consideration in Project Scheduling


In project management, one of the primary objectives is to deliver the project within a
specified budget while meeting all the constraints, including time, resources, and quality.
Cost is an essential factor to consider when creating project schedules. This lecture focuses
on how cost considerations are incorporated into project scheduling and the methods used
to optimize costs while ensuring the timely completion of the project.

1. Introduction to Cost Consideration in Project Scheduling

Project scheduling traditionally focuses on completing tasks on time and within the allocated
resources. However, cost plays an equally critical role in scheduling. A well-constructed
project schedule must not only ensure that tasks are completed efficiently but also be
mindful of the costs associated with each task, resource usage, and potential trade-offs
between time and cost.

2. Cost Components in Project Scheduling

Cost considerations in project scheduling involve multiple components that contribute to the
total project cost. These components include:

Direct Costs: These are costs directly associated with performing project tasks. Examples
include:

Labor costs (e.g., salaries, hourly wages of workers).

Material costs (e.g., raw materials, equipment).

Equipment costs (e.g., rental costs for machinery, tools).

Indirect Costs: These are overhead costs that are not directly tied to a particular task but
are necessary for the overall project operation. Examples include:

Administrative costs (e.g., project management salaries, office supplies).

Utilities and support costs (e.g., electricity, software subscriptions).

Fixed Costs: These costs do not change regardless of the scale or duration of the project.
For example, renting office space or long-term equipment leases.

Variable Costs: These costs vary depending on the project's scope, time, and resource
requirements. For example, if the project duration is extended, labor costs and material
costs may increase.

Crash Costs: These costs are associated with accelerating the project. They are incurred
when resources are added to a task or when a task is completed in a shorter time frame
than originally planned, often by working overtime or using more expensive resources.

3. Trade-offs Between Time and Cost

A key aspect of project scheduling is the trade-off between time and cost. In many projects,
there is a direct relationship between the time required to complete a task and the cost
incurred. For example:

Expediting the Schedule (Crashing): If a project needs to be completed faster, additional resources or overtime may be needed, leading to higher costs. This is referred to as crashing the schedule. Crashing reduces the duration of critical path tasks but increases their costs.

Delaying the Schedule: On the other hand, delaying tasks or extending their duration
may lead to cost savings, as it might reduce the need for overtime or additional
resources. However, this also risks project delays and can have downstream effects on
subsequent tasks or project deadlines.

To balance cost and time effectively, project managers must weigh how much time can be saved against the additional cost required to save it.

4. Cost-Time Trade-Off Analysis

Cost-time trade-off analysis helps project managers to make informed decisions regarding
schedule changes and resource allocation. The process involves identifying the critical path,
determining the possible duration reductions for each task, and evaluating the cost
implications of these reductions.

Steps in Cost-Time Trade-Off Analysis:

1. Identify Critical Path: Determine which tasks directly affect the project’s overall duration
and are critical to meeting the project deadline.

2. Assess Time Reduction Potential: Analyze how much the time for each task can be
reduced. Tasks that are not on the critical path may not require attention, as reducing
their duration will not affect the project completion date.

3. Calculate Crash Costs: For each task on the critical path, calculate the costs associated
with speeding up the task. These might include hiring additional labor, using faster
equipment, or working overtime.

4. Evaluate the Impact of Cost and Time: Compare the costs of reducing the time with the
benefits of completing the project earlier (e.g., improved cash flow, earlier delivery to the
client, etc.).

5. Select the Best Trade-off: Choose the optimal set of tasks to crash, balancing the
project’s time and cost constraints.

5. Methods for Cost Consideration in Project Scheduling

Several methods and tools can be used to manage cost considerations in project scheduling:

a. Cost Aggregation

Cost aggregation involves summing up all the costs associated with each activity and
resource in the project to get the total cost. This method helps identify where the most
significant cost items are and helps with forecasting and controlling costs.

b. Cost Smoothing

Cost smoothing aims to achieve a more consistent cost expenditure throughout the project
timeline by redistributing resources and adjusting the schedule. This method ensures that
the project doesn’t face large cost spikes or underutilization of resources.

c. Resource Leveling with Cost Constraints

When applying resource leveling in a project, it’s important to consider the associated costs.
Resources may be leveled by adjusting their allocation to reduce peak demand and avoid
overuse, which could lead to higher costs. For example, balancing the use of labor and
machinery to ensure that costly resources are not under- or over-utilized.

d. Earned Value Management (EVM)

EVM is a project performance management tool used to monitor project costs. By comparing the planned progress and cost with actual progress and cost, EVM helps identify cost overruns early. Key components of EVM include:

Planned Value (PV): The budgeted cost for the work scheduled.

Earned Value (EV): The budgeted cost for the work actually performed.

Actual Cost (AC): The actual cost incurred for the work performed.

Cost Performance Index (CPI): A measure of cost efficiency, calculated as CPI = EV / AC.
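A quick numeric illustration of these quantities (the figures below are hypothetical, not from the lecture; CV is a standard EVM measure added alongside the CPI defined above):

```python
# Hypothetical checkpoint figures illustrating EVM
PV = 50_000   # Planned Value: budgeted cost of work scheduled so far
EV = 45_000   # Earned Value: budgeted cost of work actually performed
AC = 60_000   # Actual Cost: cost actually incurred for that work

CPI = EV / AC   # 0.75 -> only $0.75 of value earned per $1 spent (over budget)
CV = EV - AC    # cost variance: -15,000 (negative indicates a cost overrun)
print(f"CPI = {CPI:.2f}, CV = {CV}")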

6. Example of Cost Considerations in Project Scheduling

Consider a project with the following tasks, durations, and costs:

| Task | Duration | Cost (Per Day) | Total Cost |
| ---- | -------- | -------------- | ---------- |
| T1   | 5 days   | $1000          | $5000      |
| T2   | 3 days   | $800           | $2400      |
| T3   | 4 days   | $1200          | $4800      |
| T4   | 2 days   | $1500          | $3000      |

Assuming the tasks are performed sequentially, the project duration is the sum of the task durations, or 14 days.

If the project manager needs to reduce the duration, they can crash certain tasks. For
example, reducing the duration of T1 from 5 days to 3 days might increase its daily cost
to $1500.

After crashing:

T1's duration is 3 days at $1500/day, giving a total cost of $4500 for T1.

This change reduces the overall project duration by 2 days (from 14 days to 12 days). Note that although the daily rate for T1 rises by 50%, in this particular illustration the two days saved offset the higher rate; in most real projects, crashing raises the total cost of the crashed task, and that extra cost is the trade-off that must be weighed against the earlier finish.

7. Conclusion

Cost consideration is a critical aspect of project scheduling, as it ensures that the project
remains within budget while meeting its deadlines. By understanding the relationship
between time and cost, project managers can make informed decisions to balance both
aspects effectively. Techniques such as cost-time trade-off analysis, cost smoothing, resource
leveling, and earned value management are invaluable tools in managing project costs.
Ultimately, the goal is to deliver a project that satisfies both time constraints and budgetary
limits, ensuring optimal efficiency and success.

Lecture 60: Semidefinite Programming: Matrix and Duality Theory


Semidefinite programming (SDP) is a subclass of convex optimization problems where the
optimization variable is a semidefinite matrix. This type of optimization has applications in
various areas such as control theory, structural optimization, and combinatorial optimization,
and has gained significant attention due to its broad range of real-world applications.

In this lecture, we will discuss the fundamental concepts of semidefinite programming, introduce matrix duality theory, and demonstrate the duality properties that make SDP a powerful optimization tool.

1. Introduction to Semidefinite Programming (SDP)

Semidefinite programming is a type of convex optimization problem that involves optimizing a linear objective function subject to a constraint on a matrix variable. The key feature of SDP is that the optimization variable is a matrix, and the constraints require this matrix to be positive semidefinite. A matrix is positive semidefinite if all of its eigenvalues are non-negative; equivalently, the associated quadratic form is never negative in any direction.

The general form of an SDP is:

minimize  Tr(CX)

subject to

X ⪰ 0,  where X ∈ S^n (X is a symmetric matrix),

and possibly additional linear constraints of the form:

A_i(X) = b_i,  i = 1, 2, ..., m.
​ ​ i = 1, 2, ..., m.

Tr(CX): The objective is to minimize the trace of a matrix expression involving a matrix C
and the variable matrix X . The trace of a matrix is the sum of its diagonal elements.

X ⪰ 0: This constraint means that X must be positive semidefinite, meaning all of its eigenvalues must be non-negative.

A_i(X) = b_i: These are optional affine constraints on the matrix X, where the A_i are linear operators on X, and the b_i are scalar constants.


This type of problem generalizes linear programming and quadratic programming. SDP has
many applications in optimization problems involving quadratic forms and matrix variables.

2. Matrix Notation and Terminology

Symmetric Matrices: A matrix X ∈ S^n belongs to the space of symmetric matrices. A symmetric matrix satisfies X = X^T, i.e., the matrix is equal to its transpose.

Positive Semidefinite Matrices: A matrix X is positive semidefinite if for any vector v, v^T X v ≥ 0. This condition ensures that all the eigenvalues of X are non-negative.
Trace Operator: The trace of a matrix X , denoted as Tr(X), is the sum of the diagonal
elements. In semidefinite programming, the trace of a matrix is often used as the
objective function to minimize.

3. Duality Theory in Semidefinite Programming

Semidefinite programming, like linear programming, exhibits strong duality, which is the
property that allows one to derive a dual optimization problem corresponding to the primal
problem. The concept of duality provides valuable insight into the relationship between the
primal problem and its dual.

a. Primal Problem (SDP)

The primal SDP can be written as:

minimize  Tr(CX)

subject to

X ⪰ 0,  A_i(X) = b_i,  i = 1, 2, ..., m.

This problem aims to minimize the trace of CX subject to linear constraints on X .

b. Dual Problem (SDP)

The dual of a semidefinite program involves the introduction of Lagrange multipliers for the
affine constraints. By forming the Lagrangian, the dual problem is obtained by maximizing a
dual objective function under dual constraints. The dual formulation of SDP is given by:

maximize  ∑_{i=1}^{m} b_i λ_i

subject to

∑_{i=1}^{m} λ_i A_i ⪯ C,

where the λ_i are the dual variables. Because they correspond to the equality constraints A_i(X) = b_i, the λ_i are unrestricted in sign, and the matrix inequality ∑_{i=1}^{m} λ_i A_i ⪯ C means that C − ∑_{i=1}^{m} λ_i A_i must be positive semidefinite.

Dual Variables (λ_i): These are the Lagrange multipliers corresponding to the constraints A_i(X) = b_i.

Matrix Inequality (∑_{i=1}^{m} λ_i A_i ⪯ C): This condition expresses that the weighted sum of the matrices A_i must be less than or equal to the matrix C in the semidefinite ordering.

The dual problem provides a different perspective on the original (primal) problem and is
often useful for obtaining bounds or simplifying the solution.
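As a sketch of how a primal SDP of this form looks in code, here is a tiny instance using the CVXPY modeling library; the data C, A_1, b_1 are made up for illustration. With A_1 = I and b_1 = 1, the problem minimizes Tr(CX) over positive semidefinite matrices of unit trace, whose optimal value is the smallest eigenvalue of C.

```python
import cvxpy as cp
import numpy as np

n = 3
C = np.array([[2., 1., 0.],
              [1., 2., 1.],
              [0., 1., 2.]])
A1 = np.eye(n)           # constraint Tr(A1 X) = b1, i.e. Tr(X) = 1
b1 = 1.0

X = cp.Variable((n, n), symmetric=True)
constraints = [X >> 0,                  # X positive semidefinite
               cp.trace(A1 @ X) == b1]  # affine constraint
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve()
print(prob.value)  # smallest eigenvalue of C: 2 - sqrt(2) ~ 0.586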

4. Strong Duality in Semidefinite Programming

Just like linear programming, semidefinite programming also exhibits strong duality under
certain conditions. The strong duality theorem states that if both the primal and dual
problems have feasible solutions, then the optimal values of the primal and dual problems
are equal.

This means that:

Optimal value of primal problem = Optimal value of dual problem.

In the context of SDP, strong duality holds under suitable regularity conditions, most commonly Slater's condition: if the primal problem has a strictly feasible point (and the dual is feasible), then the duality gap is zero.

5. Applications of Semidefinite Programming

Semidefinite programming has various applications in optimization, control, and combinatorics. Some common applications include:

Control Theory: SDP is widely used in control theory to design stable systems, optimize
system performance, and solve problems like robust control and optimal control.

Combinatorial Optimization: SDP can be used in problems like the maximum cut problem, where the objective is to partition a graph into two sets such that the number of edges between the sets is maximized. The SDP relaxation of the max-cut problem provides a powerful tool for approximating solutions to NP-hard problems.

Machine Learning: SDP has been used in machine learning for tasks such as kernel
methods, support vector machines (SVMs), and optimization of classification problems.

Quantum Computing: In quantum computing, SDP has applications in problems related to quantum entanglement and optimization of quantum circuits.

6. Conclusion

Semidefinite programming is a powerful and versatile tool in optimization that generalizes linear programming and has a wide range of applications in diverse fields. The strong duality property of SDP, coupled with its matrix-based constraints and optimization criteria, makes it an essential technique for solving large-scale and complex optimization problems. By leveraging duality theory, one can gain deeper insights into the structure of SDP problems and develop efficient algorithms for solving them.

Lecture 61: SDP for Maximum Cut Problem


The Maximum Cut (Max-Cut) problem is a well-known combinatorial optimization problem in
graph theory. It seeks to partition the vertices of a graph into two disjoint sets such that the
number of edges between the sets is maximized. The Max-Cut problem is NP-hard, meaning
that finding the exact optimal solution is computationally difficult for large graphs. However,
semidefinite programming (SDP) offers a powerful way to approximate the solution
efficiently.

In this lecture, we will discuss how semidefinite programming (SDP) can be applied to the
Maximum Cut problem. We will explore the relaxation of the problem using SDP and discuss
its approximation guarantees.

1. Introduction to the Maximum Cut Problem

Given a graph G = (V , E), the Maximum Cut problem is defined as follows:


Input: A graph G with vertices V and edges E.

Objective: Find a cut S ⊆ V such that the number of edges between S and V ∖ S is maximized. This means we want to partition the graph into two sets S and V ∖ S and maximize the number of edges that cross the partition.

Formally, the problem can be written as:

maximize  ∑_{{u,v}∈E} x_{u,v},

subject to

x_{u,v} = 1 if edge {u, v} is cut, and x_{u,v} = 0 otherwise,

where x_{u,v} denotes whether the edge {u, v} is cut (i.e., the vertices u and v are in different sets).

This is an NP-hard problem, and exact algorithms become inefficient for large graphs. Thus,
approximations are needed.
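
To make the exponential cost concrete, here is a minimal brute-force sketch in plain Python (the function name and the example graph are illustrative, not from the source): it enumerates all 2^n partitions, which is only feasible for tiny graphs and is precisely the cost the SDP relaxation lets us avoid.

```python
from itertools import product

def max_cut_bruteforce(n, edges):
    # Try all 2**n ways to 2-color the vertices and count crossing edges.
    best = 0
    for side in product((0, 1), repeat=n):
        best = max(best, sum(side[u] != side[v] for u, v in edges))
    return best

# A 4-cycle: the optimal cut separates alternating vertices, cutting all 4 edges.
print(max_cut_bruteforce(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # -> 4
```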

2. SDP Relaxation of the Max-Cut Problem

The Maximum Cut problem can be relaxed using semidefinite programming. Instead of
looking for an exact cut, we will relax the binary variables xu,v to continuous variables that

represent the "fraction" of the cut. This relaxation leads to an SDP formulation.

The SDP relaxation works by considering the cut as a vector in a higher-dimensional space.
Here's how we approach this:

Graph Representation: Represent each vertex v ∈ V as a vector in a high-dimensional
space.

Cut Definition: The idea is to define a cut using a vector representation of the vertices. If
two vertices are separated, their dot product is negative, and if they are in the same set,
their dot product is positive.

The SDP formulation of the Max-Cut problem involves finding a set of vectors such that the
cut is maximized. The relaxed problem can be written as:

\[
\text{maximize} \quad \frac{1}{4} \sum_{\{u,v\} \in E} \left(1 - x_u^T x_v\right),
\]

subject to

\[
x_v^T x_v = 1 \quad \forall v \in V,
\]

where x_v ∈ R^n is a vector corresponding to vertex v, and x_u^T x_v is the dot product between
the vectors corresponding to vertices u and v.

3. Explanation of the SDP Formulation

Objective Function: The objective function \frac{1}{4} \sum_{\{u,v\} \in E} (1 - x_u^T x_v) maximizes the cut
size. The term 1 - x_u^T x_v measures the degree to which vertices u and v are separated
by the cut: the larger the dot product, the smaller the separation between the two
vertices. The objective seeks to maximize the separation for edges in the cut.

Constraint: The constraint x_v^T x_v = 1 ensures that each vector x_v lies on the unit
sphere, which means the vertices are mapped to unit vectors in a higher-dimensional
space.

4. Relaxation and Approximation Guarantee

The SDP relaxation of the Maximum Cut problem provides an approximation to the original
combinatorial problem. The relaxation is often much easier to solve because it is a convex
problem and can be efficiently solved using standard SDP solvers.

The key observation is that the SDP solution provides a fractional cut, but we are interested
in finding an integral (binary) solution. To recover an integral solution, we can use
randomized rounding. In randomized rounding, we round the fractional vectors to binary
values with a certain probability, which leads to an approximate cut.

a. Randomized Rounding

Randomized rounding works as follows:

1. Solve the SDP relaxation to get the vectors x_v for each vertex.

2. For each vertex v, consider the sign of the dot product x_v^T r, where r is a random vector
chosen uniformly from the unit sphere.

3. Assign each vertex to one of two sets based on the sign of this dot product.

This process yields a cut that is expected to be close to the optimal cut, with a performance
guarantee.
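
The whole pipeline can be sketched end to end. Below is a minimal sketch assuming cvxpy and numpy are available (the function name, the solver choice, and the example graph are illustrative): it solves the SDP in its Gram-matrix form X[u][v] = x_u · x_v, factors X to recover the vectors, and then applies the random-hyperplane rounding just described.

```python
import numpy as np
import cvxpy as cp  # assumed available, with an SDP-capable solver (e.g., SCS)

def sdp_max_cut(n, edges, seed=0):
    # Gram matrix of the unit vectors: X is PSD with unit diagonal.
    X = cp.Variable((n, n), PSD=True)
    # Per-edge cut value (1 - x_u . x_v) / 2; constant scaling of the
    # objective does not change the maximizer.
    objective = cp.Maximize(sum((1 - X[u, v]) / 2 for u, v in edges))
    cp.Problem(objective, [cp.diag(X) == 1]).solve()
    # Factor X = V V^T so that the rows of V are the unit vectors x_v.
    w, U = np.linalg.eigh(X.value)
    V = U * np.sqrt(np.clip(w, 0, None))
    # Randomized rounding: split vertices by the sign of a random projection.
    r = np.random.default_rng(seed).standard_normal(n)
    side = V @ r >= 0
    return side, sum(side[u] != side[v] for u, v in edges)

side, cut_value = sdp_max_cut(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
print(side, cut_value)  # on the 4-cycle an optimal cut has all 4 edges
```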

b. Approximation Guarantee

Using randomized rounding, it can be shown that the expected value of the cut generated by
the SDP relaxation is at least 1/2 times the size of the optimal cut. Therefore, the SDP-based
approximation algorithm provides a 1/2-approximation for the Maximum Cut problem.


5. Conclusion

The SDP relaxation of the Maximum Cut problem provides a powerful way to approximate
the optimal solution for large graphs. By relaxing the binary constraints to continuous ones,
we transform the combinatorial problem into a convex optimization problem that can be
solved efficiently. Randomized rounding then recovers a feasible solution that approximates
the optimal cut.

The SDP relaxation provides a convex formulation of the Max-Cut problem that is much
easier to solve.

The randomized rounding technique allows us to convert the fractional solution back to
an integral one.

The resulting approximation algorithm gives a performance guarantee of a 1/2-
approximation.

This approach has applications in combinatorial optimization and is particularly useful for
large-scale problems where exact solutions are computationally infeasible.

Lecture 62: Relaxation and Rounding in SDP: Goemans and Williamson Algorithm and its Analysis

In this lecture, we discuss one of the most famous algorithms for approximating the
Maximum Cut (Max-Cut) problem using Semidefinite Programming (SDP) — the Goemans
and Williamson algorithm. This algorithm is a result of their work on approximation
algorithms and combines the techniques of relaxation and randomized rounding to provide
an approximation with a provable guarantee.

1. Overview of the Maximum Cut Problem

As discussed in the previous lecture, the Maximum Cut problem involves partitioning the
vertices of a graph G = (V , E) into two disjoint sets such that the number of edges
between the sets is maximized. This is an NP-hard problem, and solving it exactly for large
graphs is computationally infeasible.

To address this challenge, we use Semidefinite Programming (SDP) relaxation to
approximate the solution. In the SDP formulation of the Max-Cut problem, the objective is to
maximize the number of edges that are "cut" by a partition. This relaxation leads to an
approximation, and Goemans and Williamson's algorithm is a way to recover a good
approximation from this relaxed problem.

2. SDP Relaxation for Max-Cut

Recall from the previous lecture that the Max-Cut problem can be relaxed using semidefinite
programming. In this relaxation, we represent each vertex v ∈ V by a vector xv ∈ Rn , and

the goal is to maximize the sum of the cut values across all edges.

The relaxed problem is as follows:

\[
\text{maximize} \quad \frac{1}{4} \sum_{\{u,v\} \in E} \left(1 - x_u^T x_v\right),
\]

subject to

\[
x_v^T x_v = 1 \quad \forall v \in V.
\]

This SDP relaxation replaces the binary decision variables (indicating whether an edge is cut)
with continuous vectors, which are constrained to lie on the unit sphere.

3. Goemans and Williamson Algorithm: Outline

The core idea behind the Goemans and Williamson algorithm is to solve the SDP relaxation,
which provides fractional solutions, and then use randomized rounding to recover an
integral solution (a cut) from these fractional vectors. The algorithm proceeds as follows:

1. Solve the SDP: First, solve the SDP relaxation to obtain the vectors x_v for each vertex
v ∈ V, which lie on the unit sphere in R^n.

2. Randomized Rounding: Next, apply a randomized rounding procedure to the fractional
solutions. Specifically, for each vertex v, the sign of the dot product x_v^T r is computed,
where r is a randomly chosen vector from the unit sphere in R^n. Based on this sign,
vertex v is assigned to one of the two sets in the cut.

3. Construct the Cut: After rounding, the vertices are assigned to two sets, and the edges
that cross between these sets form the final cut.

The primary innovation of this algorithm is that it uses a clever randomized procedure to
ensure that the resulting cut is close to the optimal solution while maintaining an
approximation guarantee.

4. Analysis of the Goemans and Williamson Algorithm

The key to the analysis is understanding how well the randomized rounding performs. The
analysis shows that the expected value of the cut generated by this algorithm is within a
constant factor of the optimal cut. The main results are as follows:

1. Approximation Ratio: The Goemans and Williamson algorithm guarantees an α-
approximation for the Maximum Cut problem, where α = min_{0<θ≤π} (2/π) · θ/(1 − cos θ) ≈ 0.878.
This result is based on the fact that the relaxation provides a fractional solution, and the
rounding technique preserves most of the cut value.

2. Geometric Interpretation of Rounding: The vector xv for each vertex v represents the

position of the vertex in a high-dimensional space. The randomized rounding procedure


effectively "flips" the vertices based on their vector projections, and the expected
number of edges cut corresponds to the expected value of the cut, which is close to the
optimal value.

3. Performance of Randomized Rounding: The expected approximation factor of the


Goemans and Williamson algorithm is derived by analyzing the probability of an edge
being cut based on the dot product between the vectors corresponding to its endpoints.
By carefully choosing the rounding procedure, they show that the expected cut size is
close to the optimal.

4. Improved Bound: The approximation factor α ≈ 0.878 is the best known
approximation factor for the Maximum Cut problem, a significant improvement
over the naive 1/2-approximation that would result from simply picking a cut at random.
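
The analysis in items 3 and 4 rests on a standard geometric fact: a uniformly random hyperplane separates two unit vectors with probability θ/π, where θ is the angle between them. A minimal Monte Carlo sketch with numpy (the specific angle and trial count are arbitrary choices for illustration) checks this fact empirically:

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 2.0  # angle (radians) between the two unit vectors
x_u = np.array([1.0, 0.0, 0.0])
x_v = np.array([np.cos(theta), np.sin(theta), 0.0])

# A random Gaussian vector defines a uniformly random hyperplane through
# the origin; the vectors are separated when their projections differ in sign.
r = rng.standard_normal((200_000, 3))
separated = np.sign(r @ x_u) != np.sign(r @ x_v)
print(separated.mean(), theta / np.pi)  # the two values should nearly agree
```

Comparing this per-edge cut probability, arccos(x_u^T x_v)/π, against the relaxation's per-edge contribution is exactly how the ≈ 0.878 constant is derived.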

5. Key Insights from the Algorithm

SDP Relaxation: The SDP relaxation of the Max-Cut problem is a convex problem that
can be solved efficiently using standard SDP solvers. This step provides a high-quality
fractional solution.

Randomized Rounding: The rounding technique, while seemingly simple, is crucial in


ensuring that the fractional solution is transformed into an integral solution that still
approximates the optimal cut well.

Approximation Guarantee: The Goemans and Williamson algorithm provides an


approximation guarantee of ≈ 0.878, which is the best-known approximation for the
Max-Cut problem.

6. Connection to Other Problems

The Goemans and Williamson algorithm's approach of relaxing a combinatorial problem


using SDP and then applying randomized rounding has inspired similar techniques in other
combinatorial optimization problems. This method can be generalized to other problems
where relaxation and rounding can be applied to obtain approximation algorithms with
provable performance guarantees.

7. Conclusion

The Goemans and Williamson algorithm is a groundbreaking approximation algorithm for


the Maximum Cut problem. It uses the powerful tools of semidefinite programming and
randomized rounding to provide an approximation that is provably close to the optimal

solution. This algorithm is one of the best-known approximation algorithms in combinatorial
optimization, with applications in network design, circuit partitioning, and other areas where
large-scale graph cuts are required.

SDP Relaxation transforms the combinatorial Max-Cut problem into a convex


optimization problem.

Randomized Rounding provides a way to recover an integral solution from the fractional
solution of the SDP.

The algorithm achieves an approximation ratio of ≈ 0.878, which is the best-known


result for the Max-Cut problem.

This approach has broader implications in optimization theory and algorithm design,
especially in combinatorial optimization and approximation algorithms.

Lecture 63: Boolean Function Representations


In this lecture, we explore the different ways Boolean functions can be represented and the
implications of these representations in computation and complexity. Boolean functions are
a critical aspect of computational theory, especially in areas such as logic design, circuit
complexity, and optimization. We focus on several standard representations used in
computational theory, including the truth table, Boolean expressions, and algebraic normal
forms.

1. Introduction to Boolean Functions

A Boolean function is a mathematical function that takes one or more binary inputs (i.e.,
values in {0, 1}) and produces a binary output. Boolean functions are fundamental in
computer science because they are the building blocks of digital circuits, decision-making
processes, and logical operations.

For example, a simple Boolean function might take two inputs x and y , and output their
logical AND, OR, or XOR:

f (x, y) = x ∧ y (AND operation)


f (x, y) = x ∨ y (OR operation)
f (x, y) = x ⊕ y (XOR operation)

Each of these functions can be represented in different ways, depending on the context and
application.

2. Truth Table Representation

One of the most straightforward ways to represent a Boolean function is by using a truth
table. A truth table lists all possible combinations of input values and the corresponding
output for the function.

For example, the AND function for two inputs x and y can be represented by the following
truth table:

x | y | f(x, y) = x ∧ y
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1

In this table, every combination of the inputs x and y is listed, and the output is calculated
according to the AND operation. Truth tables are helpful for small functions, but they
become less practical for functions with many inputs, as the number of rows grows
exponentially with the number of variables.
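
A truth table is also trivial to generate programmatically. Here is a minimal sketch in plain Python (the helper name is illustrative): it enumerates all 2^n input combinations with itertools and prints one row per assignment.

```python
from itertools import product

def truth_table(f, n):
    # One row per assignment: the n input bits followed by f's output.
    for bits in product((0, 1), repeat=n):
        print(*bits, int(f(*bits)))

truth_table(lambda x, y: x & y, 2)  # reproduces the AND table above
```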

3. Boolean Expressions

Another common representation is through Boolean algebra expressions. A Boolean


function can often be represented as a combination of logical operations such as AND (∧),
OR (∨), NOT (¬), and others.

For example:

The AND function can be written as f (x, y) = x ∧ y.


The OR function can be written as f (x, y) = x ∨ y.
The NOT function can be written as f (x) = ¬x.

Boolean expressions can be simplified using Boolean algebra rules. Simplification is an


important step in reducing the complexity of Boolean circuits, which is crucial for designing
efficient digital circuits.

4. Algebraic Normal Form (ANF)

The Algebraic Normal Form is another representation of Boolean functions, especially in the
context of Boolean polynomials. In ANF, a Boolean function is expressed as a multilinear
polynomial over GF(2): monomials (conjunctions of variables) are combined using XOR
(denoted ⊕) in place of ordinary addition.

For a function f(x_1, x_2, ..., x_n), the ANF is written as:

\[
f(x_1, x_2, \ldots, x_n) = c_0 \oplus c_1 x_1 \oplus c_2 x_2 \oplus c_3 x_1 x_2 \oplus \ldots
\]

where c_i ∈ {0, 1} are the coefficients of the polynomial.

Because the ANF uses XOR instead of conventional addition, every Boolean function has a
unique ANF. This representation is useful in cryptography and coding theory.
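
The ANF coefficients can be computed from the truth table with the binary Möbius transform. A minimal sketch in Python (the function name and indexing convention are illustrative assumptions): the truth table is a list of length 2^n whose index, read as a bit string, encodes the input.

```python
def anf_coefficients(tt):
    # Binary Moebius transform over GF(2): truth table -> ANF coefficients.
    # Entry k of the result is the coefficient of the monomial whose
    # variables correspond to the 1-bits of k.
    c = list(tt)
    step = 1
    while step < len(c):
        for block in range(0, len(c), 2 * step):
            for j in range(block + step, block + 2 * step):
                c[j] ^= c[j - step]
        step *= 2
    return c

# XOR of two bits (truth table for inputs 00, 01, 10, 11):
print(anf_coefficients([0, 1, 1, 0]))  # -> [0, 1, 1, 0], i.e. f = x1 (+) x2
```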

5. Canonical Forms: Disjunctive and Conjunctive Normal Form

Boolean functions can also be represented in canonical forms that express them in a
standardized manner. Two important canonical forms are:

Disjunctive Normal Form (DNF): This is a disjunction (OR) of conjunctions (ANDs). In


DNF, the Boolean function is expressed as an OR of AND terms. For example:

f (x, y) = (x ∧ y) ∨ (¬x ∧ ¬y)


Conjunctive Normal Form (CNF): This is a conjunction (AND) of disjunctions (ORs). CNF
represents the Boolean function as an AND of OR terms. For example:

f (x, y) = (x ∨ ¬y) ∧ (¬x ∨ y)

These forms are often used in satisfiability problems, where the goal is to determine if a
formula is satisfiable — that is, if there exists an assignment of truth values to the variables
that makes the entire expression true.

6. Circuit Representations

In computer science, Boolean functions are often represented as Boolean circuits. A


Boolean circuit is a directed acyclic graph (DAG) where each node represents a logical gate
(AND, OR, NOT, etc.), and the edges represent the input and output of these gates. The gates
are connected in such a way that the Boolean function is computed.

For example, a simple Boolean function like f (x, y) = x ∧ y can be represented by a circuit
with a single AND gate. More complex functions, such as XOR or combinations of AND, OR,
and NOT, require more complicated circuits. The complexity of Boolean circuits is an
important factor in the study of computational complexity, as the size of the circuit can affect
the time required for evaluation.
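
A Boolean circuit can be modeled directly as a DAG in code. Here is a minimal sketch in Python (the dictionary encoding and gate names are illustrative assumptions): each gate maps to its operation and input wires, and evaluation recurses from the output gate.

```python
# Circuit for f(x, y) = (x AND y) OR (NOT x), encoded as a DAG.
circuit = {
    "g1": ("AND", "x", "y"),
    "g2": ("NOT", "x"),
    "out": ("OR", "g1", "g2"),
}

def evaluate(circuit, inputs, node):
    # Inputs are leaves; gates recurse into their argument wires.
    if node in inputs:
        return inputs[node]
    op, *args = circuit[node]
    vals = [evaluate(circuit, inputs, a) for a in args]
    if op == "NOT":
        return not vals[0]
    return all(vals) if op == "AND" else any(vals)

print(evaluate(circuit, {"x": False, "y": True}, "out"))  # -> True
```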

7. Applications of Boolean Functions

Boolean functions and their representations are used in a variety of fields, including:

Digital Circuit Design: Boolean functions form the foundation of digital logic circuits
used in computer processors, memory units, and other hardware components.

Optimization: Boolean functions are key in problems like SAT (satisfiability), which has
important applications in verification and optimization.

Cryptography: In cryptographic algorithms, Boolean functions are often used to design


encryption and decryption operations that are efficient and secure.

Artificial Intelligence: Boolean functions are used in machine learning algorithms and
decision trees for classification and regression tasks.

8. Complexity of Boolean Function Representations

The choice of representation for a Boolean function has a significant impact on the
computational complexity of evaluating the function.

Truth Tables: For n variables, the truth table has 2^n rows, making it exponential in size.

Boolean Expressions: A Boolean expression can be simplified, but the size of the
expression depends on the complexity of the function.

Canonical Forms (DNF, CNF): Both DNF and CNF forms can lead to an exponential size in
the worst case, though DNF and CNF representations are more efficient for satisfiability
problems.

9. Conclusion

Boolean functions are fundamental objects in computation, and their representations —


such as truth tables, Boolean expressions, and algebraic normal forms — play a crucial role
in logic design, complexity theory, and optimization. Choosing the right representation
allows for efficient computation and provides insight into the underlying structure of
computational problems. Understanding these representations is key in fields like circuit
design, cryptography, and artificial intelligence.

Lecture 64: Fourier Representation of Boolean Functions


In this lecture, we explore the Fourier representation of Boolean functions, which is a
powerful tool used in computational complexity, signal processing, and cryptography. The
Fourier transform provides a way to express Boolean functions in terms of simpler,
oscillatory components, which are easier to analyze and manipulate.

1. Introduction to Fourier Representation

The Fourier transform, typically associated with continuous functions, can be applied to
Boolean functions (discrete functions) as well. In the case of Boolean functions, this is done
using the Fourier expansion over the Boolean cube. This allows us to express a Boolean
function as a sum of weighted Fourier characters (also called Fourier basis functions), which
can be analyzed in terms of their frequency components.

Let f : {0, 1}^n → {0, 1} be a Boolean function that takes n binary variables as input. The
Fourier representation expresses f as a sum of terms corresponding to different subsets of
the input variables.

2. Fourier Expansion of a Boolean Function

The Fourier representation of a Boolean function f(x_1, x_2, …, x_n) is given by:

\[
f(x_1, x_2, \ldots, x_n) = \sum_{S \subseteq \{1,2,\ldots,n\}} \hat{f}(S) \, \chi_S(x), \qquad \chi_S(x) = \prod_{i \in S} (-1)^{x_i},
\]

where S ranges over the subsets of the indices {1, 2, …, n} and χ_S is the Fourier character
associated with S (writing the character with (−1)^{x_i} keeps the expansion consistent with
the coefficient formula below). The coefficients \hat{f}(S) are the Fourier coefficients, which represent
the contribution of each subset S to the function f.

3. Fourier Coefficients

The Fourier coefficients \hat{f}(S) can be computed as follows:

\[
\hat{f}(S) = \mathbb{E}_{x \sim \{0,1\}^n}\Big[\, f(x) \prod_{i \in S} (-1)^{x_i} \Big],
\]

where the expectation is over all possible inputs x = (x_1, x_2, …, x_n) ∈ {0, 1}^n. This formula
computes the correlation between the function f and the Fourier basis function \prod_{i \in S} (-1)^{x_i}.

The Fourier coefficients \hat{f}(S) give a measure of how much the Boolean function f is
influenced by the subset S of its input variables. Intuitively, the larger the Fourier coefficient
\hat{f}(S) in magnitude, the more the function depends on the subset of variables corresponding to S.
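
For small n, these coefficients can be computed by direct enumeration. A minimal sketch in Python (the function name is illustrative; it follows the expectation formula above literally, so it is exponential in n):

```python
from itertools import product

def fourier_coefficients(f, n):
    # fhat(S) = E_x[ f(x) * prod_{i in S} (-1)**x_i ], S given as a 0/1 mask.
    coeffs = {}
    for S in product((0, 1), repeat=n):
        total = 0.0
        for x in product((0, 1), repeat=n):
            chi = (-1) ** sum(xi for xi, si in zip(x, S) if si)
            total += f(*x) * chi
        coeffs[S] = total / 2 ** n
    return coeffs

print(fourier_coefficients(lambda a, b: a & b, 2))  # coefficients of AND
```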

4. Fourier Transform and the Boolean Cube

The domain of the Boolean function f is the Boolean cube {0, 1}n , which consists of all
possible input vectors of n binary variables. The Fourier transform decomposes this function
into components that correspond to different frequencies, allowing for a deeper
understanding of its structure.

The Fourier expansion decomposes the function into these components (called Fourier
characters), which are characterized by different subsets of the variables. Each Fourier
character \prod_{i \in S} (-1)^{x_i} can be seen as a frequency component, where S denotes the set of

variables that are involved in that particular frequency.

5. Properties of Fourier Coefficients

Several important properties and results hold for the Fourier coefficients of Boolean
functions:

Parseval’s Theorem: The sum of the squares of the Fourier coefficients equals the
expected value of the square of the function f (a numerical check appears after this list).
In other words:

\[
\sum_{S \subseteq \{1,2,\ldots,n\}} \hat{f}(S)^2 = \mathbb{E}[f(x)^2].
\]

This implies that the Fourier coefficients \hat{f}(S) capture all the information about the
function f.

Bias of a Boolean Function: The bias of a Boolean function f is the sum of the absolute
values of the Fourier coefficients over the non-empty subsets S. The bias measures how far
the function is from being balanced (i.e., having equal numbers of 0s and 1s in its
output):

\[
\mathrm{Bias}(f) = \sum_{S \neq \emptyset} |\hat{f}(S)|.
\]

Fourier Spectrum: The Fourier spectrum of a Boolean function is the collection of its
Fourier coefficients \hat{f}(S). Analyzing the spectrum helps in understanding the complexity
and the structure of the Boolean function.
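
Using the fourier_coefficients sketch from the previous section, Parseval's theorem can be verified numerically for the AND function (both printed values are 0.25):

```python
coeffs = fourier_coefficients(lambda a, b: a & b, 2)
print(sum(c ** 2 for c in coeffs.values()))                   # sum of squares
print(sum((a & b) ** 2 for a in (0, 1) for b in (0, 1)) / 4)  # E[f(x)^2]
```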

6. Applications of Fourier Representation

The Fourier representation of Boolean functions is useful in several areas, including:

Learning Theory: In machine learning, particularly in the study of Boolean functions, the
Fourier expansion allows for efficient learning of Boolean functions in the context of
Fourier-based algorithms like the Fourier transform method.

Complexity Theory: Fourier analysis provides insight into the complexity of Boolean
functions, helping to classify functions based on their Fourier coefficients. This is
relevant for understanding the computational complexity of Boolean function evaluation
and related problems like satisfiability.

Cryptography: Fourier analysis is extensively used in cryptography to study properties


like nonlinearity and resilience of Boolean functions used in cryptographic algorithms.
It also plays a crucial role in the analysis of stream ciphers and block ciphers.

Randomized Algorithms: The Fourier representation is used in the analysis of


randomized algorithms, particularly in the study of their performance over different
inputs and their complexity.

7. Example: Fourier Expansion of the Majority Function

Let’s consider a simple Boolean function: the majority function, which outputs 1 if the
majority of the inputs are 1, and 0 otherwise. The Fourier expansion of this function can be
derived, and its Fourier coefficients help in understanding how much each variable and its
combinations contribute to the overall majority behavior.
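
As a concrete illustration, the enumeration sketch from earlier in this lecture can be applied directly to the 3-bit majority function (the lambda below is an illustrative encoding of MAJ_3):

```python
maj3 = lambda a, b, c: int(a + b + c >= 2)
print(fourier_coefficients(maj3, 3))
```

The output shows that the singleton subsets and the full triple carry nonzero weight while the pairs vanish, making each variable's contribution to the majority behavior explicit.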

8. Conclusion

The Fourier representation of Boolean functions provides a rich and powerful framework for
analyzing the structure and complexity of Boolean functions. By decomposing a function into
its Fourier coefficients, we can better understand its behavior, optimize algorithms, and
apply this understanding to fields like machine learning, cryptography, and complexity
theory. The ability to express Boolean functions in terms of simpler oscillatory components is
a key tool for solving a wide range of computational problems.

Lecture 65: Approximate Degree for Boolean Functions


In this lecture, we explore the concept of approximate degree for Boolean functions, which
is a measure of the complexity of a Boolean function in terms of the minimum degree of a
polynomial that approximates it. The approximate degree plays a critical role in
understanding the computational complexity of Boolean functions, particularly in the
context of approximation algorithms, learning theory, and quantum computing.

1. Introduction to Approximate Degree

The approximate degree of a Boolean function f : {0, 1}n → {0, 1} is defined as the
smallest degree d such that there exists a real-valued polynomial p(x1 , x2 , … , xn ) of ​ ​ ​

degree at most d that approximates the Boolean function f within a certain error margin.

Mathematically, the approximate degree deg^*(f) of a Boolean function f is given by:

\[
\deg^*(f) = \min \Big\{ \deg(p) \;:\; p \in \mathbb{R}[x_1, x_2, \ldots, x_n], \ \max_{x \in \{0,1\}^n} |f(x) - p(x)| \le \varepsilon \Big\},
\]

for a fixed error margin ε (a common convention is ε = 1/3). Here R[x_1, x_2, …, x_n] is the ring
of polynomials in n variables with real coefficients, and deg(p) denotes the highest degree of
any term in the polynomial p. The goal is to find a polynomial that approximates the function
f as closely as possible, with the minimum degree required.
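
For small n, this definition can be evaluated directly as a linear program: minimize t subject to |f(x) − p(x)| ≤ t for all x, over polynomials p of degree at most d. A minimal sketch assuming numpy and scipy are available (function names are illustrative):

```python
from itertools import combinations, product
import numpy as np
from scipy.optimize import linprog

def best_error(fvals, n, d):
    # Best uniform error achievable by a degree-<=d multilinear polynomial.
    monomials = [S for k in range(d + 1) for S in combinations(range(n), k)]
    points = list(product((0, 1), repeat=n))
    A = np.array([[all(x[i] for i in S) for S in monomials] for x in points],
                 dtype=float)
    m = len(monomials)
    c = np.zeros(m + 1); c[-1] = 1.0          # variables: coefficients, then t
    ones = np.ones((len(points), 1))
    ub = np.block([[A, -ones], [-A, -ones]])  # p(x)-t <= f(x) and -p(x)-t <= -f(x)
    rhs = np.concatenate([fvals, -np.asarray(fvals, dtype=float)])
    res = linprog(c, A_ub=ub, b_ub=rhs, bounds=[(None, None)] * (m + 1))
    return res.fun

# Best degree-1 approximation error for OR on 3 bits.
f = np.array([float(any(x)) for x in product((0, 1), repeat=3)])
print(best_error(f, 3, 1))
```

The approximate degree is then the smallest d for which this error drops below the chosen ε.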

2. Intuition Behind Approximate Degree

The approximate degree is a measure of how "complex" a Boolean function is in terms of its
polynomial representation. A Boolean function that is highly "nonlinear" or "complicated"
may require a higher-degree polynomial to approximate it well. Conversely, functions that
are "simple" or closer to being linear may have a low-degree approximation.

For example:

Dictator functions: If f depends on only one variable, e.g., f(x_1, x_2) = x_1, it is exactly a
polynomial of degree 1. (Note that XOR, although linear over GF(2), is not degree 1 as a
real polynomial: x_1 ⊕ x_2 = x_1 + x_2 − 2x_1x_2, which has degree 2.)

Majority functions: The majority function, which outputs 1 if more than half of the
inputs are 1, generally requires higher-degree polynomials for good approximation.

3. Applications of Approximate Degree

The approximate degree of a Boolean function has important implications in various fields:

Circuit Complexity: The approximate degree is closely tied to the circuit complexity of a
Boolean function: lower bounds on the approximate degree of f translate into lower
bounds for various circuit models computing f. Roughly speaking, a low approximate
degree suggests a structurally simple function, while a high approximate degree
indicates a more complex one.

Learning Theory: In machine learning, the approximate degree is used to understand


the difficulty of learning a Boolean function using polynomial approximations. Boolean
functions with high approximate degree may require more data or more sophisticated
models to learn effectively.

Quantum Computing: In quantum computing, the approximate degree plays a role in


the quantum query complexity of a Boolean function. The quantum query complexity
of a function is related to how many queries to the function are required to evaluate it
with quantum algorithms. The approximate degree provides a lower bound on this
complexity.

Boolean Function Analysis: The approximate degree provides insights into the structure
of Boolean functions. For example, a high approximate degree suggests that the
function has intricate dependencies between the input variables, while a low
approximate degree indicates that the function can be described by simpler interactions
among the variables.

4. Approximate Degree vs. Exact Degree

The exact degree of a Boolean function f , denoted deg(f ), is the degree of the smallest
polynomial that exactly represents the function. The approximate degree is a relaxation of
this concept, where we allow some error in the approximation.

For a Boolean function f , we typically have:

\[
\deg^*(f) \le \deg(f).
\]

In some cases, the approximate degree may be much smaller than the exact degree. This is
particularly relevant in the context of approximation algorithms, where the goal is often to
find efficient approximations of functions.

5. Relationship with Other Complexity Measures

The approximate degree is related to other complexity measures for Boolean functions, such
as:

AC0 Circuit Complexity: Approximate degree interacts closely with constant-depth circuit
classes such as AC0 (constant-depth, unbounded fan-in circuits): approximate-degree
bounds for basic functions like OR and AND underlie lower-bound arguments against
AC0-related models, and determining the approximate degree of AC0 functions has been
a driving question in this area.

Fourier Entropy: The Fourier representation of a Boolean function allows us to express


the function as a sum of Fourier coefficients. The approximate degree is often related to
the "concentration" of these Fourier coefficients, meaning that functions with high
approximate degree tend to have a large number of significant Fourier coefficients.

Communication Complexity: Approximate degree is also connected to communication


complexity, where it can be used to establish lower bounds on the number of
communication rounds needed to compute a function in a multi-party setting.

6. Approximate Degree of Common Boolean Functions

Let’s explore the approximate degree of some well-known Boolean functions:

Degree-1 Functions: Any Boolean function of real degree 1, such as the dictator function
f(x_1, x_2) = x_1, has an approximate degree of 1, because it can be exactly represented
by a polynomial of degree 1.

Majority Function: The majority function f(x_1, x_2, …, x_n), which equals 1 if the majority of
x_1, x_2, …, x_n are 1 and 0 otherwise, typically has a high approximate degree. In fact, it
has an approximate degree of Ω(n), which grows linearly with the number of variables.

Parity Functions: The parity function f(x_1, x_2, …, x_n) = x_1 ⊕ x_2 ⊕ ⋯ ⊕ x_n, which
outputs 1 if the number of 1s in the input is odd and 0 otherwise, has an approximate
degree of n.

7. Techniques for Bounding Approximate Degree

Several techniques are used to bound the approximate degree of a Boolean function,
including:

Polynomial Approximation: Constructing polynomial approximations for Boolean


functions using methods such as Chebyshev polynomials or Fourier analysis.

Lower Bound Techniques: Techniques from communication complexity and circuit


complexity provide lower bounds on the approximate degree for specific classes of
functions, helping to understand the inherent difficulty of approximating certain
functions.

8. Conclusion

The concept of approximate degree provides an essential tool for understanding the
complexity of Boolean functions and their approximations. By measuring the degree of
polynomials that approximate Boolean functions, we can gain insights into their circuit
complexity, learnability, and computational hardness. Approximate degree plays a crucial
role in fields such as computational complexity, machine learning, and quantum computing,
and it remains a key concept in the study of Boolean functions and their applications.

Lecture 66: Dual Witness: Approximate Degree of OR


In this lecture, we delve into the approximate degree of the OR function. The OR function is
one of the fundamental Boolean functions, and understanding its approximate degree is
important for several areas of theoretical computer science, such as communication
complexity, circuit complexity, and polynomial approximation.

We will also introduce the idea of a dual witness in the context of approximate degree,
which will help us establish lower bounds for the approximate degree of the OR function.

1. The OR Function

The OR function for n-bit inputs, denoted OR_n(x_1, x_2, …, x_n), outputs:

\[
\mathrm{OR}_n(x_1, x_2, \ldots, x_n) = \begin{cases} 1 & \text{if at least one of } x_1, x_2, \ldots, x_n \text{ is } 1, \\ 0 & \text{otherwise.} \end{cases}
\]

This function is a classic Boolean function, and its complexity can be characterized by the
approximate degree, which is the degree of the lowest-degree polynomial that
approximates the function well.

2. Exact Degree of the OR Function

First, it is important to understand the exact degree of the OR function. The unique
multilinear polynomial representing OR_n is:

\[
\mathrm{OR}_n(x_1, x_2, \ldots, x_n) = 1 - \prod_{i=1}^{n}(1 - x_i).
\]

Expanding the product yields the monomial x_1 x_2 ⋯ x_n, so this polynomial has degree n;
the exact degree of the OR function is therefore n. Determining the exact degree is a useful
baseline for understanding the behavior of polynomial approximations, since any polynomial
representing OR_n exactly must have full degree.

3. Approximate Degree of the OR Function

The approximate degree is more nuanced. To understand the approximate degree of the OR
function, we look for a polynomial that approximates it within some error bound.

A key result, due to Nisan and Szegedy, is that the approximate degree of the OR function is
of order √n, specifically:

\[
\deg^*(\mathrm{OR}_n) = \Theta(\sqrt{n}).
\]

The upper bound comes from a construction based on Chebyshev polynomials, and the
matching lower bound deg^*(OR_n) = Ω(√n) can be certified by a dual witness. So although
the exact degree of OR_n is n, the function can be approximated by polynomials of much
lower degree; however, that degree still must grow with the input size.

4. Dual Witness for the Approximate Degree

To prove the lower bound on the approximate degree of the OR function, we introduce the
concept of a dual witness. This idea comes from the study of communication complexity
and is used to establish lower bounds for the complexity of Boolean functions.

A dual witness is essentially a method to prove that no polynomial of degree smaller than a
certain value can approximate the Boolean function within a given error bound. It works by
showing that any polynomial approximation fails to meet the desired error threshold for
specific input distributions or sets of inputs.

For the OR function, the dual witness argument involves showing that on inputs of small
Hamming weight, where OR_n jumps from 0 (the all-zeros input) to 1 (any input with a single
1), any polynomial of degree d = o(√n) must fail to approximate the function within the
error bound. In other words, the dual witness certifies that approximating OR_n effectively
requires a polynomial of degree Ω(√n).

5. Constructing the Approximate Degree Bound

The dual witness method can be formally framed as follows:

Suppose p(x_1, x_2, …, x_n) is a polynomial of degree d that approximates the OR
function within error 1/3.

By symmetrization, p can be converted into a univariate polynomial q of degree at most
d with q(0) ≈ 0 and q(k) ≈ 1 for every Hamming weight k = 1, 2, …, n.

A bounded polynomial that is near 0 at the point 0 and near 1 at every point 1, …, n
must change very rapidly near 0; a Markov-type inequality for polynomials then forces
d = Ω(√n). The dual witness makes this quantitative by exhibiting an explicit certificate
that no lower-degree polynomial meets the error bound.

From this, we conclude that the approximate degree of the OR function must grow at least
as the square root of n, i.e., deg^*(OR_n) = Ω(√n).
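
Because OR is symmetric, the best approximating polynomial can be taken to be a univariate polynomial in the Hamming weight, which makes the √n behavior easy to observe numerically. A minimal sketch assuming numpy and scipy are available (the helper name and the sample sizes are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def symmetric_error(n, d):
    # Best uniform error of a degree-<=d univariate polynomial q with
    # q(k) ~ OR of an input of Hamming weight k, for k = 0..n.
    ks = np.arange(n + 1)
    V = np.vander(ks, d + 1)              # columns: k^d, ..., k^0
    f = (ks > 0).astype(float)            # OR = 1 iff weight > 0
    ones = np.ones((n + 1, 1))
    ub = np.block([[V, -ones], [-V, -ones]])
    rhs = np.concatenate([f, -f])
    c = np.zeros(d + 2); c[-1] = 1.0
    res = linprog(c, A_ub=ub, b_ub=rhs, bounds=[(None, None)] * (d + 2))
    return res.fun

for n in (4, 16, 36):
    d = next(d for d in range(n + 1) if symmetric_error(n, d) <= 1 / 3)
    print(n, d, n ** 0.5)  # minimal degree tracks sqrt(n)
```

The minimal degree printed for each n stays close to √n, in line with the Θ(√n) bound.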

6. Implications of the Approximate Degree of the OR Function

The fact that the approximate degree of the OR function grows as √n has several
important consequences:

Quantum Query Complexity: The approximate degree lower-bounds quantum query
complexity, so any quantum algorithm computing OR_n must make Ω(√n) queries. This
bound is tight: Grover's search algorithm evaluates OR_n with O(√n) quantum queries.

Circuit and Communication Complexity: Approximate-degree bounds for OR and related
functions feed into lower-bound arguments in circuit complexity and in communication
complexity, where polynomial-approximation techniques are a standard tool for
bounding the resources needed to compute a function in distributed settings.

Polynomial Approximation: Any polynomial that approximates the OR function within a
small constant error must have degree Ω(√n), and degree O(√n) suffices, which pins
down exactly how hard this function is to approximate with polynomials.

7. Conclusion

The approximate degree of the OR function provides a fascinating example of how Boolean
functions can be analyzed in terms of polynomial approximations. While the exact degree of
the OR function is n, its approximate degree is Θ(√n): genuinely smaller, but still growing
with the number of inputs. This result is critical for understanding the limits of polynomial
approximation and has broad implications in fields like circuit complexity, communication
complexity, quantum computing, and machine learning.

By utilizing the concept of a dual witness, we can rigorously prove lower bounds for the
approximate degree of Boolean functions, offering deeper insights into their computational
complexity and the limitations of polynomial representations.
