
Lecture 2: Convex Sets and Their Properties


Rajat Mittal

IIT Kanpur

From our previous discussion, the standard form of linear programming is given by the following optimization program,

min cT x
subject to Ax = b
x≥0

Using linear algebra, we know how to describe the solutions of the system of equations Ax = b. In this lecture, let's take the next step and study the set of x's satisfying both Ax = b and x ≥ 0. In other words, what are the properties of the feasible region of a linear program, and how can these properties help us?
The train of thought starts with the observation that if x1 and x2 are two feasible solutions of an LP, then αx1 + βx2 is also a feasible solution (given that α, β are non-negative and add up to 1). Such points are called convex combinations of x1 and x2 .
The first step to study convex combinations is to look at linear combinations.

1 Linear combinations
We denote the set of real numbers as R. We will mostly work with the vector space Rn ; its elements
will be called vectors. Remember that a vector space is a set of vectors closed under addition and scalar
multiplication.
While studying linear algebra, the system of equations Ax = b was viewed as expressing b as a linear combination of the columns of A. For vectors x1 , x2 , · · · , xk , a point y is a linear combination of them iff

y = α1 x1 + α2 x2 + · · · + αk xk , ∀i, αi ∈ R.

Exercise 1. What do you get by taking linear combinations of three, or in general n, points?

Hint: Will you get a full dimensional vector space?


Actually, there are multiple ways to restrict this combination of a given set of vectors. Let us look at a few important ones. Restricting the αi 's to be non-negative, we get a conic combination,

y = α1 x1 + α2 x2 + · · · + αk xk , ∀i, αi ≥ 0.
Instead, if we put the restriction that the αi 's sum up to 1, it is called an affine combination,

y = α1 x1 + α2 x2 + · · · + αk xk , ∀i αi ∈ R, Σi αi = 1.

When a combination is affine as well as conic, it is called a convex combination,

y = α1 x1 + α2 x2 + · · · + αk xk , ∀i αi ≥ 0, Σi αi = 1.

Exercise 2. What is the geometric shape obtained by taking linear/conic/affine/convex combination of two
points in R2 ?
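The different restrictions are easy to probe numerically. Below is a small sketch (our own illustration, with two arbitrarily chosen points, not part of the notes) that samples each kind of combination of two points in R2 and checks the constraint on the weights; in particular, convex combinations of (1, 0) and (0, 1) always land on the segment joining them.

```python
# Sample linear/conic/affine/convex combinations of two fixed points in R^2.
import random

x1, x2 = (1.0, 0.0), (0.0, 1.0)   # two (arbitrarily chosen) points in R^2

def combine(a1, a2):
    """Componentwise a1*x1 + a2*x2."""
    return (a1 * x1[0] + a2 * x2[0], a1 * x1[1] + a2 * x2[1])

def sample(kind):
    """Draw a random combination of x1, x2 of the given kind."""
    a1, a2 = random.uniform(-2, 2), random.uniform(-2, 2)  # linear: alpha_i in R
    if kind == "conic":                  # restrict alpha_i >= 0
        a1, a2 = abs(a1), abs(a2)
    elif kind == "affine":               # restrict alpha_1 + alpha_2 = 1
        a2 = 1 - a1
    elif kind == "convex":               # both restrictions at once
        a1 = random.uniform(0, 1)
        a2 = 1 - a1
    return combine(a1, a2)

# Convex combinations of (1,0) and (0,1) lie on the segment between them:
# both coordinates are nonnegative and they sum to 1.
for _ in range(1000):
    y = sample("convex")
    assert y[0] >= 0 and y[1] >= 0 and abs(y[0] + y[1] - 1) < 1e-9
```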

The contents of these lecture notes are inspired by the books by Boyd and Vandenberghe [1] and Papadimitriou and Steiglitz [2].
We notice that only the linear combination gives us a subspace. An affine combination gives a line, a convex combination gives a line segment, and a conic combination gives a cone in two-dimensional space.
So, these combinations give rise to special subsets/subspaces of the original vector space. Notice that such
sets naturally arise while studying feasible regions of a linear program. The feasible set of a single equality
constraint is a line/hyperplane (affine set), the feasible set of the constraint x ≥ 0 is a cone and the overall
feasible set of a linear program is a convex set.
Exercise 3. Convince yourself of the above assertion for 2 dimensional case.

1.1 Affine sets


An affine combination of two points, in two dimensions, gave a line. The following definition generalizes lines to higher dimensions.
Definition 1. Affine set: A set S is called affine iff for any two points in the set S, the line through them
is contained in S. In other words, for any two points in S, their affine combination is in the set S too.
Exercise 4. Show that a line is an affine set.
Theorem 1. A set S is affine iff any affine combination of finitely many points in the set is also in S.
Proof. Exercise (use induction).
Exercise 5. What is the affine combination of three points?
We will try to answer this question methodically. Suppose, the three given points are x1 ,x2 and x3 . Then,
any affine combination of these three points can be written as
α1 x1 + α2 x2 + α3 x3 , Σi αi = 1.

We can simplify the expression to,

α1 (x1 − x3 ) + α2 (x2 − x3 ) + x3 .

Notice that there are no constraints on α1 , α2 . We can think of this sum as,

x3 + (plane/space generated by (x1 − x3 ) and (x2 − x3 )) .

In other words, it is a linear subspace (constructed from the linear combination of x1 − x3 and x2 − x3 ),
which is shifted by x3 .
Exercise 6. What is the dimension of this linear subspace? Do you have a better way to visualize this affine
set?
Generalizing the previous argument, any affine set S can be thought of as an offset x added to some vector space V . Hence,

S = {v + x : v ∈ V }.
You can also prove that such a set is affine (exercise). Not just that, given an affine set S and a point x
inside it, the set
V = {s − x : s ∈ S}
is a vector space. In the assignment, you will show that the vector space associated with an affine set does
not depend upon the point chosen. So, we can define the dimension of the affine set as the dimension of the
subspace.
Exercise 7. Show that the feasible region of a set of linear equations is an affine set.

1.2 Convex set
From the definition of affine sets, we can guess the definition of convex sets.
Definition 2. A set is called convex iff any convex combination of a subset is also contained in the set itself.
Theorem 2. A set is convex iff for any two points in the set their convex combination (line segment) is
contained in the set.
We can prove this using induction. It is left as an exercise.

Fig. 1. Different kinds of sets: a subspace, an offset subspace (affine set), convex sets and non-convex sets

Convex hull: The convex hull of a set of points S, denoted Conv (S), is the set of all convex combinations of finitely many points of S. From the definition, it is clear that the convex hull is a convex set.
Theorem 3. Conv (S) is the smallest convex set containing S.
Proof. Let C be any convex set containing S. Being convex, C contains all convex combinations of points of S; hence C contains Conv (S). So no convex set containing S can be strictly smaller than Conv (S). Since Conv (S) is itself convex and contains S, it is the smallest such set.

Convex hull of S can also be viewed as the intersection of all convex sets containing S (Prove it).
Exercise 8. What is the convex hull of three vertices of a triangle? What if I add a point inside the triangle?
What about outside?
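A quick way to experiment with Exercise 8 is to compute hulls directly. The following pure-Python sketch (Andrew's monotone chain algorithm, our own choice of method, not part of the notes) shows that a point inside the triangle does not change the hull, while a point outside does.

```python
# 2D convex hull via Andrew's monotone chain (collinear points dropped).
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Vertices of Conv(points) in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:                      # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

triangle = [(0, 0), (4, 0), (0, 4)]
# A point inside the triangle leaves the hull unchanged...
assert sorted(convex_hull(triangle + [(1, 1)])) == sorted(triangle)
# ...while a point outside becomes a new vertex of the hull.
assert (5, 5) in convex_hull(triangle + [(5, 5)])
```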

Intersections and unions of convex sets Suppose we are given two convex sets S1 and S2 . What happens
when we take their intersection or union?
The intersection of two convex sets is convex too. To prove it, consider two points in the intersection S1 ∩ S2 . Since both points are in S1 and in S2 , the line segment connecting them is contained in both S1 and S2 . This proves that the intersection is also convex. The same argument extends to the intersection of finitely many, and even infinitely many, sets.
The same property, though, does not hold for unions. In general, the union of two convex sets is not convex. To obtain a convex set from the union, we can take the convex hull of the union.
Exercise 9. Draw two sets such that their union is convex. Draw two convex sets, s.t., their union is not convex. Draw the convex hull of the union.

2 Important affine and convex sets

2.1 Lines

The simplest example of a non-trivial affine set is probably a line in the space Rn . Given two points, x1 and
x2 , a line through x1 and x2 is the set of all points y of the form,

y = αx1 + (1 − α)x2 , α ∈ R.
Specifically, this gives us the unique line passing through x1 and x2 . If we constrain α to be in [0, 1], we get the line segment between x1 and x2 .

Exercise 10. Prove that the line segment is a convex set.

Equivalently, a point is on the line segment between x1 and x2 iff it is a convex combination of the given two points. Note that the condition for being a convex set is weaker than the condition for being an affine set; hence an affine set is always convex. In particular, since a line is an affine set, it is a convex set. A line segment is a convex set but not an affine set.

Fig. 2. A line, line segment, points on the line and α’s corresponding to them

There is another way to think of a line in R2 . You will prove in the assignment that every point on a line
has same inner product with the vector perpendicular to x1 − x2 . In other words, the equation of line can
be written as,
aT x − b = 0.
In the previous equation, a is the vector perpendicular to x1 − x2 .

Exercise 11. What is b?

2.2 Hyperplane and Halfspaces

Extending the idea of a line to higher dimensions gives rise to the concept of hyperplanes. A hyperplane H is described by a vector a ∈ Rn and a number b ∈ R,

H = {x : aT x − b = 0}

Exercise 12. Prove that a hyperplane is affine and so convex too.

Suppose, we know a point x0 on the hyperplane H. The previous equation can be changed to

aT x = b ⇒ aT x = aT x0 , x0 ∈ H.

We can view the hyperplane as the set of all points which have same inner product b with the vector a.
This gives a very nice geometrical picture of the hyperplane, i.e., all points in H can be expressed as the
sum of x0 and a vector orthogonal to a (say a⊥ ). We get another definition of hyperplane as
H = {x0 + a⊥ : aT a⊥ = 0}.
Notice, this definition assumes that we know a point on the hyperplane. Though, this point is not special
in any way; for any point on the hyperplane we can define the hyperplane in a similar way.
Exercise 13. What do you know about the hyperplane if b = 0?
Vector a is called the normal vector of the hyperplane and b is called the offset. Another way to think of this hyperplane is: take the set of all vectors orthogonal to a (a hyperplane passing through the origin) and offset them by distance b/∥a∥ (look at Fig. 3). In particular, when b = 0, the hyperplane contains the origin.
A hyperplane divides the space into two parts, aT x ≥ b and aT x ≤ b. Geometrically, they are the two
sides of the hyperplane called halfspaces. Is a halfspace affine/convex?

Fig. 3. Halfspace with normal vector a and offset b
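As a small illustration (with an assumed example a = (1, 1), b = 2, not taken from the notes), the sign of aT x − b tells us which side of the hyperplane a point lies on, and b/∥a∥ is the distance of the hyperplane from the origin:

```python
# Classify points relative to the hyperplane a^T x = b in R^2.
import math

a, b = (1.0, 1.0), 2.0   # assumed normal vector and offset

def side(x):
    """Return -1, 0, or +1 according to the sign of a^T x - b."""
    s = a[0] * x[0] + a[1] * x[1] - b
    return 0 if abs(s) < 1e-12 else (1 if s > 0 else -1)

assert side((1.0, 1.0)) == 0     # on the hyperplane a^T x = b
assert side((2.0, 2.0)) == 1     # in the halfspace a^T x >= b
assert side((0.0, 0.0)) == -1    # in the halfspace a^T x <= b

# The hyperplane sits at distance b / ||a|| from the origin.
assert math.isclose(b / math.hypot(a[0], a[1]), math.sqrt(2))
```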

2.3 Polytopes and Polygons


Looking at the format of a linear program, we will mostly be interested in linear equations and linear
inequalities (our constraints). Remember that the set of points which satisfy the set of all constraints is
called the feasible region. When these constraints are linear inequalities and linear equalities, the feasible
region is called a polytope.
Mathematically, S is a polytope iff
S = {x : aiT x − bi ≤ 0, i = 1, 2, · · · , m and ciT x − di = 0, i = 1, 2, · · · , p}
A bounded polytope in two dimensions is called a polygon. Informally, bounded means that there is a large
enough ball of finite radius which contains the set.

Exercise 14. Prove that the polytope is a convex set.
Exercise 15. What are the polytopes in one dimension?
Geometrically, polytopes are intersections of hyperplanes and halfspaces. Since intersection of convex sets
is also convex, a polytope is convex.
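The convexity of a polytope is also easy to check numerically. The sketch below (with a made-up triangle written as Ax ≤ b, in the spirit of Exercise 14) verifies that convex combinations of feasible points remain feasible:

```python
# Convex combinations of feasible points of {x : Ax <= b} stay feasible.
import random

# The triangle {x1 >= 0, x2 >= 0, x1 + x2 <= 1}, written as Ax <= b.
A = [(-1.0, 0.0), (0.0, -1.0), (1.0, 1.0)]
b = [0.0, 0.0, 1.0]

def feasible(x, tol=1e-9):
    """Check every inequality a_i^T x <= b_i."""
    return all(ai[0] * x[0] + ai[1] * x[1] <= bi + tol for ai, bi in zip(A, b))

p, q = (0.2, 0.3), (0.5, 0.5)
assert feasible(p) and feasible(q)
for _ in range(100):
    t = random.random()
    mid = (t * p[0] + (1 - t) * q[0], t * p[1] + (1 - t) * q[1])
    assert feasible(mid)   # the whole segment between p and q is feasible
```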
Remember that a polytope could be bounded or unbounded. Intuitively, polytopes have vertices, edges,
planes and hyperplanes as their bounding surface.
Exercise 16. Find a set of equalities and inequalities, s.t., the corresponding polytope is unbounded.
A bounded polytope can be thought of as the convex hull of its vertices; equivalently, a bounded polytope can be defined as the convex hull of a finite set of points.
The statement that these two definitions (the previous one and the original definition in terms of equalities and inequalities) are equivalent is known as the Minkowski-Weyl theorem. The proof of this theorem is beyond the scope of this course.
We haven't formally defined vertices/extremal points till now. Intuitively, a vertex is a corner of the polytope. Formally, a vertex of a polytope is a point of the polytope which cannot be expressed as a convex combination of two different points in the polytope. This implies that if a line segment inside the polytope contains the vertex, then the vertex is an end-point of that line segment.
We saw that the convex hull of a triangle and a point inside the triangle is the triangle itself. Suppose we are given a convex set and we want to find the minimal set whose convex combinations generate the entire set.
By the definition of vertices, all vertices should be in this minimal set. It turns out that for a bounded polytope, this set (the set of all vertices) is enough.
Exercise 17. Suppose S = Conv (x1 , · · · , xk ). Prove that xi is not extremal/vertex if and only if it can be
written as the convex combination of other xj ’s.

2.4 Cones
We have seen sets made by vertices, lines, line segments. Now we look at sets generated by rays.
Exercise 18. What is a ray?
A set is called a cone iff, for every element of the set, the ray from the origin through that element is contained in the set. Hence, a set C is a cone iff for every x ∈ C and α ≥ 0, we have αx ∈ C.
Note 1. A cone need not contain all possible conic combinations of its points (show an example). To remind you of the notion: a conic combination of vectors x1 , · · · , xk ∈ Rn is any vector of the form α1 x1 + · · · + αk xk for α1 , · · · , αk ≥ 0.
The previous paragraph implies that a cone is not necessarily convex (give an example of a cone which is not convex). A set which is a cone and is also convex is called a convex cone. In this course we will mostly be concerned with convex cones.
Mathematically, a convex cone C is a cone where,

∀x1 , x2 ∈ C and α1 , α2 ≥ 0, α1 x1 + α2 x2 ∈ C.

So, a cone is convex iff it contains all the conic combinations of its elements.
Clearly, for x1 , · · · , xk ∈ Rn , the set Cone(x1 , x2 , · · · , xk ) := {α1 x1 + · · · + αk xk : ∀i αi ≥ 0} is a convex cone. It is called a finitely generated cone because it is generated by a finite number of vectors.
Note 2. Some authors define cones as sets closed under positive scalar multiplication. We have defined cones
as sets closed under non-negative scalar multiplication.
A finitely generated convex cone is also a polytope, i.e., it can be represented in terms of linear equalities and inequalities. The next theorem gives a way to construct this set of linear equalities and inequalities.

Theorem 4 (Weyl). A non-empty finitely generated convex cone is a polytope.

Proof. Suppose the generators of the cone C are x1 , · · · , xk . Define a matrix X with xi as the i-th column; X has k columns. Then the cone C can also be described as,

C = {x : x = Xα, α ∈ Rk+ }.

Converting equalities into inequalities

C = {x : x − Xα ≤ 0, Xα − x ≤ 0, − α ≤ 0}

Notice that all the inequalities are of a similar kind: the constant term is zero. Now we can eliminate α from these inequalities using something known as Fourier-Motzkin elimination (next lemma). You can think of Fourier-Motzkin as Gaussian elimination for inequalities. In our case, as a bonus, the resulting inequalities have the same structure, i.e., no constant term.

Lemma 1. Let Ax ≤ b be a system of m inequalities in n variables. This system can be converted into an equivalent system A′ x ≤ b′ with n − 1 variables and at most O(m2 ) inequalities. Here equivalent means:

– Any solution x of the old system will be a solution of the new system, ignoring the removed variable.
– Given any solution x of the new system (A′ x ≤ b′ ), we can find a solution (x0 , x) of the old system.

Proof. Suppose the variable to be removed is x0 . We divide the inequalities into three sets depending upon whether the coefficient of x0 is positive (P ), negative (N ) or zero (Z). Divide each inequality in P and N by the absolute value of the coefficient of x0 , so that an inequality in P reads x0 + Pi ≤ 0 and an inequality in N reads −x0 + Nj ≤ 0, where Pi , Nj are linear expressions in the remaining variables.
The inequalities in the new system are the inequalities from Z together with every inequality of the form

Pi + Nj ≤ 0, Pi ∈ P, Nj ∈ N .

Exercise 19. Prove that the construction above satisfies the first condition. Also, convince yourself that if b
was zero initially, the final system will not have non-zero constant terms.

For the more complicated case, we need to show that any solution of the new inequalities can be extended (with a value for x0 ) to a solution of the previous inequalities.
Fix a solution of the new inequalities, say y. Define Pi (y) := pi and Nj (y) := nj for all i ∈ P and j ∈ N . We need to find x0 , s.t., x0 + pi ≤ 0 and −x0 + nj ≤ 0 for all i ∈ P and j ∈ N . That is,

– x0 ≤ mini∈P (−pi ).
– x0 ≥ maxj∈N nj .

Such an x0 always exists because feasibility of y implies

mini∈P (−pi ) ≥ maxj∈N nj ,

showing the second condition of the lemma.
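The elimination step of the lemma can be written down directly. Here is a minimal sketch of one Fourier-Motzkin step for a general system Ax ≤ b (our own code and naming; in the notes' application all constants are zero): split the rows by the sign of the x0-coefficient, normalize, keep the Z rows, and combine every positive row with every negative row.

```python
# One Fourier-Motzkin step: eliminate the variable with index 0 from Ax <= b.
def fourier_motzkin(A, b):
    P, N, Z = [], [], []   # rows with positive / negative / zero x0-coefficient
    for row, rhs in zip(A, b):
        coef = row[0]
        if coef > 0:       # normalized: x0 + (rest/coef) <= rhs/coef
            P.append(([v / coef for v in row[1:]], rhs / coef))
        elif coef < 0:     # normalized: -x0 + (rest/|coef|) <= rhs/|coef|
            N.append(([v / -coef for v in row[1:]], rhs / -coef))
        else:
            Z.append((list(row[1:]), rhs))
    new = list(Z)          # keep Z, then combine every P-row with every N-row
    for pr, pb in P:
        for nr, nb in N:
            new.append(([u + v for u, v in zip(pr, nr)], pb + nb))
    return [r for r, _ in new], [c for _, c in new]

# Eliminate x0 from {x0 + x1 <= 4, -x0 <= 0, x1 <= 3}:
A2, b2 = fourier_motzkin([[1, 1], [-1, 0], [0, 1]], [4, 0, 3])
assert b2 == [3, 4]   # the remaining system in x1 alone: x1 <= 3 and x1 <= 4
```

Repeating this step once per component of α is exactly how the proof of Theorem 4 removes α from the description of the cone.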

Using the previous lemma multiple times, α can be eliminated from the system of inequalities defining the cone. As a result, we get
C = {x : Ax ≤ 0},
proving that C is a polytope.

Note 3. A general polytope is Ax ≤ b and a finitely generated cone is Ax ≤ 0. Convince yourself that Ax ≤ 0
defines a convex cone.

2.5 Characterization of polytopes

We saw that cones and bounded polytopes have two representations: one in terms of linear inequalities and one in terms of conic/convex combinations. There is a similar theorem for general polytopes.

Theorem 5. Let there be a polytope defined by a set of inequalities, P = {x ∈ Rn : Ax ≤ b}. Then, there
exist vectors x1 , · · · , xk ∈ Rn and y1 , · · · , yl ∈ Rn , s.t.,

P = Cone(x1 , · · · , xk ) + Conv (y1 , · · · , yl )

Note 4. The sum here is the Minkowski sum: S1 + S2 = {x1 + x2 : x1 ∈ S1 , x2 ∈ S2 }.
This is known as the affine Minkowski-Weyl theorem. We will not cover the proof of this theorem in this course.
Notice that there are equivalent characterizations of cone/polytope/bounded polytope in terms of con-
vex/conic hulls or linear inequalities. It is instructive to remember the special forms of linear inequalities
and hulls required to make these shapes.

                         Linear inequalities    Convexity
Polytope                 Ax ≤ b                 Cone(y1 , · · · , yn ) + Conv (z1 , · · · , zm )
Bounded polytope         Bounded and Ax ≤ b     Conv (y1 , · · · , yn )
Finitely generated cone  Ax ≤ 0                 Cone(y1 , · · · , yn )

3 Convex functions

A function is called convex if the line segment connecting any two points on the graph lies above the graph.
Formally, a function f : Rn → R is called convex iff

f (θx + (1 − θ)y) ≤ θf (x) + (1 − θ)f (y), ∀x, y ∈ Rn , 0 < θ < 1.

Here the domain is assumed to be all of Rn . In general, a function whose domain is a convex subset of Rn and which satisfies the above property is also called convex. The canonical example of a convex function is f (x) = x2 . You should check that this function is convex.
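As a quick numerical sanity check (our own sketch, not a proof), we can sample the convexity inequality for f (x) = x2 at random points:

```python
# Sample the convexity inequality f(tx + (1-t)y) <= t f(x) + (1-t) f(y)
# for f(x) = x^2 at random points and weights.
import random

def f(x):
    return x * x

for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    t = random.random()
    # allow a tiny tolerance for floating-point rounding
    assert f(t * x + (1 - t) * y) <= t * f(x) + (1 - t) * f(y) + 1e-9
```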
Note that, since the domain is convex, if we restrict the function to any line passing through the domain, the restricted function is convex. Conversely, if the function is convex on every line passing through the domain, then it is convex on the whole domain.
A function f is called concave if it satisfies the above mentioned inequality in opposite direction,

f (θx + (1 − θ)y) ≥ θf (x) + (1 − θ)f (y), ∀x, y ∈ Rn , 0 < θ < 1.

It is clear that if f is convex then −f is concave.

Exercise 20. Characterize the functions which are both convex and concave.

Relation to convex sets A convex set is the set which contains all possible convex combinations of its
points. What is the relation between convex sets and convex functions?
The epigraph of the function is the set of all points which lie above the graph of the function. Formally,
given a function f : Rn → R, the epigraph of f is the set,

{(y, x) : y ≥ f (x), x ∈ Rn , y ∈ R}

If the function is convex, then its epigraph is a convex set, and vice versa. This gives a geometric interpretation of convexity. The set of all points lying below the graph of a function, the opposite side of the epigraph, is known as the hypograph of that function.

Fig. 4. Epigraph for a convex and a non-convex function

3.1 Optimization with convex functions


Remember that our feasible set is convex (an intersection of convex sets). We are optimizing a linear function, which is both convex and concave.
It is easier to optimize a convex function than a non-convex one, due to the structure present in these functions. There are essentially two properties which are very helpful for optimization.
– A local minimum is a global minimum.
– Maximum is found on the boundaries.
A similar statement can be made for concave function optimization by swapping maximum and minimum.

Global optimality and local optimality For a general function there are two kinds of optimal points. There is a global optimum, which is the usual notion of being optimal compared to every other point in the domain. Hence, a point x∗ is a global minimum of a function f iff f (x) ≥ f (x∗ ) for any x in the domain of f . In general, it is very hard to find such points.
On the other hand, there is another concept of a local optimum, where the point has minimum value
among all values in the neighborhood. Hence, a point x∗ is a local optimum point for a function f iff for
some ϵ,
f (x∗ ) ≤ f (x), ∀x ∈ Bϵ (x∗ ).
Here, Bϵ (x∗ ) is the ball of radius ϵ around x∗ . Compared to a global optimum, it is much easier to check for a local optimum. For instance, for a differentiable function we can check that the directional derivative in every direction is non-negative.
The special property of convex functions is that a local minimum point is a global minimum too (the feasible region needs to be convex). This makes it much easier to optimize a convex function.
Proof idea (a local minimum is a global minimum). Suppose x∗ is a local minimum and, for the sake of contradiction, f (y) < f (x∗ ) for a point y in the domain of f . Since the feasible region is convex, the point tx∗ + (1 − t)y belongs to the domain for any t ∈ [0, 1].

Since f is convex, for any t ∈ [0, 1],

f (tx∗ + (1 − t)y) ≤ tf (x∗ ) + (1 − t)f (y) < f (x∗ ).

where the last inequality follows from f (y) < f (x∗ ).


For t close enough to 1, this gives a point in the neighbourhood of x∗ where the function value is less than f (x∗ ), contradicting the local minimality of x∗ .

Optimality at the boundary For a convex function it can be shown that the maximum always lies at
the extremal points of the feasible region. Similarly for a concave function the minimum is always attained
at the extremal points of the feasible region.

Exercise 21. Prove the above statement.

Suppose that the feasible region is bounded and convex. Given that a linear function is both convex and concave,
– the optimum of a linear program lies on an extremal point,
– a local optimum is a global optimum.
From the previous discussion, vertices are the extremal points of the feasible region of an LP. Hence, an optimum of an LP lies on a vertex.

Note 5. This does not mean all optimal solutions lie on vertices; just that there always exists at least one optimal solution on a vertex. In general, an entire edge or a face could give us the optimal solution.

4 Simplex method: informal idea


Let us look at a few of the properties discussed in the previous section about linear programs. For the sake of simplicity, we start with the case when the feasible region is bounded. We need two facts: our feasible region is convex, and the objective function is convex as well as concave (linear).
– A local optimum is a global optimum.
– The feasible region of the LP is a polytope whose extremal points are vertices.
– The optimum of the LP will lie on a vertex. This is because if p is a convex combination of q and r, then f (p) ≥ min(f (q), f (r)) (in case we want to minimize). A similar statement can be written for maximization.

Exercise 22. What is the equivalent statement for maximization? What property of the objective function did we use?

These properties suggest a pretty simple algorithm. Say, we want to maximize an objective function.
Start with a vertex and check if it is locally optimal (looking at the directions corresponding to the edges leaving the vertex is enough). How do we find a starting vertex? We will answer this later.
If the vertex is not optimal, there is a direction in which the objective increases; follow that direction. Moving in the direction of increasing objective, find the next vertex.

Exercise 23. What if there is no vertex in that direction?

Keep repeating this process until you find a local optimum. The algorithm terminates since the number of vertices is finite and the objective keeps increasing (we can't cycle back to a vertex).
To clarify the algorithm more, we need to answer a few questions.
– How to find a vertex? What is the mathematical description of a vertex?
– How to check every interesting direction (corresponding to all the edges)?
– How can we find the next vertex?

Fig. 5. Outline of the simplex method, moving from an initial point to the optimal vertex

The simplex algorithm answers all these questions and hence solves the LP for us. In general, the number of vertices can be exponential in the size of the input (the number of variables); we will see this later. Since, in the worst case, we might have to go through all the vertices, the simplex algorithm might not be efficient.
After more than 60 years of research, we still don't know whether the simplex algorithm runs in polynomial time. Almost all known implementations have counterexamples on which they run for exponential time.
The simplex algorithm was developed by George Dantzig in 1947. In spite of the caveat that it has not been proved efficient theoretically, it is considered one of the most important algorithmic achievements of the 20th century. Many industries still prefer this method over other known polynomial-time solutions.

4.1 Vertex as BFS

How do we know if a particular feasible point is a vertex? Consider the feasible region for the standard form,

R = {x : Ax = b, x ≥ 0}

We assume that there are n variables and m equality constraints, i.e., A is an m × n matrix. Since we can keep only the linearly independent rows (why?), we can assume that the number of constraints m is at most n.
To simplify the notation, if B ⊆ [n] and x ∈ Rn , then xB denotes the restriction of x to the co-ordinates indexed by B. We similarly define AB , where we only keep the columns of A indexed by elements of B.
A point p ∈ Rn with p ≥ 0 is a basic feasible solution (BFS) if its co-ordinates can be divided into two parts (B and N ), s.t.,

1. pi = 0 for all i ∈ N .
2. AB pB + AN pN = b and AB is invertible. Remember, AB , pB and AN , pN are the restrictions of A, p to the columns in B and N respectively.

Note 6. We are allowed to permute the columns of A to get such a representation. Since AB is invertible,
pB is the unique solution of AB x = b.

Exercise 24. What is the size of B?

A basic feasible solution (BFS) provides the characterization of the vertices.

Theorem 6. A point p ∈ Rn is a vertex iff it is a BFS.

Proof. (⇐) Suppose p is a BFS but not a vertex. Then there exist q ≠ r, s.t., Aq = Ar = b, q ≥ 0, r ≥ 0 and θq + (1 − θ)r = p for some θ ∈ (0, 1). Since q, r are non-negative and θ, 1 − θ are positive, the co-ordinates corresponding to N for both q and r must be zero. This gives distinct solutions qB ≠ rB to AB x = b, contradicting the uniqueness of pB .

(⇒) Suppose p is a vertex. Let S denote the set of columns of A for which the corresponding entry in p is non-zero. We need to show that AS is invertible. Equivalently, we will show that the columns in S form a linearly independent set. If not, there exist αi 's, not all zero, s.t.,
Σi∈S αi ci = 0.

Here, ci is the ith column of A.


Call the vector of these coefficients α (with zeros outside S). Then p ± tα is a feasible solution of Ax = b when the absolute value of t is sufficiently small. Notice that p ± tα always satisfies Ax = b; however, for large t, p ± tα might not be non-negative.

Exercise 25. Show that there exists ϵ > 0, s.t., p + tα is feasible whenever |t| ≤ ϵ.

Take t to be in that range. The point p can be written as p = 1/2 (p + tα) + 1/2 (p − tα), hence it is not a vertex.

Exercise 26. Show that the number of independent columns in A is m.

From the previous exercise, |S| ≤ m. If |S| = m, then AS is invertible (full rank) and we are done. If |S| < m, then add extra columns of A to S to make AS full rank.

Note 7. Since |B| = m, Thm. 6 implies that any BFS will have at most m non-zero co-ordinates.
Given a handle on vertices, we will show how to progress in the simplex method using an example. The
strategy will be formalized later.
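The BFS condition can also be tested directly: pick a basis B, solve AB xB = b, and check invertibility and non-negativity. The helper below is our own sketch (not part of the notes), using exact arithmetic via fractions; the constraint matrix is the one that will appear in the worked example of the next subsection.

```python
# Compute the BFS corresponding to a candidate basis B, or None if
# A_B is singular or the basic solution has a negative entry.
from fractions import Fraction

def bfs_from_basis(A, b, B):
    """Return the BFS as a full vector in R^n, or None."""
    m = len(A)
    # augmented matrix [A_B | b] in exact arithmetic
    M = [[Fraction(A[i][j]) for j in B] + [Fraction(b[i])] for i in range(m)]
    for col in range(m):                   # Gauss-Jordan elimination with pivoting
        piv = next((r for r in range(col, m) if M[r][col] != 0), None)
        if piv is None:
            return None                    # A_B is singular: B is not a basis
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(m):
            if r != col and M[r][col] != 0:
                M[r] = [u - M[r][col] * v for u, v in zip(M[r], M[col])]
    xB = [M[i][m] for i in range(m)]       # unique solution of A_B x = b
    if any(v < 0 for v in xB):
        return None                        # basic but not feasible
    x = [Fraction(0)] * len(A[0])          # fill in zeros for the N co-ordinates
    for j, v in zip(B, xB):
        x[j] = v
    return x

# Constraints x1 + x2 + x3 = 2 and x1 - x2 + x4 = 1 (slack form).
A = [[1, 1, 1, 0], [1, -1, 0, 1]]
b = [2, 1]
assert bfs_from_basis(A, b, [2, 3]) == [0, 0, 2, 1]                     # slack basis
assert bfs_from_basis(A, b, [0, 1]) == [Fraction(3, 2), Fraction(1, 2), 0, 0]
```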

4.2 A simple example

We will start with a linear program, s.t., it is easy to find one vertex. Take the linear program,

max 2x1 − x2
s.t. x1 + x2 ≤ 2
x1 − x2 ≤ 1
x ≥ 0.

We introduce slack variables to turn inequalities into equalities,

max 2x1 − x2
s.t. x1 + x2 + x3 = 2
x1 − x2 + x4 = 1
x ≥ 0.

Luckily, the bi 's are both positive and hence give us a BFS: x1 = 0, x2 = 0, x3 = 2, x4 = 1. How should we move to another vertex with a better objective value?
We will write the constraints by expressing basic variables in terms of non-basic ones.

Exercise 27. Why can we do that?

Using these equations, the objective function can be written just in terms of non-basic variables.

max 0 + 2x1 − x2
s.t. x3 = 2 − x1 − x2
x4 = 1 − x1 + x2
x ≥ 0.
The objective value for this solution is 0, and it is clear that x1 can be increased to get a better value. Increasing x1 is like moving along an edge.
Can we increase x1 arbitrarily? x1 is bounded by 1 because of the second constraint. So, we remove x4 from our basic variables (leaving variable) and put x1 in (entering variable). The new BFS is x1 = 1, x2 = 0, x3 = 1, x4 = 0. Let us substitute x1 = 1 − x4 + x2 to get the new form of the equations.

max 2 − 2x4 + x2
s.t. x3 = 1 − 2x2 + x4
x1 = 1 − x4 + x2
x ≥ 0.
The objective value is 2 for this solution and can be increased by increasing x2 . The variable x2 can be increased only up to 1/2 because of the first constraint. The new BFS is x1 = 3/2, x2 = 1/2, x3 = 0, x4 = 0. Again, using the value of x2 in terms of x3 , x4 (first constraint),

max 5/2 − (3/2)x4 − (1/2)x3
s.t. x2 = 1/2 − (1/2)x3 + (1/2)x4
x1 = 3/2 − (1/2)x4 − (1/2)x3
x ≥ 0.
It is clear from the form of the objective function that this is the optimal solution (all coefficients of the objective function are negative). This shows that x1 = 3/2, x2 = 1/2, x3 = 0, x4 = 0 is optimal (why?).
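The pivoting steps above can be replayed mechanically. The following compact dictionary-form simplex is our own sketch (not the notes' formulation, which is formalized in the next section); run on this example, it reproduces the optimal value 5/2 with x1 = 3/2, x2 = 1/2.

```python
# Dictionary-form simplex with exact arithmetic. Row i encodes
# x_basic[i] = rhs[i] - sum_j rows[i][j] * x_nonbasic[j]; the objective
# is const + sum_j c[j] * x_nonbasic[j], maximized.
from fractions import Fraction as F

def simplex(c, rows, rhs, basic, nonbasic):
    const = F(0)
    while True:
        e = next((j for j, cj in enumerate(c) if cj > 0), None)  # entering variable
        if e is None:                      # all coefficients <= 0: optimal
            sol = {v: F(0) for v in nonbasic}
            sol.update(zip(basic, rhs))
            return const, sol
        cand = [(rhs[i] / rows[i][e], i) for i in range(len(rows)) if rows[i][e] > 0]
        assert cand, "objective unbounded along this edge"
        _, r = min(cand)                   # leaving row, by the ratio test
        a = rows[r][e]
        new_row = [v / a for v in rows[r]]
        new_row[e] = F(1) / a              # coefficient of the leaving variable
        new_rhs = rhs[r] / a
        basic[r], nonbasic[e] = nonbasic[e], basic[r]
        rows[r], rhs[r] = new_row, new_rhs
        for i in range(len(rows)):         # substitute entering var into other rows
            if i != r and rows[i][e] != 0:
                k = rows[i][e]
                rhs[i] -= k * new_rhs
                rows[i] = [-k * new_row[j] if j == e else rows[i][j] - k * new_row[j]
                           for j in range(len(new_row))]
        k = c[e]                           # ... and into the objective
        const += k * new_rhs
        c = [-k * new_row[j] if j == e else c[j] - k * new_row[j]
             for j in range(len(new_row))]

# max 2x1 - x2  s.t.  x3 = 2 - x1 - x2,  x4 = 1 - x1 + x2,  x >= 0.
val, sol = simplex([F(2), F(-1)], [[F(1), F(1)], [F(1), F(-1)]],
                   [F(2), F(1)], ["x3", "x4"], ["x1", "x2"])
assert val == F(5, 2) and sol["x1"] == F(3, 2) and sol["x2"] == F(1, 2)
```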
Note 8. In case the feasible region is not bounded, there is a possibility that travelling in the direction of an extremal ray shows the objective value to be unbounded (no optimal solution exists).

5 Simplex method: formal details


Before explaining formally what the simplex method is, please note that there are many simplex algorithms. All of them follow the same ideas we discussed in the previous section. The form given below is one particular implementation.

5.1 Simplex algorithm assuming a starting vertex


Let us look at the simpler case: a starting vertex is given to us to initiate the simplex algorithm. We just need to see how to move from one vertex to another (one BFS to another) and, finally, how to check whether a vertex is optimal.
Remember that the LP we are trying to solve is,

max cT x
s.t. Ax = b
x ≥ 0.

Suppose at a particular moment the BFS given to us is p = (pB , pN ), pN = 0. Let us look at the feasibility equations,

AB xB + AN xN = b
Since AB is invertible, we multiply this equation by AB−1 :

xB + AB−1 AN xN = AB−1 b

In other words, all the basic variables (variables in set B) can be expressed in terms of non-basic variables
(variables in set [n] − B). Using this, the cost function can also be expressed as the function of only non-basic
variables.
Exercise 28. Why can this transformation be useful?
We assume that the BFS, feasibility equations and cost function are given in this form.
– Every basic variable is expressed in terms of non-basic variables.
– The cost vector is just a function of non-basic variables.
From this form, it is clear that the objective value of p is just the constant term of the cost function. Also, the current BFS is optimal if the coefficients of all the non-basic variables in the cost function are non-positive. Hence, we just need a way to move from one BFS to another (and update the form of the equalities) such that the objective value does not decrease.
Suppose at any stage the set of basic variables is B and set of non basic variables is N . Say the equations
and the objective function looks like,

max C′ + Σ_{j∈N} c′_j x_j
s.t. ∀ i ∈ B: x_i = b′_i − Σ_{j∈N} a′_{ij} x_j
x ≥ 0.
The primes indicate that the values of these coefficients have changed since the starting iteration. How do we move to the next iteration?
Exercise 29. How to check if this is the optimal BFS?
If all the c′_j are non-positive, then we have arrived at the optimal solution. Otherwise, choose the variable with the highest c′_j (an arbitrary choice; any variable with positive c′_j will work). That variable will move from non-basic to basic (it is called the entering variable).
While increasing it gradually, some basic variable will become negative. The first one to do so, say x_i, will be the leaving variable. That means it will move from basic to non-basic. The new values of the basic solution can be calculated easily (how? Exercise).
In case the coefficients of the entering variable, as written in the equations above, are non-negative in every constraint (equivalently, a′_{ij} ≤ 0 for all i ∈ B), no basic variable ever decreases; the optimum is unbounded and the algorithm stops.
Exercise 30. In terms of a′ , c and b′ , come up with the definition of entering and leaving variables.
The only remaining task is to write the equations in the required form (in terms of the new non-basic variables). Take the equation where x_i is expressed as a function of x_j and the other non-basic variables. Change it into an equation where x_j is expressed in terms of x_i and the other non-basic variables. Use this equation to substitute for x_j everywhere. Hence, we get the equations and the objective function in the required form for the new BFS.
Exercise 31. Why is the new solution a BFS? In other words, why is the new A_B invertible?
There are two things missing in our description of the algorithm.
– How to find the starting vertex?
– What if one of the b′i is zero?
The next two sections will tackle these two questions.

5.2 Initial BFS
Suppose we are trying to solve an LP of the form,

max cT x
s.t. Ax ≤ b
x ≥ 0.
Note 9. This is not in standard form, but an LP in standard form can easily be converted into one in this
form.
Let us see how to find the initial BFS. Remember, we introduced slack variables to make inequalities into
equalities.

max cT x
s.t. Ax = b
x ≥ 0.
If all the right-hand sides b_i are already non-negative, then we have a simple BFS using only the slack variables. The difficulty lies in the case when some b_i < 0. Let S be the set of equations with b_i < 0; introduce an auxiliary variable x_s for every s ∈ S.
Look at the following LP,

max − Σ_{s∈S} x_s
s.t. Ax − x_S = b
x ≥ 0.

Here x_S is the vector with x_s at the s-th position and 0 everywhere else. The original LP is feasible iff this LP has optimal value 0.
Exercise 32. Prove it.
The good thing is, this new LP has a very simple BFS to start with. The BFS consists of all the auxiliary
variables from set S and all the slack variables from set [m] − S.
We can use the method from the previous section to find the optimum of this LP. If it is not zero, then the original LP does not have a feasible solution. If it is zero, then the optimal BFS of the new LP gives a BFS of the original LP (why?).
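The construction of this auxiliary LP can be sketched in a few lines of Python. This is an illustrative helper, not code from the lecture; `phase_one` is a hypothetical name, and the function only builds the data (extended matrix, auxiliary objective, and the set S) without attaching a solver. The data used matches the worked example below.

```python
def phase_one(A, b):
    """Build the auxiliary LP:  max -sum(x_s)  s.t.  Ax - x_S = b, x >= 0.

    For every row i with b_i < 0, append an auxiliary column with a -1
    in row i, and give that auxiliary variable objective coefficient -1.
    Returns the extended matrix, the auxiliary objective, and the set S.
    """
    m, n = len(A), len(A[0])
    S = [i for i in range(m) if b[i] < 0]
    A_ext = [row + [-1 if i == s else 0 for s in S]
             for i, row in enumerate(A)]
    c_aux = [0] * n + [-1] * len(S)
    return A_ext, c_aux, S

# Constraints with slack variables x4, x5 already added.
A = [[1, -3, 2, 1, 0],
     [1, -2, 0, 0, 1]]
b = [3, -2]
A_ext, c_aux, S = phase_one(A, b)
print(S)      # only the second equation has a negative right-hand side
print(A_ext)  # the new last column belongs to the auxiliary variable
```

Starting from this data, the BFS consists of the auxiliary variables (for rows in S) and the slack variables (for the remaining rows), exactly as described above.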

Example Suppose the LP given is,

max − 2x1 − 3x2


s.t. x1 − 3x2 + 2x3 ≤ 3
x1 − 2x2 ≤ −2
x ≥ 0.
We introduce the slack variables,

max − 2x1 − 3x2


s.t. x1 − 3x2 + 2x3 + x4 = 3
x1 − 2x2 + x5 = −2
x ≥ 0. (1)

Note 10. Here the slack variables will not give us a BFS as in last lecture's example. The problem is that one of the b_i's is negative.
After introducing the auxiliary variables,

max − x6
s.t. x1 − 3x2 + 2x3 + x4 = 3
x1 − 2x2 + x5 − x6 = −2
x ≥ 0.
The BFS here is x6 = 2, x4 = 3 and rest are zero. Changing the equations with respect to the set of
non-basic variables,

max − 2 − x1 + 2x2 − x5
s.t. x6 = 2 + x1 − 2x2 + x5
x4 = 3 − x1 + 3x2 − 2x3
x ≥ 0.
We can increase the variable x2 up to 1. The new BFS is x2 = 1, x4 = 6. The new set of equations,

max 0 − x6
s.t. x2 = 1 + (1/2)x1 − (1/2)x6 + (1/2)x5
x4 = 6 + (1/2)x1 − (3/2)x6 + (3/2)x5 − 2x3
x ≥ 0.
This is optimal, and we have a BFS for the starting LP (1).
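As a sanity check, the pivot performed above can be reproduced mechanically. The following Python sketch (illustrative, not the lecture's code) implements one dictionary pivot with exact fractions; variables are referred to by index, and coefficients are stored exactly as they appear written in the equations above.

```python
from fractions import Fraction as F

def pivot(obj, rows):
    """One pivot on the dictionary form used in this example.

    obj  = (constant, {var: coeff})               # the objective function
    rows = {basic_var: (constant, {var: coeff})}  # x_i = b_i + sum(coeff * x_j)
    Coefficients are stored as written in the equations, so the text's
    a'_ij appears here with its sign flipped.
    Returns the new (obj, rows) after one pivot, or None if optimal.
    """
    const, costs = obj
    pos = [j for j, c in costs.items() if c > 0]
    if not pos:
        return None                               # all coefficients non-positive
    e = max(pos, key=lambda j: costs[j])          # entering variable
    # Leaving variable: smallest ratio b_i / (-coeff) over rows in which
    # increasing x_e decreases x_i (written coefficient negative).
    ratios = [(F(b) / -d[e], i) for i, (b, d) in rows.items() if d.get(e, 0) < 0]
    if not ratios:
        raise ValueError("LP is unbounded")
    _, leave = min(ratios)
    # Solve the leaving row for x_e:  x_e = expr_b + sum(expr[k] * x_k).
    b_l, d_l = rows.pop(leave)
    d_le = d_l.pop(e)
    expr_b = F(-b_l) / d_le
    expr = {k: F(-c) / d_le for k, c in d_l.items()}
    expr[leave] = F(1) / d_le

    def subst(b, d):
        # Substitute the expression for x_e into the pair (b, d).
        c_e = d.pop(e, 0)
        new_d = dict(d)
        for k, v in expr.items():
            new_d[k] = new_d.get(k, 0) + c_e * v
        return b + c_e * expr_b, new_d

    new_rows = {i: subst(b, dict(d)) for i, (b, d) in rows.items()}
    new_rows[e] = (expr_b, expr)
    return subst(const, dict(costs)), new_rows

# The dictionary right after introducing the auxiliary variable x6:
obj = (-2, {1: -1, 2: 2, 5: -1})      # max -2 - x1 + 2*x2 - x5
rows = {6: (2, {1: 1, 2: -2, 5: 1}),  # x6 = 2 + x1 - 2*x2 + x5
        4: (3, {1: -1, 2: 3, 3: -2})} # x4 = 3 - x1 + 3*x2 - 2*x3
obj, rows = pivot(obj, rows)
print(obj, rows)  # x2 enters, x6 leaves; objective constant becomes 0
```

A second call to `pivot` returns None, since no cost coefficient remains positive: the BFS x2 = 1, x4 = 6 is optimal, matching the computation above.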

6 Degeneracy
Degeneracy arises when there are multiple bases giving BFS's with the same objective value. The problem is that we can keep circling between these bases without ever reaching the optimal vertex. Let us see how this manifests in our description of the simplex algorithm. Once it is clear what degeneracy is, the solutions are not very difficult.
As an example of degeneracy, suppose we have this set of equations at some stage

max 2 − x1 + 2x2 − x5
s.t. x6 = 2 + x1 − 2x2 + x5
x4 = 0 − x1 − 2x2 − 2x3
x ≥ 0.
It is clear that we want to increase x2, but increasing it even a little bit will make x4 negative. Hence, we can swap x2 with x4 in the basic set without changing the solution. What happens if such swaps create a cycle? This is the problem caused by degeneracy.
More formally, suppose at any instance the LP is in form,

max C′ + Σ_{j∈N} c′_j x_j
s.t. ∀ i ∈ B: x_i = b′_i − Σ_{j∈N} a′_{ij} x_j
x ≥ 0.

Degeneracy arises if one of the values b′_i is zero. Not every case with b′_i zero is a problem. The only troubling one is when the b′_i corresponding to the leaving variable is zero and the coefficient of the entering variable in that equation (as written above) is negative.
Exercise 33. Show that in this case, we have at least two BFS’s with the same objective value.
Unfortunately, there are known examples where the algorithm we have described, which chooses the c′_j with maximum value, gets stuck in a cycle.
Fortunately, the trick to avoid cycling is not that difficult; it is called Bland's rule. Remember that at any stage we have a set of choices for the leaving and the entering variable (e.g., the set of all j's such that c′_j > 0, for the entering variable).
Definition 3. Bland’s rule: Impose an order on the variables, i.e., call them x1 , x2 , · · · , xn . For the choice
of leaving and entering variable, always choose the one with the least index from the set of choices.
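The entering-variable half of this rule is a one-liner. Below is a minimal Python sketch (illustrative; `entering_bland` is a hypothetical helper name): among all variables with positive cost coefficient, return the least index.

```python
def entering_bland(costs):
    """Bland's rule for the entering variable: among the variables with
    positive cost coefficient c'_j, pick the one with the least index.
    `costs` maps variable index -> current cost coefficient.
    Returns None when no coefficient is positive (the BFS is optimal)."""
    candidates = [j for j in sorted(costs) if costs[j] > 0]
    return candidates[0] if candidates else None

print(entering_bland({3: 5, 1: 2, 7: -1}))  # -> 1 (the largest-coefficient rule would pick 3)
```

The leaving variable is chosen the same way: among all rows achieving the minimum ratio, take the basic variable with the least index.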
It can be shown that using Bland's rule, the simplex algorithm never cycles and hence finishes in finite time. We will only give an idea of the proof. The full proof can be found at https://www.math.ubc.ca/~anstee/math340/340blandsrule.pdf.
For the sake of contradiction, let there be a cycle. Every basic solution in this cycle has the same objective value (the condition for degeneracy).
Say x_t is the variable with the largest index among the leaving or entering variables in the cycle (the set of leaving variables is the same as the set of entering variables, since it is a cycle). Suppose B is the BFS where x_t leaves and x_s enters; we know s < t. Also, let B′ be the BFS where x_t enters the basic solution again.
Let the equations when B is basic be,

max C + Σ_{j∉B} c_j x_j
s.t. x_i = b_i − Σ_{j∉B} a_{ij} x_j  ∀ i ∈ B
x ≥ 0.

Let the equations when B′ is basic be,

max C′ + Σ_{j∉B′} c′_j x_j
s.t. x_i = b′_i − Σ_{j∉B′} a′_{ij} x_j  ∀ i ∈ B′
x ≥ 0.

We will not worry about the constraints for B′. Let x_s = δ, x_i = b_i − a_{is} δ for all i ∈ B, and x_j = 0 otherwise; this is a solution of the equations when B is basic. We do NOT claim that x ≥ 0, just that x satisfies the set of equations.
Since the objective function with respect to B′ is just the objective function of B plus multiples of the constraints with respect to B, the objective functions for B and B′ should agree on this particular solution.
Comparing the objective functions, we get an r ∈ B such that c′_r a_{rs} < 0 (note r ∉ B′, otherwise c′_r = 0). Since c′_t ≥ 0 and a_{ts} ≥ 0 (why?), r ≠ t. The variable t entered at B′ and not r, which implies c′_r ≤ 0, and so a_{rs} > 0. In that case, r should have left B and not t, giving a contradiction.

7 Assignment
Exercise 34. Show that the vector space associated with an affine set does not depend upon the point chosen.
Exercise 35. Prove that the line (in R2 ) between x1 and x2 is the set of all points p, s.t.,

aT p = aT x1 = aT x2 .

Here a is the unit vector perpendicular to x1 − x2 .

Exercise 36. We have given combination interpretations (affine, convex, conic) of polytopes, cones and bounded polytopes. Can you give a similar interpretation for hyperplanes and halfspaces?

Exercise 37. Convex hulls and cones are closely related. Take xi ’s as row vectors. Prove,

x ∈ Conv (x1 , x2 , · · · , xk ) ⇔ (x, 1) ∈ Cone((x1 , 1), · · · , (xk , 1))

Exercise 38. Given a convex set S, show that the set

T = {x : Ax + b = y, y ∈ S}

is also convex.

Exercise 39. The Minkowski sum of two sets S1, S2 is the set formed by taking all possible sums where the first vector is from S1 and the second vector is from S2.

S1 + S2 = {x : x = x1 + x2 , x1 ∈ S1 , x2 ∈ S2 }.

Prove that the Minkowski sum of two convex sets is convex. You can prove a stronger result too,

Conv(S1 + S2 ) = Conv(S1 ) + Conv(S2 ).

Here Conv(S) denotes the convex hull of the set S.


