Lecture 2
Lecturer: Pablo A. Parrilo Scribe: Pablo A. Parrilo
Notation: The set of real symmetric $n \times n$ matrices is denoted $\mathcal{S}^n$. A matrix $A \in \mathcal{S}^n$ is called positive semidefinite if $x^T A x \geq 0$ for all $x \in \mathbb{R}^n$, and is called positive definite if $x^T A x > 0$ for all nonzero $x \in \mathbb{R}^n$. The set of positive semidefinite matrices is denoted $\mathcal{S}^n_+$ and the set of positive definite matrices is denoted by $\mathcal{S}^n_{++}$. The cone $\mathcal{S}^n_+$ is a proper cone (i.e., closed, convex, pointed, and solid).
1 PSD matrices
There are several equivalent conditions for a matrix to be positive (semi)definite. We present below
some of the most useful ones:
Proposition 1 The following statements are equivalent:
• The matrix $A \in \mathcal{S}^n$ is positive semidefinite ($A \succeq 0$).
• For all $x \in \mathbb{R}^n$, $x^T A x \geq 0$.
• All eigenvalues of $A$ are nonnegative.
• All $2^n - 1$ principal minors of $A$ are nonnegative.
• There exists a factorization $A = B^T B$.
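These characterizations are easy to check numerically. The sketch below (assuming only NumPy; the matrix is randomly generated for illustration) constructs a PSD matrix and verifies the eigenvalue, quadratic-form, and factorization conditions; it is a sanity check, not a proof.

```python
# Numerical check of the equivalences in Proposition 1 (illustration only).
import numpy as np

rng = np.random.default_rng(0)
n = 4
G = rng.standard_normal((n, n))
A = G.T @ G                      # A = G^T G is PSD by construction

# 1. All eigenvalues are nonnegative (eigvalsh is for symmetric matrices).
eigvals = np.linalg.eigvalsh(A)
assert np.all(eigvals >= -1e-10)

# 2. x^T A x >= 0 on a sample of random vectors.
for _ in range(1000):
    x = rng.standard_normal(n)
    assert x @ A @ x >= -1e-10

# 3. A factorization A = B^T B from the eigendecomposition A = V D V^T:
#    take B = D^{1/2} V^T, so that B^T B = V D V^T = A.
w, V = np.linalg.eigh(A)
B = np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
assert np.allclose(B.T @ B, A)
```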
For the definite case, we have a similar characterization:
Proposition 2 The following statements are equivalent:
• The matrix $A \in \mathcal{S}^n$ is positive definite ($A \succ 0$).
• For all nonzero $x \in \mathbb{R}^n$, $x^T A x > 0$.
• All eigenvalues of $A$ are strictly positive.
• If $T$ is nonsingular, $A \succ 0 \Leftrightarrow T^T A T \succ 0$.
• Schur complement. The following conditions are equivalent:
$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succ 0
\quad\Longleftrightarrow\quad
\begin{cases} A \succ 0 \\ C - B^T A^{-1} B \succ 0 \end{cases}
\quad\Longleftrightarrow\quad
\begin{cases} C \succ 0 \\ A - B C^{-1} B^T \succ 0 \end{cases}$$
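The Schur complement criterion is easy to illustrate numerically. In the sketch below (again just NumPy, with made-up blocks $A$, $B$, $C$ where $A \succ 0$), the block matrix is positive definite exactly when the Schur complement $C - B^T A^{-1} B$ is.

```python
# Numerical illustration of the Schur complement criterion: for symmetric
# M = [[A, B], [B^T, C]] with A > 0, M > 0 iff C - B^T A^{-1} B > 0.
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 2
A = np.eye(n) + 0.1 * np.ones((n, n))   # positive definite block
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, m))
C = C @ C.T + np.eye(m)                 # symmetric block

M = np.block([[A, B], [B.T, C]])
schur = C - B.T @ np.linalg.solve(A, B) # Schur complement of A in M

M_pd = np.all(np.linalg.eigvalsh(M) > 0)
schur_pd = np.all(np.linalg.eigvalsh(schur) > 0)
assert M_pd == schur_pd                 # the two flags always agree
```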
2 Semidefinite programming
Semidefinite programming (SDP) is a specific kind of convex optimization problem (e.g., [VB96, Tod01,
BV04]), with very appealing numerical properties. An SDP problem corresponds to the optimization of
a linear function subject to matrix inequality constraints.
An SDP problem in standard primal form is written as:
$$\begin{aligned}
\text{minimize} \quad & C \bullet X \\
\text{subject to} \quad & A_i \bullet X = b_i, \quad i = 1, \ldots, m \\
& X \succeq 0,
\end{aligned} \qquad (1)$$
where $C, A_i \in \mathcal{S}^n$, and $X \bullet Y := \operatorname{Tr}(XY)$. The matrix $X \in \mathcal{S}^n$ is the variable over which the minimization is performed. The inequality in the last line means that the matrix $X$ must be positive semidefinite, i.e., all its eigenvalues should be greater than or equal to zero. The set of feasible solutions, i.e., the set of matrices $X$ that satisfy the constraints, is always a convex set. In the particular case in which $C = 0$, the problem reduces to deciding whether or not the constraints can be satisfied for some matrix $X$; in this case, the SDP is referred to as a feasibility problem. The convexity of SDP has made it possible to develop sophisticated and reliable analytical and numerical methods to solve such problems.
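As a concrete illustration, here is a minimal sketch of problem (1) using the CVXPY modeling package (this assumes CVXPY and an SDP-capable solver such as SCS are installed; the data $C$, $A_1$, $b_1$ are made up for the example, not taken from the lecture).

```python
# A minimal standard-form SDP (1) in CVXPY, with illustrative data.
import cvxpy as cp
import numpy as np

n = 3
C = np.diag([1.0, 2.0, 3.0])
A1 = np.eye(n)
b1 = 1.0

X = cp.Variable((n, n), symmetric=True)
constraints = [cp.trace(A1 @ X) == b1,   # A_i . X = b_i
               X >> 0]                   # X is positive semidefinite
prob = cp.Problem(cp.Minimize(cp.trace(C @ X)), constraints)
prob.solve()
print(prob.value)   # here: min of Tr(CX) over Tr(X)=1, X >= 0 is 1.0
```

With this data the optimum is the smallest eigenvalue of $C$, since the constraint set is the set of density matrices.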
A very important feature of SDP problems, from both the theoretical and applied viewpoints, is the
associated duality theory. For every SDP of the form (1) (usually called the primal problem), there is
another associated SDP, called the dual problem, that can be stated as
$$\begin{aligned}
\text{maximize} \quad & b^T y \\
\text{subject to} \quad & \textstyle\sum_{i=1}^m A_i y_i \preceq C,
\end{aligned} \qquad (2)$$
where $b = (b_1, \ldots, b_m)$, and the vector $y = (y_1, \ldots, y_m)$ contains the dual decision variables.
The key relationship between the primal and the dual problem is the fact that feasible solutions of
one can be used to bound the values of the other problem. Indeed, let X and y be any two feasible
solutions of the primal and dual problems respectively. Then we have the following inequality:
$$C \bullet X - b^T y = \Big( C - \sum_{i=1}^m A_i y_i \Big) \bullet X \geq 0, \qquad (3)$$
where the last inequality follows from the fact that both factors are positive semidefinite matrices, and the inner product of two positive semidefinite matrices is nonnegative.
From (1) and (2) we can see that the left-hand side of (3) is just the difference between the objective functions of the primal and dual problems. The inequality in (3) tells us that the value of the primal objective function evaluated at any feasible matrix $X$ is always greater than or equal to the value of the dual objective function at any feasible vector $y$. This property is known as weak duality. Thus, we can use any feasible $X$ to compute an upper bound for the optimum of $b^T y$, and we can also use any feasible $y$ to compute a lower bound for the optimum of $C \bullet X$. Furthermore, in the case of feasibility problems (i.e., $C = 0$), the dual problem can be used to certify the nonexistence of solutions of the primal. This property will be crucial in our developments.
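The weak duality inequality (3) is easy to observe numerically. The sketch below (plain NumPy, with the same made-up data as in the earlier example) builds one feasible point for each problem and checks that the gap equals $(C - \sum_i A_i y_i) \bullet X \geq 0$.

```python
# Checking weak duality (3): any primal-feasible X and dual-feasible y
# satisfy C . X >= b^T y. Illustrative data only.
import numpy as np

n = 3
C = np.diag([1.0, 2.0, 3.0])
A1 = np.eye(n)
b1 = 1.0

X = np.eye(n) / n    # primal feasible: Tr(X) = 1, X >= 0
y1 = 0.5             # dual feasible: C - y1*A1 = diag(0.5, 1.5, 2.5) >= 0

gap = np.trace(C @ X) - b1 * y1
slack = C - y1 * A1
# The gap equals (C - y1*A1) . X, which is >= 0 since both are PSD.
assert np.isclose(gap, np.trace(slack @ X))
assert gap >= 0
print(gap)           # 2.0 - 0.5 = 1.5
```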
The same weak duality argument extends to general conic optimization. We will start with two real vector spaces, $S$ and $T$, and a linear mapping $A : S \to T$. Every real vector space has an associated dual space, which is the vector space of real-valued linear functionals. We will denote these dual spaces by $S^*$ and $T^*$, respectively, and the pairing between an element of a vector space and one of the dual as $\langle \cdot, \cdot \rangle$ (i.e., $f(x) = \langle f, x \rangle$). The dual mapping of $A$ is the unique linear map $A^* : T^* \to S^*$ defined through the property
$$\langle A^* y, x \rangle_S = \langle y, A x \rangle_T \qquad \forall x \in S, \; y \in T^*.$$
Notice here that the brackets on the left-hand side of the equation represent the pairing in $S$, and those on the right-hand side correspond to the pairing in $T$. We can then define the primal-dual pair of (conic) optimization problems:
$$\min \; \langle c, x \rangle_S \;\; \text{s.t.} \;\; \begin{cases} Ax = b \\ x \in \mathcal{K} \end{cases}
\qquad\qquad
\max \; \langle y, b \rangle_T \;\; \text{s.t.} \;\; c - A^* y \in \mathcal{K}^*,$$
where $b \in T$, $c \in S^*$, $\mathcal{K} \subset S$ is a proper cone, and $\mathcal{K}^* \subset S^*$ is the corresponding dual cone. Notice that
exactly the same proof presented earlier works here to show weak duality:
$$\begin{aligned}
\langle c, x \rangle_S - \langle y, b \rangle_T &= \langle c, x \rangle_S - \langle y, Ax \rangle_T \\
&= \langle c, x \rangle_S - \langle A^* y, x \rangle_S \\
&= \langle c - A^* y, x \rangle_S \\
&\geq 0.
\end{aligned}$$
In the usual cases (e.g., LP and SDP), the vector spaces are finite dimensional, and thus isomorphic to their duals. The specific correspondence between these is given through whatever inner product we use.
Among the classes of problems that can be interpreted as particular cases of the general conic formulation we have linear programs (LP), second-order cone programs (SOCP), and SDP, obtained when we take the cone $\mathcal{K}$ to be the nonnegative orthant $\mathbb{R}^n_+$, the second-order cone in $n$ variables, or the PSD cone $\mathcal{S}^n_+$, respectively. We then have the following natural inclusion relationship among these optimization classes:
$$\text{LP} \subseteq \text{SOCP} \subseteq \text{SDP}.$$
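The first inclusion can be made explicit: a linear program is the special case of (1) in which all the data matrices are diagonal. The sketch below (plain NumPy, made-up data) spells out this embedding.

```python
# LP as an SDP: min c^T x s.t. a_i^T x = b_i, x >= 0 is the SDP (1)
# restricted to diagonal matrices, with C = diag(c) and A_i = diag(a_i).
import numpy as np

c = np.array([1.0, 2.0])
a1 = np.array([1.0, 1.0])

C = np.diag(c)       # diagonal objective matrix
A1 = np.diag(a1)     # diagonal constraint matrix

# For diagonal X = diag(x): C . X = c^T x, A1 . X = a1^T x, and
# X >= 0 (as a matrix) iff x >= 0 (entrywise), so the SDP is the LP.
x = np.array([1.0, 0.0])   # an LP-feasible point
X = np.diag(x)
assert np.isclose(np.trace(C @ X), c @ x)
assert np.isclose(np.trace(A1 @ X), a1 @ x)
```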
3 Applications
There have been many applications of SDP in a variety of areas of applied mathematics and engineering.
We present here just a few, to give a flavor of what is possible. Many more will follow.
3.1 Lyapunov stability
Consider a discrete-time linear system described by the difference equation $x(k+1) = A x(k)$, with initial condition $x(0) = x_0$. It is well-known (and relatively simple to prove) that $x(k)$ converges to zero for all initial conditions $x_0$ iff $|\lambda_i(A)| < 1$, $i = 1, \ldots, n$.
There is a simple characterization of this spectral radius condition in terms of a quadratic Lyapunov function $V(x(k)) = x(k)^T P x(k)$: all eigenvalues of $A$ satisfy $|\lambda_i(A)| < 1$ if and only if there exists a matrix $P$ with
$$P \succ 0, \qquad A^T P A - P \prec 0.$$
Proof
• (⇐=) Let $Av = \lambda v$, with $v \neq 0$. Then
$$0 > v^* (A^T P A - P) v = (|\lambda|^2 - 1) \, v^* P v,$$
and since $v^* P v > 0$, it follows that $|\lambda| < 1$.
• (=⇒) If all eigenvalues of $A$ have absolute value smaller than one, then $P := \sum_{k=0}^{\infty} (A^T)^k A^k$ is well defined and satisfies $A^T P A - P = -I \prec 0$, with $P \succeq I \succ 0$.
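Note that this characterization is an SDP feasibility problem in the variable $P$. A minimal sketch in CVXPY (the strict inequalities are approximated with a small margin $\epsilon$; the stable matrix $A$ is made-up illustrative data):

```python
# The Lyapunov condition P > 0, A^T P A - P < 0 as SDP feasibility.
import cvxpy as cp
import numpy as np

A = np.array([[0.5, 0.3],
              [0.0, 0.4]])   # eigenvalues 0.5 and 0.4: stable
n = A.shape[0]
eps = 1e-6                   # margin standing in for strict inequality

P = cp.Variable((n, n), symmetric=True)
constraints = [P >> eps * np.eye(n),
               A.T @ P @ A - P << -eps * np.eye(n)]
cp.Problem(cp.Minimize(0), constraints).solve()
print(np.linalg.eigvalsh(P.value))   # all positive: a Lyapunov certificate
```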
Consider now the case where $A$ is not stable, but we can use linear state feedback, i.e., replace $A$ by $A(K) = A + BK$, where $K$ is a constant matrix. We want to find a matrix $K$ such that $A + BK$ is stable, i.e., all its eigenvalues have absolute value smaller than one.
Use Schur complements to rewrite the condition
$$(A + BK)^T P (A + BK) - P \prec 0, \qquad P \succ 0$$
as
$$\begin{bmatrix} P & (A + BK)^T P \\ P (A + BK) & P \end{bmatrix} \succ 0.$$
The condition is nonlinear in $(P, K)$. However, we can do a congruence transformation with $Q := P^{-1}$ (multiplying on both sides by $\operatorname{diag}(Q, Q)$), and obtain:
$$\begin{bmatrix} Q & Q (A + BK)^T \\ (A + BK) Q & Q \end{bmatrix} \succ 0.$$
Now, defining a new variable $Y := KQ$, we have
$$\begin{bmatrix} Q & Q A^T + Y^T B^T \\ A Q + B Y & Q \end{bmatrix} \succ 0.$$
This problem is now linear in $(Q, Y)$; in fact, it is an SDP feasibility problem. After solving it, we can recover the controller $K$ via $K = Y Q^{-1}$.
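Here is a minimal CVXPY sketch of this synthesis procedure (the data $A$, $B$ are made up, with $A$ unstable; the strict inequality is again handled with a small margin $\epsilon$):

```python
# State-feedback stabilization via the LMI in (Q, Y), then K = Y Q^{-1}.
import cvxpy as cp
import numpy as np

A = np.array([[1.2, 1.0],
              [0.0, 0.9]])   # unstable: eigenvalue 1.2
B = np.array([[0.0],
              [1.0]])
n, m = B.shape
eps = 1e-6

Q = cp.Variable((n, n), symmetric=True)
Y = cp.Variable((m, n))
M = cp.bmat([[Q,             Q @ A.T + Y.T @ B.T],
             [A @ Q + B @ Y, Q]])
cp.Problem(cp.Minimize(0), [M >> eps * np.eye(2 * n)]).solve()

K = Y.value @ np.linalg.inv(Q.value)         # recover K = Y Q^{-1}
print(np.abs(np.linalg.eigvals(A + B @ K)))  # all < 1: closed loop stable
```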
3.2 Theta function
Given a graph G = (V, E), a stable set is a subset of V with the property that the induced subgraph
has no edges. In other words, none of the selected vertices are adjacent to each other.
The stability number of a graph, usually denoted by $\alpha(G)$, is the cardinality of the largest stable set. Computing the stability number of a graph is NP-hard. There are many interesting applications of the stable set problem. In particular, it can be used to provide bounds on the Shannon capacity of a graph [Lov79], a problem of importance in coding theory. In fact, this was one of the first appearances of what today is known as SDP.
The Lovász theta function is denoted by $\vartheta(G)$, and is defined as the optimal value of the SDP:
$$\max \; J \bullet X \quad \text{s.t.} \quad
\left\{ \begin{aligned}
& \operatorname{Tr}(X) = 1 \\
& X_{ij} = 0 \quad \forall \, (i,j) \in E \\
& X \succeq 0,
\end{aligned} \right. \qquad (4)$$
where J is the matrix with all entries equal to one. The theta function is an upper bound on the stability
number, i.e.,
α(G) ≤ ϑ(G).
The inequality is easy to prove. Consider the indicator vector $\xi(S)$ of any stable set $S$, and define the matrix $X := \frac{1}{|S|} \xi \xi^T$. It is easy to see that this $X$ is a feasible solution of the SDP, with objective value $J \bullet X = |S|$, and thus the inequality follows.
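As a numerical illustration, the sketch below computes $\vartheta(G)$ for the 5-cycle $C_5$ using CVXPY (assuming an SDP solver is installed). Lovász showed that $\vartheta(C_5) = \sqrt{5} \approx 2.236$, while $\alpha(C_5) = 2$, so the bound is not tight here.

```python
# Computing the theta function (4) for the 5-cycle C5.
import cvxpy as cp
import numpy as np

n = 5
edges = [(i, (i + 1) % n) for i in range(n)]   # edges of the 5-cycle
J = np.ones((n, n))

X = cp.Variable((n, n), symmetric=True)
constraints = [cp.trace(X) == 1, X >> 0]
constraints += [X[i, j] == 0 for (i, j) in edges]  # X_ij = 0 on edges
prob = cp.Problem(cp.Maximize(cp.trace(J @ X)), constraints)
prob.solve()
print(prob.value)   # ~ 2.2360, i.e. sqrt(5)
```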
4 Software
Remark There are many good software packages for semidefinite programming. Among the best known, we mention the following ones:
• SeDuMi, originally by Jos Sturm, now being maintained by the optimization group at McMaster:
http://sedumi.mcmaster.ca/
• SDPT3, by Kim-Chuan Toh, Reha Tütüncü, and Mike Todd. http://www.math.nus.edu.sg/~mattohkc/sdpt3.html
• SDPA, by the research group of Masakazu Kojima, http://grid.r.dendai.ac.jp/sdpa/
• CSDP, by Brian Borchers, http://infohost.nmt.edu/~borchers/csdp.html
A very convenient way of using these (and other) SDP solvers under MATLAB is through the YALMIP
parser/solver (Johan Löfberg, http://control.ee.ethz.ch/~joloef/yalmip.php).
References
[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[Lov79] L. Lovász. On the Shannon capacity of a graph. IEEE Transactions on Information Theory,
25(1):1–7, 1979.
[Ram97] M. V. Ramana. An exact duality theory for semidefinite programming and its complexity
implications. Math. Programming, 77(2, Ser. B):129–162, 1997.
[RTW97] M. V. Ramana, L. Tunçel, and H. Wolkowicz. Strong duality for semidefinite programming.
SIAM J. Optim., 7(3):641–662, 1997.
[Tod01] M. Todd. Semidefinite optimization. Acta Numerica, 10:515–560, 2001.
[VB96] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38(1):49–95, March
1996.