[Figure: unit balls of the $\ell_\infty$, $\ell_2$, and $\ell_1$ norms, respectively.]
It is easy to show that the unit ball of every norm is always a convex set. Conversely, given any full-dimensional convex set symmetric with respect to the origin, one can define a norm via the gauge (or Minkowski) functional.
One of the most important results about convex sets is the separating hyperplane theorem.

Theorem 3 Given two disjoint convex sets $S_1$, $S_2$ in $\mathbb{R}^n$, there exists a nontrivial linear functional $c$ and a scalar $d$ such that
$$\langle c, x \rangle \geq d \quad \forall x \in S_1, \qquad \langle c, x \rangle \leq d \quad \forall x \in S_2.$$
Under certain additional conditions, strict separation can be guaranteed. One of the most useful cases
is when one of the sets is compact and the other one is closed.
An important class of convex sets is given by those that are invariant under nonnegative scalings.

Denition 4 A set $S \subseteq \mathbb{R}^n$ is a cone if $\lambda \geq 0$, $x \in S \Rightarrow \lambda x \in S$.

Denition 5 The dual of a set $S$ is $S^* := \{y \in \mathbb{R}^n : \langle y, x \rangle \geq 0 \ \ \forall x \in S\}$.
The dual $S^*$ of any set $S$ is always a closed convex cone, and duality reverses inclusions: $S_1 \subseteq S_2$ implies $S_1^* \supseteq S_2^*$. If $S$ is a closed convex cone, then $S^{**} = S$. Otherwise, $S^{**}$ is the closure of the smallest convex cone containing $S$. A cone $K$ is pointed if $K \cap (-K) = \{0\}$, i.e., if $y \in K$ and $-y \in K$ imply $y = 0$, and it is solid if its interior is nonempty. A cone that is closed, convex, pointed, and solid is called a proper cone.
Example 6 The nonnegative orthant is defined as $\mathbb{R}^n_+ := \{x \in \mathbb{R}^n : x_i \geq 0\}$, and is a proper cone. The nonnegative orthant is self-dual, i.e., we have $(\mathbb{R}^n_+)^* = \mathbb{R}^n_+$.

A proper cone $K$ induces a partial order$^1$ on the vector space, via $x \succeq y$ if and only if $x - y \in K$. We also use $x \succ y$ if $x - y$ is in the interior of $K$. Important examples of proper cones are the nonnegative orthant, the Lorentz cone, the set of symmetric positive semidefinite matrices, and the set of nonnegative polynomials. We will discuss some of these in more detail later in the lectures and the exercises.

Example 7 Consider the second-order cone, defined by
$$\Big\{(x_0, x_1, \ldots, x_n) \in \mathbb{R}^{n+1} : \Big(\sum_{i=1}^n x_i^2\Big)^{1/2} \leq x_0\Big\}.$$
This is a self-dual proper cone, and is also known as the ice-cream, or Lorentz cone.
An interesting physical interpretation of the partial order induced by this cone appears in the theory
of special relativity. In this case, the cone can be expressed (after an inconsequential rescaling and
reordering) as
$$\{(x, y, z, t) \in \mathbb{R}^4 : x^2 + y^2 + z^2 \leq c^2 t^2, \ t \geq 0\},$$
where $c$ is a given constant (the speed of light). In this case, the vector space is interpreted as the Minkowski spacetime. Given a fixed point $x_0$, those points $x$ for which $x \succeq x_0$ correspond to the absolute future, while those for which $x \preceq x_0$ are in the absolute past. There are, however, many points that are neither in the absolute future nor in the absolute past (for these, the causal order will depend on the observer).
Remark 8 Convexity has two natural definitions. The first one is the one given above, that emphasizes the internal aspect, in terms of convex combinations of elements of the set. Alternatively, one can look at the external aspect, and define a convex set as the intersection of a (possibly infinite) collection of half-spaces. The possibility of these dual descriptions is what enables many of the useful and intriguing properties of convex sets. In the context of convex functions, for instance, these ideas are made concrete through the use of the Legendre-Fenchel transformation.
4 Review: linear programming
Linear programming (LP) is the problem of minimizing a linear function, subject to linear inequality constraints. An LP in standard form is written as:
$$\min_x\ c^T x \quad \text{s.t.} \quad Ax = b, \quad x \geq 0. \qquad \text{(P)}$$
Every LP problem has a corresponding dual problem, which in this case is:
$$\max_y\ b^T y \quad \text{s.t.} \quad c - A^T y \geq 0. \qquad \text{(D)}$$
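To make the primal-dual pair (P)-(D) concrete, here is a minimal numerical sketch in Python, assuming the CVXPY package is available; the data A, b, c are arbitrary illustrative values, not taken from the notes.

```python
import numpy as np
import cvxpy as cp

# Illustrative standard-form LP data (hypothetical values).
A = np.array([[1.0, 1.0, 1.0],
              [1.0, 2.0, 0.0]])
b = np.array([4.0, 3.0])
c = np.array([2.0, 1.0, 3.0])

# Primal (P): minimize c'x subject to Ax = b, x >= 0.
x = cp.Variable(3)
primal = cp.Problem(cp.Minimize(c @ x), [A @ x == b, x >= 0])
primal.solve()

# Dual (D): maximize b'y subject to c - A'y >= 0.
y = cp.Variable(2)
dual = cp.Problem(cp.Maximize(b @ y), [c - A.T @ y >= 0])
dual.solve()

# By LP strong duality, the two optimal values agree.
print(primal.value, dual.value)
```

Any feasible dual point certifies a bound on the primal, which is exactly the weak duality property discussed next.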
There are many important features of LP. Among them, we mention the following ones:
$^1$A partial order is a binary relation that is reflexive, antisymmetric, and transitive.
Geometry of the feasible set: The feasible sets of linear programs are polyhedra. The geometry of polyhedra is quite well understood. In particular, the Minkowski-Weyl theorem (e.g., [BT97, Zie95]) states that every polyhedron $P$ is finitely generated, i.e., it can be written as
$$P = \mathrm{conv}(u_1, \ldots, u_r) + \mathrm{cone}(v_1, \ldots, v_s),$$
where the $u_i$, $v_i$ are the extreme points and extreme rays of $P$, respectively.
Weak duality: For any feasible solutions $x$, $y$ of (P) and (D), respectively, it always holds that:
$$c^T x - b^T y = x^T c - (Ax)^T y = x^T(c - A^T y) \geq 0.$$
In other words, from any feasible dual solution we can obtain a lower bound on the primal. Conversely, primal feasible solutions give upper bounds on the value of the dual.

Strong duality: If both primal and dual are feasible, then they achieve exactly the same value, and there exist optimal feasible solutions $x^\star$, $y^\star$ such that $c^T x^\star = b^T y^\star$.
Some of these properties (which ones?) will break down as soon as we leave LP and go to the more general case of conic or semidefinite programming. This will cause some difficulties, although with the right assumptions, the resulting theory will closely parallel the LP case.
Remark 9 The software codes cdd (Komei Fukuda, http://www.ifor.math.ethz.ch/~fukuda/cdd_home/index.html) and lrs (David Avis, http://cgm.cs.mcgill.ca/~avis/C/lrs.html) are very useful for polyhedral computations. In particular, both of them allow one to convert an inequality representation of a polyhedron (usually called an H-representation) into extreme points/rays (V-representation), and vice versa.
References
[BNO03] D. P. Bertsekas, A. Nedić, and A. E. Ozdaglar. Convex analysis and optimization. Athena Scientific, Belmont, MA, 2003.
[BT97] D. Bertsimas and J. N. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1997.
[BTN01] A. Ben-Tal and A. Nemirovski. Lectures on modern convex optimization. MPS/SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001.
[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[Roc70] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, New Jersey, 1970.
[Zie95] G. M. Ziegler. Lectures on polytopes, volume 152 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1995.
MIT 6.256 - Algebraic techniques and semidefinite optimization February 5, 2010
Lecture 2
Lecturer: Pablo A. Parrilo Scribe: Pablo A. Parrilo
Notation: The set of real symmetric $n \times n$ matrices is denoted $\mathcal{S}^n$. A matrix $A \in \mathcal{S}^n$ is called positive semidefinite if $x^T A x \geq 0$ for all $x \in \mathbb{R}^n$, and is called positive definite if $x^T A x > 0$ for all nonzero $x \in \mathbb{R}^n$. The set of positive semidefinite matrices is denoted $\mathcal{S}^n_+$ and the set of positive definite matrices is denoted by $\mathcal{S}^n_{++}$. As we shall prove soon, $\mathcal{S}^n_+$ is a proper cone (i.e., closed, convex, pointed, and solid). We will use the inequality signs $\succeq$ and $\succ$ to denote the partial order induced by $\mathcal{S}^n_+$ (usually called the Löwner partial order).
1 PSD matrices
There are several equivalent conditions for a matrix to be positive (semi)definite. We present below some of the most useful ones:

Proposition 1. The following statements are equivalent:
- The matrix $A \in \mathcal{S}^n$ is positive semidefinite ($A \succeq 0$).
- For all $x \in \mathbb{R}^n$, $x^T A x \geq 0$.
- All eigenvalues of $A$ are nonnegative.
- All $2^n - 1$ principal minors of $A$ are nonnegative.
- There exists a factorization $A = B^T B$.

For the definite case, we have a similar characterization:

Proposition 2. The following statements are equivalent:
- The matrix $A \in \mathcal{S}^n$ is positive definite ($A \succ 0$).
- For all nonzero $x \in \mathbb{R}^n$, $x^T A x > 0$.
- All eigenvalues of $A$ are strictly positive.
- All $n$ leading principal minors of $A$ are positive.
- There exists a factorization $A = B^T B$, with $B$ square and nonsingular.

Here are some useful additional facts:
- If $T$ is nonsingular, $A \succeq 0 \Leftrightarrow T^T A T \succeq 0$.
- Schur complement. The following conditions are equivalent:
$$\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succ 0 \quad\Longleftrightarrow\quad \begin{cases} A \succ 0 \\ C - B^T A^{-1} B \succ 0 \end{cases} \quad\Longleftrightarrow\quad \begin{cases} C \succ 0 \\ A - B C^{-1} B^T \succ 0. \end{cases}$$
We now prove the following result:
Theorem 3. The set $\mathcal{S}^n_+$ of positive semidefinite matrices is a proper cone.

Proof. Invariance under nonnegative scalings follows directly from the definition, so $\mathcal{S}^n_+$ is a cone. By the second statement in Proposition 1, $\mathcal{S}^n_+$ is the intersection of infinitely many closed halfspaces, and hence it is both closed and convex. To show pointedness, notice that if there is a symmetric matrix $A$ that belongs to both $\mathcal{S}^n_+$ and $-\mathcal{S}^n_+$, then $x^T A x$ must vanish for all $x \in \mathbb{R}^n$, thus $A$ must be the zero matrix. Finally, the cone is solid since $I_n + X$ is positive definite for all $X$ provided $\|X\|$ is small enough.

We state next some additional facts on the geometry of the cone $\mathcal{S}^n_+$ of positive semidefinite matrices:
- If $\mathcal{S}^n$ is equipped with the inner product $\langle X, Y \rangle := X \bullet Y = \mathrm{Tr}(XY)$, then $\mathcal{S}^n_+$ is a self-dual cone.
- The cone $\mathcal{S}^n_+$ is not polyhedral, and its extreme rays are the rank one matrices.
2 Semidefinite programming
Semidefinite programming (SDP) is a specific kind of convex optimization problem (e.g., [VB96, Tod01, BV04]), with very appealing numerical properties. An SDP problem corresponds to the optimization of a linear function subject to matrix inequality constraints.
An SDP problem in standard primal form is written as:
$$\begin{aligned} \text{minimize} \quad & C \bullet X \\ \text{subject to} \quad & A_i \bullet X = b_i, \quad i = 1, \ldots, m \\ & X \succeq 0, \end{aligned} \qquad (1)$$
where $C, A_i \in \mathcal{S}^n$, and $X \bullet Y := \mathrm{Tr}(XY)$. The matrix $X \in \mathcal{S}^n$ is the variable over which the minimization is performed. The inequality in the second line means that the matrix $X$ must be positive semidefinite, i.e., all its eigenvalues should be greater than or equal to zero. The set of feasible solutions, i.e., the set of matrices $X$ that satisfy the constraints, is always a convex set. In the particular case in which $C = 0$, the problem reduces to whether or not the inequality can be satisfied for some matrix $X$. In this case, the SDP is referred to as a feasibility problem. The convexity of SDP has made it possible to develop sophisticated and reliable analytical and numerical methods to solve them.
A very important feature of SDP problems, from both the theoretical and applied viewpoints, is the
associated duality theory. For every SDP of the form (1) (usually called the primal problem), there is
another associated SDP, called the dual problem, that can be stated as
$$\begin{aligned} \text{maximize} \quad & b^T y \\ \text{subject to} \quad & \sum_{i=1}^m A_i y_i \preceq C, \end{aligned} \qquad (2)$$
where $b = (b_1, \ldots, b_m)$, and the vector $y = (y_1, \ldots, y_m)$ contains the dual decision variables.
The key relationship between the primal and the dual problem is the fact that feasible solutions of
one can be used to bound the values of the other problem. Indeed, let X and y be any two feasible
solutions of the primal and dual problems, respectively. We then have the following inequality:
$$C \bullet X - b^T y = \Big(C - \sum_{i=1}^m A_i y_i\Big) \bullet X \geq 0, \qquad (3)$$
where the last inequality follows from the fact that the two terms are positive semidefinite matrices.
From (1) and (2) we can see that the left hand side of (3) is just the difference between the objective functions of the primal and dual problems. The inequality in (3) tells us that the value of the primal objective function evaluated at any feasible matrix $X$ is always greater than or equal to the value of the dual objective function at any feasible vector $y$. This property is known as weak duality. Thus, we can use any feasible $X$ to compute an upper bound for the optimum of $b^T y$, and we can also use any feasible $y$ to compute a lower bound for the optimum of $C \bullet X$. Furthermore, in the case of feasibility problems (i.e., $C = 0$), the dual problem can be used to certify the nonexistence of solutions to the primal problem. This property will be crucial in our later developments.
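As an illustration of the primal-dual pair (1)-(2) and of weak duality (3), here is a hedged sketch using Python and CVXPY; the data C, A_i, b_i are hypothetical, and any SDP-capable solver shipped with CVXPY will do.

```python
import numpy as np
import cvxpy as cp

# Hypothetical data: n = 3, m = 2.
C = np.diag([1.0, 2.0, 3.0])
A1, b1 = np.eye(3), 3.0                    # constraint: Tr(X) = 3
A2, b2 = np.diag([1.0, 0.0, 0.0]), 1.0     # constraint: X_11 = 1

# Primal (1): minimize C . X  s.t.  Ai . X = bi,  X PSD.
X = cp.Variable((3, 3), symmetric=True)
primal = cp.Problem(cp.Minimize(cp.trace(C @ X)),
                    [cp.trace(A1 @ X) == b1,
                     cp.trace(A2 @ X) == b2,
                     X >> 0])
primal.solve()

# Dual (2): maximize b'y  s.t.  y1*A1 + y2*A2 <= C in the Loewner order.
y = cp.Variable(2)
dual = cp.Problem(cp.Maximize(b1 * y[0] + b2 * y[1]),
                  [y[0] * A1 + y[1] * A2 << C])
dual.solve()

# Both problems are strictly feasible here, so the two values coincide (= 5).
print(primal.value, dual.value)
```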
2.1 Conic duality
A general formulation, discussed briefly during the previous lecture, that unifies LP and SDP (as well as some other classes of optimization problems) is conic programming. We will be more careful than usual here (risking being a bit pedantic) in the definition of the respective spaces and mappings. It does not make much of a difference if we are working on $\mathbb{R}^n$ (since we can identify a space and its dual), but it is good hygiene to keep these distinctions in mind, and it is also useful when dealing with more complicated spaces.
We will start with two real vector spaces, $S$ and $T$, and a linear mapping $\mathcal{A} : S \to T$. Every real vector space has an associated dual space, which is the vector space of real-valued linear functionals. We will denote these dual spaces by $S^*$ and $T^*$. The adjoint mapping $\mathcal{A}^* : T^* \to S^*$ is defined through the property
$$\langle \mathcal{A}^* y, x \rangle_S = \langle y, \mathcal{A} x \rangle_T \qquad \forall x \in S, \ y \in T^*.$$
Notice here that the brackets on the left-hand side of the equation represent the pairing in $S$, and those on the right-hand side correspond to the pairing in $T$. We can then define the primal-dual pair of (conic) optimization problems:
$$\begin{aligned} \text{minimize} \quad & \langle c, x \rangle_S \\ \text{subject to} \quad & \mathcal{A}x = b \\ & x \in K \end{aligned} \qquad\qquad \begin{aligned} \text{maximize} \quad & \langle y, b \rangle_T \\ \text{subject to} \quad & c - \mathcal{A}^* y \in K^*, \end{aligned}$$
where $b \in T$, $c \in S^*$, $K \subseteq S$ is a proper cone, and $K^* \subseteq S^*$ is its dual cone. As before, weak duality holds, since for feasible $x$ and $y$ we have
$$\langle c, x \rangle_S - \langle y, b \rangle_T = \langle c, x \rangle_S - \langle y, \mathcal{A}x \rangle_T = \langle c - \mathcal{A}^* y, x \rangle_S \geq 0.$$
In the usual cases (e.g., LP and SDP), the vector spaces are finite dimensional, and thus isomorphic to their duals. The specific correspondence between these is given through whatever inner product we use.
Among the classes of problems that can be interpreted as particular cases of the general conic formulation we have linear programs, second-order cone programs (SOCP), and SDP, when we take the cone $K$ to be the nonnegative orthant $\mathbb{R}^n_+$, the second order cone in $n$ variables, or the PSD cone $\mathcal{S}^n_+$, respectively. We have then the following natural inclusion relationship among the different optimization classes:
$$\text{LP} \subseteq \text{SOCP} \subseteq \text{SDP}.$$
2.2 Geometric interpretation: separating hyperplanes
We give here a simple interpretation of duality, in terms of the separating hyperplane theorem. For
simplicity, we concentrate on the case of feasibility only, i.e., where we are interested in deciding the
existence of a solution x to the equations
$$\mathcal{A}x = b, \qquad x \in K, \qquad (4)$$
where as before K is a proper cone in the vector space S.
Consider now the image $\mathcal{A}(K)$ of the cone under the linear mapping. Notice that feasibility of (4) is equivalent to the point $b$ being contained in $\mathcal{A}(K)$. We have now two convex sets in $T$, namely $\mathcal{A}(K)$
and {b}, and we are interested in knowing whether they intersect or not. If these sets satisfy certain
properties (for instance, closedness and compactness) then we could apply the separating hyperplane theorem, to produce a linear functional $y$ that will be positive on one set, and negative on the other. In particular, nonnegativity on $\mathcal{A}(K)$ implies
$$\langle y, \mathcal{A}(x) \rangle \geq 0 \ \ \forall x \in K \quad \Longleftrightarrow \quad \langle \mathcal{A}^*(y), x \rangle \geq 0 \ \ \forall x \in K \quad \Longleftrightarrow \quad \mathcal{A}^*(y) \in K^*.$$
Thus, under these conditions, if (4) is infeasible, there is a linear functional $y$ satisfying
$$\langle y, b \rangle < 0, \qquad \mathcal{A}^* y \in K^*.$$
This yields a certificate of the infeasibility of the conic system (4).
2.3 Strong duality in SDP
Despite the formal similarities, there are a number of differences between linear programming and general conic programming (and in particular, SDP). Among them, we notice that in SDP optimal solutions may not necessarily exist (even if the optimal value is finite), and there can be a nonzero duality gap.
Nevertheless, we have seen that weak duality always holds for conic programming problems. As opposed to the LP case, strong duality can fail in general SDP. A nice example is given in [VB96, p. 65], where both the primal and dual problems are feasible, but their optimal values are different (i.e., there is a nonzero finite duality gap).
Nevertheless, under relatively mild constraint qualifications (Slater's condition, equivalent to the existence of strictly feasible primal and dual solutions) that are usually satisfied in practice, SDP problems have strong duality, and thus zero duality gap.
Theorem 4. Assume that both the primal and dual problems are strictly feasible. Then, both achieve
their optimal solutions, and there is no duality gap.
There are several geometric interpretations of what causes the failure of strong duality for general SDP problems. A good one is based on the fact that the image of a proper cone under a linear transformation is not necessarily a proper cone. This fact seems quite surprising (or even wrong!) the first time one encounters it, but after a little while it starts being quite reasonable. Can you think of an example where this happens? What property will fail?
It should be mentioned that it is possible to formulate a more complicated SDP dual program
(called the Extended Lagrange-Slater Dual in [Ram97]) for which strong duality always holds. For
details, as well as a comparison with the more general minimal cone approach, we refer the reader to
[Ram97, RTW97].
3 Applications
There have been many applications of SDP in a variety of areas of applied mathematics and engineering.
We present here just a few, to give a flavor of what is possible. Many more will follow.
3.1 Lyapunov stability and control
Consider a linear difference equation (i.e., a discrete-time linear system) given by
$$x(k+1) = A\,x(k), \qquad x(0) = x_0.$$
It is well-known (and easy to prove) that $x(k)$ converges to zero for all initial conditions $x_0$ if and only if $|\lambda_i(A)| < 1$, for $i = 1, \ldots, n$.
There is a simple characterization of this spectral radius condition in terms of a quadratic Lyapunov function $V(x(k)) = x(k)^T P\, x(k)$.
Theorem 5. Given an $n \times n$ real matrix $A$, the following conditions are equivalent:
(i) All eigenvalues of $A$ are inside the unit circle, i.e., $|\lambda_i(A)| < 1$ for $i = 1, \ldots, n$.
(ii) There exists a matrix $P \in \mathcal{S}^n$ such that
$$P \succ 0, \qquad A^T P A - P \prec 0.$$

Proof. (ii) $\Rightarrow$ (i): Let $Av = \lambda v$. Then,
$$0 > v^*(A^T P A - P)v = (|\lambda|^2 - 1)\underbrace{v^* P v}_{>0},$$
and therefore $|\lambda| < 1$.
(i) $\Rightarrow$ (ii): Let $P := \sum_{k=0}^\infty (A^k)^T Q A^k$, where $Q \succ 0$. The sum converges by the eigenvalue assumption. Then,
$$A^T P A - P = \sum_{k=1}^\infty (A^k)^T Q A^k - \sum_{k=0}^\infty (A^k)^T Q A^k = -Q \prec 0.$$
Consider now the case where $A$ is not stable, but we can use linear state feedback, i.e., replace $A$ by $A(K) = A + BK$, where $B$ is a fixed matrix and $K$ is a controller gain to be designed. We want to find a matrix $K$ such that $A + BK$ is stable, i.e., all its eigenvalues have absolute value smaller than one.
Using Schur complements, we can rewrite the stability condition as:
$$(A+BK)^T P (A+BK) - P \prec 0, \quad P \succ 0 \qquad \Longleftrightarrow \qquad \begin{bmatrix} P & (A+BK)^T P \\ P(A+BK) & P \end{bmatrix} \succ 0.$$
This condition is not simultaneously convex in $(P, K)$ (since it is bilinear). However, we can perform a congruence transformation with $\mathrm{diag}(Q, Q)$, where $Q := P^{-1}$, and obtain:
$$\begin{bmatrix} Q & Q(A+BK)^T \\ (A+BK)Q & Q \end{bmatrix} \succ 0.$$
Now, defining a new variable $Y := KQ$, we have
$$\begin{bmatrix} Q & QA^T + Y^T B^T \\ AQ + BY & Q \end{bmatrix} \succ 0.$$
This problem is now linear in $(Q, Y)$. In fact, it is an SDP problem. After solving it, we can recover the controller $K$ via $K = Y Q^{-1}$.
3.2 Theta function
Given a graph G = (V, E), a stable set (or independent set) is a subset of V with the property that the
induced subgraph has no edges. In other words, none of the selected vertices are adjacent to each other.
The stability number of a graph, usually denoted by $\alpha(G)$, is the cardinality of the largest stable set. Computing the stability number of a graph is NP-hard. There are many interesting applications of the stable set problem. In particular, they can be used to provide upper bounds on the Shannon capacity of a graph [Lov79], a problem that appears in coding theory (when computing the zero-error capacity of a noisy channel [Sha56]). In fact, this was one of the first appearances of what today is known as SDP.
The Lovász theta function is denoted by $\vartheta(G)$, and is defined as the solution of the SDP:
$$\max\ J \bullet X \quad \text{s.t.} \quad \begin{cases} \mathrm{Tr}(X) = 1 \\ X_{ij} = 0 \quad \forall (i,j) \in E \\ X \succeq 0 \end{cases} \qquad (5)$$
where $J$ is the matrix with all entries equal to one. The theta function is an upper bound on the stability number, i.e.,
$$\alpha(G) \leq \vartheta(G).$$
The inequality is easy to prove. Consider the indicator vector $\chi = \chi(S)$ of any stable set $S$, and define the matrix $X := \frac{1}{|S|}\,\chi\chi^T$. It is easy to see that this $X$ is a feasible solution of the SDP, and it achieves an objective value equal to $|S|$. As a consequence, the inequality above directly follows.
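A small sketch of the SDP (5) in Python/CVXPY; the graph is the 5-cycle C5, for which the optimal value is known to be $\sqrt{5} \approx 2.236$ [Lov79], while $\alpha(C_5) = 2$.

```python
import numpy as np
import cvxpy as cp

# The 5-cycle C5 as an edge list.
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]

X = cp.Variable((n, n), symmetric=True)
J = np.ones((n, n))
constraints = [cp.trace(X) == 1, X >> 0]
constraints += [X[i, j] == 0 for (i, j) in edges]

# Theta function: maximize J . X over the feasible set of (5).
theta = cp.Problem(cp.Maximize(cp.trace(J @ X)), constraints)
theta.solve()
print(theta.value)   # approximately sqrt(5), an upper bound on alpha(C5) = 2
```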
3.3 Euclidean distance matrices
Assume we are given a list of pairwise distances between a finite number of points. Under what conditions can the points be embedded in some finite-dimensional space, and those distances be realized as the Euclidean metric between the embedded points? This problem appears in a large number of applications, including distance geometry, computational chemistry, and machine learning.
Concretely, assume we have a list of distances $d_{ij}$, for $(i,j) \in [1,n] \times [1,n]$. We would like to find points $x_i \in \mathbb{R}^k$ (for some value of $k$), such that $\|x_i - x_j\| = d_{ij}$ for all $i, j$. What are necessary and sufficient conditions for such an embedding to exist? In 1935, Schoenberg [Sch35] gave an exact characterization in terms of the semidefiniteness of the matrix of squared distances:
Theorem 6. The distances $d_{ij}$ can be embedded in a Euclidean space if and only if the $n \times n$ matrix
$$D := \begin{bmatrix} 0 & d_{12}^2 & d_{13}^2 & \ldots & d_{1n}^2 \\ d_{12}^2 & 0 & d_{23}^2 & \ldots & d_{2n}^2 \\ d_{13}^2 & d_{23}^2 & 0 & \ldots & d_{3n}^2 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ d_{1n}^2 & d_{2n}^2 & d_{3n}^2 & \ldots & 0 \end{bmatrix}$$
is negative semidefinite on the subspace orthogonal to the vector $e := (1, 1, \ldots, 1)$.
Proof. We show only the necessity of the condition. Assume an embedding exists, i.e., there are points $x_i \in \mathbb{R}^k$ such that $d_{ij} = \|x_i - x_j\|$. Consider now the Gram matrix $G$ of inner products
$$G := \begin{bmatrix} \langle x_1, x_1 \rangle & \langle x_1, x_2 \rangle & \ldots & \langle x_1, x_n \rangle \\ \langle x_2, x_1 \rangle & \langle x_2, x_2 \rangle & \ldots & \langle x_2, x_n \rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle x_n, x_1 \rangle & \langle x_n, x_2 \rangle & \ldots & \langle x_n, x_n \rangle \end{bmatrix} = [x_1, \ldots, x_n]^T [x_1, \ldots, x_n],$$
which is positive semidefinite by construction. Since $D_{ij} = \|x_i - x_j\|^2 = \langle x_i, x_i \rangle + \langle x_j, x_j \rangle - 2\langle x_i, x_j \rangle$, we have
$$D = \mathrm{diag}(G)\, e^T + e\, \mathrm{diag}(G)^T - 2G,$$
from where the result directly follows: if $v^T e = 0$, then $v^T D v = -2\, v^T G v \leq 0$.
Notice that the dimension of the embedding is given by the rank k of the Gram matrix G.
For more on this and related embedding problems, good starting points are Schoenberg's original paper [Sch35], as well as the book [DL97].
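The necessity direction of Theorem 6 is easy to check numerically. The sketch below (plain NumPy, with hypothetical random points) builds D from an actual embedding, verifies that it is negative semidefinite on the subspace orthogonal to e, and recovers the embedding dimension from the centered Gram matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3
pts = rng.normal(size=(n, k))            # hypothetical points in R^3

# Squared-distance matrix D and Gram matrix G.
G = pts @ pts.T
g = np.diag(G)
D = g[:, None] + g[None, :] - 2 * G      # D = diag(G) e' + e diag(G)' - 2G

# Project onto the orthogonal complement of e; check negative semidefiniteness.
e = np.ones(n)
P = np.eye(n) - np.outer(e, e) / n       # projector onto e-perp
eigs = np.linalg.eigvalsh(P @ D @ P)
print(np.max(eigs) <= 1e-9)              # True: D is NSD on e-perp

# -P D P / 2 equals the centered Gram matrix; its rank is the dimension k.
Gc = -P @ D @ P / 2
print(np.linalg.matrix_rank(Gc))         # 3
```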
4 Software
Remark 7. There are many good software codes for semidefinite programming. Among the most well-known, we mention the following ones:
- SeDuMi, originally by Jos Sturm, now maintained by the optimization group at Lehigh: http://sedumi.ie.lehigh.edu
- SDPT3, by Kim-Chuan Toh, Reha Tütüncü, and Mike Todd: http://www.math.nus.edu.sg/~mattohkc/sdpt3.html
- SDPA, by the research group of Masakazu Kojima: http://sdpa.indsys.chuo-u.ac.jp/sdpa/
- CSDP, originally by Brian Borchers, now a COIN-OR project: https://projects.coin-or.org/Csdp/
A very convenient way of using these (and other) SDP solvers under MATLAB is through the YALMIP parser/solver (Johan Löfberg, http://users.isy.liu.se/johanl/yalmip/), or the disciplined convex programming software CVX (Michael Grant and Stephen Boyd, http://www.stanford.edu/~boyd/cvx).
References
[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[DL97] M. M. Deza and M. Laurent. Geometry of cuts and metrics, volume 15 of Algorithms and Combinatorics. Springer-Verlag, Berlin, 1997.
[Lov79] L. Lovász. On the Shannon capacity of a graph. IEEE Transactions on Information Theory, 25(1):1-7, 1979.
[Ram97] M. V. Ramana. An exact duality theory for semidefinite programming and its complexity implications. Math. Programming, 77(2, Ser. B):129-162, 1997.
[RTW97] M. V. Ramana, L. Tunçel, and H. Wolkowicz. Strong duality for semidefinite programming. SIAM J. Optim., 7(3):641-662, 1997.
[Sch35] I. J. Schoenberg. Remarks to Maurice Fréchet's article "Sur la définition axiomatique d'une classe d'espaces distanciés vectoriellement applicables sur l'espace de Hilbert". Ann. of Math. (2), 36(3):724-732, 1935.
[Sha56] C. Shannon. The zero error capacity of a noisy channel. Information Theory, IRE Transactions on, 2(3):8-19, September 1956.
[Tod01] M. Todd. Semidefinite optimization. Acta Numerica, 10:515-560, 2001.
[VB96] L. Vandenberghe and S. Boyd. Semidefinite programming. SIAM Review, 38(1):49-95, March 1996.
MIT 6.256 - Algebraic techniques and semidefinite optimization February 10, 2010
Lecture 3
Lecturer: Pablo A. Parrilo Scribe: Pablo A. Parrilo
In this lecture, we will discuss one of the most important applications of semidefinite programming, namely its use in the formulation of convex relaxations of nonconvex optimization problems. We will present the results from several different, but complementary, points of view. These will also serve us as starting points for the generalizations to be presented later in the course.
We will discuss first the case of binary quadratic optimization, since in this case the notation is simpler, and perfectly illustrates many of the issues appearing in more complicated problems. Afterwards, a more general formulation containing arbitrary linear and quadratic constraints will be presented.
1 Binary optimization
Binary (or Boolean) quadratic optimization is a classical combinatorial optimization problem. In the version we consider, we want to minimize a quadratic function, where the decision variables can only take the values $\pm 1$. In other words, we are minimizing an (indefinite) quadratic form over the vertices of an $n$-dimensional hypercube. The problem is formally expressed as:
$$\begin{aligned} \text{minimize} \quad & x^T Q x \\ \text{subject to} \quad & x_i \in \{-1, 1\} \end{aligned} \qquad (1)$$
where $Q \in \mathcal{S}^n$. There are many well-known problems that can be naturally written in the form above. Among these, we mention the maximum cut problem (MAXCUT) discussed below, the 0-1 knapsack, the linear quadratic regulator (LQR) control problem with binary inputs, etc.
Notice that we can model the Boolean constraints using quadratic equations, i.e.,
$$x_i \in \{-1, 1\} \quad \Longleftrightarrow \quad x_i^2 - 1 = 0.$$
These $n$ quadratic equations define a finite set, with an exponential number of elements, namely all the $n$-tuples with entries in $\{-1, 1\}$. There are exactly $2^n$ points in this set, so a direct enumeration approach to (1) is computationally prohibitive when $n$ is large (already for $n = 30$, we have $2^n \approx 10^9$).
We can thus write the equivalent polynomial formulation:
$$\begin{aligned} \text{minimize} \quad & x^T Q x \\ \text{subject to} \quad & x_i^2 = 1 \end{aligned} \qquad (2)$$
We will denote the optimal value and optimal solution of this problem as $f^\star$ and $x^\star$, respectively. It is well-known that the decision version of this problem is NP-complete (e.g., [GJ79]). Notice that this is true even if the matrix $Q$ is positive definite (i.e., $Q \succ 0$), since we can always make $Q$ positive definite by adding to it a constant multiple of the identity (this only shifts the objective by a constant).
Example 1 (MAXCUT) The maximum cut (MAXCUT) problem consists in finding a partition of the nodes of a graph $G = (V, E)$ into two disjoint sets $V_1$ and $V_2$ ($V_1 \cap V_2 = \emptyset$, $V_1 \cup V_2 = V$), in such a way as to maximize the number of edges that have one endpoint in $V_1$ and the other in $V_2$. It has important practical applications, such as optimal circuit layout. The decision version of this problem (does there exist a cut with value greater than or equal to $K$?) is NP-complete [GJ79].
We can easily rewrite the MAXCUT problem as a binary optimization problem. A standard formulation (for the weighted problem) is the following:
$$\max_{y_i \in \{-1,1\}}\ \frac{1}{4} \sum_{i,j} w_{ij}\,(1 - y_i y_j), \qquad (3)$$
where $w_{ij}$ is the weight corresponding to the $(i,j)$ edge, and is zero if the nodes $i$ and $j$ are not connected. The constraints $y_i \in \{-1, 1\}$ are equivalent to the quadratic constraints $y_i^2 = 1$.
We can easily convert the MAXCUT formulation into binary quadratic programming. Removing the constant term, and changing the sign, the original problem is clearly equivalent to:
$$\min_{y_i^2 = 1}\ \sum_{i,j} w_{ij}\, y_i y_j. \qquad (4)$$
1.1 Semidefinite relaxations
Computing good solutions to the binary optimization problem given in (2) is a quite difficult task, so it is of interest to produce accurate bounds on its optimal value. As in all minimization problems, upper bounds can be directly obtained from feasible points. In other words, if $x_0 \in \mathbb{R}^n$ has entries equal to $\pm 1$, it always holds that $f^\star \leq x_0^T Q x_0$ (of course, for a poorly chosen $x_0$, this upper bound may be very loose).
To prove lower bounds, we need a different technique. There are several approaches to do this, but as we will see in detail in the next sections, many of them will turn out to be exactly equivalent in the end. Indeed, many of these different approaches will yield a characterization of a lower bound in terms of the following primal-dual pair of semidefinite programming problems:
$$\begin{aligned} \text{minimize} \quad & \mathrm{Tr}\, QX \\ \text{subject to} \quad & X_{ii} = 1 \\ & X \succeq 0 \end{aligned} \qquad\qquad \begin{aligned} \text{maximize} \quad & \mathrm{Tr}\,\Lambda \\ \text{subject to} \quad & Q \succeq \Lambda \\ & \Lambda \ \text{diagonal} \end{aligned} \qquad (5)$$
In the next sections, we will derive these SDPs several times, in a number of different ways. Let us notice here first that for this primal-dual pair of SDP, strong duality always holds, and both achieve their corresponding optimal solutions (why?).
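As a quick numerical illustration of (5), the following Python/CVXPY sketch (with a hypothetical random Q) solves the primal relaxation and compares the resulting lower bound against the exact optimum obtained by enumeration.

```python
import numpy as np
import cvxpy as cp

# A hypothetical instance: random symmetric Q, small n.
rng = np.random.default_rng(2)
n = 8
Q = rng.normal(size=(n, n)); Q = (Q + Q.T) / 2

# Primal side of (5): min Tr(QX) s.t. diag(X) = 1, X PSD.
X = cp.Variable((n, n), symmetric=True)
relax = cp.Problem(cp.Minimize(cp.trace(Q @ X)),
                   [cp.diag(X) == 1, X >> 0])
relax.solve()
print(relax.value)   # lower bound on min over x in {-1,1}^n of x'Qx

# Brute force over all 2^n sign vectors (feasible since n is small here).
best = min(x @ Q @ x for x in
           (np.array(bits) * 2 - 1 for bits in np.ndindex(*(2,) * n)))
print(best)          # true optimum f*, never below the SDP bound
```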
1.2 Lagrangian duality
A general approach to obtain lower bounds on the value of (non)convex minimization problems is to use Lagrangian duality. As we have seen, the original Boolean minimization problem can be written as:
$$\begin{aligned} \text{minimize} \quad & x^T Q x \\ \text{subject to} \quad & x_i^2 - 1 = 0. \end{aligned} \qquad (6)$$
For notational convenience, let $\Lambda := \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$. Then, the Lagrangian function can be written as:
$$L(x, \lambda) = x^T Q x - \sum_{i=1}^n \lambda_i (x_i^2 - 1) = x^T (Q - \Lambda) x + \mathrm{Tr}\,\Lambda.$$
For the dual function $g(\lambda) := \inf_x L(x, \lambda)$ to be bounded below, we need the implicit constraint that the matrix $Q - \Lambda$ must be positive semidefinite. In this case, the optimal value of $x$ is zero, yielding $g(\lambda) = \mathrm{Tr}\,\Lambda$, and thus we obtain a lower bound on $f^\star$ given by the dual side of (5). Indeed, for any $\Lambda$ with $Q \succeq \Lambda$ and any $x$ with entries $x_i \in \{+1, -1\}$,
$$x^T Q x \geq x^T \Lambda x = \sum_{i=1}^n \Lambda_{ii}\, x_i^2 = \mathrm{Tr}\,\Lambda,$$
where the first inequality follows from $Q \succeq \Lambda$, the second equation holds since the matrix $\Lambda$ is diagonal, and the third one holds since $x_i \in \{+1, -1\}$.

1.3 Geometric interpretation
There is also a nice corresponding geometric interpretation. For simplicity, we assume without loss of generality that $Q$ is positive definite. Then, the problem (2) can be interpreted as finding the largest value of $\gamma$ for which the ellipsoid $\{x \in \mathbb{R}^n \,|\, x^T Q x \leq \gamma\}$ does not contain a vertex of the unit hypercube.
Consider now the two ellipsoids in $\mathbb{R}^n$ defined by:
$$\mathcal{E}_1 = \{x \in \mathbb{R}^n \,|\, x^T Q x \leq \mathrm{Tr}\,\Lambda\}, \qquad \mathcal{E}_2 = \{x \in \mathbb{R}^n \,|\, x^T \Lambda x \leq \mathrm{Tr}\,\Lambda\}.$$
The principal axes of the ellipsoid $\mathcal{E}_2$ are aligned with the coordinate axes (since $\Lambda$ is diagonal), and furthermore its boundary contains all the vertices of the unit hypercube. Also, it is easy to see that the condition $Q \succeq \Lambda$ implies $\mathcal{E}_1 \subseteq \mathcal{E}_2$.
With these facts, it is easy to understand the related problem that the SDP relaxation is solving: dilating $\mathcal{E}_1$ as much as possible, while ensuring the existence of another ellipsoid $\mathcal{E}_2$ with coordinate-aligned axes and touching the hypercube at all $2^n$ vertices; see Figure 1 for an illustration.
1.4 Probabilistic interpretation
[Figure 2: The three-dimensional spectraplex, i.e., the set of $3 \times 3$ positive semidefinite matrices with unit diagonal.]

The standard semidefinite relaxation described above can also be motivated via a probabilistic argument. For this, assume that rather than choosing the optimal $x$ in a deterministic fashion, we want to find instead a probability distribution that will yield good solutions on average. For symmetry reasons, we can always restrict ourselves to distributions with zero mean. The objective value then becomes
$$\mathrm{E}[x^T Q x] = \mathrm{E}[\mathrm{Tr}\,(Q x x^T)] = \mathrm{Tr}\,(Q\, \mathrm{E}[x x^T]) = \mathrm{Tr}\, QX, \qquad (8)$$
where $X$ is the covariance matrix of the distribution (which is necessarily positive semidefinite). For the constraints, we may require that the solutions we generate be feasible on expectation, thus having:
$$\mathrm{E}[x_i^2] = X_{ii} = 1. \qquad (9)$$
Minimizing the expected cost (8) under the constraint (9) yields the primal side of the SDP relaxation presented in (5).
1.5 Lifting and rank relaxation
We present yet another derivation of the SDP relaxations, this time focused on the primal side. Recall the original formulation of the optimization problem (2). Define now $X := x x^T$. By construction, the matrix $X \in \mathcal{S}^n$ satisfies $X \succeq 0$, $X_{ii} = x_i^2 = 1$, and has rank one. Conversely, any matrix $X$ with
$$X \succeq 0, \qquad X_{ii} = 1, \qquad \mathrm{rank}\, X = 1$$
necessarily has the form $X = x x^T$ for some $\pm 1$ vector $x$ (why?). Furthermore, by the cyclic property of the trace, we can express the objective function directly in terms of the matrix $X$, via:
$$x^T Q x = \mathrm{Tr}\, x^T Q x = \mathrm{Tr}\, Q x x^T = \mathrm{Tr}\, QX.$$
As a consequence, the original problem (2) can be exactly rewritten as:
$$\begin{aligned} \text{minimize} \quad & \mathrm{Tr}\, QX \\ \text{subject to} \quad & X_{ii} = 1, \quad X \succeq 0 \\ & \mathrm{rank}(X) = 1. \end{aligned}$$
This is almost an SDP problem (all the constraints are either linear or conic), except for the rank one constraint on $X$. Since this is a minimization problem, a lower bound on the solution can be obtained by dropping the (nonconvex) rank constraint, which enlarges the feasible set.
A useful interpretation is in terms of a nonlinear lifting to a higher dimensional space. Indeed, rather than solving the original problem in terms of the $n$-dimensional vector $x$, we are instead solving for the $n \times n$ matrix $X$, effectively converting the problem from $\mathbb{R}^n$ to $\mathcal{S}^n$ (which has dimension $\binom{n+1}{2}$).
Observe that this line of reasoning immediately shows that if we find an optimal solution $X$ of the SDP (5) that has rank one, then we have solved the original problem. Indeed, in this case the upper and lower bounds on the solution coincide.
As a graphical illustration, in Figure 2 we depict the set of $3 \times 3$ positive semidefinite matrices of unit diagonal. The rank one matrices correspond to the four vertices of this convex set, and are in (two-to-one) correspondence with the eight 3-vectors with $\pm 1$ entries.
In general, it is not the case that the optimal solution of the SDP relaxation will be rank one.
However, as we will see in the next section, it is possible to use rounding schemes to obtain nearby
rank one solutions. Furthermore, in some cases, it is possible to do so while obtaining some approximation
guarantees on the quality of the rounded solutions.
2 Bounds: Goemans-Williamson and Nesterov
So far, our use of the SDP relaxation (5) has been limited to providing only a posteriori bounds on the
optimal solution of the original minimization problem. However, two desirable features are missing:
Approximation guarantees: is it possible to prove general properties on the quality of the bounds
obtained by SDP?
Feasible solutions: can we (somehow) use the SDP relaxations to provide not just bounds, but
actual feasible points with good (or optimal) values of the objective?
As we will see, it turns out that both questions can be answered in the positive. As it has been shown
by Goemans and Williamson [GW95] in the MAXCUT case, and Nesterov in a more general setting,
we can actually achieve both of these objectives by randomly rounding in an appropriate manner the
solution X of this relaxation. We discuss these results below.
2.1 Goemans-Williamson rounding
In their celebrated MAXCUT paper, Goemans and Williamson developed the following randomized method for finding a good feasible cut from the solution of the SDP (a numerical sketch of the procedure is given after this list):
- Factorize $X$ as $X = V^T V$, where $V = [v_1 \ldots v_n] \in \mathbb{R}^{r \times n}$, where $r$ is the rank of $X$. Then $X_{ij} = v_i^T v_j$, and since $X_{ii} = 1$ this factorization gives $n$ vectors $v_i$ on the unit sphere in $\mathbb{R}^r$. Instead of assigning either $1$ or $-1$ to each variable, we have assigned to each $x_i$ a point on the unit sphere in $\mathbb{R}^r$.
- Now, choose a random hyperplane in $\mathbb{R}^r$, and assign to each variable $x_i$ either a $+1$ or a $-1$, depending on which side of the hyperplane the point $v_i$ lies.
It turns out that this procedure gives a solution that, on average, is quite close to the value of the SDP bound. We will compute the expected value of the rounded solution in a slightly different form from the original G-W argument, but one that will be helpful later.
The random hyperplane can be characterized by its normal vector $p$, which is chosen to be uniformly distributed on the unit sphere (e.g., by suitably normalizing a standard multivariate Gaussian random variable). Then, according to the description above, the rounded solution is given by $x_i = \mathrm{sign}(p^T v_i)$.
The expected value of this solution can then be written as:
$$\mathrm{E}_p[x^T Q x] = \sum_{ij} Q_{ij}\,\mathrm{E}_p[x_i x_j] = \sum_{ij} Q_{ij}\,\mathrm{E}_p[\mathrm{sign}(p^T v_i)\,\mathrm{sign}(p^T v_j)].$$
[Figure 3: Bound on the inverse cosine function: the plot compares $(2/\pi)\arccos t$ with $\alpha(1-t)$, for $\alpha \approx 0.878$.]
We can easily compute the value of this expectation. Consider the plane spanned by $v_i$ and $v_j$, and let $\theta_{ij}$ be the angle between these two vectors. Then, it is easy to see that the desired expectation is equal to the probability that both points are on the same side of the hyperplane, minus the probability that they are on different sides. These probabilities are $1 - \frac{\theta_{ij}}{\pi}$ and $\frac{\theta_{ij}}{\pi}$, respectively. Therefore,
$$\mathrm{E}_p[x^T Q x] = \sum_{ij} Q_{ij}\Big(1 - \frac{2\,\theta_{ij}}{\pi}\Big) = \sum_{ij} Q_{ij}\Big(1 - \frac{2}{\pi}\arccos(v_i^T v_j)\Big) = \frac{2}{\pi}\sum_{ij} Q_{ij}\arcsin X_{ij}. \qquad (10)$$
Notice that the expression is of course well-defined, since if $X$ is PSD and has unit diagonal, all its entries are bounded in absolute value by 1. This result exactly characterizes the expected value of the rounding procedure, as a function of the optimal solution of the SDP. We would like, however, to directly relate this quantity to the optimal solution of the original optimization problem. For this, we will need additional assumptions on the matrix $Q$. We discuss next two of the most important results in this direction.
2.2 MAXCUT bound
Recall from (3) that for the MAXCUT problem, the objective function does not only include the quadratic
part, but there is actually a constant term:
$$\frac{1}{4}\sum_{ij} w_{ij}\,(1 - y_i y_j).$$
The expected value of the cut is then:
$$c_{\text{sdp-expected}} = \frac{1}{4}\sum_{ij} w_{ij}\Big(1 - \frac{2}{\pi}\arcsin X_{ij}\Big) = \frac{1}{4}\cdot\frac{2}{\pi}\sum_{ij} w_{ij}\arccos X_{ij},$$
where we have used the identity $\arcsin t + \arccos t = \frac{\pi}{2}$.
. On the other hand, the optimal solution of the
primal SDP gives an upper bound on the cut capacity equal to:
$$c_{\text{sdp-upper-bound}} = \frac{1}{4}\sum_{ij} w_{ij}\,(1 - X_{ij}).$$
To relate these two quantities, we look for a constant $\alpha$ such that
$$\alpha\,(1-t) \leq \frac{2}{\pi}\arccos t \qquad \text{for all } t \in [-1, 1];$$
the largest such constant is $\alpha \approx 0.878$ (see Figure 3). Then,
$$\alpha\, c_{\text{sdp-upper-bound}} = \frac{\alpha}{4}\sum_{ij} w_{ij}\,(1 - X_{ij}) \leq \frac{1}{4}\cdot\frac{2}{\pi}\sum_{ij} w_{ij}\arccos X_{ij} = c_{\text{sdp-expected}}.$$
Notice that here we have used the nonnegativity of the weights (i.e., $w_{ij} \geq 0$). Thus, so far we have the following inequalities:
- $c_{\text{sdp-upper-bound}} \leq \frac{1}{\alpha}\, c_{\text{sdp-expected}}$
- Also clearly $c_{\text{sdp-expected}} \leq c_{\max}$
- And $c_{\max} \leq c_{\text{sdp-upper-bound}}$
Putting it all together, we can sandwich the value of the relaxation as follows:
$$\alpha\, c_{\text{sdp-upper-bound}} \leq c_{\text{sdp-expected}} \leq c_{\max} \leq c_{\text{sdp-upper-bound}}.$$
2.3 Nesterov's $\frac{2}{\pi}$ result
A result by Nesterov generalizes the MAXCUT bound described above, but for a larger class of problems. The original formulation is for the case of binary maximization, and applies to the case when the matrix $A$ is positive semidefinite. Since the problem is homogeneous, the optimal value is guaranteed to be nonnegative.
As we have seen, the expected value of the solution after randomized rounding is given by (10). Since $X$ is positive semidefinite, it follows from the nonnegativity of the Taylor series of $\arcsin(t) - t$ and the Schur product theorem that
$$\arcsin[X] \succeq X,$$
where the arcsin function is applied componentwise. This inequality can be combined with (10) to give the bounds:
$$\frac{2}{\pi}\, f_{\text{sdp-upper-bound}} \leq f_{\text{sdp-expected}} \leq f_{\max} \leq f_{\text{sdp-upper-bound}},$$
where $2/\pi \approx 0.636$. For more details, see [BTN01, Section 4.3.4]. Among others, the paper [Meg01] presents several new results, as well as a review of many of the available approximation schemes.
3 Linearly constrained problems
In this section we extend the earlier results, to general quadratic optimization problems under linear and quadratic constraints. For notational simplicity, we write the constraints in homogeneous form, i.e., in terms of the vector $x = \begin{bmatrix} 1 & y^T \end{bmatrix}^T$.
The general primal form of the SDP optimization problems we are concerned with is
$$\begin{aligned} \text{minimize} \quad & x^T Q x \\ \text{subject to} \quad & x^T A_i x \geq 0 \\ & Bx \geq 0 \\ & x = \begin{bmatrix} 1 \\ y \end{bmatrix}. \end{aligned}$$
The corresponding primal and dual SDP relaxations are given by
$$\begin{aligned} \text{minimize} \quad & Q \bullet X \\ \text{subject to} \quad & e_1^T X e_1 = 1 \\ & A_i \bullet X \geq 0 \\ & B X e_1 \geq 0 \\ & B X B^T \geq 0 \\ & X \succeq 0 \end{aligned} \qquad\qquad \begin{aligned} \text{maximize} \quad & \gamma \\ \text{subject to} \quad & Q \succeq \gamma\, e_1 e_1^T + \sum_i \lambda_i A_i + e_1 \mu^T B + B^T \mu\, e_1^T + B^T \Theta B \\ & \lambda_i \geq 0, \quad \mu \geq 0, \quad \Theta \geq 0, \quad \Theta_{ii} = 0 \end{aligned} \qquad (11)$$
Here $e_1$ is the $n$-vector with a 1 in the first component, and all the rest being zero; the inequalities on $B X e_1$, $B X B^T$, $\mu$, and $\Theta$ are componentwise. The dual variables $\lambda_i$ can be interpreted as Lagrange multipliers associated with the quadratic constraints of the primal problem, while the nonnegative symmetric matrix $\Theta$ corresponds to pairwise products of the linear constraints.
References
[BTN01] A. Ben-Tal and A. Nemirovski. Lectures on modern convex optimization. MPS/SIAM Series on Optimization. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2001.
[GJ79] M. R. Garey and D. S. Johnson. Computers and Intractability: A guide to the theory of NP-completeness. W. H. Freeman and Company, 1979.
[GW95] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115-1145, 1995.
[Meg01] A. Megretski. Relaxations of quadratic programs in operator theory and system analysis. In Systems, approximation, singular integral operators, and related topics (Bordeaux, 2000), volume 129 of Oper. Theory Adv. Appl., pages 365-392. Birkhäuser, Basel, 2001.
MIT 6.256 - Algebraic techniques and semidefinite optimization February 19, 2010
Lecture 4
Lecturer: Pablo A. Parrilo Scribe: Pablo A. Parrilo
In this lecture we will review some basic elements of abstract algebra. We also introduce and begin
studying the main objects of our considerations, multivariate polynomials.
1 Review: groups, rings, fields
We present here standard background material on abstract algebra. Most of the definitions are from [Lan71, CLO97, DF91, BCR98].
Denition 1 A group consists of a set $G$ and a binary operation "$\cdot$" defined on $G$, for which the following conditions are satisfied:
1. Associative: $(a \cdot b) \cdot c = a \cdot (b \cdot c)$, for all $a, b, c \in G$.
2. Identity: There exists $1 \in G$ such that $a \cdot 1 = 1 \cdot a = a$, for all $a \in G$.
3. Inverse: Given $a \in G$, there exists $b \in G$ such that $a \cdot b = b \cdot a = 1$.
For example, the integers $\mathbb{Z}$ form a group under addition, but not under multiplication. Another example is the set $GL(n, \mathbb{R})$ of real nonsingular $n \times n$ matrices, under matrix multiplication.
If we drop the condition on the existence of an inverse, we obtain a monoid. Note that a monoid
always has at least one element, the identity. As an example, given a set $S$, the set of all strings of elements of $S$ is a monoid, where the monoid operation is string concatenation and the identity is the empty string $\epsilon$. Another example is given by $\mathbb{N}_0$, with the operation being addition (in this case, the identity is the zero). Monoids are also known as semigroups with identity.
In a group we only have one binary operation (multiplication). We will introduce another operation
(addition), and study the structure that results from their interaction.
Denition 2 A commutative ring (with identity) consists of a set $k$ and two binary operations "$\cdot$" and "$+$", defined on $k$, for which the following conditions are satisfied:
1. Associative: $(a + b) + c = a + (b + c)$ and $(a \cdot b) \cdot c = a \cdot (b \cdot c)$, for all $a, b, c \in k$.
2. Commutative: $a + b = b + a$ and $a \cdot b = b \cdot a$, for all $a, b \in k$.
3. Distributive: $a \cdot (b + c) = a \cdot b + a \cdot c$, for all $a, b, c \in k$.
4. Identities: There exist $0, 1 \in k$ such that $a + 0 = a \cdot 1 = a$, for all $a \in k$.
5. Additive inverse: Given $a \in k$, there exists $b \in k$ such that $a + b = 0$.
A simple example of a ring is the integers $\mathbb{Z}$ under the usual operations. After formally introducing polynomials, we will see a few more examples of rings.
If we add a requirement for the existence of multiplicative inverses, we obtain fields.
Denition 3 A field consists of a set $k$ and two binary operations "$\cdot$" and "$+$", defined on $k$, for which the following conditions are satisfied:
1. Associative: $(a + b) + c = a + (b + c)$ and $(a \cdot b) \cdot c = a \cdot (b \cdot c)$, for all $a, b, c \in k$.
2. Commutative: $a + b = b + a$ and $a \cdot b = b \cdot a$, for all $a, b \in k$.
3. Distributive: $a \cdot (b + c) = a \cdot b + a \cdot c$, for all $a, b, c \in k$.
4. Identities: There exist $0, 1 \in k$, where $0 \neq 1$, such that $a + 0 = a \cdot 1 = a$, for all $a \in k$.
5. Additive inverse: Given $a \in k$, there exists $b \in k$ such that $a + b = 0$.
6. Multiplicative inverse: Given $a \in k$, $a \neq 0$, there exists $c \in k$ such that $a \cdot c = 1$.
Any field is obviously a commutative ring. Some commonly used fields are the rationals $\mathbb{Q}$, the reals $\mathbb{R}$, and the complex numbers $\mathbb{C}$. There are also Galois or finite fields (the set $k$ has a finite number of elements), such as $\mathbb{Z}_p$, the set of integers modulo $p$, where $p$ is a prime. Another important field is given by $k(x_1, \ldots, x_n)$, the set of rational functions with coefficients in the field $k$, with the natural operations.
2 Polynomials and ideals
Consider a given field $k$, and let $x_1, \ldots, x_n$ be indeterminates. We can then define polynomials.
Denition 4 A polynomial $f$ in $x_1, \ldots, x_n$ with coefficients in a field $k$ is a finite linear combination of monomials:
$$f = \sum_\alpha c_\alpha x^\alpha = \sum_\alpha c_\alpha x_1^{\alpha_1} \cdots x_n^{\alpha_n}, \qquad c_\alpha \in k, \qquad (1)$$
where the sum is over a finite number of $n$-tuples $\alpha = (\alpha_1, \ldots, \alpha_n)$, $\alpha_i \in \mathbb{N}_0$. The set of all polynomials in $x_1, \ldots, x_n$ with coefficients in $k$ is denoted $k[x_1, \ldots, x_n]$.
It follows from the previous definitions that $k[x_1, \ldots, x_n]$, i.e., the set of polynomials in $n$ variables with coefficients in $k$, is a commutative ring with identity. We also notice that it is possible (and sometimes convenient) to define polynomials where the coefficients belong to a ring with identity, not necessarily to a field.

Denition 5 A form is a polynomial where all the monomials have the same degree $d := \sum_i \alpha_i$. In this case, the polynomial is homogeneous of degree $d$, since it satisfies $f(\lambda x_1, \ldots, \lambda x_n) = \lambda^d f(x_1, \ldots, x_n)$.

A polynomial in $n$ variables of degree $d$ has $\binom{n+d}{d}$ coefficients.
A commutative ring is called an integral domain if it has no zero divisors, i.e., $a \neq 0$, $b \neq 0$ implies $a \cdot b \neq 0$. Every field is also an integral domain (why?). Two examples of rings that are not integral domains are the set of matrices $\mathbb{R}^{n \times n}$, and the set of integers modulo $n$, when $n$ is a composite number (with the usual operations). If $k$ is an integral domain, then so is $k[x_1, \ldots, x_n]$.
Remark 6 Another important example of a ring (in this case, non-commutative) appears in systems and control theory, through the ring $M(s)$ of stable proper rational functions. This is the set of matrices (of fixed dimension) whose entries are rational functions of $s$ (i.e., in the field $\mathbb{C}(s)$), are bounded at infinity, and have all poles in the strict left-half plane. In this algebraic setting (usually called the coprime factorization approach), the question of finding a stabilizing controller is exactly equivalent to the solvability of a Diophantine equation $ax + by = 1$.
2.1 Algebraically closed and formally real fields
A very important property of a univariate polynomial $p$ is the existence of a root, i.e., an element $x_0$ for which $p(x_0) = 0$. Depending on the solvability of these equations, we can characterize a particularly nice class of fields.

Denition 7 A field $k$ is algebraically closed if every nonconstant polynomial in $k[x]$ has a root in $k$.
                           | Formally real | Not formally real
Algebraically closed       | (none)        | C
Not algebraically closed   | R, Q          | finite fields F_{p^k}

Table 1: Examples of fields.
If a field is algebraically closed, then it has an infinite number of elements (why?). What can we say about the most usual fields, $\mathbb{C}$ and $\mathbb{R}$? The Fundamental Theorem of Algebra (every nonconstant univariate polynomial has at least one complex root) shows that $\mathbb{C}$ is an algebraically closed field.
However, this is clearly not the case of $\mathbb{R}$, since for instance the polynomial $x^2 + 1$ does not have any real root. The lack of algebraic closure of $\mathbb{R}$ is one of the main sources of complications when dealing with systems of polynomial equations and inequalities. To deal with the case when the base field is not algebraically closed, the Artin-Schreier theory of formally real fields was introduced.
The starting point is one of the intrinsic properties of $\mathbb{R}$:
$$\sum_{i=1}^n x_i^2 = 0 \quad \Longrightarrow \quad x_1 = \cdots = x_n = 0. \qquad (2)$$
A field will be called formally real if it satisfies the above condition (clearly, $\mathbb{R}$ and $\mathbb{Q}$ are formally real, but $\mathbb{C}$ is not). As we can see from the definition, the theory of formally real fields has very strong connections with sums of squares, a notion that will reappear in several forms later in the course. For example, an alternative (but equivalent) statement of (2) is to say that a field is formally real if and only if the element $-1$ is not a sum of squares.
The relationships between these concepts, as well as a few examples, are presented in Table 1. Notice that if a field is algebraically closed, then it cannot be formally real, since we have $(\sqrt{-1})^2 + 1^2 = 0$ (and $\sqrt{-1}$ is in the field).
A related important notion is that of an ordered field:

Denition 8 A field $k$ is said to be ordered if a relation $>$ is defined on $k$ that satisfies:
1. If $a, b \in k$, then either $a > b$ or $a = b$ or $b > a$.
2. If $a > b$ and $c \in k$, $c > 0$, then $ac > bc$.
3. If $a > b$ and $c \in k$, then $a + c > b + c$.

A crucial result relating these two notions is the following:

Lemma 9 A field can be ordered if and only if it is formally real.

For a field to be ordered (or equivalently, formally real), it necessarily must have an infinite number of elements. This is somewhat unfortunate, since this rules out several modular methods for dealing with real solutions to polynomial inequalities.
2.2 Ideals
We consider next ideals, which are subrings with an absorbent property:
Denition 10 Let $R$ be a commutative ring. A subset $I \subseteq R$ is an ideal if it satisfies:
1. $0 \in I$.
2. If $a, b \in I$, then $a + b \in I$.
3. If $a \in I$ and $b \in R$, then $a \cdot b \in I$.
A simple example of an ideal is the set of even integers, considered as a subset of the integer ring $\mathbb{Z}$. Another important example is the set of nilpotent elements of a ring, i.e., those $x \in R$ for which there exists a positive integer $k$ such that $x^k = 0$. Also, notice that if the ideal $I$ contains the multiplicative identity 1, then $I = R$.
To introduce another important example of ideals, we need to define the concept of an algebraic variety as the zero set of a set of polynomial equations:

Denition 11 Let $k$ be a field, and let $f_1, \ldots, f_s$ be polynomials in $k[x_1, \ldots, x_n]$. Let the set $V$ be
$$\mathbf{V}(f_1, \ldots, f_s) = \{(a_1, \ldots, a_n) \in k^n : f_i(a_1, \ldots, a_n) = 0 \ \text{ for all } 1 \leq i \leq s\}.$$
We call $\mathbf{V}(f_1, \ldots, f_s)$ the affine variety defined by $f_1, \ldots, f_s$.

Then, the set of polynomials that vanish on a given variety, i.e.,
$$\mathbf{I}(V) = \{f \in k[x_1, \ldots, x_n] : f(a_1, \ldots, a_n) = 0 \ \ \forall (a_1, \ldots, a_n) \in V\},$$
is an ideal, called the ideal of $V$.
By Hilbert's Basis Theorem [CLO97], $k[x_1, \ldots, x_n]$ is a Noetherian ring, i.e., every ideal $I \subseteq k[x_1, \ldots, x_n]$ is finitely generated. In other words, there always exists a finite set $f_1, \ldots, f_s \in k[x_1, \ldots, x_n]$ such that for every $f \in I$, we can find $g_i \in k[x_1, \ldots, x_n]$ that verify $f = \sum_{i=1}^s g_i f_i$.
We also define the radical of an ideal:

Denition 12 Let $I \subseteq k[x_1, \ldots, x_n]$ be an ideal. The radical of $I$, denoted $\sqrt{I}$, is the set
$$\sqrt{I} := \{f \,:\, f^k \in I \ \text{for some integer } k \geq 1\}.$$
It is clear that $I \subseteq \sqrt{I}$.
2.3 Associative algebras
Another important notion, which we will encounter at least twice later in the course, is that of an associative algebra.

Denition 14 An associative algebra $\mathcal{A}$ over $\mathbb{C}$ is a vector space with a $\mathbb{C}$-bilinear operation $\cdot : \mathcal{A} \times \mathcal{A} \to \mathcal{A}$ that satisfies
$$x \cdot (y \cdot z) = (x \cdot y) \cdot z, \qquad \forall x, y, z \in \mathcal{A}.$$
In general, associative algebras do not need to be commutative (i.e., $x \cdot y = y \cdot x$). However, that is an important special case, with many interesting properties. We list below several examples of finite dimensional associative algebras:
- The full matrix algebra $\mathbb{C}^{n \times n}$, with the standard product.
- The subalgebra of square matrices with equal row and column sums.
- The diagonal, lower triangular, or circulant matrices.
- The $n$-dimensional algebra generated by a single $n \times n$ matrix.
- The incidence algebra of a partially ordered finite set.
- The group algebra: formal $\mathbb{C}$-linear combinations of group elements.
- Polynomial multiplication modulo a zero dimensional ideal.
- The Bose-Mesner algebra of an association scheme.
We will discuss the last three in more detail later in the course.
3 Questions about polynomials
There are many natural questions that we may want to answer about polynomials, even in the univariate
case. Among them, we mention:
- When does a univariate polynomial have only real roots?
- What conditions must it satisfy for all roots to be real?
- When does a polynomial satisfy $p(x) \geq 0$ for all $x$?
We will answer many of these next week.
References
[BCR98] J. Bochnak, M. Coste, and M.-F. Roy. Real Algebraic Geometry. Springer, 1998.
[CLO97] D. A. Cox, J. B. Little, and D. O'Shea. Ideals, varieties, and algorithms: an introduction to computational algebraic geometry and commutative algebra. Springer, 1997.
[DF91] D. S. Dummit and R. M. Foote. Abstract algebra. Prentice Hall Inc., Englewood Cliffs, NJ, 1991.
[Lan71] S. Lang. Algebra. Addison-Wesley, 1971.
MIT 6.256 - Algebraic techniques and semidefinite optimization February 24, 2010
Lecture 5
Lecturer: Pablo A. Parrilo Scribe: Pablo A. Parrilo
In this lecture we study univariate polynomials, particularly questions regarding the existence of real roots and nonnegativity conditions. For instance:
- When does a univariate polynomial have only real roots?
- What conditions must it satisfy for all roots to be real?
- When is a polynomial nonnegative, i.e., when does it satisfy $p(x) \geq 0$ for all $x \in \mathbb{R}$?
1 Univariate polynomials
A univariate polynomial $p(x) \in \mathbb{R}[x]$ of degree $n$ has the form:
$$p(x) = p_n x^n + p_{n-1} x^{n-1} + \cdots + p_1 x + p_0, \qquad (1)$$
where the coefficients $p_k$ are real. We normally assume $p_n \neq 0$, and occasionally we will normalize it to $p_n = 1$, in which case we say that $p(x)$ is monic.
As we have seen, the field $\mathbb{C}$ of complex numbers is algebraically closed:

Theorem 1 (Fundamental theorem of algebra). Every nonzero univariate polynomial of degree $n$ has exactly $n$ complex roots (counted with multiplicity). Furthermore, we have the unique factorization
$$p(x) = p_n \prod_{k=1}^n (x - x_k),$$
where $x_k \in \mathbb{C}$ are the roots of $p(x)$.

If all the coefficients $p_k$ are real and $x_k$ is a root, then so is its complex conjugate $\bar{x}_k$. In other words, all complex roots appear in complex conjugate pairs.

2 Counting real roots
How many real roots does a polynomial have? There are many options, ranging from all roots being real (e.g., $(x-1)(x-2)\cdots(x-n)$), to all roots being complex (e.g., $x^{2d} + 1$). We will give a couple of different characterizations of the location of the roots of a polynomial, both of them in terms of associated matrices.
2.1 The companion matrix
A very well-known relationship between univariate polynomials and matrices is given through the so-
called companion matrix.
Denition 2. The companion matrix $C_p$ associated with the polynomial $p(x)$ in (1) is the $n \times n$ real matrix
$$C_p := \begin{bmatrix} 0 & 0 & \cdots & 0 & -p_0/p_n \\ 1 & 0 & \cdots & 0 & -p_1/p_n \\ 0 & 1 & \cdots & 0 & -p_2/p_n \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -p_{n-1}/p_n \end{bmatrix}.$$

Lemma 3. The characteristic polynomial of $C_p$ is (up to a constant) equal to $p(x)$. Formally, $\det(xI - C_p) = \frac{1}{p_n}\, p(x)$.

From this lemma, it directly follows that the eigenvalues of $C_p$ are exactly equal to the roots $x_i$ of $p(x)$, including multiple roots the appropriate number of times. In other words, if we want to obtain the roots of a polynomial, we can do this by computing instead the eigenvalues of the associated (nonsymmetric) companion matrix. In fact, that is exactly the way that MATLAB computes roots of polynomials; see the source file roots.m.
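As a minimal sketch (plain NumPy), the following builds $C_p$ for the polynomial of Example 4 below and recovers its roots as eigenvalues.

```python
import numpy as np

# p(x) = (x-1)(x-2)(x-5) = x^3 - 8x^2 + 17x - 10
p = np.array([1.0, -8.0, 17.0, -10.0])   # coefficients p_n, ..., p_0
n = len(p) - 1

Cp = np.zeros((n, n))
Cp[1:, :-1] = np.eye(n - 1)              # subdiagonal of ones
Cp[:, -1] = -p[:0:-1] / p[0]             # last column: -p_k / p_n, k = 0..n-1

print(np.linalg.eigvals(Cp))             # approximately [1, 2, 5]
```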
Left and right eigenvectors. The companion matrix $C_p$ is diagonalizable if and only if the polynomial $p(x)$ has no multiple roots. What are the corresponding diagonalizing matrices (equivalently, the right and left eigenvectors)?
Define the $n \times n$ Vandermonde matrix
$$V = \begin{bmatrix} 1 & x_1 & \ldots & x_1^{n-1} \\ 1 & x_2 & \ldots & x_2^{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & \ldots & x_n^{n-1} \end{bmatrix} \qquad (2)$$
where $x_1, \ldots, x_n \in \mathbb{C}$. It can be shown that the matrix $V$ is nonsingular if and only if all the $x_i$ are distinct. We have then the identity
$$V C_p = \mathrm{diag}[x_1, \ldots, x_n]\; V, \qquad (3)$$
and thus the left eigenvectors of $C_p$ are the rows of the Vandermonde matrix.
The right eigenvectors are of course given by the columns of $V^{-1}$, as can be easily seen by left- and right-multiplying (3) by this inverse. A natural interpretation of this dual basis (i.e., the columns of $V^{-1}$) is in terms of the Lagrange interpolating polynomials of the points $x_i$. These are a set of $n$ univariate polynomials that satisfy the property $L_j(x_i) = \delta_{ij}$, where $\delta$ is the Kronecker delta. It is easy to verify that the columns of $V^{-1}$ are the coefficients (in the monomial basis) of the corresponding Lagrange interpolating polynomials.

Example 4. Consider the polynomial $p(x) = (x-1)(x-2)(x-5)$. Its companion matrix is
$$C_p = \begin{bmatrix} 0 & 0 & 10 \\ 1 & 0 & -17 \\ 0 & 1 & 8 \end{bmatrix},$$
and it is diagonalizable since $p$ has simple roots. Ordering the roots as $\{1, 2, 5\}$, the corresponding Vandermonde matrix and its inverse are:
$$V = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 5 & 25 \end{bmatrix}, \qquad V^{-1} = \frac{1}{12}\begin{bmatrix} 30 & -20 & 2 \\ -21 & 24 & -3 \\ 3 & -4 & 1 \end{bmatrix}.$$
From the columns of $V^{-1}$, we can read the coefficients of the Lagrange interpolating polynomials; e.g., $L_1(x) = (30 - 21x + 3x^2)/12 = (x-2)(x-5)/4$.
Symmetric functions of roots. For any $A \in \mathbb{C}^{n \times n}$, we always have $\mathrm{Tr}\,A = \sum_{i=1}^n \lambda_i(A)$, and $\lambda_i(A^k) = \lambda_i(A)^k$. Therefore, it follows that $\sum_{i=1}^n x_i^k = \mathrm{Tr}[C_p^k]$. As a consequence of linearity, we have that if $q(x) = \sum_{j=0}^m q_j x^j$ is a univariate polynomial, then
$$\sum_{i=1}^n q(x_i) = \sum_{i=1}^n \sum_{j=0}^m q_j x_i^j = \sum_{j=0}^m q_j\,\mathrm{Tr}[C_p^j] = \mathrm{Tr}\Big[\sum_{j=0}^m q_j C_p^j\Big] = \mathrm{Tr}[q(C_p)], \qquad (4)$$
where the expression $q(C_p)$ indicates the evaluation of the polynomial $q(x)$ on the companion matrix of $p(x)$. Note that if $p$ is monic, then the final expression in (4) is a polynomial in the coefficients of $p$. This is an identity that we will use several times in the sequel.

Remark 5. Our presentation of the companion matrix has been somewhat unmotivated, other than noticing that it just works. After presenting some additional material on Gröbner bases, we will revisit this construction, where we will give a natural interpretation of $C_p$ as representing a well-defined linear operator in the quotient ring $\mathbb{R}[x]/\langle p(x)\rangle$. This will enable a very appealing extension of many results about companion matrices to multivariate polynomials, in the case where the underlying system has only a finite number of solutions (i.e., a zero dimensional ideal). For instance, the generalization of the diagonalizability of the companion matrix $C_p$ when $p(x)$ has only simple roots will be the fact that the multiplication algebra associated with a zero-dimensional ideal is semisimple if and only if the ideal is radical.
2.2 Inertia and signature
Definition 6. Consider a symmetric matrix A. The inertia of A, denoted I(A), is the integer triple
(n_+, n_0, n_-), where n_+, n_0, and n_- are the number of positive, zero, and negative eigenvalues of A,
respectively. The signature of A is n_+ − n_-.
Notice that, with the notation above, the rank of A is equal to n_+ + n_-. A symmetric positive
definite n × n matrix has inertia (n, 0, 0), while a positive semidefinite one has (n − k, k, 0) for some
k ≥ 0. The inertia is an important invariant of a quadratic form, since it holds that I(A) = I(T^T A T),
where T is nonsingular. This invariance of the inertia of a matrix under congruence transformations is
known as Sylvester's law of inertia; see for instance [HJ95].
2.3 The Hermite form
While the companion matrix is quite useful, we now present a different characterization of the roots
of a polynomial. One advantage of this formulation is that we will be working with symmetric matrices.
Let q(x) be a fixed auxiliary polynomial. Consider the n × n symmetric Hankel matrix H_q(p) with
entries defined by

    [H_q(p)]_{jk} = Σ_{i=1}^n q(x_i) x_i^{j+k-2}.    (5)
Like every symmetric matrix, H_q(p) defines an associated quadratic form via

    f^T H_q(p) f
      = [f_0, f_1, . . . , f_{n-1}]
        [ Σ_i q(x_i)            Σ_i q(x_i) x_i       ...  Σ_i q(x_i) x_i^{n-1}
          Σ_i q(x_i) x_i        Σ_i q(x_i) x_i^2     ...  Σ_i q(x_i) x_i^n
              .                      .                .        .
          Σ_i q(x_i) x_i^{n-1}  Σ_i q(x_i) x_i^n     ...  Σ_i q(x_i) x_i^{2n-2} ]
        [f_0, f_1, . . . , f_{n-1}]^T
      = Σ_{i=1}^n q(x_i) (f_0 + f_1 x_i + ... + f_{n-1} x_i^{n-1})^2
      = Tr[(q f^2)(C_p)].

Although not immediately obvious from the definition (5), the expression above shows that when p(x)
is monic, the entries of H_q(p) are actually polynomials in the coefficients of p(x). Notice that we have
used (4) in the derivation of the last step.
Recall that a Vandermonde matrix defines a linear transformation mapping the coefficients of a degree
n − 1 polynomial f to its values (f(x_1), . . . , f(x_n)) at the roots. This yields the following result, where
for simplicity we assume that all the roots are distinct.

Theorem 7. Let the roots x_1, . . . , x_n of p be distinct. Then the signature of H_q(p) is equal to the
number of real roots x_j with q(x_j) > 0, minus the number of real roots with q(x_j) < 0.

Proof. Separating the real roots from the complex-conjugate pairs, we can rewrite the quadratic form as

    Σ_{j=1}^n q(x_j)(f_0 + f_1 x_j + ... + f_{n-1} x_j^{n-1})^2
      = Σ_{x_j ∈ R} q(x_j) f(x_j)^2 + Σ_{x_j, x̄_j ∈ C\R} ( q(x_j) f(x_j)^2 + q(x̄_j) f(x̄_j)^2 )
      = Σ_{x_j ∈ R} q(x_j) f(x_j)^2
        + 2 Σ_{x_j, x̄_j ∈ C\R} [Re f(x_j)  Im f(x_j)] [  Re q(x_j)  -Im q(x_j)
                                                         -Im q(x_j)  -Re q(x_j) ] [Re f(x_j)
                                                                                   Im f(x_j)].
Notice that an expression of the type f(x_i) is a linear form in [f_0, . . . , f_{n-1}]. Because of the assumption
that all the roots x_j are distinct, the linear forms {f(x_j)}_{j=1,...,n} are linearly independent (the corre-
sponding Vandermonde matrix is nonsingular), and thus so are the forms {f(x_j)}_{x_j∈R} together with
{Re f(x_j), Im f(x_j)}_{x_j∈C\R}. Therefore, the expression above gives a congruence transformation of H_q(p),
and we can obtain its signature by adding the signatures of the scalar elements q(x_j) and of the 2 × 2
blocks. The signature of the 2 × 2 blocks is always zero (they have zero trace), and thus the result follows.
In particular, notice that if we want to count the number of real roots, we can just use q(x) = 1. The
matrix corresponding to this quadratic form (called the Hermite form) is:

    H_1(p) = V^T V = [ s_0      s_1  ...  s_{n-1}
                       s_1      s_2  ...  s_n
                        .        .    .     .
                       s_{n-1}  s_n  ...  s_{2n-2} ],    s_k = Σ_{j=1}^n x_j^k.
The s_k are known as the power sums and can be computed using (4) (although there are much more
efficient ways, such as the Newton identities). When p(x) is monic, the s_k are polynomials of degree k
in the coefficients of p(x).
Example 8. Consider the monic cubic polynomial

    p(x) = x^3 + p_2 x^2 + p_1 x + p_0.

Then, the first five power sums are:

    s_0 = 3
    s_1 = -p_2
    s_2 = p_2^2 - 2 p_1
    s_3 = -p_2^3 + 3 p_1 p_2 - 3 p_0
    s_4 = p_2^4 - 4 p_1 p_2^2 + 2 p_1^2 + 4 p_0 p_2.
Lemma 9. The signature of H_1(p) is equal to the number of real roots of p(x) (counted without
multiplicity). The rank of H_1(p) is equal to the number of distinct complex roots of p(x).
Corollary 10. If p(x) has odd degree, there is always at least one real root.
Example 11. Consider p(x) = x^3 + 2x^2 + 3x + 4. The corresponding Hermite matrix is:

    H_1(p) = [  3  -2  -2
               -2  -2  -2
               -2  -2  18 ].

This matrix has one negative and two positive eigenvalues, all distinct (i.e., its inertia is (2, 0, 1)). Thus,
p(x) has three simple roots, and exactly one of them is real.
Sylvester's law of inertia guarantees that this result is actually coordinate-independent.
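The whole computation can be automated directly from the coefficients; the sketch below (Python/NumPy;
the Newton-identities helper is our own) builds H_1(p) for Example 11 and reads off the inertia:

    import numpy as np

    def power_sums(c, K):
        # s_k for the monic polynomial x^n + c[0] x^{n-1} + ... + c[n-1],
        # via Newton's identities (no root-finding involved)
        n = len(c)
        s = [float(n)]                                 # s_0 = n
        for k in range(1, K):
            t = -k * c[k - 1] if k <= n else 0.0
            for j in range(1, min(k - 1, n) + 1):
                t -= c[j - 1] * s[k - j]
            s.append(t)
        return s

    s = power_sums([2.0, 3.0, 4.0], 5)             # p(x) = x^3 + 2x^2 + 3x + 4
    H = np.array([[s[j + k] for k in range(3)] for j in range(3)])
    eigs = np.linalg.eigvalsh(H)
    print(H)                                        # [[3 -2 -2], [-2 -2 -2], [-2 -2 18]]
    print(int((eigs > 0).sum()) - int((eigs < 0).sum()))   # signature 1: one real root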
3 Nonnegativity
An important property of a polynomial is whether it only takes nonnegative values. As we will see, this
is of interest in a wide variety of applications.
Definition 12. A univariate polynomial p(x) is positive semidefinite or nonnegative if p(x) ≥ 0 for all
real values of x.
Clearly, if p(x) is nonnegative, then its degree must be an even number. The set of nonnegative
polynomials has very interesting properties. Perhaps the most appealing one for our purposes is the
following:
Theorem 13. Consider the set P_n of nonnegative univariate polynomials of degree less than or equal
to n (n is even). Then, identifying a polynomial with its n + 1 coefficients (p_n, . . . , p_0), the set P_n is a
proper cone (i.e., closed, convex, pointed, solid) in R^{n+1}.
An equivalent condition for the (nonconstant) univariate polynomial (1) to be strictly positive is
that p(x_0) > 0 for some x_0, and that it has no real roots. Thus, we can use Theorem 7 to write explicit
conditions for a polynomial p(x) to be nonnegative in terms of the signature of the associated Hermite
matrix H_1(p).
4 Sum of squares
Definition 14. A univariate polynomial p(x) is a sum of squares (SOS) if there exist q_1, . . . , q_m ∈ R[x]
such that

    p(x) = Σ_{k=1}^m q_k^2(x).
If a polynomial p(x) is a sum of squares, then it obviously satisfies p(x) ≥ 0 for all x ∈ R. Thus, a
SOS condition is a sufficient condition for global nonnegativity.
Interestingly, in the univariate case, the converse is also true:

Theorem 15. A univariate polynomial is nonnegative if and only if it is a sum of squares.
Proof. (⇐) Obvious: if p(x) = Σ_k q_k^2(x), then p(x) ≥ 0.
(⇒) Since p(x) is univariate, we can factorize it as

    p(x) = p_n ∏_j (x - r_j)^{n_j} ∏_k (x - a_k + i b_k)^{m_k} (x - a_k - i b_k)^{m_k},

where the r_j and a_k ± i b_k are the real and complex roots, respectively, of multiplicities n_j and m_k. Because
p(x) is nonnegative, p_n > 0 and the multiplicities of the real roots are even, i.e., n_j = 2 s_j.
Notice that (x - a + ib)(x - a - ib) = (x - a)^2 + b^2. Then, we can write

    p(x) = p_n ∏_j (x - r_j)^{2 s_j} ∏_k ( (x - a_k)^2 + b_k^2 )^{m_k}.

Since products of sums of squares are sums of squares, and all the factors in the expression above are
SOS, it follows that p(x) is SOS.
Furthermore, the two-squares identity (a^2 + b^2)(c^2 + d^2) = (ac - bd)^2 + (ad + bc)^2 allows us to
combine every partial product as a sum of only two squares.
Notice that the proof shows that if p(x) is SOS, then there exists a representation p(x) = q_1^2(x) + q_2^2(x).
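The proof above is constructive. Under the simplifying assumption that p has no real roots (as in the
example below), a Python/NumPy sketch of the two-squares decomposition reads:

    import numpy as np

    p = np.array([1.0, 0, 0, 0, 4.0])       # p(x) = x^4 + 4, positive on R
    r = np.roots(p)
    upper = r[r.imag > 0]                    # one root from each conjugate pair
    h = np.sqrt(p[0]) * np.poly(upper)       # for real x, p(x) = |h(x)|^2
    q1, q2 = h.real, h.imag                  # p = q1^2 + q2^2
    print(np.polyadd(np.polymul(q1, q1), np.polymul(q2, q2)))   # ~ [1. 0. 0. 0. 4.]

Here q_1(x) = x^2 - 2 and q_2(x) = -2x, so x^4 + 4 = (x^2 - 2)^2 + (2x)^2; real roots of even multiplicity
could be handled by assigning half of each multiplicity to h.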
As we will see very soon, we can decide whether a univariate polynomial is a sum of squares (equivalently,
whether it is nonnegative) by solving a semidefinite optimization problem.
5 Positive semidefinite matrices
Recall from Lecture 2 the (apparent) disparity between the stated conditions for a matrix to be positive
definite versus the semidefinite case. In the former, we could use a test (Sylvester's criterion) that
required the calculation of only n minors, while for the semidefinite case apparently we needed a much
larger number, 2^n - 1.
If the matrix X is positive definite, Sylvester's criterion requires the positivity of the leading principal
minors, i.e.,

    det X_{1,1} > 0,  det X_{12,12} > 0,  . . . ,  det X > 0.

For positive semidefiniteness, it is not enough to replace strict positivity with the nonstrict inequality;
a simple counterexample is the matrix

    [ 0   0
      0  -1 ],

for which the leading minors vanish, but which is not PSD. As mentioned, an alternative approach is given by
the following classical result:
Lemma 16. Let A ∈ S^n be a symmetric matrix. Then A ⪰ 0 if and only if all 2^n - 1 principal minors
of A are nonnegative.
Although the condition above requires the nonnegativity of 2^n - 1 expressions, it is possible to do
the same by checking only n inequalities:
Theorem 17 (e.g. [HJ95, p. 403]). A real n × n symmetric matrix A is positive semidefinite if and only
if all the coefficients p_i of its characteristic polynomial p(λ) = det(λI - A) = λ^n + p_{n-1} λ^{n-1} + ... + p_1 λ + p_0
alternate in sign, i.e., they satisfy p_i (-1)^{n-i} ≥ 0.
We prove this below, since we will use a slightly more general version of this result when discussing
hyperbolic polynomials. Note that in the n = 2 case, Theorem 17 is the familiar result that A ∈ S^2 is
positive semidefinite if and only if det A ≥ 0 and Tr A ≥ 0.
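Theorem 17 is immediate to implement; in the sketch below (Python/NumPy, our illustration), np.poly
returns the coefficients [1, p_{n-1}, . . . , p_0] of det(λI - A):

    import numpy as np

    def is_psd_by_charpoly(A, tol=1e-9):
        n = A.shape[0]
        coeffs = np.poly(A)[1:]          # [p_{n-1}, p_{n-2}, ..., p_0]
        signs = np.array([(-1) ** (j + 1) for j in range(n)])
        return bool(np.all(signs * coeffs >= -tol))   # p_i (-1)^{n-i} >= 0

    print(is_psd_by_charpoly(np.array([[2.0, -1], [-1, 2]])))   # True
    print(is_psd_by_charpoly(np.diag([0.0, -1.0])))             # False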
Lemma 18. Consider a monic univariate polynomial p(t) = t^n + Σ_{k=0}^{n-1} p_k t^k that has only real roots.
Then, all roots are nonpositive if and only if all coefficients are nonnegative (i.e., p_k ≥ 0, k = 0, . . . , n-1).
Proof. Since all roots of p(t) are real, this can be obtained from a direct application of Descartes' rule
of signs; see e.g. [BPR03]. For completeness, we present here a direct proof.
If all roots t_i are nonpositive (t_i ≤ 0), from the factorization

    p(t) = ∏_{i=1}^n (t - t_i)

it follows directly that all coefficients p_k are nonnegative.
For the other direction, from the nonnegativity of the coefficients it follows that p(0) ≥ 0 and that p(t)
is nondecreasing for t ≥ 0. If there exists a t_i > 0 such that p(t_i) = 0, then the polynomial must vanish
identically on the interval [0, t_i], which is impossible since it is monic and hence nonzero.
Definition 19. A set S ⊆ R^n is basic closed semialgebraic if it can be written as

    S = {x ∈ R^n | f_i(x) ≥ 0, h_j(x) = 0}

for some finite set of polynomials {f_i, h_j}.
Theorem 20. Both the primal and dual feasible sets of a semidefinite program are basic closed semialgebraic.
Proof. The condition X ⪰ 0 is equivalent to n nonstrict polynomial inequalities in the entries of X. This
can be conveniently shown by applying Lemma 18 to the characteristic polynomial of X, i.e.,

    p(λ) = det(λI + X) = λ^n + Σ_{k=0}^{n-1} p_k(X) λ^k,

where the p_k(X) are homogeneous polynomials of degree n - k in the entries of X. For instance, we
have p_0(X) = det X, and p_{n-1}(X) = Tr X.
Since X is symmetric, all its eigenvalues are real, and thus p(λ) has only real roots. Positive semidef-
initeness of X is equivalent to p(λ) having no roots that are strictly positive. It then follows that the
following two statements are equivalent:

    X ⪰ 0    ⟺    p_k(X) ≥ 0,  k = 0, . . . , n - 1.
Remark 21. These inequalities correspond to the elementary symmetric functions e_k evaluated at the
eigenvalues of the matrix X.
As we will see in subsequent lectures, the same inequalities will reappear when we consider a class
of optimization problems known as hyperbolic programs.
References
[BPR03] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in real algebraic geometry, volume 10 of
Algorithms and Computation in Mathematics. Springer-Verlag, Berlin, 2003.

[HJ95] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge, 1995.
MIT 6.256 - Algebraic techniques and semidefinite optimization February 26, 2010
Lecture 6
Lecturer: Pablo A. Parrilo Scribe: ???
Last week we learned about explicit conditions to determine the number of real roots of a univariate
polynomial. Today we will expand on these themes, and study two mathematical objects of fundamental
importance: the resultant of two polynomials, and the closely related discriminant.
The resultant will be used to decide whether two univariate polynomials have common roots, while
the discriminant will give information about the existence of multiple roots. Furthermore, we will see the
intimate connections between discriminants and the boundary of the cone of nonnegative polynomials.
Besides the properties described above, which are a direct consequence of their definitions, there are many
other interesting applications of resultants and discriminants. We describe a few of them below, and we
will encounter them again in later lectures, when studying elimination theory and the construction of
cylindrical algebraic decompositions. For much more information about resultants and discriminants,
particularly their generalizations to the sparse and multipolynomial case, we refer the reader to the very
readable introductory article [Stu98] and the books [CLO97, GKZ94].
1 Resultants
Consider two polynomials p(x) and q(x), of degrees n and m, respectively. We want to obtain an easily
checkable criterion to determine whether they have a common root, that is, whether there exists an x_0 ∈ C for
which p(x_0) = q(x_0) = 0. There are several approaches, seemingly different at first sight, for constructing
such a criterion:
Sylvester matrix: If p(x_0) = q(x_0) = 0, then we can write the following (n+m) × (n+m) linear
system:

    Syl_x(p, q) · [ x_0^{n+m-1}, x_0^{n+m-2}, . . . , x_0, 1 ]^T
      = [ p(x_0) x_0^{m-1}, . . . , p(x_0) x_0, p(x_0), q(x_0) x_0^{n-1}, . . . , q(x_0) x_0, q(x_0) ]^T = 0,

where the matrix on the left-hand side, called the Sylvester matrix Syl_x(p, q) associated to p and q,
consists of m shifted rows with the coefficients of p followed by n shifted rows with the coefficients of q:

    Syl_x(p, q) = [ p_n  p_{n-1}  ...  p_1  p_0
                         p_n      ...       p_1  p_0
                                   .                .
                              p_n ... p_2  p_1  p_0
                    q_m  q_{m-1}  ...  q_0
                         q_m      ...      q_0
                                   .            .
                              q_m ... q_2  q_1  q_0 ].

Since this system has the nonzero solution shown above, the Sylvester matrix is singular and thus its
determinant must vanish. It is not too difficult to show that the converse is also true; if det Syl_x(p, q) = 0,
then there exists a vector in the kernel of Syl_x(p, q) of the form shown in the matrix equation above,
and thus a common root x_0.
Root products and companion matrices: Let α_j, β_k be the roots of p(x) and q(x), respectively.
By construction, the expression

    ∏_{j=1}^n ∏_{k=1}^m (α_j - β_k)

vanishes if and only if there exists a root of p that is equal to a root of q. Although the computation
of this product seems to require explicit access to the roots, this can be avoided. Multiplying by
a convenient normalization factor, we have:

    p_n^m q_m^n ∏_{j=1}^n ∏_{k=1}^m (α_j - β_k) = p_n^m ∏_{j=1}^n q(α_j) = p_n^m det q(C_p)
                                               = (-1)^{nm} q_m^n ∏_{k=1}^m p(β_k) = (-1)^{nm} q_m^n det p(C_q).    (1)
(1)
Kronecker products: Using a well-known connection to Kronecker products, we can also write
(1) as
p
m
n
q
n
m
det(C
p
I
m
I
n
C
q
).
Bezout matrix: Given p(x) and q(x) as before, consider the bivariate function

    B(s, t) := ( p(s) q(t) - p(t) q(s) ) / (s - t).

It is easy to see that this is actually a polynomial in the variables s, t, and is invariant under
the interchange s ↔ t. Let d := max(n, m), and let Bez_x(p, q) be the symmetric d × d matrix that
represents this polynomial in the standard monomial basis, i.e.,

    B(s, t) = [1, s, . . . , s^{d-1}] · Bez_x(p, q) · [1, t, . . . , t^{d-1}]^T.

The Bezout matrix is singular if and only if p and q have a common root.
Notice the differences with the Sylvester matrix: while that approach requires a non-symmetric
(n+m) × (n+m) matrix depending linearly on the coefficients, in the Bezout approach the matrix
is smaller and symmetric, but with entries that depend bilinearly on the p_i, q_i.
It can be shown that all these constructions are equivalent. They define exactly the same polynomial,
called the resultant of p and q, denoted Res_x(p, q):

    Res_x(p, q) = det Syl_x(p, q)
                = p_n^m det q(C_p)
                = (-1)^{nm} q_m^n det p(C_q)
                = p_n^m q_m^n det(C_p ⊗ I_m - I_n ⊗ C_q)
                = (-1)^{n(n-1)/2} p_n^{m-n} det Bez_x(p, q).

The resultant is a homogeneous multivariate polynomial, with integer coefficients, and of degree n + m
in the n + m + 2 variables p_j, q_k. It vanishes if and only if the polynomials p and q have a common
root. Notice that the definition is not symmetric in its two arguments: Res_x(p, q) = (-1)^{nm} Res_x(q, p)
(of course, this does not matter when checking whether it is zero).
Remark 1. To compute the resultant of two polynomials p(x) and q(x) in Maple, you can use the
command resultant(p,q,x). In Mathematica, use instead Resultant[p,q,x].
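A self-contained numerical check of the equivalences is also easy; the sketch below (Python/NumPy, our
construction) assembles the Sylvester matrix and compares its determinant with the root-product
formula p_n^m ∏_j q(α_j):

    import numpy as np

    def sylvester(p, q):
        # p, q: coefficients in decreasing degree order, of degrees n and m
        n, m = len(p) - 1, len(q) - 1
        S = np.zeros((n + m, n + m))
        for i in range(m):                    # m shifted rows with the p_i
            S[i, i:i + n + 1] = p
        for i in range(n):                    # n shifted rows with the q_i
            S[m + i, i:i + m + 1] = q
        return S

    p = [1.0, -8, 17, -10]                    # roots 1, 2, 5
    print(np.linalg.det(sylvester(p, [1.0, -3, 2])))   # ~ 0: roots 1, 2 are shared
    alpha = np.array([1.0, 2.0, 5.0])
    print(np.linalg.det(sylvester(p, [1.0, 0, 1])),    # no common roots: both 260
          np.prod(np.polyval([1.0, 0, 1], alpha)))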
2 Discriminants
As we have seen, the resultant allows us to write an easily checkable condition for the simultaneous
vanishing of two univariate polynomials. Can we use the resultant to produce a condition for a polynomial
to have a double root? Recall that if a polynomial p(x) has a double root at x_0 (which can be real or
complex), then its derivative also vanishes at x_0. Thus, p(x) and dp(x)/dx have a common root, and we
can detect this by computing the resultant of the pair

    p(x),  dp(x)/dx;

up to a normalization factor, this resultant is precisely the discriminant Dis_x(p).
Similar to what we did in the resultant case, the discriminant can also be obtained by writing a
natural condition in terms of the roots α_i of p(x):

    Dis_x(p) = p_n^{2n-2} ∏_{j<k} (α_j - α_k)^2.

If p(x) has degree n, its discriminant is a homogeneous polynomial of degree 2n - 2 in its n + 1 coefficients
p_n, . . . , p_0.
Example 3. Consider the quadratic univariate polynomial p(x) = ax^2 + bx + c. Its discriminant is:

    Dis_x(p) = (1/a) Res_x(ax^2 + bx + c, 2ax + b) = b^2 - 4ac.

For the cubic polynomial p(x) = ax^3 + bx^2 + cx + d we have

    Dis_x(p) = -27 a^2 d^2 + 18 abcd + b^2 c^2 - 4 b^3 d - 4 a c^3.
3 Applications
3.1 Polynomial equations
One of the most natural applications of resultants is in the solution of polynomial equations in two
variables. For this, consider a polynomial system

    p(x, y) = 0,    q(x, y) = 0,    (2)

with only a finite number of solutions (which is generically the case). Consider a fixed value y_0, and
the two univariate polynomials p(x, y_0), q(x, y_0). If y_0 corresponds to the y-component of a root, then
these two univariate polynomials clearly have a common root, hence their resultant vanishes.
Therefore, to solve (2), we can compute Res_x(p, q), which is a univariate polynomial in y. Solving
this univariate polynomial, we obtain a finite number of points y_i. Backsubstituting in p (or q), we
obtain the corresponding values of x_i. Naturally, the same construction can be used by computing first
the univariate polynomial in x given by Res_y(p, q).
Example 4. Let p(x, y) = 2xy + 3y^3 - 2x^3 - x - 3x^2 y^2, and q(x, y) = 2x^2 y^2 - 4y^3 - x^2 + 4y + x^2 y. The
resultant (with respect to the x variable) is

    Res_x(p, q) = y (y + 1)^3 (72 y^8 - 252 y^7 + 270 y^6 - 145 y^5 + 192 y^4 - 160 y^3 + 28 y + 4).

One particular root of this polynomial is y ≈ 1.3853.
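The computation of Example 4 takes a few lines in a computer algebra system; here is a sketch in Python
with SymPy (our choice of tool, mirroring the Maple/Mathematica commands of Remark 1):

    import sympy as sp

    x, y = sp.symbols('x y')
    p = 2*x*y + 3*y**3 - 2*x**3 - x - 3*x**2*y**2
    q = 2*x**2*y**2 - 4*y**3 - x**2 + 4*y + x**2*y

    r = sp.resultant(p, q, x)                 # univariate polynomial in y
    for y0 in sp.Poly(r, y).nroots():
        if sp.im(y0) == 0:                    # real y-components only
            xs = sp.Poly(p.subs(y, y0), x).nroots()
            print(y0, [x0 for x0 in xs if abs(q.subs({x: x0, y: y0})) < 1e-6])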
3.2 Implicitization of plane rational curves
Consider a plane curve parametrized by rational functions, i.e.,

    x(t) = p_1(t)/q_1(t),    y(t) = p_2(t)/q_2(t).

What is the implicit equation of the curve, i.e., what constraint h(x, y) = 0 must the points (x, y) ∈ R^2
that lie on the curve satisfy? The corresponding equation can be easily obtained by computing a resultant
to eliminate the parametrizing variable t, i.e.,

    h(x, y) = Res_t( q_1(t) x - p_1(t), q_2(t) y - p_2(t) ).
Example 5. Consider the curve described by the parametrization

    x(t) = t(1 + t^2)/(1 + t^4),    y(t) = t(1 - t^2)/(1 + t^4).    (3)

Its implicit equation can be computed via the resultant:

    Res_t( (1 + t^4) x - t(1 + t^2), (1 + t^4) y - t(1 - t^2) ) = 4y^4 + 8y^2 x^2 + 4x^4 + 4y^2 - 4x^2.
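The same computation in Python/SymPy (our illustration):

    import sympy as sp

    x, y, t = sp.symbols('x y t')
    h = sp.resultant((1 + t**4)*x - t*(1 + t**2),
                     (1 + t**4)*y - t*(1 - t**2), t)
    print(sp.expand(h))   # 4x^4 + 8x^2 y^2 + 4y^4 - 4x^2 + 4y^2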
Remark 6. The inverse problem (given an implicit polynomial equation for a curve, find a rational
parametrization) is not always solvable. In fact, there is a full characterization of when this is possible,
in terms of a topological invariant of the curve called the genus (the rationally parametrizable curves are
exactly those of genus zero).
3.3 Eigenvalue distribution of random matrices
This section is based on the results in [RE08]. The eigenvalues of a random symmetric matrix belonging
to a given ensemble can be characterized in terms of the asymptotic eigenvalue distribution F(x) (e.g., the
semicircle law, Marčenko–Pastur, etc.). Often, rather than the actual distribution, it is more convenient
to use instead some other equivalent object, such as its moment generating function, Stieltjes transform,
R-transform, etc. For many ensembles of interest, these auxiliary transforms F̂(z) are algebraic functions,
in the sense that they satisfy an equation of the form Φ(F̂(z), z) = 0, where Φ(s, t) is a bivariate
polynomial, and furthermore they can all be derived from each other. As a consequence, to each given
random ensemble of this class we can associate a bivariate polynomial that uniquely describes the limiting
eigenvalue distribution.
A natural question arises: given two matrices M_1, M_2, belonging to random ensembles with associated
polynomials Φ_1(s, t) and Φ_2(s, t), what can be said about the eigenvalue distribution of the sum M_1 + M_2
(or the product M_1 M_2)? Voiculescu has shown that under a certain natural independence condition
("freeness"), the R-transform of the sum is the sum of the individual transforms (this is somewhat akin
to the well-known fact that the pdf of the sum of independent random variables is the convolution of the
individual pdfs, or the additivity of the moment generating function). Under the freeness condition, the
bivariate polynomial associated with the ensemble M_3 = M_1 + M_2 can be computed from the individual
polynomials Φ_1, Φ_2 via:

    Φ_3(s, t) = Res_u( Φ_1(s - u, t), Φ_2(u, t) ).

Similar expressions are also possible for the product M_1 M_2, also in terms of resultants. This allows the
computation of the spectra of arbitrary random ensembles that can be built from individual building
blocks with known eigenvalue distributions.
We cannot provide a full description here of this area, and the very interesting connections with free
probability; we refer the reader to [RE08] for a more complete account.
Figure 1: The shaded region corresponds to the polynomial x^4 + 2ax^2 + b being nonnegative. The
numbers indicate how many real roots p(x) has.
4 The set of nonnegative polynomials
One of the main reasons why nonnegativity conditions on polynomials are difficult is that these
sets can have a quite complicated structure, even though they are always convex.
Recall from last lecture that we have defined P_n ⊂ R^{n+1} as the set of nonnegative polynomials of
degree n. It is easy to see that if p(x) lies on the boundary of the set P_n, then it must have a real
root, of multiplicity at least two. Indeed, if there is no real root, then p(x) is in the strict interior of P_n
(small enough perturbations will not create a root), and if it has a simple real root it clearly cannot be
nonnegative.
Thus, on the boundary of P_n, the discriminant of p(x) must necessarily vanish. However, it turns
out that Dis_x(p) does not vanish only on the boundary, but also at points inside the set.
Why is this?
Example 7. Consider the univariate polynomial p(x) = x^4 + 2ax^2 + b. For what values of a, b does it
hold that p(x) ≥ 0 for all x ∈ R? Since the leading term x^4 has even degree and a strictly positive
coefficient, p(x) is strictly positive if and only if it has no real roots. The discriminant of p(x) is equal to
256 b (a^2 - b)^2. The set of (a, b) for which p(x) is nonnegative is shown in Figure 1.
Here is a slightly different example, showing the same phenomenon.
Example 8. As another example, consider now p(x) = x^4 + 4ax^3 + 6bx^2 + 4ax + 1. Its discriminant, in
factored form, is equal to 256 (1 + 3b + 4a)(1 + 3b - 4a)(1 + 2a^2 - 3b)^2. The corresponding nonnegativity
region and number of real roots are presented in Figure 2.
As we can see, this creates some difficulties. For instance, even though we have a perfectly valid
analytic expression for the boundary of the set, we cannot get a good sense of how far we are from
the boundary by looking at the absolute value of the discriminant.
From the mathematical viewpoint, there are a couple of (unrelated?) reasons why these sets cannot
be directly handled by standard optimization, at least if we want to keep the polynomial structure.
One has to do with their algebraic structure, and the other one with convexity, and in particular their
facial structure.
Lemma 9 (e.g., [And03]). The set described in Figure 1 is not basic closed semialgebraic.
Remark 10. Notice that the convex sets described in Figures 1 and 2 both have an uncommon feature.
They both have proper faces that are not exposed, i.e., they cannot be isolated by a supporting
hyperplane^1. Indeed, in Figure 1 the origin (0, 0) is a non-exposed zero-dimensional face, while in Figure 2
the point (1, 1) has the same property. A non-exposed face is a known obstruction for a convex set to be
the feasible set of a semidefinite program; see [RG95].

Figure 2: Region of nonnegativity of the polynomial x^4 + 4ax^3 + 6bx^2 + 4ax + 1, and number of real
roots.

Figure 3: A three-dimensional convex set, described by one quadratic and one linear inequality, whose
projection on the (a, b) plane is equal to the set in Figure 1.
Even though these sets have these complicating features, it turns out that we can often provide
good representations. These are normally given as a projection from a higher-dimensional space, where
the object "upstairs" is much smoother and better behaved. For instance, as a graphical illustration,
in Figure 3 we can see the three-dimensional convex set {(a, b, t) ∈ R^3 : b ≥ (a - t)^2, t ≥ 0}, whose
projection on the plane (a, b) is exactly the set discussed in Example 7 and Figure 1.
The presence of extraneous components of the discriminant inside the feasible set is an important
roadblock for the availability of easily computable barrier functions. Indeed, every polynomial that
vanishes on the boundary of the set P_n must necessarily have the discriminant as a factor. This is a
striking difference with the case of the nonnegative orthant or the PSD cone, where the standard
barriers are given (up to a logarithm) by products of the linear constraints or a determinant (which are
polynomials). The way out of this problem is to produce non-polynomial barrier functions, either by
partial minimization from a higher-dimensional barrier (i.e., projection) or by other constructions such
as the universal barrier function introduced by Nesterov and Nemirovskii [NN94].
^1 A face of a convex set S is a convex subset F ⊆ S with the property that x, y ∈ S and (x + y)/2 ∈ F imply
x, y ∈ F. A face F is exposed if it can be written as F = S ∩ H, where H is a supporting hyperplane of S.
Figure 4: The discriminant of the polynomial x^4 + 4ax^3 + 6bx^2 + 4cx + 1. The convex set inside the
"bowl" corresponds to the region of nonnegativity. There is an additional one-dimensional component
inside the set.
In this direction, there have been several research efforts that aim at directly characterizing barrier
functions for the set of nonnegative polynomials (or related modifications). Among them, we mention
the work of Kao and Megretski [KM02] and Faybusovich [Fay02], both of which produce barriers that
rely on the computation of one or more integral expressions. Given the fact that these integrals must
be computed numerically, there is no clear consensus yet on how useful this approach is in practical
optimization problems.
References
[And03] C. Andradas. Characterization and description of basic semialgebraic sets. In Algorithmic
and quantitative real algebraic geometry (Piscataway, NJ, 2001), volume 60 of DIMACS Ser.
Discrete Math. Theoret. Comput. Sci., pages 1–12. Amer. Math. Soc., Providence, RI, 2003.

[CLO97] D. A. Cox, J. B. Little, and D. O'Shea. Ideals, varieties, and algorithms: an introduction to
computational algebraic geometry and commutative algebra. Springer, 1997.

[Fay02] L. Faybusovich. Self-concordant barriers for cones generated by Chebyshev systems. SIAM J.
Optim., 12(3):770–781, 2002.

[GKZ94] I. M. Gelfand, M. Kapranov, and A. Zelevinsky. Discriminants, Resultants, and Multidimen-
sional Determinants. Birkhäuser, 1994.

[KM02] C. Y. Kao and A. Megretski. A new barrier function for IQC optimization problems. In
Proceedings of the American Control Conference, 2002.

[NN94] Y. E. Nesterov and A. Nemirovski. Interior point polynomial methods in convex programming,
volume 13 of Studies in Applied Mathematics. SIAM, Philadelphia, PA, 1994.

[RE08] N. Raj Rao and Alan Edelman. The polynomial method for random matrices. Found. Comput.
Math., 8(6):649–702, 2008.

[RG95] M. Ramana and A. J. Goldman. Some geometric results in semidefinite programming. J.
Global Optim., 7(1):33–50, 1995.

[Stu98] B. Sturmfels. Introduction to resultants. In Applications of computational algebraic geometry
(San Diego, CA, 1997), volume 53 of Proc. Sympos. Appl. Math., pages 25–39. Amer. Math.
Soc., Providence, RI, 1998.
MIT 6.256 - Algebraic techniques and semidefinite optimization March 5, 2010
Lecture 7
Lecturer: Pablo A. Parrilo Scribe: ???
In this lecture we introduce a special class of multivariate polynomials, called hyperbolic. These
polynomials were originally studied in the context of partial differential equations. As we will see, they
have many surprising properties, and are intimately linked with convex optimization problems that have
an algebraic structure. A few good references about the use of hyperbolic polynomials in optimization
are [Gül97, BGLS01, Ren06].
1 Hyperbolic polynomials
Consider a homogeneous multivariate polynomial p ∈ R[x_1, . . . , x_n] of degree d. Here homogeneous of
degree d means that the sum of the degrees of each monomial is constant and equal to d, i.e.,

    p(x) = Σ_{|α|=d} c_α x^α,

where α = (α_1, . . . , α_n), α_i ∈ N ∪ {0}, and |α| = α_1 + ... + α_n. A homogeneous polynomial satisfies
p(tw) = t^d p(w) for all real t and vectors w ∈ R^n. We denote the set of such polynomials by H_n(d).
By identifying a polynomial with its vector of coefficients, we can consider H_n(d) as a vector space of
dimension (n+d-1 choose d).
Definition 1. Let e be a fixed vector in R^n. A polynomial p ∈ H_n(d) is hyperbolic with respect to e if
p(e) > 0 and, for all vectors x ∈ R^n, the univariate polynomial t ↦ p(x - te) has only real roots.
A natural geometric interpretation is the following: consider the hypersurface in R^n given by p(x) = 0.
Then, hyperbolicity is equivalent to the condition that every line in R^n parallel to the direction e
intersects this hypersurface at exactly d points (counting multiplicities), where d is the degree of the
polynomial.
Example 2. The polynomial x_1 x_2 ... x_n is hyperbolic with respect to the vector e = (1, 1, . . . , 1), since
the univariate polynomial t ↦ (x_1 - t)(x_2 - t) ... (x_n - t) has roots x_1, x_2, . . . , x_n.
Hyperbolic polynomials enjoy a very surprising property, that connects in an unexpected way algebra
with convex analysis. Given a hyperbolic polynomial p(x), consider the set defined as:

    Λ_{++} := {x ∈ R^n : p(x - te) = 0 ⟹ t > 0}.

Geometrically, this condition says that if we start at the point x ∈ R^n and slide along a line in the
direction parallel to e, then we will never encounter the hypersurface p(x) = 0, while if we move in the
opposite direction, we will cross it exactly d times. Figure 1 illustrates a particular hyperbolicity cone.
It is immediate from homogeneity and the definition above that λ > 0 and x ∈ Λ_{++} imply λx ∈ Λ_{++}.
Thus, we call Λ_{++} the hyperbolicity cone associated to p, and denote its closure by Λ_+. As we will see
shortly, it turns out that these cones are actually convex cones. We prove this following the arguments
in Renegar [Ren06]; the original results are due to Gårding [Gar59].
Lemma 3. The hyperbolicity cone Λ_{++} is the connected component of {x : p(x) > 0} that includes e.
Example 4. The hyperbolicity cone Λ_{++} associated with the polynomial x_1 x_2 ... x_n discussed in Exam-
ple 2 is the open positive orthant {x ∈ R^n | x_i > 0}.
The first step is to show that we can replace e with any vector in the hyperbolicity cone.

Lemma 5. If p(x) is hyperbolic with respect to e, then it is also hyperbolic with respect to every direction
v ∈ Λ_{++}. Furthermore, the hyperbolicity cones are the same.
Figure 1: Hyperbolicity cone corresponding to the polynomial p(x, y, z) = 4xyz + xz^2 + yz^2 + 2z^3 - x^3
- 3zx^2 - y^3 - 3zy^2. This polynomial is hyperbolic with respect to (0, 0, 1).
Proof. By Lemma 3 we have p(v) > 0. We need to show that for every x ∈ R^n, the polynomial
λ ↦ p(λv + x) has only real roots if v ∈ Λ_{++}.
Let α > 0 be fixed, and consider the polynomial λ ↦ p(iαe + λv + μx), where i is the imaginary unit.
We claim that if μ ≥ 0, this polynomial has only roots in the lower half-plane. Let's look at the μ = 0
case first. It is clear that p(iαe + λv) cannot have a root at λ = 0, since p(iαe) = (iα)^d p(e) ≠ 0. If
λ ≠ 0, we can write

    p(iαe + λv) = 0  ⟹  p((iα/λ) e + v) = 0  ⟹  -iα/λ > 0  ⟹  λ ∈ iR_{--},

and thus the roots of this polynomial are on the strictly negative imaginary axis (we have used v ∈ Λ_{++} in
the second implication). If by increasing μ there is ever a root in the upper half-plane, then by continuity
there must exist a μ̄ ≥ 0 and a real λ̄ for which p(iαe + λ̄v + μ̄x) = 0. However,
this contradicts hyperbolicity with respect to e, since λ̄v + μ̄x ∈ R^n. Thus, for all μ ≥ 0, the roots of
p(iαe + λv + μx) are in the lower half-plane.
The conclusion above was true for any α > 0. Letting α → 0, by continuity of the roots we have that
the polynomial λ ↦ p(λv + μx) must also have its roots in the closed lower half-plane. However, since it
is a polynomial with real coefficients (and therefore its roots always appear in complex-conjugate pairs),
all the roots must actually be real. Taking now μ = 1, we have that p(λv + x) has real roots
for all x, or equivalently, p is hyperbolic in the direction v.
The following result shows that this set is actually convex:

Theorem 6 ([Gar59]). The hyperbolicity cone Λ_{++} is convex.
Proof. We want to show that u, v ∈ Λ_{++} and α, β > 0 imply that αu + βv ∈ Λ_{++}. The previous result
implies that it is enough to show hyperbolicity of p with respect to v (instead of e), i.e., to analyze the
polynomial t ↦ p(x - tv). Notice that the roots of t ↦ p(αu + βv - tv) are just a nonnegative affine
scaling of the roots of t ↦ p(u - tv), since

    p(u - t'v) = 0  ⟺  p(αu + βv - (αt' + β)v) = 0,

and u ∈ Λ_{++} implies that t' > 0, hence αt' + β > 0. As a consequence, αu + βv ∈ Λ_{++}.
Hyperbolic polynomials are of interest in convex optimization, because they unify in a quite appealing
way many facts about the most important tractable classes: linear, second-order, and semidefinite
programming.
Example 7 (SOCP). Let p(x) = x_{n+1}^2 - Σ_{k=1}^n x_k^2. This is a homogeneous quadratic polynomial, hyper-
bolic in the direction e = (0, . . . , 0, 1), since

    p(x - te) = (x_{n+1} - t)^2 - Σ_{k=1}^n x_k^2 = t^2 - 2 t x_{n+1} + ( x_{n+1}^2 - Σ_{k=1}^n x_k^2 ),

and the discriminant of this quadratic equation is equal to

    4 x_{n+1}^2 - 4 ( x_{n+1}^2 - Σ_{k=1}^n x_k^2 ) = 4 Σ_{k=1}^n x_k^2,

which is always nonnegative, so the polynomial t ↦ p(x - te) has only real roots. The corresponding
hyperbolicity cone is the Lorentz or second-order cone given by

    Λ_+ = { x ∈ R^{n+1} | x_{n+1} ≥ 0, Σ_{k=1}^n x_k^2 ≤ x_{n+1}^2 }.
Example 8 (SDP). Consider the homogeneous polynomial

    p(x) = det(x_1 A_1 + ... + x_n A_n),

where the A_i ∈ S^d are given symmetric matrices, with A_1 ≻ 0. The polynomial p(x) is homogeneous of
degree d. Letting e = (1, 0, . . . , 0), we have

    p(x - te) = det( Σ_{k=1}^n x_k A_k - t A_1 ) = det A_1 · det( Σ_{k=1}^n x_k A_1^{-1/2} A_k A_1^{-1/2} - t I ),

and as a consequence the roots of p(x - te) are always real, since they are the eigenvalues of a symmetric
matrix. Thus, p(x) is hyperbolic with respect to e. The corresponding hyperbolicity cone is

    Λ_{++} = { x ∈ R^n | x_1 A_1 + ... + x_n A_n ≻ 0 }.

Thus, by Lemma 5, p(x) is hyperbolic with respect to every x ∈ Λ_{++}.
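Example 8 is easy to explore numerically; in the sketch below (Python/NumPy, with randomly generated
matrices as an illustration), taking A_1 = I makes the roots of t ↦ p(x - te) ordinary eigenvalues of a
symmetric matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 4, 3
    As = [np.eye(d)] + [(M + M.T) / 2 for M in rng.standard_normal((n - 1, d, d))]

    x = rng.standard_normal(n)
    M = sum(xi * Ai for xi, Ai in zip(x, As))
    # roots of p(x - te) = det(M - t A_1) with A_1 = I: the eigenvalues of M
    print(np.linalg.eigvalsh(M))    # real for every choice of x, as claimed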
Based on the results discussed earlier regarding the number of real roots of a univariate polynomial,
we have the following lemma.

Lemma 9. The polynomial p(x) is hyperbolic with respect to e if and only if the Hermite matrix of the
univariate polynomial t ↦ p(x - te), a symmetric matrix H_1(p) ∈ S^d[x] with polynomial entries, is
positive semidefinite for all x ∈ R^n.
As we will see later in the course, this observation will allow us to give an exact characterization in
terms of semidefinite programming of the hyperbolicity of trivariate polynomials [Par].

Lemma 10. The hyperbolicity cone Λ_+ is basic closed semialgebraic, i.e., it can be described by unquan-
tified polynomial inequalities.

The two following results are of importance in optimization and the formulation of interior-point
methods.
Theorem 11 ([Ren06]). A hyperbolic cone Λ_+ is facially exposed.

Theorem 12 ([Gül97]). The function -log p(x) is a logarithmically homogeneous self-concordant bar-
rier^1 for the hyperbolicity cone Λ_{++}, with barrier parameter equal to d.
One of the main open issues regarding hyperbolic cones concerns their generality. As Example 8 shows,
the cone associated with a semidefinite program is a hyperbolic cone. An open question (known as the
generalized Lax conjecture) is whether the converse holds, more specifically, whether every hyperbolic
cone is a "slice" of the semidefinite cone, i.e., whether it can be represented as the intersection of an affine
subspace and S^n_+. As we will see in the next lecture, a special case of the conjecture has been settled
recently.
2 SDP representability
Recall that in the previous lecture, we encountered a class of convex sets in R^2 that lacked certain
desirable properties (namely, being basic semialgebraic, and facially exposed). As we will see, hyperbolic
polynomials will play a fundamental role in the characterization of the properties a set in R^2 must satisfy
for it to be the feasible set of a semidefinite program.
References
[BGLS01] H. H. Bauschke, O. Güler, A. S. Lewis, and H. S. Sendov. Hyperbolic polynomials and convex
analysis. Canad. J. Math., 53(3):470–488, 2001.

[BV04] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[Gar59] L. Gårding. An inequality for hyperbolic polynomials. J. Math. Mech., 8:957–965, 1959.

[Gül97] O. Güler. Hyperbolic polynomials and interior point methods for convex programming. Math.
Oper. Res., 22(2):350–377, 1997.

[NN94] Y. E. Nesterov and A. Nemirovski. Interior point polynomial methods in convex programming,
volume 13 of Studies in Applied Mathematics. SIAM, Philadelphia, PA, 1994.

[Par] P. A. Parrilo. Hyperbolic polynomials and SOS matrices. Manuscript in preparation, 2007.

[Ren06] J. Renegar. Hyperbolic programs, and their derivative relaxations. Found. Comput. Math.,
6(1):59–79, 2006.
^1 A function f : R → R is self-concordant if it satisfies |f'''(x)| ≤ 2 f''(x)^{3/2}. A function f : R^n → R is self-concordant if
the univariate function obtained when restricting it to any line is self-concordant. Self-concordance implies strict convexity,
and is a crucial property in the analysis of the polynomial-time global convergence of Newton's method; see [NN94] or
[BV04, Section 9.6] for more details.
MIT 6.256 - Algebraic techniques and semidefinite optimization March 10, 2010
Lecture 8
Lecturer: Pablo A. Parrilo Scribe: ???
1 SDP representability
A few lectures ago, when discussing the set of nonnegative polynomials, we encountered convex sets in
R^2 that lacked certain desirable properties (namely, being basic semialgebraic, and facially exposed). As
we will see, hyperbolic polynomials will play a fundamental role in the characterization of the properties
a set in R^2 must satisfy for it to be the feasible set of a semidefinite program.
2 Convex sets in R^2

In this lecture we will study conditions that a set S ⊂ R^2 must satisfy for it to be semidefinite repre-
sentable, i.e., to admit a characterization of the type

    {(x, y) ∈ R^2 | I + xB + yC ⪰ 0},    (1)

where B, C ∈ S^d. Notice that we have assumed (without loss of generality) that 0 ∈ int S, and normalized
the first matrix in the matrix pencil to be an identity matrix (this can always be achieved by left- and
right-multiplying by an appropriate factor).
Remark 1. We should not confuse the notion of semidefinite representability described above with the
much more general lifted SDP representability, which allows the representation of the original set as a
projection of a higher-dimensional SDP set. In other words, here we are not allowed to use additional
variables.
Clearly, from (1), we have the following necessary conditions for SDP representability:

Closed: Every set of the form (1) is closed, in the standard topology.

Convex: Every set of the form (1) is necessarily convex, since it is (the projection of) the inter-
section of an affine subspace and the convex set of PSD matrices. Of course, this is also easy to
prove directly.

Basic semialgebraic: As we have discussed, the boundary of the set (1) is defined by d unquan-
tified polynomial inequalities of degree at most equal to d. In fact, the interior of this set exactly
corresponds to the connected component of det(I + xB + yC) > 0 that contains the origin.

There is a less obvious additional condition, which we have also seen already:

Exposed faces: Every convex set of the form (1) has proper faces that are exposed. In other
words, every face F must have a representation as F = S ∩ H, where H is a supporting hyperplane
of the convex set S.
A natural question, then, is the following: are the conditions listed above sufficient for SDP repre-
sentability? If a set S ⊂ R^2 satisfies these four conditions, do there always exist matrices B, C for which
the set (1) is exactly equal to S? To ask a concrete question: does the set in Figure 1 admit an SDP
representation? Before settling this issue, let us discuss first an apparently different question, involving
hyperbolic polynomials.
Figure 1: Convex set defined by x^4 + y^4 ≤ 1.
3 Hyperbolicity and the Lax conjecture
Recall from the previous lecture that a hyperbolic polynomial is a homogeneous polynomial p(x) of
degree d, with the property that when restricted to lines parallel to a particular direction e, the resulting
univariate polynomial has all its d roots real.
Furthermore, we have also seen that every polynomial of the form

    p(x) = det(x_1 A_1 + ... + x_n A_n),    (2)

where A_i ∈ S^d and A_1 ≻ 0, is hyperbolic with respect to the direction (1, 0, . . . , 0).
A 1958 conjecture by Peter Lax [Lax58] asks whether the converse is true in the case n = 3 (i.e.,
trivariate polynomials). In other words, is it true that for every hyperbolic polynomial p(x) in three
variables of degree d, there exist three symmetric matrices {A_1, A_2, A_3} ⊂ S^d for which (2) holds?
As a first step towards answering this question, let us verify that this at least makes sense in terms
of dimension counting. As we have seen, the dimension of the set of hyperbolic polynomials in three
variables (n = 3) and degree d is equal to (n+d-1 choose d) = (d+2 choose 2). On the matrix side, we may
normalize A_1 = I, so the pair (A_2, A_3) contributes 2 (d+1 choose 2) parameters; quotienting by the
orthogonal conjugations A_i ↦ T^T A_i T (which leave the determinant invariant) removes (d choose 2) of
them, which is exactly the dimension count of the polynomials normalized to satisfy p(e) = 1. Of course,
this by itself does not prove the result, but it shows that it is certainly possible.
4 Relating SDP-representable sets and hyperbolic polynomials
As we will see shortly, these two apparently different problems are in fact one and the same. Before
showing this, let us consider one additional necessary condition for a set in R^2 to be SDP-representable.
For later reference, we first define the following notion:

Definition 2. A polynomial p ∈ R[x] is a real zero polynomial if for every x ∈ R^n, p(tx) = 0 implies
that t is real.

Recall that the boundary of a set described by (1) is determined by the zero set of the polynomial
det(I + xB + yC). Consider now any line passing through the origin, i.e., of the form (x, y) = (tα, tβ).
We have then

    det[I + t(αB + βC)] = 0,

and this univariate polynomial in t has exactly d real roots (namely, the negative inverses of the eigenvalues
of αB + βC). In terms of the notation just introduced, the polynomial defined by det(I + xB + yC) is
a real zero polynomial. Equivalently, for every set of the form (1), it is always the case that every line
through the origin intersects (the Zariski closure of) the boundary of the set exactly d times.
In the preceding, our starting point was directly a determinantal representation as in (1). It can be
shown (see [HV07]) that if we start directly from a given set that admits an SDP representation, we can
precisely characterize a unique minimal polynomial that defines the boundary of the set.
Hence, this gives us an additional necessary condition ([HV07]) for SDP representability:

Rigid convexity: Consider a set in R^2, with the origin in the interior. Every line that passes
through the origin must intersect the curve defined by the boundary polynomial exactly d times (counting
multiplicities, and points at infinity), where d is the degree of the boundary polynomial.

This additional requirement is quite strong, and immediately allows us to discard sets for which the
previous conditions were satisfied.
Example 3. Consider the set described by x^4 + y^4 ≤ 1; see Figure 1. It clearly satisfies the first four
necessary conditions. However, if we consider any line through the origin, it will intersect the curve
defined by the boundary polynomial only two times, instead of the four required by the rigid convexity
condition. Thus, this set is not rigidly convex, and hence does not admit a (non-lifted) semidefinite
representation.
5 Characterization
It should be apparent that the rigid convexity condition looks very similar to the hyperbolicity property
of a polynomial. In fact, they are exactly the same condition, provided we redefine things accordingly
[LPR05]. As we will see, this equivalence will make explicit the connection between the Helton &
Vinnikov characterization of SDP-representable sets and the Lax conjecture described earlier.

Theorem 4 ([LPR05]). If p ∈ R[x, y, z] is a polynomial of degree d, hyperbolic with respect to e = (0, 0, 1)
and satisfying p(e) = 1, then the polynomial in R[x, y] defined by q(x, y) := p(x, y, 1) is a real zero
polynomial of degree no more than d, satisfying q(0, 0) = 1.
Conversely, if q ∈ R[x, y] is a real zero polynomial of degree d satisfying q(0, 0) = 1, then the
polynomial defined by

    p(x, y, z) := z^d q(x/z, y/z)

is hyperbolic of degree d with respect to e = (0, 0, 1), and satisfies p(e) = 1.

As an illustration, consider the SDP-representable set

    { (x, y) ∈ R^2 :  [ x + 1    0      y
                          0      2    x - 1
                          y    x - 1    2   ]  ⪰ 0 },    (3)

with the corresponding determinantal representation of the hyperbolic polynomial being:

    p(x, y, z) = det [ x + z    0      y
                         0     2z    x - z
                         y    x - z   2z   ].    (4)
References
[HV07] J. W. Helton and V. Vinnikov. Linear matrix inequality representation of sets. Comm. Pure
Appl. Math., 60(5):654–674, 2007.
Figure 3: The polynomial 3z^3 + xz^2 - x^3 - 3x^2 z - 2y^2 z = 0 and corresponding hyperbolicity cone.
[Lax58] P. D. Lax. Differential equations, difference equations and matrix theory. Comm. Pure Appl.
Math., 11:175–194, 1958.

[LPR05] A. S. Lewis, P. A. Parrilo, and M. V. Ramana. The Lax conjecture is true. Proc. Amer. Math.
Soc., 133(9):2495–2499, 2005.
MIT 6.256 - Algebraic techniques and semidefinite optimization March 12, 2010
Lecture 9
Lecturer: Pablo A. Parrilo Scribe: ???
In this lecture, we study first a relatively simple type of polynomial equations, namely binomial
equations. As we will see, in this case there exists a quite efficient solution method. We define next an
important geometric and combinatorial object associated with every multivariate polynomial, called the
Newton polytope. Finally, we put together these two notions in the formulation of a family of bounds
on the number of solutions of systems of polynomial equations. Our presentation of the material here is
inspired by [Stu02, Chapter 3] and [CLO98].
1 Binomial equations
We introduce in this section a particular kind of polynomial equations that have nice computational
properties. A binomial system of polynomial equations is one where each equation has only two terms.
We also assume that the system has only a finite number of complex solutions, i.e., the solution set is a
finite set of points in C^n. We are interested in determining the exact number of solutions, and in efficient
computational procedures for solving the system.
Let's start with an example. Consider the binomial system given by

    8 x^2 y^3 - 1 = 0
    2 x^3 y^2 - yx = 0.    (1)

If we assume that the solutions satisfy x ≠ 0, y ≠ 0, then we can put these equations in the more
symmetric form

    8 x^2 y^3 = 1
    2 x^2 y   = 1.    (2)
Now, by dividing the first equation by the second one, we obtain 4y^2 = 1, which has two solutions (y = 1/2
and y = -1/2). Substituting back and solving for x, each value of y yields two corresponding
values of x, so the system has a total of four complex solutions.
Let's try to understand in a bit more detail what manipulations we were performing here. For this,
let's define the integer matrix

    B = [ 2  3
          2  1 ],

corresponding to the exponents in (2). Notice that when we divided the two equations, that is equivalent
to an elementary row operation on the matrix B, namely subtracting the second row of B from the first
one. Thus, the operations we have done can be understood as the matrix multiplication UB = C, where

    U = [ 0   1
          1  -1 ],    C = [ 2  1
                            0  2 ].

The fact that the matrix C is triangular is what allows us to start solving the system for y, and
then backsolving for the other variable.
It is not too difficult to understand from this example how to generalize this. Let C* := C \ {0},
and consider a system of binomial equations in n variables, where we are interested in computing (or
bounding the number of) solutions in (C*)^n. We can always put the system in the normalized form in
(2). Notice that, in general, the entries of the integer matrix B could be either positive or negative (i.e., we
write polynomials in x_i and x_i^{-1}, which is fine since x_i ≠ 0).
Then, a well-known result in integer linear algebra (the Hermite normal form of an integer matrix)
guarantees the existence of a unimodular matrix U ∈ SL_n(Z) (an integer matrix, with determinant
equal to one), such that C = UB is a lower triangular matrix. We can then use this expression to obtain
values for the last variable, and backsolve to obtain all solutions.

Figure 1: Newton polytope of the polynomial p(x, y) = 5 - xy - x^2 y^2 + 3y^2 + x^4.
How can we determine the number of solutions from this factorization? When backsubstituting
using C, at each step we have to solve an equation of the type x_i^{c_ii} = d_i, and thus the current number
of possible solutions is multiplied by |c_ii|. Therefore, the total number of solutions in (C*)^n will then be
equal to |det(C)| = |det(U) det(B)| = |det(B)|.
Remark 1. To compute the Hermite normal form of an integer matrix in Maple, you can use the
command ihermite. In Mathematica, use instead HermiteNormalForm.
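In Python, one can use SymPy for the same purpose (hermite_normal_form is available in recent SymPy
versions, which is an assumption on our part); for the example above:

    import sympy as sp
    from sympy.matrices.normalforms import hermite_normal_form

    B = sp.Matrix([[2, 3], [2, 1]])     # exponent matrix of the system (2)
    print(abs(B.det()))                  # 4: the number of solutions in (C*)^2
    print(hermite_normal_form(B))        # a triangular matrix, unimodularly equivalent to B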
2 Newton polytopes
Many of the polynomial systems that appear in practice are far from being generic, but rather present a
number of structural features that, when properly exploited, allow for much more efficient computational
techniques. This is quite similar to the situation in numerical linear algebra, where there is a big difference
in performance between algorithms that take into account the sparsity structure of a matrix and those
that do not. For matrices, the standard notion of sparsity is relatively straightforward, and relates
mostly to the number of nonzero coefficients. In computational algebra, however, there exists a much
more refined notion of sparsity that refers not only to the number of zero coefficients of a polynomial,
but also to the underlying combinatorial structure.
This notion of sparsity for multivariate polynomials is usually presented in terms of the Newton
polytope of a polynomial, defined below.
Definition 2. Consider a multivariate polynomial p(x_1, . . . , x_n) = Σ_α c_α x^α. The Newton polytope of p,
denoted New(p), is the convex hull of the set of exponents α, considered as vectors in R^n.

Given polytopes P_1, . . . , P_n ⊂ R^n, the volume of the Minkowski combination λ_1 P_1 + ... + λ_n P_n (with
λ_i ≥ 0) is a homogeneous polynomial of degree n in the scaling factors λ_i; the mixed volume
MV(P_1, . . . , P_n) is the coefficient of the monomial λ_1 λ_2 ... λ_n in this polynomial.
Although not obvious from its definition, the mixed volume is always a nonnegative number. It
further satisfies a number of very interesting properties, such as the Alexandrov–Fenchel inequality.
Although computing the mixed volume is difficult in general, in certain cases it can be approximated
via convex optimization methods with strong relationships to hyperbolic polynomials [Gur].
One of the main results in this area, with different versions due to Bernstein, Kouchnirenko, and
Khovanskii, relates the number of solutions of a sparse polynomial system with the mixed volume of the
Newton polytopes of the individual equations. Formally, we have

Theorem 7 (BKK bound). The number of solutions in (C*)^n of a sparse polynomial system of n
equations in n unknowns is less than or equal to the mixed volume of the n Newton polytopes. If the
coefficients are generic enough, then the upper bound is achieved.
The basic idea behind the derivation of the theorem is to introduce an additional parameter t in
the equations, in such a way that for t = 1 we have the original system, while for t = 0 the system is
binomial, which as we have seen can be solved in an efficient manner. This process is usually called a
toric deformation, and is somewhat similar in spirit to the homotopies used in interior point methods.
To make this a bit more precise, an important fact is that we will not deform to just one binomial
system, but actually to a collection of them, given by what is called a mixed subdivision of the sum
of Newton polytopes. The important fact is that the sum of the numbers of roots of all these binomial
systems is exactly equal to the mixed volume of the collection of polytopes.
Example 8. Consider the univariate polynomial

    p(x) = a_n x^n + a_{n-1} x^{n-1} + ... + a_m x^m,

where n ≥ m. It is clear that the Newton polytope is the line segment with endpoints n and m. The
mixed volume (in this case, just the volume) is equal to n - m. Thus, the BKK bound for this polynomial
is equal to n - m, which is clearly exact for generic choices of the coefficients.
Example 9. Let us consider again the example discussed in (1). The Newton polytope of the first
polynomial is the line segment with endpoints (0, 0) and (2, 3), while the second one has endpoints (1, 1)
and (3, 2). If we denote these by P_1 and P_2, it is easy to see that

    Vol(λ_1 P_1 + λ_2 P_2) = 4 λ_1 λ_2,

and thus the mixed volume of (P_1, P_2) is equal to 4, which is the number of solutions of (1).
Example 10. Consider now the example in equation (3). The Newton polytope of the first polyno-
mial is the triangle with vertices {(0, 0), (1, 0), (0, 2)}, and the second one is the triangle with vertices
{(1, 0), (0, 1), (1, 1)}. It's not hard to show (how?) that

    Vol(λ_1 P_1 + λ_2 P_2) = λ_1^2 + (1/2) λ_2^2 + 3 λ_1 λ_2,

and thus MV(P_1, P_2) = 3, which as we have seen, is the number of solutions of (3) when the
coefficients are generic.
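For two polygons, the definition gives MV(P_1, P_2) = Vol(P_1 + P_2) - Vol(P_1) - Vol(P_2), which is easy
to evaluate; a Python/SciPy sketch for the triangles of Example 10 (our illustration):

    import numpy as np
    from scipy.spatial import ConvexHull

    def area(pts):
        return ConvexHull(pts).volume          # in 2D, .volume is the area

    def mixed_volume(P1, P2):
        minkowski = np.array([v + w for v in P1 for w in P2])
        return area(minkowski) - area(P1) - area(P2)

    P1 = np.array([[0, 0], [1, 0], [0, 2]], dtype=float)
    P2 = np.array([[1, 0], [0, 1], [1, 1]], dtype=float)
    print(mixed_volume(P1, P2))                # 3.0, matching Example 10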
4 Application: Nash equilibria
We can use the results described to give a bound on the number of isolated Nash equilibria of a
game. For simplicity, consider the three-player case, where each player has two pure strategies. We
are interested here only in totally mixed equilibria, i.e., those where the players randomize among all
their pure strategies with nonzero probability (if this is not the case, then by eliminating the never-
played strategies we can reduce the game to the totally mixed case). Thus, the mixed strategies can be
parametrized in terms of three variables a, b, c ∈ (0, 1), representing the probabilities with which the
players play their different strategies.
It can be shown that the Nash equilibrium conditions result in a polynomial system with the structure

    p_11 bc + p_12 b + p_13 c + p_14 = 0
    p_21 ca + p_22 c + p_23 a + p_24 = 0
    p_31 ab + p_32 a + p_33 b + p_34 = 0,    (4)

where the coefficients p_ij are explicit linear functions of the payoffs. The mixed volume of the Newton
polytopes of these three equations is equal to 2, so the maximum number of totally mixed Nash equilibria
that a three-player, two-strategy game can have is equal to two.
The same argument can be generalized to the case of n players, obtaining the following result:

Theorem 11 ([MM97], [Stu02, p. 82]). The maximum number of isolated totally mixed Nash equilibria
for an n-person game where each player has two pure strategies is equal to the mixed volume of the n
facets of the n-cube.

This mixed volume can be computed explicitly, and is equal to the number of derangements (fixed-
point free permutations) of a set with n elements. This number is also the permanent^1 of the matrix
E_n - I_n, where E_n is the all-ones matrix. It can be shown that this number is the closest integer to n!/e.
^1 The permanent of a square matrix A ∈ R^{n×n} is defined as per(A) := Σ_{σ∈Σ_n} ∏_{i=1}^n a_{i,σ(i)}, where Σ_n is the set of all
permutations of n elements. The formula is quite similar to that of the determinant (except that the signs of all terms are
always positive). In contrast to the determinant, which can easily be obtained in polynomial time via Gaussian elimination,
it is believed that the permanent is hard to compute (in fact, it is #P-hard).
There are extensions of this result to the case of graphical games; see [Stu02] and the references
therein for details.
References
[CLO98] D. A. Cox, J. B. Little, and D. O'Shea. Using Algebraic Geometry, volume 185 of Graduate
Texts in Mathematics. Springer-Verlag, 1998.

[Gur] L. Gurvits. Polynomial time algorithms to approximate mixed volumes within a simply expo-
nential factor. Preprint, available at arxiv.org/abs/cs/0702013.

[MM97] R. D. McKelvey and A. McLennan. The maximal number of regular totally mixed Nash equi-
libria. Journal of Economic Theory, 72(2):411–425, 1997.
[Stu02] B. Sturmfels. Solving Systems of Polynomial Equations. AMS, Providence, R.I., 2002.
MIT 6.256 - Algebraic techniques and semidefinite optimization March 17, 2010
Lecture 10
Lecturer: Pablo A. Parrilo Scribe: ???
In this lecture we begin our study of one of the main themes of the course, namely the relationships
between polynomials that are sums of squares and semidefinite programming.
Recall from a previous lecture the denition of a polynomial being a sum of squares.
Denition 1. A univariate polynomial p(x) is a sum of squares (SOS) if there exist q
1
, . . . , q
m
R[x]
such that
p(x) =
m
k=1
q
2
k
(x). (1)
If a polynomial p(x) is a sum of squares, then it obviously satisfies p(x) ≥ 0 for all x ∈ R. Thus, a
SOS condition is a sufficient condition for global nonnegativity.
As we have seen, in the univariate case, the converse is also true:

Theorem 2. A univariate polynomial is nonnegative if and only if it is a sum of squares.

As we will see, there is a very direct link between sum of squares conditions on polynomials and
semidefinite programming. We study first the univariate case.
2 Sums of squares and semidefinite programming

Consider a polynomial p(x) of degree 2d that is a sum of squares, i.e., it can be written as in (1). Notice that the degree of the polynomials q_k is at most equal to d, since the highest term of each q_k^2 is positive, and thus there cannot be any cancellation in the highest power of x. Then, we can write
\[
\begin{pmatrix} q_1(x) \\ q_2(x) \\ \vdots \\ q_m(x) \end{pmatrix}
= V \begin{pmatrix} 1 \\ x \\ \vdots \\ x^d \end{pmatrix}, \qquad (2)
\]
where V ∈ R^{m×(d+1)}, and its kth row contains the coefficients of the polynomial q_k. For future reference, let [x]_d be the vector on the right-hand side of (2). Consider now the matrix Q = V^T V. We then have
\[
p(x) = \sum_{k=1}^m q_k^2(x) = (V[x]_d)^T (V[x]_d) = [x]_d^T V^T V [x]_d = [x]_d^T Q [x]_d.
\]
Conversely, assume there exists a symmetric positive semidefinite Q for which p(x) = [x]_d^T Q [x]_d. Then, by factorizing Q = V^T V (e.g., via Cholesky or square root factorization), we arrive at a SOS decomposition of p.
We formally express this in the following lemma, which gives a direct relation between positive semidefinite matrices and a sum of squares condition.

Lemma 3. Let p(x) be a univariate polynomial of degree 2d. Then, p(x) is nonnegative (or SOS) if and only if there exists Q ∈ S_+^{d+1} that satisfies

\[
p(x) = [x]_d^T\, Q\, [x]_d.
\]
Indexing the rows and columns of Q by {0, ..., d}, we have:

\[
[x]_d^T\, Q\, [x]_d = \sum_{j=0}^d \sum_{k=0}^d Q_{jk}\, x^{j+k}
= \sum_{i=0}^{2d} \Big( \sum_{j+k=i} Q_{jk} \Big) x^i.
\]
Thus, for this expression to be equal to p(x), it should be the case that

\[
p_i = \sum_{j+k=i} Q_{jk}, \qquad i = 0, \ldots, 2d. \qquad (3)
\]
This is a system of 2d + 1 linear equations between the entries of Q and the coefficients of p(x). Thus, since Q is simultaneously constrained to be positive semidefinite and to belong to a particular affine subspace, a SOS condition is exactly equivalent to a semidefinite programming problem.
Lemma 4. A polynomial p(x) = Σ_{i=0}^{2d} p_i x^i is a sum of squares if and only if there exists Q ∈ S_+^{d+1} satisfying (3). This is a semidefinite programming problem.
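As a concrete illustration, here is a minimal Python sketch of Lemma 4, assuming the cvxpy modeling package (with a bundled SDP solver) is available; it decides whether x^4 + 2x^2 + 1 is a sum of squares by searching for a PSD Gram matrix Q satisfying the affine constraints (3):

    # Decide SOS-ness of p(x) = x^4 + 2x^2 + 1 (here 2d = 4, so Q is 3x3)
    # by solving the semidefinite feasibility problem of Lemma 4.
    import cvxpy as cp

    p = [1.0, 0.0, 2.0, 0.0, 1.0]   # coefficients p_0, ..., p_{2d}
    d = 2
    Q = cp.Variable((d + 1, d + 1), PSD=True)
    constraints = [sum(Q[j, i - j] for j in range(d + 1) if 0 <= i - j <= d) == p[i]
                   for i in range(2 * d + 1)]   # the affine conditions (3)
    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve()
    print(prob.status)              # "optimal" certifies that p is SOS

Here p(x) = (x^2 + 1)^2, so the problem is feasible and the solver returns a valid Gram matrix.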
3 Applications and extensions
We discuss first a few applications of the SDP characterization of nonnegative polynomials, followed by several extensions.
3.1 Optimization
Our first application concerns the global optimization of a univariate polynomial p(x). Rather than focusing on computing a minimizer x^⋆, we can directly obtain the optimal value p(x^⋆), by computing the largest constant γ for which p(x) − γ is nonnegative (equivalently, SOS). Furthermore, using Lemma 4, we can easily write this as a semidefinite programming problem. We can thus obtain the global minimum of a univariate polynomial by solving an SDP problem. Notice also that at optimality, we have 0 = p(x^⋆) − γ = Σ_{k=1}^m q_k^2(x^⋆), and thus every q_k vanishes at the minimizer x^⋆.

4 Multivariate polynomials

The same Gram matrix construction extends to the multivariate case. If we let p(x) = Σ_α p_α x^α be a polynomial of degree 2d in n variables, and [x]_d the vector of all monomials of degree at most d, then p(x) is SOS if and only if there exists a matrix Q such that

\[
p_\alpha = \sum_{\beta+\gamma=\alpha} Q_{\beta\gamma}, \qquad Q \succeq 0. \qquad (4)
\]
We have exactly $\binom{n+2d}{2d}$ linear equations, one for each coefficient of p(x). As before, these conditions are affine conditions relating the entries of Q and the coefficients of p(x). Thus, we can decide membership in, or optimize over, the set of SOS polynomials by solving a semidefinite programming problem.
4.2 Using the Newton polytope
Recall that we have defined in a previous lecture the Newton polytope of a polynomial p(x) ∈ R[x_1, ..., x_n] as the convex hull of the set of exponents appearing in p. This allowed us to introduce a notion of sparseness for a polynomial, related to the size of its Newton polytope. Sparsity (in this algebraic sense) allows a notable reduction in the computational cost of checking sum of squares conditions for multivariate polynomials. The reason is the following theorem, due to Reznick:
Theorem 7 ([Rez78], Theorem 1). If p(x) = Σ_i q_i(x)^2, then New(q_i) ⊆ ½ New(p).
In other words, this theorem allows us, without loss of generality, to restrict the set of monomials appearing in the representation (4) to those in the Newton polytope of p, scaled by a factor of ½. This reduces the size of the corresponding matrix Q, thus simplifying the SDP problem.
Example 8. Consider the following polynomial:

\[
p = (w^4 + 1)(x^4 + 1)(y^4 + 1)(z^4 + 1) + 2w + 3x + 4y + 5z.
\]
The polynomial p has degree 2d = 16, and four independent variables (n = 4). A naive approach, along the lines described earlier, would require a matrix Q of size $\binom{n+d}{d} = \binom{12}{8} = 495$.
V(x) > 0, ∇V(x)^T f(x) < 0 for all x ∈ R^n \ {0}, where without loss of generality we have assumed that the system (1) has an equilibrium at the origin (see, e.g., [Kha92]). Then the conditions that the Lyapunov function be positive, and that its Lie derivative be negative, are both directly imposed as sum-of-squares constraints in terms of the coefficients of the Lyapunov function.
As an example, consider the following system:

\[
\begin{aligned}
\dot{x} &= -x + (1 + x)y \\
\dot{y} &= -(1 + x)x.
\end{aligned}
\]

Using SOSTOOLS [PPP05] we easily find a quartic polynomial Lyapunov function, which after rounding (for purely cosmetic reasons) is given by

\[
V(x, y) = 6x^2 - 2xy + 8y^2 - 2y^3 + 3x^4 + 6x^2y^2 + 3y^4.
\]

It can be readily verified that both V(x, y) and −V̇(x, y) are sums of squares, since

\[
V = \begin{pmatrix} x \\ y \\ x^2 \\ xy \\ y^2 \end{pmatrix}^{T}
\begin{pmatrix} 6 & -1 & 0 & 0 & 0 \\ -1 & 8 & 0 & 0 & -1 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 6 & 0 \\ 0 & -1 & 0 & 0 & 3 \end{pmatrix}
\begin{pmatrix} x \\ y \\ x^2 \\ xy \\ y^2 \end{pmatrix},
\qquad
-\dot{V} = \begin{pmatrix} x \\ y \\ x^2 \\ xy \end{pmatrix}^{T}
\begin{pmatrix} 10 & 1 & -1 & 1 \\ 1 & 2 & 1 & -2 \\ -1 & 1 & 12 & 0 \\ 1 & -2 & 0 & 6 \end{pmatrix}
\begin{pmatrix} x \\ y \\ x^2 \\ xy \end{pmatrix},
\]

and the matrices in the expressions above are positive definite. Similar approaches may also be used for finding Lyapunov functionals for certain classes of hybrid systems.
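A quick numerical sanity check of the last claim, using plain numpy with the two Gram matrices displayed above:

    # Verify positive definiteness of the Gram matrices of V and -Vdot above.
    import numpy as np

    M = np.array([[ 6., -1.,  0., 0.,  0.],
                  [-1.,  8.,  0., 0., -1.],
                  [ 0.,  0.,  3., 0.,  0.],
                  [ 0.,  0.,  0., 6.,  0.],
                  [ 0., -1.,  0., 0.,  3.]])   # Gram matrix of V in (x, y, x^2, xy, y^2)
    N = np.array([[10.,  1., -1.,  1.],
                  [ 1.,  2.,  1., -2.],
                  [-1.,  1., 12.,  0.],
                  [ 1., -2.,  0.,  6.]])       # Gram matrix of -Vdot in (x, y, x^2, xy)
    print(np.linalg.eigvalsh(M).min(), np.linalg.eigvalsh(N).min())  # both positive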
1.2 Entangled states in quantum mechanics
The state of a finite-dimensional quantum system can be described in terms of a positive semidefinite Hermitian matrix ρ, called the density matrix. An important property of a bipartite quantum state is whether or not it is separable, which means that it can be written as a convex combination of tensor products of rank one matrices, i.e.,

\[
\rho = \sum_i p_i\, (x_i x_i^T) \otimes (y_i y_i^T), \qquad p_i \ge 0, \qquad \sum_i p_i = 1,
\]
where for simplicity we have restricted ρ, x_i, y_i to be real. Here x_i ∈ R^{n_1}, y_i ∈ R^{n_2}, and ρ ∈ S_+^{n_1 n_2}. If the state is not separable, then it is said to be entangled.
A question of interest is the following: given the density matrix ρ of a quantum state, how can we recognize whether the state is entangled or not? How can we certify that the state is entangled? It has been shown by Gurvits that in general this is an NP-hard question [Gur03].
A natural mathematical object to study in this context is the set of positive maps, i.e., the linear operators Λ : S^{n_1} → S^{n_2} that map positive semidefinite matrices into positive semidefinite matrices. Notice that to any such Λ, we can associate a unique observable L ∈ S^{n_1 n_2}, that satisfies y^T Λ(xx^T) y = (x ⊗ y)^T L (x ⊗ y). Furthermore, if Λ is a positive map, then the pairing between the observable L and any separable state ρ will always give a nonnegative number, since

\[
\langle L, \rho \rangle
= \operatorname{Tr} L \Big( \sum_i p_i\, (x_i x_i^T) \otimes (y_i y_i^T) \Big)
= \sum_i p_i \operatorname{Tr} L\, (x_i \otimes y_i)(x_i \otimes y_i)^T
= \sum_i p_i\, (x_i \otimes y_i)^T L (x_i \otimes y_i)
= \sum_i p_i\, y_i^T \Lambda(x_i x_i^T)\, y_i \ge 0.
\]
In other words, every positive map yields a separating hyperplane for the convex set of separable states.
It can further be shown that this is in fact a complete characterization (and thus, these sets are dual to
each other).
The set of positive maps can be exactly characterized in terms of a multivariate polynomial nonnegativity condition, since the map Λ : S^{n_1} → S^{n_2} is positive if and only if the polynomial p(x, y) = y^T Λ(xx^T) y is nonnegative for all x, y (why?). Replacing nonnegativity with sum of squares based conditions, we can obtain a family of efficiently computable criteria that certify entanglement.
For more background and details about this problem, see [DPS02, DPS04] and the references therein.
2 Moments

Consider a nonnegative measure μ on R (or, if you prefer, a real-valued random variable X). We can then define the moments, which are the expectations of powers of X:

\[
\mu_k := E[X^k] = \int x^k\, d\mu. \qquad (2)
\]
What constraints, if any, should the μ_k satisfy? Is it true that for any set of numbers μ_0, μ_1, ..., μ_k, there always exists a nonnegative measure having exactly these moments?
It should be apparent that some conditions are required. For instance, consider (2) for an even value of k. Since the measure is nonnegative, it is clear that in this case we have μ_k ≥ 0.
However, that's clearly not enough, and more restrictions should hold. A simple one can be derived by recalling the relationship between the first and second moments and the variance of a random variable, i.e., var(X) = E[X^2] − E[X]^2 = μ_2 − μ_1^2. Since the variance is always nonnegative, we should have μ_2 − μ_1^2 ≥ 0.
How to systematically derive conditions of this kind? Notice that the previous inequality can be obtained by noticing that for all a, b,

\[
0 \le E[(a + bX)^2] = a^2 + 2ab\, E[X] + b^2 E[X^2]
= \begin{pmatrix} a \\ b \end{pmatrix}^{T}
\begin{pmatrix} 1 & \mu_1 \\ \mu_1 & \mu_2 \end{pmatrix}
\begin{pmatrix} a \\ b \end{pmatrix},
\]
which implies that the 2 × 2 matrix above must be positive semidefinite. Interestingly, the inequality obtained earlier is exactly equal to the determinant of this matrix.
Exactly the same procedure can be done for higher-order moments. Proceeding this way, we have that the higher order moments must always satisfy:

\[
\begin{pmatrix}
1 & \mu_1 & \mu_2 & \cdots & \mu_d \\
\mu_1 & \mu_2 & \mu_3 & \cdots & \mu_{d+1} \\
\mu_2 & \mu_3 & \mu_4 & \cdots & \mu_{d+2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mu_d & \mu_{d+1} & \mu_{d+2} & \cdots & \mu_{2d}
\end{pmatrix} \succeq 0. \qquad (3)
\]
Notice that the diagonal elements correspond to the even-order moments, which should obviously be nonnegative.
As we will see below, this condition is almost necessary and sufficient in the univariate case. In the multivariate case, however, there will be more serious problems (just like for polynomial nonnegativity vs. sums of squares).
Remark 1. For unbounded intervals, the SDP conditions characterize the closure of the set of moments, but not necessarily the whole set. As an example, consider the set of moments given by μ = (1, 0, 0, 0, 1), corresponding to the Hankel matrix

\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]

Although the matrix above is PSD, it is not hard to see that there is no nonnegative measure corresponding to those moments. However, the parametrized atomic measure given by

\[
\mu_\epsilon = \frac{\epsilon^4}{2}\, \delta\Big(x + \frac{1}{\epsilon}\Big) + (1 - \epsilon^4)\, \delta(x) + \frac{\epsilon^4}{2}\, \delta\Big(x - \frac{1}{\epsilon}\Big)
\]

has as first five moments (1, 0, ε^2, 0, 1), and thus as ε → 0 the corresponding Hankel matrix converges to the one given above.
2.1 Nonnegative measures on intervals

Just like we did for the case of polynomials nonnegative on intervals, we can similarly obtain a necessary and sufficient characterization for moments. For simplicity, we present below only one particular case, corresponding to the interval [−1, 1].

Lemma 2. There exists a nonnegative measure in [−1, 1] with moments (μ_0, μ_1, ..., μ_{2d+1}) if and only if

\[
\begin{pmatrix}
\mu_0 & \mu_1 & \cdots & \mu_d \\
\mu_1 & \mu_2 & \cdots & \mu_{d+1} \\
\vdots & \vdots & \ddots & \vdots \\
\mu_d & \mu_{d+1} & \cdots & \mu_{2d}
\end{pmatrix}
\succeq \pm
\begin{pmatrix}
\mu_1 & \mu_2 & \cdots & \mu_{d+1} \\
\mu_2 & \mu_3 & \cdots & \mu_{d+2} \\
\vdots & \vdots & \ddots & \vdots \\
\mu_{d+1} & \mu_{d+2} & \cdots & \mu_{2d+1}
\end{pmatrix}, \qquad (4)
\]

i.e., both the sum and the difference of the two Hankel matrices above are positive semidefinite.
Figure 1: Set of valid moments (μ_1, μ_2, μ_3) of a probability measure on [−1, 1]. This is the convex hull of the moment curve (t, t^2, t^3), for −1 ≤ t ≤ 1. An explicit SDP representation is given in (4).
Notice that the necessity is clear, since it follows from consideration of the quadratic form (in the a_i):

\[
0 \le E\Big[(1 - X)\Big(\sum_{i=0}^d a_i X^i\Big)^2\Big]
= \sum_{j=0}^d \sum_{k=0}^d (\mu_{j+k} - \mu_{j+k+1})\, a_j a_k,
\]

where the first inequality follows since 1 − X is always nonnegative, since X is supported on [−1, 1]; the condition with the opposite sign follows analogously from the nonnegative factor 1 + X.
Notice the similarities (in fact, the duality) with the conditions for polynomial nonnegativity.
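As a small numerical illustration (a numpy sketch; the uniform measure is just a convenient test case), one can verify the conditions in (4) for the uniform probability measure on [−1, 1], whose moments are μ_k = 1/(k + 1) for even k and 0 for odd k:

    # Check the two Hankel conditions of (4) for d = 2, using the first six
    # moments of the uniform probability measure on [-1, 1].
    import numpy as np

    mu = [0.0 if k % 2 else 1.0 / (k + 1) for k in range(6)]   # mu_0, ..., mu_5
    H0 = np.array([[mu[i + j] for j in range(3)] for i in range(3)])      # mu_0 ... mu_4
    H1 = np.array([[mu[i + j + 1] for j in range(3)] for i in range(3)])  # mu_1 ... mu_5
    for H in (H0 - H1, H0 + H1):
        print(np.linalg.eigvalsh(H).min() >= 0)   # True: the moments are feasible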
2.2 The moment curve

An appealing geometric interpretation of the set of valid moments is in terms of the so-called moment curve, which is the parametric curve in R^{d+1} given by t ↦ (1, t, t^2, ..., t^d). Indeed, it is easy to see that every point on the curve corresponds to a Dirac measure, where all the probability is concentrated on a given point. Thus, every finite (or infinite) measure on the interval corresponds to a point in the convex hull. In Figure 1 we present an illustration of the set of valid moments, for the case d = 3.
3 Bridging the gap

What to do in the cases where the set of nonnegative polynomials is no longer equal to the SOS ones? As we will see in much more detail later, it turns out that we can approximate any semialgebraic problem (including the simple case of a single polynomial being nonnegative) by sum of squares techniques.
As a preview, and a hint at some of the possibilities, let's consider how to prove nonnegativity of a particular polynomial which is not a sum of squares. Recall that the Motzkin polynomial was defined as
\[
M(x, y) = x^4 y^2 + x^2 y^4 + 1 - 3x^2 y^2,
\]
and is a nonnegative polynomial that is not SOS. We can try multiplying it by another polynomial which
is known to be positive, and check whether the resulting product is SOS. In this case, multiplying by
11-4
the factor (x^2 + y^2), we can find the decomposition

\[
(x^2 + y^2)\, M(x, y) = y^2 (1 - x^2)^2 + x^2 (1 - y^2)^2 + x^2 y^2 (x^2 + y^2 - 2)^2,
\]

which clearly certifies that M(x, y) ≥ 0.
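The identity above is easy to verify symbolically; a one-line check, assuming the sympy package is available:

    # Expand the certificate and confirm it equals (x^2 + y^2) * M(x, y).
    import sympy as sp

    x, y = sp.symbols("x y")
    M = x**4 * y**2 + x**2 * y**4 + 1 - 3 * x**2 * y**2
    cert = (y**2 * (1 - x**2)**2 + x**2 * (1 - y**2)**2
            + x**2 * y**2 * (x**2 + y**2 - 2)**2)
    print(sp.expand((x**2 + y**2) * M - cert))   # prints 0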
More details will follow...
References
[DPS02] A. C. Doherty, P. A. Parrilo, and F. M. Spedalieri. Distinguishing separable and entangled states. Physical Review Letters, 88(18), 2002.
[DPS04] A. C. Doherty, P. A. Parrilo, and F. M. Spedalieri. Complete family of separability criteria. Physical Review A, 69:022308, 2004.
[Gur03] L. Gurvits. Classical deterministic complexity of Edmonds' problem and quantum entanglement. In STOC '03: Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, pages 10–19, New York, NY, USA, 2003. ACM.
[Kha92] H. Khalil. Nonlinear Systems. Macmillan Publishing Company, 1992.
[Par00] P. A. Parrilo. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. PhD thesis, California Institute of Technology, May 2000. Available at http://resolver.caltech.edu/CaltechETD:etd-05062004-055516.
[PP02] A. Papachristodoulou and S. Prajna. On the construction of Lyapunov functions using the sum of squares decomposition. In Proceedings of the 41st IEEE Conference on Decision and Control, 2002.
[PPP05] S. Prajna, A. Papachristodoulou, and P. A. Parrilo. SOSTOOLS: Sum of squares optimization toolbox for MATLAB, 2002–05. Available from http://www.cds.caltech.edu/sostools and http://www.mit.edu/~parrilo/sostools.
MIT 6.256 - Algebraic techniques and semidefinite optimization March 31, 2010
Lecture 12
Lecturer: Pablo A. Parrilo Scribe: ???
In previous lectures, we have described necessary conditions for the existence of a nonnegative measure with given moments. In the univariate case, these conditions were also sufficient. We revisit first a classical algorithm to effectively obtain this measure.
1 Recovering a measure from its moments
We review next a classical method for producing a univariate atomic measure with a given set of moments (e.g., [ST43, Dev86]). Other similar variations of this method are commonly used in signal processing, e.g., Pisarenko's harmonic decomposition method, where we are interested in producing a superposition of sinusoids with a given covariance matrix. This technique (or essentially similar ones) is known under a variety of names, such as Prony's method, or the Vandermonde decomposition of a Hankel matrix.
Consider the set of moments (μ_0, μ_1, ..., μ_{2n−1}) for which we want to find an associated nonnegative measure, supported on the real line. The resulting measure will be discrete, of the form Σ_{i=1}^n w_i δ(x − x_i).
For this, consider the linear system
\[
\begin{pmatrix}
\mu_0 & \mu_1 & \cdots & \mu_{n-1} \\
\mu_1 & \mu_2 & \cdots & \mu_n \\
\vdots & \vdots & \ddots & \vdots \\
\mu_{n-1} & \mu_n & \cdots & \mu_{2n-2}
\end{pmatrix}
\begin{pmatrix} c_0 \\ c_1 \\ \vdots \\ c_{n-1} \end{pmatrix}
= - \begin{pmatrix} \mu_n \\ \mu_{n+1} \\ \vdots \\ \mu_{2n-1} \end{pmatrix}. \qquad (1)
\]
The Hankel matrix on the left-hand side of this equation is the one that appeared earlier as a sufficient condition for the moments to represent a nonnegative measure. The linear system in (1) has a unique solution if the matrix is positive definite. In this case, we let x_i be the roots of the univariate polynomial

\[
x^n + c_{n-1} x^{n-1} + \cdots + c_1 x + c_0 = 0,
\]
which are all real and distinct (why?). We can then obtain the corresponding weights w_i by solving the nonsingular Vandermonde system given by

\[
\sum_{i=1}^n w_i\, x_i^j = \mu_j \qquad (0 \le j \le n-1).
\]
In the exercises, you will have to prove that this method actually works (i.e., the x_i are real and distinct, the w_i are nonnegative, and the moments are the correct ones).
Example 1. Let's find a nonnegative measure whose first six moments are given by (1, 1, 2, 1, 6, 1). The solution of the linear system (1) yields the polynomial

\[
x^3 - 4x^2 - 9x + 16 = 0,
\]

whose roots are −2.4265, 1.2816, and 5.1449. The corresponding weights are 0.0772, 0.9216, and 0.0012, respectively.
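For concreteness, here is a short numpy sketch of the whole procedure, run on the data of Example 1:

    # Recover an atomic measure from the moments (1, 1, 2, 1, 6, 1).
    import numpy as np

    mu = np.array([1., 1., 2., 1., 6., 1.])                   # mu_0, ..., mu_{2n-1}
    n = 3
    H = np.array([[mu[i + j] for j in range(n)] for i in range(n)])  # Hankel matrix in (1)
    c = np.linalg.solve(H, -mu[n:2 * n])                      # c_0, ..., c_{n-1}
    xs = np.real(np.roots(np.concatenate(([1.0], c[::-1]))))  # roots of x^n + ... + c_0
    V = np.vander(xs, n, increasing=True).T                   # Vandermonde system
    w = np.linalg.solve(V, mu[:n])                            # the weights w_i
    print(xs, w)   # roots -2.4265, 1.2816, 5.1449 with the weights of Example 1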
Example 2. We outline here a stylized application of these results. Consider a time-domain signal that is the sum of k Dirac functions, i.e., f(x) := Σ_{i=1}^k w_i δ(x − x_i), where the 2k parameters w_i, x_i are unknown. By the results above, it is enough to obtain 2k linear functionals on the signal (namely, the moments μ_i := ∫ x^i f(x) dx) to fully recover it from the measurements. Indeed, the signal can always be exactly reconstructed from these 2k moments, by using the algorithm described above. Notice that the nonnegativity assumption on the weights w_i is not critical, and can easily be removed.
More realistic, but essentially similar, results can be obtained by considering signals that are sums of (possibly damped) sinusoids of different frequencies. This viewpoint has a number of interesting connections with error-correcting codes (in particular, interpolation-based codes such as Reed-Solomon), as well as the recent compressed sensing results.
Remark 3. As described, the measure recovery method always works correctly, provided the computations are done in exact arithmetic. In most practical applications, it is necessary or convenient to use floating-point computations. Furthermore, in many settings (such as optimization) the moment information may be noisy, and therefore the matrices may contain some (hopefully small) perturbations from their nominal values. For these reasons, it is of interest to understand sensitivity issues, both at the level of what is intrinsic about the problem (conditioning), and about the specific algorithm used (numerical stability).
The technique described above can run into numerical difficulties. On the conditioning side, it is well known that, from the numerical viewpoint, the monomial basis (with respect to which we are taking moments) is a bad basis for the space of polynomials. On the numerical stability side, the algorithm above does a number of inefficient calculations, such as explicitly computing the coefficients c_i of the polynomial corresponding to the support of the measure. A better approach involves directly computing the nodes x_i as the generalized eigenvalues of a matrix pencil. Some of these issues will be explored in more detail in the exercises.
2 A probabilistic interpretation

We also mention here an appealing probabilistic interpretation of the dual (2), commonly used in integer and quadratic programming or game theory, and developed by Lasserre in the polynomial optimization case [Las01]. Consider as before the problem of minimizing a polynomial. Now, rather than looking for a minimizer x in R^n, let's relax our notion of solution to allow for probability densities μ on R^n, and replace the objective function by its natural generalization ∫ p(x) dμ. It clearly holds that the new objective is never larger than the original one, since we are making the feasible set bigger.
This change makes the problem trivially convex, although infinite-dimensional. To produce a finite dimensional approximation (which may or may not be exact), we rewrite the objective function in terms of the moments of the measure μ, and write valid semidefinite constraints for the moments μ_k.
3 Duality and complementary slackness

What is the relationship between this classical method and semidefinite programming duality? Recall our approach to minimizing a polynomial p(x), by computing

max γ s.t. p(x) − γ is SOS.

If this relaxation is exact (i.e., the optimal γ is equal to the optimal value of the polynomial), then at optimality we necessarily have p(x^⋆) − γ = Σ_i g_i^2(x^⋆) = 0, and thus every g_i vanishes at the minimizers. Writing the SOS condition explicitly in terms of a Gram matrix, the relaxation is the semidefinite program

\[
\max \gamma \quad \text{s.t.} \quad Q_{00} + \gamma = p_0, \qquad \sum_{j+k=i} Q_{jk} = p_i \ \ (i = 1, \ldots, 2d), \qquad Q \succeq 0,
\]
and its dual

\[
\min \sum_{i=0}^{2d} p_i \mu_i \quad \text{s.t.} \quad
M(\mu) := \begin{pmatrix}
\mu_0 & \mu_1 & \cdots & \mu_d \\
\mu_1 & \mu_2 & \cdots & \mu_{d+1} \\
\vdots & \vdots & \ddots & \vdots \\
\mu_d & \mu_{d+1} & \cdots & \mu_{2d}
\end{pmatrix} \succeq 0, \qquad \mu_0 = 1. \qquad (2)
\]
At optimality, complementary slackness holds, i.e., the product of the primal and dual matrices vanishes. We then have M(μ)·Q = 0. Assume that the leading k × k submatrix of M(μ) is nonsingular. Then, the procedure described in Section 1 gives a k-atomic measure, with support in the minimizers of p(x). Generically, this matrix M(μ) will be rank one, which corresponds to the case of a unique optimal solution.
Remark 4. Unlike the univariate case, a multivariate polynomial that is bounded below may not achieve its minimum. A well-known example is p(x, y) = x^2 + (1 − xy)^2, which clearly satisfies p(x, y) ≥ 0. Since p(x, y) = 0 would imply x = 0 and 1 − xy = 0 (which is impossible), this value cannot be achieved. However, we can get arbitrarily close, since p(ε, 1/ε) = ε^2, for any ε > 0.
4 Multivariate case

We have seen previously that in the multivariate case, it is no longer the case that nonnegative polynomials are always sums of squares. The corresponding result on the dual side is that the set of valid moments is no longer described by the obvious semidefinite constraints, obtained by considering the expected value of squares (even if we require strict positivity).
Example 5 (Dual Motzkin). Consider the existence of a probability measure μ on R^2, that satisfies the moment constraints:

\[
\begin{aligned}
E[1] &= E[X^4 Y^2] = E[X^2 Y^4] = 1, \\
E[X^2 Y^2] &= 2, \\
E[XY] &= E[XY^2] = E[X^2 Y] = E[X^2 Y^3] = E[X^3 Y^2] = E[X^3 Y^3] = 0.
\end{aligned}
\qquad (3)
\]
The obvious nonnegativity constraints are satisfied, since for all a, b, c, d we have

\[
E[(a + bXY + cXY^2 + dX^2Y)^2] = a^2 + 2b^2 + c^2 + d^2 \ge 0.
\]
However, it turns out that these conditions are only necessary, but not sufficient. This can be seen by computing the expectation of the Motzkin polynomial (which is nonnegative), since in this case we have

\[
E[X^4 Y^2 + X^2 Y^4 + 1 - 3X^2 Y^2] = 1 + 1 + 1 - 6 = -3,
\]

thus proving that no nonnegative measure with the given moments can exist.
5 Density results
Recent results by Blekherman [Ble06] give quantitative bounds on the relative density of the cone of sums of squares versus the cone of nonnegative polynomials. Concretely, in [Ble06] it is proved that a suitably normalized section of the cone of positive polynomials P̃_{n,2d} satisfies

\[
c_1\, n^{-1/2} \le \left( \frac{\operatorname{Vol} \tilde{P}_{n,2d}}{\operatorname{Vol} B_M} \right)^{1/D_M} \le c_2\, n^{-1/2},
\]

while the corresponding expression for the section of the cone of sums of squares Σ̃_{n,2d} is

\[
c_3\, n^{-d/2} \le \left( \frac{\operatorname{Vol} \tilde{\Sigma}_{n,2d}}{\operatorname{Vol} B_M} \right)^{1/D_M} \le c_4\, n^{-d/2},
\]

where c_1, c_2, c_3, c_4 depend on d only (explicit expressions are available), D_M = \binom{n+2d}{2d} − 1, and B_M is the unit ball in R^{D_M}.
These expressions show that for fixed d, as n → ∞ the volume of the set of sums of squares becomes vanishingly small when compared to the nonnegative polynomials.
Show the values of the actual bounds, for reasonable dimensions ToDo
References
[Ble06] G. Blekherman. There are significantly more nonnegative polynomials than sums of squares. Israel Journal of Mathematics, 153(1):355–380, 2006.
[Dev86] L. Devroye. Nonuniform random variate generation. Springer-Verlag, New York, 1986.
[Las01] J. B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM J. Optim., 11(3):796–817, 2001.
[ST43] J. A. Shohat and J. D. Tamarkin. The Problem of Moments. American Mathematical Society Mathematical Surveys, vol. II. American Mathematical Society, New York, 1943.
MIT 6.256 - Algebraic techniques and semidefinite optimization April 2, 2010
Lecture 13
Lecturer: Pablo A. Parrilo Scribe: ???
Today we introduce the first basic elements of algebraic geometry, namely ideals and varieties over the complex numbers. This dual viewpoint (ideals for the algebra, varieties for the geometry) is enormously powerful, and will help us later in the development of methods for solving polynomial equations. We also present the notion of quotient rings, which are very natural when considering functions defined on algebraic varieties (e.g., in polynomial optimization problems with equality constraints). Finally, we begin our study of Groebner bases, by defining the notion of term orders. A superb introduction to algebraic geometry, emphasizing the computational aspects, is the textbook of Cox, Little, and O'Shea [CLO97]. Another recommended introductory-level book is the one by Hassett [Has07].
1 Polynomial ideals

For notational simplicity, we use C[x] to denote the polynomial ring in n variables C[x_1, ..., x_n]. Specializing the general definition of an ideal to a polynomial ring, we have the following:

Definition 1. A subset I ⊂ C[x] is an ideal if it satisfies:
1. 0 ∈ I.
2. If a, b ∈ I, then a + b ∈ I.
3. If a ∈ I and b ∈ C[x], then a·b ∈ I.
The two most important examples of polynomial ideals for our purposes are the following:
• The set of polynomials that vanish in a given set S ⊂ C^n, i.e.,

I(S) := {f ∈ C[x] : f(a_1, ..., a_n) = 0 for all (a_1, ..., a_n) ∈ S},

is an ideal, called the vanishing ideal of S.
• The ideal generated by a finite set of polynomials {f_1, ..., f_s}, defined as

⟨f_1, ..., f_s⟩ := {f | f = g_1 f_1 + ⋯ + g_s f_s, g_i ∈ C[x]}.   (1)

An ideal is finitely generated if it can be written as in (1) for some finite set of polynomials {f_1, ..., f_s}. An ideal is called principal if it can be generated by a single polynomial. The intersection of two ideals is again an ideal. What about the union of ideals?
Example 2. In the univariate case (i.e., the polynomial ring is C[x]), every ideal is principal.

One of the most important facts about polynomial ideals is Hilbert's finiteness theorem:

Theorem 3 (Hilbert Basis Theorem). Every polynomial ideal in C[x] is finitely generated.

We will present a proof of this after learning about Groebner bases.
From the computational viewpoint, two very natural questions about ideals are the following:
• Given a polynomial p(x), how to decide if it belongs to a given ideal?
• How to find a convenient representation of an ideal? What does "convenient" mean?
Figure 1: Two algebraic varieties. The one on the left is defined by the equation (x^2 + y^2 − 1)(3x + 6y − 4) = 0. The one on the right is a quartic surface, defined by 1 − x^2 − y^2 − 2z^2 + z^4 = 0.
2 Algebraic varieties

An (affine) algebraic variety is the zero set of a finite collection of polynomials (see the formal definition below). The word "affine" here means that we are working in the standard affine space, as opposed to projective space, where we identify x, y ∈ C^n if x = λy for some λ ≠ 0.

Definition 4. Let f_1, ..., f_s be polynomials in C[x]. Let the set V be

V(f_1, ..., f_s) := {(a_1, ..., a_n) ∈ C^n : f_i(a_1, ..., a_n) = 0 for all 1 ≤ i ≤ s}.

We call V(f_1, ..., f_s) the affine variety defined by f_1, ..., f_s.
A simple example of a variety is a (complex) affine subspace, which corresponds to the vanishing of a finite collection of affine polynomials. A few additional examples of varieties are shown in Figure 1.
It is not too hard to show that finite unions and intersections of algebraic varieties are again algebraic varieties. What about the infinite case?
Remark 5. Recall our previous encounter with the Zariski topology, whose closed sets were defined to be the algebraic varieties, i.e., the vanishing sets of finite sets of polynomial equations. To prove that this is actually a topology, we need to show that arbitrary intersections of closed sets are closed. Hilbert's basis theorem precisely guarantees this fact.
Perhaps the most natural question about algebraic varieties is the following: given a variety V, how to decide if it is nonempty?
Let's start connecting ideals and varieties. Consider a finite set of polynomials {f_1, ..., f_s}. We already know how to generate an ideal, namely ⟨f_1, ..., f_s⟩. However, we can also look at the corresponding variety V(f_1, ..., f_s). Since this variety is a subset of C^n, we can form the corresponding vanishing ideal, I(V(f_1, ..., f_s)). How do these two ideals relate to each other? Is it always the case that

⟨f_1, ..., f_s⟩ = I(V(f_1, ..., f_s)),

and if it is not, what are the reasons? The answer to these questions (and more) will be given by another famous result by Hilbert, known as the Nullstellensatz.
3 Quotient rings

Whenever we have an ideal in a ring, we can immediately define a notion of equivalence classes, where we identify two elements in the ring if and only if their difference is in the ideal.

Example 6. Recall that a simple example of an ideal in the ring Z was the set of even integers. By identifying two integers if their difference is even, we partition Z into two equivalence classes, namely the even and the odd numbers. More generally, if the ideal is given by the integer multiples of a given number m, then Z can be partitioned into m equivalence classes.
We can do this for the polynomial ring C[x], and any ideal I.

Definition 7. Let I ⊂ C[x] be an ideal, and let f, g ∈ C[x]. We say f and g are congruent modulo I, written

f ≡ g mod I,

if f − g ∈ I.

It is easy to show that this is an equivalence relation, i.e., it is reflexive, symmetric, and transitive. Thus, this partitions C[x] into equivalence classes, where two polynomials are "the same" if their difference belongs to the ideal. This allows us to define the quotient ring:

Definition 8. The quotient C[x]/I is the set of equivalence classes for congruence modulo I.

The quotient C[x]/I inherits the ring structure of C[x], with the natural operations. Thus, with these operations now defined between equivalence classes, C[x]/I becomes a ring, known as the quotient ring.
Quotient rings are particularly useful when considering a polynomial function p(x) over the algebraic variety defined by equations g_i(x) = 0. Notice that if we define the ideal I = ⟨g_i⟩, then any polynomial q that is congruent with p modulo I takes exactly the same values on the variety.
4 Monomial orderings

In order to begin studying "nice" bases for ideals, we need a way of ordering monomials. In the univariate case, this is straightforward, since we can define x^a ≻ x^b as being true if and only if a > b. In the multivariate case, there are a lot more options.
We also want the ordering structure to be consistent with polynomial multiplication. This is formalized in the following definition.

Definition 9. A monomial ordering on C[x] is a relation ≻ on Z_+^n (i.e., the monomial exponents), such that:
1. The relation ≻ is a total ordering.
2. If α ≻ β and γ ∈ Z_+^n, then α + γ ≻ β + γ.
3. The relation ≻ is a well-ordering (every nonempty subset has a smallest element).
One of the simplest examples of a monomial ordering is the lexicographic ordering, where α ≻_lex β if the left-most nonzero entry of α − β is positive. We will see some other examples of monomial orderings later in the course.
References
[CLO97] D. A. Cox, J. B. Little, and D. O'Shea. Ideals, varieties, and algorithms: an introduction to computational algebraic geometry and commutative algebra. Springer, 1997.
[Has07] B. Hassett. Introduction to algebraic geometry. Cambridge University Press, 2007.
MIT 6.256 - Algebraic techniques and semidefinite optimization April 7, 2010
Lecture 14
Lecturer: Pablo A. Parrilo Scribe: ???
After a brief review of monomial orderings, we develop the basic ideas of Groebner bases, followed by examples and applications. For background and much more additional material, we recommend the textbook of Cox, Little, and O'Shea [CLO97]. Other good, more specialized references are [AL94, BW93, KR00].
1 Monomial orderings

Recall from last lecture the notion of a monomial ordering:

Definition 1. A monomial ordering on C[x] is a relation ≻ on Z_+^n (i.e., the monomial exponents), such that:
1. The relation ≻ is a total ordering.
2. If α ≻ β and γ ∈ Z_+^n, then α + γ ≻ β + γ.
3. The relation ≻ is a well-ordering (every nonempty subset has a smallest element).
There are several term orderings of interest in computational algebra. Among them, we mention:
• Lexicographic ("dictionary"). Here α ≻_lex β if the left-most nonzero entry of α − β is positive. Notice that a particular order of the variables is assumed, and by changing this, we obtain n! nonequivalent lexicographic orderings.
• Graded lexicographic. Sort first by total degree, then lexicographic, i.e., α ≻_grlex β if |α| > |β|, or if |α| = |β| and α ≻_lex β.
• Graded reverse lexicographic. Here α ≻_grevlex β if |α| > |β|, or if |α| = |β| and the right-most nonzero entry of α − β is negative. This ordering, although somewhat nonintuitive, has some desirable computational properties.
• General matrix orderings. Described by a weight matrix W ∈ R^{k×n} (k ≤ n), where α ≻_W β if (Wα) ≻_lex (Wβ). For W to correspond to a monomial ordering as defined, the first nonzero entry in each column must be positive.

It turns out that every monomial ordering can be described by an associated matrix W, i.e., every monomial ordering is a matrix ordering. What are the matrices corresponding to the first three orderings described?
Example 2. Consider the polynomial ring C[x, y]. In the lexicographic ordering (≻_lex) discussed, we have:

1 ≺ y ≺ y^2 ≺ ⋯ ≺ x ≺ xy ≺ xy^2 ≺ ⋯ ≺ x^2 ≺ x^2 y ≺ x^2 y^2 ≺ ⋯,

while for the other two orderings (≻_grlex and ≻_grevlex), which in the special case of two variables coincide, we have:

1 ≺ y ≺ x ≺ y^2 ≺ xy ≺ x^2 ≺ y^3 ≺ xy^2 ≺ x^2 y ≺ x^3 ≺ ⋯.
Picture comparing different orderings ToDo
Example 3. Consider the monomials α = x^3 y^2 z^8 and β = x^2 y^9 z^2. If the variables are ordered as (x, y, z), we have

α ≻_lex β,   α ≻_grlex β,   β ≻_grevlex α.

Notice that x ≻ y ≻ z for all three orderings.
2 Groebner bases

2.1 Monomial ideals

Before studying general ideals, it is convenient to introduce first a special class, known as monomial ideals.

Definition 4. A monomial ideal is a polynomial ideal that can be generated by monomials.

What are the possible monomials that belong to a given monomial ideal? Since x^α ∈ I implies x^{α+β} ∈ I for all β ≥ 0, we have that these sets are "closed upwards."
Picture of monomial ideals ToDo
Furthermore, a polynomial belongs to a monomial ideal I if and only if all its terms are in I.
Theorem 5 (Dickson's lemma). Every monomial ideal is finitely generated.
We consider next a special monomial ideal, associated to every polynomial ideal I. From now on, we assume a fixed monomial ordering (e.g., graded reverse lexicographic), and denote by in(f) the largest monomial appearing in the polynomial f ≠ 0.

Definition 6. Consider an ideal I ⊂ C[x], and a fixed monomial ordering. The initial ideal of I, denoted in(I), is the monomial ideal generated by the leading terms of all the elements in I, i.e., in(I) := ⟨in(f) : f ∈ I \ {0}⟩.
A monomial x^α is called standard if it does not belong to the initial ideal in(I).

• Radical membership. To decide whether p ∈ √I: since √I is also an ideal, we could compute a Groebner basis for it, and then reduce the problem to the previous one. However, it is often more efficient to instead use the following result (Rabinowitsch's trick):

p ∈ √I ⟺ 1 ∈ ⟨f_1, ..., f_s, 1 − yp⟩,

where y is a (new) additional variable.
• Consistency of polynomial equations. Consider a finite set of polynomial equations {f_i = 0}, and let I = ⟨f_i⟩ be the corresponding ideal. By the Nullstellensatz, the given equations are infeasible if and only if {1} is the reduced Groebner basis of I.
• Elimination. For notational simplicity, consider an ideal I ⊂ C[x, y, z]. Suppose that we want to compute all the polynomials in I that do not depend on the variable z, i.e., I ∩ C[x, y]. Geometrically, this elimination of variables corresponds to (the Zariski closure of) the projection of the corresponding variety onto (x, y). This intersection (or projection) can easily be obtained by computing a Groebner basis G of I with respect to a lexicographic (or elimination) ordering. The corresponding ideal is then generated by G ∩ C[x, y].
4 Zero-dimensional ideals

In practice, we are often interested in polynomial systems that have only a finite number of solutions (the zero-dimensional case), and many interesting things happen in this case. Among other properties, the quotient ring C[x]/I is now a finite dimensional vector space, with its dimension being equal to the number of standard monomials. Furthermore, Groebner bases can be used to fully reduce their solution to a classical eigenvalue problem, generalizing the companion matrix notion from the univariate case.
All this, and much more, next time...
References
[AL94] W. W. Adams and P. Loustaunau. An introduction to Gröbner bases, volume 3 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 1994.
[BW93] T. Becker and V. Weispfenning. Gröbner bases, volume 141 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1993.
[CLO97] D. A. Cox, J. B. Little, and D. O'Shea. Ideals, varieties, and algorithms: an introduction to computational algebraic geometry and commutative algebra. Springer, 1997.
[CoC] CoCoATeam. CoCoA: a system for doing Computations in Commutative Algebra. Available at http://cocoa.dima.unige.it.
[GPS05] G.-M. Greuel, G. Pfister, and H. Schönemann. Singular 3.0. A Computer Algebra System for Polynomial Computations, Centre for Computer Algebra, University of Kaiserslautern, 2005. http://www.singular.uni-kl.de.
[GS] D. R. Grayson and M. E. Stillman. Macaulay 2, a software system for research in algebraic geometry. Available at http://www.math.uiuc.edu/Macaulay2/.
[KR00] M. Kreuzer and L. Robbiano. Computational commutative algebra. 1. Springer-Verlag, Berlin, 2000.
MIT 6.256 - Algebraic techniques and semidefinite optimization April 9, 2010
Lecture 15
Lecturer: Pablo A. Parrilo Scribe: ???
Today we will see a few more examples and applications of Groebner bases, and we will develop the
zero-dimensional case.
1 Zero-dimensional ideals

In practice, we are often interested in polynomial systems that have only a finite number of solutions (the zero-dimensional case), and as we will see, many interesting things happen in this case.

Definition 1. An ideal I is zero-dimensional if the associated variety V(I) is a finite set.

Given a system of polynomial equations, how to decide if it has a finite number of solutions (i.e., if the corresponding ideal is zero-dimensional)? We can state a simple criterion for this in terms of a Groebner basis.
Lemma 2. Let G be a Groebner basis of the ideal I ⊂ C[x_1, ..., x_n]. The ideal I is zero-dimensional if and only if for each i (1 ≤ i ≤ n), there exists an element in the Groebner basis whose initial term is a pure power of x_i.
Among other important consequences, when I is a zero-dimensional ideal the quotient ring C[x]/I is a finite dimensional vector space, with its dimension being equal to the number of standard monomials. These are the monomials that are not in the initial ideal in(I) (i.e., the monomials "under the staircase"). Furthermore, we can use Groebner bases to reduce the effective calculation of the solutions of a zero-dimensional polynomial system to an eigenvalue problem, generalizing the companion matrix notion from the univariate case. We sketch this below.
Recall that in this case, the quotient C[x]/I is a finite dimensional vector space. The main idea is to consider the homomorphisms given by the n linear maps M_{x_i} : C[x]/I → C[x]/I, f ↦ (x_i f) (that is, multiplication by the coordinate variables, followed by normal form). Choosing as a basis the set of standard monomials, we can effectively compute a matrix representation of these linear maps. This defines n matrices M_{x_i}, that commute with each other (why?).
Assume for simplicity that all the roots have multiplicity one. Then, all the M_{x_i} can be simultaneously diagonalized by a single matrix V, and the kth diagonal entry of V M_{x_i} V^{−1} contains the ith coordinate of the kth solution, for 1 ≤ k ≤ #V(I).
(In general, we can block-diagonalize this commutative algebra, splitting it into its semisimple and nilpotent components. The nilpotent part is trivial if and only if the ideal is radical.)

Remark 3. In practice, a better alternative to a full diagonalization (which is in general numerically unstable) is a Schur-like approach, where we find a unitary matrix U that simultaneously triangularizes the matrices in the commuting family; see [CGT97] for details.
To understand these ideas a bit better, let's recall the univariate case.

Example 4. Consider the ring C[x] of polynomials in a single variable x, and an ideal I ⊂ C[x]. Since every ideal in this ring is principal, I can be generated by a single polynomial p(x) = p_n x^n + ⋯ + p_1 x + p_0. Then, we can write I = ⟨p(x)⟩, and {p(x)} is a Groebner basis for the ideal (why?). The quotient C[x]/I is an n-dimensional vector space, with a suitable basis given by the standard monomials {1, x, ..., x^{n−1}}. Consider as before the linear map M_x : C[x]/I → C[x]/I. The matrix representation of this linear map in the given basis is given by

\[
\begin{pmatrix}
0 & 0 & \cdots & 0 & -p_0/p_n \\
1 & 0 & \cdots & 0 & -p_1/p_n \\
0 & 1 & \cdots & 0 & -p_2/p_n \\
\vdots & & \ddots & & \vdots \\
0 & 0 & \cdots & 1 & -p_{n-1}/p_n
\end{pmatrix},
\]

which is the standard companion matrix C_p associated with p(x). Its eigenvalues are exactly the roots of p(x).
We present next a multivariate example.
Example 5. Consider the ideal I ⊂ C[x, y, z] given by

I = ⟨xy − z, yz − x, zx − y⟩.

Choosing a term ordering (e.g., lexicographic, where x ≻ y ≻ z), we obtain the Groebner basis

G = {x^3 − x, yx^2 − y, y^2 − x^2, z − yx}.
We can directly see from this that I is zero-dimensional (why?). A basis for the quotient space is given by {1, x, x^2, y, yx}. Considering the maps M_x, M_y, and M_z, we have that their corresponding matrix representations are given by

\[
M_x = \begin{pmatrix} 0&0&0&0&0 \\ 1&0&1&0&0 \\ 0&1&0&0&0 \\ 0&0&0&0&1 \\ 0&0&0&1&0 \end{pmatrix}, \quad
M_y = \begin{pmatrix} 0&0&0&0&0 \\ 0&0&0&0&1 \\ 0&0&0&1&0 \\ 1&0&1&0&0 \\ 0&1&0&0&0 \end{pmatrix}, \quad
M_z = \begin{pmatrix} 0&0&0&0&0 \\ 0&0&0&1&0 \\ 0&0&0&0&1 \\ 0&1&0&0&0 \\ 1&0&1&0&0 \end{pmatrix}.
\]
It can be verified that these three matrices commute. A simultaneous diagonalizing transformation is given by the matrix:

\[
V = \begin{pmatrix} 1&0&0&0&0 \\ 1&1&1&1&1 \\ 1&1&1&-1&-1 \\ 1&-1&1&1&-1 \\ 1&-1&1&-1&1 \end{pmatrix}, \qquad
V^{-1} = \frac{1}{4} \begin{pmatrix} 4&0&0&0&0 \\ 0&1&1&-1&-1 \\ -4&1&1&1&1 \\ 0&1&-1&1&-1 \\ 0&1&-1&-1&1 \end{pmatrix}.
\]
The corresponding transformed matrices are:

\[
\begin{aligned}
V M_x V^{-1} &= \operatorname{diag}(0, 1, 1, -1, -1) \\
V M_y V^{-1} &= \operatorname{diag}(0, 1, -1, 1, -1) \\
V M_z V^{-1} &= \operatorname{diag}(0, 1, -1, -1, 1),
\end{aligned}
\]

from where the coordinates of the five roots can be read.
In the general (radical) case, the matrix V is a generalized Vandermonde matrix, with rows indexed by roots (points in the variety) and columns indexed by the standard monomials. The V_{ij} entry contains the jth monomial evaluated at the ith root. Since V V^{−1} = I, we can also interpret the jth column of V^{−1} as giving the coefficients of a Lagrange interpolating polynomial p_j(x), that vanishes at all the points in the variety except at r_j, where it takes the value 1 (i.e., p_j(r_k) = δ_{jk}).
Generalize Hermite form, etc ToDo
Figure 1: A six-node graph, with nodes x1, ..., x6.
2 Hilbert series

Consider an ideal I ⊂ C[x] and the corresponding quotient ring C[x]/I. We have seen that, once a particular Groebner basis is chosen, we could associate to every element of C[x]/I a unique representative, namely a C-linear combination of standard monomials, obtained as the remainder after division by the corresponding Groebner basis. We are interested in studying, for every integer k, the dimension of the vector space of remainders of degree less than or equal to k. Expressed in a simpler way, we want to know how many standard monomials of degree k there are, for any given k.
Rather than studying this for different values of k separately, it is convenient to collect (or "bundle") all these numbers together in a single object (this general technique is usually called a generating function). The Hilbert series of I, denoted H_I(t), is then defined as the generating function of the dimension of the space of residues of degree k, i.e.,

\[
H_I(t) = \sum_{k=0}^{\infty} \dim(C[x]/I \cap P_{n,k})\, t^k, \qquad (1)
\]

where P_{n,k} denotes the set of homogeneous polynomials in n variables of degree k.
Notice that, if the ideal is zero-dimensional, the corresponding Hilbert series is actually a finite sum, and thus a polynomial. The number of solutions is then equal to H_I(1).
Example 6. For the ideal I in Example 5, the corresponding Hilbert function is H_I(t) = 1 + 2t + 2t^2.
In general, the Hilbert series does depend on the specific Groebner basis chosen, not only on the ideal I. However, almost all of the relevant algebraic and geometric properties (e.g., its degree, if it is a polynomial) are actually invariants associated only with the ideal.
3 Examples

3.1 Graph ideals

Consider a graph G = (V, E), and define the associated "edge ideal" I_G = ⟨x_i x_j : (i, j) ∈ E⟩. Notice that I_G is a monomial ideal. For instance, for the graph in Figure 1, the corresponding ideal is given by:

I_G := ⟨x_1 x_2, x_2 x_3, x_3 x_4, x_4 x_5, x_5 x_6, x_1 x_6, x_1 x_5, x_3 x_5⟩.
One of the motivations for studying this kind of ideal is that many graph-theoretic properties (e.g., bipartiteness, acyclicity, connectedness, etc.) can be understood in terms of purely algebraic properties of the corresponding ideal. This enables the extension and generalization of these notions to much more abstract settings (e.g., simplicial complexes, resolutions, etc.).
For our purposes here, rather than studying I_G directly, we will instead study the ideal obtained when restricting to zero-one solutions¹. For this, consider the ideal I_b defined as

\[
I_b := \langle x_1^2 - x_1, \ldots, x_n^2 - x_n \rangle. \qquad (2)
\]
Clearly, this is a zero-dimensional radical ideal, with the corresponding variety having 2^n distinct points, namely {0, 1}^n. Its corresponding Hilbert series is H_{I_b}(t) = (1 + t)^n = Σ_{k=0}^n \binom{n}{k} t^k.
Since we want to study the intersection of the corresponding varieties, we must consider the sum of the ideals, i.e., the ideal I := I_G + I_b. It can be shown that the given set of generators (i.e., the ones corresponding to the edges, and the quadratic relations in (2)) are always a Groebner basis of the corresponding ideal. What are the standard monomials? How can they be interpreted in terms of the graph?
The Hilbert function of the ideal I can be obtained from the Groebner basis. In this case, the corresponding Hilbert function is given by

\[
H_I(t) = 1 + 6t + 7t^2 + t^3,
\]
and we can read from the coefficient of t^k the number of stable sets of size k. In particular, the degree of the Hilbert function (which is actually a polynomial, since the ideal is zero-dimensional) indicates the size of the maximum stable set, which is equal to three in this example (for the subset {x_2, x_4, x_6}).
3.2 Integer programming

Another interesting application of Groebner bases deals with integer programming. For more details, see the papers [CT91, ST97, TW97].
Consider the integer programming problem

\[
\min c^T x \quad \text{s.t.} \quad Ax = b, \quad x \ge 0, \quad x \in \mathbb{Z}^n, \qquad (3)
\]

where A ∈ Z^{m×n}, b ∈ Z^m, and c ∈ Z^n. For simplicity, we assume that A, c ≥ 0, and that we know a feasible solution x_0. These assumptions can be removed.
The main idea to solve (3) will be to interpret the nonnegative integer decision variables x as the exponents of a monomial.
Complete ToDo
Example 7. Consider the problem data given by

\[
A = \begin{pmatrix} 4 & 5 & 6 & 1 \\ 1 & 2 & 7 & 3 \end{pmatrix}, \qquad
b = \begin{pmatrix} 750 \\ 980 \end{pmatrix}, \qquad
c^T = \begin{pmatrix} 1 & 2 & 3 & 4 \end{pmatrix}.
\]

An initial feasible solution is given by x_0 = [0, 30, 80, 120]^T. We will work in the ring C[z_1, z_2, w_1, w_2, w_3, w_4].
Thus, we need to compute a Groebner basis G of the binomial ideal

\[
\langle z_1^4 z_2 - w_1,\ z_1^5 z_2^2 - w_2,\ z_1^6 z_2^7 - w_3,\ z_1 z_2^3 - w_4 \rangle,
\]
for a term ordering that combines elimination of the z_i with the weight vector c. To obtain the solution, we compute the normal form of the monomial given by the initial feasible point, i.e., w_2^{30} w_3^{80} w_4^{120}. This reduction process yields the result w_2^{8} w_3^{106} w_4^{74}, and thus the optimal solution is [0, 8, 106, 74]. The corresponding costs of the initial and optimal solutions are c^T x_0 = 780 and c^T x_opt = 630.
¹ There are more efficient ways of doing this, that would not require adding generators. We adopt this approach to keep the discussion relatively straightforward.
We should remark that there are more efficient ways of implementing this than the one described. Also, although this basic method cannot currently compete with the specialized techniques used in integer programming, there are some particular cases where it is very efficient, mostly related to the solution of parametric problems.
References
[CGT97] R. M. Corless, P. M. Gianni, and B. M. Trager. A reordered Schur factorization method for zero-dimensional polynomial systems with multiple roots. In ISSAC '97: Proceedings of the 1997 international symposium on Symbolic and algebraic computation, pages 133–140, New York, NY, USA, 1997.
[CT91] P. Conti and C. Traverso. Buchberger algorithm and integer programming. In Applied algebra, algebraic algorithms and error-correcting codes (New Orleans, LA, 1991), volume 539 of Lecture Notes in Comput. Sci., pages 130–139. Springer, Berlin, 1991.
[ST97] B. Sturmfels and R. Thomas. Variation of cost functions in integer programming. Math. Programming, 77(3, Ser. A):357–387, 1997.
[TW97] R. Thomas and R. Weismantel. Truncated Gröbner bases for integer programming. Appl. Algebra Engrg. Comm. Comput., 8(4):241–256, 1997.
MIT 6.256 - Algebraic techniques and semidefinite optimization April 16, 2010
Lecture 16
Lecturer: Pablo A. Parrilo Scribe: ???
1 Generalizing the Hermite matrix

Recall the basic construction of the Hermite matrix H_q(p) in the univariate case, whose signature gave important information on the signs of the polynomial q(x) on the real roots of p(x).
In a very similar way to the extension of the companion matrix to the multivariate case, we can construct an analogue of the Hermite form for general zero-dimensional ideals. The basic idea is again to consider the zero-dimensional ideal I ⊂ R[x_1, ..., x_n], and an associated basis of the quotient ring B = {x^{α_1}, ..., x^{α_m}}, where the elements of B are standard monomials.
For simplicity, we assume first that I is radical. In this case, the corresponding finite variety is given by m distinct points, i.e., V(I) = {r_1, ..., r_m} ⊂ C^n. Notice first that, by the definition of the matrices M_{x_i}, we have Σ_{i=1}^m r_i^α = Tr[M_{x_1}^{α_1} ⋯ M_{x_n}^{α_n}]. Thus, in a similar way as we did in the univariate case, for any polynomial q = Σ_α q_α x^α we have

\[
\sum_{i=1}^m q(r_i) = \operatorname{Tr}[q(M_{x_1}, \ldots, M_{x_n})]. \qquad (1)
\]
Once again, this implies that if we have access to matrix representations M_{x_1}, ..., M_{x_n}, then we can explicitly evaluate these expressions. Notice also that, if both q and the generators of the ideal have rational coefficients, then the expression above is also a rational number (even if the roots are not).
Example 1. Consider the system in Example 5 of the previous lecture, and the polynomial p(x, y, z) = (x + y + z)^2. To evaluate the sum of the values that this polynomial takes on the variety, we compute:

\[
\operatorname{Tr} p(M_x, M_y, M_z) = \operatorname{Tr}\,(M_x + M_y + M_z)^2
= \operatorname{Tr} \begin{pmatrix} 0&0&0&0&0 \\ 2&3&2&2&2 \\ 3&2&3&2&2 \\ 2&2&2&3&2 \\ 2&2&2&2&3 \end{pmatrix} = 12.
\]

As expected, the squares of the sums of the coordinates of each of the five roots are {0, 9, 1, 1, 1}, with the total sum being equal to 12.
Given any q ∈ R[x_1, ..., x_n], we can then define a Hermite-like matrix H_q(I) as

\[
[H_q(I)]_{jk} := \sum_{i=1}^m q(r_i)\, r_i^{\alpha_j + \alpha_k}. \qquad (2)
\]

Notice that the rows and columns of H_q(I) are indexed by standard monomials.
Notice that the rows and columns of H
q
(I) are indexed by standard monomials.
Consider now a vector f = [f
1
, . . . , f
m
]
T
, and the quadratic form
f
T
H
q
(I)f :=
m
j,k=1
m
i=1
q(r
i
)(f
j
r
j
i
)(f
k
r
k
i
)
=
m
i=1
q(r
i
)(f
1
r
1
i
+ + f
m
r
m
i
)
2
= Tr[(qf
2
)(M
x1
, . . . , M
xn
)].
(3)
As we see, the matrix H_q(I) is a specific representation, in a basis given by standard monomials, of a quadratic form H_q : C[x]/I → C, with H_q : f ↦ Σ_{i=1}^m (q f^2)(r_i). The expressions in (3) allow us to explicitly compute a matrix representation of this quadratic map. (What is the other natural representation of this map?)
The following theorem then generalizes the results of the univariate case, and enables, among other things, root counting.

Theorem 2. The signature of the matrix H_q(I) is equal to the number of real points r_i in V(I) for which q(r_i) > 0, minus the number of real points for which q(r_i) < 0.
Corollary 3. Consider a zero-dimensional ideal I. The signature of the matrix H_1(I) is equal to the number of real roots, i.e., |V(I) ∩ R^n|.
In the general (non-radical) case, we would take the property (3) as the definition of H_q(I), instead of (2). Also, in Theorem 2, multiple real zeros are counted only once.
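As an illustration of Corollary 3 (a numpy sketch, reusing the ideal of Example 5 of Lecture 15): with q = 1 and the basis of standard monomials, H_1(I) = V^T V for the generalized Vandermonde matrix V of that example, and its signature counts the real roots:

    # Signature of H_1(I) for I = <xy - z, yz - x, zx - y>: all five roots are real.
    import numpy as np

    V = np.array([[1,0,0,0,0],[1,1,1,1,1],[1,1,1,-1,-1],[1,-1,1,1,-1],[1,-1,1,-1,1]], float)
    H1 = V.T @ V                    # [H_1]_{jk} = sum_i r_i^(alpha_j + alpha_k)
    eig = np.linalg.eigvalsh(H1)
    print(int((eig > 0).sum() - (eig < 0).sum()))   # signature 5: five real roots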
2 Parametric versions

One of the most appealing properties of Groebner-based eigenvalue methods is that they allow us to extend many of the results to the parametric case, i.e., when we are interested in obtaining all solutions of a polynomial system as a function of some additional parameters λ_i.
Consider for simplicity the case of a single parameter λ, and a polynomial system defined by p_i(x, λ) = 0. In order to solve this for any fixed λ, we need to compute a Groebner basis of the corresponding ideal. However, when λ changes, it is possible that the resulting set of polynomials is no longer a GB. A way of fixing this inconvenience is to compute instead a comprehensive Groebner basis, which is a set of polynomials with the property that it remains a Groebner basis of I for all possible specializations of the parameters. Using the corresponding monomials as a basis for the quotient space, we can give an eigenvalue characterization of the solutions for all values of λ.
3 SOS on quotients

For simplicity, we assume throughout that the ideal I is radical. We can interpret the previous result as essentially stating the fact that when a polynomial is nonnegative on a finite variety, then it is a sum of squares on the quotient ring; see [Par02].

Theorem 4. Let f(x) be nonnegative on {x ∈ R^n | h_i(x) = 0}. If the ideal I = ⟨h_1, ..., h_m⟩ is radical, then f(x) is a sum of squares in the quotient ring R[x]/I, i.e., there exist polynomials q_i, λ_i, such that

\[
f(x) = \sum_i q_i^2(x) + \sum_{i=1}^m \lambda_i(x)\, h_i(x).
\]
Remark 5. The assumption that I is radical (or a suitable local modification) is necessary when f(x) is nonnegative but not strictly positive. For instance, the polynomial f = x is nonnegative over the variety defined by the (non-radical) ideal ⟨x^2⟩, although no decomposition of the form x = s_0(x) + λ(x)x^2 (where s_0 is SOS) can possibly exist.
References
[Par02] P. A. Parrilo. An explicit construction of distinguished representations of polynomials nonnegative over finite sets. IfA Technical Report AUT02-02, ETH Zürich, 2002. Available from http://control.ee.ethz.ch/~parrilo.
MIT 6.256 - Algebraic techniques and semidefinite optimization April 21, 2010
Lecture 17
Lecturer: Pablo A. Parrilo Scribe: ???
One of our main goals in this course is to achieve a better understanding of the techniques available for polynomial systems over the real field. Today we discuss how to certify infeasibility of polynomial equations over the reals, and contrast these approaches with well-known results in linear algebra, linear programming, and complex algebraic geometry.
We will discuss the possible convergence of these schemes in the general case later in the course, concentrating today on an elementary proof of the finite convergence in the zero-dimensional case [Par02].
1 Infeasibility of real polynomial equations

Based on what we have learned in the past weeks, we have a quite satisfactory answer to the question of when a system of polynomial equations has solutions over the complex field. Indeed, as we have seen, given a system of polynomial equations {h_i(x) = 0, i = 1, ..., m}, we can form the associated ideal I = ⟨h_1, ..., h_m⟩. By the Nullstellensatz, the associated complex variety V(I) (i.e., the solution set {x ∈ C^n | h_i(x) = 0}) will be empty if and only if I = C[x], or equivalently, 1 ∈ I. Computationally, this condition can be checked by computing a reduced Groebner basis of I (with respect to any term ordering), which will be equal to {1} if this holds.
What happens, however, when we are interested in real solutions, and not just complex ones? Or if we have not only equations, but also inequalities? Consider, for instance, the basic semialgebraic set given by

S = {x ∈ R^n | f_i(x) ≥ 0, h_i(x) = 0}.   (1)

How to decide if the set S is empty? Can we give a Groebner-like criterion to demonstrate the infeasibility of this system of equations? Even worse, do we even know that this question can be decided algorithmically¹?
Fortunately for us, a famous result, the Tarski–Seidenberg theorem, guarantees the algorithmic solvability of this problem (in fact, of a much larger class of problems, which may include quantifiers). We will discuss this powerful approach in more detail later, when presenting cylindrical algebraic decomposition (CAD) techniques, concentrating instead on a more direct way of tackling the feasibility problem.
2 Certificates
Discuss certificates: NP/co-NP, Linear algebra, LP, Nullstellensatz, P-satz ToDo
3 The zero-dimensional case

What happens in the case where the equations in the system (1) define a zero-dimensional ideal? It should be intuitively obvious that, in some sense, such a finite certificate exists. Indeed, if we had access to all the roots, of which there are a finite number, just by evaluating the corresponding expressions we could decide feasibility or infeasibility. As we will see, we can actually encode this process in a set of polynomials that proves the existence of these certificates.
¹ There are certainly similar-looking problems that are not decidable. A famous one is the solvability of polynomial equations over the integers. This is Hilbert's 10th problem, solved in 1970 by Matiyasevich; see [Dav73] for a full account of the solution and historical remarks. This result implies, in particular, the nonexistence of an algorithm to solve integer quadratic programming; see [Jer73].
Theorem 1. Consider the set S in (1), and assume the ideal I = ⟨h_1,...,h_m⟩ is radical. Then, S is empty if and only if there exists a decomposition

    −1 = s_0(x) + Σ_i s_i(x) f_i(x) + Σ_i λ_i(x) h_i(x),

where the s_i are sums of squares.
Notice that we can equivalently write

    −1 ≡ s_0(x) + Σ_i s_i(x) f_i(x)   (mod I).
It should be clear that one direction of the implication is obvious (which one?).
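A trivial instance, checked symbolically (sympy assumed as a tool; the system is a hypothetical example): take S = {x ∈ R | f_1(x) = x ≥ 0, h_1(x) = x + 1 = 0}, with radical ideal I = ⟨x + 1⟩. The choices s_0 = 0, s_1 = 1, λ_1 = −1 give a certificate of emptiness:

    import sympy as sp

    x = sp.symbols('x')
    f1, h1 = x, x + 1
    s0, s1, lam = 0, 1, -1          # s0, s1 are (trivially) SOS
    # -1 = s0 + s1*f1 + lam*h1 certifies that S is empty.
    print(sp.expand(s0 + s1*f1 + lam*h1))  # -1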
4 Optimization
Since optimization can be interpreted as a parametrized family of feasibility problems, we can directly
apply these results towards optimization of polynomial or rational functions. For instance, we have the
following result:
Theorem 2. Let p(x) be nonnegative on S = {x ∈ R^n | f_i(x) ≥ 0, h_i(x) = 0}, and assume that the ideal I = ⟨h_1,...,h_m⟩ is radical. Consider the optimization problem

    max γ   s.t.   p(x) − γ = s_0(x) + Σ_i s_i(x) f_i(x) + Σ_i λ_i(x) h_i(x),

where the s_i are sums of squares, and the decision variables are γ and the coefficients of the polynomials s_i(x), λ_i(x). Then, the optimal value of γ is exactly equal to the minimum of p(x) over S.
Notice that this is exactly a sum of squares program, since all the constraints are linear and/or sum
of squares constraints.
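As a minimal sketch of how such a program is solved in practice, consider the unconstrained special case (no f_i or h_i): minimize p(x) = x⁴ − 2x² + 1 over R by maximizing γ subject to p − γ being SOS. With the monomial vector z = [1, x, x²], the SOS constraint becomes a semidefinite condition on a Gram matrix Q with p(x) − γ = z^T Q z. The sketch below uses cvxpy, an assumed tool not prescribed by the notes:

    import cvxpy as cp

    gamma = cp.Variable()
    Q = cp.Variable((3, 3), symmetric=True)   # Gram matrix for z = [1, x, x^2]
    constraints = [
        Q >> 0,                               # Q positive semidefinite
        Q[0, 0] == 1 - gamma,                 # constant term
        2 * Q[0, 1] == 0,                     # coefficient of x
        Q[1, 1] + 2 * Q[0, 2] == -2,          # coefficient of x^2
        2 * Q[1, 2] == 0,                     # coefficient of x^3
        Q[2, 2] == 1,                         # coefficient of x^4
    ]
    cp.Problem(cp.Maximize(gamma), constraints).solve()
    print(gamma.value)  # ~0, the global minimum of p(x) = (x^2 - 1)^2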
Remark 3. The assumption that I is radical (or a suitable local modification) is necessary when p(x) is nonnegative but not strictly positive. For instance, the polynomial p = x is nonnegative over the variety defined by the (non-radical) ideal ⟨x²⟩, although no decomposition of the form x = s_0(x) + λ(x)x² (where s_0 is SOS) can possibly exist.
More details will follow...
References
[Dav73] M. Davis. Hilbert's tenth problem is unsolvable. Amer. Math. Monthly, 80:233-269, 1973.
[Jer73] R. G. Jeroslow. There cannot be any algorithm for integer programming with quadratic constraints. Operations Res., 21:221-224, 1973.
[Par02] P. A. Parrilo. An explicit construction of distinguished representations of polynomials nonnegative over finite sets. Technical Report AUT02-02, IfA, ETH Zürich, 2002. Available from http://control.ee.ethz.ch/~parrilo.
MIT 6.256 - Algebraic techniques and semidefinite optimization April 23, 2010
Lecture 18
Lecturer: Pablo A. Parrilo Scribe: ???
Quantifier elimination (QE) is a very powerful procedure for problems involving first-order formulas over real fields. The cylindrical algebraic decomposition (CAD) is a technique for the efficient implementation of QE, that effectively reduces a seemingly infinite problem to a finite (but potentially large) instance. For much more information about QE and CAD (including a reprint of Tarski's original 1930 work), we recommend the book [CJ98].
1 Quantifier elimination
A quantifier-free formula is an expression consisting of polynomial equations (f(x) = 0) and inequalities (f(x) ≥ 0), combined using the Boolean operators ∧ (and), ∨ (or), and ⇒ (implies). We often also allow strict inequalities f(x) > 0 and inequations f(x) ≠ 0, since these are just shorthands for particular Boolean combinations of equations and inequalities.
In general, a formula (in prenex form) is an expression in the variables x = (x_1,...,x_n) of the following type:

    (Q_1 x_1) ... (Q_s x_s) F(f_1(x),...,f_r(x))    (1)

where Q_i is one of the quantifiers ∀ (for all) and ∃ (there exists). Furthermore, F(f_1(x),...,f_r(x)) is assumed to be a quantifier-free formula. If there is a quantifier corresponding to the variable x_i, we say that x_i is quantified; otherwise it is free.
Example 1. The following are valid formulas:

    (∃x) [(x ≥ 0) ∧ (x² + ax + b ≤ 0)]
    (∀x)(∃y) [x > y²]
    (∀α)(∀β) [((α² + β² ≤ 1) ∧ (β ≠ 0)) ⇒ (α < 1)].

The first formula has two free variables (since the variables a and b are unquantified), while for the other two all variables are quantified.
We will interpret the symbols in a formula as taking only real values. Notice that a formula without free variables (usually called a closed formula, or a sentence) is either true or false. For instance, the last two expressions in Example 1 are sentences, the first one being false and the second one being true. Notice also that the truth value may depend on the order of the quantifiers.
Tarski showed that for every formula including quantifiers there is always an equivalent quantifier-free formula. Obtaining the latter from the former is called quantifier elimination.
Theorem 2 (Tarski-Seidenberg). For every first-order formula over the real field there exists an equivalent quantifier-free formula. Furthermore, there is an explicit algorithm to compute this quantifier-free formula.
The Tarski-Seidenberg theorem is an extremely powerful result, since it provides a complete characterization and algorithmic technique for an extremely large collection of problems involving polynomials. Unfortunately, there are very serious computational barriers to the efficient practical implementation of these ideas, since the resulting algorithms have extremely poor scaling properties with respect to the number of variables (towers of exponentials). Newer methods, such as the (partial) cylindrical algebraic decomposition (CAD) technique due to Collins and described below, or the critical point method, are by comparison much better. Nevertheless, by necessity they still behave exponentially (or worse) in terms of the number of variables.
2 Tarski-Seidenberg
Example 3. Consider the quantified first-order formula:

    (∀x)(∀y) [(x² + ay² ≤ 1) ⇒ (ax² − a²xy + 2 ≥ 0)].    (2)

This formula is equivalent to the quantifier-free expression:

    (a ≥ 0) ∧ (a³ − 8a − 16 ≤ 0),

which defines the interval [0, a*], where a* ≈ 3.54 is the unique real root of a³ − 8a − 16 = 0.
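This equivalence can be spot-checked numerically by dense sampling (numpy assumed; a crude stand-in for actual quantifier elimination, with arbitrary grid ranges, so it is evidence rather than proof):

    import numpy as np

    def formula_holds(a, grid=400):
        """Sample-based check of: for all x, y,
        (x^2 + a*y^2 <= 1)  implies  (a*x^2 - a^2*x*y + 2 >= 0)."""
        xs = np.linspace(-3, 3, grid)
        X, Y = np.meshgrid(xs, xs)
        m = X**2 + a*Y**2 <= 1
        return bool(np.all(a*X[m]**2 - a**2*X[m]*Y[m] + 2 >= 0))

    for a in [0.5, 2.0, 3.0, 4.0]:
        qf = (a >= 0) and (a**3 - 8*a - 16 <= 0)
        print(a, formula_holds(a), qf)  # the two columns agree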
2.1 Geometric interpretation
A geometric interpretation of the Tarski-Seidenberg theorem is the following:
Theorem 4. The projection of a semialgebraic set is semialgebraic.
2.2 Applications
Add many more ToDo
Static output feedback An early application of Tarski-Seidenberg in control theory was the solution of the static output feedback stabilization problem in [ABJ75]. Given matrices A ∈ R^{n×n}, B ∈ R^{n×m}, we want to find a matrix K ∈ R^{m×n} such that the matrix A + BK is Hurwitz, i.e., all its eigenvalues are in the left half-plane. Since the existence of such a matrix can easily be expressed as a formula in first-order logic¹, decidability and the existence of an effective (but not efficient) algorithm immediately follow.
Simultaneous stabilization A very interesting result by Blondel [Blo94, BG93] shows that the simultaneous stabilization of three linear time-invariant systems is not decidable (and thus, cannot be semialgebraic). Notice however that, for any given bound on the degree of the controller, the problem is decidable.
3 Cylindrical Algebraic Decomposition (CAD)
There are a few approaches for effective implementation of the QE procedure. One of the most well-known, which is also relatively easy to understand, is the cylindrical algebraic decomposition (CAD) due to Collins [Col75]. We describe the elements of this approach below. We remark that much better algorithms (in the theoretical complexity sense) are known; see for instance the article by Renegar [Ren91] (also reprinted in [CJ98]) or [BPR03]. In particular, for CAD the number of operations usually scales in a doubly exponential fashion with the number of variables, while the newer methods are doubly exponential in the number of quantifier alternations.
¹ For instance, (∃K)(∀x)(∀λ) [((A + BK)x = λx ∧ x ≠ 0) ⇒ Re(λ) < 0]. Notice that we are being a bit sloppy with notation, since for a fully real formulation we should split x and λ into real and imaginary parts. There are many other equivalent expressions, using for instance a Lyapunov equation, or the Routh array.
3.0.1 Description
Given a set P of multivariate polynomials in n variables, a CAD is a special partition of R^n into components, called cells, over which all the polynomials have constant signs. The algorithm for computing a CAD also provides a point in each cell, called the sample point, which can be used to determine the sign of the polynomials in the cell.
A cell is called cylindrical if it has the form S × R^k, for some k ≤ n. A decomposition of R^n is a CAD if all polynomials have constant sign on each cell, and all cells are cylindrical.
The CAD associated to the formula (1) depends only on its quantifier-free part F(f_1(x),...,f_r(x)). Since all possible truth values of the formula are in correspondence with the values at the sample points, we can use the CAD to evaluate its truth value, and to perform quantifier elimination.
The basic CAD construction consists of two steps: projection and lifting (plus an additional third one, if formula construction is desired).
In the first (projection) phase, we compute successive sets of polynomials in n−1, n−2, ..., 1 variables. The main idea is, given an input set of polynomials, to compute at each step a new set of polynomials obtained by eliminating one variable at a time. In general, the elimination order does matter, and a good choice leads to lower computational complexity.
The second phase (lifting) constructs a decomposition of R at the lowest level of projection, after all but one variable have been eliminated. This decomposition of R is successively extended to a decomposition of R^n.
The basic operations necessary in the construction of CADs are (sub)resultants and (sub)discriminants.
Complete ToDo
An implementation of (an improved version of) the CAD method for quantifier elimination is the software package QEPCAD [Bro03].
References
[ABJ75] B. D. O. Anderson, N. K. Bose, and E. I. Jury. Output feedback stabilization and related problems: solution via decision methods. IEEE Transactions on Automatic Control, 20:53-66, 1975.
[BG93] V. Blondel and M. Gevers. Simultaneous stabilizability of three linear systems is rationally undecidable. Mathematics of Control, Signals, and Systems, 6(2):135-145, 1993.
[Blo94] V. Blondel. Simultaneous stabilization of linear systems, volume 191 of Lecture Notes in Control and Information Sciences. Springer-Verlag London Ltd., London, 1994.
[BPR03] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in real algebraic geometry, volume 10 of Algorithms and Computation in Mathematics. Springer-Verlag, Berlin, 2003.
[Bro03] C. W. Brown. QEPCAD - Quantifier Elimination by Partial Cylindrical Algebraic Decomposition, 2003. Available from http://www.cs.usna.edu/~qepcad/B/QEPCAD.html.
[CJ98] B. F. Caviness and J. R. Johnson, editors. Quantifier elimination and cylindrical algebraic decomposition, Texts and Monographs in Symbolic Computation, Vienna, 1998. Springer-Verlag.
[Col75] G. E. Collins. Quantifier elimination for real closed fields by cylindrical algebraic decomposition. In Automata theory and formal languages (Second GI Conf., Kaiserslautern, 1975), pages 134-183. Lecture Notes in Comput. Sci., Vol. 33. Springer, Berlin, 1975.
[Ren91] J. Renegar. Recent progress on the complexity of the decision problem for the reals. In Discrete and computational geometry (New Brunswick, NJ, 1989/1990), volume 6 of DIMACS Ser. Discrete Math. Theoret. Comput. Sci., pages 287-308. Amer. Math. Soc., Providence, RI, 1991.
MIT 6.256 - Algebraic techniques and semidefinite optimization April 30, 2010
Lecture 19
Lecturer: Pablo A. Parrilo Scribe: ???
Today we continue with some additional aspects of quantifier elimination. We will then recall the Positivstellensatz and its relations with semidefinite programming. After introducing copositive matrices, we present Pólya's theorem on positive forms on the simplex, and the associated relaxations. Finally, we conclude with an important result due to Schmüdgen about the representation of positive polynomials on compact sets.
1 Certificates
Talk about certificates in QE ToDo
2 Psatz revisited
Recall the statement of the Positivstellensatz.

Theorem 1 (Positivstellensatz). Consider the set S = {x ∈ R^n | f_i(x) ≥ 0, h_i(x) = 0}. Then,

    S = ∅  ⟺  ∃ f, h ∈ R[x] s.t.  f + h = −1,  f ∈ cone{f_1,...,f_s},  h ∈ ideal{h_1,...,h_t}.
Once again, since the conditions on the polynomials f, h are convex and affine, respectively, by restricting their degree to be less than or equal to a given bound d we obtain a finite-dimensional semidefinite programming problem.
2.1 Hilbert's 17th problem
As we have seen, in the general case nonnegative multivariate polynomials can fail to be sums of squares (the Motzkin polynomial being the classical counterexample). As part of his famous list of twenty-three problems that he presented at the International Congress of Mathematicians in 1900, David Hilbert asked the following¹:
17. Expression of definite forms by squares. A rational integral function or form in any number of variables with real coefficients such that it becomes negative for no real values of these variables, is said to be definite. The system of all definite forms is invariant with respect to the operations of addition and multiplication, but the quotient of two definite forms in case it should be an integral function of the variables is also a definite form. The square of any form is evidently always a definite form. But since, as I have shown, not every definite form can be compounded by addition from squares of forms, the question arises which I have answered affirmatively for ternary forms whether every definite form may not be expressed as a quotient of sums of squares of forms. At the same time it is desirable, for certain questions as to the possibility of certain geometrical constructions, to know whether the coefficients of the forms to be used in the expression may always be taken from the realm of rationality given by the coefficients of the form represented.
¹ This text was obtained from http://aleph0.clarku.edu/~djoyce/hilbert/, and corresponds to Newson's translation of Hilbert's original German address. On that website you will also find links to the current status of the problems, as well as the original German text.
In other words, can we write every nonnegative polynomial as a sum of squares of rational functions? As we show next, this is a rather direct consequence of the Psatz. Of course, it should be clear (and goes without saying) that we are (badly) inverting the historical order! In fact, much of the motivation for the development of real algebra came from Hilbert's question.
How can we use the Psatz to prove that a polynomial p(x) is nonnegative? Clearly, p is nonnegative if and only if the set {x ∈ R^n | p(x) < 0} is empty. Since our version of the Psatz does not allow for strict inequalities (there are slightly more general, though equivalent, formulations that do), we will need a useful trick discussed earlier (Rabinowitsch's trick). Introducing a new variable z, the nonnegativity of p(x) is equivalent to the emptiness of the set described by

    p(x) ≤ 0,   1 − z p(x) = 0.

The Psatz can be used to show that this holds if and only if there exist polynomials s_0, s_1, t ∈ R[x, z] such that

    s_0(x, z) − s_1(x, z) p + t(x, z) (1 − z p) = −1,

where s_0, s_1 are sums of squares. Replace now z → 1/p(x), and multiply by p^{2k} (where k is sufficiently large) to obtain

    s̃_0 − s̃_1 p = −p^{2k},

where s̃_0, s̃_1 are sums of squares in R[x]. Solving now for p, we have:

    p(x) = (s̃_0(x) + p(x)^{2k}) / s̃_1(x) = s̃_1(x)(s̃_0(x) + p(x)^{2k}) / s̃_1(x)²,

and since the numerator is a sum of squares, it follows that p(x) is indeed a sum of squares of rational functions.
3 Copositive matrices and Pólya's theorem
An interesting class of matrices are the so-called copositive matrices, which are those for which the
associated quadratic form is nonnegative on the nonnegative orthant.
Definition 2. A matrix M ∈ S^n is copositive if it satisfies

    x^T M x ≥ 0,   for all x_i ≥ 0.
As opposed to positive semidefiniteness, which can be checked in polynomial time, the recognition problem for copositive matrices is NP-hard. The set of copositive matrices is a closed convex cone, for which checking membership is a difficult problem. Its dual cone is the set of completely positive matrices:
Definition 3. A matrix W ∈ S^n is completely positive if it is the sum of outer products of nonnegative vectors, i.e.,

    W = Σ_{i=1}^m x_i x_i^T,   x_i ≥ 0.

Alternatively, it factors as W = F F^T, where F is a nonnegative matrix (i.e., F = [x_1,...,x_m] ∈ R^{n×m}).
A good reference on completely positive matrices is [BSM03].
Applications There are many interesting applications of copositive and completely positive matrices.
Among others, we mention:
Consider a graph G, with A its adjacency matrix. The stability number α(G) of the graph G is equal to the cardinality of its largest stable set. By a result of Motzkin and Straus, it is known that it can be obtained as:

    1/α(G) = min_{x_i ≥ 0, Σ_i x_i = 1} x^T (I + A) x.

This implies that α(G) ≤ γ if and only if the matrix γ(I + A) − ee^T is copositive (a small numerical check appears after this list).
In the analysis of linear dynamical systems with piecewise affine dynamics, it is often convenient to use piecewise-quadratic Lyapunov functions. In this case, we need to verify positivity conditions of an indefinite quadratic on a polyhedron. To make this precise, consider an affine dynamical system ẋ = Ax + b, a polyhedron S, and a Lyapunov function V(x) defined by:

    S := { x ∈ R^n | L [x; 1] ≥ 0 },   V(x) = [x; 1]^T P [x; 1].

Then, nonnegativity of V on S and its decrease along trajectories can be written as

    P ⪰ L^T C_1 L,   −( [A b; 0 0]^T P + P [A b; 0 0] ) ⪰ L^T C_2 L,

with C_1, C_2 copositive.
Another interesting application of copositive matrices is in the performance analysis of queueing networks; see e.g. [KM96]. Modulo some (important) details, the basic idea is to use a quadratic function x^T M x as a Lyapunov function, where the matrix M is copositive and x represents the lengths of the queues.
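The Motzkin-Straus identity in the first item above is easy to probe numerically. A small sketch (numpy assumed; random simplex sampling only approximates the minimum):

    import numpy as np

    # 5-cycle C5: alpha(C5) = 2, so min over the simplex of x^T (I+A) x is 1/2.
    n = 5
    A = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
    M = np.eye(n) + A
    samples = np.random.default_rng(0).dirichlet(np.ones(n), size=200000)
    vals = np.einsum('ij,jk,ik->i', samples, M, samples)
    print(vals.min())  # slightly above 1/2 = 1/alpha(C5)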
An important related result is Pólya's theorem on positive forms on the simplex:

Theorem 4 (Pólya). Consider a homogeneous polynomial p(x) in n variables of degree d, that is strictly positive on the unit simplex Δ_n := {x ∈ R^n | x_i ≥ 0, Σ_{i=1}^n x_i = 1}. Then, for large enough k, the polynomial (x_1 + ··· + x_n)^k p(x) has nonnegative coefficients.
A natural sufficient condition for a matrix M to be copositive is that it can be expressed as the sum of a positive semidefinite matrix and a nonnegative matrix, i.e.,

    M = P + N,   P ⪰ 0,   N_ij ≥ 0.

It is clear that this condition can be checked via SDP. In fact, it exactly corresponds to the condition that the polynomial p(z_1,...,z_n) := w^T M w be SOS, where w := [z_1²,...,z_n²]^T.
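A sketch of this SDP test (cvxpy assumed; the Horn matrix below is a classical example of a copositive matrix that is not of the form P + N, so the sufficient condition is expected to fail for it):

    import cvxpy as cp
    import numpy as np

    def psd_plus_nonneg(M):
        # Feasibility of M = P + N with P PSD and N entrywise nonnegative.
        n = M.shape[0]
        P = cp.Variable((n, n), symmetric=True)
        N = cp.Variable((n, n), symmetric=True)
        prob = cp.Problem(cp.Minimize(0), [P + N == M, P >> 0, N >= 0])
        prob.solve()
        return prob.status == cp.OPTIMAL

    H = np.array([[ 1, -1,  1,  1, -1],
                  [-1,  1, -1,  1,  1],
                  [ 1, -1,  1, -1,  1],
                  [ 1,  1, -1,  1, -1],
                  [-1,  1,  1, -1,  1]])   # Horn matrix
    print(psd_plus_nonneg(H))  # expected: False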
It is possible to provide a natural hierarchy of sufficient conditions for a matrix to be copositive. Completeness of this hierarchy follows directly from Pólya's theorem [Par00].
There are some very interesting connections between Pólya's result and a foundational theorem in probability known as De Finetti's exchangeability theorem.
Expand... ToDo
4 Positive polynomials
The Positivstellensatz allows us to obtain certificates of the emptiness of a basic semialgebraic set explicitly given by polynomials.
What if we want to apply this to optimization? As we have seen, it is relatively straightforward to convert an optimization problem into a family of feasibility problems, by considering the sublevel sets, i.e., the sets {x ∈ R^n | f(x) ≤ γ}.
In the general case of constrained problems, however, using the Psatz will require conditions that are not linear in the unknown parameter γ (because we need products between the constraints), and this presents a difficulty for the direct use of SDP. Notice nevertheless that the problem is certainly an SDP for any fixed value of γ, and is thus quasiconvex (which is almost as good, except for the fact that we cannot use standard SDP solvers to solve it directly, but must rather rely on methods such as bisection).
Theorem 5 ([Sch91]). If p(x) is strictly positive on K = {x ∈ R^n | f_i(x) ≥ 0}, and K is compact, then p(x) ∈ cone{f_1,...,f_s}.
In the next lecture we will describe the basic elements of Schmüdgen's proof. His approach combines algebraic tools (using the Positivstellensatz to prove the boundedness of certain operators) and functional analysis (spectral measures of commuting families of operators and the Hahn-Banach theorem). We will also describe some alternative versions due to Putinar, as well as a related purely functional-analytic result due to Megretski.
For a comprehensive treatment and additional references, we mention [BCR98, Mar00, PD01] among others.
References
[BCR98] J. Bochnak, M. Coste, and M.-F. Roy. Real Algebraic Geometry. Springer, 1998.
[BSM03] A. Berman and N. Shaked-Monderer. Completely positive matrices. World Scientific, 2003.
[KM96] P. R. Kumar and S. P. Meyn. Duality and linear programs for stability and performance analysis of queuing networks and scheduling policies. IEEE Trans. Automat. Control, 41(1):4-17, 1996.
[Mar00] M. Marshall. Positive polynomials and sums of squares. Dottorato de Ricerca in Matematica. Dept. di Mat., Univ. Pisa, 2000.
[MJ81] D. H. Martin and D. H. Jacobson. Copositive matrices and definiteness of quadratic forms subject to homogeneous linear inequality constraints. Linear Algebra and its Applications, 35:227-258, 1981.
[Par00] P. A. Parrilo. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. PhD thesis, California Institute of Technology, May 2000. Available at http://resolver.caltech.edu/CaltechETD:etd-05062004-055516.
[PD01] A. Prestel and C. N. Delzell. Positive polynomials: from Hilbert's 17th problem to real algebra. Springer Monographs in Mathematics. Springer, 2001.
[Sch91] K. Schmüdgen. The K-moment problem for compact semialgebraic sets. Math. Ann., 289:203-206, 1991.
MIT 6.256 - Algebraic techniques and semidefinite optimization May 5, 2010
Lecture 20
Lecturer: Pablo A. Parrilo Scribe: ???
In this lecture we introduce Schmüdgen's theorem about the K-moment problem (or equivalently, on the representation of positive polynomials) and describe the basic elements of his proof. This approach combines algebraic tools (using the Positivstellensatz to prove the boundedness of certain operators) and functional analysis (spectral measures of commuting families of operators and the Hahn-Banach theorem). We will also describe some alternative versions due to Putinar, as well as a related purely functional-analytic result due to Megretski.
For a comprehensive treatment and additional references, we mention [BCR98, Mar00, PD01] among others.
1 Positive polynomials
As we have seen, the Positivstellensatz allows us to obtain certificates of the emptiness of a basic semialgebraic set explicitly given by polynomials. When looking for bounded-degree certificates, this provides a natural hierarchy of SDP-based conditions [Par00, Par03].
What if we want to apply this to the particular case of optimization? As we have seen, it is relatively straightforward to convert a polynomial optimization problem into a one-parameter family of feasibility problems, by considering the sublevel sets, i.e., the sets {x ∈ R^n | f(x) ≤ γ}.
In the general case of constrained problems, however, using the full power of the Psatz will yield conditions that are not linear in the unknown parameter γ (because we need products between the constraints and the objective function), and in principle this presents a difficulty for the direct use of SDP. Notice nevertheless that the problem is certainly an SDP for any fixed value of γ, and is thus quasiconvex (which is almost as good, except for the fact that we cannot use standard SDP solvers to solve it directly, but must rather rely on methods such as bisection).
Of course, we can always produce specific families of certificates that are linear in γ, and use them for optimization (e.g., as we did in the copositivity case). However, in general it is unclear whether the desired family is complete, in the sense that it can prove arbitrarily good bounds on the optimal value as the degree of the polynomials grows to infinity.
2 Schmüdgen's theorem
In 1991, Schmüdgen presented a characterization of the moment sequences of measures supported on a compact semialgebraic set K (the K-moment problem). As in the one-dimensional case we studied earlier, the question is, given an (infinite) sequence of moments, to decide whether it actually corresponds to a nonnegative measure with support on a given set K.
His solution combined real algebraic methods (the Psatz) with some functional analytic tools (reproducing kernel Hilbert spaces, bounded operators, and the spectral theorem).
This characterization of moment sequences can be used, in turn, to produce an explicit description of the set of strictly positive polynomials on a compact semialgebraic set:
Theorem 1 ([Sch91]). If p(x) is strictly positive on K = {x ∈ R^n | f_i(x) ≥ 0}, and K is compact, then p(x) ∈ cone{f_1,...,f_m}.
expand ToDo
There are several interesting ideas in the proof; a coarse description follows. The first step is to use the Positivstellensatz to produce an algebraic certificate of the compactness of the set K. Then the given moment sequence (which is a positive definite function on the semigroup of monomials) is used to construct a particular pre-Hilbert space and its completion (namely, the associated reproducing kernel Hilbert space). In this Hilbert space, we consider the linear operators T_{x_i} given by multiplication by the coordinate variables, and use the algebraic certificate of compactness to prove that these are bounded. Now, the T_{x_i} are a finite collection of pairwise commuting, bounded, self-adjoint operators, and thus there exists a spectral measure for the family, from which a measure, supported only on K, can be extracted. Finally, a Hahn-Banach (separating hyperplane) argument is used to prove the final result.
2.1 Putinar's approach
The theorem in the previous section requires (in principle) all 2^m − 1 squarefree products of the constraints. (In practice, this may not be an issue at all, since the restriction on the degree of the certificates imposes a strict limit on how many products can be included.) Putinar [Put93] presented a modified formulation (under stronger assumptions) for which the representation is linear in the constraints. We introduce the following concept:
Definition 2. Let {f_1,...,f_m} ⊂ R[x]. The preprime generated by the f_i, denoted preprime{f_1,...,f_m}, is the set of all polynomials of the form s_0 + s_1 f_1 + ··· + s_m f_m, where all the s_i are sums of squares.
Notice that preprime{f_i} ⊆ cone{f_i}, and that every element in the preprime takes only nonnegative values on {x ∈ R^n | f_i(x) ≥ 0}.
Theorem 3 ([Put93]). Consider a set K = {x ∈ R^n | f_i(x) ≥ 0}, and assume there exists a q ∈ preprime{f_1,...,f_m} such that {x ∈ R^n | q(x) ≥ 0} is compact (this implies that K is compact). Then, p(x) > 0 on K if and only if p(x) ∈ preprime{f_1,...,f_m}.
Notice that here the polynomial q serves as an algebraic certificate of the compactness of K, so in this case the Psatz is not needed.
Putinar's theorem was used by Lasserre to present a hierarchy of semidefinite relaxations for polynomial optimization, based on the dual moment interpretation [Las01].
2.2 Tradeoffs
In principle (and often, in practice) there is a tradeoff between how expressive our family of certificates is, the quality of the resulting bounds, and the complexity of finding proofs.
On one extreme, the most general method is the Psatz, as it encapsulates pretty much every possible algebraic deduction, and will certainly provide the strongest bounds, since it includes the other techniques as special cases. For optimization, Schmüdgen's theorem provides the advantages of a linear representation, although (possibly) at the cost of having a large number of products between the constraints. Finally, the Putinar approach has a reduced number of constraints (and thus, SOS multipliers), although the obtained bounds can potentially be much weaker than the previous ones.
In the end, the decision concerning which approach to use should be dictated by the available computational resources, i.e., the size of the SDPs that we can solve in a reasonable time. It is not difficult to produce examples with significant gaps between the corresponding bounds; see for instance [Ste96] for a particularly simple example that is trivial for the Psatz, but for which either the Schmüdgen or Putinar representations need large-degree refutations.
Example 4. In [Ste96], Stengle presented an interesting example to assess the computational requirements of Schmüdgen's theorem. His concrete example was to find a representation certifying the nonnegativity of f(x) := 1 − x² over the set defined by g(x) := (1 − x²)³ ≥ 0.
The Positivstellensatz gives a very simple certificate of this property, or equivalently, of the emptiness of the set {g(x) ≥ 0, f(x) ≤ 0, z f(x) − 1 = 0} (where we have used, as before, Rabinowitsch's trick). Indeed, we have the identity:

    z⁴ (−f) g + (z f − 1)(z³f³ + z²f² + z f + 1) = −1.
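The identity can be verified directly by expansion; a one-line symbolic check (sympy assumed as a tool):

    import sympy as sp

    x, z = sp.symbols('x z')
    f = 1 - x**2
    g = f**3
    cert = z**4 * (-f) * g + (z*f - 1) * (z**3*f**3 + z**2*f**2 + z*f + 1)
    print(sp.expand(cert))  # -1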
Using a simple argument, Stengle proved in [Ste96] that no representation of the form

    (1 − x²) + ε = Q(x) + P(x)(1 − x²)³,    (1)

where Q(x), P(x) are sums of squares, exists when ε = 0.
Furthermore, he showed that as ε → 0, the degrees of P, Q satisfying the identity necessarily have to go to infinity, and provided the bounds O(ε^{−1/2}) ≤ deg(P) ≤ O(ε^{−1/2} log(1/ε)).
As an interesting aside, it can be shown that the optimal solution of this problem can be exactly computed:

Theorem 5. Let the degree of P(x) be equal to 4N. Then, the optimal solution that minimizes ε in (1) has:

    ε_N = 1/((2N + 2)² − 1),   P(x) = p(x)²,   Q(x) = q(x)²,

where

    p(x) = 2(N + 1) · ₂F₁(−N, N + 2; 1/2; x²),
    q(x) = (1/N) · x · ₂F₁(−N − 1, N + 1; 3/2; x²),

and ₂F₁(a, b; c; x) is the standard Gauss hypergeometric function [AS64, Chapter 15].
2.3 Trigonometric case
Recently, Megretski [Meg03] analyzed the trigonometric case. We introduce the following notation: let T^n = {z ∈ C^n : |z_i| = 1} be the n-dimensional torus, P_n the set of multivariate Laurent polynomials, and RP_n ⊂ P_n the Laurent polynomials that are real-valued on T^n.
Theorem 6 ([Meg03]). Let {F, Q_1,...,Q_m} ⊂ RP_n, such that F(z) > 0 for all z ∈ T^n satisfying Q_1(z) = ··· = Q_m(z) = 0. Then there exist V_1,...,V_r ∈ P_n, H_1,...,H_m ∈ RP_n, such that

    F(z) = Σ_{i=1}^r |V_i(z)|² + Σ_{j=1}^m H_j(z) Q_j(z).
Notice that, by splitting into real and imaginary parts, this corresponds to a special kind of (standard) polynomials, and a compact semialgebraic set (so in principle, any of the previous theorems would apply). Of course, the result exploits the complex structure for a more concise representation.
In particular, Megretski's proof is purely functional-analytic, the main tools being Bochner's theorem and Hahn-Banach. Bochner's theorem is an important result in harmonic analysis, that characterizes a positive definite function on an Abelian group in terms of the nonnegativity of its Fourier transform.
Notice that the theorem above deals only with the equality case (no inequalities), and the feasible set is compact (since so is T^n). It essentially states that a positive polynomial is a sum of squares modulo the ideal generated by the Q_j. Recall that we have proved similar results in the zero-dimensional case, and this theorem naturally generalizes those.
In simplified terms, one reason why trigonometric (or Laurent) polynomials are somewhat easier than the general case is that here there is a group structure, as opposed to the semigroup structure of regular monomials. For the group case, the corresponding theory is classical harmonic analysis on abelian groups (e.g., [Rud90]); for semigroups there is the newer, but well-developed, theory of positive definite functions on (Abelian) semigroups; see for instance [BCR84].
We also mention that there are purely algebraic versions of these theorems, that do not use functional analytic ideas (e.g., [Mar00]). Roughly, the role played by the compactness of K in proving the boundedness of the operators T_{x_i} is replaced with a property called Archimedeanity of the corresponding preorder.
References
[AS64] M. Abramowitz and I. A. Stegun, editors. Handbook of Mathematical Functions. Dover, 1964.
[BCR84] C. Berg, J. P. R. Christensen, and P. Ressel. Harmonic analysis on semigroups, volume 100 of Graduate Texts in Mathematics. Springer-Verlag, New York, 1984.
[BCR98] J. Bochnak, M. Coste, and M.-F. Roy. Real Algebraic Geometry. Springer, 1998.
[Las01] J. B. Lasserre. Global optimization with polynomials and the problem of moments. SIAM J. Optim., 11(3):796-817, 2001.
[Mar00] M. Marshall. Positive polynomials and sums of squares. Dottorato de Ricerca in Matematica. Dept. di Mat., Univ. Pisa, 2000.
[Meg03] A. Megretski. Positivity of trigonometric polynomials. In Proceedings of the 42nd IEEE Conference on Decision and Control, pages 3814-3817, 2003.
[Par00] P. A. Parrilo. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization. PhD thesis, California Institute of Technology, May 2000. Available at http://resolver.caltech.edu/CaltechETD:etd-05062004-055516.
[Par03] P. A. Parrilo. Semidefinite programming relaxations for semialgebraic problems. Math. Prog., 96(2, Ser. B):293-320, 2003.
[PD01] A. Prestel and C. N. Delzell. Positive polynomials: from Hilbert's 17th problem to real algebra. Springer Monographs in Mathematics. Springer, 2001.
[Put93] M. Putinar. Positive polynomials on compact semi-algebraic sets. Indiana Univ. Math. J., 42(3):969-984, 1993.
[Rud90] W. Rudin. Fourier analysis on groups. Wiley Classics Library. John Wiley & Sons Inc., New York, 1990.
[Sch91] K. Schmüdgen. The K-moment problem for compact semialgebraic sets. Math. Ann., 289:203-206, 1991.
[Ste96] G. Stengle. Complexity estimates for the Schmüdgen Positivstellensatz. J. Complexity, 12(2):167-174, 1996.
MIT 6.256 - Algebraic techniques and semidefinite optimization May 7, 2010
Lecture 21
Lecturer: Pablo A. Parrilo Scribe: ???
In this lecture we study techniques to exploit the symmetry that can be present in semidefinite programming problems, particularly those arising from sum of squares decompositions [GP04]. For this, we present the basic elements of the representation theory of finite groups. There are many possible applications of these ideas in different fields; for the case of Markov chains, see [BDPX05]. The celebrated Delsarte linear programming upper bound for codes (and generalizations by Levenshtein, McEliece, etc. [DL98]) can be understood as a natural symmetry reduction of the SDP relaxations based on the Lovász theta function; see e.g. [Sch79].
1 Groups and their representations
The representation theory of finite groups is a classical topic; good descriptions are given in [FS92, Ser77]. We concentrate here on the finite case; extensions to compact groups are relatively straightforward.

Definition 1. A group consists of a set G and a binary operation "·" defined on G, for which the following conditions are satisfied:

1. Associativity: (a · b) · c = a · (b · c), for all a, b, c ∈ G.
2. Identity: There exists 1 ∈ G such that a · 1 = 1 · a = a, for all a ∈ G.
3. Inverse: Given a ∈ G, there exists b ∈ G such that a · b = b · a = 1.
We consider a finite group G, and an n-dimensional vector space V. We define the associated (infinite) group GL(V), which we can interpret as the set of invertible n × n matrices. A linear representation of the group G is a homomorphism ρ: G → GL(V). In other words, we have a mapping from the group into linear transformations of V, that respects the group structure, i.e.,

    ρ(st) = ρ(s)ρ(t)   ∀ s, t ∈ G.

Example 2. Let ρ(g) = 1 for all g ∈ G. This is the trivial representation of the group.
Example 3. For a more interesting example, consider the symmetric group S_n, and the natural representation ρ: S_n → GL(C^n), where ρ(g) is a permutation matrix. For instance, for the group of permutations of two elements, S_2 = {e, g}, where g² = e, we have

    ρ(e) = [1 0; 0 1],   ρ(g) = [0 1; 1 0].
The representation given in Example 3 has an interesting property. The matrices {ρ(e), ρ(g)} have common invariant subspaces (other than the trivial ones, namely (0, 0) and C²). Indeed, we can easily verify that the (orthogonal) one-dimensional subspaces given by (t, t) and (t, −t) are invariant under the action of these matrices. Therefore, the restriction of ρ to those subspaces also gives representations of the group G. In this case, the one corresponding to the subspace (t, t) is equivalent (in a well-defined sense) to the trivial representation described in Example 2. The other subspace (t, −t) gives the one-dimensional alternating representation of S_2, namely ρ_A(e) = 1, ρ_A(g) = −1. Thus, the representation decomposes as ρ = ρ_T ⊕ ρ_A, a direct sum of the trivial and the alternating representations.
The same ideas extend to arbitrary finite groups.
Denition 4. An irreducible representation of a group is a linear representation with no nontrivial
invariant subspaces.
Figure 1: Two symmetric optimization problems, one non-convex and the other convex. For the latter, optimal solutions always lie on the fixed-point subspace.
Theorem 5. Every finite group G has a finite number of nonequivalent irreducible representations ρ_i, of dimensions d_i. The relation Σ_i d_i² = |G| holds.
Example 6. Consider the group S_3 (permutations of three elements). This group is generated by the two permutations s: 123 → 213 and c: 123 → 312 (swap and cycle), and has six elements {e, s, c, c², cs, sc}. Notice that c³ = e, s² = e, and s = csc.
The group S_3 has three irreducible representations, two one-dimensional and one two-dimensional (so 1² + 1² + 2² = |S_3| = 6). These are:

    ρ_T(s) = 1,    ρ_T(c) = 1
    ρ_A(s) = −1,   ρ_A(c) = 1
    ρ_S(s) = [0 1; 1 0],   ρ_S(c) = [ω 0; 0 ω²],

where ω = e^{2πi/3} is a cube root of 1. Notice that it is enough to specify a representation on the generators of the group.
1.1 Symmetry and convexity
A key property of symmetric convex sets is the fact that the group average (1/|G|) Σ_{g∈G} ρ(g)x always belongs to the set.
Therefore, in convex optimization we can always restrict the solution to the fixed-point subspace

    F := {x | ρ(g)x = x, ∀ g ∈ G}.

In other words, for convex problems, no symmetry-breaking is ever necessary.
As another interpretation, that will prove useful later, the natural decision variables of a symmetric optimization problem are the orbits, not the points themselves. Thus, we may look for solutions in the quotient space.
1.2 Invariant SDPs
We consider a general SDP, described in geometric form. If L is an affine subspace of S^n, and C, X ∈ S^n, an SDP is given by:

    min ⟨C, X⟩   s.t.   X ∈ 𝒳 := L ∩ S^n_+.
Definition 7. Given a finite group G, and an associated representation σ: G → GL(S^n), a σ-invariant SDP is one where both the feasible set and the cost function are invariant under the group action, i.e.,

    ⟨C, X⟩ = ⟨C, σ(g)X⟩  ∀ g ∈ G,   and   X ∈ 𝒳 ⟹ σ(g)X ∈ 𝒳  ∀ g ∈ G.
Example 8. Consider the SDP given by

    min a + c,   s.t.   [a b; b c] ⪰ 0,

which is invariant under the Z_2 action:

    [X_11 X_12; X_12 X_22]  ↦  [X_22 X_12; X_12 X_11].
Usually in SDP, the group acts on S^n through a congruence transformation, i.e., σ(g)M = ρ(g)^T M ρ(g), where ρ is a representation of G on C^n. In this case, the restriction to the fixed-point subspace takes the form:

    σ(g)M = M   ⟺   ρ(g)M − Mρ(g) = 0,  ∀ g ∈ G.    (1)

The Schur lemma of representation theory exactly characterizes the matrices that commute with a group action.
As a consequence of an important structural result (Schur's lemma), it turns out that every representation can be written in terms of a finite number of primitive blocks, the irreducible representations of a group.

Theorem 9. Every group representation decomposes as a direct sum of irreducible representations:

    ρ = m_1 ρ_1 ⊕ m_2 ρ_2 ⊕ ··· ⊕ m_N ρ_N,

where m_1,...,m_N are the multiplicities.
This decomposition induces an isotypic decomposition of the space:

    C^n = V_1 ⊕ ··· ⊕ V_N,   V_i = V_{i1} ⊕ ··· ⊕ V_{i m_i}.

In the symmetry-adapted basis, the matrices in the SDP have a block diagonal form:

    (I_{m_1} ⊗ M_1) ⊕ ··· ⊕ (I_{m_N} ⊗ M_N).

In terms of our symmetry-reduced SDPs, this means that not only does the SDP block-diagonalize, but there is also the possibility that many blocks are identical.
1.3 Example: symmetric graphs
Consider the MAXCUT problem on the cycle graph C_n with n vertices (see Figure 2). It is easy to see that the optimal cut has cost equal to n or n − 1, depending on whether n is even or odd, respectively. What would the SDP relaxation yield in this case? If A is the adjacency matrix of the graph, then the SDP relaxations have essentially the form

    minimize Tr AX          maximize Tr Λ
    s.t.  X_ii = 1          s.t.  A ⪰ Λ
          X ⪰ 0                   Λ diagonal        (2)

By the symmetry of the graph, the matrix A is circulant, i.e., A_ij = a_{(i−j) mod n}.
Figure 2: The cyclic graph C_n on n vertices (here, n = 9).
We focus now on the dual form. It should be clear that the cyclic symmetry of the graph induces a cyclic symmetry in the SDP, i.e., if Λ = diag(λ_1, λ_2,...,λ_n) is a feasible solution, then Λ' = diag(λ_n, λ_1, λ_2,...,λ_{n−1}) is also feasible and achieves the same objective value. Thus, by averaging over the cyclic group, we can always restrict Λ to be a multiple of the identity matrix, i.e., Λ = λI. Furthermore, the constraint A ⪰ λI can be block-diagonalized via the Fourier matrix (i.e., the irreducible representations of the cyclic group), yielding:

    A ⪰ λI   ⟺   λ ≤ 2 cos(2kπ/n),   k = 0,...,n−1.

From this, the optimal solution of the relaxation can be directly computed, yielding the exact expressions for the upper bound on the size of the cut:

    mc(C_n) ≤ SDP(C_n) = { n,                n even,
                         { n cos²(π/(2n)),   n odd.
Although this example is extremely simple, exactly the same techniques can be applied to much more complicated problems; see for instance [PP04, dKMP+06, Sch05, BV08] for some recent examples.
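The computation above is easy to reproduce numerically (numpy assumed; this only confirms the closed-form eigenvalues and the resulting bound, not the full SDP derivation):

    import numpy as np

    n = 9  # odd cycle
    A = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
    # The Fourier basis (irreducible representations of the cyclic group)
    # diagonalizes A; its eigenvalues are 2*cos(2*pi*k/n).
    fourier = 2 * np.cos(2 * np.pi * np.arange(n) / n)
    assert np.allclose(np.sort(np.linalg.eigvalsh(A)), np.sort(fourier))
    sdp_bound = n * np.cos(np.pi / (2 * n)) ** 2
    print(sdp_bound, n - 1)  # ~8.73 versus the true max cut, 8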
1.4 Example: even polynomials
Another (but illustrative) example of symmetry reduction is the case of SOS decompositions of even polynomials. Consider a polynomial p(x) that is even, i.e., it satisfies p(x) = p(−x). Does this symmetry help in making the computations more efficient?
Complete ToDo
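Although the details are to be completed, the basic phenomenon is easy to sketch: the invariance under x → −x forces the Gram matrix to split into one block per parity class of monomials, so we can solve two smaller SDPs instead of one larger one. A minimal illustration (cvxpy assumed; the polynomial is a hypothetical example):

    import cvxpy as cp

    # p(x) = x^4 - 2x^2 + 5 is even; blocks for z_even = [1, x^2], z_odd = [x].
    Qe = cp.Variable((2, 2), symmetric=True)  # even block
    qo = cp.Variable()                        # 1x1 odd block
    constraints = [
        Qe >> 0, qo >= 0,
        Qe[0, 0] == 5,                # constant term
        2 * Qe[0, 1] + qo == -2,      # coefficient of x^2
        Qe[1, 1] == 1,                # coefficient of x^4
    ]
    cp.Problem(cp.Minimize(0), constraints).solve()
    print(Qe.value, qo.value)  # a feasible point certifies that p is SOS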
1.5 Benefits
In the case of semidefinite programming, there are many benefits to exploiting symmetry:

- Replace one big SDP with smaller, coupled problems.
- Instead of checking if a big matrix is PSD, we use one copy of each repeated block (constraint aggregation).
- Eliminates multiple eigenvalues (a source of numerical difficulties).
- For groups, the coordinate change depends only on the group, and not on the problem data.
- Can be used as a general preprocessing scheme. The coordinate change T is unitary, so well-conditioned.

As we will see in the next section, this approach can be extended to more general algebras that do not necessarily arise from groups.
1.6 Sum of squares
In the case of SDPs arising from sum of squares decompositions, a parallel theory can be developed by considering the symmetry-induced decomposition of the full polynomial ring R[x]. Since the details involve some elements of invariant theory, we omit them here; see [GP04] for the full story.
Include example ToDo
2 Algebra decomposition
An alternative (and somewhat more general) approach can be obtained by focusing instead on the associative algebra generated by the matrices in a semidefinite program.

Definition 10. An associative algebra A over C is a vector space with a C-bilinear operation · : A × A → A that satisfies

    x · (y · z) = (x · y) · z,   ∀ x, y, z ∈ A.

In general, associative algebras need not be commutative (i.e., x · y = y · x). However, that is an important special case, with many interesting properties. Important examples of finite dimensional associative algebras are:

- The full matrix algebra C^{n×n}, with the standard product.
- The subalgebra of square matrices with equal row and column sums.
- The n-dimensional algebra generated by a single n × n matrix.
- The group algebra: formal C-linear combinations of group elements.
- Polynomial multiplication modulo a zero-dimensional ideal.
- The Bose-Mesner algebra of an association scheme.
We have already encountered some of these when studying the companion matrix and its generalizations to the multivariate case. A particularly interesting class of algebras (for a variety of reasons) are the semisimple algebras.

Definition 11. The radical of an associative algebra A, denoted rad(A), is the intersection of all maximal left ideals of A.

Definition 12. An associative algebra A is semisimple if rad(A) = 0.
For a semidefinite programming problem in standard (dual) form

    max b^T y   s.t.   A_0 − Σ_{i=1}^m A_i y_i ⪰ 0,

we consider the algebra generated by the A_i.

Theorem 13. Let {A_0,...,A_m} be given symmetric matrices, and A the generated associative algebra. Then, A is a semisimple algebra.
Semisimple algebras have a very nice structure, since they are essentially the direct sum of much
simpler algebras.
Theorem 14 (Wedderburn). Every finite dimensional semisimple associative algebra over C can be decomposed as a direct sum

    A = A_1 ⊕ A_2 ⊕ ··· ⊕ A_k.

Each A_i is isomorphic to a simple full matrix algebra.
Example 15. A well-known example is the (commutative) algebra of circulant matrices, i.e., those of the form

    A = [a_1 a_2 a_3 a_4;
         a_4 a_1 a_2 a_3;
         a_3 a_4 a_1 a_2;
         a_2 a_3 a_4 a_1].

Circulant matrices are ubiquitous in many applications, such as signal processing. It is well known that there exists a fixed coordinate change (the Fourier matrix) under which all circulant matrices A are diagonal (with distinct scalar blocks).
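A quick numerical confirmation (numpy assumed; the random instance is a hypothetical example):

    import numpy as np

    a = np.random.default_rng(1).normal(size=4)
    A = np.array([[a[(j - i) % 4] for j in range(4)] for i in range(4)])
    F = np.fft.fft(np.eye(4)) / 2            # unitary DFT matrix for n = 4
    D = F.conj().T @ A @ F                   # the same F works for every circulant A
    print(np.allclose(D, np.diag(np.diag(D))))  # True: A is diagonalized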
Remark 16. In general, any associative algebra is the direct sum of its radical and a semisimple algebra. For the n-dimensional algebra generated by a single matrix A ∈ C^{n×n}, we have that A = S + N, where S is diagonalizable, N is nilpotent, and SN = NS. Thus, this statement is essentially equivalent to the existence of the Jordan decomposition.
References
[BDPX05] S. Boyd, P. Diaconis, P. A. Parrilo, and L. Xiao. Symmetry analysis of reversible Markov chains. Internet Math., 2(1):31-71, 2005.
[BV08] C. Bachoc and F. Vallentin. New upper bounds for kissing numbers from semidefinite programming. J. Amer. Math. Soc., 21(3):909-924, 2008.
[dKMP+06] E. de Klerk, J. Maharry, D. V. Pasechnik, R. B. Richter, and G. Salazar. Improved bounds for the crossing numbers of K_{m,n} and K_n. SIAM Journal on Discrete Mathematics, 20:189, 2006.
[DL98] P. Delsarte and V. I. Levenshtein. Association schemes and coding theory. IEEE Transactions on Information Theory, 44(6):2477-2504, 1998.
[FS92] A. Fässler and E. Stiefel. Group Theoretical Methods and Their Applications. Birkhäuser, 1992.
[GP04] K. Gatermann and P. A. Parrilo. Symmetry groups, semidefinite programs, and sums of squares. Journal of Pure and Applied Algebra, 192(1-3):95-128, 2004.
[PP04] P. A. Parrilo and R. Peretz. An inequality for circle packings proved by semidefinite programming. Discrete and Computational Geometry, 31(3):357-367, 2004.
[Sch79] A. Schrijver. A comparison of the Delsarte and Lovász bounds. IEEE Transactions on Information Theory, 25(4), 1979.
[Sch05] A. Schrijver. New code upper bounds from the Terwilliger algebra and semidefinite programming. IEEE Transactions on Information Theory, 51(8):2859-2866, 2005.
[Ser77] J.-P. Serre. Linear Representations of Finite Groups. Springer-Verlag, 1977.
MIT 6.972 Algebraic techniques and semidefinite optimization May 18, 2006
Lecture 22
Lecturer: Pablo A. Parrilo Scribe: ???
In this lecture we revisit a few questions raised at different points during the course, and reexamine them in light of the results we have learned. Finally, we point out several interesting research directions and open problems.
1 SDP representations for convex semialgebraic sets
One of the main questions we were originally interested in is the possible existence of exact semidefinite representations for arbitrary convex basic semialgebraic sets. While the full question still remains open, the SOS approach allows us to settle the approximation question.
Indeed, as we have seen, we have learned how to optimize arbitrary polynomials over semialgebraic sets. Recall that convex sets are uniquely defined by their set of supporting hyperplanes. Therefore, by minimizing arbitrary affine functions over the set, we can produce (an inner approximation of) all the separating hyperplanes, and by duality, we obtain an outer approximation of the original set. These can be directly interpreted in terms of a lifting of the original set into a higher-dimensional space.
As we have seen, in general this procedure can only guarantee approximate representations, since both the Putinar and Schmüdgen-type results apply only to strictly positive polynomials (and thus, only strictly separating hyperplanes are guaranteed). Of course, it is of interest to develop alternative approximation procedures, or special problem classes, for which exact representations hold. An example of this is provided in the next section.
2 Exact SDP representations for genus zero curves
We now present a particular class of convex sets in the plane for which we can guarantee exact semidefinite representations.
A (real) plane algebraic curve is the set in R² defined by a single polynomial equation p(x, y) = 0. An important invariant of a plane curve is its genus. This is usually defined algebraically, in terms of the degree of the curve and the number of singularities. It also corresponds (in the nonsingular case) to the topological genus of the associated Riemann surface, i.e., the complex surface p(z_1, z_2) = 0, where z_i ∈ C.
The crucial property (for us) of genus zero curves is the fact that they are rational curves, i.e., it is possible to parametrize them in terms of rational functions of a single parameter t, i.e., (x(t), y(t)) = (r_1(t)/r_2(t), r_3(t)/r_4(t)).
Example 1. Consider the lemniscate p(x, y) = y⁴ + 2y²x² + x⁴ + y² − x², illustrated in Figure 1. This curve has genus zero, and the rational representation:

    x(t) = t(1 + t²)/(1 + t⁴),   y(t) = t(1 − t²)/(1 + t⁴).
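The parametrization can be checked symbolically (sympy assumed as a tool):

    import sympy as sp

    t = sp.symbols('t')
    x = t * (1 + t**2) / (1 + t**4)
    y = t * (1 - t**2) / (1 + t**4)
    p = y**4 + 2*y**2*x**2 + x**4 + y**2 - x**2
    print(sp.simplify(p))  # 0: (x(t), y(t)) lies on the curve for all t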
For this class of curves, we have the following result:

Theorem 2 ([Para]). Consider a plane algebraic curve p(x, y) = 0 of genus zero. Consider the set S, defined as the convex hull of a finite collection of closed segments of the curve. Then, the set S has an exact representation in terms of semidefinite constraints.
Figure 1: Lemniscate curve. This is a genus zero (rational) curve.
3 Outlook and additional topics
There are many interesting research directions and open problems in this general area. Of paramount interest is a better integration between purely symbolic and purely numerical approaches. There are significant strengths in both viewpoints, and a much more unified understanding of these is necessary. SOS-based methods are a good ground for this in the case of real algebra, but are by no means the only possible approach (see for instance [Ste04] for polynomial equation solving).
In this direction, several issues deserve much more careful attention:

- Numerical conditioning issues.
- Efficient formulation and solution of SOS-based SDPs.

Among other interesting applications, we mention:

- Games with strategy sets and payoffs that are semialgebraic [Parb, SOP].
- Safety and performance analysis of hybrid systems [PJ04].
- Separability and entanglement of quantum systems [DPS02, DPS04].
- Geometric inequalities [PP04].

For systems and control applications, the volume [HG05] is a good starting point.
On the theoretical side of SOS/SDP methods, there are many aspects that deserve a better understanding, and several interesting new developments:

- Polynomial matrix constraints [HS04, Koj03].
- Sparse relaxations [KKW05].
- Fast methods for SOS-based SDPs [GHND03, LP04, RV].

Incomplete, add many more ToDo
References
[DPS02] A. C. Doherty, P. A. Parrilo, and F. M. Spedalieri. Distinguishing separable and entangled states. Physical Review Letters, 88(18), 2002.
[DPS04] A. C. Doherty, P. A. Parrilo, and F. M. Spedalieri. Complete family of separability criteria. Physical Review A, 69:022308, 2004.
[GHND03] Y. Genin, Y. Hachez, Yu. Nesterov, and P. Van Dooren. Optimization problems over positive pseudopolynomial matrices. SIAM J. Matrix Anal. Appl., 25(1):57-79 (electronic), 2003.
[HG05] D. Henrion and A. Garulli, editors. Positive polynomials in Control, volume 312 of Lecture Notes in Control and Information Sciences. Springer-Verlag, Berlin, 2005.
[HS04] C. W. J. Hol and C. W. Scherer. Sum of squares relaxations for polynomial semi-definite programming. In Proceedings of the Sixteenth International Symposium on Mathematical Theory of Networks and Systems (MTNS2004), 2004.
[KKW05] M. Kojima, S. Kim, and H. Waki. Sparsity in sums of squares of polynomials. Mathematical Programming, 103(1):45-62, May 2005.
[Koj03] M. Kojima. Sums of squares relaxations of polynomial semidefinite programs. Research report B-397, Dept. of Mathematical and Computing Sciences, Tokyo Institute of Technology, 2003.
[LP04] J. Löfberg and P. A. Parrilo. From coefficients to samples: a new approach to SOS optimization. In Proceedings of the 43rd IEEE Conference on Decision and Control, 2004.
[Para] P. A. Parrilo. Exact semidefinite representations for genus zero curves. Manuscript in preparation, 2006.
[Parb] P. A. Parrilo. Polynomial games, minimax, and SOS optimization. Preprint, 2005.
[PJ04] S. Prajna and A. Jadbabaie. Safety verification of hybrid systems using barrier certificates. In Rajeev Alur and George J. Pappas, editors, HSCC, volume 2993 of Lecture Notes in Computer Science, pages 477-492. Springer, 2004.
[PP04] P. A. Parrilo and R. Peretz. An inequality for circle packings proved by semidefinite programming. Discrete and Computational Geometry, 31(3):357-367, 2004.
[RV] T. Roh and L. Vandenberghe. Discrete transforms, semidefinite programming and sum-of-squares representations of nonnegative polynomials. Preprint.
[SOP] N. D. Stein, A. E. Ozdaglar, and P. A. Parrilo. Separable and low-rank continuous games. MIT LIDS technical report, 2006.
[Ste04] H. J. Stetter. Numerical polynomial algebra. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2004.
Sum of Squares Programs and Polynomial Inequalities
Pablo A. Parrilo
Laboratory for Information and Decision Systems
Massachusetts Institute of Technology
Cambridge, MA 02139-4307, USA
[email protected]
Submitted to SIAG/OPT News and Views
1 Introduction
Consider a given system of polynomial equations and inequalities, for instance:

    f_1(x_1, x_2) := x_1² + x_2² − 1 = 0,
    g_1(x_1, x_2) := 3x_2 − x_1³ − 2 ≥ 0,
    g_2(x_1, x_2) := x_1 − 8x_2³ ≥ 0.       (1)

How can one find real solutions (x_1, x_2)? How does one prove that they do not exist? And if the solution set is nonempty, how can one optimize a polynomial function over this set?
Until a few years ago, the default answer to these and similar questions would have been that the possible nonconvexity of the feasible set and/or objective function precludes any kind of analytic global results. Even today, the methods of choice for most practitioners would probably employ mostly local techniques (Newton's method and its variations), possibly complemented by a systematic search using deterministic or stochastic exploration of the solution space, interval analysis, or branch and bound.
However, very recently there have been renewed hopes for the efficient solution of specific instances of this kind of problems. The main reason is the appearance of methods that combine in a very interesting fashion ideas from real algebraic geometry and convex optimization [27, 30, 21]. As we will see, these methods are based on the intimate links between sum of squares decompositions for multivariate polynomials and semidefinite programming (SDP).
In this note we outline the essential elements of this new research approach as introduced in [30, 32], and provide pointers to the literature. The centerpieces will be the following two facts about multivariate polynomials and systems of polynomial inequalities:

    Sum of squares decompositions can be computed using semidefinite programming.

    The search for infeasibility certificates is a convex problem. For bounded degree, it is an SDP.

In the rest of this note, we define the basic ideas needed to make the assertions above precise, and explain the relationship with earlier techniques. For this, we will introduce sum of squares polynomials and the notion of sum of squares programs. We then explain how to use them to provide infeasibility certificates for systems of polynomial inequalities, finally putting it all together via the surprising connections with optimization.
On a related but different note, we mention a growing body of work also aimed at the integration of ideas from algebra and optimization, but centered instead on integer programming and toric ideals; see for instance [7, 42, 3] and the volume [1] as starting points.
2 Sums of squares and SOS programs
Our notation is mostly standard. The monomial x^α is defined as x_1^{α_1} ··· x_n^{α_n}, where α ∈ N^n. A polynomial is a finite linear combination of monomials Σ_{α∈S} c_α x^α, where the coefficients c_α are real. If all the monomials have the same degree d, we will call the polynomial homogeneous of degree d. We denote the ring of multivariate polynomials with real coefficients in the indeterminates {x_1,...,x_n} as R[x].
A multivariate polynomial is a sum of squares (SOS) if it can be written as a sum of squares of other polynomials, i.e.,

    p(x) = Σ_i q_i²(x),   q_i(x) ∈ R[x].

If p(x) is SOS then clearly p(x) ≥ 0 for all x. In general, SOS decompositions are not unique.
Example 1 The polynomial p(x_1, x_2) = x_1^2 - x_1 x_2^2 + x_2^4 + 1 is SOS. Among infinitely many others, it has the decompositions:

    p(x_1, x_2) = (3/4) (x_1 - x_2^2)^2 + (1/4) (x_1 + x_2^2)^2 + 1
                = (1/9) (3 - x_2^2)^2 + (2/3) x_2^2
                  + (1/288) (9 x_1 - 16 x_2^2)^2 + (23/32) x_1^2.
The sum of squares condition is a quite natural sufficient test for polynomial nonnegativity. Its rich mathematical structure has been analyzed in detail in the past, notably by Reznick and his coauthors [6, 38], but until very recently the computational implications had not been fully explored. In the last few years there have been some very interesting new developments surrounding sums of squares, where several independent approaches have produced a wide array of results linking foundational questions in algebra with computational possibilities arising from convex optimization. Most of them employ semidefinite programming (SDP) as the essential computational tool. For completeness, we present in the next paragraph a brief summary of SDP.
Semidefinite programming  SDP is a broad generalization of linear programming (LP) to the case of symmetric matrices. Denoting by S^n the space of n×n symmetric matrices, the standard SDP primal-dual formulation is:

    min_X  C • X    s.t.  A_i • X = b_i  (i = 1, ..., m),  X ⪰ 0,
                                                                        (2)
    max_y  b^T y    s.t.  Σ_{i=1}^m A_i y_i ⪯ C,

where A_i, C, X ∈ S^n and b, y ∈ R^m.
The matrix inequalities are to be interpreted in the partial order induced by the positive semidefinite cone, i.e., X ⪰ Y means that X - Y is a positive semidefinite matrix. Since its appearance almost a decade ago (related ideas, such as eigenvalue optimization, have been around for decades), there has been a true revolution in computational methods, supported by an astonishing variety of applications. By now there are several excellent introductions to SDP; among them we mention the well-known work of Vandenberghe and Boyd [44] as a wonderful survey of the basic theory and initial applications, and the handbook [45] for a comprehensive treatment of the many aspects of the subject. Other surveys, covering different complementary aspects, are the early works by Alizadeh [2] and Goemans [15], as well as the more recent ones due to Todd [43], De Klerk [9], and Laurent and Rendl [25].
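To make the primal form in (2) concrete, here is a minimal sketch of a toy instance (the data and the use of the third-party cvxpy package are illustrative assumptions, not part of the original text):

    import cvxpy as cp
    import numpy as np

    # Toy instance of the primal SDP in (2):
    #   min  C . X   s.t.  A_1 . X = b_1,  X positive semidefinite
    C = np.array([[2.0, 1.0],
                  [1.0, 2.0]])
    A1 = np.eye(2)                       # A_1 . X = trace(X)
    b1 = 1.0

    X = cp.Variable((2, 2), PSD=True)    # X ranges over the PSD cone
    prob = cp.Problem(cp.Minimize(cp.trace(C @ X)),
                      [cp.trace(A1 @ X) == b1])
    prob.solve()
    print(prob.value)                    # lambda_min(C) = 1 for this data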
From SDP to SOS  The main object of interest in semidefinite programming is

    quadratic forms that are positive semidefinite.

When attempting to generalize this construction to homogeneous polynomials of higher degree, an insurmountable difficulty that appears is the fact that deciding nonnegativity for quartic or higher degree forms is an NP-hard problem. Therefore, a computationally tractable replacement is the following:

    even degree polynomials that are sums of squares.

Sum of squares programs can then be defined as optimization problems over affine families of polynomials, subject to SOS constraints. Like SDPs, there are several possible equivalent descriptions. We choose below a free-variables formulation, to highlight the analogy with the standard SDP dual form discussed above.
Definition 1 A sum of squares program has the form

    max_y  b_1 y_1 + ··· + b_m y_m
    s.t.   P_i(x, y) are SOS  (i = 1, ..., p),

where P_i(x, y) := C_i(x) + A_{i1}(x) y_1 + ··· + A_{im}(x) y_m, and the C_i, A_{ij} are given polynomials in the variables x_i.
SOS programs are very useful, since they directly operate with polynomials as their basic objects, thus providing a quite natural modeling formulation for many problems. Examples include the search for Lyapunov functions for nonlinear systems [30, 28], probability inequalities [4], as well as the relaxations in [30, 21] discussed below.
Interestingly enough, despite their apparently greater generality, sum of squares programs are in fact equivalent to SDPs. On the one hand, by choosing the polynomials C_i(x), A_{ij}(x) to be quadratic forms, we recover standard SDP. On the other hand, as we will see in the next section, it is possible to exactly embed every SOS program into a larger SDP. Nevertheless, the rich algebraic structure of SOS programs will allow us a much deeper understanding of their special properties, as well as enable customized, more efficient algorithms for their solution [26].
Furthermore, as illustrated in later sections, there
are numerous questions related to some foundational
issues in nonconvex optimization that have simple
and natural formulations as SOS programs.
SOS programs as SDPs  Sum of squares programs can be written as SDPs. The reason is the following theorem:

Theorem 1 A polynomial p(x) is SOS if and only if p(x) = z^T Q z, where z is a vector of monomials in the x_i variables, Q ∈ S^N, and Q ⪰ 0.

In other words, every SOS polynomial can be written as a quadratic form in a set of monomials of cardinality N, with the corresponding matrix being positive semidefinite. The vector of monomials z (and therefore N) in general depends on the degree and sparsity pattern of p(x). If p(x) has n variables and total degree 2d, then z can always be chosen as a subset of the set of monomials of degree less than or equal to d, which has cardinality N = (n+d choose d).
Example 2 Consider again the polynomial from Example 1. It has the representation

    p(x_1, x_2) = (1/6) z^T Q z,    z = (1, x_2, x_2^2, x_1)^T,

    Q = [  6   0  -2   0
           0   4   0   0
          -2   0   6  -3
           0   0  -3   6 ],

and the matrix Q in the expression above is positive semidefinite.
In the representation p(x) = z^T Q z, for the right- and left-hand sides to be identical, all the coefficients of the corresponding polynomials should be equal. Since Q is simultaneously constrained by linear equations and a positive semidefiniteness condition, the problem is easily seen to be directly equivalent to an SDP feasibility problem in the standard primal form (2).
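As a quick numerical check of Example 2 (a minimal sketch; the Cholesky-based extraction of squares below is one standard way to read off an explicit decomposition, not the only one):

    import numpy as np

    # Gram matrix from Example 2, with z = (1, x2, x2^2, x1)^T
    Q = np.array([[ 6.0, 0.0, -2.0,  0.0],
                  [ 0.0, 4.0,  0.0,  0.0],
                  [-2.0, 0.0,  6.0, -3.0],
                  [ 0.0, 0.0, -3.0,  6.0]]) / 6.0

    print(np.linalg.eigvalsh(Q))          # all positive, so Q is PSD

    # Spot-check p(x1, x2) = z^T Q z at random points
    rng = np.random.default_rng(0)
    for x1, x2 in rng.normal(size=(5, 2)):
        z = np.array([1.0, x2, x2**2, x1])
        p = x1**2 - x1 * x2**2 + x2**4 + 1
        assert np.isclose(z @ Q @ z, p)

    # A factorization Q = L L^T turns z^T Q z into sum_i ((L^T z)_i)^2,
    # i.e., an explicit sum of squares decomposition of p.
    L = np.linalg.cholesky(Q)
    print(L.T)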
Given an SOS program, we can use the theorem above to construct an equivalent SDP. The conversion step is fully algorithmic, and has been implemented, for instance, in the SOSTOOLS [36] software package. Therefore, we can in principle directly apply all the available numerical methods for SDP to solve SOS programs.
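For instance, the Gram matrix of Example 2 can be recovered by solving the coefficient-matching feasibility SDP directly. A minimal sketch (using the third-party cvxpy package as an assumption; SOSTOOLS itself is a MATLAB toolbox):

    import cvxpy as cp

    # Find Q PSD with p = z^T Q z, where z = (1, x2, x2^2, x1) and
    # p = x1^2 - x1*x2^2 + x2^4 + 1.  Each monomial of p contributes
    # one linear constraint on the entries of Q.
    Q = cp.Variable((4, 4), PSD=True)
    constraints = [
        Q[0, 0] == 1,                  # 1
        2 * Q[0, 1] == 0,              # x2
        Q[1, 1] + 2 * Q[0, 2] == 0,    # x2^2
        2 * Q[1, 2] == 0,              # x2^3
        Q[2, 2] == 1,                  # x2^4
        2 * Q[0, 3] == 0,              # x1
        2 * Q[1, 3] == 0,              # x1*x2
        2 * Q[2, 3] == -1,             # x1*x2^2
        Q[3, 3] == 1,                  # x1^2
    ]
    prob = cp.Problem(cp.Minimize(0), constraints)
    prob.solve()
    print(prob.status)    # 'optimal' certifies feasibility: p is SOS
    print(Q.value)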
SOS and convexity  The connection between sum of squares decompositions and convexity can be traced back to the work of N. Z. Shor [39]. In this 1987 paper, he essentially outlined the links between Hilbert's 17th problem and a class of convex bounds for unconstrained polynomial optimization problems. Unfortunately, the approach went mostly unnoticed for several years, probably due to the lack of the convenient framework of SDP.
3 Algebra and optimization
A central theme throughout convex optimization is the idea of infeasibility certificates (for instance, in LP via Farkas' lemma) or, equivalently, theorems of the alternative. As we will see, the key link relating algebra and optimization in this approach is the fact that infeasibility can always be certified by a particular algebraic identity, whose solution is found via convex optimization.

We explain some of the concrete results in Theorem 5, after a brief introduction to two algebraic concepts and a comparison with three well-known infeasibility certificates.
Ideals and cones  For later reference, we define here two important algebraic objects: the ideal and the cone associated with a set of polynomials.

Definition 2 Given a set of multivariate polynomials {f_1, ..., f_m}, let

    ideal(f_1, ..., f_m) := { f | f = Σ_{i=1}^m t_i f_i,  t_i ∈ R[x] }.
Definition 3 Given a set of multivariate polynomials {g_1, ..., g_m}, let

    cone(g_1, ..., g_m) := { g | g = s_0 + Σ_{i} s_i g_i + Σ_{i,j} s_{ij} g_i g_j + Σ_{i,j,k} s_{ijk} g_i g_j g_k + ··· },

where each term in the sum is a squarefree product of the polynomials g_i, with a coefficient s ∈ R[x] that is a sum of squares. The sum is finite, with a total of 2^m - 1 terms beyond s_0, corresponding to the nonempty subsets of {g_1, ..., g_m}.
These algebraic objects will be used for deriving valid inequalities, which are logical consequences of the given constraints. Notice that, by construction, every polynomial in ideal(f_i) vanishes on the solution set of f_i(x) = 0. Similarly, every element of cone(g_i) is clearly nonnegative on the feasible set of g_i(x) ≥ 0.
The notions of ideal and cone as used above are standard in real algebraic geometry; see for instance [5]. In particular, the cones are also referred to as preorders. Notice that, as geometric objects, ideals are affine sets, and cones are closed under convex combinations and nonnegative scalings (i.e., they are actually cones in the convex geometry sense). These convexity properties, coupled with the relationships between SDP and SOS, will be key for our developments in the next section.
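The nonnegativity property is easy to see in action. The sketch below (a minimal illustration; the particular multipliers are arbitrary choices, not taken from the text) builds one element of cone(g_1, g_2) for the inequalities in (1) and spot-checks it on feasible points:

    import numpy as np
    import sympy as sp

    x1, x2 = sp.symbols("x1 x2")
    g1 = 3 * x2 - x1**3 - 2
    g2 = x1 - 8 * x2**3

    # Arbitrary SOS multipliers: explicit squares and a nonnegative constant
    s0 = (x1 + x2)**2
    s1 = (1 + x1 * x2)**2
    s12 = sp.Integer(4)

    h = sp.expand(s0 + s1 * g1 + s12 * g1 * g2)   # element of cone(g1, g2)
    h_f = sp.lambdify((x1, x2), h, "numpy")
    g1_f = sp.lambdify((x1, x2), g1, "numpy")
    g2_f = sp.lambdify((x1, x2), g2, "numpy")

    # Wherever g1 >= 0 and g2 >= 0, the cone element h must be >= 0
    X1, X2 = np.meshgrid(np.linspace(-3, 3, 301), np.linspace(-3, 3, 301))
    mask = (g1_f(X1, X2) >= 0) & (g2_f(X1, X2) >= 0)
    print(mask.any())                       # True: e.g., (x1, x2) = (-2, -1)
    print((h_f(X1, X2)[mask] >= 0).all())   # True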
Infeasibility certificates  If a system of equations does not have solutions, how do we prove this fact? A very useful concept is that of certificates, which are formal algebraic identities that provide irrefutable evidence of the nonexistence of solutions.

We briefly illustrate some well-known examples below. The first two deal with linear systems and with polynomial equations over the complex numbers, respectively.
Theorem 2 (Range/kernel)

    Ax = b is infeasible
    ⟺
    ∃ µ  s.t.  A^T µ = 0,  b^T µ = 1.
Theorem 3 (Hilbert's Nullstellensatz) Let f_1(z), ..., f_m(z) be polynomials in the complex variables z_1, ..., z_n. Then,

    f_i(z) = 0  (i = 1, ..., m)  is infeasible in C^n
    ⟺
    -1 ∈ ideal(f_1, ..., f_m).
Each of these theorems has an easy direction. For instance, for the first case, given the multiplier µ the infeasibility is obvious, since

    Ax = b  ⇒  µ^T A x = µ^T b  ⇒  0 = 1,

which is clearly a contradiction.
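A range/kernel certificate is also straightforward to compute. A minimal numerical sketch (toy data; plain least squares stands in for the linear algebra one would use in practice):

    import numpy as np

    # Infeasible system: x1 = 0, x2 = 0, x1 + x2 = 1
    A = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [1.0, 1.0]])
    b = np.array([0.0, 0.0, 1.0])

    # Solve A^T mu = 0 and b^T mu = 1 as one stacked linear system
    M = np.vstack([A.T, b[None, :]])
    rhs = np.array([0.0, 0.0, 1.0])
    mu, *_ = np.linalg.lstsq(M, rhs, rcond=None)

    print(mu)                                   # here mu = (-1, -1, 1)
    print(np.allclose(A.T @ mu, 0.0),
          np.isclose(b @ mu, 1.0))              # True True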
The two theorems above deal only with the case of equations. The inclusion of inequalities in the problem formulation poses additional algebraic challenges, because we need to work over an ordered field. In other words, we need to take into account special properties of the real numbers, and not just of the complex numbers.

For the case of linear inequalities, LP duality provides the following characterization:
Theorem 4 (Farkas lemma)

    Ax + b = 0
    Cx + d ≥ 0    is infeasible
    ⟺
    ∃ λ, ∃ µ ≥ 0  s.t.
        A^T λ + C^T µ = 0,
        b^T λ + d^T µ = -1.
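Farkas multipliers can be found with any LP solver. A minimal sketch (a one-variable toy system, using scipy's linprog as an assumed tool):

    import numpy as np
    from scipy.optimize import linprog

    # Infeasible: x = 0 together with x - 1 >= 0, i.e.
    #   A x + b = 0  with A = [1], b = [0]
    #   C x + d >= 0 with C = [1], d = [-1]
    A, b = np.array([[1.0]]), np.array([0.0])
    C, d = np.array([[1.0]]), np.array([-1.0])

    # Variables v = (lambda, mu); impose A^T lambda + C^T mu = 0 and
    # b^T lambda + d^T mu = -1, with mu >= 0 and lambda free.
    A_eq = np.block([[A.T, C.T],
                     [b[None, :], d[None, :]]])
    b_eq = np.array([0.0, -1.0])
    res = linprog(c=[0.0, 0.0], A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None), (0.0, None)])
    print(res.x)    # a valid certificate: lambda = -1, mu = 1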
Although not widely known in the optimization community until recently, it turns out that similar certificates do exist for arbitrary systems of polynomial equations and inequalities over the reals. The result essentially appears in this form in [5], and is due to Stengle [40].
Theorem 5 (Positivstellensatz)

    f_i(x) = 0  (i = 1, ..., m)
    g_i(x) ≥ 0  (i = 1, ..., p)    is infeasible in R^n
    ⟺
    ∃ F(x) ∈ ideal(f_1, ..., f_m),  G(x) ∈ cone(g_1, ..., g_p)
    s.t.  F(x) + G(x) = -1.

For the system (1), such a certificate is an identity of the form

    t f_1 + s_0 + s_1 g_1 + s_2 g_2 + s_{12} g_1 g_2 = -1,    (3)

where t ∈ R[x] and s_0, s_1, s_2, s_{12} are sums of squares. Searching for multipliers of bounded degree is an SOS program, and the SOS part of a certificate obtained this way has the Gram representation z^T Q z with

    Q = [  6   3   3   0   0   3
           3   4   2   0   1   1
           3   2   6   2   0   3
           0   0   2   4   7   2
           0   1   0   7  18   0
           3   1   3   2   0   6 ],

where

    z = (1, x_2, x_2^2, x_1, x_1 x_2, x_1^2)^T.

The resulting identity (3) thus certifies the inconsistency of the system {f_1 = 0, g_1 ≥ 0, g_2 ≥ 0}.
As outlined in the preceding paragraphs, there is a direct connection going from general polynomial optimization problems to SDP, via P-satz infeasibility certificates. Pictorially:

    Polynomial systems
        ⇓
    P-satz certificates
        ⇓
    SOS programs
        ⇓
    SDP

Even though we have discussed only feasibility problems, there are straightforward connections with optimization. By considering the emptiness of the sublevel sets of the objective function, sequences of converging bounds indexed by certificate degree can be directly constructed.
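The simplest instance of this idea is unconstrained minimization: the largest γ for which p(x) - γ is SOS is a certified lower bound on the global minimum of p. A minimal sketch for a univariate example (the polynomial is a made-up illustration, and cvxpy is again an assumed tool; for univariate polynomials nonnegativity and SOS coincide, so the bound is tight):

    import cvxpy as cp

    # Lower-bound p(x) = x^4 - 3x^2 + x + 5 by maximizing gamma subject to
    # p(x) - gamma = z^T Q z with z = (1, x, x^2) and Q PSD.
    c = [5.0, 1.0, -3.0, 0.0, 1.0]       # coefficients of 1, x, x^2, x^3, x^4

    gamma = cp.Variable()
    Q = cp.Variable((3, 3), PSD=True)
    constraints = [
        Q[0, 0] == c[0] - gamma,         # constant term
        2 * Q[0, 1] == c[1],             # x
        Q[1, 1] + 2 * Q[0, 2] == c[2],   # x^2
        2 * Q[1, 2] == c[3],             # x^3
        Q[2, 2] == c[4],                 # x^4
    ]
    prob = cp.Problem(cp.Maximize(gamma), constraints)
    prob.solve()
    print(gamma.value)                   # certified global lower bound on p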
4 Further developments and applications

We have covered only the core elements of the SOS/SDP approach. Much more is known, and even more still remains to be discovered, on both the theoretical and the computational ends. Some specific issues are discussed below.
Exploiting structure and numerical computation  To what extent can the inherent structure in SOS programs be exploited for efficient computations? Given the algebraic origins of the formulation, it is perhaps not surprising to find that several intrinsic properties of the input polynomials can be profitably used [29]. In this direction, symmetry reduction techniques have been employed by Gatermann and Parrilo in [14] to provide novel representations for symmetric polynomials. Kojima, Kim and Waki [20] have recently presented some results for sparse polynomials. Parrilo [31] and Laurent [23] have analyzed the further simplifications that occur when the constraints define a zero-dimensional ideal.
Other relaxations  Lasserre [21, 22] has independently introduced a scheme for polynomial optimization dual to the one described here, but relying on Putinar's representation theorem for positive polynomials rather than on the P-satz. There are very interesting relationships between SOS-based methods and earlier relaxation and approximation schemes, such as Lovász-Schrijver and Sherali-Adams. Laurent [24] analyzes these in the specific case of 0-1 programming.
Implementations  The software SOSTOOLS [36] is a free, third-party MATLAB toolbox (MATLAB is a registered trademark of The MathWorks, Inc.) for formulating and solving general sum of squares programs. The related software GloptiPoly [17] is oriented toward global optimization problems. In their current versions, both use the SDP solver SeDuMi [41] for numerical computations.
Approximation properties  There are several important open questions regarding the provable quality of the approximations. In this direction, De Klerk and Pasechnik [11] have established approximation guarantees of an SOS-based scheme for approximating the stability number of a graph. Recently, De Klerk, Laurent and Parrilo [10] have shown that a related procedure based on a result by Pólya provides a polynomial-time approximation scheme (PTAS) for polynomial optimization over simplices.
Applications  There are many exciting applications of the ideas described here. The descriptions that follow are necessarily brief; our main objective is to provide the reader with some good starting points into this growing literature.

In systems and control theory, the techniques have provided some of the best available analysis and design methods, in areas such as nonlinear stability and robustness analysis [30, 28, 35], state feedback control [19], fixed-order controllers [18], nonlinear synthesis [37], and model validation [34]. There have also been interesting recent applications in geometric theorem proving [33] and quantum information theory [12, 13].

Acknowledgments: The author would like to thank Etienne de Klerk and Luis Vicente for their helpful comments and suggestions.
References

[1] K. Aardal and R. Thomas (eds.), Algebraic and geometric methods in discrete optimization, Springer-Verlag, Heidelberg, 2003, Math. Program. 96 (2003), no. 2, Ser. B. MR 1 993 044
[2] F. Alizadeh, Interior point methods in semidefinite programming with applications to combinatorial optimization, SIAM J. Optim. 5 (1995), no. 1, 13–51.
[3] D. Bertsimas, G. Perakis, and S. Tayur, A new algebraic geometry algorithm for integer programming, Management Science 46 (2000), no. 7, 999–1008.
[4] D. Bertsimas and I. Popescu, Optimal inequalities in probability: A convex optimization approach, INSEAD working paper, available at http://www.insead.edu/facultyresearch/tm/popescu/, 1999-2001.
[5] J. Bochnak, M. Coste, and M-F. Roy, Real algebraic geometry, Springer, 1998.
[6] M. D. Choi, T. Y. Lam, and B. Reznick, Sums of squares of real polynomials, Proceedings of Symposia in Pure Mathematics 58 (1995), no. 2, 103–126.
[7] P. Conti and C. Traverso, Buchberger algorithm and integer programming, Applied algebra, algebraic algorithms and error-correcting codes (New Orleans, LA, 1991), Lecture Notes in Comput. Sci., vol. 539, Springer, Berlin, 1991, pp. 130–139. MR 1 229 314
[8] D. A. Cox, J. B. Little, and D. O'Shea, Ideals, varieties, and algorithms: an introduction to computational algebraic geometry and commutative algebra, Springer, 1997.
[9] E. de Klerk, Aspects of semidefinite programming: Interior point algorithms and selected applications, Applied Optimization, vol. 65, Kluwer Academic Publishers, 2002.
[10] E. de Klerk, M. Laurent, and P. A. Parrilo, A PTAS for the minimization of polynomials of fixed degree over the simplex, Submitted, available at http://www.mit.edu/~parrilo, 2004.
[11] E. de Klerk and D. V. Pasechnik, Approximating the stability number of a graph via copositive programming, SIAM Journal on Optimization 12 (2002), no. 4, 875–892.
[12] A. C. Doherty, P. A. Parrilo, and F. M. Spedalieri, Distinguishing separable and entangled states, Physical Review Letters 88 (2002), no. 18.
[13] A. C. Doherty, P. A. Parrilo, and F. M. Spedalieri, Complete family of separability criteria, Physical Review A 69 (2004), 022308.
[14] K. Gatermann and P. A. Parrilo, Symmetry groups, semidefinite programs, and sums of squares, Journal of Pure and Applied Algebra 192 (2004), no. 1-3, 95–128.
[15] M. X. Goemans, Semidefinite programming in combinatorial optimization, Math. Programming 79 (1997), no. 1-3, 143–161. MR 98g:90028
[16] D. Grigoriev and N. Vorobjov, Complexity of Null- and Positivstellensatz proofs, Annals of Pure and Applied Logic 113 (2002), no. 1-3, 153–160.
[17] D. Henrion and J.-B. Lasserre, GloptiPoly - global optimization over polynomials with Matlab and SeDuMi, Available from http://www.laas.fr/~henrion/software/gloptipoly/.
[18] D. Henrion, M. Sebek, and V. Kucera, Positive polynomials and robust stabilization with fixed-order controllers, IEEE Transactions on Automatic Control 48 (2003), no. 7, 1178–1186.
[19] Z. Jarvis-Wloszek, R. Feeley, W. Tan, K. Sun, and A. Packard, Some controls applications of sum of squares, Proceedings of the 42nd IEEE Conference on Decision and Control, 2003, pp. 4676–4681.
[20] M. Kojima, S. Kim, and H. Waki, Sparsity in sum of squares of polynomials, Research report B-391, Dept. of Mathematical and Computing Sciences, Tokyo Institute of Technology, 2003.
[21] J. B. Lasserre, Global optimization with polynomials and the problem of moments, SIAM J. Optim. 11 (2001), no. 3, 796–817. MR 1 814 045
[22] J. B. Lasserre, Polynomials nonnegative on a grid and discrete optimization, Trans. Amer. Math. Soc. 354 (2002), no. 2, 631–649. MR 1 862 561
[23] M. Laurent, Semidefinite representations for finite varieties, Preprint.
[24] M. Laurent, A comparison of the Sherali-Adams, Lovász-Schrijver, and Lasserre relaxations for 0-1 programming, Math. Oper. Res. 28 (2003), no. 3, 470–496. MR 1 997 246
[25] M. Laurent and F. Rendl, Semidefinite programming and integer programming, Tech. Report PNA-R0210, CWI, Amsterdam, April 2002.
[26] J. Löfberg and P. A. Parrilo, From coefficients to samples: a new approach in SOS optimization, Proceedings of the 43rd IEEE Conference on Decision and Control, 2004.
[27] Y. Nesterov, Squared functional systems and optimization problems, High Performance Optimization (J. B. G. Frenk, C. Roos, T. Terlaky, and S. Zhang, eds.), Kluwer Academic Publishers, 2000, pp. 405–440.
[28] A. Papachristodoulou and S. Prajna, On the construction of Lyapunov functions using the sum of squares decomposition, Proceedings of the 41st IEEE Conference on Decision and Control, 2002.
[29] P. A. Parrilo, Exploiting algebraic structure in sum of squares programs, Positive Polynomials in Control (D. Henrion and A. Garulli, eds.), LNCIS, Springer, forthcoming.
[30] P. A. Parrilo, Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization, Ph.D. thesis, California Institute of Technology, May 2000, Available at http://resolver.caltech.edu/CaltechETD:etd-05062004-055516.
[31] P. A. Parrilo, An explicit construction of distinguished representations of polynomials nonnegative over finite sets, IfA Technical Report AUT02-02, Available from http://control.ee.ethz.ch/~parrilo, ETH Zürich, 2002.
[32] P. A. Parrilo, Semidefinite programming relaxations for semialgebraic problems, Math. Prog. 96 (2003), no. 2, Ser. B, 293–320.
[33] P. A. Parrilo and R. Peretz, An inequality for circle packings proved by semidefinite programming, Discrete and Computational Geometry 31 (2004), no. 3, 357–367.
[34] S. Prajna, Barrier certificates for nonlinear model validation, Proceedings of the 42nd IEEE Conference on Decision and Control, 2003, pp. 2884–2889.
[35] S. Prajna and A. Papachristodoulou, Analysis of switched and hybrid systems - beyond piecewise quadratic methods, Proceedings of the American Control Conference, 2003.
[36] S. Prajna, A. Papachristodoulou, and P. A. Parrilo, SOSTOOLS: Sum of squares optimization toolbox for MATLAB, 2002-04, Available from http://www.cds.caltech.edu/sostools and http://www.mit.edu/~parrilo/sostools.
[37] S. Prajna, P. A. Parrilo, and A. Rantzer, Nonlinear control synthesis by convex optimization, IEEE Transactions on Automatic Control 49 (2004), no. 2, 310–314.
[38] B. Reznick, Some concrete aspects of Hilbert's 17th problem, Contemporary Mathematics, vol. 253, American Mathematical Society, 2000, pp. 251–272.
[39] N. Z. Shor, Class of global minimum bounds of polynomial functions, Cybernetics 23 (1987), no. 6, 731–734. (Russian orig.: Kibernetika, no. 6, (1987), 9–11.)
[40] G. Stengle, A Nullstellensatz and a Positivstellensatz in semialgebraic geometry, Math. Ann. 207 (1974), 87–97.
[41] J. Sturm, SeDuMi version 1.05, October 2001, Available from http://fewcal.uvt.nl/sturm/software/sedumi.html.
[42] R. Thomas and R. Weismantel, Truncated Gröbner bases for integer programming, Appl. Algebra Engrg. Comm. Comput. 8 (1997), no. 4, 241–256. MR 98i:90055
[43] M. Todd, Semidefinite optimization, Acta Numerica 10 (2001), 515–560.
[44] L. Vandenberghe and S. Boyd, Semidefinite programming, SIAM Review 38 (1996), no. 1, 49–95.
[45] H. Wolkowicz, R. Saigal, and L. Vandenberghe (eds.), Handbook of semidefinite programming, Kluwer, 2000.