Given a vector $c \in \mathbb{R}^n$ and polynomials $a_{k,j}$ for $k = 1, \dots, N_s$, $j = 0, 1, \dots, n$, a sum-of-squares optimization problem is written as
$$\begin{aligned} \underset{u\in\mathbb{R}^n}{\text{maximize}}\quad & c^T u\\ \text{subject to}\quad & a_{k,0}(x)+a_{k,1}(x)u_1+\cdots+a_{k,n}(x)u_n \in \text{SOS}\quad (k=1,\ldots,N_s). \end{aligned}$$
Here "SOS" represents the class of sum-of-squares (SOS) polynomials. The quantities u ∈ R n {\displaystyle u\in \mathbb {R} ^{n}} are the decision variables. SOS programs can be converted to semidefinite programs (SDPs) using the duality of the SOS polynomial program and a relaxation for constrained polynomial optimization using positive-semidefinite matrices, see the following section.
Suppose we have an $n$-variate polynomial $p(x):\mathbb{R}^n\to\mathbb{R}$, and suppose that we would like to minimize this polynomial over a subset $A\subseteq\mathbb{R}^n$. Suppose furthermore that the constraints on the subset $A$ can be encoded using $m$ polynomial equalities of degree at most $2d$, each of the form $a_i(x)=0$, where $a_i:\mathbb{R}^n\to\mathbb{R}$ is a polynomial of degree at most $2d$. A natural, though generally non-convex, program for this optimization problem is the following:

$$\min_{x\in\mathbb{R}^n}\;\langle C,\, x^{\le d}(x^{\le d})^\top\rangle$$

subject to:

$$\langle A_i,\, x^{\le d}(x^{\le d})^\top\rangle = 0 \qquad \forall\, i\in[m], \qquad (1)$$

$$x_\emptyset = 1,$$

where $x^{\le d}$ is the $n^{O(d)}$-dimensional vector with one entry for every monomial in $x$ of degree at most $d$, so that for each multiset $S\subset[n]$ with $|S|\le d$, $x_S=\prod_{i\in S}x_i$; $C$ is a matrix of coefficients of the polynomial $p(x)$ that we want to minimize; and $A_i$ is a matrix of coefficients of the polynomial $a_i(x)$ encoding the $i$-th constraint on the subset $A\subset\mathbb{R}^n$. The additional fixed constant index in our search space, $x_\emptyset=1$, is included for the convenience of writing the polynomials $p(x)$ and $a_i(x)$ in a matrix representation.
This program is generally non-convex, because the constraints (1) are not convex. One possible convex relaxation for this minimization problem uses semidefinite programming to replace the rank-one matrix of variables $x^{\le d}(x^{\le d})^\top$ with a positive-semidefinite matrix $X$: we index each monomial of degree at most $2d$ by a multiset $S$ of at most $2d$ indices, $S\subset[n]$, $|S|\le 2d$. For each such monomial, we create a variable $X_S$ in the program, and we arrange the variables $X_S$ to form the matrix $X\in\mathbb{R}^{[n]^{\le d}\times[n]^{\le d}}$, where $\mathbb{R}^{[n]^{\le d}\times[n]^{\le d}}$ is the set of real matrices whose rows and columns are identified with multisets of elements from $[n]$ of size at most $d$. We then write the following semidefinite program in the variables $X_S$:

$$\min_{X\in\mathbb{R}^{[n]^{\le d}\times[n]^{\le d}}}\;\langle C, X\rangle$$

subject to:

$$\langle A_i, X\rangle = 0 \qquad \forall\, i\in[m],$$

$$X_\emptyset = 1,$$

$$X_{U\cup V} = X_{S\cup T} \qquad \forall\, U,V,S,T\subseteq[n],\ |U|,|V|,|S|,|T|\le d,\ \text{and } U\cup V = S\cup T,$$

$$X \succeq 0,$$
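A minimal sketch of this relaxation for a tiny unconstrained instance, assuming cvxpy is available: minimizing $p(x)=x^2-2x$ over $x\in\mathbb{R}$ with $n=1$, $d=1$, where the moment matrix $X$ is indexed by the monomials $(1,x)$ (the symmetry constraint is vacuous at this size):

```python
# Degree-2 moment relaxation of min_x x^2 - 2x.  The matrix X stands in for
# the rank-one matrix (1, x)(1, x)^T, so X[0, 0] = X_emptyset, X[0, 1] = X_{x},
# and X[1, 1] = X_{xx}.
import cvxpy as cp

X = cp.Variable((2, 2), symmetric=True)

constraints = [
    X >> 0,        # X positive semidefinite
    X[0, 0] == 1,  # X_emptyset = 1
]

# <C, X> encodes p: the x^2 coefficient sits on the diagonal and the -2x
# coefficient is split across the two symmetric off-diagonal entries.
objective = cp.Minimize(X[1, 1] - X[0, 1] - X[1, 0])

prob = cp.Problem(objective, constraints)
prob.solve()
print(prob.value)  # approximately -1, matching the true minimum p(1) = -1
```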
where again $C$ is the matrix of coefficients of the polynomial $p(x)$ that we want to minimize, and $A_i$ is the matrix of coefficients of the polynomial $a_i(x)$ encoding the $i$-th constraint on the subset $A\subset\mathbb{R}^n$.
The third constraint ensures that the value of a monomial that appears several times within the matrix is equal throughout the matrix, and is added to make $X$ respect the symmetries present in the quadratic form $x^{\le d}(x^{\le d})^\top$. For example, for $d\ge 2$, the entries indexed by $(\{1\},\{2\})$ and $(\{1,2\},\emptyset)$ both represent the monomial $x_1x_2$, and the third constraint forces them to be equal.
One can take the dual of the above semidefinite program and obtain the following program:

$$\max_{y\in\mathbb{R}^{m'}}\; y_0,$$

subject to:

$$C - y_0 e_\emptyset - \sum_{i\in[m]} y_i A_i - \sum_{S\cup T = U\cup V} y_{S,T,U,V}\,(e_{S,T} - e_{U,V}) \succeq 0.$$
We have a variable $y_0$ corresponding to the constraint $\langle e_\emptyset, X\rangle = 1$ (where $e_\emptyset$ is the matrix with all entries zero save for the entry indexed by $(\emptyset,\emptyset)$), a real variable $y_i$ for each polynomial constraint $\langle X, A_i\rangle = 0$, $i\in[m]$, and, for each group of multisets $S,T,U,V\subset[n]$, $|S|,|T|,|U|,|V|\le d$, $S\cup T = U\cup V$, a dual variable $y_{S,T,U,V}$ for the symmetry constraint $\langle X, e_{S,T} - e_{U,V}\rangle = 0$. The positive-semidefiniteness constraint ensures that $p(x) - y_0$ is a sum of squares of polynomials over $A\subset\mathbb{R}^n$: by a characterization of positive-semidefinite matrices, any positive-semidefinite matrix $Q\in\mathbb{R}^{N\times N}$ can be written as $Q = \sum_{i\in[N]} f_i f_i^\top$ for vectors $f_i\in\mathbb{R}^N$. Thus for any $x\in A\subset\mathbb{R}^n$,

$$\begin{aligned} p(x)-y_0 &= p(x)-y_0-\sum_{i\in[m]} y_i a_i(x) \qquad \text{since } x\in A\\ &= (x^{\le d})^\top\left(C - y_0 e_\emptyset - \sum_{i\in[m]} y_i A_i - \sum_{S\cup T=U\cup V} y_{S,T,U,V}\,(e_{S,T}-e_{U,V})\right) x^{\le d} \qquad \text{by symmetry}\\ &= (x^{\le d})^\top\left(\sum_i f_i f_i^\top\right) x^{\le d}\\ &= \sum_i \langle x^{\le d}, f_i\rangle^2\\ &= \sum_i f_i(x)^2, \end{aligned}$$
where we have identified the vectors $f_i$ with the coefficients of a polynomial of degree at most $d$. This gives a sum-of-squares proof that $p(x)\ge y_0$ over $A\subset\mathbb{R}^n$.
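As a small worked illustration (an added example, not part of the original development): take $n=1$, $d=1$, $p(x)=x^2-2x$, and no constraints $a_i$, so that $x^{\le 1}=(1,x)^\top$ and $\langle C,\, x^{\le 1}(x^{\le 1})^\top\rangle = p(x)$ for $C=\begin{pmatrix}0&-1\\-1&1\end{pmatrix}$. Taking $y_0=-1$,

$$C - y_0 e_\emptyset = \begin{pmatrix}0&-1\\-1&1\end{pmatrix} + \begin{pmatrix}1&0\\0&0\end{pmatrix} = \begin{pmatrix}1&-1\\-1&1\end{pmatrix} = f f^\top, \qquad f=\begin{pmatrix}1\\-1\end{pmatrix},$$

so $p(x)-y_0 = \langle x^{\le 1}, f\rangle^2 = (1-x)^2$, certifying $p(x)\ge -1$ for all $x\in\mathbb{R}$.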
The above can also be extended to regions $A\subset\mathbb{R}^n$ defined by polynomial inequalities.
The sum-of-squares hierarchy (SOS hierarchy), also known as the Lasserre hierarchy, is a hierarchy of convex relaxations of increasing power and increasing computational cost. For each natural number $d\in\mathbb{N}$, the corresponding convex relaxation is known as the $d$th level or $d$th round of the SOS hierarchy. The first round, $d=1$, corresponds to a basic semidefinite program, or to sum-of-squares optimization over polynomials of degree at most 2. To augment the basic convex program at the first level of the hierarchy to the $d$th level, additional variables and constraints are added so that the program considers polynomials of degree at most $2d$.
The SOS hierarchy derives its name from the fact that the value of the objective function at the $d$th level is bounded with a sum-of-squares proof using polynomials of degree at most $2d$ via the dual (see "Duality" above). Consequently, any sum-of-squares proof that uses polynomials of degree at most $2d$ can be used to bound the objective value, allowing one to prove guarantees on the tightness of the relaxation.
In conjunction with a theorem of Berg, this further implies that given sufficiently many rounds, the relaxation becomes arbitrarily tight on any fixed interval. Berg's result[5][6] states that every non-negative real polynomial within a bounded interval can be approximated within accuracy $\varepsilon$ on that interval by a sum of squares of real polynomials of sufficiently high degree. Thus, if $\mathrm{OBJ}(x)$ is the polynomial objective value as a function of the point $x$, and the inequality $c+\varepsilon-\mathrm{OBJ}(x)\ge 0$ holds for all $x$ in the region of interest, then there must be a sum-of-squares proof of this fact. Choosing $c$ to be the minimum of the objective function over the feasible region gives the result.
When optimizing over a function in $n$ variables, the $d$th level of the hierarchy can be written as a semidefinite program over $n^{O(d)}$ variables, and can be solved in time $n^{O(d)}$ using the ellipsoid method.
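To make this count concrete: the moment matrix at level $d$ is indexed by the monomials of degree at most $d$ in $n$ variables, of which there are $\binom{n+d}{d}$. A short illustrative calculation (an added sketch, not part of the original text):

```python
# Size of the level-d moment matrix for an n-variable problem: one row and
# one column per monomial of degree at most d, i.e. C(n + d, d) of each.
from math import comb

def moment_matrix_side(n: int, d: int) -> int:
    """Number of monomials of degree <= d in n variables."""
    return comb(n + d, d)

for d in range(1, 5):
    side = moment_matrix_side(10, d)
    print(f"n=10, d={d}: {side} x {side} moment matrix")
```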
Main article: Polynomial SOS
A polynomial $p$ is a sum of squares (SOS) if there exist polynomials $\{f_i\}_{i=1}^m$ such that $p=\sum_{i=1}^m f_i^2$. For example,

$$p = x^2 - 4xy + 7y^2$$

is a sum of squares since $p=f_1^2+f_2^2$ where $f_1=x-2y$ and $f_2=\sqrt{3}\,y$. Note that if $p$ is a sum of squares then $p(x)\ge 0$ for all $x\in\mathbb{R}^n$. Detailed descriptions of polynomial SOS are available.[7][8][9]
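The decomposition can be verified symbolically; a quick check, assuming the sympy package is available:

```python
# Confirm that (x - 2y)^2 + (sqrt(3) y)^2 expands to x^2 - 4xy + 7y^2.
import sympy as sp

x, y = sp.symbols("x y")
p = x**2 - 4*x*y + 7*y**2
decomposition = (x - 2*y)**2 + 3*y**2  # (sqrt(3) y)^2 = 3 y^2

print(sp.expand(decomposition - p))  # prints 0: the decomposition is exact
```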
Quadratic forms can be expressed as $p(x)=x^{\mathsf T}Qx$, where $Q$ is a symmetric matrix. Similarly, polynomials of degree at most $2d$ can be expressed as

$$p(x) = z(x)^{\mathsf T}\,Q\,z(x),$$

where the vector $z(x)$ contains all monomials of degree at most $d$. This is known as the Gram matrix form. An important fact is that $p$ is SOS if and only if there exists a symmetric positive-semidefinite matrix $Q$ such that $p(x)=z(x)^{\mathsf T}Qz(x)$. This provides a connection between SOS polynomials and positive-semidefinite matrices.
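This equivalence makes SOS membership decidable by semidefinite programming. A minimal sketch, assuming cvxpy is available, for the earlier example $p=x^2-4xy+7y^2$ with $z(x)=(x,y)^\top$:

```python
# Search for a positive-semidefinite Gram matrix Q with z^T Q z = p
# for z = (x, y) and p = x^2 - 4xy + 7y^2.
import cvxpy as cp

Q = cp.Variable((2, 2), symmetric=True)

constraints = [
    Q >> 0,                   # Q positive semidefinite <=> p is SOS
    Q[0, 0] == 1,             # coefficient of x^2
    Q[0, 1] + Q[1, 0] == -4,  # coefficient of xy
    Q[1, 1] == 7,             # coefficient of y^2
]

cp.Problem(cp.Minimize(0), constraints).solve()  # pure feasibility problem
print(Q.value)  # e.g. [[1, -2], [-2, 7]], matching (x - 2y)^2 + 3y^2
```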
1. Parrilo, Pablo A.; Thomas, Rekha R., eds. (2020). Sum of Squares: Theory and Applications. AMS Short Course, January 14–15, 2019, Baltimore, Maryland. Providence, Rhode Island: American Mathematical Society. ISBN 978-1-4704-5025-0. OCLC 1157604983.
2. Tan, W.; Packard, A. (2004). "Searching for control Lyapunov functions using sums of squares programming". In: Allerton Conference on Communication, Control and Computing. pp. 210–219. http://jagger.me.berkeley.edu/papers/weehong_3.pdf
3. Tan, W.; Topcu, U.; Seiler, P.; Balas, G.; Packard, A. (2008). "Simulation-aided reachability and local gain analysis for nonlinear dynamical systems". In: Proceedings of the IEEE Conference on Decision and Control. pp. 4097–4102. http://folk.ntnu.no/skoge/prost/proceedings/cdc-2008/data/papers/1767.pdf
4. Chakraborty, A.; Seiler, P.; Balas, G. (2011). "Susceptibility of F/A-18 Flight Controllers to the Falling-Leaf Mode: Nonlinear Analysis". AIAA Journal of Guidance, Control, and Dynamics. 34 (1): 73–85. http://www.aem.umn.edu/~AerospaceControl/V&VWebpage/Papers/AIAANlin.pdf
5. Berg, Christian (1987). "The multidimensional moment problem and semigroups". In Landau, Henry J. (ed.). Moments in Mathematics. Proceedings of Symposia in Applied Mathematics. Vol. 37. pp. 110–124. doi:10.1090/psapm/037/921086. ISBN 978-0-8218-0114-7.
6. Lasserre, J. (2007). "A Sum of Squares Approximation of Nonnegative Polynomials". SIAM Review. 49 (4): 651–669. arXiv:math/0412398. Bibcode:2007SIAMR..49..651L. doi:10.1137/070693709. ISSN 0036-1445. http://epubs.siam.org/doi/abs/10.1137/070693709
7. Parrilo, P. (2000). Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization. Ph.D. thesis, California Institute of Technology. https://thesis.library.caltech.edu/1647/1/Parrilo-Thesis.pdf
8. Parrilo, P. (2003). "Semidefinite programming relaxations for semialgebraic problems". Mathematical Programming, Ser. B. 96 (2): 293–320. http://www.cds.caltech.edu/~doyle/hot/SDPrelaxations.pdf
9. Lasserre, J. (2001). "Global optimization with polynomials and the problem of moments". SIAM Journal on Optimization. 11 (3): 796–817. http://khalilghorbal.info/assets/spa/papers/lasserre.pdf