Given a vector $c \in \mathbb{R}^n$ and polynomials $a_{k,j}$ for $k = 1, \dots, N_s$, $j = 0, 1, \dots, n$, a sum-of-squares optimization problem is written as
$$\begin{aligned} \underset{u\in\mathbb{R}^n}{\text{maximize}}\quad & c^T u\\ \text{subject to}\quad & a_{k,0}(x)+a_{k,1}(x)u_1+\cdots+a_{k,n}(x)u_n \in \text{SOS}\quad (k=1,\ldots,N_s). \end{aligned}$$
Here "SOS" represents the class of sum-of-squares (SOS) polynomials. The quantities u ∈ R n {\displaystyle u\in \mathbb {R} ^{n}} are the decision variables. SOS programs can be converted to semidefinite programs (SDPs) using the duality of the SOS polynomial program and a relaxation for constrained polynomial optimization using positive-semidefinite matrices, see the following section.
Suppose we have an $n$-variate polynomial $p(x):\mathbb{R}^n\to\mathbb{R}$, and suppose that we would like to minimize this polynomial over a subset $A\subseteq\mathbb{R}^n$. Suppose furthermore that the constraints on the subset $A$ can be encoded using $m$ polynomial equalities of degree at most $2d$, each of the form $a_i(x)=0$, where $a_i:\mathbb{R}^n\to\mathbb{R}$ is a polynomial of degree at most $2d$. A natural, though generally non-convex, program for this optimization problem is the following:

$$\min_{x\in\mathbb{R}^n}\;\langle C,\, x^{\le d}(x^{\le d})^\top\rangle$$

subject to:

$$\langle A_i,\, x^{\le d}(x^{\le d})^\top\rangle = 0 \qquad \forall\, i\in[m], \qquad (1)$$

$$x_\emptyset = 1,$$

where $x^{\le d}$ is the $n^{O(d)}$-dimensional vector with one entry for every monomial in $x$ of degree at most $d$, so that for each multiset $S\subset[n]$ with $|S|\le d$, $x_S=\prod_{i\in S}x_i$; $C$ is a matrix of coefficients of the polynomial $p(x)$ that we want to minimize; and $A_i$ is a matrix of coefficients of the polynomial $a_i(x)$ encoding the $i$-th constraint on the subset $A\subset\mathbb{R}^n$. The additional fixed constant index in our search space, $x_\emptyset=1$, is included for the convenience of writing the polynomials $p(x)$ and $a_i(x)$ in a matrix representation.
This program is generally non-convex, because the constraints (1) are not convex. One possible convex relaxation for this minimization problem uses semidefinite programming to replace the rank-one matrix of variables $x^{\le d}(x^{\le d})^\top$ with a positive-semidefinite matrix $X$: we index each monomial of degree at most $2d$ by a multiset $S$ of at most $2d$ indices, $S\subset[n]$, $|S|\le 2d$. For each such monomial, we create a variable $X_S$ in the program, and we arrange the variables $X_S$ to form the matrix $X\in\mathbb{R}^{[n]^{\le d}\times[n]^{\le d}}$, where $\mathbb{R}^{[n]^{\le d}\times[n]^{\le d}}$ is the set of real matrices whose rows and columns are identified with multisets of elements from $[n]$ of size at most $d$. We then write the following semidefinite program in the variables $X_S$:

$$\min_{X\in\mathbb{R}^{[n]^{\le d}\times[n]^{\le d}}}\;\langle C, X\rangle$$

subject to:

$$\langle A_i, X\rangle = 0 \qquad \forall\, i\in[m],$$

$$X_\emptyset = 1,$$

$$X_{U\cup V} = X_{S\cup T} \qquad \forall\, U,V,S,T\subseteq[n],\ |U|,|V|,|S|,|T|\le d,\ \text{and } U\cup V = S\cup T,$$

$$X \succeq 0,$$
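A minimal sketch of this relaxation for a tiny unconstrained instance, assuming cvxpy is available: minimizing $p(x)=x^2-2x$ over $x\in\mathbb{R}$ with $n=1$, $d=1$, where the moment matrix $X$ is indexed by the monomials $(1,x)$ (the symmetry constraint is vacuous at this size):

```python
# Degree-2 moment relaxation of min_x x^2 - 2x.  The matrix X stands in for
# the rank-one matrix (1, x)(1, x)^T, so X[0, 0] = X_emptyset, X[0, 1] = X_{x},
# and X[1, 1] = X_{xx}.
import cvxpy as cp

X = cp.Variable((2, 2), symmetric=True)

constraints = [
    X >> 0,        # X positive semidefinite
    X[0, 0] == 1,  # X_emptyset = 1
]

# <C, X> encodes p: the x^2 coefficient sits on the diagonal and the -2x
# coefficient is split across the two symmetric off-diagonal entries.
objective = cp.Minimize(X[1, 1] - X[0, 1] - X[1, 0])

prob = cp.Problem(objective, constraints)
prob.solve()
print(prob.value)  # approximately -1, matching the true minimum p(1) = -1
```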
where again $C$ is the matrix of coefficients of the polynomial $p(x)$ that we want to minimize, and $A_i$ is the matrix of coefficients of the polynomial $a_i(x)$ encoding the $i$-th constraint on the subset $A\subset\mathbb{R}^n$.
The third constraint ensures that the value of a monomial that appears several times within the matrix is equal throughout the matrix, and is added to make $X$ respect the symmetries present in the quadratic form $x^{\le d}(x^{\le d})^\top$. For example, for $d\ge 2$, the entries indexed by $(\{1\},\{2\})$ and $(\{1,2\},\emptyset)$ both represent the monomial $x_1x_2$, and the third constraint forces them to be equal.
One can take the dual of the above semidefinite program and obtain the following program:

$$\max_{y\in\mathbb{R}^{m'}}\; y_0,$$

subject to:

$$C - y_0 e_\emptyset - \sum_{i\in[m]} y_i A_i - \sum_{S\cup T = U\cup V} y_{S,T,U,V}\,(e_{S,T} - e_{U,V}) \succeq 0.$$
We have a variable $y_0$ corresponding to the constraint $\langle e_\emptyset, X\rangle = 1$ (where $e_\emptyset$ is the matrix with all entries zero save for the entry indexed by $(\emptyset,\emptyset)$), a real variable $y_i$ for each polynomial constraint $\langle X, A_i\rangle = 0$, $i\in[m]$, and, for each group of multisets $S,T,U,V\subset[n]$, $|S|,|T|,|U|,|V|\le d$, $S\cup T = U\cup V$, a dual variable $y_{S,T,U,V}$ for the symmetry constraint $\langle X, e_{S,T} - e_{U,V}\rangle = 0$. The positive-semidefiniteness constraint ensures that $p(x) - y_0$ is a sum of squares of polynomials over $A\subset\mathbb{R}^n$: by a characterization of positive-semidefinite matrices, any positive-semidefinite matrix $Q\in\mathbb{R}^{N\times N}$ can be written as $Q = \sum_{i\in[N]} f_i f_i^\top$ for vectors $f_i\in\mathbb{R}^N$. Thus for any $x\in A\subset\mathbb{R}^n$,

$$\begin{aligned} p(x)-y_0 &= p(x)-y_0-\sum_{i\in[m]} y_i a_i(x) \qquad \text{since } x\in A\\ &= (x^{\le d})^\top\left(C - y_0 e_\emptyset - \sum_{i\in[m]} y_i A_i - \sum_{S\cup T=U\cup V} y_{S,T,U,V}\,(e_{S,T}-e_{U,V})\right) x^{\le d} \qquad \text{by symmetry}\\ &= (x^{\le d})^\top\left(\sum_i f_i f_i^\top\right) x^{\le d}\\ &= \sum_i \langle x^{\le d}, f_i\rangle^2\\ &= \sum_i f_i(x)^2, \end{aligned}$$
where we have identified the vectors $f_i$ with the coefficients of a polynomial of degree at most $d$. This gives a sum-of-squares proof that $p(x)\ge y_0$ over $A\subset\mathbb{R}^n$.
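As a small worked illustration (an added example, not part of the original development): take $n=1$, $d=1$, $p(x)=x^2-2x$, and no constraints $a_i$, so that $x^{\le 1}=(1,x)^\top$ and $\langle C,\, x^{\le 1}(x^{\le 1})^\top\rangle = p(x)$ for $C=\begin{pmatrix}0&-1\\-1&1\end{pmatrix}$. Taking $y_0=-1$,

$$C - y_0 e_\emptyset = \begin{pmatrix}0&-1\\-1&1\end{pmatrix} + \begin{pmatrix}1&0\\0&0\end{pmatrix} = \begin{pmatrix}1&-1\\-1&1\end{pmatrix} = f f^\top, \qquad f=\begin{pmatrix}1\\-1\end{pmatrix},$$

so $p(x)-y_0 = \langle x^{\le 1}, f\rangle^2 = (1-x)^2$, certifying $p(x)\ge -1$ for all $x\in\mathbb{R}$.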
The above can also be extended to regions $A\subset\mathbb{R}^n$ defined by polynomial inequalities.
The sum-of-squares hierarchy (SOS hierarchy), also known as the Lasserre hierarchy, is a hierarchy of convex relaxations of increasing power and increasing computational cost. For each natural number $d\in\mathbb{N}$, the corresponding convex relaxation is known as the $d$th level or $d$th round of the SOS hierarchy. The first round, $d=1$, corresponds to a basic semidefinite program, or to sum-of-squares optimization over polynomials of degree at most 2. To augment the basic convex program at the first level of the hierarchy to the $d$th level, additional variables and constraints are added so that the program considers polynomials of degree at most $2d$.
The SOS hierarchy derives its name from the fact that the value of the objective function at the $d$th level is bounded with a sum-of-squares proof using polynomials of degree at most $2d$ via the dual (see "Duality" above). Consequently, any sum-of-squares proof that uses polynomials of degree at most $2d$ can be used to bound the objective value, allowing one to prove guarantees on the tightness of the relaxation.
In conjunction with a theorem of Berg, this further implies that given sufficiently many rounds, the relaxation becomes arbitrarily tight on any fixed interval. Berg's result[5][6] states that every non-negative real polynomial within a bounded interval can be approximated within accuracy $\varepsilon$ on that interval by a sum of squares of real polynomials of sufficiently high degree. Thus, if $\mathrm{OBJ}(x)$ is the polynomial objective value as a function of the point $x$, and the inequality $c+\varepsilon-\mathrm{OBJ}(x)\ge 0$ holds for all $x$ in the region of interest, then there must be a sum-of-squares proof of this fact. Choosing $c$ to be the minimum of the objective function over the feasible region gives the result.
When optimizing over a function in $n$ variables, the $d$th level of the hierarchy can be written as a semidefinite program over $n^{O(d)}$ variables, and can be solved in time $n^{O(d)}$ using the ellipsoid method.
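To make this count concrete: the moment matrix at level $d$ is indexed by the monomials of degree at most $d$ in $n$ variables, of which there are $\binom{n+d}{d}$. A short illustrative calculation (an added sketch, not part of the original text):

```python
# Size of the level-d moment matrix for an n-variable problem: one row and
# one column per monomial of degree at most d, i.e. C(n + d, d) of each.
from math import comb

def moment_matrix_side(n: int, d: int) -> int:
    """Number of monomials of degree <= d in n variables."""
    return comb(n + d, d)

for d in range(1, 5):
    side = moment_matrix_side(10, d)
    print(f"n=10, d={d}: {side} x {side} moment matrix")
```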
Main article: Polynomial SOS
A polynomial $p$ is a sum of squares (SOS) if there exist polynomials $\{f_i\}_{i=1}^m$ such that $p=\sum_{i=1}^m f_i^2$. For example,

$$p = x^2 - 4xy + 7y^2$$

is a sum of squares since $p=f_1^2+f_2^2$ where $f_1=x-2y$ and $f_2=\sqrt{3}\,y$. Note that if $p$ is a sum of squares then $p(x)\ge 0$ for all $x\in\mathbb{R}^n$. Detailed descriptions of polynomial SOS are available.[7][8][9]
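The decomposition can be verified symbolically; a quick check, assuming the sympy package is available:

```python
# Confirm that (x - 2y)^2 + (sqrt(3) y)^2 expands to x^2 - 4xy + 7y^2.
import sympy as sp

x, y = sp.symbols("x y")
p = x**2 - 4*x*y + 7*y**2
decomposition = (x - 2*y)**2 + 3*y**2  # (sqrt(3) y)^2 = 3 y^2

print(sp.expand(decomposition - p))  # prints 0: the decomposition is exact
```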
Quadratic forms can be expressed as $p(x)=x^{\mathsf T}Qx$, where $Q$ is a symmetric matrix. Similarly, polynomials of degree at most $2d$ can be expressed as

$$p(x) = z(x)^{\mathsf T}\,Q\,z(x),$$

where the vector $z(x)$ contains all monomials of degree at most $d$. This is known as the Gram matrix form. An important fact is that $p$ is SOS if and only if there exists a symmetric positive-semidefinite matrix $Q$ such that $p(x)=z(x)^{\mathsf T}Qz(x)$. This provides a connection between SOS polynomials and positive-semidefinite matrices.
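This equivalence makes SOS membership decidable by semidefinite programming. A minimal sketch, assuming cvxpy is available, for the earlier example $p=x^2-4xy+7y^2$ with $z(x)=(x,y)^\top$:

```python
# Search for a positive-semidefinite Gram matrix Q with z^T Q z = p
# for z = (x, y) and p = x^2 - 4xy + 7y^2.
import cvxpy as cp

Q = cp.Variable((2, 2), symmetric=True)

constraints = [
    Q >> 0,                   # Q positive semidefinite <=> p is SOS
    Q[0, 0] == 1,             # coefficient of x^2
    Q[0, 1] + Q[1, 0] == -4,  # coefficient of xy
    Q[1, 1] == 7,             # coefficient of y^2
]

cp.Problem(cp.Minimize(0), constraints).solve()  # pure feasibility problem
print(Q.value)  # e.g. [[1, -2], [-2, 7]], matching (x - 2y)^2 + 3y^2
```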
1. Parrilo, Pablo A.; Thomas, Rekha R., eds. (2020). Sum of Squares: Theory and Applications. AMS Short Course, January 14–15, 2019, Baltimore, Maryland. Providence, Rhode Island: American Mathematical Society. ISBN 978-1-4704-5025-0. OCLC 1157604983.
2. Tan, W.; Packard, A. (2004). "Searching for control Lyapunov functions using sums of squares programming". In: Allerton Conference on Communication, Control and Computing. pp. 210–219. http://jagger.me.berkeley.edu/papers/weehong_3.pdf
3. Tan, W.; Topcu, U.; Seiler, P.; Balas, G.; Packard, A. (2008). "Simulation-aided reachability and local gain analysis for nonlinear dynamical systems". In: Proceedings of the IEEE Conference on Decision and Control. pp. 4097–4102. http://folk.ntnu.no/skoge/prost/proceedings/cdc-2008/data/papers/1767.pdf
4. Chakraborty, A.; Seiler, P.; Balas, G. (2011). "Susceptibility of F/A-18 Flight Controllers to the Falling-Leaf Mode: Nonlinear Analysis". AIAA Journal of Guidance, Control, and Dynamics. 34 (1): 73–85. http://www.aem.umn.edu/~AerospaceControl/V&VWebpage/Papers/AIAANlin.pdf
5. Berg, Christian (1987). "The multidimensional moment problem and semigroups". In Landau, Henry J. (ed.). Moments in Mathematics. Proceedings of Symposia in Applied Mathematics. Vol. 37. pp. 110–124. doi:10.1090/psapm/037/921086. ISBN 978-0-8218-0114-7.
6. Lasserre, J. (2007). "A Sum of Squares Approximation of Nonnegative Polynomials". SIAM Review. 49 (4): 651–669. arXiv:math/0412398. Bibcode:2007SIAMR..49..651L. doi:10.1137/070693709. ISSN 0036-1445. http://epubs.siam.org/doi/abs/10.1137/070693709
7. Parrilo, P. (2000). Structured Semidefinite Programs and Semialgebraic Geometry Methods in Robustness and Optimization. Ph.D. thesis, California Institute of Technology. https://thesis.library.caltech.edu/1647/1/Parrilo-Thesis.pdf
8. Parrilo, P. (2003). "Semidefinite programming relaxations for semialgebraic problems". Mathematical Programming, Ser. B. 96 (2): 293–320. http://www.cds.caltech.edu/~doyle/hot/SDPrelaxations.pdf
9. Lasserre, J. (2001). "Global optimization with polynomials and the problem of moments". SIAM Journal on Optimization. 11 (3): 796–817. http://khalilghorbal.info/assets/spa/papers/lasserre.pdf