The original adjoint calculation method goes back to Jean Cea,[6] who used the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.
For a state variable $u \in \mathcal{U}$ and an optimization variable $v \in \mathcal{V}$, an objective functional $J : \mathcal{U} \times \mathcal{V} \to \mathbb{R}$ is defined. The state variable $u$ is often implicitly dependent on $v$ through the (direct) state equation $D_v(u) = 0$ (usually the weak form of a partial differential equation), so the objective actually considered is $j(v) = J(u_v, v)$, where $u_v$ is the solution of the state equation for the given optimization variable $v$. Usually, one would be interested in calculating $\nabla j(v)$ using the chain rule:

$$\nabla j(v) = \nabla_v J(u_v, v) + \nabla_v u_v \cdot \nabla_u J(u_v, v).$$
Unfortunately, the term $\nabla_v u_v$ is often very hard to compute analytically, since the dependence of $u_v$ on $v$ is defined through an implicit equation: differentiating $D_v(u_v) = 0$ shows that each directional derivative of $u_v$ requires the solution of one linearized state equation, one per optimization direction. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of $j$, the problem

$$\min_{v \in \mathcal{V}} J(u, v) \quad \text{subject to} \quad D_v(u) = 0$$
has an associated Lagrangian functional $\mathcal{L} : \mathcal{U} \times \mathcal{V} \times \mathcal{U} \to \mathbb{R}$ defined by

$$\mathcal{L}(u, v, \lambda) = J(u, v) + \langle D_v(u), \lambda \rangle,$$
where $\lambda \in \mathcal{U}$ is a Lagrange multiplier or adjoint state variable and $\langle \cdot, \cdot \rangle$ is an inner product on $\mathcal{U}$. The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely

$$\begin{cases} d_u \mathcal{L}(u, v, \lambda; \delta_u) = 0 & \forall \delta_u \in \mathcal{U}, \\ d_v \mathcal{L}(u, v, \lambda; \delta_v) = 0 & \forall \delta_v \in \mathcal{V}, \\ d_\lambda \mathcal{L}(u, v, \lambda; \delta_\lambda) = 0 & \forall \delta_\lambda \in \mathcal{U}, \end{cases}$$
where $d_x F(x; \delta_x)$ is the Gateaux derivative of $F$ with respect to $x$ in the direction $\delta_x$. The last equation is equivalent to $D_v(u) = 0$, the state equation, whose solution is $u_v$. The first equation is the so-called adjoint state equation,

$$d_u J(u, v; \delta_u) + \langle d_u D_v(u; \delta_u), \lambda \rangle = 0 \quad \forall \delta_u \in \mathcal{U},$$
because the operator involved is the adjoint operator of $D_v$, $D_v^*$. Solving this equation yields the adjoint state $\lambda_v$. The gradient of the quantity of interest $j$ with respect to $v$ satisfies $\langle \nabla j(v), \delta_v \rangle = d_v j(v; \delta_v) = d_v \mathcal{L}(u_v, v, \lambda_v; \delta_v)$ (the second equation with $u = u_v$ and $\lambda = \lambda_v$), so it can be identified easily by successively solving the direct and adjoint state equations. The process is even simpler when the operator $D_v$ is self-adjoint or symmetric, since the direct and adjoint state equations then differ only by their right-hand side.
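To see why the identity $d_v j(v; \delta_v) = d_v \mathcal{L}(u_v, v, \lambda_v; \delta_v)$ holds, note that $D_v(u_v) = 0$ for every $v$, so $j(v) = \mathcal{L}(u_v, v, \lambda)$ for any fixed $\lambda$; differentiating this identity along a direction $\delta_v$ gives

$$d_v j(v; \delta_v) = d_u \mathcal{L}(u_v, v, \lambda; d_v u_v(\delta_v)) + d_v \mathcal{L}(u_v, v, \lambda; \delta_v),$$

and choosing $\lambda = \lambda_v$ makes the first term vanish by the adjoint state equation. The hard-to-compute derivative $d_v u_v$ is therefore never needed.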
In a concrete finite-dimensional setting with a linear state equation, the objective function could be $J(u, v) = \langle Au, v \rangle$, for $v \in \mathbb{R}^n$, $u \in \mathbb{R}^m$ and $A \in \mathbb{R}^{n \times m}$, and let the state equation be $B_v u = b$, with $B_v \in \mathbb{R}^{m \times m}$ and $b \in \mathbb{R}^m$.
The Lagrangian function of the problem is $\mathcal{L}(u, v, \lambda) = \langle Au, v \rangle + \langle B_v u - b, \lambda \rangle$, where $\lambda \in \mathbb{R}^m$.
The derivative of $\mathcal{L}$ with respect to $\lambda$ yields the state equation as shown before, and the state variable is $u_v = B_v^{-1} b$. The derivative of $\mathcal{L}$ with respect to $u$ is equivalent to the adjoint equation, which is, for every $\delta_u \in \mathbb{R}^m$,

$$\langle A \delta_u, v \rangle + \langle B_v \delta_u, \lambda \rangle = 0 \iff \langle A^\top v + B_v^\top \lambda, \delta_u \rangle = 0 \iff B_v^\top \lambda = -A^\top v.$$
Thus, we can write symbolically $\lambda_v = -B_v^{-\top} A^\top v$. The gradient would be

$$\nabla j(v) = A u_v + (\nabla_v B_v) : (\lambda_v \otimes u_v),$$
where $\nabla_v B_v = \left( \frac{\partial B_{ij}}{\partial v_k} \right)$ is a third-order tensor, $\lambda_v \otimes u_v = \lambda_v u_v^\top$ is the dyadic product between the adjoint and direct states, and $:$ denotes a double tensor contraction. It is assumed that $B_v$ has a known analytic expression that can be differentiated easily.
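For concreteness, here is a minimal NumPy sketch of the whole computation, assuming a hypothetical affine parametrization $B_v = B_0 + \sum_k v_k C_k$ (so that $\partial B_v / \partial v_k = C_k$); the matrices $A$, $B_0$, $C_k$ and the vector $b$ are random placeholders, and the adjoint gradient is checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4  # dimensions of v and u

# Placeholder data for J(u, v) = <A u, v> and the state equation B_v u = b,
# with the assumed parametrization B_v = B0 + sum_k v_k * C[k].
A = rng.standard_normal((n, m))
B0 = rng.standard_normal((m, m)) + 10.0 * np.eye(m)  # shifted to keep B_v invertible
C = rng.standard_normal((n, m, m))                   # C[k] = dB_v / dv_k
b = rng.standard_normal(m)

def B(v):
    return B0 + np.einsum('k,kij->ij', v, C)

def j(v):
    u = np.linalg.solve(B(v), b)   # direct state u_v
    return (A @ u) @ v             # J(u_v, v) = <A u_v, v>

def grad_j(v):
    Bv = B(v)
    u = np.linalg.solve(Bv, b)              # direct solve:  B_v u_v = b
    lam = np.linalg.solve(Bv.T, -A.T @ v)   # adjoint solve: B_v^T lambda_v = -A^T v
    # gradient: A u_v + (nabla_v B_v) double-contracted with (lambda_v outer u_v)
    return A @ u + np.einsum('kij,i,j->k', C, lam, u)

# Finite-difference check of the adjoint gradient
v = rng.standard_normal(n)
eps = 1e-6
fd = np.array([(j(v + eps * e) - j(v - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(grad_j(v), fd, atol=1e-5))  # expected: True
```

Whatever the number $n$ of optimization variables, each gradient evaluation costs only two linear solves (one direct, one adjoint), whereas computing $\nabla_v u_v$ directly would require $n$ solves.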
If the operator $B_v$ is self-adjoint, $B_v = B_v^\top$, the direct state equation and the adjoint state equation have the same left-hand side. To avoid ever inverting a matrix, which is numerically expensive, an LU decomposition can be used instead to solve the state equation, in $O(m^3)$ operations for the decomposition and $O(m^2)$ operations for each resolution. The same decomposition can then be used to solve the adjoint state equation in only $O(m^2)$ operations, since the matrices are the same.
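This reuse is easy to express with SciPy; a minimal sketch with placeholder matrices is given below. A single LU factorization of $B_v$ in fact serves both systems even when $B_v$ is not symmetric, since the `trans=1` option of `lu_solve` solves the transposed system $B_v^\top \lambda = -A^\top v$ with the same factors:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
m, n = 200, 5
Bv = rng.standard_normal((m, m)) + 20.0 * np.eye(m)  # placeholder state matrix
A = rng.standard_normal((n, m))                      # placeholder objective matrix
b = rng.standard_normal(m)
v = rng.standard_normal(n)

lu, piv = lu_factor(Bv)                       # O(m^3) factorization, done once

u = lu_solve((lu, piv), b)                    # direct state:  B_v u = b, in O(m^2)
lam = lu_solve((lu, piv), -A.T @ v, trans=1)  # adjoint state: B_v^T lam = -A^T v, in O(m^2)
```

When $B_v = B_v^\top$, the two calls differ only in their right-hand sides, exactly as stated above.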
1. Pollini, Nicolò; Lavan, Oren; Amir, Oded (2018). "Adjoint sensitivity analysis and optimization of hysteretic dynamic systems with nonlinear viscous dampers". Structural and Multidisciplinary Optimization. 57 (6): 2273–2289. doi:10.1007/s00158-017-1858-2.
2. Chen, Ricky T. Q.; Rubanova, Yulia; Bettencourt, Jesse; Duvenaud, David (2018). "Neural Ordinary Differential Equations". https://arxiv.org/abs/1806.07366
3. Plessix, R.-E. (2006). "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications". Geophysical Journal International. 167 (2): 495–503. https://academic.oup.com/gji/article/167/2/495/559970
4. McNamara, Antoine; Treuille, Adrien; Popović, Zoran; Stam, Jos (2004). "Fluid control using the adjoint method". ACM Transactions on Graphics. 23 (3): 449–456. doi:10.1145/1015706.1015744. https://www.dgp.toronto.edu/public_user/stam/reality/Research/pdf/sig04.pdf
5. Lundvall, Johan (2007). "Data Assimilation in Fluid Dynamics using Adjoint Optimization". Linköping University of Technology, Sweden. http://liu.diva-portal.org/smash/get/diva2:24091/FULLTEXT01.pdf
6. Cea, Jean (1986). "Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût" [Optimal design or identification of shapes: fast computation of the directional derivative of the cost function] (in French). ESAIM: Mathematical Modelling and Numerical Analysis. 20 (3): 371–402. doi:10.1051/m2an/1986200303711.