The original adjoint calculation method goes back to Jean Cea,[6] who used the Lagrangian of the optimization problem to compute the derivative of a functional with respect to a shape parameter.
For a state variable $u \in \mathcal{U}$ and an optimization variable $v \in \mathcal{V}$, an objective functional $J : \mathcal{U} \times \mathcal{V} \to \mathbb{R}$ is defined. The state variable $u$ is often implicitly dependent on $v$ through the (direct) state equation $D_v(u) = 0$ (usually the weak form of a partial differential equation), so the objective actually considered is $j(v) = J(u_v, v)$, where $u_v$ is the solution of the state equation for the given optimization variable $v$. Usually, one would be interested in calculating $\nabla j(v)$ using the chain rule:

$$\nabla j(v) = \nabla_v J(u_v, v) + \nabla_v u_v \cdot \nabla_u J(u_v, v).$$
Unfortunately, the term $\nabla_v u_v$ is often very hard to compute analytically, since the dependence of $u_v$ on $v$ is defined through an implicit equation: differentiating $D_v(u_v) = 0$ shows that each directional derivative of $u_v$ requires the solution of one linearized state equation, one per optimization direction. The Lagrangian functional can be used as a workaround for this issue. Since the state equation can be considered as a constraint in the minimization of $j$, the problem

$$\min_{v \in \mathcal{V}} J(u, v) \quad \text{subject to} \quad D_v(u) = 0$$
has an associated Lagrangian functional $\mathcal{L} : \mathcal{U} \times \mathcal{V} \times \mathcal{U} \to \mathbb{R}$ defined by

$$\mathcal{L}(u, v, \lambda) = J(u, v) + \langle D_v(u), \lambda \rangle,$$
where $\lambda \in \mathcal{U}$ is a Lagrange multiplier or adjoint state variable and $\langle \cdot, \cdot \rangle$ is an inner product on $\mathcal{U}$. The method of Lagrange multipliers states that a solution to the problem has to be a stationary point of the Lagrangian, namely

$$\begin{cases} d_u \mathcal{L}(u, v, \lambda; \delta_u) = 0 & \forall \delta_u \in \mathcal{U}, \\ d_v \mathcal{L}(u, v, \lambda; \delta_v) = 0 & \forall \delta_v \in \mathcal{V}, \\ d_\lambda \mathcal{L}(u, v, \lambda; \delta_\lambda) = 0 & \forall \delta_\lambda \in \mathcal{U}, \end{cases}$$
where $d_x F(x; \delta_x)$ is the Gateaux derivative of $F$ with respect to $x$ in the direction $\delta_x$. The last equation is equivalent to $D_v(u) = 0$, the state equation, whose solution is $u_v$. The first equation is the so-called adjoint state equation,

$$d_u J(u, v; \delta_u) + \langle d_u D_v(u; \delta_u), \lambda \rangle = 0 \quad \forall \delta_u \in \mathcal{U},$$
because the operator involved is the adjoint operator of $D_v$, $D_v^*$. Solving this equation yields the adjoint state $\lambda_v$. The gradient of the quantity of interest $j$ with respect to $v$ satisfies $\langle \nabla j(v), \delta_v \rangle = d_v j(v; \delta_v) = d_v \mathcal{L}(u_v, v, \lambda_v; \delta_v)$ (the second equation with $u = u_v$ and $\lambda = \lambda_v$), so it can be identified easily by successively solving the direct and adjoint state equations. The process is even simpler when the operator $D_v$ is self-adjoint or symmetric, since the direct and adjoint state equations then differ only by their right-hand side.
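To see why the identity $d_v j(v; \delta_v) = d_v \mathcal{L}(u_v, v, \lambda_v; \delta_v)$ holds, note that $D_v(u_v) = 0$ for every $v$, so $j(v) = \mathcal{L}(u_v, v, \lambda)$ for any fixed $\lambda$; differentiating this identity along a direction $\delta_v$ gives

$$d_v j(v; \delta_v) = d_u \mathcal{L}(u_v, v, \lambda; d_v u_v(\delta_v)) + d_v \mathcal{L}(u_v, v, \lambda; \delta_v),$$

and choosing $\lambda = \lambda_v$ makes the first term vanish by the adjoint state equation. The hard-to-compute derivative $d_v u_v$ is therefore never needed.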
In a concrete finite-dimensional setting with a linear state equation, the objective function could be $J(u, v) = \langle Au, v \rangle$, for $v \in \mathbb{R}^n$, $u \in \mathbb{R}^m$ and $A \in \mathbb{R}^{n \times m}$, and let the state equation be $B_v u = b$, with $B_v \in \mathbb{R}^{m \times m}$ and $b \in \mathbb{R}^m$.
The Lagrangian function of the problem is $\mathcal{L}(u, v, \lambda) = \langle Au, v \rangle + \langle B_v u - b, \lambda \rangle$, where $\lambda \in \mathbb{R}^m$.
The derivative of $\mathcal{L}$ with respect to $\lambda$ yields the state equation as shown before, and the state variable is $u_v = B_v^{-1} b$. The derivative of $\mathcal{L}$ with respect to $u$ is equivalent to the adjoint equation, which is, for every $\delta_u \in \mathbb{R}^m$,

$$\langle A \delta_u, v \rangle + \langle B_v \delta_u, \lambda \rangle = 0 \iff \langle A^\top v + B_v^\top \lambda, \delta_u \rangle = 0 \iff B_v^\top \lambda = -A^\top v.$$
Thus, we can write symbolically $\lambda_v = -B_v^{-\top} A^\top v$. The gradient would be

$$\nabla j(v) = A u_v + (\nabla_v B_v) : (\lambda_v \otimes u_v),$$
where $\nabla_v B_v = \left( \frac{\partial B_{ij}}{\partial v_k} \right)$ is a third-order tensor, $\lambda_v \otimes u_v = \lambda_v u_v^\top$ is the dyadic product between the adjoint and direct states, and $:$ denotes a double tensor contraction. It is assumed that $B_v$ has a known analytic expression that can be differentiated easily.
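For concreteness, here is a minimal NumPy sketch of the whole computation, assuming a hypothetical affine parametrization $B_v = B_0 + \sum_k v_k C_k$ (so that $\partial B_v / \partial v_k = C_k$); the matrices $A$, $B_0$, $C_k$ and the vector $b$ are random placeholders, and the adjoint gradient is checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 4  # dimensions of v and u

# Placeholder data for J(u, v) = <A u, v> and the state equation B_v u = b,
# with the assumed parametrization B_v = B0 + sum_k v_k * C[k].
A = rng.standard_normal((n, m))
B0 = rng.standard_normal((m, m)) + 10.0 * np.eye(m)  # shifted to keep B_v invertible
C = rng.standard_normal((n, m, m))                   # C[k] = dB_v / dv_k
b = rng.standard_normal(m)

def B(v):
    return B0 + np.einsum('k,kij->ij', v, C)

def j(v):
    u = np.linalg.solve(B(v), b)   # direct state u_v
    return (A @ u) @ v             # J(u_v, v) = <A u_v, v>

def grad_j(v):
    Bv = B(v)
    u = np.linalg.solve(Bv, b)              # direct solve:  B_v u_v = b
    lam = np.linalg.solve(Bv.T, -A.T @ v)   # adjoint solve: B_v^T lambda_v = -A^T v
    # gradient: A u_v + (nabla_v B_v) double-contracted with (lambda_v outer u_v)
    return A @ u + np.einsum('kij,i,j->k', C, lam, u)

# Finite-difference check of the adjoint gradient
v = rng.standard_normal(n)
eps = 1e-6
fd = np.array([(j(v + eps * e) - j(v - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(grad_j(v), fd, atol=1e-5))  # expected: True
```

Whatever the number $n$ of optimization variables, each gradient evaluation costs only two linear solves (one direct, one adjoint), whereas computing $\nabla_v u_v$ directly would require $n$ solves.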
If the operator $B_v$ is self-adjoint, $B_v = B_v^\top$, the direct state equation and the adjoint state equation have the same left-hand side. To avoid ever inverting a matrix, which is numerically expensive, an LU decomposition can be used instead to solve the state equation, in $O(m^3)$ operations for the decomposition and $O(m^2)$ operations for each resolution. The same decomposition can then be used to solve the adjoint state equation in only $O(m^2)$ operations, since the matrices are the same.
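This reuse is easy to express with SciPy; a minimal sketch with placeholder matrices is given below. A single LU factorization of $B_v$ in fact serves both systems even when $B_v$ is not symmetric, since the `trans=1` option of `lu_solve` solves the transposed system $B_v^\top \lambda = -A^\top v$ with the same factors:

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(1)
m, n = 200, 5
Bv = rng.standard_normal((m, m)) + 20.0 * np.eye(m)  # placeholder state matrix
A = rng.standard_normal((n, m))                      # placeholder objective matrix
b = rng.standard_normal(m)
v = rng.standard_normal(n)

lu, piv = lu_factor(Bv)                       # O(m^3) factorization, done once

u = lu_solve((lu, piv), b)                    # direct state:  B_v u = b, in O(m^2)
lam = lu_solve((lu, piv), -A.T @ v, trans=1)  # adjoint state: B_v^T lam = -A^T v, in O(m^2)
```

When $B_v = B_v^\top$, the two calls differ only in their right-hand sides, exactly as stated above.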
1. Pollini, Nicolò; Lavan, Oren; Amir, Oded (2018). "Adjoint sensitivity analysis and optimization of hysteretic dynamic systems with nonlinear viscous dampers". Structural and Multidisciplinary Optimization. 57 (6): 2273–2289. doi:10.1007/s00158-017-1858-2.
2. Chen, Ricky T. Q.; Rubanova, Yulia; Bettencourt, Jesse; Duvenaud, David (2018). "Neural Ordinary Differential Equations". https://arxiv.org/abs/1806.07366
3. Plessix, R.-E. (2006). "A review of the adjoint-state method for computing the gradient of a functional with geophysical applications". Geophysical Journal International. 167 (2): 495–503. https://academic.oup.com/gji/article/167/2/495/559970
4. McNamara, Antoine; Treuille, Adrien; Popović, Zoran; Stam, Jos (2004). "Fluid control using the adjoint method". ACM Transactions on Graphics. 23 (3): 449–456. doi:10.1145/1015706.1015744. https://www.dgp.toronto.edu/public_user/stam/reality/Research/pdf/sig04.pdf
5. Lundvall, Johan (2007). "Data Assimilation in Fluid Dynamics using Adjoint Optimization". Linköping University of Technology, Sweden. http://liu.diva-portal.org/smash/get/diva2:24091/FULLTEXT01.pdf
6. Cea, Jean (1986). "Conception optimale ou identification de formes, calcul rapide de la dérivée directionnelle de la fonction coût" [Optimal design or identification of shapes: fast computation of the directional derivative of the cost function] (in French). ESAIM: Mathematical Modelling and Numerical Analysis. 20 (3): 371–402. doi:10.1051/m2an/1986200303711.