In stochastic processes, filtering estimates the state of a system from incomplete and noisy observations, such as using GPS signals to find a car’s true position. The optimal non-linear filtering problem was solved by Ruslan L. Stratonovich and further developed by Harold J. Kushner and Moshe Zakai, who introduced the Zakai equation. While exact solutions are infinite-dimensional, practical approximations include the Wiener filter, Kalman-Bucy filter, and heuristic methods like the extended Kalman filter or particle filters. When the separation principle holds, filtering is central to optimal control, exemplified by the Kalman filter’s role in linear-quadratic-Gaussian control.
The mathematical formalism
Consider a probability space (Ω, Σ, P) and suppose that the (random) state Yt in n-dimensional Euclidean space Rn of a system of interest at time t is a random variable Yt : Ω → Rn given by the solution to an Itō stochastic differential equation of the form
$$\mathrm{d}Y_t = b(t, Y_t)\,\mathrm{d}t + \sigma(t, Y_t)\,\mathrm{d}B_t,$$

where B denotes standard p-dimensional Brownian motion, b : [0, +∞) × Rn → Rn is the drift field, and σ : [0, +∞) × Rn → Rn×p is the diffusion field. It is assumed that observations Ht in Rm (note that m and n may, in general, be unequal) are taken for each time t according to
$$H_t = c(t, Y_t) + \gamma(t, Y_t) \cdot \text{noise}.$$

Adopting the Itō interpretation of the stochastic differential and setting
$$Z_t = \int_0^t H_s\,\mathrm{d}s,$$

this gives the following stochastic integral representation for the observations Zt:
$$\mathrm{d}Z_t = c(t, Y_t)\,\mathrm{d}t + \gamma(t, Y_t)\,\mathrm{d}W_t,$$

where W denotes standard r-dimensional Brownian motion, independent of B and the initial condition Y0, and c : [0, +∞) × Rn → Rm and γ : [0, +∞) × Rn → Rm×r satisfy
$$\big|c(t,x)\big| + \big|\gamma(t,x)\big| \leq C\big(1 + |x|\big)$$

for all t and x and some constant C.
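The signal and observation equations above can be simulated directly by an Euler-Maruyama discretization. The sketch below treats the scalar case (n = m = p = r = 1); the particular coefficient functions at the bottom are illustrative choices, not taken from the text, though they do satisfy the linear growth bound:

```python
import numpy as np

def simulate_signal_and_observation(b, sigma, c, gamma, y0,
                                    T=1.0, n_steps=1000, seed=None):
    """Euler-Maruyama simulation of the scalar signal SDE
    dY = b(t, Y) dt + sigma(t, Y) dB and of the integrated observation
    dZ = c(t, Y) dt + gamma(t, Y) dW, with B and W independent."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    Y = np.empty(n_steps + 1)
    Z = np.empty(n_steps + 1)
    Y[0], Z[0] = y0, 0.0            # Z_0 = 0 by definition of the integral
    for k in range(n_steps):
        t = k * dt
        dB = np.sqrt(dt) * rng.standard_normal()
        dW = np.sqrt(dt) * rng.standard_normal()   # independent of dB
        Y[k + 1] = Y[k] + b(t, Y[k]) * dt + sigma(t, Y[k]) * dB
        Z[k + 1] = Z[k] + c(t, Y[k]) * dt + gamma(t, Y[k]) * dW
    return Y, Z

# Illustrative coefficients: a mean-reverting signal observed linearly.
Y, Z = simulate_signal_and_observation(
    b=lambda t, y: -y, sigma=lambda t, y: 0.5,
    c=lambda t, y: y, gamma=lambda t, y: 1.0, y0=1.0, seed=0)
```

The observation path Z, not the instantaneous quantity H, is what a filtering algorithm actually consumes, which is why the integral representation is the standard starting point.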
The filtering problem is the following: given observations Zs for 0 ≤ s ≤ t, what is the best estimate Ŷt of the true state Yt of the system based on those observations?
By "based on those observations" it is meant that Ŷt is measurable with respect to the σ-algebra Gt generated by the observations Zs, 0 ≤ s ≤ t. Denote by K = K(Z, t) the collection of all Rn-valued random variables Y that are square-integrable and Gt-measurable:
$$K = K(Z,t) = L^2(\Omega, G_t, \mathbf{P}; \mathbf{R}^n).$$

By "best estimate", it is meant that Ŷt minimizes the mean-square distance between Yt and all candidates in K:
$$\mathbf{E}\left[\big|Y_t - \hat{Y}_t\big|^2\right] = \inf_{Y \in K} \mathbf{E}\left[\big|Y_t - Y\big|^2\right]. \qquad \text{(M)}$$

Basic result: orthogonal projection
The space K(Z, t) of candidates is a Hilbert space, and the general theory of Hilbert spaces implies that the solution Ŷt of the minimization problem (M) is given by
$$\hat{Y}_t = P_{K(Z,t)}\big(Y_t\big),$$

where PK(Z,t) denotes the orthogonal projection of L2(Ω, Σ, P; Rn) onto the linear subspace K(Z, t) = L2(Ω, Gt, P; Rn). Furthermore, it is a general fact about conditional expectations that if F is any sub-σ-algebra of Σ then the orthogonal projection
$$P_K : L^2(\Omega, \Sigma, \mathbf{P}; \mathbf{R}^n) \to L^2(\Omega, F, \mathbf{P}; \mathbf{R}^n)$$

is exactly the conditional expectation operator E[·|F], i.e.,
$$P_K(X) = \mathbf{E}\big[X \mid F\big].$$

Hence,
$$\hat{Y}_t = P_{K(Z,t)}\big(Y_t\big) = \mathbf{E}\big[Y_t \mid G_t\big].$$

This elementary result is the basis for the general Fujisaki-Kallianpur-Kunita equation of filtering theory.
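The projection property can be checked by Monte Carlo in a jointly Gaussian toy problem (a single observation rather than a path, chosen only to illustrate the principle): if Y ~ N(0, 1) and Z = Y + independent N(0, 1) noise, then E[Y | Z] = Z/2, and its mean-square error should beat any other Z-measurable estimator:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
Y = rng.standard_normal(n)        # signal: Y ~ N(0, 1)
Z = Y + rng.standard_normal(n)    # observation: Z = Y + independent noise

def mse(estimate):
    """Empirical mean-square distance between the signal and an estimator."""
    return float(np.mean((Y - estimate) ** 2))

mse_cond = mse(Z / 2)    # conditional mean E[Y | Z] = Z/2; theoretical MSE = 1/2
mse_raw = mse(Z)         # another Z-measurable candidate; theoretical MSE = 1
mse_off = mse(0.7 * Z)   # theoretical MSE = 0.3**2 + 0.7**2 = 0.58
```

Any measurable function of Z could be substituted for the two alternative candidates; by the orthogonal projection result, none can do better than the conditional expectation.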
More advanced result: nonlinear filtering SPDE
Complete knowledge of the filter at time t would be given by the probability law of the signal Yt conditional on the sigma-field Gt generated by the observations Z up to time t. If this probability law admits a density, informally
$$p_t(y)\,\mathrm{d}y = \mathbf{P}(Y_t \in \mathrm{d}y \mid G_t),$$

then under some regularity assumptions the density pt(y) satisfies a non-linear stochastic partial differential equation (SPDE) driven by dZt, called the Kushner-Stratonovich equation,[10] or an unnormalized version qt(y) of the density pt(y) satisfies a linear SPDE called the Zakai equation.[11] These equations can be formulated for the above system, but to simplify the exposition one can assume that the unobserved signal Y and the partially observed noisy signal Z satisfy the equations
$$\mathrm{d}Y_t = b(t, Y_t)\,\mathrm{d}t + \sigma(t, Y_t)\,\mathrm{d}B_t, \qquad \mathrm{d}Z_t = c(t, Y_t)\,\mathrm{d}t + \mathrm{d}W_t.$$

In other terms, the system is simplified by assuming that the observation noise W is not state dependent.
One might keep a deterministic time-dependent γ in front of dW, but we assume this has been taken out by re-scaling.
For this particular system, the Kushner-Stratonovich SPDE for the density pt reads
$$\mathrm{d}p_t = \mathcal{L}_t^* p_t\,\mathrm{d}t + p_t\,[c(t,\cdot) - E_{p_t}(c(t,\cdot))]^T\,[\mathrm{d}Z_t - E_{p_t}(c(t,\cdot))\,\mathrm{d}t],$$

where T denotes transposition, Ep denotes the expectation with respect to the density p, namely Ep[f] = ∫ f(y) p(y) dy, and the forward diffusion operator $\mathcal{L}_t^*$ is
$$\mathcal{L}_t^* f(t,y) = -\sum_i \frac{\partial}{\partial y_i}\big[b_i(t,y)\,f(t,y)\big] + \frac{1}{2}\sum_{i,j} \frac{\partial^2}{\partial y_i\,\partial y_j}\big[a_{ij}(t,y)\,f(t,y)\big],$$

where a = σσT. If we choose the unnormalized density qt(y), the Zakai SPDE for the same system reads
$$\mathrm{d}q_t = \mathcal{L}_t^* q_t\,\mathrm{d}t + q_t\,[c(t,\cdot)]^T\,\mathrm{d}Z_t.$$

These SPDEs for p and q are written in Itō calculus form. It is possible to write them in Stratonovich calculus form, which turns out to be helpful when deriving filtering approximations based on differential geometry, as in the projection filters. For example, the Kushner-Stratonovich equation written in Stratonovich calculus reads
$$\mathrm{d}p_t = \mathcal{L}_t^* p_t\,\mathrm{d}t - \frac{1}{2}\,p_t\,\big[|c(t,\cdot)|^2 - E_{p_t}(|c(t,\cdot)|^2)\big]\,\mathrm{d}t + p_t\,[c(t,\cdot) - E_{p_t}(c(t,\cdot))]^T \circ \mathrm{d}Z_t.$$

From either of the densities p and q one can calculate all statistics of the signal Yt conditional on the sigma-field generated by the observations Z up to time t, so that the densities give complete knowledge of the filter. Under particular linear-constant assumptions with respect to Y, where the system coefficients b and c are linear functions of Y and where σ and γ do not depend on Y, with the initial condition for the signal Y being Gaussian or deterministic, the density pt(y) is Gaussian and can be characterized by its mean and variance-covariance matrix, whose evolution is described by the Kalman-Bucy filter, which is finite dimensional.[12] More generally, the evolution of the filter density occurs in an infinite-dimensional function space,[13] and it has to be approximated via a finite-dimensional approximation, as hinted above.
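In the scalar linear-constant case dY = aY dt + σ dB, dZ = cY dt + dW, the filter density stays Gaussian, and the SPDE dynamics collapse to two finite-dimensional Kalman-Bucy equations: an SDE for the conditional mean m and a deterministic Riccati ODE for the conditional variance P. The sketch below discretizes both by an Euler step; the coefficient values a = -0.5, σ = 1, c = 1 are illustrative choices, not from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
a, sig, c = -0.5, 1.0, 1.0          # illustrative linear-constant coefficients
dt, n = 1e-3, 20_000                # time step and number of steps

Y, m, P = 0.0, 0.0, 1.0             # true state, filter mean, filter variance
errors = []
for _ in range(n):
    dB = np.sqrt(dt) * rng.standard_normal()
    dW = np.sqrt(dt) * rng.standard_normal()
    dZ = c * Y * dt + dW                           # observation increment
    Y += a * Y * dt + sig * dB                     # true (hidden) signal step
    m += a * m * dt + P * c * (dZ - c * m * dt)    # Kalman-Bucy mean update
    P += (2 * a * P + sig**2 - (c * P) ** 2) * dt  # Riccati variance update
    errors.append((Y - m) ** 2)

# The Riccati equation converges to the positive root of
# 2aP + sig^2 - c^2 P^2 = 0, here (sqrt(5) - 1) / 2 ≈ 0.618.
```

Note that the variance update involves no observation at all: in the linear-Gaussian case the filter's uncertainty evolves deterministically, which is precisely why the filter is finite dimensional here while the general nonlinear filter is not.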
See also
- The smoothing problem, closely related to the filtering problem
- Filter (signal processing)
- Kalman filter, a well-known filtering algorithm for linear systems, related both to the filtering problem and the smoothing problem
- Extended Kalman filter, an extension of the Kalman filter to nonlinear systems
- Smoothing
- Projection filters
- Particle filters
Further reading
- Jazwinski, Andrew H. (1970). Stochastic Processes and Filtering Theory. New York: Academic Press. ISBN 0-12-381550-9.
- Øksendal, Bernt K. (2003). Stochastic Differential Equations: An Introduction with Applications (Sixth ed.). Berlin: Springer. ISBN 3-540-04758-1. (See Section 6.1)
References
1. Stratonovich, R. L. (1959). Optimum nonlinear systems which bring about a separation of a signal with constant parameters from noise. Radiofizika, 2:6, pp. 892-901.
2. Stratonovich, R. L. (1960). Application of the Markov processes theory to optimal filtering. Radio Engineering and Electronic Physics, 5:11, pp. 1-19.
3. Kushner, Harold (1967). Nonlinear filtering: The exact dynamical equations satisfied by the conditional mode. IEEE Transactions on Automatic Control, 12(3), pp. 262-267.
4. Zakai, Moshe (1969). On the optimal filtering of diffusion processes. Zeit. Wahrsch. 11, pp. 230-243. doi:10.1007/BF00536382.
5. Mireille Chaleyat-Maurel and Dominique Michel (1984). Des résultats de non existence de filtre de dimension finie. Stochastics, 13(1-2), pp. 83-102.
6. Maybeck, Peter S. (1979). Stochastic Models, Estimation, and Control, Volume 141, Mathematics in Science and Engineering. Academic Press.
7. Damiano Brigo, Bernard Hanzon and François Le Gland (1998). A differential geometric approach to nonlinear filtering: the projection filter. IEEE Transactions on Automatic Control, 43(2), pp. 247-252.
8. Damiano Brigo, Bernard Hanzon and François Le Gland (1999). Approximate nonlinear filtering by projection on exponential manifolds of densities. Bernoulli, 5(3), pp. 495-534.
9. Del Moral, Pierre (1998). Measure valued processes and interacting particle systems. Application to non linear filtering problems. Annals of Applied Probability, 8(2), pp. 438-495. doi:10.1214/aoap/1028903535.
10. Bain, A., and Crisan, D. (2009). Fundamentals of Stochastic Filtering. Springer-Verlag, New York. doi:10.1007/978-0-387-76896-0.
11. Bain, A., and Crisan, D. (2009). Fundamentals of Stochastic Filtering. Springer-Verlag, New York. doi:10.1007/978-0-387-76896-0.
12. Bain, A., and Crisan, D. (2009). Fundamentals of Stochastic Filtering. Springer-Verlag, New York. doi:10.1007/978-0-387-76896-0.
13. Mireille Chaleyat-Maurel and Dominique Michel (1984). Des résultats de non existence de filtre de dimension finie. Stochastics, 13(1-2), pp. 83-102.