In computer science, a predictive state representation (PSR) is a way to model a state of controlled dynamical system from a history of actions taken and resulting observations. PSR captures the state of a system as a vector of predictions for future tests (experiments) that can be done on the system. A test is a sequence of action-observation pairs and its prediction is the probability of the test's observation-sequence happening if the test's action-sequence were to be executed on the system. One of the advantage of using PSR is that the predictions are directly related to observable quantities. This is in contrast to other models of dynamical systems, such as partially observable Markov decision processes (POMDPs) where the state of the system is represented as a probability distribution over unobserved nominal states.
Definition
Consider a dynamic system based on a discrete set A {\displaystyle {\mathcal {A}}} of actions and a discrete set O {\displaystyle {\mathcal {O}}} of observations.3 A history h {\displaystyle h} is a sequence a 1 o 1 … a ℓ o ℓ {\displaystyle a_{1}o_{1}\dots a_{\ell }o_{\ell }} where a 1 , … , a ℓ {\displaystyle a_{1},\dots ,a_{\ell }} are the actions taken by the agent in that order and o 1 , … , o ℓ {\displaystyle o_{1},\dots ,o_{\ell }} are the observations made up by the agent. Let us write P ( a 1 o 1 … a ℓ o ℓ ) {\displaystyle P(a_{1}o_{1}\dots a_{\ell }o_{\ell })} to be the conditional probability of observing o 1 , … , o ℓ {\displaystyle o_{1},\dots ,o_{\ell }} given that the actions taken are a 1 , … , a ℓ {\displaystyle a_{1},\dots ,a_{\ell }} .
We now want to characterize a given hidden state reached after some history h {\displaystyle h} . To do that, we introduce the notion of test. A test t {\displaystyle t} is of the same type that a history: it is a sequence of action-observation pairs. The idea is now to consider a set of tests { t 1 , … , t n } {\displaystyle \{t_{1},\dots ,t_{n}\}} to full characterize a hidden state. To do that, we first define the probability of a test t {\displaystyle t} conditional on a history h {\displaystyle h} : P ( t ∣ h ) := P ( h t ) P ( h ) {\displaystyle P(t\mid h):={\frac {P(ht)}{P(h)}}} .
We now define the prediction vector p ( h ) = [ P ( t 1 ∣ h ) , … , P ( t n ∣ h ) ] {\displaystyle p(h)=[P(t_{1}\mid h),\dots ,P(t_{n}\mid h)]} . We say that p ( h ) {\displaystyle p(h)} is a predictive state representation (PSR) if and only if it forms a sufficient statistic for the system. In other words, p ( h ) {\displaystyle p(h)} is a predictive state representation (PSR) if and only if for all possible tests t {\displaystyle t} , there exists a function f t {\displaystyle f_{t}} such that for all histories h {\displaystyle h} , P ( t ∣ h ) = f t ( p ( h ) ) {\displaystyle P(t\mid h)=f_{t}(p(h))} .
The functions f t {\displaystyle f_{t}} are called projection functions. We say that the PSR is linear when the function f t {\displaystyle f_{t}} is linear for all possible tests t {\displaystyle t} . The main theorem proved in 4 is stated as follows.
Theorem. Consider a finite POMDP with k {\displaystyle k} states. Then there exists a linear PSR with a number of tests n {\displaystyle n} smaller that k {\displaystyle k} .
- Littman, Michael L.; Richard S. Sutton; Satinder Singh (2002). "Predictive Representations of State" (PDF). Advances in Neural Information Processing Systems 14 (NIPS). pp. 1555–1561.
- Singh, Satinder; Michael R. James; Matthew R. Rudary (2004). "Predictive State Representations: A New Theory for Modeling Dynamical Systems" (PDF). Uncertainty in Artificial Intelligence: Proceedings of the Twentieth Conference (UAI). pp. 512–519.
- Wiewiora, Eric Walter (2008), Modeling Probability Distributions with Predictive State Representations (PDF)
References
James, Michael R.; Singh, Satinder (2004). "Learning and discovery of predictive state representations in dynamical systems with reset". Twenty-first international conference on Machine learning - ICML '04. p. 53. CiteSeerX 10.1.1.67.5179. doi:10.1145/1015330.1015359. ISBN 978-1-58113-838-2. S2CID 9111832. 978-1-58113-838-2 ↩
Izadi, Masoumeh T.; Precup, Doina (9 August 2003). "A planning algorithm for predictive state representations". Proceedings of the 18th International Joint Conference on Artificial Intelligence. Ijcai'03: 1520–1521. https://dl.acm.org/doi/abs/10.5555/1630659.1630919 ↩
Littman, Michael; Sutton, Richard S (2001). "Predictive Representations of State". Advances in Neural Information Processing Systems. 14. MIT Press. https://papers.nips.cc/paper_files/paper/2001/hash/1e4d36177d71bbb3558e43af9577d70e-Abstract.html ↩
Littman, Michael; Sutton, Richard S (2001). "Predictive Representations of State". Advances in Neural Information Processing Systems. 14. MIT Press. https://papers.nips.cc/paper_files/paper/2001/hash/1e4d36177d71bbb3558e43af9577d70e-Abstract.html ↩