In probability theory, Robbins' problem of optimal stopping, named after Herbert Robbins, is sometimes referred to as the fourth secretary problem or the problem of minimizing the expected rank with full information.
Let $X_1, \ldots, X_n$ be independent, identically distributed random variables, uniform on [0, 1]. We observe the $X_k$ sequentially and must stop on exactly one of them. No recall of preceding observations is permitted. What stopping rule minimizes the expected rank of the selected observation, and what is its corresponding value?
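The problem statement can be written compactly as follows. The symbols $A_k$ (absolute rank), $\tau$ (stopping rule) and $v_n$ (value for $n$ observations) are notation assumed here for illustration and are not taken from the text above.

$$A_k = \sum_{j=1}^{n} \mathbf{1}\{X_j \le X_k\}, \qquad v_n = \inf_{\tau} \mathbb{E}\left[A_\tau\right], \qquad v = \lim_{n \to \infty} v_n .$$

Here the infimum runs over all stopping rules $\tau$ with values in $\{1, \ldots, n\}$ whose decision at step $k$ uses only $X_1, \ldots, X_k$, so the best observation has rank 1 and the goal is to make the rank of the selected observation as small as possible on average.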
The general solution to this full-information expected rank problem is unknown. The major difficulty is that the problem is fully history-dependent: the optimal rule depends at every stage on all preceding values, and not only on simpler sufficient statistics of them. Only bounds are known for the limiting value $v$ as $n$ goes to infinity, namely $1.908 < v < 2.329$. These bounds are obtained by studying so-called memoryless strategies, that is, strategies in which the decision to stop on $X_k$ depends only on the value of $X_k$ and not on the history of observations $X_1, \ldots, X_{k-1}$. It is known that there is some room to improve the lower bound by further computations for a truncated version of the problem within the class of memoryless strategies. It is still not known how to improve on the upper bound for the limiting value, whatever strategy is used.
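A memoryless strategy of the kind described above can be explored by simulation. The following is a minimal sketch, not the strategy behind the published bounds: the threshold sequence `c / (n - k + 1)` and the default `c = 2.0` are hypothetical choices made only for illustration, and the function names are invented here.

```python
import random


def memoryless_stop(xs, c=2.0):
    """Return the 0-based index at which a memoryless threshold rule stops.

    The threshold at step k depends only on k and n, never on earlier
    observations. The form c / (n - k + 1) is an illustrative assumption.
    """
    n = len(xs)
    for k, x in enumerate(xs, start=1):
        # Force a stop on the last observation, since one item must be chosen.
        threshold = 1.0 if k == n else min(1.0, c / (n - k + 1))
        if x <= threshold:
            return k - 1
    return n - 1


def absolute_rank(xs, idx):
    """Rank of xs[idx] among all n observations (1 = smallest)."""
    return sum(1 for x in xs if x <= xs[idx])


def estimate_expected_rank(n, trials, c=2.0, seed=0):
    """Monte Carlo estimate of the expected rank under the rule above."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        total += absolute_rank(xs, memoryless_stop(xs, c))
    return total / trials


if __name__ == "__main__":
    print(estimate_expected_rank(n=50, trials=2000))
```

Varying `c` and `n` in such a simulation gives a rough feel for how memoryless rules behave, but a Monte Carlo estimate for finite `n` is not a bound on the limiting value $v$.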
Another approach proposed to make progress on the problem is a continuous-time version in which the observations arrive according to a homogeneous Poisson process of rate 1. Under some mild assumptions, the corresponding value function $w(t)$ is bounded and Lipschitz continuous, and a differential equation for this value function has been derived. The limiting value of $w(t)$ gives the solution of Robbins' problem. It has been shown that, for large $t$, $1 \leq w(t) \leq 2.33183$. This estimate is consistent with the bounds mentioned above.
The advantage of the continuous-time version lies in the fact that the answer can be expressed in terms of the solution of a differential equation, i.e. the answer appears in a closed form. However, since the resulting differential equation contains, apart from the "objective function", another (small) unknown function, the approach does not so far seem to give a decisive advantage for finding the optimal limiting value.
A simple suboptimal rule, which performs almost as well as the optimal rule within the class of memoryless stopping rules, was proposed by Krieger & Samuel-Cahn. The rule stops with the smallest $i$ such that $R_i < ic/(n+i)$ for a given constant $c$, where $R_i$ is the relative rank of the $i$th observation and $n$ is the total number of items. This rule has added flexibility. A curtailed version thereof can be used to select an item with a given probability $P$, $P < 1$. The rule can be used to select two or more items. The problem of selecting a fixed percentage $\alpha$, $0 < \alpha < 1$, of $n$, is also treated.
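The relative-rank rule can be sketched in a few lines, taking the stopping condition exactly as written above. Two points are assumptions made here rather than statements from the source: the rule is forced to stop on the last item when no earlier index satisfies the condition, and the value `c = 3.0` used in the demonstration is arbitrary. The function names are likewise hypothetical.

```python
import random


def relative_rank(xs, i):
    """Relative rank of the i-th observation (1-based) among the first i.

    A relative rank of 1 means the i-th observation is the smallest so far.
    """
    return sum(1 for x in xs[:i] if x <= xs[i - 1])


def rank_rule_stop(xs, c):
    """Return the smallest 1-based i with R_i < i*c/(n+i).

    Assumption: if no index satisfies the condition, stop on the last item.
    """
    n = len(xs)
    for i in range(1, n + 1):
        if relative_rank(xs, i) < i * c / (n + i):
            return i
    return n


def mean_selected_rank(n, trials, c, seed=0):
    """Monte Carlo mean of the absolute rank of the selected item."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        xs = [rng.random() for _ in range(n)]
        i = rank_rule_stop(xs, c)
        # Absolute rank among all n observations (1 = smallest).
        total += sum(1 for x in xs if x <= xs[i - 1])
    return total / trials


if __name__ == "__main__":
    print(mean_selected_rank(n=50, trials=2000, c=3.0))
```

Because the condition uses only relative ranks, the same sketch applies to any continuous distribution of the observations, not only the uniform one used in the simulation.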