Plausibility of Recursive Priors

Next: Plausibility of Cumulatively Enumerable Up: Consequences for Physics Previous: Consequences for Physics

Plausibility of Recursive Priors

The first number is 2, the second is 4, the third is 6, the fourth is 8. What is the fifth? The correct answer is ``250,'' because the nth number is n⁵ - 5n⁴ -15n³ + 125n² -224n + 120. In certain IQ tests, however, the answer ``250'' will not yield maximal score, because it does not seem to be the ``simplest'' answer consistent with the data (compare [#!Schmidhuber:97nn!#]). And physicists and others favor ``simple'' explanations of observations.

Roughly fourty years ago Solomonoff set out to provide a theoretical justification of this quest for simplicity [#!Solomonoff:64!#]. He and others have made substantial progress over the past decades. In particular, technical problems of Solomonoff's original approach were partly overcome by Levin [#!Levin:74!#] who introduced self-delimiting programs, m and $\mu^M$ mentioned above, as well as several theorems relating probabilities to complexities -- see also Chaitin's and Gács' independent papers on prefix complexity and m [#!Gacs:74!#,#!Chaitin:75!#]. Solomonoff's work on inductive inference helped to inspire less general yet practically more feasible principles of minimum description length [#!Wallace:68!#,#!Rissanen:86!#,#!Hochreiter:97nc1!#] as well as time-bounded restrictions of Kolmogorov complexity, e.g., [#!Hartmanis:83!#,#!Allender:92!#,#!Watanabe:92!#,#!LiVitanyi:97!#], as well as the concept of ``logical depth'' of x, the runtime of the shortest program of x [#!Bennett:88!#].

Equation (15) makes predictions of the entire future, given the past. This seems to be the most general approach. Solomonoff [#!Solomonoff:78!#] focuses just on the next bit in a sequence. Although this provokes surprisingly nontrivial problems associated with translating the bitwise approach to alphabets other than the binary one -- only recently Hutter managed to do this [#!Hutter:00e!#] -- it is sufficient for obtaining essential insights [#!Solomonoff:78!#].

Given an observed bitstring x, Solomonoff assumes the data are drawn according to a recursive measure $\mu$ ; that is, there is a MTM program that reads $x \in B^*$ and computes $\mu(x)$ and halts. He estimates the probability of the next bit (assuming there will be one), using the fact that the enumerable $\mu^M$ dominates the less general recursive measures:

$\begin{displaymath}K\mu^M(x) \leq -log \mu (x) + c_{\mu}, \end{displaymath}$

(49)

where $c_{\mu}$ is a constant depending on $\mu$ but not on x. Compare [#!LiVitanyi:97!#, p. 282 ff]. Solomonoff showed that the $\mu^M$ -probability of a particular continuation converges towards $\mu$ as the observation size goes to infinity [#!Solomonoff:78!#]. Hutter recently extended his results by showing that the number of prediction errors made by universal Solomonoff prediction is essentially bounded by the number of errors made by any other recursive prediction scheme, including the optimal scheme based on the true distribution $\mu$ [#!Hutter:99!#]. Hutter also extended Solomonoff's passive universal induction framework to the case of agents actively interacting with an unknown environment [#!Hutter:00f!#].

A previous paper on computable universes [#!Schmidhuber:97brauer!#, Section: Are we Run by a Short Algorithm?] applied the theory of inductive inference to entire universe histories, and predicted that simple universes are more likely; that is, observers are likely to find themselves in a simple universe compatible with their existence (compare everything mailing list archive [#!Dai:98!#], messages dated 21 Oct and 25 Oct 1999: http://www.escribe.com/science/theory/m1284.html and m1312.html). There are two ways in which one could criticize this approach. One suggests it is too general, the other suggests it is too restrictive.

1. Recursive priors too general? $\mu^M(x)$ is not recursively computable, hence there is no general practically feasible algorithm to generate optimal predictions. This suggests to look at more restrictive priors, in particular, S, which will receive additional motivation further below.

2. Recursive priors too restricted? If we want to explain the entire universe, then the assumption of a recursive P on the possible universes may even be insufficient. In particular, although our own universe seems to obey simple rules -- a discretized version of Schrödinger's wave function could be implemented by a simple recursive algorithm -- the apparently noisy fluctuations that we observe on top of the simple laws might be due to a pseudorandom generator (PRG) subroutine whose output is describable, even enumerable, but not recursive -- compare Example 2.1.

In particular, the fact that nonrecursive priors may not allow for recursive bounds on the time necessary to compute initial histories of some universe does not necessarily prohibit nonrecursive priors. Each describable initial history may be potentially relevant as there is an infinite computation during which it will be stable for all but finitely many steps. This suggests to look at more general priors such as $\mu^E, P^E, P^G$ , which will be done next, before we come back to the speed prior S.

Next: Plausibility of Cumulatively Enumerable Up: Consequences for Physics Previous: Consequences for Physics

Juergen Schmidhuber
2001-01-09