The first number is 2, the second is 4, the third is 6, the fourth
is 8. What is the fifth? The correct answer is ``250,'' because
the *n*th number is
*n*^{5} - 5*n*^{4} -15*n*^{3} + 125*n*^{2} -224*n* + 120.
In certain IQ tests, however,
the answer ``250'' will not yield the maximal score, because it does
not seem to be the ``simplest'' answer consistent with the data
(compare [#!Schmidhuber:97nn!#]).
Physicists and others likewise favor ``simple'' explanations of observations.
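The interpolating polynomial above is easy to verify directly; a minimal, purely illustrative check in Python:

```python
# Evaluate the interpolating polynomial at n = 1..5.
# It reproduces 2, 4, 6, 8 and then jumps to 250 at n = 5.
def f(n):
    return n**5 - 5*n**4 - 15*n**3 + 125*n**2 - 224*n + 120

print([f(n) for n in range(1, 6)])  # → [2, 4, 6, 8, 250]
```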

Roughly forty years ago, Solomonoff set out to provide a theoretical
justification for this quest for simplicity [#!Solomonoff:64!#].
He and others have made substantial progress over the past decades.
In particular, technical problems of Solomonoff's original approach were
partly overcome by Levin [#!Levin:74!#],
who introduced the self-delimiting
programs and the measure *m*
mentioned above, as well as several
theorems relating probabilities to complexities -- see also
Chaitin's and Gács'
independent papers on prefix complexity and *m*
[#!Gacs:74!#,#!Chaitin:75!#].
Solomonoff's work on inductive inference
helped to inspire less general yet practically more
feasible principles of
minimum description length [#!Wallace:68!#,#!Rissanen:86!#,#!Hochreiter:97nc1!#]
as well as time-bounded restrictions of
Kolmogorov complexity, e.g., [#!Hartmanis:83!#,#!Allender:92!#,#!Watanabe:92!#,#!LiVitanyi:97!#],
and the concept of ``logical depth'' of *x*, the runtime of the shortest
program that computes *x* [#!Bennett:88!#].

Equation (15) makes predictions of the entire future, given the past. This seems to be the most general approach. Solomonoff [#!Solomonoff:78!#] focuses just on the next bit in a sequence. Although this provokes surprisingly nontrivial problems associated with translating the bitwise approach to alphabets other than the binary one -- only recently did Hutter manage to do this [#!Hutter:00e!#] -- it is sufficient for obtaining essential insights [#!Solomonoff:78!#].

Given an observed bitstring *x*,
Solomonoff assumes the data are drawn according to
a recursive measure μ;
that is, there is an MTM program that
reads *x*,
computes μ(*x*),
and halts. He
estimates the probability
of the next bit (assuming there will be one), using the fact
that the enumerable *m*
dominates the less general recursive measures:

(49)    *m*(*x*) ≥ *c*_{μ} μ(*x*),

where *c*_{μ} is a constant depending on μ but not on *x*.
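The dominance inequality can be illustrated in a toy, finite setting. The following Python sketch (a hypothetical finite mixture, not the actual enumerable prior) shows that a Bayes mixture over finitely many recursive measures dominates each component, with a constant equal to that component's prior weight:

```python
# Toy illustration (finite, hypothetical setting): a Bayes mixture over
# finitely many recursive measures dominates each member measure with a
# constant factor equal to its prior weight.
from itertools import product

def bernoulli(p):
    """Recursive measure on bitstrings: independent bits with P(1) = p."""
    def mu(x):
        out = 1.0
        for b in x:
            out *= p if b == 1 else 1 - p
        return out
    return mu

models  = [bernoulli(0.2), bernoulli(0.5), bernoulli(0.8)]
weights = [0.25, 0.5, 0.25]   # prior weights, summing to at most 1

def mixture(x):
    return sum(w * mu(x) for w, mu in zip(weights, models))

# For every string x and every component i: mixture(x) >= w_i * mu_i(x),
# mirroring the dominance inequality with c = w_i.
for x in product([0, 1], repeat=6):
    for w, mu in zip(weights, models):
        assert mixture(x) >= w * mu(x)
print("dominance holds on all strings of length 6")
```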

A previous paper on computable universes
[#!Schmidhuber:97brauer!#, Section: *Are we Run by a Short Algorithm?*]
applied the theory of inductive
inference to entire universe histories, and predicted that
simple universes are more likely; that is, observers are
likely to find themselves in a simple universe compatible
with their existence
(compare *everything mailing list archive* [#!Dai:98!#],
messages dated
21 Oct and 25 Oct 1999:
*http://www.escribe.com/science/theory/m1284.html* and *m1312.html*).
There are two ways in which one could criticize this approach.
One suggests it is too general, the other suggests it is too restrictive.

**1. Recursive priors too general?**
*m* is not recursively computable, hence there is no general,
practically feasible algorithm to generate optimal predictions.
This suggests looking at more
restrictive priors, in particular the speed prior *S*,
which will receive additional motivation further below.

**2. Recursive priors too restricted?** If we want to explain
the entire universe, then the assumption of a recursive
*P* on the possible universes may even be insufficient. In particular,
although our own universe seems to obey simple rules
-- a discretized version of
Schrödinger's wave function could
be implemented by a simple recursive algorithm -- the apparently noisy
fluctuations that we observe on top of the simple laws might be due to
a pseudorandom generator (PRG) subroutine whose output is describable,
even enumerable, but not recursive -- compare Example 2.1.
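The point about PRG-driven fluctuations can be illustrated with a toy sketch. Here an ordinary linear congruential generator merely stands in for the describable but nonrecursive noise source discussed in the text; all names and constants are illustrative assumptions:

```python
# Toy sketch: a "universe" whose law is trivial (a counter) but whose
# observed bits look noisy because a PRG subroutine is mixed in.
# A classic linear congruential generator (LCG) stands in for the
# describable-but-nonrecursive source in the text (illustration only).
def pseudorandom_bits(seed, n):
    state = seed
    for _ in range(n):
        state = (1664525 * state + 1013904223) % 2**32  # LCG step
        yield state >> 31                               # top bit

def universe_history(seed, steps):
    # Simple law: step counter t; observation: law bit XOR noise bit.
    noise = pseudorandom_bits(seed, steps)
    return [(t % 2) ^ b for t, b in zip(range(steps), noise)]

print(universe_history(42, 16))
```

The history is perfectly deterministic given the seed, yet its bitwise statistics look irregular, which is the sense in which simple laws plus a PRG subroutine can mimic observed noise.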

In particular, the fact that nonrecursive priors may not allow for
recursive bounds on the time necessary to compute the initial histories of
some universe does not necessarily rule them out. Each
describable initial history may be potentially relevant, as there
is an infinite computation during which it remains stable for
all but finitely many steps.
This suggests looking at more general priors,
which will be done next, before we come back to the
speed prior *S*.
