Jürgen Schmidhuber's

SPEED PRIOR

A New Simplicity Measure for Near-Optimal Computable Predictions (based on the fastest way of describing objects, not the shortest)

COLT 2002 talk slides
J. Schmidhuber. The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions. In J. Kivinen and R. H. Sloan, editors, Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), Sydney, Australia, Lecture Notes in Artificial Intelligence, pages 216-228. Springer, 2002. Also available as Section 6 of Algorithmic Theories of Everything, first published in the physics arXiv (2000); the contents of Sections 2-5 also appeared in the International Journal of Foundations of Computer Science.


Ray Solomonoff's optimal but non-computable method for inductive inference assumes that observation sequences x are drawn from a recursive prior distribution mu(x). Instead of using the unknown mu(x), we predict using the celebrated universal enumerable prior or Solomonoff-Levin (semi)measure M(x), which for all x exceeds any recursive mu(x), save for a constant factor independent of x. The simplicity measure M(x) naturally implements ``Occam's razor'' (simple solutions are preferred over complex ones) and is closely related to K(x), the Kolmogorov complexity or algorithmic information of x. Predictions based on M are optimal in a certain (noncomputable) sense. However, M assigns high probability to certain data x that are extremely hard to compute. This does not match our intuitive notion of simplicity.
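For reference (not verbatim from this page), the standard formulation in terms of a universal monotone machine U is:

```latex
% Solomonoff-Levin universal semimeasure: sum over (minimal) programs p
% whose output on the universal monotone machine U starts with x.
M(x) \;=\; \sum_{p:\;U(p)=x*} 2^{-\ell(p)}
% Dominance: for every recursive (semi)measure mu there is a constant
% c_mu > 0, independent of x, such that
M(x) \;\geq\; c_\mu\, \mu(x) \quad \text{for all } x
```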

Schmidhuber suggested a more plausible measure derived from the fastest way of computing data. In the absence of contrary evidence, he assumes that the physical world is generated by a computational process, and that any possibly infinite sequence of observations is therefore computable in the limit (an assumption more radical and stronger than Solomonoff's). He then replaces M by the novel Speed Prior S, under which the cumulative a priori probability of all data whose computation through an optimal algorithm requires more than O(n) resources is 1/n. To gauge the plausibility of this, consider that most data generated on your own computer are computable within a few microseconds, some take a few seconds, few take hours, very few take days, and so on.
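For concreteness, the COLT 2002 paper defines S (roughly; details and normalization are in the paper) as a mixture over the phases of Levin's universal search procedure FAST, where PHASE i gives every program p with \ell(p) \le i at most 2^{i-\ell(p)} steps:

```latex
% Speed Prior as a phase-weighted mixture (sketch of the paper's definition):
S(x) \;=\; \sum_{i=1}^{\infty} 2^{-i}\, S_i(x),
\qquad
S_i(x) \;=\; \sum_{p \,\to_i\, x} 2^{-\ell(p)},
% where p ->_i x means that program p outputs a string starting with x
% within PHASE i of FAST, i.e., within at most 2^{i-\ell(p)} steps.
```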

Essentially, the Speed Prior S(x) is the probability that the output of the following probabilistic algorithm starts with x:

1. Set t:=1. Let instruction pointer IP point to some cell of the initially empty internal storage of a universal binary computer (with separate, initially empty output storage).

2. While the number of instructions executed so far exceeds t: toss a coin; if heads is up, set t:=2t; otherwise exit. If IP points to a cell that already contains a bit, execute the corresponding instruction. Otherwise (the cell is still empty), toss the coin again, set the cell's bit to 1 if heads is up (0 otherwise), and set t:=t/2.

3. Go to 2.
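To make the bookkeeping concrete, here is a minimal Python sketch of one run of this algorithm. It is not Schmidhuber's construction: the universal binary computer is replaced by a toy, non-universal machine whose only ``instruction'' is to copy the current storage bit to the output, so the sketch merely illustrates how the coin-driven doubling and halving of t allocates runtime (the function name and toy semantics are hypothetical).

```python
import random

def toy_speed_prior_sample(max_output_bits=32, rng=random):
    """One run of the coin-toss algorithm above, with a toy machine standing in
    for the universal binary computer: 'executing' the instruction in a cell
    just appends that bit to the output and advances IP.  Returns the output
    bits produced before the run exits or reaches max_output_bits."""
    t = 1.0          # current time allowance (step 1)
    executed = 0     # number of instructions executed so far
    storage = []     # internal storage cells, filled lazily with guessed bits
    ip = 0           # instruction pointer IP: index into storage
    output = []

    while len(output) < max_output_bits:
        # Step 2, first part: while the work done exceeds t, toss a coin;
        # heads doubles t, tails ends the run.
        while executed > t:
            if rng.random() < 0.5:
                t *= 2.0
            else:
                return output
        # Step 2, second part: fetch or guess the instruction bit under IP.
        if ip < len(storage):
            # Cell already contains a bit: execute it (toy semantics: emit it).
            output.append(storage[ip])
            ip += 1
            executed += 1
        else:
            # Empty cell: guess its bit with another coin toss and halve t.
            storage.append(1 if rng.random() < 0.5 else 0)
            t /= 2.0
        # Step 3: go to 2 (loop).
    return output
```

Repeating such runs many times and counting how often the output begins with a given prefix yields a Monte-Carlo estimate of that prefix's probability under the toy machine; with a genuine universal binary computer in place of the toy one, that probability would correspond to S(x).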

The Speed Prior allows one to derive a computable strategy for optimal prediction (to within any given epsilon) of future data y, given past observations x.
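Concretely, prediction works in the usual way for (semi)measures, via the conditional probability of a continuation y given the observed prefix x; since S can be approximated by running ever more phases of FAST, this ratio can be computed to within any given epsilon:

```latex
S(y \mid x) \;=\; \frac{S(xy)}{S(x)}
```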

Now consider the case where the data actually stem from a non-optimal, unknown computational process. Marcus Hutter's recent results on universal prediction machines can then be used to derive excellent expected loss bounds for S-based inductive inference.

SPEED PRIOR APPLICATION TO PHYSICS

Here is an extreme application. In the absence of contrary evidence, we assume that the entire history of our universe is computable, and sampled from S (or from a less dominant prior reflecting suboptimal computation of the history). (The legendary Konrad Zuse was the first to seriously suggest that the universe is being computed on a grid of computers, or cellular automaton; compare related work by Ed Fredkin, who initiated the translation of Zuse's 1969 book.)

Under our assumptions, we can immediately predict that our universe won't get many times older than it is now, that large-scale quantum computation will not work well (essentially because it would require too many computational resources in interfering ``parallel universes''), and that any apparent randomness in any physical observation is in fact due to some yet unknown but fast pseudo-random generator, which we should try to discover. (Details in the 2002 COLT paper or in the 2000 physics paper above.)

Such ideas have recently attracted a lot of attention --- check out Wei Dai's "everything" mailing list archive and the Great Programmer Religion.

Related pages: Randomness and Kolmogorov complexity | Universal AI | Computing the Universe