The Speed Prior: A New Simplicity Measure
Yielding Near-Optimal Computable Predictions.
In J. Kivinen and R. H. Sloan, editors, Proceedings of the 15th
Annual Conference on Computational Learning Theory (COLT 2002), Sydney, Australia,
Lecture Notes in Artificial Intelligence, pages 216--228. Springer, 2002.
Based on Section 6 of
Algorithmic Theories of Everything
(first published in 2000); the contents of sections 2-5 have also
appeared in the International Journal of Foundations of Computer Science.
Solomonoff's optimal but non-computable method
for inductive inference assumes that observation sequences x are
drawn from a recursive prior distribution mu(x).
Instead of using the unknown mu(x) we predict using
the celebrated universal enumerable prior or Solomonoff-Levin
measure M(x), which for all x exceeds any recursive mu(x), save for a constant
factor independent of x. The simplicity measure M(x)
naturally implements ``Occam's razor'' (simple solutions
are preferred over complex ones) and
is closely related to K(x), the Kolmogorov complexity or algorithmic
information of x.
Predictions based on M are
optimal in a certain (noncomputable) sense.
However, M assigns high probability to
certain data x that are extremely hard to compute. This does not match
our intuitive notion of simplicity.
Schmidhuber suggested a more plausible
measure derived from the fastest way of computing data. In the absence
of contrarian evidence, he assumes that the physical world is generated
by a computational process, and that any possibly infinite sequence of
observations is therefore computable in the limit (this assumption is
more radical and stronger than Solomonoff's). Then he replaces M by the
Speed Prior S,
under which the cumulative a priori probability
of all data whose computation through an optimal algorithm requires more
than O(n) resources is 1/n. To evaluate the plausibility of this,
consider that most data generated on your own computer are computable
within a few microseconds, some take a few seconds, a few take hours,
very few take days, and so on.
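The 1/n law reflects nothing more than a fair coin deciding, each time a runtime budget runs out, whether to double the budget or stop. Here is a minimal Monte Carlo sketch of just this mechanism (the function name and the cap parameter are illustrative choices, not from the paper):

```python
import random

def sample_total_budget(rng, cap=2**20):
    # Budget-doubling mechanism in isolation: start with budget t = 1;
    # each time the budget runs out, a fair coin either doubles t
    # (heads) or ends the run (tails).  Returns the final budget t.
    t = 1
    while t < cap and rng.random() < 0.5:
        t *= 2
    return t

# Monte Carlo check of the 1/n law: P(final budget >= n) is about 1/n.
rng = random.Random(42)
samples = [sample_total_budget(rng) for _ in range(100_000)]
for n in (2, 8, 32):
    frac = sum(s >= n for s in samples) / len(samples)
    print(n, frac)
```

Since reaching a budget of n = 2^k requires k consecutive heads, the printed fractions come out close to 1/2, 1/8, and 1/32.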
Essentially, the Speed Prior S(x) is
the probability that the output of the following
probabilistic algorithm starts with x:
1. Set t:=1. Let instruction pointer IP point to some cell of
the initially empty internal storage of a universal
binary computer (with separate, initially empty output tape).
2. While the number of instructions executed so far exceeds t:
toss a coin; if heads is up, set t:=2t; otherwise exit.
3. If IP points to a cell that already contains a bit,
execute the corresponding instruction
(which may modify storage, IP, and the output tape).
4. Else if IP points to an empty cell,
toss the coin again,
set the cell's bit to 1 if heads is up (0 otherwise),
and set t:=t/2.
5. Go to 2.
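For intuition, the steps above can be rendered as a toy Python simulation. The universal binary computer is replaced here by a deliberately trivial hypothetical machine whose only "instruction" copies the current cell's bit to the output tape and advances IP, so this sketch illustrates only the control flow (coin tosses, budget doubling and halving), not universality:

```python
import random

def guess(output_len, rng):
    # Toy rendering of the probabilistic algorithm defining S.
    t = 1.0       # current runtime budget
    steps = 0     # instructions executed so far
    storage = {}  # internal storage: cell index -> bit
    ip = 0        # instruction pointer
    output = []   # output tape
    while len(output) < output_len:
        # Step 2: while executed instructions exceed the budget,
        # toss a coin; heads doubles t, tails ends the run.
        while steps > t:
            if rng() < 0.5:
                t *= 2
            else:
                return output
        if ip in storage:
            # Step 3: "execute" the instruction in the current cell
            # (trivial instruction set: write the bit to the output).
            output.append(storage[ip])
            ip += 1
            steps += 1
        else:
            # Step 4: fill the empty cell with a random bit, halve t.
            storage[ip] = 1 if rng() < 0.5 else 0
            t /= 2
    return output

# Estimate S(x) for a prefix x as the fraction of runs whose
# output starts with x (here x = [1]).
rng = random.Random(0)
runs = [guess(4, rng.random) for _ in range(10_000)]
frac = sum(r[:1] == [1] for r in runs) / len(runs)
```

With this trivial instruction set every run emits at least one bit before the budget can first run out, so the estimated fraction converges to 1/2; under a genuinely universal machine the run frequencies would instead approximate S(x).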
The Speed Prior allows for
deriving a computable strategy for optimal prediction (within
some given epsilon) of future
y, given past x.
Now consider the case that the data actually stem
from a non-optimal, unknown computational process.
We can use
Marcus Hutter's recent results on
universal prediction machines
to derive excellent expected loss bounds for S-based predictions.
SPEED PRIOR APPLICATION TO PHYSICS
Here is an extreme application.
In the absence of contrarian evidence,
the entire history of
our universe is computable,
and sampled from S (or a less
dominant prior reflecting
suboptimal computation of the history).
Konrad Zuse was the first who seriously suggested that
the universe is being computed on a grid of computers or cellular
automata. (Compare related work by Edward Fredkin,
who initiated the translation of Zuse's 1969 book.)
Under our assumptions, we can immediately predict
that our universe won't get many times older than it is now,
that large scale quantum computation will not
work well (essentially because it would require too many computational resources
in interfering ``parallel universes''), and that any apparent randomness
in any physical observation is in fact due to some yet unknown but
fast pseudo-random generator which we
should try to discover. (Details in the 2002 COLT paper or in the 2000 physics paper above).
Such ideas have recently attracted a lot of attention --- check out Wei Dai's
"everything" mailing list archive and the
Great Programmer Religion.