How to predict the future from the past? To get a grip on this
fundamental question, let us first introduce some notation.
$B^*$ denotes the set of finite sequences over the binary alphabet
$B = \{0,1\}$, $B^{\infty}$ the set of infinite sequences over $B$,
$\lambda$ the empty string, $B^{\sharp} = B^* \cup B^{\infty}$.
$x, y, z$ stand for strings in $B^{\sharp}$. If $x \in B^*$ then $xy$
is the concatenation of $x$ and $y$ (e.g., if $x = 10$ and $y = 01$
then $xy = 1001$). For $x \in B^{\sharp}$, $l(x)$ denotes the number of
bits in $x$, where $l(x) = \infty$ for $x \in B^{\infty}$;
$l(\lambda) = 0$. $x_n$ is the prefix of $x$ consisting of the first
$n$ bits, if $l(x) \geq n$, and $x$ otherwise ($x_0 := \lambda$).
$\log$ denotes the logarithm with basis 2; $f, g$ denote
functions mapping integers to integers. We write $f(n) = O(g(n))$ if
there exist positive constants $c, n_0$ such that $f(n) \leq c\,g(n)$ for
all $n > n_0$. For simplicity let us consider universal Turing machines
(TMs) with input alphabet $B$ and trinary output alphabet including the
symbols ``0'', ``1'', and `` '' (blank). For efficiency reasons,
the TMs should have several work tapes
to avoid potential quadratic slowdowns associated with 1-tape TMs.
The remainder of this paper assumes a fixed universal reference TM.
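As a purely illustrative aside (not part of the paper's formalism), the string notation above can be mirrored by a few lines of Python, with bitstrings represented as ordinary str objects over the characters '0' and '1'; the function names below are our own choice.

```python
# Illustrative helpers mirroring the notation above (assumption: bitstrings
# are Python str objects over the characters '0' and '1').

def l(x: str) -> int:
    """l(x): the number of bits in a finite string x; l(lambda) = l("") = 0."""
    return len(x)

def prefix(x: str, n: int) -> str:
    """x_n: the first n bits of x if l(x) >= n, and x itself otherwise."""
    return x[:n]

assert l("") == 0
assert prefix("10110", 3) == "101"
assert prefix("10", 5) == "10"       # l(x) < n, so x_n is x itself
assert "100" + "11" == "10011"       # concatenation xy
```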
Now suppose bitstring $x$ represents the data observed so far.
What is its most likely continuation $y \in B^{\sharp}$? Bayes' theorem yields

$$P(xy \mid x) = \frac{P(x \mid xy)\,P(xy)}{P(x)} \propto P(xy), \qquad (1)$$

where $P(x)$ is just a normalizing factor, since $P(x \mid xy) = 1$ for any $xy$ starting with $x$. Hence the most likely continuation $y$ is the one maximizing the prior probability $P(xy)$. But which prior measure $P$ is plausible?
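To make Equation (1) concrete, here is a minimal sketch (not from the paper) that assumes a toy prior with finite support over a handful of complete bitstrings; it conditions on an observed prefix $x$ and normalizes by $P(x)$, exactly as in the equation.

```python
# Toy illustration of Equation (1). The finite-support prior is an assumption
# for this example only -- the paper's P is a measure over B#.
prior = {"000": 0.10, "0100": 0.20, "0101": 0.35, "0111": 0.05, "1010": 0.30}

def continuation_probabilities(x: str, prior: dict) -> dict:
    """P(xy | x) = P(xy) / P(x): restrict the prior to strings starting
    with the observed prefix x and renormalize by P(x)."""
    p_x = sum(p for s, p in prior.items() if s.startswith(x))  # normalizer P(x)
    return {s[len(x):]: p / p_x for s, p in prior.items() if s.startswith(x)}

print(continuation_probabilities("01", prior))
# {'00': 0.333..., '01': 0.583..., '11': 0.083...}: the most likely
# continuation is the one with the largest prior mass P(xy).
```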
The next section will offer an alternative to the celebrated but noncomputable algorithmic simplicity measure or Solomonoff-Levin measure [24,29,25]. But let us first review Solomonoff's traditional approach.
Roughly forty years ago Solomonoff started the theory of
universal optimal induction based on the apparently harmless
simplicity assumption that $P$ is computable [24].
While Equation (1) makes predictions of
the entire future, given the past,
Solomonoff [25] focuses just on the next
bit in a sequence. Although this provokes surprisingly nontrivial
problems associated with translating the bitwise approach to alphabets
other than the binary one -- only recently Hutter managed to do this
[8] -- it is sufficient for obtaining essential
insights. Given an observed bitstring $x$, Solomonoff assumes the data are drawn
according to a recursive measure $\mu$; that is, there is a program for
a universal Turing machine that reads $x \in B^*$ and
computes $\mu(x)$ and halts. He estimates the probability of the next bit
(assuming there will be one), using the remarkable, well-studied,
enumerable prior $M$
[24,29,25,6,15]
$$M(x) \;:=\; \sum_{\text{program prefix } p \text{ computes output starting with } x} 2^{-l(p)} \qquad (3)$$
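For intuition only, here is a hedged sketch of the sum in Equation (3). Since an actual universal reference TM does not fit in a few lines, the sketch substitutes a toy, non-universal prefix machine of our own construction, whose programs encode a finite pattern that is then output repeatedly; what the code illustrates is the $2^{-l(p)}$ weighting of programs whose output starts with $x$.

```python
from fractions import Fraction

def toy_machine(p: str):
    """A toy stand-in for the reference machine (NOT universal): a program
    must have the prefix-free form 1^k 0 w with l(w) = k; it then 'prints'
    the pattern w repeated forever. Returns w, or None if p is malformed."""
    k = 0
    while k < len(p) and p[k] == "1":
        k += 1
    if k >= len(p) or p[k] != "0" or len(p) != 2 * k + 1:
        return None
    return p[k + 1:]

def toy_M(x: str, max_len: int = 15) -> Fraction:
    """Approximate the sum in Equation (3) for the toy machine: add 2^{-l(p)}
    for every well-formed program p of at most max_len bits whose infinite
    output starts with x."""
    total = Fraction(0)
    for n in range(1, max_len + 1):
        for i in range(2 ** n):
            p = format(i, "0{}b".format(n))
            w = toy_machine(p)
            if not w:                      # malformed program or empty pattern
                continue
            out = (w * (len(x) // len(w) + 1))[:len(x)]
            if out == x:
                total += Fraction(1, 2 ** len(p))
    return total

print(toy_M("0101"))   # short patterns like "01" dominate the sum
```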
However, while $M$ is enumerable, it is not recursive, and thus
practically infeasible. This drawback inspired less general yet
practically more feasible principles of minimum description length
(MDL) [27,17] as well as priors derived from
time-bounded restrictions [15]
of Kolmogorov complexity
[12,24,4].
No particular instance of these approaches, however, is universally
accepted or has a general convincing motivation that carries beyond rather
specialized application scenarios. For instance, typical efficient MDL
approaches require the specification of a class of computable models of
the data, say, certain types of neural networks, plus some computable
loss function expressing the coding costs of the data relative to the
model. This provokes numerous ad-hoc choices.
The novel approach pursued here agrees that Solomonoff's assumption of recursive priors without any time and space limitations is too weak, and that we somehow should specify additional resource constraints on the data-generating process to obtain a convincing basis for feasible inductive inference. But which constraints are plausible? Which reflect the ``natural'' concept of simplicity? Previous resource-oriented priors derived from, say, polynomial time bounds [15] have no obvious and plausible a priori justification.
Therefore we will suggest a novel, natural prior reflecting data-independent, optimally efficient use of computational resources. Based on this prior, Section 3 will derive a near-optimal computable strategy for making predictions, given past observations.