How can we predict the future from the past? To get a grip on this fundamental question, let us first introduce some notation. $B^*$ denotes the set of finite sequences over the binary alphabet $B=\{0,1\}$, $B^{\infty}$ the set of infinite sequences over $B$, $\lambda$ the empty string, $B^{\sharp} = B^* \cup B^{\infty}$. $x, y, z$ stand for strings in $B^{\sharp}$. If $x \in B^*$ then $xy$ is the concatenation of $x$ and $y$ (e.g., if $x=10000$ and $y=1111$ then $xy=100001111$). For $x \in B^*$, $l(x)$ denotes the number of bits in $x$, where $l(x)=\infty$ for $x \in B^{\infty}$; $l(\lambda)=0$. $x_n$ is the prefix of $x$ consisting of the first $n$ bits, if $l(x) \geq n$, and $x$ otherwise ($x_0 := \lambda$). $\log$ denotes the logarithm with basis 2; $f, g$ denote functions mapping integers to integers. We write $f(n)=O(g(n))$ if there exist positive constants $c, n_0$ such that $f(n) \leq c\,g(n)$ for all $n > n_0$. For simplicity let us consider universal Turing Machines (TMs) with input alphabet $B$ and trinary output alphabet including the symbols ``0'', ``1'', and `` '' (blank). For efficiency reasons, the TMs should have several work tapes to avoid potential quadratic slowdowns associated with 1-tape TMs. The remainder of this paper assumes a fixed universal reference TM.
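The string notation above can be mirrored in a few lines of code (a minimal sketch for concreteness; the function names are ours, not the paper's):

```python
def l(x: str) -> int:
    """l(x): the number of bits in the finite bitstring x; l('') = 0."""
    return len(x)

def prefix(x: str, n: int) -> str:
    """x_n: the prefix of x consisting of its first n bits if l(x) >= n,
    and x itself otherwise; in particular prefix(x, 0) is the empty string."""
    return x[:n]

# Concatenation xy is plain string concatenation, e.g.:
# prefix("10000" + "1111", 3) yields "100".
```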
Now suppose bitstring $x \in B^*$ represents the data observed so far.
What is its most likely continuation $y \in B^{\sharp}$? Bayes' theorem yields

$$P(xy \mid x) = \frac{P(x \mid xy)\,P(xy)}{P(x)} \propto P(xy), \eqno(1)$$

where $P(x \mid xy)=1$ because $x$ is a prefix of $xy$, and $P(x)$ is just a normalizing factor independent of $y$.
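As a toy illustration of this Bayesian prediction step, consider a simple computable measure: bits drawn i.i.d. from a Bernoulli source (the measure and the parameter value below are illustrative assumptions, not part of the paper). Since the observed $x$ is a prefix of $xy$, the posterior probability of a continuation reduces to a ratio of prior probabilities:

```python
def P(x: str, theta: float = 0.7) -> float:
    """Toy computable measure: probability that an i.i.d. Bernoulli(theta)
    source emits a sequence beginning with bitstring x."""
    prob = 1.0
    for bit in x:
        prob *= theta if bit == "1" else 1.0 - theta
    return prob

def continuation_prob(x: str, y: str) -> float:
    """P(xy | x) = P(x | xy) P(xy) / P(x) = P(xy) / P(x),
    since x is a prefix of xy and hence P(x | xy) = 1."""
    return P(x + y) / P(x)
```

For this memoryless source the ratio collapses to the probability of the continuation bits alone, e.g. `continuation_prob("10", "1")` equals 0.7.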
The next section will offer an alternative to the celebrated but noncomputable algorithmic simplicity measure, also known as the Solomonoff-Levin measure [24,29,25]. But let us first review Solomonoff's traditional approach.
Roughly forty years ago Solomonoff started the theory of
universal optimal induction based on the apparently harmless
simplicity assumption that $P$ is computable [24].
While Equation (1) makes predictions of
the entire future, given the past,
Solomonoff [25] focuses just on the next
bit in a sequence. Although this provokes surprisingly nontrivial
problems associated with translating the bitwise approach to alphabets
other than the binary one -- only recently did Hutter manage to do this
[8] -- it is sufficient for obtaining essential
insights. Given an observed bitstring $x$, Solomonoff assumes the data are drawn
according to a recursive measure $\mu$; that is, there is a program for
a universal Turing machine that reads $x \in B^*$ and
computes $\mu(x)$ and halts. He estimates the probability of the next bit
(assuming there will be one), using the remarkable, well-studied,
enumerable prior $M$
[24,29,25,6,15]

$$M(x) = \sum_{\text{program prefix } p \text{ computes output starting with } x} 2^{-l(p)}. \eqno(3)$$
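To make the enumerable prior concrete, here is a brute-force lower bound on an $M$-like sum for a tiny hand-built monotone machine (the machine and its four-entry instruction table are illustrative assumptions; the real $M$ requires a universal monotone TM and can only be enumerated, never computed exactly). For each minimal program $p$ whose output starts with $x$ — minimal in the sense that no proper prefix of $p$ already produces output starting with $x$ — we add $2^{-l(p)}$:

```python
from itertools import product

# Toy monotone machine: the program is read two bits at a time;
# chunk "00" emits "0", "01" emits "1", "10" emits "000", "11" emits "111".
# The output of any prefix of p is a prefix of the output of p,
# as required of a monotone machine.
CHUNK_OUT = {"00": "0", "01": "1", "10": "000", "11": "111"}

def run(p: str) -> str:
    """Output of the toy machine on program p (a trailing odd bit emits nothing)."""
    return "".join(CHUNK_OUT[p[i:i + 2]] for i in range(0, len(p) - 1, 2))

def M(x: str, max_len: int = 12) -> float:
    """Lower-bound M(x) = sum over minimal programs p with output starting
    with x of 2^{-l(p)}, by enumerating all programs up to length max_len."""
    total = 0.0
    for k in range(1, max_len + 1):
        for bits in product("01", repeat=k):
            p = "".join(bits)
            if run(p).startswith(x) and \
               not any(run(p[:i]).startswith(x) for i in range(k)):
                total += 2.0 ** (-k)
    return total
```

The ratio `M(x + "1") / M(x)` then estimates the probability that the next bit is 1, mirroring Solomonoff's use of the prior; for this toy machine, `M("11") / M("1")` comes out to 0.75.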
However, while $M$ is enumerable, it is not recursive, and thus practically infeasible. This drawback inspired less general yet practically more feasible principles of minimum description length (MDL) [27,17], as well as priors derived from time-bounded restrictions [15] of Kolmogorov complexity [12,24,4]. No particular instance of these approaches, however, is universally accepted or has a generally convincing motivation that carries beyond rather specialized application scenarios. For instance, typical efficient MDL approaches require the specification of a class of computable models of the data, say, certain types of neural networks, plus some computable loss function expressing the coding costs of the data relative to the model. This provokes numerous ad-hoc choices.
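The ad-hoc choices of typical MDL approaches show up even in the simplest instance: a two-part code over a hand-picked model class. In the sketch below, the Bernoulli model class, the 16-point parameter grid, and the 4-bit parameter cost are all arbitrary choices of exactly the kind criticized above:

```python
from math import log2

def mdl_cost(x: str, theta: float, param_bits: int = 4) -> float:
    """Two-part description length (toy): param_bits to transmit the model
    parameter, plus -log2 likelihood of the bits under Bernoulli(theta).
    theta must lie strictly between 0 and 1."""
    ones = x.count("1")
    zeros = len(x) - ones
    return param_bits - ones * log2(theta) - zeros * log2(1.0 - theta)

def mdl_pick(x: str) -> float:
    """Select theta from an (arbitrary) 4-bit grid minimizing total cost."""
    grid = [k / 16.0 for k in range(1, 16)]
    return min(grid, key=lambda t: mdl_cost(x, t))
```

A data string with three quarters ones, such as `"1110" * 4`, is assigned the grid parameter 0.75; changing the grid, the parameter cost, or the model class changes the answer, which is precisely the arbitrariness at issue.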
The novel approach pursued here agrees that Solomonoff's assumption of recursive priors without any time and space limitations is too weak, and that we somehow should specify additional resource constraints on the data-generating process to obtain a convincing basis for feasible inductive inference. But which constraints are plausible? Which reflect the ``natural'' concept of simplicity? Previous resource-oriented priors derived from, say, polynomial time bounds [15] have no obvious and plausible a priori justification.
Therefore we will suggest a novel, natural prior reflecting data-independent, optimally efficient use of computational resources. Based on this prior, Section 3 will derive a near-optimal computable strategy for making predictions, given past observations.