Computable Predictions through the Speed Prior Based on the Fastest Way of Describing Objects

Unfortunately, while $M$ and the more general priors of Section 4
are computable in the limit, they are not recursive, and thus
practically infeasible. This drawback inspired less general yet
practically more feasible principles of minimum description length
(MDL) [71,41] as well as priors derived from
time-bounded restrictions [31]
of Kolmogorov complexity
[28,62,9].
No particular instance of these approaches, however, is universally
accepted or has a general convincing motivation that carries beyond rather
specialized application scenarios. For instance, typical efficient MDL
approaches require the specification of a class of computable models of
the data, say, certain types of neural networks, plus some computable
loss function expressing the coding costs of the data relative to the
model. This provokes numerous *ad-hoc* choices.
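To make the kind of choice involved concrete, a hedged sketch: the generic two-part MDL criterion (standard textbook form, not tied to any particular reference cited here) selects

$$H^* \;=\; \arg\min_{H \in \mathcal{H}} \Big[ L(H) + L(D \mid H) \Big],$$

where $\mathcal{H}$ is the chosen class of computable models, $L(H)$ the coding cost of a model, and $L(D \mid H)$ the coding cost of the data given the model. Each of $\mathcal{H}$, $L(H)$, and $L(D \mid H)$ is such an *ad-hoc* choice.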

Our recent work [54], however,
offers an alternative to the celebrated
but noncomputable algorithmic simplicity measure
or Solomonoff-Levin measure discussed above
[62,77,63].
We introduced a new measure (a prior on the computable objects)
which is not based
on the **shortest** but on the **fastest** way of describing objects.

Let us assume that the observed data sequence is generated by a computational process, and that any possible sequence of observations is therefore computable in the limit [50]. This assumption is stronger and more radical than the traditional one: Solomonoff just insists that the probability of any sequence prefix is recursively computable, but the (infinite) sequence itself may still be generated probabilistically.

Given our starting assumption that data are deterministically generated by a machine, it seems plausible that the machine suffers from a computational resource problem. Since some things are much harder to compute than others, the resource-oriented point of view suggests the following postulate.

**Postulate 1.** *The cumulative prior probability measure of all $x$ incomputable within time $t$ should be inversely proportional to $t$.*

This postulate leads to the Speed Prior $S(x)$, the probability that the output of the following probabilistic algorithm starts with $x$ [54]:

**Algorithm GUESS:**

**Initialize:** Set $t := 1$. Let the input scanning head of a universal TM point to the first cell of its initially empty input tape.

**Forever repeat:** While the number of instructions executed so far exceeds $t$: toss an unbiased coin; if heads is up set $t := 2t$; otherwise exit. If the input scanning head points to a cell that already contains a bit, execute the corresponding instruction (of the growing self-delimiting program, e.g., [30,31]). Else toss the coin again, set the cell's bit to 1 if heads is up (0 otherwise), and set $t := t/2$.
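The following is a minimal Python sketch of a GUESS-style sampler. It is illustrative only: `ToyMachine`, `guess`, `step`, and `max_output` are hypothetical names introduced here, a trivial two-instruction machine stands in for the universal TM, the scanning head simply advances one cell per executed instruction, and the "forever" loop is truncated once enough output has been produced.

```python
import random

class ToyMachine:
    """Stand-in for the universal TM (illustrative only):
    instruction 1 flips an internal state bit, instruction 0 emits it."""

    def __init__(self):
        self.state = 0
        self.output = []

    def step(self, instruction):
        if instruction == 1:
            self.state ^= 1                  # flip internal bit
        else:
            self.output.append(self.state)   # emit internal bit


def guess(machine, max_output=20):
    """One run of a GUESS-style sampler: the step budget t is doubled with
    probability 1/2 whenever exhausted (otherwise we exit), and each input
    bit is guessed by a fair coin toss on demand, halving t."""
    t = 1.0        # current step budget
    tape = []      # input bits guessed so far (the growing program)
    pos = 0        # input scanning head
    steps = 0      # instructions executed so far

    while len(machine.output) < max_output:
        while steps > t:                     # budget exhausted:
            if random.random() < 0.5:
                t *= 2.0                     # heads: double the budget
            else:
                return machine.output        # tails: exit
        if pos < len(tape):
            machine.step(tape[pos])          # cell has a bit: execute it
            pos += 1
            steps += 1
        else:
            # guess a fresh input bit; pay for it by halving t
            tape.append(1 if random.random() < 0.5 else 0)
            t /= 2.0
    return machine.output


if __name__ == "__main__":
    random.seed(0)
    for _ in range(5):
        print(guess(ToyMachine()))
```

Note how the two coin mechanisms encode the postulate: surviving $i$ budget doublings has probability $2^{-i}$, so runs requiring time beyond $t$ survive with probability roughly inversely proportional to $t$, while each guessed program bit costs a factor of $1/2$, recovering the usual $2^{-l(p)}$ weighting of a self-delimiting program $p$ of length $l(p)$.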

Algorithm **GUESS** is very similar to a probabilistic search
algorithm used in previous work on applied inductive inference
[47,49].
On several toy problems it generalized
extremely well in a way unmatchable by traditional neural network
learning algorithms.

With $S$ comes a computable method **AS** for predicting
optimally within $\epsilon$ accuracy [54].
Consider a finite but unknown program $p$ computing the data sequence $y$.
What if Postulate 1 holds but $p$ is not optimally
efficient, and/or computed on a computer that differs from
our reference machine? Then we effectively do not sample
beginnings of $y$ from $S$ but from an alternative semimeasure $S'$.
Can we still predict well? Yes, because the
Speed Prior $S$ dominates $S'$.
This dominance is all we need to apply the recent loss bounds
[21].
The loss that we are expected to receive
by predicting according to **AS**
instead of using the true but unknown $S'$ does not exceed
the optimal loss by much [54].
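A hedged sketch of why dominance suffices (the symbols $c$ and $L$ are introduced here for illustration; the precise statements and constants are in [21,54]): dominance of $S$ over $S'$ means

$$\exists\, c > 0:\quad S(x) \;\ge\; c\, S'(x) \quad \text{for all finite prefixes } x,$$

and Hutter-style bounds [21] then limit the excess expected loss of **AS** to something of the general form

$$0 \;\le\; L_{AS} - L_{\mathrm{opt}} \;\le\; O\!\left(\sqrt{L_{\mathrm{opt}}\,\ln c^{-1}} \;+\; \ln c^{-1}\right),$$

where $L_{\mathrm{opt}}$ denotes the expected loss of the optimal predictor that knows $S'$. The smaller the dominance constant $c$, the looser the guarantee, but the bound holds without ever identifying $S'$.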