next up previous
Next: Speed Prior-Based Inductive Inference Up: The Speed Prior: A Previous: Introduction

Speed Prior $S$

Let us assume that the observed data sequence is generated by a computational process, and that any possible sequence of observations is therefore computable in the limit [22].

This assumption is stronger and more radical than the traditional one: Solomonoff just insists that the probability of any sequence prefix is recursively computable, but the (infinite) sequence itself may still be generated probabilistically.

Under our starting assumption that data are deterministically generated by a machine, it seems plausible that the machine suffers from a computational resource problem. Since some things are much harder to compute than others, the resource-oriented point of view suggests the following postulate.

Postulate 1   The cumulative prior probability measure of all $x$ incomputable within time $t$ is at most inversely proportional to $t$.

To add some flesh to this postulate, we introduce the asymptotically fastest way of computing all computable data, for our particular universal reference TM:

FAST Algorithm (version 1): For $i = 1, 2, \ldots $ perform PHASE $i$:

PHASE $i$: Execute $2^{i - l(p)}$ instructions of all program prefixes $p$ satisfying $l(p) \leq i$, and sequentially write the outputs on adjacent sections of the output tape, separated by blanks.

Following Levin [14][15, p. 502-505], within $2^{k+1}$ TM steps FAST will generate all prefixes $x_n$ satisfying $Kt(x_n) \leq k$, where $x_n$'s Levin complexity $Kt(x_n)$ is defined as

Kt(x_n) = \min_q\{ l(q) + \log~t(q,x_n)\},
\end{displaymath} (4)

where program prefix $q$ computes $x_n$ in $t(q,x_n)$ time steps. The computational complexity of the algorithm is not essentially affected by the fact that PHASE $i = 2, 3, \ldots $, repeats the computation of PHASE $i-1$ which for large $i$ is approximately half as short (ignoring nonessential speed-ups due to halting programs if there are any).

As noted by Hutter [11], there is an essentially equivalent version of FAST which makes very clear just how simple the asymptotically optimal method really is:

FAST (version 2): Execute one instruction of the $n$-th program every $2^n$ steps on average, using blanks to separate program outputs.
To see how we can obtain universal search [14][15, p. 502-505] from FAST, suppose we are seeking the solution $y$ to some inversion problem $\phi(y) = x$. For instance, $y$ might be a path through a maze providing reward $\phi(y) = 1$, while $\phi(z) = 0$ for uneffective paths $z$. Then we can use FAST to work through all programs and check for each whether it generates a path yielding reward. This will identify each reward-generating solution as quickly as the fastest algorithm that generates and tests that particular solution, save for a constant factor (which may be huge). Compare this to Hutter's recent more complex search algorithm for all well-defined problems [11] which reduces the unknown multiplicative constant factor to a remarkably small known value, namely, 5 -- at the expense of introducing an unknown, problem class-specific, additive constant.

How does the optimal algorithm tie in with Postulate 1? Since the most efficient way of computing all $x$ is embodied by FAST, which computes each $x$ as quickly as $x$'s fastest algorithm, save for a constant factor, and since each PHASE of FAST (version 1) consumes roughly twice the time and space resources of the previous PHASE, the cumulative prior probability of things first computed in any particular PHASE should be roughly half the one of the previous PHASE; zero probability should be assigned to infinitely resource-consuming PHASEs. Postulate 1 therefore suggests Definition 2 below.

\begin{definition}[$p \to x, p \to_i x $]
Given program prefix $p$, write $p \to...
.... Write
$p \to_i x$\ if
$p \to x$\ in PHASE $i$\ of {\bf FAST}.

\begin{definition}[Speed Prior $S$]
Define the Speed Prior $S$\ on $B^*$\ as
...um_{p \to_i x} 2^{-l(p)} ~for~x \succ \lambda.
We observe that $S(x)$ is a semimeasure -- compare [15]:

S(\lambda) = 1;~
S(x0) + S(x1) \leq S(x).

The very fact that a speed prior can be defined in a meaningful way is interesting in itself, and may also be relevant in some practical situations -- see Section 5.

next up previous
Next: Speed Prior-Based Inductive Inference Up: The Speed Prior: A Previous: Introduction
Juergen Schmidhuber 2003-02-25

Back to Speed Prior page