where $l(p)$ denotes the length of $p$. The invariance theorem (Solomonoff, 1964; Kolmogorov, 1965; Chaitin, 1969) states that for two universal machines $U_1$ and $U_2$ there is a constant $c$ (depending on $U_1$ and $U_2$ but not on $x$) such that $|K_{U_1}(x) - K_{U_2}(x)| \le c$ for all $x$. Therefore we may drop the index and write $K(x)$ instead of $K_U(x)$.

**Machine learning and the prior problem.**
In machine learning applications, we are often concerned
with the following problem:
given training data $D$, we would like to select
the most probable hypothesis $H$ generating the data.
Bayes' formula yields

$$P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}. \qquad (1)$$
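As a toy illustration of how equation (1) selects a hypothesis, the following Python sketch computes posteriors over a few candidate hypotheses; the hypothesis names, priors, and likelihoods are all invented for illustration, not taken from the text:

```python
# Toy illustration of equation (1): selecting the most probable hypothesis.
# The hypotheses, priors, and likelihoods below are invented for illustration.
hypotheses = {
    # name: (prior P(H), likelihood P(D|H) for some fixed data D)
    "all_zeros":   (0.50, 0.001),
    "fair_coin":   (0.25, 0.125),
    "biased_coin": (0.25, 0.250),
}

evidence = sum(prior * lik for prior, lik in hypotheses.values())   # P(D)
posterior = {h: prior * lik / evidence
             for h, (prior, lik) in hypotheses.items()}             # P(H|D)

print(max(posterior, key=posterior.get))   # -> 'biased_coin'
```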

**Universal prior**
(Levin, 1974; Solomonoff, 1964; Gacs, 1974; Chaitin, 1975).
Define $P_U(x)$, the *a priori probability* of a bitstring $x$,
as the probability of guessing a halting program
that computes $x$ on universal TM $U$.
Here, the way of guessing is defined
by the following procedure:
whenever the scanning head of $U$'s input tape (initially blank)
shifts right to a field that has not been scanned before, do:
with probability $\frac{1}{2}$ fill it with a 0;
with probability $\frac{1}{2}$ fill it with a 1.
We obtain

$$P_U(x) = \sum_{p \,:\, U(p) = x} 2^{-l(p)},$$

where the sum is over all halting programs $p$ that compute $x$ on $U$.
Under different universal priors (based on different universal machines), the probabilities of a given string differ by no more than a constant factor independent of the string, due to the invariance theorem. Therefore we may drop the index and write $P(x)$ instead of $P_U(x)$. This justifies the name ``universal prior.''
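The defining sum is easy to mimic on a finite toy "machine." In the sketch below, the prefix-free program table and its outputs are invented; the true $P(x)$ of a universal TM is incomputable, so this only illustrates evaluating $\sum_p 2^{-l(p)}$:

```python
# Toy stand-in for the guessing procedure: a hand-made, prefix-free table of
# "halting programs" and the strings they "compute".  Everything here is
# invented; the true P(x) of a universal TM is incomputable.
PROGRAMS = {
    "00":   "0101",
    "01":   "1111",
    "100":  "0101",
    "101":  "0011",
    "1100": "0101",
    "1101": "1111",
}

def a_priori_probability(x):
    """P(x) = sum of 2^(-l(p)) over all (toy) programs p computing x."""
    return sum(2.0 ** -len(p) for p, out in PROGRAMS.items() if out == x)

print(a_priori_probability("0101"))   # 2**-2 + 2**-3 + 2**-4 = 0.4375
```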

**Dominance of shortest programs.**
It can be shown
(Levin, 1974; Chaitin, 1975)
that

$$-\log P(x) = K(x) + O(1),$$

i.e., $P(x) = 2^{-K(x) + O(1)}$. The probability of a string is dominated by the probabilities of its shortest programs. This justifies ``Occam's razor'': in equation (1), the universal prior assigns the highest prior probabilities $P(H)$ to hypotheses computable by short programs, so among hypotheses fitting the data equally well, the simplest is selected.
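For a small-scale illustration using the toy table above: the shortest program for the string 0101 has length 2 and alone contributes $2^{-2} = 0.25$ of the total $P(\texttt{0101}) = 0.4375$, while every extra bit of program length at least halves a program's possible contribution, so the ranking of strings by $P$ essentially follows their ranking by $K$.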

**Dominance of universal prior.**
Suppose there are infinitely many enumerable solution candidates (strings):
amazingly, $P$ dominates all discrete enumerable semimeasures $Q$
(including probability distributions, see, e.g., Li and Vitanyi (1993)
for details) in the following sense:
for each $Q$ there is a constant $c_Q > 0$ such that
$P(x) \ge c_Q\, Q(x)$ for
all strings $x$.

Since Kolmogorov complexity is incomputable in general, so is the universal prior. A popular myth states that this fact renders the concepts of Kolmogorov complexity useless for practical machine learning. But this is not so, as will be seen next, where we focus on a natural, computable, yet very general extension of Kolmogorov complexity.

**Levin complexity.**
In what follows, a (not necessarily halting) program $p$ is
a string on $U$'s input tape which can be
scanned completely by $U$.
Let $U$ scan a program $p$
before it finishes printing $x$ onto the work tape.
Let $t(p, x)$ be the number of steps taken before $x$
is printed. Then

$$Kt(x) = \min_p \{\, l(p) + \log t(p, x) \,\}.$$
An invariance theorem similar to the one for $K$ holds for $Kt$ as well.
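To make the length/runtime trade-off concrete, here is a minimal sketch; the candidate programs and their runtimes are invented for illustration and are all assumed to print the same string $x$:

```python
from math import log2

# Hypothetical (program, runtime) pairs, all assumed to print the same string x;
# lengths and runtimes are invented.  Kt trades program length against runtime:
# a short-but-slow program can lose to a longer-but-fast one.
CANDIDATES = [
    ("101",    1024),   # l(p) + log2 t = 3 + 10 = 13
    ("110010", 16),     # l(p) + log2 t = 6 +  4 = 10
]

def levin_complexity(candidates):
    """Kt(x) = min over p of  l(p) + log t(p, x)  (log base 2 here)."""
    return min(len(p) + log2(t) for p, t in candidates)

print(levin_complexity(CANDIDATES))   # -> 10.0
```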

**Levin's universal optimal search algorithm**
(Levin, 1973; 1984).
Suppose we are looking for the solution (a string) to a given problem.
Levin's universal search algorithm generates
and evaluates all
strings (solution candidates) in order of their Levin complexity,
until a solution is found.
This is essentially equivalent to
enumerating
all programs in order of decreasing probability divided
by runtime.
Each program computes a string that
is tested to see whether it is a solution to the given problem.
If so, the search is stopped.
Amazingly, for a broad class of problems,
including inversion problems and
time-limited optimization problems,
**universal search
can be shown to be optimal with respect to total expected
search time**,
leaving aside a constant factor independent of the problem size:
if string $x$ can be computed within $t$ time steps
by a program $p$, and the probability of guessing $p$ (as above) is $P(p) = 2^{-l(p)}$,
then within
$O\!\left( t / P(p) \right)$ time steps,
systematic enumeration according to Levin will
generate $p$, run it for $t$ time steps,
and output $x$.
In the experiments below, a probabilistic algorithm strongly inspired
by universal search will be used.
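Below is a self-contained Python sketch of the phase-wise time allocation behind universal search. The "machine" (`toy_run`) is a made-up toy interpreter, not a universal TM, so only the scheduling, in which each program $p$ receives time proportional to $P(p) = 2^{-l(p)}$, reflects Levin's scheme:

```python
import itertools

def toy_run(program, max_steps):
    """Toy stand-in for running program p on U: one step per program bit;
    '1' increments an accumulator, '0' doubles it.  Returns (output, steps)
    if p finishes within max_steps, else None.  Purely illustrative."""
    if len(program) > max_steps:
        return None
    acc = 0
    for bit in program:
        acc = acc + 1 if bit == "1" else acc * 2
    return acc, len(program)

def levin_search(is_solution, max_phase=24):
    """Levin-style schedule: in phase i, every program p with l(p) <= i is
    run for 2^(i - l(p)) steps, i.e. time proportional to P(p) = 2^(-l(p))."""
    for phase in range(1, max_phase + 1):
        for length in range(1, phase + 1):
            budget = 2 ** (phase - length)
            for bits in itertools.product("01", repeat=length):
                program = "".join(bits)
                result = toy_run(program, budget)
                if result is not None and is_solution(result[0]):
                    return program, result[0]
    return None

# Inversion problem: find a program whose output is 10.
print(levin_search(lambda out: out == 10))   # -> ('10010', 10)
```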
