Machine learning and the prior problem.
In machine learning applications, we are often concerned
with the following problem:
given training data , we would like to select
the most probable hypothesis generating the data.
Bayes formula yields
(Levin, 1974; Solomonoff, 1964; Gacs, 1974; Chaitin, 1975).
Define , the a priori probability of a bitstring ,
as the probability of guessing a halting program
that computes on universal TM .
Here, the way of guessing is defined
by the following procedure:
whenever the scanning head of 's input tape (initially blank)
shifts right to a field that has not been scanned before, do:
With probability fill it with a 0;
with probability fill it with a 1.
Dominance of shortest programs.
It can be shown
(Levin, 1974; Chaitin, 1975)
Dominance of universal prior. Suppose there are infinitely many enumerable solution candidates (strings): amazingly, dominates all discrete enumerable semimeasures P (including probability distributions, see e.g. Li and Vitanyi (1993) for details) in the following sense: for each there is a constant such that for all strings .
Since Kolmogorov complexity is incomputable in general, the universal prior is so, too. A popular myth states that this fact renders useless the concepts of Kolmogorov complexity, as far as practical machine learning is concerned. But this is not so, as will be seen next. There we focus on a natural, computable, yet very general extension of Kolmogorov complexity.
In what follows, a (not necessarily halting) program is
a string on 's input tape which can be
scanned completely by .
Let scan a program
before it finishes printing onto the work tape.
Let be the number of steps taken before
is printed. Then
Levin's universal optimal search algorithm (Levin, 1973; 1984). Suppose we are looking for the solution (a string) to a given problem. Levin's universal search algorithm generates and evaluates all strings (solution candidates) in order of their complexity, until a solution is found. This is essentially equivalent to enumerating all programs in order of decreasing probabilities, divided by their runtimes. Each program computes a string that is tested to see whether it is a solution to the given problem. If so, the search is stopped. Amazingly, for a broad class of problems, including inversion problems and time-limited optimization problems, universal search can be shown to be optimal with respect to total expected search time, leaving aside a constant factor independent of the problem size: if string can be computed within time steps by a program , and the probability of guessing (as above) is , then within time steps, systematic enumeration according to Levin will generate , run it for time steps, and output . In the experiments below, a probabilistic algorithm strongly inspired by universal search will be used.