
Levin Search (LS)

Unbeknownst to many machine learning researchers, there exists a search algorithm with amazing theoretical properties: for a broad class of search problems, Levin search (LS) [Levin 1973, 1984] has the optimal order of computational complexity. See [Li and Vitányi 1993] for an overview. See (Schmidhuber 1995, 1997a) for recent implementations/applications.

Basic concepts. LS requires a set of $n_{ops}$ primitive, prewired instructions $b_1, \ldots, b_{n_{ops}}$ that can be composed to form arbitrary sequential programs. Essentially, LS generates and tests solution candidates $s$ (program outputs represented as strings over a finite alphabet) in order of their Levin complexities $Kt(s) = \min_q \{-\log D_P(q) + \log t(q,s)\}$, where $q$ stands for a program that computes $s$ in $t(q,s)$ time steps, and $D_P(q)$ is the probability of guessing $q$ according to a fixed Solomonoff-Levin distribution [Li and Vitányi 1993] on the set of possible programs (in Section 3.2, however, we will make the distribution variable).
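The ordering induced by $Kt$ trades program probability against runtime. A minimal sketch, using purely hypothetical probabilities and runtimes for three candidate programs, shows how a less likely but faster program can dominate the minimum:

```python
import math

# Hypothetical data: each candidate program q computing s is assigned a
# guessing probability D_P(q) and a runtime t(q, s).
programs = {
    "q1": {"prob": 0.5,   "time": 64},  # likely, but slow
    "q2": {"prob": 0.25,  "time": 8},   # less likely, but fast
    "q3": {"prob": 0.125, "time": 4},
}

def levin_term(prob, time):
    # Contribution of one program q: -log2 D_P(q) + log2 t(q, s)
    return -math.log2(prob) + math.log2(time)

# Kt(s) is the minimum over all programs computing s.
kt = min(levin_term(p["prob"], p["time"]) for p in programs.values())
print(kt)  # q1 scores 1 + 6 = 7, while q2 and q3 both score 5
```

Here the most probable program `q1` loses to `q2` and `q3` because its runtime dominates the sum.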

Optimality. Given primitives representing a universal programming language, for a broad class of problems LS can be shown to be optimal with respect to total expected search time, leaving aside a constant factor independent of the problem size [Levin 1973, 1984; Li and Vitányi 1993]. More formally: a problem is a symbol string that conveys all information about another symbol string called its solution, where the solution can be extracted by some (search) algorithm, given the problem. Suppose there is an algorithm that solves certain time-limited optimization problems or inversion problems in $O(f(n))$ steps, where $f$ is a total recursive function and $n$ is a positive integer representing problem size. Then universal LS will solve the same problems in at most $O(f(n))$ steps (although a large constant may be buried in the $O()$ notation). Despite this strong result, until recently LS has not received much attention except in purely theoretical studies -- see, e.g., [Watanabe 1992].

Of course, LS and any other algorithm will fail to quickly solve problems whose solutions all have high algorithmic complexity. Unfortunately, almost all possible problems are of this kind [Kolmogorov 1965; Chaitin 1969; Solomonoff 1964]. In fact, the realm of practical computer science is limited to solving the comparatively few tasks with low-complexity solutions. Fortunately such tasks are rather common in our regular universe.

Practical implementation. In our practical LS version there is an upper bound $m$ on program length (due to obvious storage limitations). $a_i$ denotes the address of the $i$-th instruction. Each program is generated incrementally: first we select an instruction for $a_1$, then for $a_2$, etc. $D_P$ is given by a matrix $P$, where $P_{ij}$ ($i \in \{1, \ldots, m\}$, $j \in \{1, \ldots, n_{ops}\}$) denotes the probability of selecting $b_j$ as the instruction at address $a_i$, given that the first $i-1$ instructions have already been selected. The probability of a program is the product of the probabilities of its constituents.
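The product rule for program probabilities can be sketched in a few lines; the matrix shape and the uniform entries below are illustrative assumptions matching the fixed-distribution case used in this section:

```python
# Assumed setup: P has m rows (addresses) and n_ops columns (instructions);
# P[i][j] is the probability of selecting instruction b_j at address a_i.
# Here all entries are uniform, as in the fixed-distribution case.
m, n_ops = 4, 5
P = [[1.0 / n_ops] * n_ops for _ in range(m)]

def program_probability(program, P):
    # A program is a sequence of instruction indices (length <= m);
    # its probability is the product of the matrix entries it selects.
    prob = 1.0
    for i, j in enumerate(program):
        prob *= P[i][j]
    return prob

print(program_probability([0, 3, 2], P))  # (1/5)^3, i.e. about 0.008
```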

LS' arguments are $P$ and the representation of a problem denoted by $N$. LS' output is a program that computes a solution to the problem, if it finds one. In this section, all $P_{ij} = \frac{1}{n_{ops}}$ remain fixed. LS is implemented as a sequence of longer and longer phases:

Levin search(problem $N$, probability matrix $P$)

(1) Set $Phase$, the number of the current phase, equal to 1. In what follows, let $\phi(Phase)$ denote the set of not yet executed programs $q$ satisfying $D_P(q) \ge \frac{1}{Phase}$.

(2) Repeat

(2.1) While $\phi(Phase) \neq \{\}$ and no solution found do: Generate a program $q \in \phi(Phase)$, and run $q$ until it either halts or has used up $\frac{D_P(q) \cdot Phase}{c}$ steps. If $q$ computed a solution for $N$, return $q$ and exit.

(2.2) Set $Phase := 2 \cdot Phase$

until solution found or $Phase$ $\ge$ $Phase_{MAX}$.
Return empty program $\{\}$.

Here $c$ and $Phase_{MAX}$ are prespecified constants. The procedure above is essentially the same (has the same order of complexity) as the one described in the second paragraph of this section -- see, e.g., [Solomonoff 1986; Li and Vitányi 1993].
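The phased procedure above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes uniform $P_{ij}$ (so $D_P(q)$ depends only on program length), represents programs as tuples of instruction indices, and delegates execution to a caller-supplied interpreter `run(q, N, steps)` that returns a solution or `None`:

```python
import itertools

def levin_search(N, run, n_ops=2, m=8, c=1.0, phase_max=2 ** 16):
    # Phased Levin search under uniform instruction probabilities.
    phase = 1
    while phase < phase_max:
        # Enumerate phi(Phase): programs q with D_P(q) >= 1/Phase.
        for length in range(1, m + 1):
            prob = (1.0 / n_ops) ** length   # D_P(q) for any q of this length
            if prob * phase < 1.0:           # longer programs are excluded too
                break
            for q in itertools.product(range(n_ops), repeat=length):
                steps = int(prob * phase / c)     # time allotted to q
                if run(q, N, steps) is not None:  # q solved N within its budget
                    return q
        phase *= 2                                # step (2.2): double the phase
    return ()  # empty program: no solution found

# Toy usage (purely illustrative): the "problem" is a target instruction
# sequence, and the interpreter just checks whether q matches it.
def toy_run(q, N, steps):
    return "solved" if q == N and steps >= 1 else None

print(levin_search((1, 0, 1), toy_run, n_ops=2, m=4))  # finds (1, 0, 1)
```

Note how the budget $\frac{D_P(q) \cdot Phase}{c}$ grows with each doubling of $Phase$, so every program is eventually granted enough time if a solution of bounded Levin complexity exists.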

Juergen Schmidhuber 2003-02-25
