Overview of main results

In a certain sense to be specified below, the Optimal Ordered Problem Solver (OOPS) solves one task after another, always optimally exploiting solutions to earlier tasks when possible. It can be applied to increasingly hard problems of optimization or prediction. The initial bias is given in the form of a probability distribution $P$ on programs for a universal computer. An asymptotically fastest way of solving a single given task is nonincremental Universal Search (Levin, 1973, 1984; Levin also attributes similar ideas to Adleman): in the $i$-th phase $(i=1,2,3, \ldots)$, test all programs $p$ with runtime $\leq 2^i P(p)$ until the task is solved. Now suppose there is an ordered sequence of tasks; for example, the $n$-th task is to find a path through a maze that is shorter than the best found so far. To reduce the search time for new tasks, our previous incremental extensions of Universal Search [65] tried to modify $P$ through experience with earlier tasks, but in a heuristic, non-general, suboptimal way prone to overfitting.
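The phase scheme of nonincremental Universal Search can be sketched as follows. This is an illustrative toy, not the paper's implementation: the program space, the `solves` predicate, and the representation of programs as labeled items are all assumptions made here for clarity.

```python
# Toy sketch of nonincremental Universal Search (Levin, 1973, 1984).
# In phase i, every program p gets a runtime budget proportional to 2**i * P(p),
# so total effort per phase roughly doubles and high-probability programs
# are tried with larger budgets first.
def universal_search(programs_with_probs, solves):
    """programs_with_probs: finite list of (program, P(program)) pairs.
    solves(program, max_steps): True iff program solves the task
    within max_steps execution steps (hypothetical predicate)."""
    i = 1
    while True:
        for p, prob in programs_with_probs:
            budget = int((2 ** i) * prob)
            if budget >= 1 and solves(p, budget):
                return p
        i += 1  # next phase: double all runtime budgets
```

In this toy version the program list is finite; the actual method enumerates a growing set of programs and interleaves their execution.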

OOPS, however, does it right. It searches the space of self-delimiting programs [31,7] that are executed immediately while being generated: a program grows by one instruction whenever the execution of the program so far requests it. To solve the $n$-th task we sacrifice half the total search time for testing (through a variant of Universal Search) programs that have the most recent successful program as a prefix. The other half remains for testing fresh programs with arbitrary beginnings. When we are searching for a universal solver for all tasks in the sequence, we have to time-share the second half (but not the first!) among all tasks $1..n$.
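The time allocation described above can be sketched numerically. The function below is a simplification invented here to show the split, not the paper's scheduling algorithm:

```python
# Sketch of OOPS's search-time split for the n-th task:
# one half extends the most recent successful program (as a prefix),
# the other half tests fresh programs with arbitrary beginnings.
# When seeking a single universal solver, only the fresh half is
# time-shared among all tasks 1..n; the prefix half is not.
def split_budget(total_time, n_tasks, universal=False):
    prefix_half = total_time / 2   # continuations of the latest solver
    fresh_half = total_time / 2    # programs with arbitrary beginnings
    per_task_fresh = fresh_half / n_tasks if universal else fresh_half
    return prefix_half, per_task_fresh
```

For example, with a budget of 100 time units and 4 tasks in universal-solver mode, the prefix continuations still get 50 units while each task's fresh search gets 12.5.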

Storage for the first found program computing a solution to the current task becomes non-writeable. Programs tested during search for solutions to later tasks may copy non-writeable code into separate modifiable storage, to edit it and execute the modified result. Prefixes may also recompute the probability distribution on their suffixes in arbitrary computable ways, thus rewriting the search procedure on the suffixes. This allows for metalearning or metasearching, that is, searching for faster search procedures.
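The idea of a prefix rewriting the distribution on its own suffixes can be illustrated with a toy generator. The instruction names and the `boost` primitive below are hypothetical; the point is only that the program grows one instruction at a time, and an already-executed instruction may change the weights governing what comes next.

```python
import random

# Toy sketch: a self-delimiting program is grown instruction by instruction.
# An executed prefix may rewrite the distribution on its own suffix,
# thereby reshaping the remainder of the search.
def grow_program(rng, max_len=10):
    # Initial uniform weights over a hypothetical instruction set.
    dist = {"incr": 1.0, "jump": 1.0, "boost": 1.0, "halt": 1.0}
    program = []
    while len(program) < max_len:
        instrs = list(dist)
        weights = [dist[k] for k in instrs]
        instr = rng.choices(instrs, weights=weights)[0]
        program.append(instr)  # the program grows only when execution asks
        if instr == "boost":
            dist["incr"] *= 10.0  # prefix rewrites the suffix distribution
        elif instr == "halt":
            break
    return program
```

In OOPS the rewriting can be an arbitrary computable transformation of the suffix distribution, which is what makes metasearching possible; the multiplicative `boost` here is just one concrete instance.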

For realistic limited computers we need efficient backtracking in program space to reset storage contents modified by tested programs. We introduce a recursive procedure for doing this in near-optimal fashion.
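One standard way to realize such resetting is a trail of undone writes, so that rolling back a failed candidate costs time proportional to the writes it made. The class below is a minimal sketch of that idea, invented here; it is not the paper's recursive procedure.

```python
_MISSING = object()  # sentinel distinguishing "absent" from "stored None"

class TrailedStorage:
    """Storage with an undo trail: every write records the old value,
    so the effects of a tested program can be reset cheaply."""

    def __init__(self):
        self.cells = {}
        self.trail = []

    def write(self, addr, value):
        # Record the previous value (or its absence) before overwriting.
        self.trail.append((addr, self.cells.get(addr, _MISSING)))
        self.cells[addr] = value

    def mark(self):
        return len(self.trail)  # snapshot of the current trail depth

    def undo_to(self, mark):
        # Pop trail entries, restoring old values, back to the snapshot.
        while len(self.trail) > mark:
            addr, old = self.trail.pop()
            if old is _MISSING:
                del self.cells[addr]
            else:
                self.cells[addr] = old
```

A search procedure would call `mark()` before testing a candidate program and `undo_to()` after the candidate fails, leaving storage exactly as it was.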

The method is essentially at most 8 times slower than the theoretically optimal one, which never assigns to any program a testing time exceeding the program's probability times the total search time.

OOPS can solve tasks unsolvable by traditional reinforcement learners and AI planners, such as Towers of Hanoi with 30 disks (minimal solution size $> 10^9$). In our experiments OOPS demonstrates incremental learning by reusing previous solutions to discover a prefix that temporarily rewrites the distribution on its suffixes, such that Universal Search is accelerated by a factor of 1000. This illustrates the benefits of self-improvement and metasearching.
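The stated bound on the Hanoi solution size follows from the classic recurrence $moves(n) = 2\,moves(n-1) + 1 = 2^n - 1$, which for 30 disks indeed exceeds $10^9$:

```python
# Minimal-length solution to Towers of Hanoi with n disks: 2**n - 1 moves.
def hanoi_moves(n):
    return 2 ** n - 1

# hanoi_moves(30) = 2**30 - 1 = 1073741823, just over 10**9.
```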

We mention several OOPS variants as well as the self-improving Gödel machine [59] which is applicable to general reinforcement learning settings and may use OOPS as a proof-searching subroutine. Since OOPS will scale to larger problems in a way that is near-optimal in a certain sense, we also examine its physical limitations.

A reader familiar with concepts such as universal computers [72], nonincremental Universal Search (Levin, 1973, 1984), self-delimiting programs [31,7], and backtracking will probably understand the simple basic principles of OOPS just by carefully reading the present overview. In what follows we outline the remainder of this paper.

Juergen Schmidhuber 2004-04-15
