Overview of main results

In a certain sense to be specified below,
the Optimal Ordered Problem Solver (OOPS)
solves one task after another, always optimally exploiting solutions
to earlier tasks when possible. It can be used for increasingly hard
problems of optimization or prediction. The initial bias is given in the
form of a probability distribution P on programs for a universal
computer. An asymptotically fastest way of solving a *single* given task is *non*incremental Universal Search
(Levin, 1973, 1984--Levin also attributes similar ideas to Adleman):
in the i-th phase (i = 1, 2, 3, ...),
test all programs q
with runtime at most 2^i P(q), until the task is solved. Now suppose
there is an ordered *sequence* of tasks, e.g., the n-th task is to
find a path through a maze that is shorter than the best found so far.
To reduce the search time for new tasks, our previous *incremental*
extensions of Universal Search [65]
tried to modify P through experience
with earlier tasks--but in a heuristic, non-general, and suboptimal
way prone to overfitting.
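To make the idea of Universal Search concrete, here is a minimal sketch in Python. The helper names (`prob`, `run`, `solved`) and the toy task are our own illustrative assumptions, not the paper's implementation; the essential structure is the phase loop, where phase i grants each program q a runtime budget of 2^i P(q).

```python
def universal_search(programs, prob, run, solved, max_phase=20):
    """Illustrative sketch of Levin's nonincremental Universal Search.

    programs : iterable of candidate programs
    prob     : prob(q) -> a-priori probability P(q) of program q
    run      : run(q, steps) -> output of q after at most `steps` steps
    solved   : solved(output) -> True if the output solves the task
    """
    for i in range(1, max_phase + 1):        # phase i = 1, 2, 3, ...
        for q in programs:
            budget = int((2 ** i) * prob(q))  # runtime limit 2^i * P(q)
            if budget < 1:
                continue                      # too improbable for this phase
            if solved(run(q, budget)):
                return q
    return None                               # no solver found within max_phase
```

Because each phase doubles all budgets, the total time spent before a solver q* is found is dominated (up to a constant factor) by the last phase, which is why the method is asymptotically optimal up to the constant 1/P(q*).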

OOPS, however, does it right. It searches the space of self-delimiting programs [31,7] that are immediately executed while being generated: a program grows by one instruction whenever the execution of the program so far requests this. To solve the n-th task we sacrifice half the total search time for testing (through a variant of Universal Search) programs that have the most recent successful program as a prefix. The other half remains for testing fresh programs with arbitrary beginnings. When we are searching for a universal solver for all tasks in the sequence, we have to time-share the second half (but not the first!) among all tasks 1, ..., n.
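The time allocation described above can be summarized in a tiny sketch (our own illustrative framing, not the paper's code): half the budget extends the most recent successful prefix, and the other half, when we seek a universal solver, is shared among all tasks so far.

```python
def oops_budgets(total_time, n_tasks):
    """Sketch of OOPS' per-search time split for task n.

    Returns (prefix_half, fresh_share_per_task):
    - prefix_half: time for testing continuations of the most recent
      successful program (not shared among tasks).
    - fresh_share_per_task: the other half is spent on fresh programs
      with arbitrary beginnings, time-shared among tasks 1..n when
      searching for a universal solver.
    """
    prefix_half = total_time / 2.0
    fresh_share_per_task = (total_time / 2.0) / n_tasks
    return prefix_half, fresh_share_per_task
```

For example, with 100 time units and 5 tasks, 50 units go to prefix continuations and each task receives 10 units of fresh-program search.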

Storage for the first program found to compute a solution to the current task becomes non-writeable. Programs tested during the search for solutions to later tasks may copy non-writeable code into separate modifiable storage, edit it, and execute the modified result. Prefixes may also recompute the probability distribution on their suffixes in arbitrary computable ways, thus rewriting the search procedure on the suffixes. This allows for metalearning or metasearching, that is, searching for faster search procedures.
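The following toy sketch (entirely our own construction, with hypothetical names) illustrates the key mechanism: instructions executed in a prefix may replace the probability distribution under which the suffix is generated, so the prefix effectively reprograms the search over its own continuations.

```python
import random

def run_self_delimiting(prefix, suffix_dist, sample_len=5, seed=0):
    """Toy model of a prefix rewriting the bias on its suffix.

    prefix      : list of instructions; each maps the current suffix
                  distribution to a new one (possibly unchanged).
    suffix_dist : dict token -> weight, the initial bias on suffixes.
    Returns a suffix sampled under the (possibly rewritten) distribution.
    """
    rng = random.Random(seed)
    for instr in prefix:
        suffix_dist = instr(suffix_dist)   # instruction may edit the bias
    tokens, weights = zip(*suffix_dist.items())
    return [rng.choices(tokens, weights)[0] for _ in range(sample_len)]
```

A prefix whose single instruction concentrates all probability on one token forces every sampled suffix to consist of that token, which is the degenerate extreme of "rewriting the search procedure on the suffixes".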

For realistic limited computers we need efficient backtracking in program space to reset storage contents modified by tested programs. We introduce a recursive procedure for doing this in near-optimal fashion.
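One standard way to realize such resetting, sketched below with hypothetical names (the paper's actual recursive procedure is more elaborate), is to log every write so that all modifications made by a tested program can be undone in reverse order before the next candidate runs.

```python
class TapeWithUndo:
    """Minimal sketch of resettable storage for backtracking in program
    space: each write records the overwritten value, so a search can
    cheaply restore the tape to any earlier marked state."""

    def __init__(self, size):
        self.cells = [0] * size
        self.log = []                     # (address, previous value) pairs

    def write(self, addr, value):
        self.log.append((addr, self.cells[addr]))
        self.cells[addr] = value

    def mark(self):
        return len(self.log)              # snapshot of the log position

    def undo_to(self, mark):
        while len(self.log) > mark:       # pop writes in reverse order
            addr, old = self.log.pop()
            self.cells[addr] = old
```

The cost of a reset is proportional to the number of writes the tested program actually performed, not to the size of storage, which is what makes backtracking over many short program trials affordable.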

The method is essentially only 8 times slower than the theoretically optimal one (which never assigns to any program a testing time exceeding the program's probability times the total search time).

OOPS can solve tasks unsolvable by traditional reinforcement
learners and AI planners, such as *Towers of Hanoi* with 30 disks
(minimal solution size 2^30 - 1, more than 10^9 moves). In our experiments OOPS
demonstrates incremental learning by reusing previous solutions to
discover a prefix that temporarily rewrites the distribution on its
suffixes, such that Universal Search is accelerated by a factor of
1000. This illustrates the benefits of self-improvement
and metasearching.

We mention several OOPS variants as well as the self-improving Gödel machine [59] which is applicable to general reinforcement learning settings and may use OOPS as a proof-searching subroutine. Since OOPS will scale to larger problems in a way that is near-optimal in a certain sense, we also examine its physical limitations.

A reader familiar with concepts such as universal computers [72], nonincremental Universal Search (Levin, 1973, 1984), self-delimiting programs [31,7], and backtracking will probably grasp the basic principles of OOPS just by carefully reading the present overview. In what follows we outline the remainder of this paper.
