Section 2 will survey previous relevant work on general optimal search algorithms and introduce the concept of bias-optimality. Section 3 will use the framework of universal computers to explain OOPS and how OOPS benefits from incrementally extracting useful knowledge hidden in training sequences and in which sense OOPS is optimal. The remainder of the paper is devoted to ``Realistic OOPS'' which uses a recursive procedure for time-optimal planning and backtracking in program space to perform efficient storage management (Section 4) on realistic, limited computers. Section 5 discusses limitations of OOPS as well as extensions for reinforcement learning. Appendix A describes a pilot implementation of Realistic OOPS based on a stack-based universal programming language inspired by FORTH , with initial primitives for defining and calling recursive functions, iterative loops, arithmetic operations, domain-specific behavior, and for rewriting the search procedure itself.
Experiments in Section 6 use the language of Appendix A to solve 60 tasks in a row: we first teach OOPS something about recursion, by training it to construct samples of the simple context free language ( 1's followed by 2's), for up to 30. This takes roughly 0.3 days on a standard personal computer (PC). Thereafter, within a few additional days, OOPS demonstrates the benefits of incremental knowledge transfer: it exploits certain properties of its previously discovered universal -solver to greatly accelerate the search for a universal solver for all disk Towers of Hanoi problems, solving all instances up to (solution size ). Previous, less general reinforcement learners and nonlearning AI planners tend to fail for much smaller instances.