Gödel Machine vs OOPS and OOPS-RL

Next: Gödel Machine vs AIXI Up: More Relations to Previous Previous: Gödel Machine vs Success-Story

Gödel Machine vs OOPS and OOPS-RL

The Optimal Ordered Problem Solver OOPS [38,40] (used by BIOPS in Section 2.3) is a bias-optimal (see Def. 2.1) way of searching for a program that solves each problem in an ordered sequence of problems of a reasonably general type, continually organizing and managing and reusing earlier acquired knowledge. Solomonoff recently also proposed related ideas for a scientist's assistant [50] that modifies the probability distribution of universal search [22] based on experience.

As pointed out earlier [38] (section on OOPS limitations), however, OOPS-like methods are not directly applicable to general lifelong reinforcement learning (RL) tasks [18] such as those for which AIXI [15] was designed. The simple and natural but limited optimality notion of OOPS is bias-optimality (Def. 2.1): OOPS is a near-bias-optimal searcher for programs which compute solutions that one can quickly verify (costs of verification are taken into account). For example, one can quickly test whether some currently tested program has computed a solution to the towers of Hanoi problem used in the earlier paper [38]: one just has to check whether the third peg is full of disks.

But general RL tasks are harder. Here in principle the evaluation of the value of some behavior consumes the learner's entire life! That is, the naive test of whether a program is good or not would consume the entire life. That is, we could test only one program; afterwards life would be over.

So general RL machines need a more general notion of optimality, and must do things that plain OOPS does not do, such as predicting future tasks and rewards. It is possible to use two OOPS -modules as components of a rather general reinforcement learner (OOPS-RL), one module learning a predictive model of the environment, the other one using this world model to search for an action sequence maximizing expected reward [38,42]. Despite the bias-optimality properties of OOPS for certain ordered task sequences, however, OOPS-RL is not necessarily the best way of spending limited computation time in general RL situations.

A provably optimal RL machine must somehow prove properties of otherwise un-testable behaviors (such as: what is the expected reward of this behavior which one cannot naively test as there is not enough time). That is part of what the Gödel machine does: it tries to greatly cut testing time, replacing naive time-consuming tests by much faster proofs of predictable test outcomes whenever this is possible.

Proof verification itself can be performed very quickly. In particular, verifying the correctness of a found proof typically does not consume the remaining life. Hence the Gödel machine may use OOPS as a bias-optimal proof-searching submodule. Since the proofs themselves may concern quite different, arbitrary notions of optimality (not just bias-optimality), the Gödel machine is in principle more general than plain OOPS. But it is not just an extension of OOPS. Instead of OOPS it may as well use non-bias-optimal alternative methods to initialize its proof searcher. On the other hand, OOPS is not just a precursor of the Gödel machine. It is a stand-alone, incremental, bias-optimal way of allocating runtime to programs that reuse previously successful programs, and is applicable to many traditional problems, including but not limited to proof search.

Next: Gödel Machine vs AIXI Up: More Relations to Previous Previous: Gödel Machine vs Success-Story

Juergen Schmidhuber 2003-12-11

Back to Goedel machine home page