Gödel Machine vs OOPS and OOPS-RL

The Optimal Ordered Problem Solver
OOPS [38,40]
(used by BIOPS in Section 2.3) is a
bias-optimal (see Def. 2.1)
way of searching for a
program that solves each problem in an ordered sequence
of problems of a reasonably general type,
continually organizing and managing and reusing earlier acquired knowledge.
Solomonoff recently also proposed related
ideas for a *scientist's assistant*
[50] that modifies the probability
distribution of universal search [22] based
on experience.

As pointed out earlier
[38]
(section on OOPS limitations),
however, OOPS-like methods are not directly applicable
to general lifelong reinforcement learning (RL) tasks
[18] such as those for
which AIXI [15] was designed.
The simple and natural but limited optimality notion
of OOPS is *bias-optimality* (Def. 2.1):
OOPS is a near-bias-optimal searcher for programs which
compute solutions that one can quickly verify
(costs of verification are taken into account).
For example, one can quickly test whether some
currently tested program has computed a solution to
the *towers of Hanoi* problem used in the earlier
paper [38]:
one just has to check whether
the third peg is full of disks.

But general RL tasks are harder. Here in principle the evaluation of the value of some behavior consumes the learner's entire life! That is, the naive test of whether a program is good or not would consume the entire life. That is, we could test only one program; afterwards life would be over.

So general RL machines need a more general notion of
optimality, and must do things that plain OOPS does
not do, such as predicting *future* tasks and rewards.
It is possible to use two OOPS -modules
as components of a rather general reinforcement learner (OOPS-RL),
one module learning a predictive model of
the environment, the other one using
this *world model* to search for an action sequence
maximizing expected reward
[38,42].
Despite the bias-optimality properties of OOPS for
certain ordered task sequences, however, OOPS-RL
is not necessarily the best way of spending limited computation
time in general RL situations.

A provably optimal RL machine
must somehow *prove* properties of otherwise un-testable behaviors
(such as: what is the expected reward of this behavior which
one cannot naively test as there is not enough time).
That is part of what the Gödel machine does: it tries
to greatly cut testing time, replacing naive time-consuming
tests by much faster proofs of predictable test outcomes
whenever this is possible.

Proof verification
itself can be performed very quickly. In particular,
verifying the correctness of
a found proof typically does not consume the remaining life.
Hence the Gödel machine may use OOPS as a bias-optimal proof-searching
submodule. Since the proofs themselves may concern quite
different, *arbitrary* notions of optimality (not just
bias-optimality), the Gödel machine is in principle more
general than plain OOPS.
But it is not just an extension of OOPS. Instead of OOPS it may
as well use non-bias-optimal alternative methods to initialize
its proof searcher.
On the other hand, OOPS is not just a precursor of the Gödel machine.
It is a stand-alone, incremental, bias-optimal way of allocating
runtime to programs that reuse previously successful
programs, and is applicable to many traditional
problems, including but not limited to proof search.

Back to Goedel machine home page