Next: Gödel Machine vs OOPS-RL Up: More Relations to Previous Previous: More Relations to Previous


Gödel Machine vs Success-Story Algorithm and Other Metalearners

A learner's modifiable components are called its policy. An algorithm that modifies the policy is a learning algorithm. If the learning algorithm has modifiable components represented as part of the policy, then we speak of a self-modifying policy (SMP) [44]. An SMP can modify the way it modifies itself, and so on. The Gödel machine has an SMP.

In previous work we used the success-story algorithm (SSA) to force some (stochastic) SMP to trigger better and better self-modifications [32,45,44,46]. During the learner's lifetime, SSA is occasionally called at times computed by the SMP itself. SSA uses backtracking to undo those SMP-generated SMP-modifications that have not been empirically observed to trigger lifelong reward accelerations. Reward is measured up to the current SSA call, so the evaluation also covers the long-term effects of SMP-modifications that set the stage for later SMP-modifications. SMP-modifications that survive SSA represent a lifelong success history. Until the next SSA call, they form the basis for additional SMP-modifications. Solely through self-modifications, our SMP/SSA-based learners solved a complex task in a partially observable environment whose state space is far bigger than most found in the literature [44].
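
The backtracking step of an SSA call can be illustrated with a minimal Python sketch. It is not the implementation from [44]; it assumes one common reading of "lifelong reward acceleration": a modification survives only if the reward per time unit since it was made exceeds the reward per time unit since the preceding surviving modification (or since the start of life, for the oldest one). The names `Checkpoint`, `reward_rate`, and `ssa_call` are hypothetical, and the policy is assumed to be a dictionary with a fixed set of keys.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Checkpoint:
    """Hypothetical record pushed on a stack whenever the SMP modifies itself."""
    time: float            # time at which the modification was made
    reward: float          # cumulative reward collected up to that time
    saved: Dict[str, Any]  # old values of the policy components it overwrote


def reward_rate(now: float, total: float, since_time: float, since_reward: float) -> float:
    """Average reward per time unit between since_time and now."""
    return (total - since_reward) / (now - since_time)


def ssa_call(policy: Dict[str, Any], stack: List[Checkpoint],
             now: float, total_reward: float) -> int:
    """Undo the most recent modifications until the top one shows a speed-up.

    Assumes now > 0 and now is strictly later than every checkpoint time.
    Returns the number of modifications that survive the call.
    """
    while stack:
        top = stack[-1]
        # Compare against the previous surviving checkpoint, or the start of life.
        if len(stack) > 1:
            prev_time, prev_reward = stack[-2].time, stack[-2].reward
        else:
            prev_time, prev_reward = 0.0, 0.0
        rate_since_top = reward_rate(now, total_reward, top.time, top.reward)
        rate_since_prev = reward_rate(now, total_reward, prev_time, prev_reward)
        if rate_since_top > rate_since_prev:
            break  # top modification is followed by faster reward intake; keep it
        policy.update(top.saved)  # restore the overwritten components
        stack.pop()
    return len(stack)
```

In this sketch the surviving stack entries play the role of the success history: each one marks a point after which reward arrived faster than after the entry below it. Because rates are always measured up to the current call, a modification that survived earlier calls can still be undone later.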

The Gödel machine's training algorithm is theoretically more powerful than SSA, though. SSA measures the usefulness of previous self-modifications empirically and does not necessarily encourage provably optimal ones. Similar drawbacks hold for Lenat's human-assisted, non-autonomous, self-modifying learner [21], our Meta-Genetic Programming [29] extending Cramer's Genetic Programming [8,1], our metalearning economies [29] extending Holland's machine learning economies [14], and gradient-based metalearners for continuous program spaces of differentiable recurrent neural networks [31,12]. All these methods, however, could be used to seed $p(1)$ with an initial policy.


Juergen Schmidhuber 2003-10-28
