Gödel Machine vs AIXI etc.

Next: Practical Issues Up: More Relations to Previous Previous: Gödel Machine vs OOPS-RL

Gödel Machine vs AIXI etc.

Unlike Gödel machines, Hutter's recent AIXI model [15] generally needs unlimited computational resources per input update. It combines Solomonoff's universal prediction scheme [46,47] with an expectimax computation. In discrete cycle $k=1,2,3, \ldots$ , action results in perception and reward , both sampled from the unknown (reactive) environmental probability distribution $\mu$ . AIXI defines a mixture distribution $\xi$ as a weighted sum of distributions $\nu \in \cal M$ , where $\cal M$ is any class of distributions that includes the true environment $\mu$ . For example, $\cal M$ may be a sum of all computable distributions [46,47], where the sum of the weights does not exceed 1. In cycle , AIXI selects as next action the first in an action sequence maximizing $\xi$ -predicted reward up to some given horizon. Recent work [17] demonstrated AIXI's optimal use of observations as follows. The Bayes-optimal policy $p^\xi$ based on the mixture $\xi$ is self-optimizing in the sense that its average utility value converges asymptotically for all $\mu \in \cal M$ to the optimal value achieved by the (infeasible) Bayes-optimal policy $p^\mu$ which knows $\mu$ in advance. The necessary condition that $\cal M$ admits self-optimizing policies is also sufficient. Furthermore, $p^\xi$ is Pareto-optimal in the sense that there is no other policy yielding higher or equal value in all environments $\nu \in \cal M$ and a strictly higher value in at least one [17].

While AIXI clarifies certain theoretical limits of machine learning, it is computationally intractable, especially when $\cal M$ includes all computable distributions. This drawback motivated work on the time-bounded, asymptotically optimal AIXI system [15] and the related HSEARCH [16], both already discussed in the introduction. Both methods could be used to seed the Gödel machine with an initial policy. Unlike AIXI and HSEARCH, however, the Gödel machine is not only asymptotically optimal but optimal relative to any given formal self-improvement criterion (which does not have to ignore constant slowdowns just because they are constant). It is defined this way. Unlike AIXI it does not have to make the unrealistic assumption that all events of the entire life are memorizable (which is actually a minor issue considering the other costs of AIXI).

While both HSEARCH and AIXI feature hardwired `meta-algorithms' that carefully allocate time to various subroutines in an unmodifiable way, the Gödel machine is self-referential and does not have any unchangeable software. It can replace the proof searcher itself by a faster, perhaps more focused or more domain-specific proof searcher in an online fashion, once the initial axioms together with experiences collected so far yield a proof there is one. It is precisely these self-referential aspects of the Gödel machine that relieve us of much of the burden of careful algorithm design--they make the Gödel machine both conceptually simpler and more general than HSEARCH.

Next: Practical Issues Up: More Relations to Previous Previous: Gödel Machine vs OOPS-RL

Juergen Schmidhuber 2003-09-29

Back to Goedel machine home page