Gödel Machine vs AIXI etc.

Unlike Gödel machines,
Hutter's recent *AIXI model* [15]
generally needs *unlimited* computational resources per input update.
It combines Solomonoff's universal prediction
scheme [47,48] with an *expectimax*
computation. In discrete cycle k = 1, 2, 3, ..., action y(k) results
in perception x(k) and reward r(k), both
sampled from the unknown
(reactive) environmental probability distribution μ. AIXI defines
a mixture distribution ξ as a weighted sum of distributions
ν ∈ M, where M is any class of distributions that includes the
true environment μ. For example, ξ may
be a sum of all computable
distributions [47,48], where the sum of
the weights does not exceed 1. In cycle k+1, AIXI
selects as next action the first in an action sequence maximizing
ξ-predicted reward up to some given horizon.
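This selection rule can be illustrated with a toy sketch. It is not Hutter's full expectimax, which interleaves maximization over actions with expectation over percepts; here the sequence is fixed in advance, and all names (M, HORIZON, xi_expected_reward, the bandit environments) are illustrative assumptions, not anything from AIXI itself:

```python
# Toy sketch: action selection by maximizing xi-predicted reward up to a
# horizon, where xi is a weighted sum over a tiny hypothesis class M of
# two-armed Bernoulli bandit environments. Purely illustrative.

from itertools import product

# Each "environment" nu maps an action to a reward probability.
M = [
    {"A": 0.9, "B": 0.1},  # nu_1: action A is usually rewarded
    {"A": 0.2, "B": 0.8},  # nu_2: action B is usually rewarded
]
weights = [0.5, 0.5]       # prior weights w_nu; their sum must not exceed 1

HORIZON = 3
ACTIONS = ["A", "B"]

def xi_expected_reward(actions):
    """xi-predicted total reward of a fixed action sequence:
    a weighted sum of each environment's expected reward."""
    return sum(w * sum(nu[a] for a in actions)
               for w, nu in zip(weights, M))

# AIXI-style choice: take the first action of a sequence maximizing
# xi-predicted reward up to the horizon (brute force over sequences).
best_seq = max(product(ACTIONS, repeat=HORIZON), key=xi_expected_reward)
print("next action:", best_seq[0])
```

With these particular weights and environments, sequences of all-A actions dominate, so the first action chosen is "A".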
Recent work [17] demonstrated AIXI's optimal
use of observations as follows. The Bayes-optimal policy p^ξ based on
the mixture ξ is self-optimizing in the sense that its average
utility value converges asymptotically for all μ ∈ M to the
optimal value achieved by the (infeasible) Bayes-optimal policy p^μ
which knows μ in advance. The necessary condition that M
admits self-optimizing policies is also sufficient.
Furthermore, p^ξ is Pareto-optimal
in the sense that there is no other policy yielding higher or equal
value in *all* environments ν ∈ M
and a strictly higher
value in at least one [17].
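The self-optimizing property rests on the posterior weight of the true environment coming to dominate the mixture, so that ξ-predictions approach μ-predictions. A minimal numerical sketch, assuming a two-element class M of Bernoulli reward rates (all names and numbers are illustrative assumptions):

```python
# Hedged illustration: under data drawn from the true environment mu,
# Bayes updating concentrates the mixture's weight on mu.
import random

random.seed(0)
M = [0.8, 0.3]          # two Bernoulli reward rates; M[0] plays the role of mu
weights = [0.5, 0.5]    # prior weights of the mixture xi

for _ in range(500):
    obs = 1 if random.random() < M[0] else 0       # observation from true mu
    likes = [p if obs else 1 - p for p in M]       # likelihood under each nu
    weights = [w * l for w, l in zip(weights, likes)]
    total = sum(weights)
    weights = [w / total for w in weights]         # normalized posterior

print(round(weights[0], 3))  # posterior mass on the true environment
```

After a few hundred observations the posterior mass on the true environment is essentially 1, which is the mechanism behind the asymptotic convergence stated above.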

While AIXI clarifies certain
theoretical limits of machine learning, it
is computationally intractable, especially when M
includes all computable
distributions. This drawback
motivated work on the time-bounded, asymptotically optimal
AIXI(t,l) system [15]
and the related HSEARCH [16], both
already discussed in the introduction. Both methods could be used
to seed the Gödel machine with an initial policy.
Unlike AIXI and HSEARCH, however, the Gödel machine
is not only *asymptotically* optimal but optimal relative to
any given
formal self-improvement criterion (which does not have to ignore
constant slowdowns just because they are constant):
the Gödel machine is optimal simply because it is defined that way.
Unlike AIXI, it does not have to make the unrealistic assumption
that all events of its entire life are memorizable (actually a
minor issue compared to AIXI's other costs).
Moreover, the Gödel machine requires neither AIXI's assumption of an
enumerable environmental distribution nor AIXI's particular utility
function; we may plug in other types of utility functions as well.

While both HSEARCH and AIXI feature hardwired `meta-algorithms' that
carefully allocate time to various subroutines in an unmodifiable
way, the Gödel machine is *self-referential*
and does not have any unchangeable software.
It can replace the proof searcher
itself by a faster, perhaps more focused or more
domain-specific proof searcher in an online fashion,
once the initial axioms together with the
experiences collected so far yield a proof
that the replacement is useful.
It is precisely these *self-referential* aspects
of the Gödel machine that relieve us of much of the burden of careful
algorithm design--they make the Gödel machine both
conceptually simpler *and* more
general than HSEARCH.
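The proof-guarded self-replacement pattern can be sketched in miniature. This is a deliberately degenerate toy, not the Gödel machine's actual proof search: we assume a finite, fully known world so that "proving" a rewrite preserves utility reduces to exhaustive checking, and every name (WORLD, current_searcher, provably_not_worse) is hypothetical:

```python
# Toy sketch of the guard described above: a component is replaced online
# only once a "proof" shows the replacement preserves all relevant behavior.

WORLD = [0, 1, 2, 3]                    # all possible inputs (finite model)

def current_searcher(x):                # stand-in for the current component
    return x * x

TABLE = {x: x * x for x in WORLD}       # precomputed: the "faster" rewrite

def candidate_searcher(x):
    return TABLE[x]

def provably_not_worse(f, g):
    """'Proof' by exhaustive check over the finite model: g agrees with f
    on every input, so swapping cannot lose utility here."""
    return all(f(x) == g(x) for x in WORLD)

searcher = current_searcher
if provably_not_worse(current_searcher, candidate_searcher):
    searcher = candidate_searcher       # self-replacement, proof in hand

print(searcher is candidate_searcher)   # -> True
```

The point of the sketch is only the control flow: no hardwired meta-algorithm decides when to switch; the switch fires exactly when the check (in the real machine, a formal proof of improved expected utility) succeeds.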
