        .
 Schmidhuber's SNF project 20-61847: Unification of universal induction and sequential decision theory (Hutter)
. ## THEORY OF UNIVERSAL LEARNING MACHINES & UNIVERSAL AI

.
 Top: from the cover of the book. Click here for a less general but practically highly feasible method for predicting financial data. German home
 Physicists and economists and other inductive scientists make predictions based on observations. So does everybody in daily life. Did you know that there is a theoretically optimal way of predicting? Every scientist should know about it. Normally we do not know the true conditional probability distribution p(next event | past). But assume we do know that p is in some set P of distributions. Choose a fixed weight w_q for each q in P such that the w_q add up to 1 (for simplicity, let P be countable). Then construct the Bayesmix M(x) = Sum_q w_q q(x), and predict using M instead of the optimal but unknown p. How wrong is it to do that? The recent exciting work of Marcus Hutter (funded through Juergen Schmidhuber's SNF research grant "Unification of Universal Induction and Sequential Decision Theory") provides general and sharp loss bounds: Let LM(n) and Lp(n) be the total expected losses of the M-predictor and the p-predictor, respectively, for the first n events. Then LM(n)-Lp(n) is at most of the order of sqrt[Lp(n)]. That is, M is not much worse than p. And in general, no other predictor can do better than that! In particular, if p is deterministic, then the M-predictor soon won't make any errors any more! If P contains ALL computable distributions, then M becomes the celebrated enumerable universal prior. That is, after decades of somewhat stagnating research we now have sharp loss bounds for Ray Solomonoff's universal (but incomputable) induction scheme (1964, 1978). Alternatively, reduce M to what you get if you just add up weighted estimated future finance data probabilities generated by 1000 commercial stock-market prediction software packages. If only one of them happens to work fine (but you do not know which) you still should get rich. Note that the approach is much more general than what is normally done in traditional statistical learning theory, where the often quite unrealistic assumption is that the observations are statistically independent.
 Here are pointers to Marcus's important new theorems: Convergence and Loss Bounds for Bayesian Sequence Prediction (pdf). IEEE Transactions on Information Theory, 49:8 (2003) 2061-2067. Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet (pdf). Journal of Machine Learning Research 4, p 971-1000, 2003. General Loss Bounds for Universal Sequence Prediction Proceedings of the 18th International Conference on Machine Learning (ICML-2001) 210-217, cs.AI/0101019 New Error Bounds for Solomonoff Prediction Journal of Computer and System Science, 62:4 (2001) 653-667, cs.AI/9912008 Cumulatively enumerable measure: Instead of using Solomonoff's universal enumerable measure we should probably use the even more dominant cumulatively universal enumerable measure, without loss of computability in the limit: J. Schmidhuber. Hierarchies of generalized Kolmogorov complexities and nonenumerable universal measures computable in the limit. International Journal of Foundations of Computer Science 13(4):587-612 (2002). HTML overview.
. SPEED PRIOR So Solomonoff's strategy is optimal but noncomputable. Let us make the more radical assumption that the data is generated by a computational process, and that the cumulative prior probability of all data whose computation costs at least O(n) time is inversely proportional to n. Since in fact there exists a simple, general, asymptotically optimal algorithm for all computable data, we can explicitly extract the corresponding Speed Prior, and derive a computable strategy for optimal inductive reasoning. Details can be found in Section 6 of Algorithmic Theories of Everything as well as in the reference to the right:
 J. Schmidhuber. The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions. In J. Kivinen and R. H. Sloan, editors, Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), Sydney, Australia, Lecture Notes in Artificial Intelligence, pages 216-228. Springer, 2002. PDF. HTML overview.
. Related links: 1. TU Munich Cogbotlab 2. All Publications 3. Gödel machines 4. OOPS 5. Artificial intelligence 6. Statistical robotics 7. Reinforcement learning 8. Metalearning 9. Active learning
 UNIVERSAL REINFORCEMENT LEARNERS AND DECISION MAKERS Once we have an optimal predictor, in principle we also should have an optimal decision maker or reinforcement learner, always picking those action sequences with the highest predicted success - a universal AI! So we want to generalize Solomonoff's passive case to the case of active universal reinforcement learning machines that perform at least as well as any other learner when it comes to maximizing expected reward in arbitrary unknown environments. This is exactly what the AIXI model achieves (by Marcus Hutter funded through Schmidhuber's grant "Unification of Universal Induction and Sequential Decision Theory"): An optimal predictor using a universal Bayesmix M predicts future events including reward. Here M again is just a weighted sum of all distributions nu in a set P. AIXI now simply selects those action sequences that maximize predicted reward. It turns out that this method really is self-optimizing in the following sense: for all nu in the mix, the average value of actions, given the history, asymptotically converges to the optimal value achieved by the unknown policy which knows the true mu in advance! The necessary condition is that P does admit self-optimizing policies. This is also sufficient! And there is no other policy yielding higher or equal value in all environments nu and a strictly higher value in at least one. Here are pointers to Hutter's results: Towards a Universal Theory of Artificial Intelligence based on Algorithmic Probability and Sequential Decisions Proceedings of the 12th Eurpean Conference on Machine Learning (ECML-2001), p. 226-238. Self-optimizing and Pareto-optimal policies in general environments based on Bayes-mixtures. In J. Kivinen and R. H. Sloan, editors, Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), Sydney, Australia, Lecture Notes in Artificial Intelligence, p. 364-379. Springer, 2002. The book: Universal Artificial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, Berlin, 2004. HTML. Interestingly, the right way of treating the temporal horizon is not to discount it exponentially, as done in most traditional work on reinforcement learning, but to let the future horizon grow in proportion to the learner's lifetime so far. To quote some referees: "[...] Clearly fundamental and potentially interesting research direction with practical applications. [...] Great theory. Extends a major theoretical direction that led to practical MDL and MML. This approach may do the same thing (similar thing) wrt to decision theory and reinforcement learning, to name a few." "[...] this could be the foundation of a theory which might inspire AI and MC for years (decades?)" Unfortunately, general AIXI is incomputable. So we are not quite finished yet with the grand problem of Artificial Intelligence. This motivates Schmidhuber's recent work on the Gödel machine which may use the Optimal Ordered Problem Solver for searching provably optimal self-improvements, given limited resources, to obtain an optimal, general reinforcement learner.
.