UNIVERSAL LEARNING MACHINES - OPTIMAL UNIVERSAL AI

Schmidhuber's SNF project 20-61847: Unification of universal induction and sequential decision theory (Hutter)

THEORY OF UNIVERSAL
LEARNING MACHINES
& UNIVERSAL AI

Top: from the cover of the book.

Click here for a less general but practically highly feasible method for predicting financial data.

German home

Physicists and economists and other inductive scientists make predictions based on observations. So does everybody in daily life.

Did you know that there is a theoretically optimal way of predicting? Every scientist should know about it.

Normally we do not know the true conditional probability distribution p(next event | past). But assume we do know that p is in some set P of distributions. Choose a fixed weight w_q for each q in P such that the w_q add up to 1 (for simplicity, let P be countable). Then construct the Bayesmix M(x) = Sum_q w_q q(x), and predict using M instead of the optimal but unknown p.

How wrong is it to do that? The recent exciting work of Marcus Hutter (funded through Juergen Schmidhuber's SNF research grant "Unification of Universal Induction and Sequential Decision Theory") provides general and sharp loss bounds:

Let LM(n) and Lp(n) be the total expected losses of the M-predictor and the p-predictor, respectively, for the first n events. Then LM(n)-Lp(n) is at most of the order of sqrt[Lp(n)]. That is, M is not much worse than p. And in general, no other predictor can do better than that!

In particular, if p is deterministic, then the M-predictor soon won't make any errors any more!

If P contains ALL computable distributions, then M becomes the celebrated enumerable universal prior. That is, after decades of somewhat stagnating research we now have sharp loss bounds for Ray Solomonoff's universal (but incomputable) induction scheme (1964, 1978).

Alternatively, reduce M to what you get if you just add up weighted estimated future finance data probabilities generated by 1000 commercial stock-market prediction software packages. If only one of them happens to work fine (but you do not know which) you still should get rich.

Note that the approach is much more general than what is normally done in traditional statistical learning theory, where the often quite unrealistic assumption is that the observations are statistically independent.

Here are pointers to Marcus's important new theorems:

Convergence and Loss Bounds for Bayesian Sequence Prediction (pdf). IEEE Transactions on Information Theory, 49:8 (2003) 2061-2067.

Optimality of Universal Bayesian Sequence Prediction for General Loss and Alphabet (pdf).
Journal of Machine Learning Research 4, p 971-1000, 2003.

General Loss Bounds for Universal Sequence Prediction
Proceedings of the 18th International Conference on Machine Learning (ICML-2001) 210-217, cs.AI/0101019

New Error Bounds for Solomonoff Prediction
Journal of Computer and System Science, 62:4 (2001) 653-667, cs.AI/9912008

Cumulatively enumerable measure:
Instead of using Solomonoff's universal enumerable measure we should probably use the even more dominant cumulatively universal enumerable measure, without loss of computability in the limit:

J. Schmidhuber. Hierarchies of generalized Kolmogorov complexities and nonenumerable universal measures computable in the limit. International Journal of Foundations of Computer Science 13(4):587-612 (2002). HTML overview.

SPEED PRIOR

So Solomonoff's strategy is optimal but noncomputable. Let us make the more radical assumption that the data is generated by a computational process, and that the cumulative prior probability of all data whose computation costs at least O(n) time is inversely proportional to n. Since in fact there exists a simple, general, asymptotically optimal algorithm for all computable data, we can explicitly extract the corresponding Speed Prior, and derive a computable strategy for optimal inductive reasoning. Details can be found in Section 6 of Algorithmic Theories of Everything as well as in the reference to the right:

J. Schmidhuber. The Speed Prior: A New Simplicity Measure Yielding Near-Optimal Computable Predictions. In J. Kivinen and R. H. Sloan, editors, Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), Sydney, Australia, Lecture Notes in Artificial Intelligence, pages 216-228. Springer, 2002. PDF. HTML overview.

THEORY OF UNIVERSAL LEARNING MACHINES & UNIVERSAL AI

THEORY OF UNIVERSAL
LEARNING MACHINES
& UNIVERSAL AI