Next: BASIC CONCEPTS Up: Discovering Solutions with Low Previous: Discovering Solutions with Low

INTRODUCTION

The first number is 2. The second number is 4. The third number is 6. The fourth number is 8. What is the fifth number? The answer is 34. The reason is the following law. The

th number is

$\begin{displaymath} n^4 -10n^3 + 35n^2 -48n + 24. \end{displaymath}$

But an IQ test requires you to answer ``10'' instead of ``34''. Why not ``34''? The reasons are: (1) ``simple'' solutions are preferred over ``complex'' ones. This idea is often referred to as ``Occam's razor''. (2) It is assumed that the ``simpler'' the rules, the better the generalization on test data. (3) The makers of the IQ test assume that everybody agrees on what ``simple'' means.

Similarly, many researchers agree that learning algorithms ought to extract ``simple'' rules to explain training data. But what exactly does ``simple'' mean? The only theory providing a convincing objective criterion for ``simplicity'' is the theory of Kolmogorov complexity (or algorithmic complexity). Contrary to a popular myth, the incomputability of Kolmogorov complexity (due to the halting problem) does not prevent machine learning applications, because there are tractable yet very general extensions of Kolmogorov complexity. Few machine learning researchers, however, make use of the powerful tools provided by the theory (see Li and Vitanyi (1993) for an excellent overview, see Schmidhuber (1994c) for an application to fine arts).

Purpose of paper. This work and the experiments to be presented herein are intended (1) to demonstrate that basic concepts from the theory of Kolmogorov complexity are indeed of interest for machine learning purposes, (2) to encourage machine learning researchers to study this theory, (3) to point to problems concerning ``incremental learning'', and (4) to mention initial steps towards solving them.

Outline. Section 2 briefly reviews basic concepts of algorithmic complexity theory relevant to machine learning, including Levin complexity (an extension of Kolmogorov complexity) and Levin's universal optimal search algorithm. For a broad class of problems, universal search can be shown to be optimal with respect to total expected search time, leaving aside a constant factor which does not depend on the problem. To my knowledge, section 3 presents the first general (working) implementation of (a probabilistic variant of) universal search. Experiments in section 4 focus on the task of finding ``simple'' neural nets with excellent generalization capability. Section 5 goes beyond sections 1-4, by addressing ``incremental'' learning situations.

Next: BASIC CONCEPTS Up: Discovering Solutions with Low Previous: Discovering Solutions with Low

Juergen Schmidhuber 2003-02-25

Back to Optimal Universal Search page
Back to Program Evolution page
Back to Algorithmic Information page
Back to Speed Prior page