Complexity and High Generalization Capability

**Juergen Schmidhuber**
IDSIA ^{1}
juergen@idsia.ch

Many machine learning algorithms aim at
finding "simple" rules to explain
training data. The expectation is: the "simpler" the rules,
the better the generalization on test data (Occam's razor).
Most practical implementations, however,
use measures for "simplicity" that lack the power,
universality and elegance of those based on Kolmogorov complexity
and Solomonoff's algorithmic probability.
Likewise, most previous approaches
(especially those of the "Bayesian" kind)
suffer from the problem of choosing appropriate priors.
This paper addresses both issues.
It first reviews some basic concepts of algorithmic
complexity theory relevant to machine learning, and explains
how the Solomonoff-Levin distribution (or universal
prior) deals with the prior problem. The universal prior leads to
a probabilistic method for finding "algorithmically
simple" problem solutions with high generalization capability.
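The standard definitions behind Kolmogorov complexity and the universal prior can be stated compactly. Here U is assumed to be a universal prefix (self-delimiting) machine and l(p) the length of program p in bits. This notation is ours and not necessarily the one the paper uses later.

```latex
% U: universal prefix machine; l(p): length of program p in bits
K(x) = \min_{p \,:\, U(p) = x} l(p), \qquad
P_U(x) = \sum_{p \,:\, U(p) = x} 2^{-l(p)} \;\ge\; 2^{-K(x)}
```

The inequality holds because the shortest program for x is one of the summands, so objects with short descriptions receive high prior probability. The programs of a prefix machine form a prefix-free set, so the Kraft inequality keeps the total mass of P_U at most 1.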
The method is based on Levin complexity (a time-bounded extension of
Kolmogorov complexity) and inspired by Levin's optimal
universal search algorithm.
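Levin complexity charges a program for its runtime as well as its length: Kt(x) = min_p {l(p) + log_2 t(p)}, where t(p) is the number of steps program p needs to output x. Levin's universal search exploits this trade-off. In phase i = 0, 1, 2, ..., every program p with l(p) <= i runs for at most 2^(i - l(p)) steps. The Python code below is a minimal sketch under stated assumptions. The 2-bit instruction set and the helper names `run` and `levin_search` are hypothetical. Unlike a proper universal search over a prefix-free program set, it enumerates all bit strings up to length i.

```python
from itertools import product
from math import log2


def run(program, max_steps):
    """Toy interpreter for a HYPOTHETICAL 2-bit instruction set.

    00: acc += 1    01: acc *= 2    10: acc = acc * acc (costs max(acc, 1) steps)
    11: no-op
    Returns (output, steps), or None if the program is invalid or would
    need more than max_steps steps.
    """
    if len(program) % 2:
        return None  # odd-length bit strings are not valid programs here
    acc, steps = 0, 0
    for k in range(0, len(program), 2):
        op = program[k:k + 2]
        steps += max(acc, 1) if op == "10" else 1
        if steps > max_steps:
            return None  # out of time: abandon this program in this phase
        if op == "00":
            acc += 1
        elif op == "01":
            acc *= 2
        elif op == "10":
            acc = acc * acc
    return acc, max(steps, 1)  # count the empty program as one step


def levin_search(target, max_phase=20):
    """Phase i gives every program p with l(p) <= i a budget of
    2**(i - l(p)) steps. Returns (program, l(p) + log2 t(p)) for the
    first program that outputs target, or None."""
    for i in range(max_phase + 1):
        for length in range(i + 1):
            budget = 2 ** (i - length)
            for bits in product("01", repeat=length):
                program = "".join(bits)
                result = run(program, budget)
                if result is not None and result[0] == target:
                    return program, length + log2(result[1])
    return None
```

For example, `levin_search(4)` returns the 6-bit program `"000001"` (INC, INC, DBL) with cost 6 + log2 3. A program p can succeed only in a phase i >= l(p) + log2 t(p). The first hit therefore costs less than one bit more than the smallest l(p) + log2 t(p) achievable in this toy language.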
For a given problem, solution candidates are computed by
efficient "self-sizing" programs
that influence their own runtime and storage size.
The probabilistic search algorithm finds
the "good" programs (the ones quickly computing
algorithmically probable solutions fitting the training data).
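A probabilistic variant draws programs at random instead of enumerating them systematically. The code below is a minimal sketch that assumes a hypothetical self-delimiting instruction set in which "11" means HALT. Drawing instructions by fair coin flips until HALT yields a program of l bits with probability 2^-l. Each sampled program then gets time proportional to that probability. The phase schedule, the number of samples per phase, and all names (`sample_program`, `execute`, `probabilistic_search`) are illustrative assumptions, not the exact procedure of the paper.

```python
import random

# HYPOTHETICAL self-delimiting instruction set, 2 bits per instruction.
# The accumulator starts at the input x; "11" means HALT.
OPS = {"00": lambda a: a + 1, "01": lambda a: 2 * a, "10": lambda a: a - 1}


def sample_program(rng, max_instr=16):
    """Draw instructions by fair coin flips until HALT, so a program of
    l bits is drawn with probability 2**-l. Returns None if no HALT is
    drawn within max_instr instructions."""
    bits = ""
    for _ in range(max_instr):
        instr = rng.choice("01") + rng.choice("01")
        bits += instr
        if instr == "11":
            return bits
    return None


def execute(program, x, budget):
    """Run program on input x. Returns the result, or None if it needs
    more than `budget` steps (one step per non-HALT instruction)."""
    acc = x
    body = program[:-2]  # drop the final HALT
    for steps, k in enumerate(range(0, len(body), 2), start=1):
        if steps > budget:
            return None
        acc = OPS[body[k:k + 2]](acc)
    return acc


def probabilistic_search(train, seed=0, phases=30, samples_per_phase=1000):
    """In phase i (time scale T = 2**i), each sampled program p gets
    T * 2**-l(p) steps per training example. Returns the first program
    that reproduces every (x, y) pair in `train`, or None."""
    rng = random.Random(seed)
    for i in range(phases):
        T = 2 ** i
        for _ in range(samples_per_phase):
            p = sample_program(rng)
            if p is None:
                continue
            budget = T * 2.0 ** -len(p)  # time share proportional to P(p)
            if all(execute(p, x, budget) == y for x, y in train):
                return p
    return None
```

For instance, `probabilistic_search([(1, 3), (2, 5), (3, 7)])` returns a program that also maps the unseen input 10 to 21. Every program in this toy language computes a function of the form 2^k x + b. Any program that fits two distinct training pairs therefore computes 2x + 1 everywhere.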
Experiments focus on the task of discovering
"algorithmically simple" neural networks with low Kolmogorov
complexity and high generalization
capability. These experiments demonstrate that the
method, at least with certain
toy problems where it is computationally feasible, can lead to
generalization results unmatchable by
previous neural net algorithms.

- INTRODUCTION
- BASIC CONCEPTS
- PROBABILISTIC SEARCH
- "SIMPLE" NEURAL NETS
- INCREMENTAL LEARNING
- ACKNOWLEDGEMENTS
- Bibliography
