Previous work on the relationship between neural network complexity and generalization capability, e.g. Baum and Haussler (1989), Hinton (1993), Nowlan (1992), MacKay (1992), Hassibi (1993), Vapnik (1992), Maass (1994), is not general in the sense of Solomonoff, Kolmogorov, and Levin. Likewise, previous implementations use measures of ``simplicity'' that lack the universality and elegance of those based on Kolmogorov complexity and algorithmic information theory; many of these approaches rest on ad hoc (usually Gaussian) priors. For these reasons, most of the remainder of this paper is devoted to simulations of the more general method based on the universal prior, self-sizing programs, and a probabilistic search algorithm that prefers candidates with low Levin complexity over candidates with high Levin complexity. With certain toy problems, it will be demonstrated that the approach can achieve generalization results that more traditional neural net algorithms cannot match.
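The flavor of a search biased toward low Levin complexity can be conveyed with a minimal sketch. The two-instruction accumulator language below is purely illustrative (it is not the paper's actual program syntax or its self-sizing program representation): in phase k, every program p of length at most k receives a step budget of 2^(k - len(p)), so that short, fast programs are examined before long or slow ones, in the spirit of Levin's universal search.

```python
import itertools

def run(program, max_steps):
    """Toy interpreter: the accumulator starts at 1; '+' adds 1, '*' doubles.
    Each instruction costs one time step. Returns (output, steps used),
    or None if the step budget is exhausted before the program halts."""
    acc, steps = 1, 0
    for op in program:
        if steps >= max_steps:
            return None  # budget exhausted; program suspended
        acc = acc + 1 if op == '+' else acc * 2
        steps += 1
    return acc, steps

def levin_search(target, alphabet=('+', '*'), max_phase=20):
    """Levin-style search: in phase k, each program p of length <= k is run
    for at most 2**(k - len(p)) steps, so the total work per phase is on the
    order of 2**k and low-Levin-complexity candidates are preferred."""
    for k in range(1, max_phase + 1):
        for length in range(1, k + 1):
            budget = 2 ** (k - length)
            for program in itertools.product(alphabet, repeat=length):
                result = run(program, budget)
                if result is not None and result[0] == target:
                    return ''.join(program), k  # solution and phase found
    return None

# Find a short, fast program whose output equals 10.
print(levin_search(10))
```

The phase-doubling schedule means a solution of length L needing t steps is found after total work of roughly 2^(L + log2 t), i.e. exponential in its Levin complexity, regardless of how many slow or long programs exist elsewhere in the search space.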
In the following experiments, an average program runs for little more than 10 time steps before halting or being halted. But there are, of course, programs that run for millions of time steps.