INTRODUCTION

**Basic idea of flat minima search**.
Our algorithm
finds a large region in weight space with the property
that each weight vector from that region has *similarly* small error.
Such regions are called ``flat minima''.
To get an intuitive feeling for why ``flat'' minima
are interesting, consider this (see also Wolpert [14]):
a ``sharp'' minimum corresponds to weights which have
to be specified with high precision.
A ``flat'' minimum corresponds to weights many of which can
be given with low precision.
In the terminology of the theory of minimum description length (MDL),
fewer bits of information are required to pick a
``flat'' minimum (corresponding to a ``simple'', low-complexity network).
The MDL principle suggests that low network complexity
corresponds to high generalization performance
(see e.g. [4,10]).
Unlike Hinton and van Camp's method [3]
(see appendix A.3),
our approach does not depend on explicitly choosing
a ``good'' prior.
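The precision argument can be made concrete with a toy example (not the paper's algorithm): at a quadratic minimum with curvature `a`, the loss stays below a tolerance `eps` as long as the weight is within `sqrt(eps/a)` of the minimum, so a flatter minimum (smaller `a`) tolerates larger perturbations and needs fewer bits to specify. The functions, the curvature values, and the representable weight range below are all illustrative assumptions.

```python
import math

# Toy illustration: two quadratic minima with the same (zero) error
# but different curvature a, i.e. loss(w) = a * (w - w_min)**2.
def max_tolerable_perturbation(a, eps=1e-3):
    """Largest |dw| keeping loss(w_min + dw) <= eps."""
    return math.sqrt(eps / a)

def bits_needed(a, weight_range=10.0, eps=1e-3):
    """Bits to pin down w to its tolerable interval, assuming a
    hypothetical representable range of width weight_range."""
    delta = max_tolerable_perturbation(a, eps)
    return math.log2(weight_range / (2 * delta))

flat_bits = bits_needed(a=0.01)    # flat minimum: low curvature
sharp_bits = bits_needed(a=100.0)  # sharp minimum: high curvature
print(flat_bits, sharp_bits)
```

The flat minimum requires several bits fewer than the sharp one, matching the MDL reading above: picking a flat minimum is cheaper to describe.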

Our algorithm finds ``flat'' minima by
searching for weights that minimize
both training error and weight precision.
This requires the computation
of the Hessian. However,
by using Pearlmutter's and Møller's efficient second-order
method [,7],
we obtain
the same order of complexity as with conventional backprop.
*Automatically, the method effectively reduces the number of
units, weights, and input lines, as
well as the sensitivity of outputs with respect
to the remaining weights and units.* Excellent experimental
generalization results will be reported in section 4.
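The key to the complexity claim is that second-order information can be accessed through Hessian-vector products rather than the full Hessian. The paper's method uses the exact R-operator; as a rough sketch under that idea, a Hessian-vector product can be approximated with just two gradient evaluations. The quadratic loss, its gradient, and all variable names below are illustrative assumptions, chosen so the exact answer `A @ v` is known.

```python
import numpy as np

# Sketch: Hessian-vector product at roughly backprop cost, via a
# finite difference of gradients (Pearlmutter's R-operator computes
# the same quantity exactly by differentiating the gradient).
def grad(w, A, b):
    """Gradient of the toy quadratic loss 0.5 * w^T A w - b^T w."""
    return A @ w - b

def hessian_vector_product(w, v, A, b, r=1e-5):
    """Approximate H @ v as (grad(w + r*v) - grad(w)) / r,
    never forming the full Hessian."""
    return (grad(w + r * v, A, b) - grad(w, A, b)) / r

A = np.array([[3.0, 1.0], [1.0, 2.0]])  # Hessian of the toy loss
b = np.array([1.0, -1.0])
rng = np.random.default_rng(0)
w = rng.standard_normal(2)
v = rng.standard_normal(2)
print(hessian_vector_product(w, v, A, b))  # close to A @ v
```

Because only gradient calls are needed, the cost per Hessian-vector product stays at the same order as one backward pass, which is what makes the overall complexity comparable to conventional backprop.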