THE ALGORITHM
Let $X_0$ denote the inputs of the training set.
We approximate $B(w, X_0)$ by $\sum_{x_p \in X_0} B(w, x_p)$, where
$B(w, x_p)$ is defined like $B(w, X_0)$ in the previous
section (replacing $X_0$ by $x_p$).
For simplicity, in what follows, we will abbreviate
$o^k(w, x_p)$ by $o^k$.
Starting with a random initial weight vector,
flat minimum search (FMS)
tries to find a $w$ that not only has low $E(\mathrm{net}(w), D_0)$
but also defines a box $M_w$ with maximal
box volume and, consequently, minimal
$B(w, X_0)$.
Note the relationship to MDL: $B(w, X_0)$ is the number of bits required
to describe the weights, whereas
the number of bits needed to describe the $y_p$,
given $w$
(with $(x_p, y_p) \in D_0$),
can be bounded by fixing $E_{tol}$ (see appendix A.1).
In the next section we derive the following algorithm.
We use gradient descent to minimize
$E(w, D_0) = E(\mathrm{net}(w), D_0) + \lambda B(w, X_0)$,
where
$B(w, X_0) = \sum_{x_p \in X_0} B(w, x_p)$, and

\[
B(w, x_p) = \frac{1}{2} \left( -L \log \epsilon
+ \sum_{i,j} \log \sum_k \left( \frac{\partial o^k}{\partial w_{ij}} \right)^2
+ L \log \sum_{i,j} \left( \frac{\sum_k \left| \frac{\partial o^k}{\partial w_{ij}} \right|}
{\sqrt{\sum_k \left( \frac{\partial o^k}{\partial w_{ij}} \right)^2}} \right)^2 \right).
\qquad (1)
\]

Here $o^k$ is the activation of the $k$th output unit
(given weight vector $w$ and input $x_p$),
$\epsilon$ is a constant, and $\lambda$ is
the regularization constant (or hyperparameter) which controls
the trade-off between regularization and training error (see appendix A.1).
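As an informal illustration of equation (1), and not the authors' code, the following JAX sketch evaluates $B(w, x_p)$ for a single training input; the one-layer tanh network, the value of $\epsilon$, and all identifiers are assumptions made only for this example.

import jax
import jax.numpy as jnp

def outputs(w, x):
    # Hypothetical one-layer network: o^k = tanh(sum_j w[k, j] x[j]).
    return jnp.tanh(w @ x)

def fms_regularizer(w, x, eps=1e-3):
    # B(w, x_p) from equation (1) for one training input x (no numerical safeguards).
    # J[k, i, j] = d o^k / d w_ij, obtained by automatic differentiation.
    J = jax.jacrev(outputs)(w, x)
    J = J.reshape(J.shape[0], -1)          # rows: output units k; columns: weights (i, j)
    L = J.shape[1]                         # number of weights
    sq = jnp.sum(J ** 2, axis=0)           # sum_k (d o^k / d w_ij)^2, one entry per weight
    term1 = -L * jnp.log(eps)
    term2 = jnp.sum(jnp.log(sq))
    ratio = jnp.sum(jnp.abs(J), axis=0) / jnp.sqrt(sq)
    term3 = L * jnp.log(jnp.sum(ratio ** 2))
    return 0.5 * (term1 + term2 + term3)

The returned scalar grows as the outputs become more sensitive to individual weights, i.e., as the box around $w$ shrinks.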
To minimize $B(w, X_0)$,
for each $x_p$
we have to compute

\[
\frac{\partial B(w, x_p)}{\partial w_{uv}}
= \sum_{i,j,k} \frac{\partial B(w, x_p)}{\partial \left( \frac{\partial o^k}{\partial w_{ij}} \right)}
\, \frac{\partial^2 o^k}{\partial w_{ij} \, \partial w_{uv}}
\quad \mbox{for all } u, v.
\qquad (2)
\]
It can be shown that by
using Pearlmutter's and Møller's
efficient second-order method,
the gradient of $B(w, x_p)$
can be computed in $O(L)$ time (see details in A.3).
Therefore, our algorithm
has the same order of computational complexity as standard backprop.
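To show how the pieces fit together, here is a rough sketch of one gradient-descent step on $E(\mathrm{net}(w), D_0) + \lambda \sum_{x_p} B(w, x_p)$, reusing outputs and fms_regularizer from the sketch above. It differentiates the regularizer with JAX autodiff rather than with the efficient second-order procedure of appendix A.3, so it mirrors the objective but not the $O(L)$ implementation; the squared-error loss, learning rate, and $\lambda$ value are assumed for illustration.

def fms_objective(w, X, Y, lam, eps=1e-3):
    # Training error (assumed squared error) plus lambda times the summed regularizer.
    preds = jax.vmap(outputs, in_axes=(None, 0))(w, X)
    train_error = jnp.mean(jnp.sum((preds - Y) ** 2, axis=1))
    reg = jnp.sum(jax.vmap(fms_regularizer, in_axes=(None, 0, None))(w, X, eps))
    return train_error + lam * reg

@jax.jit
def gradient_step(w, X, Y, lam, lr=1e-2):
    # One plain gradient-descent update of the weight matrix.
    return w - lr * jax.grad(fms_objective)(w, X, Y, lam)

Repeatedly calling w = gradient_step(w, X, Y, lam) then performs the gradient descent described in the text, with lam playing the role of the regularization constant $\lambda$.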