

THE ALGORITHM

Let $X_0 = \{x_p \mid (x_p,y_p) \in D_0 \}$ denote the inputs of the training set. We approximate $\Delta w(X)$ by $\Delta w(X_0)$, where $\Delta w(X_0)$ is defined like $\Delta w(X)$ in the previous section, with $X$ replaced by $X_0$. For simplicity, we abbreviate $\Delta w(X_0)$ by $\Delta w$ in what follows. Starting with a random initial weight vector, flat minimum search (FMS) tries to find a $w$ that not only has low $E(net(w),D_0)$ but also defines a box $M_w$ with maximal box volume $V(\Delta w)$ and, consequently, minimal $\tilde B(w,X_0) := - \log (\frac{1}{2^L} V(\Delta w)) = \sum_{i,j} - \log \Delta w_{ij}$. Note the relationship to MDL: $\tilde B$ is the number of bits required to describe the weights, whereas the number of bits needed to describe the $y_p$, given $w$ (with $(x_p,y_p) \in D_0$), can be bounded by fixing $E_{tol}$ (see appendix A.1). In the next section we derive the following algorithm. We use gradient descent to minimize $E(w,D_0) = E(net(w),D_0) + \lambda B(w,X_0)$, where $B(w,X_0) = \sum_{x_p \in X_0} B(w,x_p)$, and

\begin{displaymath}
B(w,x_p) =
\frac{1}{2} \left( - L \log \epsilon
+ \sum_{i,j} \log \sum_{k} \left( \frac{\partial o^{k}(w,x_p)}{\partial w_{ij}} \right)^{2}
+ L \log \sum_{i,j} \left( \sum_{k} \frac{ \left| \frac{\partial o^{k}(w,x_p)}{\partial w_{ij}} \right| }{ \sqrt{ \sum_{k} \left( \frac{\partial o^{k}(w,x_p)}{\partial w_{ij}} \right)^{2} } } \right)^{2} \right) \mbox{ .}
\end{displaymath} (1)

Here $o^k(w,x_p)$ is the activation of the $k$th output unit (given weight vector $w$ and input $x_p$), $\epsilon$ is a constant, and $\lambda$ is the regularization constant (or hyperparameter) that controls the trade-off between regularization and training error (see appendix A.1).
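As a concrete illustration of Eq. (1), the following minimal JAX sketch evaluates $B(w,x_p)$ for a toy two-layer network whose weights are stored in one flat vector. The names (net_output, flat_w, epsilon), the layer sizes, and the default value of epsilon are our own illustrative choices, not taken from the paper.

\begin{verbatim}
# Minimal JAX sketch of the penalty B(w, x_p) in Eq. (1), for a toy
# two-layer tanh net whose weights live in one flat vector.  All names
# (net_output, flat_w, epsilon) and all sizes are illustrative assumptions.
import jax
import jax.numpy as jnp

def net_output(flat_w, x):
    # flat_w packs W1 (3x2) and W2 (1x3); returns the output units o^k(w, x_p)
    W1 = flat_w[:6].reshape(3, 2)
    W2 = flat_w[6:9].reshape(1, 3)
    return W2 @ jnp.tanh(W1 @ x)

def B(flat_w, x, epsilon=1e-8):
    # Eq. (1): flatness penalty for a single input x_p
    L = flat_w.size                                      # number of weights
    J = jax.jacobian(net_output)(flat_w, x)              # J[k, ij] = d o^k / d w_ij
    sq = jnp.sum(J ** 2, axis=0)                         # sum_k (d o^k / d w_ij)^2
    first = jnp.sum(jnp.log(sq))                         # sum_ij log sum_k (...)^2
    ratio = jnp.sum(jnp.abs(J), axis=0) / jnp.sqrt(sq)   # sum_k |.| / sqrt(sum_k (.)^2)
    second = L * jnp.log(jnp.sum(ratio ** 2))
    return 0.5 * (-L * jnp.log(epsilon) + first + second)
\end{verbatim}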
To minimize $B(w,X_0)$, for each $x_p \in X_0$ we have to compute
\begin{displaymath}
\frac{\partial B(w,x_p)}{\partial w_{uv}} =
\sum_{k,i,j} \frac{\partial B(w,x_p)}{\partial \frac{\partial o^{k}(w,x_p)}{\partial w_{ij}}}
\frac{\partial^{2} o^{k}(w,x_p)}{\partial w_{ij} \partial w_{uv}}
\mbox{ for all } u,v \mbox{ . }
\end{displaymath} (2)

It can be shown that, by using Pearlmutter's and M{\o}ller's efficient second-order method, the gradient of $B(w,x_p)$ can be computed in $O(L)$ time (see appendix A.3 for details). Therefore, our algorithm has the same order of computational complexity as standard backpropagation.
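For readers who prefer code, the sketch below (continuing the one above) performs one gradient step on $E(w,D_0) = E(net(w),D_0) + \lambda B(w,X_0)$. It obtains $\partial B(w,x_p) / \partial w_{uv}$ by letting JAX differentiate straight through the Jacobian inside B, i.e., by generic nested automatic differentiation; this yields the quantity in Eq. (2) but does not implement the $O(L)$ Pearlmutter/M{\o}ller technique of appendix A.3. The plain squared-error term, the value of $\lambda$ (lambda_reg), the learning rate, and the toy data are assumptions, and the $E_{tol}$-based treatment of the training error is omitted.

\begin{verbatim}
# One FMS-style gradient step on E(w, D_0), continuing the sketch above.
# jax.grad differentiates through the Jacobian inside B (nested autodiff),
# which yields dB/dw_uv as in Eq. (2) but does NOT use the O(L)
# Pearlmutter/Moeller technique of appendix A.3.  The squared-error term,
# lambda_reg, lr, and the toy data are assumptions; E_tol is not handled.

def loss(flat_w, xs, ys, lambda_reg=0.01):
    preds = jax.vmap(net_output, in_axes=(None, 0))(flat_w, xs)    # batch of o(w, x_p)
    train_err = jnp.sum((preds - ys) ** 2)                         # stands in for E(net(w), D_0)
    penalty = jnp.sum(jax.vmap(B, in_axes=(None, 0))(flat_w, xs))  # sum_p B(w, x_p)
    return train_err + lambda_reg * penalty

def gradient_step(flat_w, xs, ys, lr=1e-3):
    return flat_w - lr * jax.grad(loss)(flat_w, xs, ys)            # gradient includes dB/dw_uv

# toy usage: four training pairs with 2-dim inputs and 1-dim targets
key = jax.random.PRNGKey(0)
flat_w = 0.1 * jax.random.normal(key, (9,))
xs = jax.random.normal(key, (4, 2))
ys = jnp.ones((4, 1))
flat_w = gradient_step(flat_w, xs, ys)
\end{verbatim}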

