

THE ALGORITHM

The algorithm is designed to find a $w$ defining a box $M_w$ with maximal box volume $\Delta w$. This is equivalent to finding a box $M_w$ with minimal $\tilde B(w,D_0) := - \log (\Delta w / 2^W) =
\sum_{i,j} - \log \Delta w_{ij}$. Note the relationship to MDL ($\tilde B$ is the number of bits required to describe the weights). In appendix A.2, we derive the following algorithm. It minimizes $E(w,D_0)= E_{q}(w,D_0) + \lambda B(w,D_0)$, where
\begin{displaymath}
B = \frac{1}{2} \left( - W \log \epsilon
+ \sum_{i,j} \log \sum_{k} \left( \frac{\partial o^k}{\partial w_{ij}} \right)^{2}
+ W \log \sum_{i,j} \left( \frac{\sum_{k} \left| \frac{\partial o^k}{\partial w_{ij}} \right|}
{\sqrt{\sum_{k} \left( \frac{\partial o^k}{\partial w_{ij}} \right)^{2}}} \right)^{2} \right) \mbox{.}
\end{displaymath} (1)
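As an illustration of equation (1), the following sketch (Python/JAX; the Jacobian matrix J, the toy shapes, and the value of $\epsilon$ are assumptions for the example, not taken from the paper) evaluates $B$ given the matrix of output derivatives $\partial o^k / \partial w_{ij}$, with the weights $w_{ij}$ flattened into a single index:

\begin{verbatim}
import jax.numpy as jnp

def box_loss_B(J, eps=1e-3):
    # J[k, a] ~ d o^k / d w_a  (K outputs, W flattened weights);
    # eps is the constant from the text (its value here is an assumption).
    K, W = J.shape
    sq = jnp.sum(J ** 2, axis=0)                # sum_k (d o^k / d w_a)^2
    term1 = -W * jnp.log(eps)
    term2 = jnp.sum(jnp.log(sq))                # sum_{i,j} log sum_k (.)^2
    ratio = jnp.sum(jnp.abs(J), axis=0) / jnp.sqrt(sq)
    term3 = W * jnp.log(jnp.sum(ratio ** 2))    # W log sum_{i,j} (.)^2
    return 0.5 * (term1 + term2 + term3)
\end{verbatim}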

Here $o^k$ is the activation of the $k$th output unit, $\epsilon$ is a constant, and $\lambda$ is a positive variable that ensures either $E_{q}(w,D_0) \leq E_{tol}$ or an expected decrease of $E_{q}(\cdot,D_0)$ during learning (see [] for how to adjust $\lambda$).
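The schedule for adjusting $\lambda$ is not reproduced above. Purely as an illustration of the stated constraint, one simple heuristic (the multiplicative factors and the handling of $E_{tol}$ are assumptions, not the schedule from the cited work) would be:

\begin{verbatim}
def adjust_lambda(lam, E_q, E_tol, shrink=0.9, grow=1.1):
    # Relax the regularizer when the tolerable error is exceeded,
    # strengthen it otherwise; the factors 0.9 / 1.1 are illustrative only.
    return lam * shrink if E_q > E_tol else lam * grow
\end{verbatim}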

$E(w,D_0)$ is minimized by gradient descent. To minimize $B(w,D_0)$, we compute

\begin{displaymath}
\frac{\partial B(w,D_0)}{\partial w_{uv}} =
\sum_{k, i,j} \frac{\partial B(w,D_0)}{\partial \left( \frac{\partial o^k}{\partial w_{ij}} \right)}
\frac{\partial^2 o^k}{\partial w_{ij} \partial w_{uv}}
\mbox{ for all } u,v
\mbox{ . }
\end{displaymath} (2)
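For comparison with the $O(W)$ approach discussed below, a direct (naive, $O(W^2)$) evaluation of equation (2) can be sketched as follows (Python/JAX; the toy network in outputs and the value of $\epsilon$ are assumptions, not the paper's architecture):

\begin{verbatim}
import jax
import jax.numpy as jnp

def outputs(w, x):
    # toy stand-in for the output activations o^k (2 output units)
    return jnp.tanh(jnp.dot(w.reshape(2, -1), x))

def B_of_J(J, eps=1e-3):
    # equation (1) as a function of the output derivatives J[k, a]
    sq = jnp.sum(J ** 2, axis=0)
    ratio = jnp.sum(jnp.abs(J), axis=0) / jnp.sqrt(sq)
    return 0.5 * (-J.shape[1] * jnp.log(eps) + jnp.sum(jnp.log(sq))
                  + J.shape[1] * jnp.log(jnp.sum(ratio ** 2)))

def grad_B_naive(w, x):
    J = jax.jacobian(outputs)(w, x)                  # d o^k / d w_a
    dB_dJ = jax.grad(B_of_J)(J)                      # dB / d(d o^k / d w_a)
    H = jax.jacobian(jax.jacobian(outputs))(w, x)    # d^2 o^k / d w_a d w_b
    return jnp.einsum('ka,kab->b', dB_dJ, H)         # sum over k and (i,j)
\end{verbatim}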

It can be shown (see [4]) that, by using Pearlmutter's and Møller's efficient second-order method [,7], the gradient of $B(w,D_0)$ can be computed in $O(W)$ time. Therefore, our algorithm has the same order of computational complexity as standard backprop.
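The paper's full $O(W)$ computation is not reproduced here; the underlying idea, however, is the standard Hessian-vector (R-operator) technique of Pearlmutter and Møller, sketched below for an arbitrary stand-in scalar function (a sketch only, not the paper's implementation):

\begin{verbatim}
import jax
import jax.numpy as jnp

def f(w):
    # arbitrary stand-in scalar function of the weights
    return jnp.sum(jnp.tanh(w) ** 2)

def hvp(f, w, v):
    # Pearlmutter's trick: H(w) v via one forward-over-reverse pass,
    # i.e. O(W) time per product, without forming the W x W Hessian.
    return jax.jvp(jax.grad(f), (w,), (v,))[1]

w = jnp.arange(1.0, 5.0)
print(hvp(f, w, jnp.ones_like(w)))
\end{verbatim}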