next up previous
Next: A.4. PSEUDO CODE OF Up: A.3. EFFICIENT IMPLEMENTATION OF Previous: A.3.1 EXPLICIT DERIVATIVE OF

A.3.2. FAST MULTIPLICATION BY THE HESSIAN

Pearlmutter (1994) and M$\o$ller (1993) compute the product of a vector and the Hessian of the error in $O(L)$ time. Using Pearlmutter's notation, we do the same with the Hessian of the output. An operator $R$ is defined as follows:


\begin{displaymath}
R_{y} \{ g(x) \} \equiv \frac{\partial}{\partial t} g(x+ty) \mid_{t=0} \mbox{ .}
\end{displaymath} (42)

The Hessian of the $k$th output $o^k$ of a feedforward net is computed in 3 successive passes:

1. First backward pass ($y^l = o^k$):


\begin{displaymath}
\frac{\partial y^{l}}{\partial y^{i}} =
\left\{ {1 \atop \su...
...x{ for } i=l \atop
\mbox{ for } i \not= l} \right\} \mbox{ ,}
\end{displaymath} (43)


\begin{displaymath}
\frac{\partial y^{l}} {\partial s_{i}} =
f_{i}'(s_{i}) \frac{\partial y^{l}}{\partial y^{i}} \mbox{ ,}
\end{displaymath} (44)


\begin{displaymath}
\frac{\partial y^{l}} {\partial w_{ji}} =
y^{i} \frac{\partial y^{l}} {\partial s_{j}} \mbox{ .}
\end{displaymath} (45)

2. First forward pass:


\begin{displaymath}
R \{ s_{i} \} = \sum_{j} (w_{ij} R \{ y^{j} \} +
\frac{\par...
...ial (\frac{\partial o^{k}}{\partial w_{ij}})} y^{j}) \mbox{ ,}
\end{displaymath} (46)


\begin{displaymath}
R \{ y^{i} \} = \left\{ { 0 \atop R \{ s_{i} \} f_{i}' (s_{i...
...i} \mbox{ input } \atop
\mbox{ otherwise}} \right\} \mbox{ .}
\end{displaymath} (47)

3. Second backward pass ($y^l = o^k$):


\begin{displaymath}
R \{ \frac{\partial y^{l}}{\partial y^{i}} \} =
\left\{
{ 0...
...{ for } y^{i} \mbox{ in layers below } y^l} \right\} \mbox{ ,}
\end{displaymath} (48)


\begin{displaymath}
R \{ \frac{\partial y^{l}}{\partial s_{i}} \} =
f_{i}' (s_{i...
..._{i}'' (s_{i}) \frac{\partial y^{l}}{\partial y^{i}} \mbox{ ,}
\end{displaymath} (49)


\begin{displaymath}
R \{ \frac{\partial y^{l}}{\partial w_{ji}} \} =
y^{i} R \{ ...
...R \{ y^{i} \} \frac{ \partial y^{l}}{\partial s_{j}} \mbox{ .}
\end{displaymath} (50)

The elements of the vector $H^{k} (\nabla_{\frac{\partial
o^{k}}{\partial w_{ij}}} B(w,x_p))$ are $R \{ \frac{\partial o^{k}}{\partial w_{ji}} \}$ (see (41)). Using the technique in (Pearlmutter, 1994), recurrent networks can be dealt with as well.


next up previous
Next: A.4. PSEUDO CODE OF Up: A.3. EFFICIENT IMPLEMENTATION OF Previous: A.3.1 EXPLICIT DERIVATIVE OF
Juergen Schmidhuber 2003-02-13


Back to Financial Forecasting page