
Error path integral

Suppose we have a fully connected net whose non-input units are indexed from 1 to $n$. Let us focus on local error flow from output unit $k$ to an arbitrary unit $v$ (later we will see that the analysis extends immediately to global error flow). The error occurring at $k$ at time step $t$ is propagated ``back in time'' for $t-s$ time steps, to unit $v$ at time $s<t$. This scales the error by the following factor:
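\begin{displaymath}
\frac{\partial \delta_{v} (s)}{\partial \delta_{k} (t)} =
\left\{ \begin{array}{ll}
f_{v}'(net_{v}(t-1)) \ w_{kv} & t-s = 1 \\
f_{v}'(net_{v}(s)) \left( \sum_{l=1}^{n} \frac{\partial \delta_{l} (s+1)}{\partial \delta_{k} (t)} \ \ w_{lv} \right)
& t-s > 1
\end{array} \right. .
\end{displaymath} (1)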
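Recursion (1) can be evaluated directly. The Python sketch below is illustrative only: the function name `make_error_scale`, the weight layout `W[l][v]` $= w_{lv}$, the table `fprime[tau][v]` $= f_v'(net_v(\tau))$, zero-based unit indices, and the random values are all assumptions made for the example (derivative values are drawn from $[0, 0.25]$, the range of the logistic sigmoid's derivative).

```python
import random
from functools import lru_cache


def make_error_scale(W, fprime):
    """Return scale(v, s, k, t) = d delta_v(s) / d delta_k(t), computed via recursion (1).

    Assumed layout (hypothetical, for illustration):
      W[l][v]        -- weight w_{lv} on the connection from unit v to unit l
      fprime[tau][v] -- f'_v(net_v(tau)), derivative of unit v's activation at time tau
    Units are indexed 0..n-1 instead of 1..n; requires s < t.
    """
    n = len(W)

    @lru_cache(maxsize=None)
    def scale(v, s, k, t):
        if t - s == 1:
            # Case t - s = 1: f'_v(net_v(t-1)) * w_{kv}  (here s == t - 1)
            return fprime[s][v] * W[k][v]
        # Case t - s > 1: f'_v(net_v(s)) * sum_l [d delta_l(s+1) / d delta_k(t)] * w_{lv}
        return fprime[s][v] * sum(scale(l, s + 1, k, t) * W[l][v] for l in range(n))

    return scale


# Example with random weights and derivative values (assumptions, not data).
random.seed(0)
n, T = 3, 6
W = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
fprime = [[random.uniform(0.0, 0.25) for _ in range(n)] for _ in range(T)]
scale = make_error_scale(W, fprime)
flow = scale(0, 0, 2, 5)  # output unit 2 at t = 5 back to unit 0 at s = 0
```

Caching on the arguments keeps the cost at roughly $n^2$ operations per time step for fixed $k$ and $t$; without caching, the number of recursive calls grows as $n^{t-s-1}$.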

To solve recursion (1), we expand it by unrolling the network over time (as is done, for example, when deriving BPTT). In particular, for $s<\tau< t$, let $l_{\tau}$ denote the index of a generic non-input unit in the replica of the network at time $\tau$, and let $l_{s}=v$ and $l_{t}=k$. We obtain:
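\begin{displaymath}
\frac{\partial \delta_{v} (s)}
{\partial \delta_{k}(t)} = \sum_{l_{s+1}=1}^{n} \ldots \sum_{l_{t-1}=1}^{n}
\left( w_{l_{t}l_{t-1}} \left( \prod_{\tau=s+1}^{t-1} f_{l_{\tau}}'(net_{l_{\tau}}(\tau)) \ w_{l_{\tau}l_{\tau-1}}\right) f_{l_s}'(net_{l_{s}} (s))\right)
\end{displaymath} (2)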

(the proof is by induction on $t-s$). It follows immediately that if the local error vanishes, then the global error vanishes too. To see this, compute

\begin{displaymath}\sum_{k \in O}
\frac{\partial \delta_{v} (s)}{\partial \delta_{k} (t)}\end{displaymath}

where $O$ denotes the set of output units. This is a finite sum of local error-flow factors, so it vanishes whenever each of them does.
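Equation (2) and the global sum above can be checked numerically. The Python sketch below rests on illustrative assumptions: the names `path_sum` and `backward` are hypothetical, the layout is `W[l][v]` $= w_{lv}$ and `fprime[tau][v]` $= f_v'(net_v(\tau))$ with zero-based indices, the values are random, and the output set $O$ is chosen arbitrarily. It evaluates (2) by enumerating every sequence $l_{s+1}, \ldots, l_{t-1}$, then compares the summed local factors with one backward pass of recursion (1) written in matrix-vector form.

```python
import itertools
import random


def path_sum(v, s, k, t, W, fprime):
    """Evaluate equation (2) by brute force over all intermediate unit sequences.

    Hypothetical layout: W[l][v] = w_{lv}, fprime[tau][v] = f'_v(net_v(tau)),
    zero-based unit indices. Cost grows as n ** (t - s - 1).
    """
    n = len(W)
    total = 0.0
    for middle in itertools.product(range(n), repeat=t - s - 1):
        path = (v, *middle, k)  # path[i] = l_{s+i}, so l_s = v and l_t = k
        term = W[k][path[-2]] * fprime[s][v]  # w_{l_t l_{t-1}} * f'_{l_s}(net_{l_s}(s))
        for i in range(1, t - s):  # tau = s + i runs over s+1 .. t-1
            term *= fprime[s + i][path[i]] * W[path[i]][path[i - 1]]
        total += term
    return total


def backward(seed, s, t, W, fprime):
    """Matrix-vector form of recursion (1): J(t) = seed, J(tau) = diag(f'(tau)) W^T J(tau+1).

    By linearity, entry v of the result is sum_k seed[k] * d delta_v(s) / d delta_k(t).
    """
    n = len(W)
    J = list(seed)
    for tau in range(t - 1, s - 1, -1):
        J = [fprime[tau][v] * sum(W[l][v] * J[l] for l in range(n)) for v in range(n)]
    return J


# Random example; the network size and output set O = {2, 3} are assumptions.
random.seed(1)
n, s, t = 4, 0, 4
outputs = [2, 3]
W = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
fprime = [[random.uniform(0.0, 0.25) for _ in range(n)] for _ in range(t)]

# Local factors from equation (2), one list over v per output unit k.
local = {k: [path_sum(v, s, k, t, W, fprime) for v in range(n)] for k in outputs}
# Global error flow: sum of the local factors over k in O.
global_flow = [sum(local[k][v] for k in outputs) for v in range(n)]
# The same quantity from a single backward pass seeded with the indicator of O.
one_pass = backward([1.0 if j in outputs else 0.0 for j in range(n)], s, t, W, fprime)
```

The enumeration visits $n^{t-s-1}$ paths, so it is only practical for tiny nets; the backward pass computes the same numbers in about $n^2 (t-s)$ operations. Because the backward pass is linear in its seed, seeding it with the indicator vector of $O$ yields the global flow directly.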
Juergen Schmidhuber 2003-02-19