2.1. `SELF-REFERENTIAL' DYNAMICS AND OBJECTIVE FUNCTION

I assume that the input sequence observed by the network has length $n_{time} = n_sn_r$ (where $n_s,n_r \in {\bf N}$) and can be divided into $n_s$ equal-sized blocks of length $n_r$ during which the input pattern $x(t)$ does not change. This does not imply a loss of generality -- it just means speeding up the network's hardware such that each input pattern is presented for $n_r$ time-steps before the next pattern can be observed. This gives the architecture $n_r$ time-steps to do some sequential processing (including immediate weight changes) before seeing a new pattern of the input sequence.

In what follows, unquantified variables are assumed to be quantified over their maximal range. The network dynamics are specified as follows:

\begin{displaymath}
net_{y_k}(1)=0,
~~\forall t \geq 1:~~x_k(t)\leftarrow environment,~~
y_k(t) = f_{y_k}(net_{y_k}(t)),
\end{displaymath}


\begin{displaymath}
\forall t>1:~~
net_{y_k}(t) = \sum_l w_{y_k l}(t-1)~l(t-1).
\end{displaymath} (1)
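
To make these dynamics concrete, here is a minimal Python sketch. It is not from the original text: the array layout, the choice of a logistic $f_{y_k}$, and all names are illustrative assumptions.

\begin{verbatim}
import numpy as np

def logistic(x):
    # One possible differentiable activation f_{y_k};
    # the text does not fix a particular choice.
    return 1.0 / (1.0 + np.exp(-x))

def step(W, l_prev):
    # One time-step of equation (1).
    # W      -- weight matrix, W[k, l] = w_{y_k l}(t-1)
    # l_prev -- activations l(t-1) of all units feeding the y units
    net = W @ l_prev      # net_{y_k}(t) = sum_l w_{y_k l}(t-1) l(t-1)
    return logistic(net)  # y_k(t) = f_{y_k}(net_{y_k}(t))
\end{verbatim}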

The network can quickly read information about its current weights into the special $val$ input unit according to
\begin{displaymath}
val(1) = 0,~~\forall t\geq 1:~
val(t+1) = \sum_{i,j}g[ \Vert ana(t) - adr(w_{ij}) \Vert^2]w_{ij}(t),
\end{displaymath} (2)

where $\Vert \ldots \Vert$ denotes Euclidean length, and $g$ is a differentiable function with values between 0 and 1 that determines how close a connection's address has to be to the activations of the analyzing units for its weight to contribute to $val$ at that time. Such a function $g$ might have a narrow peak of height 1 at the origin and be zero (or nearly zero) everywhere else. This essentially allows the network to pick out a single connection at a time and obtain its current weight value without receiving `cross-talk' from other weights.
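
For instance (an illustrative choice, not prescribed by the text), a narrow Gaussian bump

\begin{displaymath}
g(z) = e^{-z / \sigma^2},~~\sigma~small,
\end{displaymath}

is differentiable, equals 1 at $z = 0$, and is nearly zero once the squared distance $z$ exceeds a few multiples of $\sigma^2$.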

The network can quickly modify its current weights using $mod(t)$ and $\bigtriangleup(t)$ according to

\begin{displaymath}
~~\forall t \geq 1:~~
w_{ij}(t+1) =
w_{ij}(t) +
\bigtriangleup(t)~g[~ \Vert adr(w_{ij}) - mod(t) \Vert^2~ ].
\end{displaymath} (3)

Again, if $g$ has a narrow peak of height 1 at the origin and is zero (or nearly zero) everywhere else, the network can pick out a single connection at a time and change its weight without affecting other weights.
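
The two primitives of equations (2) and (3) can be sketched as follows, a minimal Python illustration assuming the Gaussian $g$ from above, a fixed address tensor for $adr$, and numpy arrays throughout; none of these representation choices come from the text.

\begin{verbatim}
import numpy as np

def g(sq_dist, sigma=0.1):
    # Narrow differentiable bump: 1 at the origin, nearly 0 elsewhere.
    return np.exp(-sq_dist / sigma**2)

def read_val(W, adr, ana_t):
    # Equation (2): val(t+1) = sum_ij g(||ana(t) - adr(w_ij)||^2) w_ij(t).
    # W: weights, shape (n, m); adr: addresses, shape (n, m, a);
    # ana_t: analyzing-unit activations at time t, shape (a,).
    sq = np.sum((ana_t - adr) ** 2, axis=-1)
    return np.sum(g(sq) * W)

def write_weights(W, adr, mod_t, delta_t):
    # Equation (3): w_ij(t+1) = w_ij(t) + delta(t) g(||adr(w_ij) - mod(t)||^2).
    sq = np.sum((adr - mod_t) ** 2, axis=-1)
    return W + delta_t * g(sq)
\end{verbatim}

With a sufficiently narrow $g$, read_val returns (approximately) the single weight whose address matches $ana(t)$, and write_weights adds (approximately) $\bigtriangleup(t)$ to the single weight whose address matches $mod(t)$.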

Objective function and dynamics of the eval units. As with typical supervised sequence-learning tasks, we want to minimize

\begin{displaymath}
E^{total}(n_r n_s),
~~where~~
E^{total}(t) = \sum_{\tau = 1}^{t} E(\tau)
~~and~~
E(t) = \frac{1}{2} \sum_k (eval_k(t+1))^2,
\end{displaymath}

where
\begin{displaymath}
eval_k(1) = 0,~~\forall t \geq 1:~~
eval_k(t+1) =
\left\{ \begin{array}{ll}
d_k(t) - o_k(t) & if~d_k(t)~exists \\
0 & otherwise
\end{array} \right.
\end{displaymath} (4)

Here $d_k(t)$ denotes a possible desired target value for the $k$-th output unit at time step $t$; targets need not be given at every time step.
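
A direct transcription of the objective, again only a sketch: representing a missing target $d_k(t)$ as a NaN entry is my assumption, not the paper's notation.

\begin{verbatim}
import numpy as np

def eval_error(d_t, o_t):
    # Equation (4): eval_k(t+1) = d_k(t) - o_k(t) if d_k(t) exists, else 0.
    # Missing targets are encoded as NaN entries of d_t (an assumption).
    return np.where(np.isnan(d_t), 0.0, d_t - o_t)

def total_error(targets, outputs):
    # E^total(n_r n_s) = sum_t E(t), with E(t) = 1/2 sum_k eval_k(t+1)^2.
    return sum(0.5 * np.sum(eval_error(d, o) ** 2)
               for d, o in zip(targets, outputs))
\end{verbatim}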

