2.1. `SELF-REFERENTIAL' DYNAMICS AND OBJECTIVE FUNCTION

I assume that the input sequence observed by the network has length $n_{time} = n_sn_r$ (where $n_s,n_r \in {\bf N}$) and can be divided into $n_s$ equal-sized blocks of length $n_r$ during which the input pattern $x(t)$ does not change. This does not imply a loss of generality -- it just means speeding up the network's hardware such that each input pattern is presented for $n_r$ time-steps before the next pattern can be observed. This gives the architecture $n_r$ time-steps to do some sequential processing (including immediate weight changes) before seeing a new pattern of the input sequence.

In what follows, unquantified variables are assumed to be quantified over their maximal range. The network dynamics are specified as follows:

\begin{displaymath}
net_{y_k}(1)=0,
~~\forall t \geq 1:~~x_k(t)\leftarrow environment,~~
y_k(t) = f_{y_k}(net_{y_k}(t)),
\end{displaymath}


\begin{displaymath}
\forall t>1:~~
net_{y_k}(t) = \sum_l w_{y_k l}(t-1)~l(t-1).
\end{displaymath} (1)
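
To make these dynamics concrete, here is a minimal Python sketch. It is not from the original text: the array layout, the choice of a logistic $f_{y_k}$, and all names are illustrative assumptions.

\begin{verbatim}
import numpy as np

def logistic(x):
    # One possible differentiable activation f_{y_k};
    # the text does not fix a particular choice.
    return 1.0 / (1.0 + np.exp(-x))

def step(W, l_prev):
    # One time-step of equation (1).
    # W      -- weight matrix, W[k, l] = w_{y_k l}(t-1)
    # l_prev -- activations l(t-1) of all units feeding the y units
    net = W @ l_prev      # net_{y_k}(t) = sum_l w_{y_k l}(t-1) l(t-1)
    return logistic(net)  # y_k(t) = f_{y_k}(net_{y_k}(t))
\end{verbatim}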

The network can quickly read information about its current weights into the special $val$ input unit according to
\begin{displaymath}
val(1) = 0,~~\forall t\geq 1:~
val(t+1) = \sum_{i,j}g[ \Vert ana(t) - adr(w_{ij}) \Vert^2]w_{ij}(t),
\end{displaymath} (2)

where $\Vert \ldots \Vert$ denotes Euclidean length, and $g$ is a differentiable function with values between 0 and 1 that determines how close a connection's address has to be to the activations of the analyzing units for its weight to contribute to $val$ at that time. Such a function $g$ might have a narrow peak of height 1 at the origin and be zero (or nearly zero) everywhere else. This essentially allows the network to pick out a single connection at a time and obtain its current weight value without receiving `cross-talk' from other weights.
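
For instance (an illustrative choice, not prescribed by the text), a narrow Gaussian bump

\begin{displaymath}
g(z) = e^{-z / \sigma^2},~~\sigma~small,
\end{displaymath}

is differentiable, equals 1 at $z = 0$, and is nearly zero once the squared distance $z$ exceeds a few multiples of $\sigma^2$.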

The network can quickly modify its current weights using $mod(t)$ and $\bigtriangleup(t)$ according to

\begin{displaymath}
~~\forall t \geq 1:~~
w_{ij}(t+1) =
w_{ij}(t) +
\bigtriangleup(t)~g[~ \Vert adr(w_{ij}) - mod(t) \Vert^2~ ].
\end{displaymath} (3)

Again, if $g$ has a narrow peak of height 1 at the origin and is zero (or nearly zero) everywhere else, the network can pick out a single connection at a time and change its weight without affecting other weights.
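
The two primitives of equations (2) and (3) can be sketched as follows, a minimal Python illustration assuming the Gaussian $g$ from above, a fixed address tensor for $adr$, and numpy arrays throughout; none of these representation choices come from the text.

\begin{verbatim}
import numpy as np

def g(sq_dist, sigma=0.1):
    # Narrow differentiable bump: 1 at the origin, nearly 0 elsewhere.
    return np.exp(-sq_dist / sigma**2)

def read_val(W, adr, ana_t):
    # Equation (2): val(t+1) = sum_ij g(||ana(t) - adr(w_ij)||^2) w_ij(t).
    # W: weights, shape (n, m); adr: addresses, shape (n, m, a);
    # ana_t: analyzing-unit activations at time t, shape (a,).
    sq = np.sum((ana_t - adr) ** 2, axis=-1)
    return np.sum(g(sq) * W)

def write_weights(W, adr, mod_t, delta_t):
    # Equation (3): w_ij(t+1) = w_ij(t) + delta(t) g(||adr(w_ij) - mod(t)||^2).
    sq = np.sum((adr - mod_t) ** 2, axis=-1)
    return W + delta_t * g(sq)
\end{verbatim}

With a sufficiently narrow $g$, read_val returns (approximately) the single weight whose address matches $ana(t)$, and write_weights adds (approximately) $\bigtriangleup(t)$ to the single weight whose address matches $mod(t)$.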

Objective function and dynamics of the eval units. As with typical supervised sequence-learning tasks, we want to minimize

\begin{displaymath}
E^{total}(n_r n_s),
~~where~~
E^{total}(t) = \sum_{\tau = 1}^{t} E(\tau)
~~and~~
E(t) = \frac{1}{2} \sum_k (eval_k(t+1))^2,
\end{displaymath}

where
\begin{displaymath}
eval_k(1) = 0,~~\forall t \geq 1:~~
eval_k(t+1) =
\left\{ \begin{array}{ll}
d_k(t) - o_k(t) & if~d_k(t)~exists \\
0 & otherwise
\end{array} \right.
\end{displaymath} (4)

Here $d_k(t)$ denotes a possible desired target value for the $k$-th output unit at time step $t$; targets need not be given at every time step.
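
A direct transcription of the objective, again only a sketch: representing a missing target $d_k(t)$ as a NaN entry is my assumption, not the paper's notation.

\begin{verbatim}
import numpy as np

def eval_error(d_t, o_t):
    # Equation (4): eval_k(t+1) = d_k(t) - o_k(t) if d_k(t) exists, else 0.
    # Missing targets are encoded as NaN entries of d_t (an assumption).
    return np.where(np.isnan(d_t), 0.0, d_t - o_t)

def total_error(targets, outputs):
    # E^total(n_r n_s) = sum_t E(t), with E(t) = 1/2 sum_k eval_k(t+1)^2.
    return sum(0.5 * np.sum(eval_error(d, o) ** 2)
               for d, o in zip(targets, outputs))
\end{verbatim}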

