2. THE `INTROSPECTIVE' NETWORK
I assume that the input sequence observed by the network has length $T$ (where $T$ is an integer multiple of $m$) and can be divided into $T/m$ equal-sized blocks of length $m$ during which the input pattern does not change. This does not imply a loss of generality -- it just means speeding up the network's hardware such that each input pattern is presented for $m$ time-steps before the next pattern can be observed. This gives the architecture $m$ time-steps to do some sequential processing (including immediate weight changes) before seeing a new pattern of the input sequence.
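For illustration, here is a minimal sketch (hypothetical NumPy code; the function name and array shapes are illustrative, not from the paper) of how a sequence of patterns can be expanded so that each pattern is held constant for $m$ consecutive time-steps:

    import numpy as np

    def expand_to_blocks(patterns, m):
        # Repeat each input pattern for m consecutive time-steps, giving a
        # sequence of length T = num_patterns * m, an integer multiple of m.
        return np.repeat(patterns, m, axis=0)

    # Example: three 2-dimensional patterns, each held for m = 4 steps.
    patterns = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    sequence = expand_to_blocks(patterns, m=4)
    assert sequence.shape == (12, 2)  # T = 3 * 4 = 12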
In what follows, unquantified variables are assumed to take on their maximal range. The network dynamics are specified as follows: for each non-input unit $i$,

$$ y_i(t) = f_i\Big( \sum_j w_{ij}(t) \, y_j(t-1) \Big), \qquad (1) $$

where $y_i(t)$ denotes the activation of unit $i$ at time $t$, $f_i$ is a differentiable activation function, and $w_{ij}(t)$ denotes the weight on the directed connection from unit $j$ to unit $i$ at time $t$.
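A minimal sketch of one such update step, assuming a fully connected net and taking the logistic function as one concrete choice for the $f_i$ (the names and shapes below are illustrative):

    import numpy as np

    def step(W, y_prev):
        # One application of equation (1):
        # y_i(t) = f_i( sum_j w_ij(t) * y_j(t-1) ), with a logistic f_i.
        net = W @ y_prev                   # net_i = sum_j w_ij(t) * y_j(t-1)
        return 1.0 / (1.0 + np.exp(-net))  # activations lie between 0 and 1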
The network can quickly read information about its current weights into the special input unit $cur$ according to

$$ cur(t) = \sum_{i,j} w_{ij}(t) \, g\big( \| anal(t) - adr(i,j) \| \big), \qquad (2) $$

where $anal(t)$ is the activation vector of the special analyzing units at time $t$, $adr(i,j)$ is the constant address vector attached to the connection from unit $j$ to unit $i$, $\| \cdot \|$ denotes Euclidean length, and $g$ is a differentiable function emitting values between 0 and 1 that determines how close a connection address has to be to the activations of the analyzing units in order for its weight to contribute to $cur(t)$ at that time. Such a function
might have a narrow peak at 1 around the origin and be zero (or
nearly zero)
everywhere else. This essentially allows the network to
pick out a single connection at a time
and obtain its current weight value without receiving
`cross-talk' from other weights.
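The following sketch illustrates this reading mechanism, assuming each connection carries a fixed $d$-dimensional address and taking a narrow Gaussian bump as one possible choice of $g$ (the particular $g$, its width, and all names are illustrative, not prescribed by the text):

    import numpy as np

    def g(dist, width=0.05):
        # Differentiable addressing function: a narrow bump that is close
        # to 1 when dist is near 0 and (nearly) 0 everywhere else.
        return np.exp(-(dist / width) ** 2)

    def read_weight(W, adr, anal):
        # Equation (2): cur(t) = sum_{i,j} w_ij(t) * g(||anal(t) - adr(i,j)||).
        # W:    weight matrix, shape (n, n)
        # adr:  fixed connection addresses, shape (n, n, d)
        # anal: activations of the analyzing units, shape (d,)
        dists = np.linalg.norm(adr - anal, axis=-1)  # distance to each address
        return float(np.sum(W * g(dists)))           # ~ the addressed weight

If $anal(t)$ matches the address of exactly one connection and the bump is narrow relative to the spacing of the addresses, $cur(t)$ is essentially that single weight.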
The network can quickly modify its current weights using the activations $mod(t)$ of the special modifying units and the activation $\Delta(t)$ of a special unit that specifies the size of the weight change, according to

$$ w_{ij}(t+1) = w_{ij}(t) + \Delta(t) \, g\big( \| mod(t) - adr(i,j) \| \big). \qquad (3) $$
Again, if $g$ has a narrow peak at 1 around the origin and is zero (or
nearly zero) everywhere else, the network will be able to
pick out a single connection at a time and change its weight without
affecting other weights.
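A corresponding sketch of the modification step, under the same illustrative assumptions as the reading sketch above:

    import numpy as np

    def modify_weights(W, adr, mod, delta, g):
        # Equation (3): w_ij(t+1) = w_ij(t) + Delta(t) * g(||mod(t) - adr(i,j)||).
        # mod:   activations of the modifying units, shape (d,)
        # delta: scalar Delta(t), the size of the weight change
        # g:     the same narrow addressing function as in equation (2)
        dists = np.linalg.norm(adr - mod, axis=-1)
        return W + delta * g(dists)  # only the addressed weight changes noticeably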
Objective function and dynamics of the eval units.
As with typical supervised sequence-learning tasks, we want to minimize $E^{total} = \sum_t E(t)$, where

$$ E(t) = \frac{1}{2} \sum_k \big( d_k(t) - y_k(t) \big)^2. \qquad (4) $$

Here $d_k(t)$ may be a desired target value for the $k$-th output unit at time step $t$; the sum runs over those output units $k$ for which a target is defined at time $t$.
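A minimal sketch of this objective, using a hypothetical mask to mark which output units have targets at a given time-step (since targets may exist only for some units and steps):

    import numpy as np

    def error_at(d_t, y_t, mask):
        # Equation (4): E(t) = 1/2 * sum_k (d_k(t) - y_k(t))^2, restricted
        # by mask to the output units k that have a target at time t.
        diff = (d_t - y_t) * mask
        return 0.5 * float(np.sum(diff ** 2))

    def total_error(targets, outputs, masks):
        # E_total = sum over all time-steps t of E(t): the quantity to minimize.
        return sum(error_at(d, y, m) for d, y, m in zip(targets, outputs, masks))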