To deal with long time lags, Mozer [17]
uses time constants that influence changes of unit activations
(deVries and Principe's related approach [7]
may be viewed as a mixture of
time-delay neural networks (TDNN) [14]
and time constants).
For long time lags, however, the time constants need
external fine tuning [17].
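Such a time-constant unit can be read as a leaky integrator. The
following Python sketch illustrates this reading (the 1/tau mixing
form, the tanh nonlinearity, and all numbers are assumptions for
illustration, not Mozer's exact formulation):

    import numpy as np

    def leaky_unit_step(a_prev, net, tau):
        # a(t) = (1 - 1/tau) * a(t-1) + (1/tau) * f(net(t));
        # a large time constant tau makes the unit change slowly,
        # letting it retain information across many steps
        return (1.0 - 1.0 / tau) * a_prev + (1.0 / tau) * np.tanh(net)

    a = 0.8                                  # activation to be retained
    for t in range(100):
        a = leaky_unit_step(a, net=0.0, tau=50.0)
    print(a)                                 # ~0.11 after 100 input-free steps

The printout hints at why the time constants need tuning: even with
tau = 50, a stored activation of 0.8 decays to roughly
0.8 * (1 - 1/50)**100, i.e. about 0.11, within 100 steps.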
Sun et al.'s alternative approach [25]
updates the activation of a recurrent unit by adding
the old activation and the (scaled) current net input.
The net input, however, tends to perturb the stored information,
which makes long-term storage impractical.
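The following scalar sketch shows this perturbation effect (the
Gaussian net input and the scaling factor 0.1 are illustrative
assumptions, not taken from [25]):

    import numpy as np

    rng = np.random.default_rng(0)
    a = 1.0                      # activation meant to store information
    alpha = 0.1                  # scaling of the current net input
    for t in range(1000):
        net = rng.normal()       # ongoing net input from the rest of the net
        a = a + alpha * net      # additive update: old activation + scaled input
    print(a)                     # drifts far from the stored value 1.0

Since each step adds alpha * net(t), the stored value performs a
random walk whose standard deviation grows like alpha * sqrt(t)
(about 3.2 after 1000 steps here), which is why long-term storage
becomes impractical.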
Lin et al. [16] also propose variants of time-delay
networks, called NARX networks (see also Chapter 11).
Gradient flow in this architecture can be improved because embedded
memories effectively introduce "shortcuts" in the error propagation
path through time. The same idea can be applied to other
architectures by inserting multiple delays in the connections among
hidden state units rather than output units [15]. However,
these architectures cannot solve the general problem, since they can
only increase the duration of the learnable temporal dependencies by
a constant multiplicative factor.
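A minimal sketch of the shortcut effect (scalar output, order-D
embedded memory; the weights, nonlinearity, and pulse input are
illustrative assumptions):

    import numpy as np

    D = 5                                    # order of the embedded memory
    W_y = np.full(D, 0.15)                   # weights on the last D outputs
    W_x = np.full(D, 0.15)                   # weights on the last D inputs
    y_hist = np.zeros(D)
    x_hist = np.zeros(D)

    for t in range(20):
        x_t = 1.0 if t == 0 else 0.0         # single input pulse at t = 0
        x_hist = np.roll(x_hist, 1); x_hist[0] = x_t
        # the new output sees the last D outputs and inputs directly,
        # so a gradient can jump D steps back through a single weight
        y = np.tanh(W_y @ y_hist + W_x @ x_hist)
        y_hist = np.roll(y_hist, 1); y_hist[0] = y

A dependency spanning k time steps thus passes through only about
k/D multiplicative factors instead of k, which is precisely the
constant-factor gain (here D) noted above.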
Finally, El Hihi & Bengio [9]
looked at hierarchically organized recurrent neural networks
with different levels of time constants or time delays.
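One plausible reading of this multi-time-scale idea is to let higher
levels update less often. The sketch below assumes fixed update
periods and state sizes (illustrative choices, not the exact
architecture of [9]):

    import numpy as np

    rng = np.random.default_rng(0)
    periods = [1, 4, 16]              # level l updates every periods[l] steps
    states = [np.zeros(3) for _ in periods]
    W = [rng.normal(scale=0.3, size=(3, 3)) for _ in periods]
    W_in = [rng.normal(scale=0.3, size=(3, 3)) for _ in periods]

    for t in range(64):
        inp = rng.normal(size=3)      # stand-in external input at the bottom
        for l, p in enumerate(periods):
            if t % p == 0:            # slower "clock" at higher levels
                states[l] = np.tanh(W[l] @ states[l] + W_in[l] @ inp)
            inp = states[l]           # each level feeds the one above

The intuition is that a level with period p changes state only every
p steps, so error signals crossing it shrink p times less often,
giving the slow levels a longer effective memory.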