
Time constants

To deal with long time lags, Mozer [17] uses time constants that influence changes of unit activations (de Vries and Principe's related approach [7] may be viewed as a mixture of time-delay neural networks (TDNN) [14] and time constants). For long time lags, however, the time constants need external fine tuning [17]. Sun et al.'s alternative approach [25] updates the activation of a recurrent unit by adding the old activation and the (scaled) current net input. The net input, however, tends to perturb the stored information, which makes long-term storage impractical. Lin et al. [16] also propose variants of time-delay networks, called NARX networks (see also Chapter 11). Gradient flow in this architecture can be improved because embedded memories effectively introduce ``shortcuts'' in the error propagation path through time. The same idea can be applied to other architectures by inserting multiple delays in the connections among hidden state units rather than output units [15]. However, these architectures cannot solve the general problem, since they can increase the duration of the learnable temporal dependencies only by a constant multiplicative factor. Finally, El Hihi & Bengio [9] looked at hierarchically organized recurrent neural networks with different levels of time constants or time delays.
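Neither update rule is spelled out explicitly above, so the following NumPy sketch contrasts the two ideas. All names, weight shapes, the tanh squashing, and the particular scaling are illustrative assumptions, not the cited authors' exact formulations: a leaky, per-unit time-constant update (large tau lets a unit retain information longer) versus a purely additive update, whose net-input term can perturb stored values as noted in the text.

import numpy as np

def time_constant_step(a_prev, x, W_in, W_rec, tau):
    """One step of a recurrent layer with per-unit time constants (sketch).
    Each activation decays toward a squashed net input at a rate 1/tau,
    so units with large tau change slowly and can bridge longer lags."""
    net = W_in @ x + W_rec @ a_prev       # current net input
    return (1.0 - 1.0 / tau) * a_prev + (1.0 / tau) * np.tanh(net)

def additive_step(a_prev, x, W_in, W_rec, scale=0.1):
    """Sketch of an additive update in the spirit of Sun et al. [25]:
    old activation plus a scaled current net input. The net-input term
    keeps perturbing whatever the unit has stored, which is the drawback
    mentioned above."""
    return a_prev + scale * (W_in @ x + W_rec @ a_prev)

# Hypothetical usage: 3 hidden units, 2 inputs, per-unit time constants.
rng = np.random.default_rng(0)
W_in = 0.1 * rng.standard_normal((3, 2))
W_rec = 0.1 * rng.standard_normal((3, 3))
tau = np.array([1.0, 5.0, 50.0])          # one time constant per unit
a = np.zeros(3)
for x in rng.standard_normal((10, 2)):
    a = time_constant_step(a, x, W_in, W_rec, tau)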