``Long Short-Term Memory'' (LSTM) is a novel, efficient,
gradient-based method designed to
overcome the vanishing error problem.
Truncating the gradient where this does not do harm,
LSTM can learn to bridge minimal
time lags in excess of 1000 discrete time steps
by enforcing constant error flow through
``constant error carrousels'' within special units.
Multiplicative gate units learn to open and close access
to the constant error flow.
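The constant error carrousel and its gates can be illustrated with a minimal sketch of a single original-style memory cell (input and output gates only, no forget gate). The scalar weights, the sigmoid/tanh nonlinearities, and the function names here are illustrative assumptions, not taken from the paper:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, s_prev, w_in, w_ig, w_og):
    """One step of a single memory cell (illustrative scalar weights).

    s_prev is the cell state carried by the constant error carrousel:
    a self-connection with fixed weight 1.0, so error flowing back
    through the state is neither scaled up nor down over time.
    """
    in_gate = sigmoid(w_ig * x)       # learns when to write into the carrousel
    out_gate = sigmoid(w_og * x)      # learns when to read from the carrousel
    cell_input = math.tanh(w_in * x)
    s = s_prev + in_gate * cell_input # identity self-loop: state persists unchanged
    h = out_gate * math.tanh(s)       # gated output of the cell
    return s, h

# Run the cell over a toy sequence: zero inputs leave the state untouched.
s, outputs = 0.0, []
for x in [1.0, 0.0, 0.0, -1.0]:
    s, h = lstm_cell_step(x, s, w_in=0.5, w_ig=1.0, w_og=1.0)
    outputs.append(h)
```

Because the self-connection has weight exactly 1.0, the state written at one step is still available unchanged many steps later, which is what lets error signals bridge long time lags.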
LSTM is local in space and time;
its computational complexity per time step and weight is $O(1)$.
So far, experiments with
artificial data involved local, distributed, real-valued, and noisy
pattern representations. In comparisons with RTRL, BPTT,
Recurrent Cascade-Correlation, Elman networks, and Neural Sequence
Chunking, LSTM led to many more successful runs and learned much faster.
LSTM also solved complex, artificial long time lag tasks that
have never been solved by previous recurrent network algorithms.
It will be interesting to examine to what extent LSTM is applicable
to real-world problems such as speech recognition.