1. Recognition of temporally extended patterns in noisy input sequences
2. Recognition of simple regular, context-free, and context-sensitive languages (Felix Gers, 2000)
3. Recognition of the temporal order of widely separated events in noisy input streams
4. Extraction of information conveyed by the temporal distance between events
5. Stable generation of precisely timed rhythms, smooth and nonsmooth periodic trajectories
6. Robust storage of high-precision real numbers across extended time intervals
7. Reinforcement learning in partially observable environments (Schmidhuber's postdoc Bram Bakker, 2001)
8. Metalearning of fast online learning algorithms (Sepp Hochreiter, 2001)
9. Music improvisation and music composition (Schmidhuber's former postdoc Doug Eck, 2002)
10. Aspects of speech segmentation and speech recognition (Alex Graves, Nicole Beringer, 2004)
Typical LSTM cell (right):
LSTM networks usually consist of many connected LSTM cells.
Each cell is very simple. At its core there is a linear
unit or neuron (orange).
At any given time it just sums up the inputs that it sees
via its incoming weighted connections.
Its self-recurrent connection has a fixed weight of 1.0
(except when modulated, via the violet
dot, through the left green
unit, which is not mandatory and which we may ignore for the moment). The 1.0 weight
overcomes THE major problem of previous RNNs by making
sure that training signals "from the future" cannot vanish
as they are being "propagated back in time"
(if this jargon does not make any sense to you,
please consult some RNN papers, e.g., those below).
Suffice it to say here
that the simple linear unit is THE reason why LSTM nets can
learn to discover the importance of events that happened 1000 discrete
time steps ago, while previous RNNs already fail when time lags exceed
as few as 10 steps!
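
The effect of the fixed 1.0 self-connection can be illustrated with a small numerical sketch. It is not the LSTM learning algorithm, only a scalar toy model: an error signal sent back through a single self-recurrent unit is scaled, once per time step, by the recurrent weight times the slope of the unit's activation function. The function name `backpropagated_error` and the example values (weight 0.9, slope 0.25) are assumptions chosen for illustration.

```python
def backpropagated_error(recurrent_weight, activation_slope, steps):
    """Scale factor applied to an error signal sent back `steps` time steps
    through one self-recurrent unit (scalar toy model, not full BPTT)."""
    factor = 1.0
    for _ in range(steps):
        # Each step back in time multiplies by weight * local slope.
        factor *= recurrent_weight * activation_slope
    return factor


# Linear unit with self-recurrent weight 1.0: the slope is 1, nothing is lost.
linear_unit = backpropagated_error(1.0, 1.0, 1000)

# Hypothetical squashing unit: weight 0.9, slope 0.25 (illustrative values).
squashing_unit = backpropagated_error(0.9, 0.25, 10)

print(linear_unit)     # exactly 1.0, even after 1000 steps
print(squashing_unit)  # roughly 3e-7 after only 10 steps
```

With these assumed values the signal through the squashing unit has shrunk by more than six orders of magnitude after 10 steps, while the linear unit passes it back unchanged over 1000 steps.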
The linear unit is typically surrounded by a cloud of
nonlinear adaptive units which are responsible for learning the nonlinear
aspects of sequence processing. Here we see an input unit (blue)
and three (green) multiplicative gate units (small violet dots
represent multiplications). The gates essentially
learn to protect the central linear unit from irrelevant
input events and error signals.
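
The cell described above can be sketched in a few lines of Python. This is a simplified scalar illustration under stated assumptions, not the authors' implementation: the gates here see only the external input (in a full LSTM network they also receive recurrent connections from other cells), the optional gate on the self-recurrent connection is called `forget_gate`, and all names, weights, and biases (`lstm_cell_step`, `weights`, and so on) are hypothetical values set by hand rather than learned.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cell_step(x, state, w):
    """One time step of a single scalar LSTM cell (illustrative sketch).

    x     -- current external input
    state -- value currently held by the central linear unit
    w     -- dict of hypothetical weights and biases
    """
    cell_input = math.tanh(w["in"] * x + w["in_bias"])   # nonlinear input unit
    input_gate = sigmoid(w["ig"] * x + w["ig_bias"])
    forget_gate = sigmoid(w["fg"] * x + w["fg_bias"])    # optional gate
    output_gate = sigmoid(w["og"] * x + w["og_bias"])
    # Central linear unit: self-connection of weight 1.0 (scaled by the
    # forget gate) plus the gated input. The gates act by multiplication.
    new_state = forget_gate * 1.0 * state + input_gate * cell_input
    output = output_gate * math.tanh(new_state)
    return new_state, output


# Hand-set weights: the input gate opens only when x is near 1,
# the forget and output gates stay (almost) fully open.
weights = {"in": 1.0, "in_bias": 0.0, "ig": 20.0, "ig_bias": -10.0,
           "fg": 0.0, "fg_bias": 20.0, "og": 0.0, "og_bias": 10.0}

state, _ = lstm_cell_step(1.0, 0.0, weights)   # store an event
stored = state
for _ in range(1000):                          # 1000 irrelevant steps
    state, output = lstm_cell_step(0.0, state, weights)

print(stored, state)  # the two values agree to within about 1e-5
```

In this usage example the input gate closes after the first step, so the 1000 later inputs do not overwrite the stored value, which is the protective role the text attributes to the gates.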
The LSTM learning algorithm is very efficient:
not more than O(1) computations
per time step and weight!
Some recent publications on LSTM RNNs:
14.
A. Graves, D. Eck, N. Beringer, J. Schmidhuber.
Isolated Digit Recognition with LSTM Recurrent Networks.
First Intl. Workshop on Biologically
Inspired Approaches to Advanced Information Technology,
2004, in press.
13.
A. Graves, N. Beringer, J. Schmidhuber.
A Comparison Between Spiking and Differentiable Recurrent
Neural Networks on Spoken Digit Recognition.
In Proc. 23rd International Conference on Modelling, Identification,
and Control (IASTED), 2004, in press.
12.
B. Bakker and J. Schmidhuber.
Hierarchical Reinforcement
Learning Based on Subgoal Discovery and Subpolicy Specialization
(PDF).
In F. Groen, N. Amato, A. Bonarini, E. Yoshida, and B. Kröse (Eds.),
Proceedings of the 8th Conference on Intelligent Autonomous Systems,
IAS-8, Amsterdam, The Netherlands, pp. 438-445, 2004.
11.
D. Eck, A. Graves, J. Schmidhuber.
A New Approach to Continuous Speech Recognition Using
LSTM Recurrent Neural Networks.
TR IDSIA-14-03, 2003.
10.
J. A. Perez-Ortiz, F. A. Gers, D. Eck, J. Schmidhuber.
Kalman filters improve LSTM network performance in
problems unsolvable by traditional recurrent nets.
Neural Networks 16(2):241-250, 2003.
PDF.
9.
F. Gers, N. Schraudolph, J. Schmidhuber.
Learning precise timing with
LSTM recurrent networks.
Journal of Machine Learning Research 3:115-143, 2002.
PDF.
8.
J. Schmidhuber, F. Gers, D. Eck.
Learning nonregular languages:
A comparison of simple recurrent networks and LSTM.
Neural Computation, 14(9):2039-2041, 2002.
PDF.
7.
B. Bakker.
Reinforcement Learning with Long Short-Term Memory.
Advances in Neural Information Processing
Systems 13 (NIPS'13), 2002.
(On J. Schmidhuber's CSEM grant 2002.)
6.
D. Eck and J. Schmidhuber.
Learning the Long-Term Structure of the Blues.
In J. Dorronsoro, ed.,
Proceedings of Int. Conf. on Artificial Neural Networks
ICANN'02, Madrid, pages 284-289, Springer, Berlin, 2002.
PDF.
5.
F. A. Gers and J. Schmidhuber.
LSTM Recurrent Networks Learn Simple Context Free and
Context Sensitive Languages.
IEEE Transactions on Neural Networks 12(6):1333-1340, 2001.
PDF.
4.
F. A. Gers, J. Schmidhuber, and F. Cummins.
Learning to Forget: Continual Prediction with LSTM.
Neural Computation, 12(10):2451-2471, 2000.
PDF.
3.
S. Hochreiter and J. Schmidhuber.
Long Short-Term Memory.
Neural Computation, 9(8):1735-1780, 1997.
PDF.
2.
S. Hochreiter and J. Schmidhuber.
LSTM can solve hard long time lag problems.
In M. C. Mozer, M. I. Jordan, T. Petsche, eds.,
Advances in Neural Information Processing Systems 9, NIPS'9,
pages 473-479, MIT Press, Cambridge MA, 1997.
PDF.
HTML.
1.
S. Hochreiter and J. Schmidhuber.
Bridging long time lags by weight guessing and "Long Short-Term
Memory".
In F. L. Silva, J. C. Principe, L. B. Almeida, eds.,
Frontiers in Artificial Intelligence and Applications, Volume 37,
pages 65-72, IOS Press, Amsterdam, Netherlands, 1996.
Please also find numerous additional publications on LSTM in the
home pages of
Juergen Schmidhuber,
Doug Eck,
and
Felix Gers.
Felix's home page also has pointers to LSTM source code.
Additional RNN publications (more
here):
13.
J. Schmidhuber and S. Hochreiter.
Guessing can outperform many long time lag algorithms.
Technical Note IDSIA-19-96, IDSIA, May 1996.
See also NIPS'96 HTML.
12.
J. Schmidhuber.
A self-referential weight matrix.
In Proceedings of the International Conference on Artificial
Neural Networks, Amsterdam, pages 446-451. Springer, 1993.
PDF.
HTML.
11.
J. Schmidhuber.
Reducing the ratio between learning complexity and number of
time-varying variables in fully recurrent nets.
In Proceedings of the International Conference on Artificial
Neural Networks, Amsterdam, pages 460-463. Springer, 1993.
PDF.
HTML.
10.
J. Schmidhuber.
Netzwerkarchitekturen, Zielfunktionen und Kettenregel.
(Net architectures, objective functions, and chain rule.)
Habilitation (postdoctoral thesis, qualification for a
tenure professorship),
Institut für Informatik, Technische Universität
München, 1993 (496 K).
PDF.
HTML.
9.
J. Schmidhuber.
Learning complex,
extended sequences using the principle of history compression.
Neural Computation, 4(2):234-242, 1992 (41 K).
PDF.
HTML.
8.
J. Schmidhuber.
Learning unambiguous reduced sequence descriptions.
In J. E. Moody, S. J. Hanson, and R. P. Lippman, editors,
Advances in Neural Information Processing Systems 4, NIPS'4, pages 291-298. San
Mateo, CA: Morgan Kaufmann, 1992.
PDF.
HTML.
7.
J. Schmidhuber.
A fixed size
storage O(n^3) time complexity learning algorithm for fully recurrent
continually running networks.
Neural Computation, 4(2):243-248, 1992 (33 K).
PDF.
HTML.
6.
J. Schmidhuber.
Learning to
control fast-weight memories: An alternative to recurrent nets.
Neural Computation, 4(1):131-139, 1992 (39 K).
PDF.
HTML.
Pictures (German).
5.
J. Schmidhuber.
Learning temporary variable binding with dynamic links.
In Proc. International Joint Conference on Neural Networks,
Singapore, volume 3, pages 2075-2079. IEEE, 1991.
4.
J. Schmidhuber.
An online algorithm for dynamic reinforcement learning and planning
in reactive environments.
In Proc. IEEE/INNS International Joint Conference on Neural
Networks, San Diego, volume 2, pages 253-258, 1990.
3.
J. Schmidhuber.
Learning algorithms for networks with internal and external feedback.
In D. S. Touretzky, J. L. Elman, T. J. Sejnowski,
and G. E. Hinton,
editors, Proc. of the 1990 Connectionist Models Summer School, pages
52-61. San Mateo, CA: Morgan Kaufmann, 1990.
2.
J. Schmidhuber.
Dynamische neuronale Netze und das fundamentale raumzeitliche
Lernproblem. (341 K),
(Dynamic neural nets and the fundamental spatiotemporal
credit assignment problem.) Dissertation,
Institut für Informatik, Technische
Universität München, 1990.
PDF.
HTML.
1.
J. Schmidhuber.
A local learning algorithm for dynamic feedforward and
recurrent networks.
Connection Science, 1(4):403-412, 1989.
(The Neural Bucket Brigade - figures omitted!).
PDF.
HTML.
