REFERENCES


Y. Bengio and P. Frasconi.
Credit assignment through time: Alternatives to backpropagation.
In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems 6, pages 75-82. Morgan Kaufmann, 1994.

Y. Bengio and P. Frasconi.
An input output HMM architecture.
In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, pages 427-434. MIT Press, 1995.

Y. Bengio, P. Simard, and P. Frasconi.
Learning long-term dependencies with gradient descent is difficult.
IEEE Transactions on Neural Networks, 5(2):157-166, 1994.

A. Cleeremans, D. Servan-Schreiber, and J. L. McClelland.
Finite-state automata and simple recurrent networks.
Neural Computation, 1:372-381, 1989.

S. E. Fahlman.
The recurrent cascade-correlation learning algorithm.
In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems 3, pages 190-196. Morgan Kaufmann, 1991.

S. El Hihi and Y. Bengio.
Hierarchical recurrent neural networks for long-term dependencies.
In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 493-499. MIT Press, 1996.

S. Hochreiter.
Untersuchungen zu dynamischen neuronalen Netzen [Investigations on dynamic neural networks]. Diploma thesis, Institut für Informatik, Lehrstuhl Prof. Brauer, Technische Universität München, 1991.

S. Hochreiter and J. Schmidhuber.
Long short-term memory.
Technical Report FKI-207-95, Fakultät für Informatik, Technische Universität München, 1995.
Revised 1996.

T. Lin, B. G. Horne, P. Tino, and C. L. Giles.
Learning long-term dependencies is not as difficult with NARX recurrent neural networks.
Technical Report UMIACS-TR-95-78 and CS-TR-3500, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, 1995.

P. Manolios and R. Fanelli.
First-order recurrent neural networks and deterministic finite state automata.
Neural Computation, 6:1155-1173, 1994.

C. B. Miller and C. L. Giles.
Experimental comparison of the effect of order in recurrent neural networks.
International Journal of Pattern Recognition and Artificial Intelligence, 7(4):849-872, 1993.

M. C. Mozer.
Induction of multiscale temporal structure.
In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 4, pages 275-282. Morgan Kaufmann, 1992.

B. A. Pearlmutter.
Gradient calculations for dynamic recurrent neural networks: A survey.
IEEE Transactions on Neural Networks, 6(5):1212-1228, 1995.

J. B. Pollack.
The induction of dynamical recognizers.
Machine Learning, 7:227-252, 1991.

A. J. Robinson and F. Fallside.
The utility driven dynamic error propagation network.
Technical Report CUED/F-INFENG/TR.1, Cambridge University Engineering Department, 1987.

J. Schmidhuber.
Learning complex, extended sequences using the principle of history compression.
Neural Computation, 4(2):234-242, 1992.

A. W. Smith and D. Zipser.
Learning sequential structures with the real-time recurrent learning algorithm.
International Journal of Neural Systems, 1(2):125-131, 1989.

M. Tomita.
Dynamic construction of finite automata from examples using hill-climbing.
In Proceedings of the Fourth Annual Cognitive Science Conference, pages 105-108. Ann Arbor, MI, 1982.

R. L. Watrous and G. M. Kuhn.
Induction of finite-state automata using second-order recurrent networks.
In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 4, pages 309-316. Morgan Kaufmann, 1992.

R. J. Williams and J. Peng.
An efficient gradient-based algorithm for on-line training of recurrent network trajectories.
Neural Computation, 2(4):490-501, 1990.
