- 1
-
P. J. Angeline, G. M. Saunders, and J. B. Pollack.
An evolutionary algorithm that constructs recurrent neural networks.
IEEE Transactions on Neural Networks, 5(1):54-65, 1994.
- 2
-
P. Baldi and F. Pineda.
Contrastive learning and neural oscillations.
Neural Computation, 3:526-545, 1991.
- 3
-
Y. Bengio.
Markovian models for sequential data.
Neural Computing Surveys, 2:129-162, 1999.
- 4
-
Y. Bengio and P. Frasconi.
Credit assignment through time: Alternatives to backpropagation.
In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances
in Neural Information Processing Systems 6, pages 75-82. San Mateo, CA:
Morgan Kaufmann, 1994.
- 5
-
Y. Bengio and P. Frasconi.
Diffusion of context and credit information in Markovian models.
Journal of Artificial Intelligence Research, 3:249-270, 1995.
- 6
-
Y. Bengio, P. Simard, and P. Frasconi.
Learning long-term dependencies with gradient descent is difficult.
IEEE Transactions on Neural Networks, 5(2):157-166, 1994.
- 7
-
B. de Vries and J. C. Principe.
A theory for neural networks with time delays.
In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems 3, pages 162-168. San
Mateo, CA: Morgan Kaufmann, 1991.
- 8
-
K. Doya.
Bifurcations in the learning of recurrent neural networks.
In Proceedings of the 1992 IEEE International Symposium on Circuits
and Systems, pages 2777-2780, 1992.
- 9
-
S. El Hihi and Y. Bengio.
Hierarchical recurrent neural networks for long-term dependencies.
In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 493-499. MIT
Press, Cambridge MA, 1996.
- 10
-
F. A. Gers, J. Schmidhuber, and F. Cummins.
Learning to forget: Continual prediction with LSTM.
In Proc. ICANN'99, Int. Conf. on Artificial Neural Networks,
pages 850-855, Edinburgh, Scotland, 1999. IEE, London.
- 11
-
S. Hochreiter.
Untersuchungen zu dynamischen neuronalen Netzen (Investigations on
dynamic neural networks). Diploma thesis,
Institut für Informatik, Lehrstuhl Prof. Brauer, Technische
Universität München, 1991.
See www7.informatik.tu-muenchen.de/~hochreit.
- 12
-
S. Hochreiter and J. Schmidhuber.
Long short-term memory.
Neural Computation, 9(8):1735-1780, 1997.
- 13
-
S. Hochreiter and J. Schmidhuber.
LSTM can solve hard long time lag problems.
In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances
in Neural Information Processing Systems 9, pages 473-479. MIT Press, 1997.
- 14
-
K. Lang, A. Waibel, and G. E. Hinton.
A time-delay neural network architecture for isolated word
recognition.
Neural Networks, 3:23-43, 1990.
- 15
-
T. Lin, B. G. Horne, and C. L. Giles.
How embedded memory in recurrent neural network architectures helps
learning long-term temporal dependencies.
Neural Networks, 11(5):861-868, 1998.
- 16
-
T. Lin, B. G. Horne, P. Tiňo, and C. L. Giles.
Learning long-term dependencies in NARX recurrent neural networks.
IEEE Transactions on Neural Networks, 7(6):1329-1338, November
1996.
- 17
-
M. C. Mozer.
Induction of multiscale temporal structure.
In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 4, pages 275-282. San
Mateo, CA: Morgan Kaufmann, 1992.
- 18
-
J. M. Ortega and W. C. Rheinboldt.
Iterative Solution of Nonlinear Equations in Several Variables.
Academic Press, New York, 1970.
- 19
-
F. J. Pineda.
Dynamics and architecture for neural computation.
Journal of Complexity, 4:216-245, 1988.
- 20
-
M. B. Ring.
Learning sequential tasks by incrementally adding higher orders.
In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances
in Neural Information Processing Systems 5, pages 115-122. Morgan Kaufmann,
1993.
- 21
-
A. J. Robinson and F. Fallside.
The utility driven dynamic error propagation network.
Technical Report CUED/F-INFENG/TR.1, Cambridge University Engineering
Department, 1987.
- 22
-
D. E. Rumelhart, G. E. Hinton, and R. J. Williams.
Learning internal representations by error propagation.
In Parallel Distributed Processing, volume 1, pages 318-362.
MIT Press, 1986.
- 23
-
J. Schmidhuber.
Learning complex, extended sequences using the principle of history
compression.
Neural Computation, 4(2):234-242, 1992.
- 24
-
J. Schmidhuber.
Netzwerkarchitekturen, Zielfunktionen und Kettenregel (Network
architectures, objective functions, and chain rule).
Habilitation thesis, Institut für Informatik, Technische
Universität München, 1993.
- 25
-
G. Sun, H. Chen, and Y. Lee.
Time warping invariant neural networks.
In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances
in Neural Information Processing Systems 5, pages 180-187. San Mateo, CA:
Morgan Kaufmann, 1993.
- 26
-
P. J. Werbos.
Generalization of backpropagation with application to a recurrent gas
market model.
Neural Networks, 1, 1988.
- 27
-
R. J. Williams and D. Zipser.
Gradient-based learning algorithms for recurrent networks and their
computational complexity.
In Back-propagation: Theory, Architectures and Applications.
Hillsdale, NJ: Erlbaum, 1992.
Juergen Schmidhuber
2003-02-19