
Bibliography

1
P. J. Angeline, G. M. Saunders, and J. P. Pollack.
An evolutionary algorithm that constructs recurrent neural networks.
IEEE Transactions on Neural Networks, 5(1):54-65, 1994.

2
P. Baldi and F. Pineda.
Contrastive learning and neural oscillations.
Neural Computation, 3:526-545, 1991.

3
Y. Bengio.
Markovian models for sequential data.
Neural Computing Surveys, 2:129-162, 1999.

4
Y. Bengio and P. Frasconi.
Credit assignment through time: Alternatives to backpropagation.
In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems 6, pages 75-82. San Mateo, CA: Morgan Kaufmann, 1994.

5
Y. Bengio and P. Frasconi.
Diffusion of context and credit information in Markovian models.
Journal of Artificial Intelligence Research, 3:249-270, 1995.

6
Y. Bengio, P. Simard, and P. Frasconi.
Learning long-term dependencies with gradient descent is difficult.
IEEE Transactions on Neural Networks, 5(2):157-166, 1994.

7
B. de Vries and J. C. Principe.
A theory for neural networks with time delays.
In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems 3, pages 162-168. San Mateo, CA: Morgan Kaufmann, 1991.

8
K. Doya.
Bifurcations in the learning of recurrent neural networks.
In Proceedings of 1992 IEEE International Symposium on Circuits and Systems, pages 2777-2780, 1992.

9
S. El Hihi and Y. Bengio.
Hierarchical recurrent neural networks for long-term dependencies.
In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 493-499. Cambridge, MA: MIT Press, 1996.

10
F. A. Gers, J. Schmidhuber, and F. Cummins.
Learning to forget: Continual prediction with LSTM.
In Proc. ICANN'99, Int. Conf. on Artificial Neural Networks, pages 850-855, Edinburgh, Scotland, 1999. IEE, London.

11
S. Hochreiter.
Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Institut für Informatik, Lehrstuhl Prof. Brauer, Technische Universität München, 1991.
See www7.informatik.tu-muenchen.de/~hochreit.

12
S. Hochreiter and J. Schmidhuber.
Long short-term memory.
Neural Computation, 9(8):1735-1780, 1997.

13
S. Hochreiter and J. Schmidhuber.
LSTM can solve hard long time lag problems.
In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems 9, pages 473-479. Cambridge, MA: MIT Press, 1997.

14
K. Lang, A. Waibel, and G. E. Hinton.
A time-delay neural network architecture for isolated word recognition.
Neural Networks, 3:23-43, 1990.

15
T. Lin, B. G. Horne, and C. L. Giles.
How embedded memory in recurrent neural network architectures helps learning long-term temporal dependencies.
Neural Networks, 11(5):861-868, 1998.

16
T. Lin, B. G. Horne, P. Tiño, and C. L. Giles.
Learning long-term dependencies in NARX recurrent neural networks.
IEEE Transactions on Neural Networks, 7(6):1329-1338, November 1996.

17
M. C. Mozer.
Induction of multiscale temporal structure.
In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 4, pages 275-282. San Mateo, CA: Morgan Kaufmann, 1992.

18
J. M. Ortega and W. C. Rheinboldt.
Iterative Solution of Nonlinear Equations in Several Variables.
Academic Press, New York, 1970.

19
F. J. Pineda.
Dynamics and architecture for neural computation.
Journal of Complexity, 4:216-245, 1988.

20
M. B. Ring.
Learning sequential tasks by incrementally adding higher orders.
In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 115-122. San Mateo, CA: Morgan Kaufmann, 1993.

21
A. J. Robinson and F. Fallside.
The utility driven dynamic error propagation network.
Technical Report CUED/F-INFENG/TR.1, Cambridge University Engineering Department, 1987.

22
D. E. Rumelhart, G. E. Hinton, and R. J. Williams.
Learning internal representations by error propagation.
In Parallel Distributed Processing, volume 1, pages 318-362. MIT Press, 1986.

23
J. Schmidhuber.
Learning complex, extended sequences using the principle of history compression.
Neural Computation, 4(2):234-242, 1992.

24
J. Schmidhuber.
Netzwerkarchitekturen, Zielfunktionen und Kettenregel. Habilitationsschrift, Institut für Informatik, Technische Universität München, 1993.

25
G. Sun, H. Chen, and Y. Lee.
Time warping invariant neural networks.
In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems 5, pages 180-187. San Mateo, CA: Morgan Kaufmann, 1993.

26
P. J. Werbos.
Generalization of backpropagation with application to a recurrent gas market model.
Neural Networks, 1, 1988.

27
R. J. Williams and D. Zipser.
Gradient-based learning algorithms for recurrent networks and their computational complexity.
In Back-propagation: Theory, Architectures and Applications. Hillsdale, NJ: Erlbaum, 1992.


Juergen Schmidhuber 2003-02-19