
Bibliography

1
C. W. Anderson.
Learning and Problem Solving with Multilayer Connectionist Systems.
PhD thesis, University of Massachusetts, Department of Computer and Information Science, 1986.

2
M. I. Jordan.
Supervised learning and systems with excess degrees of freedom.
Technical Report COINS TR 88-27, University of Massachusetts, Amherst, 1988.

3
M. I. Jordan and R. A. Jacobs.
Learning to control an unstable system with forward modeling.
In Proceedings of the 1990 Connectionist Models Summer School. Morgan Kaufmann, 1990. In press.

4
S. W. Piché.
Draft: First order gradient descent training of adaptive discrete time dynamic networks.
Technical report, Dept. of Electrical Engineering, Stanford University, 1990.

5
A. J. Robinson and F. Fallside.
The utility driven dynamic error propagation network.
Technical Report CUED/F-INFENG/TR.1, Cambridge University Engineering Department, 1987.

6
T. Robinson and F. Fallside.
Dynamic reinforcement driven error propagation networks with application to game playing.
In Proceedings of the 11th Conference of the Cognitive Science Society, Ann Arbor, pages 836-843, 1989.

7
J. Schmidhuber.
Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments.
Technical Report FKI-126-90 (revised), Institut für Informatik, Technische Universität München, November 1990.
(Revised and extended version of an earlier report from February.)

8
J. Schmidhuber.
Networks adjusting networks.
In J. Kindermann and A. Linden, editors, Proceedings of 'Distributed Adaptive Neural Information Processing', St. Augustin, 24-25 May 1989, pages 197-208. Oldenbourg, 1990.
In November 1990 a revised and extended version appeared as FKI-Report FKI-125-90 (revised) at the Institut für Informatik, Technische Universität München.

9
J. Schmidhuber.
Towards compositional learning with dynamic neural networks.
Technical Report FKI-129-90, Institut für Informatik, Technische Universität München, 1990.

10
R. S. Sutton.
Learning to predict by the methods of temporal differences.
Machine Learning, 3:9-44, 1988.

11
P. J. Werbos.
Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research.
IEEE Transactions on Systems, Man, and Cybernetics, 17, 1987.

12
R. J. Williams.
On the use of backpropagation in associative reinforcement learning.
In IEEE International Conference on Neural Networks, San Diego, volume 2, pages 263-270, 1988.

13
R. J. Williams and D. Zipser.
Experimental analysis of the real-time recurrent learning algorithm.
Connection Science, 1(1):87-111, 1989.



Juergen Schmidhuber 2003-02-25

