[1] C. W. Anderson. Learning and Problem Solving with Multilayer Connectionist Systems. PhD thesis, University of Massachusetts, Dept. of Comp. and Inf. Sci., 1986.
[2] M. I. Jordan. Supervised learning and systems with excess degrees of freedom. Technical Report COINS TR 88-27, Massachusetts Institute of Technology, 1988.
[3] M. I. Jordan and R. A. Jacobs. Learning to control an unstable system with forward modeling. In Proc. of the 1990 Connectionist Models Summer School (in press). Morgan Kaufmann, 1990.
[4] S. W. Piché. Draft: First order gradient descent training of adaptive discrete time dynamic networks. Technical report, Dept. of Electrical Engineering, Stanford University, 1990.
[5] A. J. Robinson and F. Fallside. The utility driven dynamic error propagation network. Technical Report CUED/F-INFENG/TR.1, Cambridge University Engineering Department, 1987.
[6] T. Robinson and F. Fallside. Dynamic reinforcement driven error propagation networks with application to game playing. In Proceedings of the 11th Conference of the Cognitive Science Society, Ann Arbor, pages 836-843, 1989.
[7] J. Schmidhuber. Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90 (revised), Institut für Informatik, Technische Universität München, November 1990. (Revised and extended version of an earlier report from February.)
[8] J. Schmidhuber. Networks adjusting networks. In J. Kindermann and A. Linden, editors, Proceedings of 'Distributed Adaptive Neural Information Processing', St. Augustin, 24.-25.5.1989, pages 197-208. Oldenbourg, 1990. A revised and extended version appeared in November 1990 as FKI-Report FKI-125-90 (revised), Institut für Informatik, Technische Universität München.
[9] J. Schmidhuber. Towards compositional learning with dynamic neural networks. Technical Report FKI-129-90, Institut für Informatik, Technische Universität München, 1990.
[10] R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44, 1988.
[11] P. J. Werbos. Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research. IEEE Transactions on Systems, Man, and Cybernetics, 17, 1987.
[12] R. J. Williams. On the use of backpropagation in associative reinforcement learning. In IEEE International Conference on Neural Networks, San Diego, volume 2, pages 263-270, 1988.
[13] R. J. Williams and D. Zipser. Experimental analysis of the real-time recurrent learning algorithm. Connection Science, 1(1):87-111, 1989.