Bibliography

1
A. G. Barto, R. S. Sutton, and C. W. Anderson.
Neuronlike adaptive elements that can solve difficult learning control problems.
IEEE Transactions on Systems, Man, and Cybernetics, SMC-13:834-846, 1983.

2
M. I. Jordan and D. E. Rumelhart.
Supervised learning with a distal teacher.
Technical Report Occasional Paper #40, Center for Cog. Sci., Massachusetts Institute of Technology, 1990.

3
D. Nguyen and B. Widrow.
The truck backer-upper: An example of self learning in neural networks.
In IEEE/INNS International Joint Conference on Neural Networks, Washington, D.C., volume 1, pages 357-364, 1989.

4
J. H. Schmidhuber.
Dynamische neuronale Netze und das fundamentale raumzeitliche Lernproblem.
Dissertation, Institut für Informatik, Technische Universität München, 1990.

5
J. H. Schmidhuber.
Talk at the NIPS'90 workshop on dynamic networks led by R. Rohwer, 1990.

6
J. H. Schmidhuber.
Adaptive curiosity and adaptive confidence.
Technical Report FKI-149-91, Institut für Informatik, Technische Universität München, April 1991.

7
J. H. Schmidhuber.
Adaptive decomposition of time.
In T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, editors, Artificial Neural Networks, pages 909-914. Elsevier Science Publishers B.V., North-Holland, 1991.

8
J. H. Schmidhuber.
A possibility for implementing curiosity and boredom in model-building neural controllers.
In J. A. Meyer and S. W. Wilson, editors, Proc. of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats, pages 222-227. MIT Press/Bradford Books, 1991.

9
J. H. Schmidhuber.
Reinforcement learning in Markovian and non-Markovian environments.
In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems 3, pages 500-506. San Mateo, CA: Morgan Kaufmann, 1991.

10
J. H. Schmidhuber and R. Huber.
Learning to generate artificial fovea trajectories for target detection.
International Journal of Neural Systems, 2(1 & 2):135-141, 1991.

11
R. S. Sutton.
First results with DYNA, an integrated architecture for learning, planning and reacting.
In Proceedings of the AAAI Spring Symposium on Planning in Uncertain, Unpredictable, or Changing Environments, 1990.

12
S. Thrun and K. Möller.
On planning and exploration in non-discrete environments.
Technical report, Gesellschaft für Mathematik und Datenverarbeitung, D-5205 St. Augustin, Germany, March 1991.

13
C. Watkins.
Learning from Delayed Rewards.
PhD thesis, King's College, Cambridge, 1989.

14
P. J. Werbos.
Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences.
PhD thesis, Harvard University, 1974.

15
P. J. Werbos.
Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research.
IEEE Transactions on Systems, Man, and Cybernetics, 17, 1987.

16
R. J. Williams.
Toward a theory of reinforcement-learning connectionist systems.
Technical Report NU-CCS-88-3, College of Comp. Sci., Northeastern University, Boston, MA, 1988.


Juergen Schmidhuber 2003-02-28