
Bibliography

Andre, D. 1998.
Learning hierarchical behaviors.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Banzhaf, W., Nordin, P., Keller, R. E., Francone, F. D. 1998.
Genetic Programming - An Introduction.
Morgan Kaufmann Publishers, San Francisco, CA, USA.

Barto, A. G., Sutton, R. S., Anderson, C. W. 1983.
Neuronlike adaptive elements that can solve difficult learning control problems.
IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 834-846.

Baum, E. B. & Durdanovic, I. 1998.
Toward code evolution by artificial economies.
Technical report, NEC Research Institute, Princeton, NJ.
Extension of a paper in Proc. 13th ICML'1996, Morgan Kaufmann, CA.

Bellman, R. 1961.
Adaptive Control Processes.
Princeton University Press.

Bertsekas, D. P. & Tsitsiklis, J. N. 1996.
Neuro-Dynamic Programming.
Athena Scientific, Belmont, MA.

Bowling, M. & Veloso, M. 1998.
Bounding the suboptimality of reusing subproblems.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Chaitin, G. 1969.
On the length of programs for computing finite binary sequences: statistical considerations.
Journal of the ACM, 16, 145-159.

Coelho, J. & Grupen, R. A. 1998.
Control abstractions as state representation.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Cohn, D. A. 1994.
Neural network exploration using optimal experiment design.
In Cowan, J., Tesauro, G., Alspector, J., Advances in Neural Information Processing Systems 6, 679-686. San Mateo, CA: Morgan Kaufmann.

Cramer, N. L. 1985.
A representation for the adaptive generation of simple sequential programs.
In Grefenstette, J., Proceedings of an International Conference on Genetic Algorithms and Their Applications, Hillsdale, NJ. Lawrence Erlbaum Associates.

Dayan, P. & Hinton, G. 1993.
Feudal reinforcement learning.
In Hanson, S. J., Cowan, J. D., Giles, C. L., Advances in Neural Information Processing Systems 5, 271-278. San Mateo, CA: Morgan Kaufmann.

Dayan, P. & Sejnowski, T. J. 1996.
Exploration bonuses and dual control.
Machine Learning, 25, 5-22.

Dickmanns, D., Schmidhuber, J., Winklhofer, A. 1987.
Der genetische Algorithmus: Eine Implementierung in Prolog. Fortgeschrittenenpraktikum, Institut für Informatik, Lehrstuhl Prof. Radig, Technische Universität München.

Digney, B. 1996.
Emergent hierarchical control structures: Learning reactive/hierarchical relationships in reinforcement environments.
In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., Wilson, S. W., From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, 363-372. MIT Press, Bradford Books.

Eldracher, M. & Baginski, B. 1993.
Neural subgoal generation using backpropagation.
In Lendaris, G. G., Grossberg, S., Kosko, B., World Congress on Neural Networks, III-145-III-148. Lawrence Erlbaum Associates, Inc., Publishers, Hillsdale.

Fedorov, V. V. 1972.
Theory of Optimal Experiments.
Academic Press.

Gittins, J. C. 1989.
Multi-armed Bandit Allocation Indices.
Wiley-Interscience series in systems and optimization. Wiley, Chichester, NY.

Harada, D. & Russell, S. 1998.
Meta-level reinforcement learning.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Hochreiter, S. & Schmidhuber, J. 1997.
LSTM can solve hard long time lag problems.
In Mozer, M. C., Jordan, M. I., Petsche, T., Advances in Neural Information Processing Systems 9, 473-479. MIT Press, Cambridge MA.

Holland, J. H. 1975.
Adaptation in Natural and Artificial Systems.
University of Michigan Press, Ann Arbor.

Holland, J. H. 1985.
Properties of the bucket brigade.
In Proceedings of an International Conference on Genetic Algorithms. Hillsdale, NJ.

Huber, M. & Grupen, R. A. 1998.
Learning robot control using control policies as abstract actions.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Humphrys, M. 1996.
Action selection methods using reinforcement learning.
In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., Wilson, S. W., From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, 135-144. MIT Press, Bradford Books.

Hwang, J., Choi, J., Oh, S., Marks II, R. J. 1991.
Query-based learning applied to partially trained multilayer perceptrons.
IEEE Transactions on Neural Networks, 2(1), 131-136.

Jaakkola, T., Singh, S. P., Jordan, M. I. 1995.
Reinforcement learning algorithm for partially observable Markov decision problems.
In Tesauro, G., Touretzky, D. S., Leen, T. K., Advances in Neural Information Processing Systems 7, 345-352. MIT Press, Cambridge MA.

Juels, A. & Wattenberg, M. 1996.
Stochastic hillclimbing as a baseline method for evaluating genetic algorithms.
In Touretzky, D. S., Mozer, M. C., Hasselmo, M. E., Advances in Neural Information Processing Systems 8, 430-436. The MIT Press, Cambridge, MA.

Kaelbling, L. 1993.
Learning in Embedded Systems.
MIT Press.

Kaelbling, L., Littman, M., Cassandra, A. 1995.
Planning and acting in partially observable stochastic domains.
Technical report, Brown University, Providence, RI.

Kearns, M. & Singh, S. 1999.
Finite-sample convergence rates for Q-learning and indirect algorithms.
In Kearns, M., Solla, S. A., Cohn, D., Advances in Neural Information Processing Systems 11. MIT Press, Cambridge MA.

Kirchner, F. 1998.
Q-learning of complex behaviors on a six-legged walking machine.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Koenig, S. & Simmons, R. G. 1996.
The effect of representation and knowledge on goal-directed exploration with reinforcement learning algorithms.
Machine Learning, 22, 228-250.

Kolmogorov, A. 1965.
Three approaches to the quantitative definition of information.
Problems of Information Transmission, 1, 1-11.

Koumoutsakos, P., Freund, J., Parekh, D. 1998.
Evolution strategies for parameter optimization in jet flow control.
Center for Turbulence Research - Proceedings of the Summer Program 1998, 121-132.

Lenat, D. 1983.
Theory formation by heuristic search.
Artificial Intelligence, 21.

Levin, L. A. 1973.
Universal sequential search problems.
Problems of Information Transmission, 9(3), 265-266.

Levin, L. A. 1984.
Randomness conservation inequalities: Information and independence in mathematical theories.
Information and Control, 61, 15-37.

Li, M. & Vitányi, P. M. B. 1993.
An Introduction to Kolmogorov Complexity and its Applications.
Springer.

Lin, L. 1993.
Reinforcement Learning for Robots Using Neural Networks.
Ph.D. thesis, Carnegie Mellon University, Pittsburgh.

Littman, M. 1996.
Algorithms for Sequential Decision Making.
Ph.D. thesis, Brown University.

Littman, M., Cassandra, A., Kaelbling, L. 1995.
Learning policies for partially observable environments: Scaling up.
In Prieditis, A. & Russell, S., Machine Learning: Proceedings of the Twelfth International Conference, 362-370. Morgan Kaufmann Publishers, San Francisco, CA.

MacKay, D. J. C. 1992.
Information-based objective functions for active data selection.
Neural Computation, 4(4), 590-604.

McCallum, R. A. 1996.
Learning to use selective attention and short-term memory in sequential tasks.
In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., Wilson, S. W., From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, 315-324. MIT Press, Bradford Books.

McGovern, A. 1998.
acquire-macros: An algorithm for automatically learning macro-actions.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Moore, A. & Atkeson, C. G. 1993.
Prioritized sweeping: Reinforcement learning with less data and less time.
Machine Learning, 13, 103-130.

Moore, A. W., Baird, L., Kaelbling, L. P. 1998.
Multi-value-functions: Efficient automatic action hierarchies for multiple goal MDPs.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Plutowski, M., Cottrell, G., White, H. 1994.
Learning Mackey-Glass from 25 examples, plus or minus 2.
In Cowan, J., Tesauro, G., Alspector, J., Advances in Neural Information Processing Systems 6, 1135-1142. San Mateo, CA: Morgan Kaufmann.

Ray, T. S. 1992.
An approach to the synthesis of life.
In Langton, C., Taylor, C., Farmer, J. D., Rasmussen, S., Artificial Life II, 371-408. Addison Wesley Publishing Company.

Rechenberg, I. 1971.
Evolutionsstrategie - Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Dissertation.
Published 1973 by Frommann-Holzboog.

Ring, M. B. 1991.
Incremental development of complex behaviors through automatic construction of sensory-motor hierarchies.
In Birnbaum, L. & Collins, G., Machine Learning: Proceedings of the Eighth International Workshop, 343-347. Morgan Kaufmann.

Ring, M. B. 1993.
Learning sequential tasks by incrementally adding higher orders.
In Hanson, S. J., Cowan, J. D., Giles, C. L., Advances in Neural Information Processing Systems 5, 115-122. Morgan Kaufmann.

Ring, M. B. 1994.
Continual Learning in Reinforcement Environments.
Ph.D. thesis, University of Texas at Austin, Austin, Texas 78712.

Sałustowicz, R. P. & Schmidhuber, J. 1997.
Probabilistic incremental program evolution.
Evolutionary Computation, 5(2), 123-141.

Samuel, A. L. 1959.
Some studies in machine learning using the game of checkers.
IBM Journal of Research and Development, 3, 210-229.

Schmidhuber, J. 1987.
Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook. Diploma thesis, Institut für Informatik, Technische Universität München.

Schmidhuber, J. 1989.
A local learning algorithm for dynamic feedforward and recurrent networks.
Connection Science, 1(4), 403-412.

Schmidhuber, J. 1991a.
Curious model-building control systems.
In Proc. International Joint Conference on Neural Networks, Singapore, 2, 1458-1463. IEEE.

Schmidhuber, J. 1991b.
Learning to generate sub-goals for action sequences.
In Kohonen, T., Mäkisara, K., Simula, O., Kangas, J., Artificial Neural Networks, 967-972. Elsevier Science Publishers B.V., North-Holland.

Schmidhuber, J. 1991c.
Reinforcement learning in Markovian and non-Markovian environments.
In Lippmann, R. P., Moody, J. E., Touretzky, D. S., Advances in Neural Information Processing Systems 3, 500-506. San Mateo, CA: Morgan Kaufmann.

Schmidhuber, J. 1995.
Discovering solutions with low Kolmogorov complexity and high generalization capability.
In Prieditis, A. & Russell, S., Machine Learning: Proceedings of the Twelfth International Conference, 488-496. Morgan Kaufmann Publishers, San Francisco, CA.

Schmidhuber, J. 1997.
Discovering neural nets with low Kolmogorov complexity and high generalization capability.
Neural Networks, 10(5), 857-873.

Schmidhuber, J. 1999.
Artificial curiosity based on discovering novel algorithmic predictability through coevolution.
In Angeline, P., Michalewicz, Z., Schoenauer, M., Yao, X., Zalzala, A., Congress on Evolutionary Computation, 1612-1618. IEEE Press, Piscataway, NJ.

Schmidhuber, J. & Prelinger, D. 1993.
Discovering predictable classifications.
Neural Computation, 5(4), 625-635.

Schmidhuber, J. & Zhao, J. 1999.
Direct policy search and uncertain policy evaluation.
In AAAI Spring Symposium on Search under Uncertain and Incomplete Information, Stanford Univ., 119-124. American Association for Artificial Intelligence, Menlo Park, CA.

Schmidhuber, J., Zhao, J., Schraudolph, N. 1997a.
Reinforcement learning with self-modifying policies.
In Thrun, S. & Pratt, L., Learning to Learn, 293-309. Kluwer.

Schmidhuber, J., Zhao, J., Wiering, M. 1997b.
Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement.
Machine Learning, 28, 105-130.

Schwefel, H. P. 1974.
Numerische Optimierung von Computer-Modellen. Dissertation.
Published 1977 by Birkhäuser, Basel.

Schwefel, H. P. 1995.
Evolution and Optimum Seeking.
Wiley Interscience.

Shannon, C. E. 1948.
A mathematical theory of communication (parts I and II).
Bell System Technical Journal, XXVII, 379-423.

Singh, S. 1992.
The efficient learning of multiple task sequences.
In Moody, J., Hanson, S., Lippmann, R., Advances in Neural Information Processing Systems 4, 251-258. San Mateo, CA: Morgan Kaufmann.

Solomonoff, R. 1964.
A formal theory of inductive inference. Part I.
Information and Control, 7, 1-22.

Solomonoff, R. 1986.
An application of algorithmic probability to problems in artificial intelligence.
In Kanal, L. N. & Lemmer, J. F., Uncertainty in Artificial Intelligence, 473-491. Elsevier Science Publishers.

Storck, J., Hochreiter, S., Schmidhuber, J. 1995.
Reinforcement driven information acquisition in non-deterministic environments.
In Proceedings of the International Conference on Artificial Neural Networks, Paris, 2, 159-164. EC2 & Cie, Paris.

Sun, R. & Sessions, C. 2000.
Self-segmentation of sequences: automatic formation of hierarchies of sequential behaviors.
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 30(3).

Sutton, R. S. 1988.
Learning to predict by the methods of temporal differences.
Machine Learning, 3, 9-44.

Sutton, R. S. 1995.
TD models: Modeling the world at a mixture of time scales.
In Prieditis, A. & Russell, S., Machine Learning: Proceedings of the Twelfth International Conference, 531-539. Morgan Kaufmann Publishers, San Francisco, CA.

Sutton, R. S. & Pinette, B. 1985.
The learning of world models by connectionist networks.
In Proceedings of the 7th Annual Conference of the Cognitive Science Society, 54-64.

Sutton, R. S., Singh, S., Precup, D., Ravindran, B. 1999.
Improved switching among temporally abstract actions.
In Advances in Neural Information Processing Systems 11. MIT Press.
To appear.

Teller, A. 1994.
The evolution of mental models.
In Kinnear, Jr., K. E., Advances in Genetic Programming, 199-219. MIT Press.

Tesauro, G. 1994.
TD-Gammon, a self-teaching backgammon program, achieves master-level play.
Neural Computation, 6(2), 215-219.

Tham, C. 1995.
Reinforcement learning of multiple tasks using a hierarchical CMAC architecture.
Robotics and Autonomous Systems, 15(4), 247-274.

Thrun, S. & Möller, K. 1992.
Active exploration in dynamic environments.
In Moody, J. E., Hanson, S. J., Lippmann, R. P., Advances in Neural Information Processing Systems 4, 531-538. San Mateo, CA: Morgan Kaufmann.

Wang, G. & Mahadevan, S. 1998.
A greedy divide-and-conquer approach to optimizing large manufacturing systems using reinforcement learning.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Watkins, C. J. C. H. & Dayan, P. 1992.
Q-learning.
Machine Learning, 8, 279-292.

Watkins, C. 1989.
Learning from Delayed Rewards.
Ph.D. thesis, King's College, Cambridge.

Weiss, G. 1994.
Hierarchical chunking in classifier systems.
In Proceedings of the 12th National Conference on Artificial Intelligence, 2, 1335-1340. AAAI Press/The MIT Press.

Weiss, G. & Sen, S. 1996.
Adaption and Learning in Multi-Agent Systems.
LNAI 1042, Springer.

Wiering, M. & Schmidhuber, J. 1996.
Solving POMDPs with Levin search and EIRA.
In Saitta, L., Machine Learning: Proceedings of the Thirteenth International Conference, 534-542. Morgan Kaufmann Publishers, San Francisco, CA.

Wiering, M. & Schmidhuber, J. 1998.
HQ-learning.
Adaptive Behavior, 6(2), 219-246.

Williams, R. J. 1992.
Simple statistical gradient-following algorithms for connectionist reinforcement learning.
Machine Learning, 8, 229-256.

Wilson, S. 1994.
ZCS: A zeroth level classifier system.
Evolutionary Computation, 2, 1-18.

Wilson, S. 1995.
Classifier fitness based on accuracy.
Evolutionary Computation, 3(2), 149-175.

Wolpert, D. H., Tumer, K., Frank, J. 1999.
Using collective intelligence to route Internet traffic.
In Kearns, M., Solla, S. A., Cohn, D., Advances in Neural Information Processing Systems 11. MIT Press, Cambridge MA.


