
Bibliography

Andre, D. 1998.
Learning hierarchical behaviors.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Banzhaf, W., Nordin, P., Keller, R. E., Francone, F. D. 1998.
Genetic Programming - An Introduction.
Morgan Kaufmann Publishers, San Francisco, CA, USA.

Barto, A. G., Sutton, R. S., Anderson, C. W. 1983.
Neuronlike adaptive elements that can solve difficult learning control problems.
IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 834-846.

Baum, E. B. & Durdanovic, I. 1998.
Toward code evolution by artificial economies.
Technical report, NEC Research Institute, Princeton, NJ.
Extension of a paper in Proc. 13th ICML'1996, Morgan Kaufmann, CA.

Bellman, R. 1961.
Adaptive Control Processes.
Princeton University Press.

Bertsekas, D. P. & Tsitsiklis, J. N. 1996.
Neuro-Dynamic Programming.
Athena Scientific, Belmont, MA.

Bowling, M. & Veloso, M. 1998.
Bounding the suboptimality of reusing subproblems.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Chaitin, G. 1969.
On the length of programs for computing finite binary sequences: statistical considerations.
Journal of the ACM, 16, 145-159.

Coelho, J. & Grupen, R. A. 1998.
Control abstractions as state representation.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Cohn, D. A. 1994.
Neural network exploration using optimal experiment design.
In Cowan, J., Tesauro, G., Alspector, J., Advances in Neural Information Processing Systems 6, 679-686. San Mateo, CA: Morgan Kaufmann.

Cramer, N. L. 1985.
A representation for the adaptive generation of simple sequential programs.
In Grefenstette, J., Proceedings of an International Conference on Genetic Algorithms and Their Applications, Hillsdale, NJ. Lawrence Erlbaum Associates.

Dayan, P. & Hinton, G. 1993.
Feudal reinforcement learning.
In Hanson, S. J., Cowan, J. D., Giles, C. L., Advances in Neural Information Processing Systems 5, 271-278. San Mateo, CA: Morgan Kaufmann.

Dayan, P. & Sejnowski, T. J. 1996.
Exploration bonuses and dual control.
Machine Learning, 25, 5-22.

Dickmanns, D., Schmidhuber, J., Winklhofer, A. 1987.
Der genetische Algorithmus: Eine Implementierung in Prolog. Fortgeschrittenenpraktikum, Institut für Informatik, Lehrstuhl Prof. Radig, Technische Universität München.

Digney, B. 1996.
Emergent hierarchical control structures: Learning reactive/hierarchical relationships in reinforcement environments.
In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., Wilson, S. W., From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, 363-372. MIT Press, Bradford Books.

Eldracher, M. & Baginski, B. 1993.
Neural subgoal generation using backpropagation.
In Lendaris, G. G., Grossberg, S., Kosko, B., World Congress on Neural Networks, III-145-III-148. Lawrence Erlbaum Associates, Inc., Publishers, Hillsdale.

Fedorov, V. V. 1972.
Theory of Optimal Experiments.
Academic Press.

Gittins, J. C. 1989.
Multi-armed Bandit Allocation Indices.
Wiley-Interscience series in systems and optimization. Wiley, Chichester, NY.

Harada, D. & Russell, S. 1998.
Meta-level reinforcement learning.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Hochreiter, S. & Schmidhuber, J. 1997.
LSTM can solve hard long time lag problems.
In Mozer, M. C., Jordan, M. I., Petsche, T., Advances in Neural Information Processing Systems 9, 473-479. MIT Press, Cambridge MA.

Holland, J. H. 1975.
Adaptation in Natural and Artificial Systems.
University of Michigan Press, Ann Arbor.

Holland, J. H. 1985.
Properties of the bucket brigade.
In Proceedings of an International Conference on Genetic Algorithms. Hillsdale, NJ.

Huber, M. & Grupen, R. A. 1998.
Learning robot control using control policies as abstract actions.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Humphrys, M. 1996.
Action selection methods using reinforcement learning.
In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., Wilson, S. W., From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, 135-144. MIT Press, Bradford Books.

Hwang, J., Choi, J., Oh, S., Marks II, R. J. 1991.
Query-based learning applied to partially trained multilayer perceptrons.
IEEE Transactions on Neural Networks, 2(1), 131-136.

Jaakkola, T., Singh, S. P., Jordan, M. I. 1995.
Reinforcement learning algorithm for partially observable Markov decision problems.
In Tesauro, G., Touretzky, D. S., Leen, T. K., Advances in Neural Information Processing Systems 7, 345-352. MIT Press, Cambridge MA.

Juels, A. & Wattenberg, M. 1996.
Stochastic hillclimbing as a baseline method for evaluating genetic algorithms.
In Touretzky, D. S., Mozer, M. C., Hasselmo, M. E., Advances in Neural Information Processing Systems 8, 430-436. The MIT Press, Cambridge, MA.

Kaelbling, L. 1993.
Learning in Embedded Systems.
MIT Press.

Kaelbling, L., Littman, M., Cassandra, A. 1995.
Planning and acting in partially observable stochastic domains.
Technical report, Brown University, Providence, RI.

Kearns, M. & Singh, S. 1999.
Finite-sample convergence rates for Q-learning and indirect algorithms.
In Kearns, M., Solla, S. A., Cohn, D., Advances in Neural Information Processing Systems 11. MIT Press, Cambridge MA.

Kirchner, F. 1998.
Q-learning of complex behaviors on a six-legged walking machine.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Koenig, S. & Simmons, R. G. 1996.
The effect of representation and knowledge on goal-directed exploration with reinforcement learning algorithms.
Machine Learning, 22, 228-250.

Kolmogorov, A. 1965.
Three approaches to the quantitative definition of information.
Problems of Information Transmission, 1, 1-11.

Koumoutsakos, P., Freund, J., Parekh, D. 1998.
Evolution strategies for parameter optimization in jet flow control.
Center for Turbulence Research - Proceedings of the Summer Program 1998, 121-132.

Lenat, D. 1983.
Theory formation by heuristic search.
Artificial Intelligence, 21.

Levin, L. A. 1973.
Universal sequential search problems.
Problems of Information Transmission, 9(3), 265-266.

Levin, L. A. 1984.
Randomness conservation inequalities: Information and independence in mathematical theories.
Information and Control, 61, 15-37.

Li, M. & Vitányi, P. M. B. 1993.
An Introduction to Kolmogorov Complexity and its Applications.
Springer.

Lin, L. 1993.
Reinforcement Learning for Robots Using Neural Networks.
Ph.D. thesis, Carnegie Mellon University, Pittsburgh.

Littman, M. 1996.
Algorithms for Sequential Decision Making.
Ph.D. thesis, Brown University.

Littman, M., Cassandra, A., Kaelbling, L. 1995.
Learning policies for partially observable environments: Scaling up.
In Prieditis, A. & Russell, S., Machine Learning: Proceedings of the Twelfth International Conference, 362-370. Morgan Kaufmann Publishers, San Francisco, CA.

MacKay, D. J. C. 1992.
Information-based objective functions for active data selection.
Neural Computation, 4(4), 590-604.

McCallum, R. A. 1996.
Learning to use selective attention and short-term memory in sequential tasks.
In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., Wilson, S. W., From Animals to Animats 4: Proceedings of the Fourth International Conference on Simulation of Adaptive Behavior, Cambridge, MA, 315-324. MIT Press, Bradford Books.

McGovern, A. 1998.
acquire-macros: An algorithm for automatically learning macro-actions.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Moore, A. & Atkeson, C. G. 1993.
Prioritized sweeping: Reinforcement learning with less data and less time.
Machine Learning, 13, 103-130.

Moore, A. W., Baird, L., Kaelbling, L. P. 1998.
Multi-value-functions: Efficient automatic action hierarchies for multiple goal MDPs.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Plutowski, M., Cottrell, G., White, H. 1994.
Learning Mackey-Glass from 25 examples, plus or minus 2.
In Cowan, J., Tesauro, G., Alspector, J., Advances in Neural Information Processing Systems 6, 1135-1142. San Mateo, CA: Morgan Kaufmann.

Ray, T. S. 1992.
An approach to the synthesis of life.
In Langton, C., Taylor, C., Farmer, J. D., Rasmussen, S., Artificial Life II, 371-408. Addison Wesley Publishing Company.

Rechenberg, I. 1971.
Evolutionsstrategie - Optimierung technischer Systeme nach Prinzipien der biologischen Evolution. Dissertation.
Published 1973 by Frommann-Holzboog.

Ring, M. B. 1991.
Incremental development of complex behaviors through automatic construction of sensory-motor hierarchies.
In Birnbaum, L. & Collins, G., Machine Learning: Proceedings of the Eighth International Workshop, 343-347. Morgan Kaufmann.

Ring, M. B. 1993.
Learning sequential tasks by incrementally adding higher orders.
In Hanson, S. J., Cowan, J. D., Giles, C. L., Advances in Neural Information Processing Systems 5, 115-122. Morgan Kaufmann.

Ring, M. B. 1994.
Continual Learning in Reinforcement Environments.
Ph.D. thesis, University of Texas at Austin, Austin, Texas 78712.

Sałustowicz, R. P. & Schmidhuber, J. 1997.
Probabilistic incremental program evolution.
Evolutionary Computation, 5(2), 123-141.

Samuel, A. L. 1959.
Some studies in machine learning using the game of checkers.
IBM Journal of Research and Development, 3, 210-229.

Schmidhuber, J. 1987.
Evolutionary principles in self-referential learning, or on learning how to learn: the meta-meta-... hook. Diploma thesis, Institut für Informatik, Technische Universität München.

Schmidhuber, J. 1989.
A local learning algorithm for dynamic feedforward and recurrent networks.
Connection Science, 1(4), 403-412.

Schmidhuber, J. 1991a.
Curious model-building control systems.
In Proc. International Joint Conference on Neural Networks, Singapore, 2, 1458-1463. IEEE.

Schmidhuber, J. 1991b.
Learning to generate sub-goals for action sequences.
In Kohonen, T., Mäkisara, K., Simula, O., Kangas, J., Artificial Neural Networks, 967-972. Elsevier Science Publishers B.V., North-Holland.

Schmidhuber, J. 1991c.
Reinforcement learning in Markovian and non-Markovian environments.
In Lippmann, R. P., Moody, J. E., Touretzky, D. S., Advances in Neural Information Processing Systems 3, 500-506. San Mateo, CA: Morgan Kaufmann.

Schmidhuber, J. 1995.
Discovering solutions with low Kolmogorov complexity and high generalization capability.
In Prieditis, A. & Russell, S., Machine Learning: Proceedings of the Twelfth International Conference, 488-496. Morgan Kaufmann Publishers, San Francisco, CA.

Schmidhuber, J. 1997.
Discovering neural nets with low Kolmogorov complexity and high generalization capability.
Neural Networks, 10(5), 857-873.

Schmidhuber, J. 1999.
Artificial curiosity based on discovering novel algorithmic predictability through coevolution.
In Angeline, P., Michalewicz, Z., Schoenauer, M., Yao, X., Zalzala, A., Congress on Evolutionary Computation, 1612-1618. IEEE Press, Piscataway, NJ.

Schmidhuber, J. & Prelinger, D. 1993.
Discovering predictable classifications.
Neural Computation, 5(4), 625-635.

Schmidhuber, J. & Zhao, J. 1999.
Direct policy search and uncertain policy evaluation.
In AAAI Spring Symposium on Search under Uncertain and Incomplete Information, Stanford Univ., 119-124. American Association for Artificial Intelligence, Menlo Park, CA.

Schmidhuber, J., Zhao, J., Schraudolph, N. 1997a.
Reinforcement learning with self-modifying policies.
In Thrun, S. & Pratt, L., Learning to Learn, 293-309. Kluwer.

Schmidhuber, J., Zhao, J., Wiering, M. 1997b.
Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement.
Machine Learning, 28, 105-130.

Schwefel, H. P. 1974.
Numerische Optimierung von Computer-Modellen. Dissertation.
Published 1977 by Birkhäuser, Basel.

Schwefel, H. P. 1995.
Evolution and Optimum Seeking.
Wiley Interscience.

Shannon, C. E. 1948.
A mathematical theory of communication (parts I and II).
Bell System Technical Journal, XXVII, 379-423.

Singh, S. 1992.
The efficient learning of multiple task sequences.
In Moody, J., Hanson, S., Lippmann, R., Advances in Neural Information Processing Systems 4, 251-258. San Mateo, CA: Morgan Kaufmann.

Solomonoff, R. 1964.
A formal theory of inductive inference. Part I.
Information and Control, 7, 1-22.

Solomonoff, R. 1986.
An application of algorithmic probability to problems in artificial intelligence.
In Kanal, L. N. & Lemmer, J. F., Uncertainty in Artificial Intelligence, 473-491. Elsevier Science Publishers.

Storck, J., Hochreiter, S., Schmidhuber, J. 1995.
Reinforcement driven information acquisition in non-deterministic environments.
In Proceedings of the International Conference on Artificial Neural Networks, Paris, 2, 159-164. EC2 & Cie, Paris.

Sun, R. & Sessions, C. 2000.
Self-segmentation of sequences: automatic formation of hierarchies of sequential behaviors.
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 30(3).

Sutton, R. S. 1988.
Learning to predict by the methods of temporal differences.
Machine Learning, 3, 9-44.

Sutton, R. S. 1995.
TD models: Modeling the world at a mixture of time scales.
In Prieditis, A. & Russell, S., Machine Learning: Proceedings of the Twelfth International Conference, 531-539. Morgan Kaufmann Publishers, San Francisco, CA.

Sutton, R. S. & Pinette, B. 1985.
The learning of world models by connectionist networks.
In Proceedings of the 7th Annual Conference of the Cognitive Science Society, 54-64.

Sutton, R. S., Singh, S., Precup, D., Ravindran, B. 1999.
Improved switching among temporally abstract actions.
In Advances in Neural Information Processing Systems 11. MIT Press.
To appear.

Teller, A. 1994.
The evolution of mental models.
In Kinnear, Jr., K. E., Advances in Genetic Programming, 199-219. MIT Press.

Tesauro, G. 1994.
TD-Gammon, a self-teaching backgammon program, achieves master-level play.
Neural Computation, 6(2), 215-219.

Tham, C. 1995.
Reinforcement learning of multiple tasks using a hierarchical CMAC architecture.
Robotics and Autonomous Systems, 15(4), 247-274.

Thrun, S. & Möller, K. 1992.
Active exploration in dynamic environments.
In Moody, J. E., Hanson, S. J., Lippmann, R. P., Advances in Neural Information Processing Systems 4, 531-538. San Mateo, CA: Morgan Kaufmann.

Wang, G. & Mahadevan, S. 1998.
A greedy divide-and-conquer approach to optimizing large manufacturing systems using reinforcement learning.
In NIPS'98 Workshop on Abstraction and Hierarchy in Reinforcement Learning.

Watkins, C. J. C. H. & Dayan, P. 1992.
Q-learning.
Machine Learning, 8, 279-292.

Watkins, C. 1989.
Learning from Delayed Rewards.
Ph.D. thesis, King's College, Cambridge.

Weiss, G. 1994.
Hierarchical chunking in classifier systems.
In Proceedings of the 12th National Conference on Artificial Intelligence, 2, 1335-1340. AAAI Press/The MIT Press.

Weiss, G. & Sen, S. 1996.
Adaption and Learning in Multi-Agent Systems.
LNAI 1042, Springer.

Wiering, M. & Schmidhuber, J. 1996.
Solving POMDPs with Levin search and EIRA.
In Saitta, L., Machine Learning: Proceedings of the Thirteenth International Conference, 534-542. Morgan Kaufmann Publishers, San Francisco, CA.

Wiering, M. & Schmidhuber, J. 1998.
HQ-learning.
Adaptive Behavior, 6(2), 219-246.

Williams, R. J. 1992.
Simple statistical gradient-following algorithms for connectionist reinforcement learning.
Machine Learning, 8, 229-256.

Wilson, S. 1994.
ZCS: A zeroth level classifier system.
Evolutionary Computation, 2, 1-18.

Wilson, S. 1995.
Classifier fitness based on accuracy.
Evolutionary Computation, 3(2), 149-175.

Wolpert, D. H., Tumer, K., Frank, J. 1999.
Using collective intelligence to route Internet traffic.
In Kearns, M., Solla, S. A., Cohn, D., Advances in Neural Information Processing Systems 11. MIT Press, Cambridge MA.


