- AndreAndre1998
-
Andre, D. 1998.
Learning hierarchical behaviors
In NIPS'98 Workshop on Abstraction and Hierarchy in
Reinforcement Learning.
- Banzhaf, Nordin, Keller, FranconeBanzhaf
et al.1998
-
Banzhaf, W., Nordin, P., Keller, R. E., Francone, F. D. 1998.
Genetic Programming - An Introduction.
Morgan Kaufmann Publishers, San Francisco, CA, USA.
- Barto, Sutton, AndersonBarto
et al.1983
-
Barto, A. G., Sutton, R. S., Anderson, C. W. 1983.
Neuronlike adaptive elements that can solve difficult learning
control problems
IEEE Transactions on Systems, Man, and Cybernetics,
SMC-13, 834-846.
- Baum DurdanovicBaum Durdanovic1998
-
Baum, E. B., Durdanovic, I. 1998.
Toward code evolution by artificial economies.
NEC Research Institute, Princeton, NJ.
Extension of a paper in Proc. 13th ICML'1996, Morgan Kaufmann, CA.
- BellmanBellman1961
-
Bellman, R. 1961.
Adaptive Control Processes.
Princeton University Press.
- Bertsekas TsitsiklisBertsekas Tsitsiklis1996
-
Bertsekas, D. P., Tsitsiklis, J. N. 1996.
Neuro-dynamic Programming.
Athena Scientific, Belmont, MA.
- Bowling VelosoBowling Veloso1998
-
Bowling, M., Veloso, M. 1998.
Bounding the suboptimality of reusing subproblems
In NIPS'98 Workshop on Abstraction and Hierarchy in
Reinforcement Learning.
- ChaitinChaitin1969
-
Chaitin, G. 1969.
On the length of programs for computing finite binary
sequences: statistical considerations
Journal of the ACM, 16, 145-159.
- Coelho GrupenCoelho Grupen1998
-
Coelho, J., Grupen, R. A. 1998.
Control abstractions as state representation
In NIPS'98 Workshop on Abstraction and Hierarchy in
Reinforcement Learning.
- CohnCohn1994
-
Cohn, D. A. 1994.
Neural network exploration using optimal experiment
design
In Cowan, J., Tesauro, G., Alspector, J., Advances
in Neural Information Processing Systems 6, 679-686. San Mateo, CA:
Morgan Kaufmann.
- CramerCramer1985
-
Cramer, N. L. 1985.
A representation for the adaptive generation of simple
sequential programs
In Grefenstette, J., Proceedings of an International
Conference on Genetic Algorithms and Their Applications. Hillsdale, NJ:
Lawrence Erlbaum Associates.
- Dayan HintonDayan Hinton1993
-
Dayan, P., Hinton, G. 1993.
Feudal reinforcement learning
In Lippman, D. S., Moody, J. E., Touretzky, D. S.,
Advances in Neural Information Processing Systems 5, 271-278. San
Mateo, CA: Morgan Kaufmann.
- Dayan SejnowskiDayan Sejnowski1996
-
Dayan, P., Sejnowski, T. J. 1996.
Exploration bonuses and dual control
Machine Learning, 25, 5-22.
- Dickmanns, Schmidhuber, WinklhoferDickmanns
et al.1987
-
Dickmanns, D., Schmidhuber, J., Winklhofer, A. 1987.
Der genetische Algorithmus: Eine Implementierung in Prolog.
Fortgeschrittenenpraktikum, Institut für Informatik, Lehrstuhl Prof.
Radig, Technische Universität München.
- DigneyDigney1996
-
Digney, B. 1996.
Emergent hierarchical control structures: Learning
reactive/hierarchical relationships in reinforcement environments
In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., Wilson,
S. W., From Animals to Animats 4: Proceedings of the Fourth
International Conference on Simulation of Adaptive Behavior, Cambridge, MA,
363-372. MIT Press, Bradford Books.
- Eldracher BaginskiEldracher Baginski1993
-
Eldracher, M., Baginski, B. 1993.
Neural subgoal generation using backpropagation
In Lendaris, G. G., Grossberg, S., Kosko, B., World
Congress on Neural Networks, III-145-III-148. Lawrence Erlbaum
Associates, Inc., Publishers, Hillsdale.
- FedorovFedorov1972
-
Fedorov, V. V. 1972.
Theory of optimal experiments.
Academic Press.
- GittinsGittins1989
-
Gittins, J. C. 1989.
Multi-armed Bandit Allocation Indices.
Wiley-Interscience series in systems and optimization. Wiley,
Chichester, NY.
- Harada RussellHarada Russell1998
-
Harada, D., Russell, S. 1998.
Meta-level reinforcement learning
In NIPS'98 Workshop on Abstraction and Hierarchy in
Reinforcement Learning.
- Hochreiter SchmidhuberHochreiter Schmidhuber1997
-
Hochreiter, S., Schmidhuber, J. 1997.
LSTM can solve hard long time lag problems
In Mozer, M. C., Jordan, M. I., Petsche, T.,
Advances in Neural Information Processing Systems 9, 473-479. MIT
Press, Cambridge MA.
- HollandHolland1975
-
Holland, J. H. 1975.
Adaptation in Natural and Artificial Systems.
University of Michigan Press, Ann Arbor.
- HollandHolland1985
-
Holland, J. H. 1985.
Properties of the bucket brigade
In Proceedings of an International Conference on Genetic
Algorithms. Hillsdale, NJ.
- Huber GrupenHuber Grupen1998
-
Huber, M., Grupen, R. A. 1998.
Learning robot control using control policies as abstract
actions
In NIPS'98 Workshop on Abstraction and Hierarchy in
Reinforcement Learning.
- HumphrysHumphrys1996
-
Humphrys, M. 1996.
Action selection methods using reinforcement learning
In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., Wilson,
S. W., From Animals to Animats 4: Proceedings of the Fourth
International Conference on Simulation of Adaptive Behavior, Cambridge, MA,
135-144. MIT Press, Bradford Books.
- Hwang, Choi, Oh, IIHwang
et al.1991
-
Hwang, J., Choi, J., Oh, S., Marks II, R. J. 1991.
Query-based learning applied to partially trained multilayer
perceptrons
IEEE Transactions on Neural Networks, 2(1), 131-136.
- Jaakkola, Singh, JordanJaakkola
et al.1995
-
Jaakkola, T., Singh, S. P., Jordan, M. I. 1995.
Reinforcement learning algorithm for partially observable
Markov decision problems
In Tesauro, G., Touretzky, D. S., Leen, T. K.,
Advances in Neural Information Processing Systems 7, 345-352. MIT
Press, Cambridge MA.
- Juels WattenbergJuels Wattenberg1996
-
Juels, A., Wattenberg, M. 1996.
Stochastic hillclimbing as a baseline method for evaluating
genetic algorithms
In Touretzky, D. S., Mozer, M. C., Hasselmo, M. E.,
Advances in Neural Information Processing Systems, 8,
430-436. The MIT Press, Cambridge, MA.
- KaelblingKaelbling1993
-
Kaelbling, L. 1993.
Learning in Embedded Systems.
MIT Press.
- Kaelbling, Littman, CassandraKaelbling
et al.1995
-
Kaelbling, L., Littman, M., Cassandra, A. 1995.
Planning and acting in partially observable stochastic
domains.
Brown University, Providence, RI.
- Kearns SinghKearns Singh1999
-
Kearns, M., Singh, S. 1999.
Finite-sample convergence rates for Q-learning and indirect
algorithms
In Kearns, M., Solla, S. A., Cohn, D., Advances in
Neural Information Processing Systems 12. MIT Press, Cambridge MA.
- KirchnerKirchner1998
-
Kirchner, F. 1998.
Q-learning of complex behaviors on a six-legged walking
machine
In NIPS'98 Workshop on Abstraction and Hierarchy in
Reinforcement Learning.
- Koenig SimmonsKoenig Simmons1996
-
Koenig, S., Simmons, R. G. 1996.
The effect of representation and knowledge on goal-directed
exploration with reinforcement-learning algorithms
Machine Learning, 22, 228-250.
- KolmogorovKolmogorov1965
-
Kolmogorov, A. 1965.
Three approaches to the quantitative definition of
information
Problems of Information Transmission, 1, 1-11.
- Koumoutsakos P. D.Koumoutsakos P. D.1998
-
Koumoutsakos, P., Freund, J., Parekh, D. 1998.
Evolution strategies for parameter optimization in jet flow
control
Center for Turbulence Research - Proceedings of the Summer
program 1998, 10, 121-132.
- LenatLenat1983
-
Lenat, D. 1983.
Theory formation by heuristic search
Machine Learning, 21.
- LevinLevin1973
-
Levin, L. A. 1973.
Universal sequential search problems
Problems of Information Transmission, 9(3), 265-266.
- LevinLevin1984
-
Levin, L. A. 1984.
Randomness conservation inequalities: Information and
independence in mathematical theories
Information and Control, 61, 15-37.
- Li VitányiLi Vitányi1993
-
Li, M., Vitányi, P. M. B. 1993.
An Introduction to Kolmogorov Complexity and its
Applications.
Springer.
- LinLin1993
-
Lin, L. 1993.
Reinforcement Learning for Robots Using Neural Networks.
Ph.D. thesis, Carnegie Mellon University, Pittsburgh.
- LittmanLittman1996
-
Littman, M. 1996.
Algorithms for Sequential Decision Making.
Ph.D. thesis, Brown University.
- Littman, Cassandra, KaelblingLittman
et al.1995
-
Littman, M., Cassandra, A., Kaelbling, L. 1995.
Learning policies for partially observable environments:
Scaling up
In Prieditis, A., Russell, S., Machine
Learning: Proceedings of the Twelfth International Conference, 362-370. Morgan Kaufmann Publishers, San Francisco, CA.
- MacKayMacKay1992
-
MacKay, D. J. C. 1992.
Information-based objective functions for active data
selection
Neural Computation, 4(2), 550-604.
- McCallumMcCallum1996
-
McCallum, R. A. 1996.
Learning to use selective attention and short-term memory in
sequential tasks
In Maes, P., Mataric, M., Meyer, J.-A., Pollack, J., Wilson,
S. W., From Animals to Animats 4: Proceedings of the Fourth
International Conference on Simulation of Adaptive Behavior, Cambridge, MA,
315-324. MIT Press, Bradford Books.
- McGovernMcGovern1998
-
McGovern, A. 1998.
acquire-macros: An algorithm for automatically learning
macro-actions
In NIPS'98 Workshop on Abstraction and Hierarchy in
Reinforcement Learning.
- Moore AtkesonMoore Atkeson1993
-
Moore, A., Atkeson, C. G. 1993.
Prioritized sweeping: Reinforcement learning with less data and
less time
Machine Learning, 13, 103-130.
- Moore, Baird, KaelblingMoore
et al.1998
-
Moore, A. W., Baird, L., Kaelbling, L. P. 1998.
Multi-value-functions: Efficient automatic action hierarchies
for multiple goal MDPs
In NIPS'98 Workshop on Abstraction and Hierarchy in
Reinforcement Learning.
- Plutowski, Cottrell, WhitePlutowski
et al.1994
-
Plutowski, M., Cottrell, G., White, H. 1994.
Learning Mackey-Glass from 25 examples, plus or minus 2
In Cowan, J., Tesauro, G., Alspector, J., Advances
in Neural Information Processing Systems 6, 1135-1142. San Mateo,
CA: Morgan Kaufmann.
- RayRay1992
-
Ray, T. S. 1992.
An approach to the synthesis of life
In Langton, C., Taylor, C., Farmer, J. D., Rasmussen, S.,
Artificial Life II, 371-408. Addison Wesley Publishing
Company.
- RechenbergRechenberg1971
-
Rechenberg, I. 1971.
Evolutionsstrategie - Optimierung technischer Systeme nach
Prinzipien der biologischen Evolution. Dissertation.
Published 1973 by Frommann-Holzboog.
- RingRing1991
-
Ring, M. B. 1991.
Incremental development of complex behaviors through automatic
construction of sensory-motor hierarchies
In Birnbaum, L., Collins, G., Machine
Learning: Proceedings of the Eighth International Workshop, 343-347.
Morgan Kaufmann.
- RingRing1993
-
Ring, M. B. 1993.
Learning sequential tasks by incrementally adding higher
orders
In Hanson, S. J., Cowan, J. D., Giles, C. L.,
Advances in Neural Information Processing Systems 5, 115-122. Morgan
Kaufmann.
- RingRing1994
-
Ring, M. B. 1994.
Continual Learning in Reinforcement Environments.
Ph.D. thesis, University of Texas at Austin, Austin, Texas 78712.
- Sałustowicz SchmidhuberSałustowicz Schmidhuber1997
-
Sałustowicz, R. P., Schmidhuber, J. 1997.
Probabilistic incremental program evolution
Evolutionary Computation, 5(2), 123-141.
- SamuelSamuel1959
-
Samuel, A. L. 1959.
Some studies in machine learning using the game of
checkers
IBM Journal on Research and Development, 3, 210-229.
- SchmidhuberSchmidhuber1987
-
Schmidhuber, J. 1987.
Evolutionary principles in self-referential learning, or on
learning how to learn: the meta-meta-... hook. Institut für Informatik,
Technische Universität München.
- SchmidhuberSchmidhuber1989
-
Schmidhuber, J. 1989.
A local learning algorithm for dynamic feedforward and
recurrent networks
Connection Science, 1(4), 403-412.
- SchmidhuberSchmidhuber1991a
-
Schmidhuber, J. 1991a.
Curious model-building control systems
In Proc. International Joint Conference on Neural Networks,
Singapore, 2, 1458-1463. IEEE.
- SchmidhuberSchmidhuber1991b
-
Schmidhuber, J. 1991b.
Learning to generate sub-goals for action sequences
In Kohonen, T., Mäkisara, K., Simula, O., Kangas, J.,
Artificial Neural Networks, 967-972. Elsevier Science
Publishers B.V., North-Holland.
- SchmidhuberSchmidhuber1991c
-
Schmidhuber, J. 1991c.
Reinforcement learning in Markovian and non-Markovian
environments
In Lippman, D. S., Moody, J. E., Touretzky, D. S.,
Advances in Neural Information Processing Systems 3, 500-506. San
Mateo, CA: Morgan Kaufmann.
- SchmidhuberSchmidhuber1995
-
Schmidhuber, J. 1995.
Discovering solutions with low Kolmogorov complexity and high
generalization capability
In Prieditis, A., Russell, S., Machine
Learning: Proceedings of the Twelfth International Conference, 488-496. Morgan Kaufmann Publishers, San Francisco, CA.
- SchmidhuberSchmidhuber1997
-
Schmidhuber, J. 1997.
Discovering neural nets with low Kolmogorov complexity and
high generalization capability
Neural Networks, 10(5), 857-873.
- SchmidhuberSchmidhuber1999
-
Schmidhuber, J. 1999.
Artificial curiosity based on discovering novel algorithmic
predictability through coevolution
In Angeline, P., Michalewicz, Z., Schoenauer, M., Yao, X., Zalzala, Z., Congress on Evolutionary Computation, 1612-1618. IEEE Press, Piscataway, NJ.
- Schmidhuber PrelingerSchmidhuber Prelinger1993
-
Schmidhuber, J., Prelinger, D. 1993.
Discovering predictable classifications
Neural Computation, 5(4), 625-635.
- Schmidhuber ZhaoSchmidhuber Zhao1999
-
Schmidhuber, J., Zhao, J. 1999.
Direct policy search and uncertain policy evaluation
In AAAI Spring Symposium on Search under Uncertain and
Incomplete Information, Stanford Univ., 119-124. American
Association for Artificial Intelligence, Menlo Park, Calif.
- Schmidhuber, Zhao, SchraudolphSchmidhuber
et al.1997a
-
Schmidhuber, J., Zhao, J., Schraudolph, N. 1997a.
Reinforcement learning with self-modifying policies
In Thrun, S., Pratt, L., Learning to
learn, 293-309. Kluwer.
- Schmidhuber, Zhao, WieringSchmidhuber
et al.1997b
-
Schmidhuber, J., Zhao, J., Wiering, M. 1997b.
Shifting inductive bias with success-story algorithm, adaptive
Levin search, and incremental self-improvement
Machine Learning, 28, 105-130.
- SchwefelSchwefel1974
-
Schwefel, H. P. 1974.
Numerische Optimierung von Computer-Modellen.
Dissertation.
Published 1977 by Birkhäuser, Basel.
- SchwefelSchwefel1995
-
Schwefel, H. P. 1995.
Evolution and Optimum Seeking.
Wiley Interscience.
- ShannonShannon1948
-
Shannon, C. E. 1948.
A mathematical theory of communication (parts I and
II)
Bell System Technical Journal, XXVII, 379-423.
- SinghSingh1992
-
Singh, S. 1992.
The efficient learning of multiple task sequences
In Moody, J., Hanson, S., Lippman, R., Advances in
Neural Information Processing Systems 4, 251-258. San Mateo, CA:
Morgan Kaufmann.
- SolomonoffSolomonoff1964
-
Solomonoff, R. 1964.
A formal theory of inductive inference. Part I
Information and Control, 7, 1-22.
- SolomonoffSolomonoff1986
-
Solomonoff, R. 1986.
An application of algorithmic probability to problems in
artificial intelligence
In Kanal, L. N., Lemmer, J. F.,
Uncertainty in Artificial Intelligence, 473-491. Elsevier Science
Publishers.
- Storck, Hochreiter, SchmidhuberStorck
et al.1995
-
Storck, J., Hochreiter, S., Schmidhuber, J. 1995.
Reinforcement driven information acquisition in
non-deterministic environments
In Proceedings of the International Conference on Artificial
Neural Networks, Paris, 2, 159-164. EC2 & Cie,
Paris.
- Sun SessionsSun Sessions2000
-
Sun, R., Sessions, C. 2000.
Self-segmentation of sequences: automatic formation of
hierarchies of sequential behaviors
IEEE Transactions on Systems, Man, and Cybernetics: Part B
Cybernetics, 30(3).
- SuttonSutton1988
-
Sutton, R. S. 1988.
Learning to predict by the methods of temporal
differences
Machine Learning, 3, 9-44.
- SuttonSutton1995
-
Sutton, R. S. 1995.
TD models: Modeling the world at a mixture of time
scales
In Prieditis, A., Russell, S., Machine
Learning: Proceedings of the Twelfth International Conference, 531-539. Morgan Kaufmann Publishers, San Francisco, CA.
- Sutton PinetteSutton Pinette1985
-
Sutton, R. S., Pinette, B. 1985.
The learning of world models by connectionist networks
Proceedings of the 7th Annual Conference of the Cognitive
Science Society, 54-64.
- Sutton, Singh, Precup, RavindranSutton
et al.1999
-
Sutton, R. S., Singh, S., Precup, D., Ravindran, B. 1999.
Improved switching among temporally abstract actions
In Advances in Neural Information Processing Systems 11. MIT
Press.
To appear.
- TellerTeller1994
-
Teller, A. 1994.
The evolution of mental models
In Kinnear, Jr., K. E., Advances in Genetic
Programming, 199-219. MIT Press.
- TesauroTesauro1994
-
Tesauro, G. 1994.
TD-gammon, a self-teaching backgammon program, achieves
master-level play
Neural Computation, 6(2), 215-219.
- ThamTham1995
-
Tham, C. 1995.
Reinforcement learning of multiple tasks using a hierarchical
CMAC architecture
Robotics and Autonomous Systems, 15(4), 247-274.
- Thrun MöllerThrun Möller1992
-
Thrun, S., Möller, K. 1992.
Active exploration in dynamic environments
In Lippman, D. S., Moody, J. E., Touretzky, D. S.,
Advances in Neural Information Processing Systems 4, 531-538. San
Mateo, CA: Morgan Kaufmann.
- Wang MahadevanWang Mahadevan1998
-
Wang, G., Mahadevan, S. 1998.
A greedy divide-and-conquer approach to optimizing large
manufacturing systems using reinforcement learning
In NIPS'98 Workshop on Abstraction and Hierarchy in
Reinforcement Learning.
- Watkins DayanWatkins Dayan1992
-
Watkins, C. J. C. H., Dayan, P. 1992.
Q-learning
Machine Learning, 8, 279-292.
- WatkinsWatkins1989
-
Watkins, C. 1989.
Learning from Delayed Rewards.
Ph.D. thesis, King's College, Cambridge.
- WeissWeiss1994
-
Weiss, G. 1994.
Hierarchical chunking in classifier systems
In Proceedings of the 12th National Conference on Artificial
Intelligence, 2, 1335-1340. AAAI Press/The MIT
Press.
- Weiss SenWeiss Sen1996
-
Weiss, G., Sen, S. 1996.
Adaption and Learning in Multi-Agent Systems.
LNAI 1042, Springer.
- Wiering SchmidhuberWiering Schmidhuber1998
-
Wiering, M., Schmidhuber, J. 1998.
HQ-learning
Adaptive Behavior, 6(2), 219-246.
- Wiering SchmidhuberWiering Schmidhuber1996
-
Wiering, M., Schmidhuber, J. 1996.
Solving POMDPs with Levin search and EIRA
In Saitta, L., Machine Learning: Proceedings of the
Thirteenth International Conference, 534-542. Morgan Kaufmann
Publishers, San Francisco, CA.
- WilliamsWilliams1992
-
Williams, R. J. 1992.
Simple statistical gradient-following algorithms for
connectionist reinforcement learning
Machine Learning, 8, 229-256.
- WilsonWilson1994
-
Wilson, S. 1994.
ZCS: A zeroth level classifier system
Evolutionary Computation, 2, 1-18.
- WilsonWilson1995
-
Wilson, S. 1995.
Classifier fitness based on accuracy
Evolutionary Computation, 3(2), 149-175.
- Wolpert, Tumer, FrankWolpert
et al.1999
-
Wolpert, D. H., Tumer, K., Frank, J. 1999.
Using collective intelligence to route internet traffic
In Kearns, M., Solla, S. A., Cohn, D., Advances in
Neural Information Processing Systems 12. MIT Press, Cambridge MA.
Juergen Schmidhuber
2003-02-19