Jürgen Schmidhuber's page on

REINFORCEMENT LEARNING AND POMDPs

Source code for some of our RL algorithms is available in the Pybrain Machine Learning Library - see video.
Scroll down for papers; see the related links below for associated web sites.
How can autonomous agents learn from pain and pleasure signals to act rationally, that is, to maximize their expected reward? How can they do so in unknown, partially observable environments? Research on Reinforcement Learning (RL) tries to answer such questions.
Work at Schmidhuber's lab has led to the first universal reinforcement learner for essentially arbitrary computable environments - the first optimal general rational agent. Its optimality properties reflect the best possible use of past experience, but completely ignore issues of computational complexity. So this does not yet quite solve the grand problem of Artificial Intelligence, motivating the recent work on the Gödel machine for universal reinforcement learning with limited computational resources.

Most ongoing and earlier work, however, is a bit less ambitious - see below. Nevertheless, most of it focuses on the general type of situation where the current environmental state is not fully observable by the agent's sensors. This yields a partially observable Markov decision problem (POMDP). Since 1990, Schmidhuber's lab has contributed pioneering POMDP algorithms.
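
To make partial observability concrete, here is a small, purely illustrative Python sketch (a generic textbook construction, not code from any of the papers below): the agent never sees the hidden state, only a noisy sensor reading, so it must compress its observation history into some internal state. The classical choice of internal state is a belief, i.e., a probability distribution over hidden states updated by Bayes' rule; the lab's approaches below instead let recurrent networks learn such a state. The transition and observation matrices here are made-up toy numbers.

    import numpy as np

    # Two hidden states (0 and 1); the state flips with probability 0.1 per step.
    T = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
    # The sensor reports the true state only with probability 0.8.
    O = np.array([[0.8, 0.2],
                  [0.2, 0.8]])

    def belief_update(belief, observation):
        """One Bayes-filter step: predict through T, then reweight by the observation."""
        predicted = T.T @ belief                      # prior over the next hidden state
        unnormalized = O[:, observation] * predicted  # weight by P(observation | state)
        return unnormalized / unnormalized.sum()

    belief = np.array([0.5, 0.5])                     # initially ignorant
    for obs in [0, 0, 1, 0, 0]:                       # a stream of noisy sensor readings
        belief = belief_update(belief, obs)
        print(obs, belief.round(3))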
Related links:

Full publication list
(with additional HTML and pdf links)

Learning robots

Compressed Network Search

Evolution main page

RNN evolution

Self-modeling robots

Metasearching & metalearning & self-improvement

Reinforcement learning (RL) economies

Active exploration and curiosity-driven RL

Hierarchical learning & subgoal generation

Artificial Intelligence

CoTeSys group of Schmidhuber

German home

REINFORCEMENT LEARNING IN PARTIALLY OBSERVABLE WORLDS

Realistic environments are not fully observable. In POMDPs, general learning agents therefore need an internal state to memorize important events. The essential question is: how can they learn to identify and store exactly those events that are relevant for optimal action selection later on? To address this issue, Schmidhuber has studied reinforcement learners with (a) recurrent neural network value function approximators (1990 -), (b) recurrent network world models (1990 -), (c) actions that address and set internal storage cells, trained by the success-story algorithm (1994 -), and (d) direct search in a space of event-memorizing algorithms, using policy gradients, artificial evolution, the Optimal Ordered Problem Solver (OOPS), or other methods.
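
The following toy sketch (Python/NumPy; an illustration only, not code from the papers below) conveys the flavor of approach (d): a crude (1+1) evolution strategy - a stand-in for the more sophisticated search methods in the papers - climbs in the weight space of a tiny recurrent network whose hidden state must memorize a cue that is visible only at the first time step. A memoryless reactive policy cannot solve this task, because at decision time the cue is no longer observable.

    import numpy as np

    rng = np.random.default_rng(0)
    N_IN, N_H, N_OUT = 2, 4, 1        # inputs: [cue, bias]; N_H hidden units = internal state

    def fitness(theta, length=5):
        """How well the recurrent net recalls a cue seen only at t = 0 (max 1.0)."""
        W_in  = theta[: N_H * N_IN].reshape(N_H, N_IN)
        W_rec = theta[N_H * N_IN : N_H * (N_IN + N_H)].reshape(N_H, N_H)
        W_out = theta[-N_H * N_OUT :].reshape(N_OUT, N_H)
        total = 0.0
        for cue in (-1.0, 1.0):                       # the hidden "goal side"
            h = np.zeros(N_H)
            for t in range(length):
                obs = np.array([cue if t == 0 else 0.0, 1.0])
                h = np.tanh(W_in @ obs + W_rec @ h)   # recurrent state must carry the cue
            total += float(np.tanh(W_out @ h)[0] * cue)
        return total / 2.0                            # +1.0 means perfect recall of both cues

    dim = N_H * N_IN + N_H * N_H + N_H * N_OUT
    theta = rng.normal(0.0, 0.1, dim)
    best = fitness(theta)
    for step in range(3000):                          # (1+1)-ES: keep mutations that do not hurt
        candidate = theta + rng.normal(0.0, 0.1, dim)
        f = fitness(candidate)
        if f >= best:
            theta, best = candidate, f
    print("fitness after search:", round(best, 3))    # typically climbs far above the ~0 start

The papers below replace this naive hill climbing with recurrent policy gradients, cooperatively coevolved synapses (CoSyNE), natural evolution strategies, and compressed network search, among other methods.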

78. J. Schmidhuber. On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models. Report arXiv:1210.0118 [cs.AI], 2015.

77. J. Schmidhuber. Deep Learning in Neural Networks: An Overview. (Section 6 is on Deep Reinforcement Learning.) Neural Networks, Volume 61, January 2015, Pages 85-117 (DOI: 10.1016/j.neunet.2014.09.003), published online in 2014. Draft (88 pages, 888 references): Preprint IDSIA-03-14 / arXiv:1404.7828 [cs.NE]. HTML overview page.

76. V. R. Kompella, M. Stollenga, M. Luciw, J. Schmidhuber. Continual curiosity-driven skill acquisition from high-dimensional video inputs for humanoid robots. Artificial Intelligence, 2015, Doi:10.1016/j.artint.2015.02.001.

75. J. Koutnik, G. Cuccu, J. Schmidhuber, F. Gomez. Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Amsterdam, July 2013. PDF.

74. R. K. Srivastava, F. Gomez, J. Schmidhuber. Generalized Compressed Network Search. In C. Coello Coello, V. Cutello, K. Deb, S. Forrest, G. Nicosia, M. Pavone, eds., 12th Int. Conf. on Parallel Problem Solving from Nature - PPSN XII, Taormina, 2012. PDF.

73. J. Schmidhuber. POWERPLAY: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem. Frontiers in Cognitive Science, 2013. ArXiv preprint (2011): arXiv:1112.5309 [cs.AI]

72. L. Pape, C. M. Oddo, M. Controzzi, C. Cipriani, A. Foerster, M. C. Carrozza, J. Schmidhuber. Learning tactile skills through curious exploration. Frontiers in Neurorobotics 6:6, 2012, doi: 10.3389/fnbot.2012.00006

71. Sun Yi, F. Gomez, J. Schmidhuber. On the Size of the Online Kernel Sparsification Dictionary. Proc. International Conference on Machine Learning ICML 2012, Edinburgh. PDF.

70. L. Gisslen, M. Ring, M. Luciw, J. Schmidhuber. Modular Value Iteration Through Regional Decomposition. In Proc. Fifth Conference on Artificial General Intelligence (AGI-12), Oxford, UK, 2012. PDF.

69. V. R. Kompella, M. Luciw, M. Stollenga, L. Pape, J. Schmidhuber. Autonomous Learning of Abstractions using Curiosity-Driven Modular Incremental Slow Feature Analysis. Proc. IEEE Conference on Development and Learning / EpiRob 2012 (ICDL-EpiRob'12), San Diego, 2012.

68. R. K. Srivastava, B. Steunebrink, J. Schmidhuber. Continually Adding Self-Invented Problems to the Repertoire: First Experiments with PowerPlay. Proc. IEEE Conference on Development and Learning / EpiRob 2012 (ICDL-EpiRob'12), San Diego, 2012. PDF.

67. M. Luciw, J. Schmidhuber. Low Complexity Proto-Value Function Updating with Incremental Slow Feature Analysis. Proc. International Conference on Artificial Neural Networks (ICANN 2012), Lausanne, 2012. PDF.

66. H. Ngo, M. Luciw, A. Foerster, J. Schmidhuber. Learning Skills from Play: Artificial Curiosity on a Katana Robot Arm. Proc. IJCNN 2012. PDF. Video.

65. M. Ring, T. Schaul, J. Schmidhuber. The Two-Dimensional Organization of Behavior. In Proc. Joint IEEE International Conference on Development and Learning (ICDL) and on Epigenetic Robotics (ICDL-EpiRob 2011), Frankfurt, 2011. PDF.

64. Yi Sun, F. Gomez, M. Ring, J. Schmidhuber. Incremental Basis Construction from Temporal Difference Error. Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011. PDF.

63. V. Graziano, J. Koutnik, J. Schmidhuber. Unsupervised Modeling of Partially Observable Environments. 22nd European Conference on Machine Learning ECML, Athens, 2011. PDF.

62. T. Schaul, Yi Sun, D. Wierstra, F. Gomez, J. Schmidhuber. Curiosity-Driven Optimization. IEEE Congress on Evolutionary Computation (CEC-2011), 2011. PDF.

61. G. Cuccu, M. Luciw, J. Schmidhuber, F. Gomez. Intrinsically Motivated Evolutionary Search for Vision-Based Reinforcement Learning. In Proc. Joint IEEE International Conference on Development and Learning (ICDL) and on Epigenetic Robotics (ICDL-EpiRob 2011), Frankfurt, 2011. PDF.

60. H. Ngo, M. Ring, J. Schmidhuber. Curiosity Drive based on Compression Progress for Learning Environment Regularities. In Proc. Joint IEEE International Conference on Development and Learning (ICDL) and on Epigenetic Robotics (ICDL-EpiRob 2011), Frankfurt, 2011.

59. M. Luciw, V. Graziano, M. Ring, J. Schmidhuber. Artificial Curiosity with Planning for Autonomous Visual and Perceptual Development. In Proc. Joint IEEE International Conference on Development and Learning (ICDL) and on Epigenetic Robotics (ICDL-EpiRob 2011), Frankfurt, 2011. PDF.

58. Sun Yi, F. Gomez, J. Schmidhuber. Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments. In Proc. Fourth Conference on Artificial General Intelligence (AGI-11), Google, Mountain View, California, 2011. PDF.

57. T. Glasmachers, J. Schmidhuber. Optimal Direct Policy Search. In Proc. Fourth Conference on Artificial General Intelligence (AGI-11), Google, Mountain View, California, 2011. PDF.

56. L. Gisslen, M. Luciw, V. Graziano, J. Schmidhuber. Sequential Constant Size Compressors and Reinforcement Learning. In Proc. Fourth Conference on Artificial General Intelligence (AGI-11), Google, Mountain View, California, 2011. PDF. Kurzweil Prize for Best AGI Paper 2011.

55. B. Steunebrink, J. Schmidhuber. A Family of Gödel Machine Implementations. In Proc. Fourth Conference on Artificial General Intelligence (AGI-11), Google, Mountain View, California, 2011. PDF.

54. J. Schmidhuber. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010). IEEE Transactions on Autonomous Mental Development, 2(3):230-247, 2010. IEEE link. PDF of draft.

53. T. Schaul and J. Schmidhuber. Metalearning. Scholarpedia, 5(6):4650, 2010.

52. F. Sehnke, C. Osendorfer, T. Rückstiess, A. Graves, J. Peters, J. Schmidhuber. Parameter-exploring policy gradients. Neural Networks 23(2), 2010. PDF.

51. T. Schaul, J. Bayer, D. Wierstra, Y. Sun, M. Felder, F. Sehnke, T. Rückstiess, J. Schmidhuber. PyBrain. Journal of Machine Learning Research (JMLR), 11:743-746, 2010. PDF. (See Pybrain video.)

50. T. Rückstiess, F. Sehnke, T. Schaul, D. Wierstra, Y. Sun, J. Schmidhuber. Exploring Parameter Space in Reinforcement Learning. Paladyn Journal of Behavioral Robotics, 2010. PDF.

49. D. Wierstra, A. Förster, J. Peters, J. Schmidhuber. Recurrent Policy Gradients. Logic Journal of IGPL, 18:620-634, 2010 (doi:10.1093/jigpal/jzp049; advance access published 2009). PDF.

48. J. Schmidhuber. Ultimate Cognition à la Gödel. Cognitive Computation 1(2):177-193, 2009. PDF. (Springer.)

47. J. Schmidhuber. Simple Algorithmic Theory of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes. Journal of SICE, 48(1):21-32, 2009. PDF. Extended version (2008, revised 2009): arXiv:0812.4360; PDF (Dec 2008); PDF (April 2009).

46. D. Wierstra, A. Foerster, J. Peters, J. Schmidhuber. Recurrent Policy Gradients. Journal of Algorithms, 2009, in press. PDF.

45. Y. Sun, D. Wierstra, T. Schaul, J. Schmidhuber. Stochastic Search using the Natural Gradient. Proceedings of the 26th International Conference on Machine Learning (ICML-09), Montreal, 2009. PDF.

44. J. Togelius, T. Schaul, D. Wierstra, C. Igel, F. Gomez, J. Schmidhuber. Ontogenetic and Phylogenetic Reinforcement Learning. Künstliche Intelligenz, 2009, in press. PDF.

43. F. J. Gomez, J. Togelius, J. Schmidhuber. Measuring and Optimizing Behavioral Complexity for Evolutionary Reinforcement Learning. Proceedings of the 19th International Conference on Artificial Neural Networks (ICANN-09), Cyprus, 2009. PDF.

42. F. Gomez, J. Schmidhuber, R. Miikkulainen. Accelerated Neural Evolution through Cooperatively Coevolved Synapses. Journal of Machine Learning Research (JMLR), 9:937-965, 2008. PDF.

41. J. Schmidhuber. Driven by Compression Progress. In Knowledge-Based Intelligent Information and Engineering Systems KES-2008, Lecture Notes in Computer Science LNCS 5177, p 11, Springer, 2008. (Abstract of invited keynote talk.) PDF.

40. T. Rückstiess, M. Felder, J. Schmidhuber. State-Dependent Exploration for Policy Gradient Methods. 19th European Conference on Machine Learning ECML, 2008. PDF.

39. T. Schaul and J. Schmidhuber. A Scalable Neural Network Architecture for Board Games. Proceedings of the 2008 IEEE Symposium on Computational Intelligence in Games CIG-2008, Perth, Australia, 2008, in press. PDF.

38. F. Sehnke, C. Osendorfer, T. Rückstiess, A. Graves, J. Peters, and J. Schmidhuber. Policy gradients with parameter-based exploration for control. In V. Kurkova, R. Neruda, J. Koutnik, editors, Proceedings of the International Conference on Artificial Neural Networks ICANN 2008, Prague, LNCS 5163, pages 387-396. Springer-Verlag Berlin Heidelberg, 2008. PDF.

37. D. Wierstra, T. Schaul, J. Peters, J. Schmidhuber. Episodic Reinforcement Learning by Logistic Reward-Weighted Regression. In V. Kurkova, R. Neruda, J. Koutnik, editors, Proceedings of the International Conference on Artificial Neural Networks ICANN 2008, Prague. Springer-Verlag Berlin Heidelberg, 2008. PDF.

36. D. Wierstra, T. Schaul, J. Peters, J. Schmidhuber. Fitness Expectation Maximization. Proceedings of Parallel Problem Solving from Nature PPSN-2008, Dortmund, 2008. PDF.

35. D. Wierstra, T. Schaul, J. Peters, J. Schmidhuber. Natural Evolution Strategies. Proceedings of IEEE Congress on Evolutionary Computation CEC-2008, Hong Kong, 2008. PDF.

34. D. Wierstra, J. Schmidhuber. Policy Gradient Critics. 18th European Conference on Machine Learning ECML, Warsaw, 2007. PDF.

33. D. Wierstra, A. Foerster, J. Peters, J. Schmidhuber. Solving Deep Memory POMDPs with Recurrent Policy Gradients. Intl. Conf. on Artificial Neural Networks ICANN'07, 2007. PDF.

32. J. Schmidhuber. Developmental Robotics, Optimal Artificial Curiosity, Creativity, Music, and the Fine Arts. Connection Science, 18(2): 173-187, June 2006. PDF.

31. F. Gomez, J. Schmidhuber, and R. Miikkulainen (2006). Efficient Non-Linear Control through Neuroevolution. Proceedings of the European Conference on Machine Learning (ECML-06, Berlin). PDF. A new, general method that outperforms many others on difficult control tasks.

30. J. Schmidhuber. Completely Self-Referential Optimal Reinforcement Learners. In W. Duch et al. (Eds.): Proc. Intl. Conf. on Artificial Neural Networks ICANN'05, LNCS 3697, pp. 223-233, Springer-Verlag Berlin Heidelberg, 2005 (plenary talk). PDF.

29. F. J. Gomez and J. Schmidhuber. Evolving modular fast-weight networks for control. In W. Duch et al. (Eds.): Proc. Intl. Conf. on Artificial Neural Networks ICANN'05, LNCS 3697, pp. 383-389, Springer-Verlag Berlin Heidelberg, 2005. Featuring a 3-wheeled reinforcement learning robot (with distance sensors) that learns without a teacher to balance two poles with a joint indefinitely in a confined 3D environment. PDF.

28. B. Bakker and J. Schmidhuber. Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization (PDF). In F. Groen, N. Amato, A. Bonarini, E. Yoshida, and B. Kröse (Eds.), Proceedings of the 8-th Conference on Intelligent Autonomous Systems, IAS-8, Amsterdam, The Netherlands, p. 438-445, 2004.

27. J. Schmidhuber. Optimal Ordered Problem Solver. Machine Learning, 54, 211-254, 2004. PDF. HTML. HTML overview.

26. B. Bakker, V. Zhumatiy, G. Gruener, and J. Schmidhuber. A Robot that Reinforcement-Learns to Identify and Memorize Important Previous Observations (PDF). In Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2003.

25. J. Schmidhuber. Bias-Optimal Incremental Problem Solving. In S. Becker, S. Thrun, K. Obermayer, eds., Advances in Neural Information Processing Systems 15, NIPS'15, MIT Press, Cambridge MA, p. 1571-1578, 2003. PDF. HTML. (Compact version of Optimal Ordered Problem Solver.)

24. B. Bakker, F. Linaker, J. Schmidhuber. Reinforcement Learning in Partially Observable Mobile Robot Domains Using Unsupervised Event Extraction. In Proceedings of the 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2002), Lausanne, 2002. PDF.

23. B. Bakker. Reinforcement Learning with Long Short-Term Memory. Advances in Neural Information Processing Systems 13 (NIPS'13), 2002. (On J. Schmidhuber's CSEM grant 2002.)

22. J. Schmidhuber. Sequential decision making based on direct search. In R. Sun and C. L. Giles, eds., Sequence Learning: Paradigms, Algorithms, and Applications. Lecture Notes on AI 1828, p. 203-240, Springer, 2001. PDF. HTML.

21. I. Kwee, M. Hutter, J. Schmidhuber. Market-Based Reinforcement Learning in Partially Observable Worlds. In G. Dorffner, H. Bischof, K. Hornik, eds., Proceedings of Int. Conf. on Artificial Neural Networks ICANN'01, Vienna, LNCS 2130, pages 865-873, Springer, 2001.

21. M. Wiering and J. Schmidhuber. HQ-Learning. Adaptive Behavior 6(2):219-246, 1997 (122 K). PDF. HTML.

20. R. Salustowicz and M. Wiering and J. Schmidhuber. Learning team strategies: soccer case studies. Machine Learning, 1999 (127 K).

19. J. Schmidhuber, J. Zhao, and M. Wiering. Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning 28:105-130, 1997. PDF. Flawed HTML.

18. J.  Schmidhuber, J.  Zhao, N. Schraudolph. Reinforcement learning with self-modifying policies. In S. Thrun and L. Pratt, eds., Learning to learn, Kluwer, pages 293-309, 1997. Postscript; PDF; HTML.

17. R. Salustowicz and J. Schmidhuber. Probabilistic incremental program evolution. Evolutionary Computation, 5(2):123-141, 1997.

16. M. Wiering and J. Schmidhuber. Solving POMDPs using Levin search and EIRA. In L. Saitta, ed., Machine Learning: Proceedings of the 13th International Conference, pages 534-542, Morgan Kaufmann Publishers, San Francisco, CA, 1996. PDF. HTML.

15. M. Wiering and J. Schmidhuber. HQ-Learning: Discovering Markovian subgoals for non-Markovian reinforcement learning. Technical Report IDSIA-95-96, IDSIA, October 1996.

14. J.  Schmidhuber and J.  Zhao and M.  Wiering. Simple principles of metalearning. Technical Report IDSIA-69-96, IDSIA, June 1996.

13. J. Schmidhuber. On learning how to learn learning strategies. Technical Report FKI-198-94, Fakultät für Informatik, Technische Universität München, November 1994.

12. J. Schmidhuber. Reinforcement learning in Markovian and non-Markovian environments. In R. P. Lippmann, J. E. Moody, and D. S. Touretzky, editors, Advances in Neural Information Processing Systems 3, NIPS'3, pages 500-506. San Mateo, CA: Morgan Kaufmann, 1991. PDF. HTML.

11. J. Schmidhuber and R. Huber. Learning to generate artificial fovea trajectories for target detection. International Journal of Neural Systems, 2(1 & 2):135-141, 1991 (50 K - figures omitted!). PDF. HTML.

10. J.  Schmidhuber and R. Huber. Using sequential adaptive neuro-control for efficient learning of rotation and translation invariance. In T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, editors, Artificial Neural Networks, pages 315-320. Elsevier Science Publishers B.V., North-Holland, 1991.

9. J.  Schmidhuber. Learning algorithms for networks with internal and external feedback. In D. S. Touretzky, J. L. Elman, T. J. Sejnowski, and G. E. Hinton, editors, Proc. of the 1990 Connectionist Models Summer School, pages 52-61. San Mateo, CA: Morgan Kaufmann, 1990.

8. J.  Schmidhuber. An on-line algorithm for dynamic reinforcement learning and planning in reactive environments. In Proc. IEEE/INNS International Joint Conference on Neural Networks, San Diego, volume 2, pages 253-258, 1990.

7. J.  Schmidhuber. Reinforcement learning with interacting continually running fully recurrent networks. In Proc. INNC International Neural Network Conference, Paris, volume 2, pages 817-820, 1990.

6. J.  Schmidhuber. Temporal-difference-driven learning in recurrent networks. In R. Eckmiller, G. Hartmann, and G. Hauske, editors, Parallel Processing in Neural Systems and Computers, pages 209-212. North-Holland, 1990.

5. J. Schmidhuber. Reinforcement-Lernen und adaptive Steuerung (Reinforcement learning and adaptive control). Nachrichten Neuronale Netze, 2:1-3, 1990.

4. J.  Schmidhuber. Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90, Institut für Informatik, Technische Universität München, February 1990 (revised in November). PDF.

3. J. Schmidhuber. Networks adjusting networks. In J. Kindermann and A. Linden, editors, Proceedings of 'Distributed Adaptive Neural Information Processing', St. Augustin, 24-25 May 1989, pages 197-208. Oldenbourg, 1990. Extended version: TR FKI-125-90 (revised), Institut für Informatik, TUM.

2. J. Schmidhuber. Dynamische neuronale Netze und das fundamentale raumzeitliche Lernproblem (Dynamic neural nets and the fundamental spatio-temporal credit assignment problem). Dissertation, Institut für Informatik, Technische Universität München, 1990 (341 K). PDF. HTML.

1. J. Schmidhuber. A local learning algorithm for dynamic feedforward and recurrent networks. Connection Science, 1(4):403-412, 1989. (The Neural Bucket Brigade - figures omitted!). PDF. HTML.



REINFORCEMENT LEARNING IN FULLY OBSERVABLE WORLDS

Most mainstream reinforcement learning assumes that the learner's current input tells it everything about the environmental state (the assumption of full observability). This is often unrealistic, but it makes things much easier. Important work on this dynamic-programming-related type of RL has been done by Samuel, Barto, Sutton, Anderson, Watkins, Dayan, Kaelbling, Moore, Dietterich, Singh, Kearns, and many others; a generic sketch of this setting follows the list below. Our contributions include:

5. B. Bakker, V. Zhumatiy, G. Gruener, J. Schmidhuber. Quasi-Online Reinforcement Learning for Robots. Proceedings of the International Conference on Robotics and Automation (ICRA-06), Orlando, Florida, 2006. PDF. A vision-based reinforcement learning robot that learns to build a simple model of the world and of itself. To figure out how to achieve rewards in the real world, it performs numerous 'mental' experiments using the adaptive world model.

4. M. Wiering and J. Schmidhuber. Fast online Q(lambda). Machine Learning, 1998 (80 K).

3. M. Wiering and J. Schmidhuber. Efficient model-based exploration. In R. Pfeiffer, B. Blumberg, J. Meyer, S. W. Wilson, eds., From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior, p. 223-228, MIT Press, 1998.

2. J. Storck, S. Hochreiter, and J. Schmidhuber. Reinforcement-driven information acquisition in non-deterministic environments. In Proc. ICANN'95, vol. 2, pages 159-164. EC2 & CIE, Paris, 1995. PDF. HTML.

1. J. Schmidhuber. Curious model-building control systems. In Proc. International Joint Conference on Neural Networks, Singapore, volume 2, pages 1458-1463. IEEE, 1991. PDF. HTML.
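
For contrast with the POMDP setting above, here is the generic sketch promised earlier: tabular Q-learning in Python (a textbook method, not code from the papers above). Because the state is fully observable, a value table indexed by the current state suffices, and no memory of earlier observations is needed. The 5-state chain environment is a made-up toy example.

    import random

    N_STATES, N_ACTIONS = 5, 2            # a 5-state chain; action 1 = step right, 0 = step left
    GAMMA, ALPHA, EPSILON = 0.95, 0.2, 0.1

    def step(state, action):
        """Deterministic chain dynamics: reward 1 only on reaching the rightmost state."""
        nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
        reward = 1.0 if nxt == N_STATES - 1 else 0.0
        return nxt, reward, nxt == N_STATES - 1

    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for episode in range(500):
        s = 0
        for t in range(100):              # cap the episode length
            if random.random() < EPSILON:
                a = random.randrange(N_ACTIONS)        # explore
            else:                                      # greedy, ties broken at random
                best = max(Q[s])
                a = random.choice([i for i in range(N_ACTIONS) if Q[s][i] == best])
            s2, r, done = step(s, a)
            # one-step Q-learning update toward r + gamma * max_a' Q(s', a')
            target = r + (0.0 if done else GAMMA * max(Q[s2]))
            Q[s][a] += ALPHA * (target - Q[s][a])
            s = s2
            if done:
                break

    print([round(max(q), 2) for q in Q])  # non-terminal state values grow toward the goal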
