Jürgen Schmidhuber's page on


Figure: subgoals found in ref. [3].
No teacher provides useful intermediate subgoals for our hierarchical reinforcement learning systems; they have to find subgoals themselves. Refs [1-4] use gradient-based subgoal generators, refs [5-7] search in discrete subgoal space, and refs [10-11] use recurrent networks to deal with partial observability, which is an almost automatic consequence of realistic hierarchical reinforcement learning. Ref [12] lets many reinforcement learning modules self-organize in motor cortex-like sheets.
Our more general metalearning systems and optimal search algorithms (refs [8-9]) also generate subgoals automatically when necessary.
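
The gradient-based idea mentioned above can be illustrated with a rough sketch: given a differentiable estimate of the cost of getting from one point to another, a subgoal is adjusted by gradient descent so that the summed cost of the two resulting sub-tasks shrinks. Everything in the block is an assumption made for illustration, not the method of refs [1-4]: `segment_cost` is a hypothetical hand-written stand-in for a learned evaluator, the gradient is approximated by finite differences, and the subgoal is optimized directly rather than produced by a trained generator network.

```python
from typing import Callable, List

Point = List[float]


def segment_cost(a: Point, b: Point) -> float:
    """Hypothetical evaluator: squared Euclidean distance from a to b.

    A learned, differentiable cost predictor would take this role in a
    real system; the quadratic form is only an illustrative assumption.
    """
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_cost(start: Point, subgoal: Point, goal: Point) -> float:
    """Cost of going start -> subgoal plus subgoal -> goal."""
    return segment_cost(start, subgoal) + segment_cost(subgoal, goal)


def numerical_gradient(f: Callable[[Point], float], x: Point,
                       eps: float = 1e-5) -> Point:
    """Central finite-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def find_subgoal(start: Point, goal: Point, init: Point,
                 lr: float = 0.1, steps: int = 200) -> Point:
    """Move the subgoal downhill on the summed cost of both sub-tasks."""
    subgoal = list(init)
    for _ in range(steps):
        grad = numerical_gradient(
            lambda p: total_cost(start, p, goal), subgoal)
        subgoal = [s - lr * g for s, g in zip(subgoal, grad)]
    return subgoal


# With this quadratic cost the subgoal approaches the midpoint of
# start and goal, here roughly [2.0, 1.0].
subgoal = find_subgoal(start=[0.0, 0.0], goal=[4.0, 2.0], init=[3.0, -1.0])
```

With a cost estimate that reflects obstacles or the competence of existing sub-policies, the same descent step would pull the subgoal toward places where both sub-tasks are cheap, instead of toward the geometric midpoint.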


12. M. Ring, T. Schaul, J. Schmidhuber. The Two-Dimensional Organization of Behavior. In Proc. Joint IEEE International Conference on Development and Learning (ICDL) and on Epigenetic Robotics (ICDL-EpiRob 2011), Frankfurt, 2011.

11. B. Bakker and J. Schmidhuber. Hierarchical Reinforcement Learning Based on Subgoal Discovery and Subpolicy Specialization. In F. Groen, N. Amato, A. Bonarini, E. Yoshida, and B. Kröse (Eds.), Proceedings of the 8th Conference on Intelligent Autonomous Systems, IAS-8, Amsterdam, The Netherlands, pp. 438-445, 2004.

10. B. Bakker and J. Schmidhuber. Hierarchical Reinforcement Learning with Subpolicies Specializing for Learned Subgoals. In M. H. Hamza (Ed.), Proceedings of the 2nd IASTED International Conference on Neural Networks and Computational Intelligence, NCI 2004, Grindelwald, Switzerland, pp. 125-130, 2004.

9. An optimal way of creating and solving subgoals in general reinforcement learning settings is the Goedel machine (J. Schmidhuber, 2003).

8. A bias-optimal way of creating and solving subgoals in the context of ordered problem sequences is the Optimal Ordered Problem Solver (J. Schmidhuber, 2002-2004).

7. R. Salustowicz and J. Schmidhuber. Learning to predict through PIPE and automatic task decomposition. Technical Report IDSIA-11-98, IDSIA, April 1998.

6. M. Wiering and J. Schmidhuber. HQ-Learning. Adaptive Behavior 6(2):219-246, 1997.

5. M. Wiering and J. Schmidhuber. HQ-Learning: Discovering Markovian subgoals for non-Markovian reinforcement learning. Technical Report IDSIA-95-96, IDSIA, October 1996.

4. J. Schmidhuber. Netzwerkarchitekturen, Zielfunktionen und Kettenregel. (Net architectures, objective functions, and chain rule.) Habilitation (postdoctoral thesis, qualification for a tenure professorship), Institut für Informatik, Technische Universität München, 1993.

3. J. Schmidhuber and R. Wahnsiedler. Planning simple trajectories using neural subgoal generators. In J. A. Meyer, H. L. Roitblat, and S. W. Wilson, editors, Proc. of the 2nd International Conference on Simulation of Adaptive Behavior, pages 196-202. MIT Press, 1992.

2. J. Schmidhuber. Learning to generate sub-goals for action sequences. In T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, editors, Artificial Neural Networks, pages 967-972. Elsevier Science Publishers B.V., North-Holland, 1991.

1. J. Schmidhuber. Towards compositional learning with dynamic neural networks. Technical Report FKI-129-90, Institut für Informatik, Technische Universität München, 1990.

Related work on hierarchies of recurrent neural networks with multiple self-organizing time scales:

(B) J. Schmidhuber. Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234-242, 1992.

(A) J. Schmidhuber. Learning unambiguous reduced sequence descriptions. In J. E. Moody, S. J. Hanson, and R. P. Lippmann, editors, Advances in Neural Information Processing Systems 4, NIPS'4, pages 291-298. San Mateo, CA: Morgan Kaufmann, 1992.
