Jürgen Schmidhuber's theory of

ACTIVE EXPLORATION,
ARTIFICIAL CURIOSITY &
WHAT'S INTERESTING


See also the more recent overview blog post (2021) on artificial curiosity and creativity since 1990!


Only data with still unknown but learnable statistical or algorithmic regularities are truly novel or surprising or interesting and thus deserve attention.

Even beautiful things are not necessarily interesting. Beauty reflects low complexity with respect to the observer's current knowledge; interestingness and curiosity reflect the learning process leading from high to low subjective complexity. More.

ON TV: Schmidhuber's theory of interestingness / curiosity / beauty / surprise / novelty / creativity was the subject of a TV documentary (BR "Faszination Wissen", 29 May 2008, 21:15, plus several later repeats on other channels).

See also an interview in HPlus Magazine: Build Optimal Scientist, Then Retire. This got slashdotted in 2010.


What's interesting? Many interesting things are unexpected, but not all unexpected things are interesting or surprising. According to Schmidhuber's simple formal theory of surprise & novelty & interestingness & attention & creativity & intrinsic motivation, curious agents are interested in learnable but yet unknown regularities, and get bored by both predictable and inherently unpredictable things. His active reinforcement learners translate mismatches between expectations and reality into curiosity rewards or intrinsic rewards for curious, creative, exploring agents which like to observe / create truly surprising aspects of the world, to learn novel patterns [references 1-21 below; 1990-2010].

His first curiosity-driven, creative agents (1990) [1,2] used an adaptive predictor or data compressor to predict the next input, given some history of actions and inputs. The action-generating, reward-maximizing controller got rewarded for action sequences provoking still unpredictable inputs. To discourage the controller from focusing on truly unpredictable, random inputs (such as uninteresting details of white noise), later approaches [e.g., refs 3, 4, 6; 1991-] model the expected progress of the predictor: parts of the world where the predictor fails to learn (no data compression progress!) become less interesting than those where its predictions improve.

Later systems (1997-) also take into account the computational cost of learning new skills, in systems that learn when to learn and what to learn [refs 8, 11, 12]. More recent papers (2006-) focus on mathematically optimal artificial curiosity & creativity, and provide a simple formal explanation of art & science & humor [refs 14-21]. For recent variants and applications, see refs [22-28].

Above: curiosity is not necessarily good for you!

Nevertheless, we show [e.g., refs 3, 4, 6, 8, 12 below] that intrinsic curiosity reward can speed up the construction of predictive world models and the collection of external reward.

More recent work on PowerPlay [29,30,31] uses artificial creativity & curiosity to incrementally build a more and more general problem solver. It does not just solve given tasks but keeps inventing new ones, without forgetting old skills. How? By continually searching for the simplest (fastest to find) still unsolvable task and its solution.

Scroll this page down for papers and videos of talks on surprise, novelty, artificial creativity and curiosity. Last update 2012.

Fundamental Principle of Artificial Curiosity and Creativity:

Reward the reward-optimizing controller for actions yielding data that cause improvements of the adaptive predictor or data compressor!

(Formulated in the early 1990s; basis of much of the recent work in Developmental Robotics since 2004)

Variant 1: Reward the controller whenever the predictor errs [1990; refs 1a, 1, 2]. The predictor minimizes the objective function maximized by the generative controller. The first Generative Adversarial Networks of 1990!
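Variant 1 can be sketched in a few lines of Python. The linear toy world, single-weight predictor, and learning rate below are illustrative assumptions, not from the original papers; the point is only the adversarial reward structure:

```python
import numpy as np

rng = np.random.default_rng(0)
state = {"w": 0.0}   # single weight of the adaptive predictor
lr = 0.1             # predictor learning rate

def world(action):
    # Hypothetical toy environment: next input is a noisy linear function of the action.
    return 0.7 * action + 0.1 * rng.standard_normal()

def step(action):
    """Variant 1: the controller's intrinsic reward IS the predictor's squared error."""
    obs = world(action)
    err = obs - state["w"] * action
    state["w"] += lr * err * action   # the predictor learns to reduce its error...
    return err ** 2                   # ...while the controller is paid for provoking it

early = [step(1.0) for _ in range(10)]
late = [step(1.0) for _ in range(200)][-10:]
```

As the predictor converges, the error-based reward on familiar inputs dries up, pushing the controller toward not-yet-predictable data — the minimax dynamic behind the GAN reading of this variant.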

Variant 2: Reward the controller whenever the predictor improves / becomes more reliable [1991; refs 3, 4, 6, 13, 14].
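A minimal sketch of Variant 2, again under illustrative assumptions (a constant-mean predictor and hand-picked streams): the intrinsic reward is the measured drop in the predictor's average error caused by further training, so unlearnable white noise earns roughly nothing:

```python
import numpy as np

rng = np.random.default_rng(1)

def learning_progress(stream, steps=30, lr=0.3, n=100):
    """Variant 2 sketch: reward = average prediction error before training
    minus average error after training on fresh samples from the stream."""
    w = 0.0
    def avg_err():
        return float(np.mean([(stream() - w) ** 2 for _ in range(n)]))
    before = avg_err()
    for _ in range(steps):
        w += lr * (stream() - w)   # train the predictor on fresh data
    return before - avg_err()      # positive only where something was learned

learnable = learning_progress(lambda: 0.8)                       # learnable signal
noise = learning_progress(lambda: float(rng.standard_normal()))  # white noise
```

The learnable stream yields substantial progress; the noise stream yields none on average, so a progress-rewarded controller loses interest in it — unlike a Variant 1 controller, which would be drawn to the noise forever.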

Variant 3: Reward the controller in proportion to the Kullback-Leibler distance between the predictor's subjective probability distributions before and after an observation - the relative entropy between its prior and posterior [1995; ref 6].

Variant 4 (zero-sum intrinsic reward games): Two reward-maximizing modules bet on outcomes of potentially surprising experiments they have agreed upon [1997-2002; refs 8, 11, 12].
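The betting scheme can be caricatured as follows. The integer outcomes and unit stakes are illustrative assumptions, far simpler than the algorithmic experiments of refs 8, 11, 12:

```python
def settle_bet(pred_a, pred_b, outcome):
    """Variant 4: two modules bet a unit stake on the outcome of an experiment
    they agreed on; surprising the other yields intrinsic reward.
    The two rewards always sum to zero."""
    if pred_a == pred_b:
        return (0, 0)          # agreement: neither can surprise the other, no bet
    if pred_a == outcome:
        return (+1, -1)        # module A outwitted module B
    if pred_b == outcome:
        return (-1, +1)        # module B outwitted module A
    return (0, 0)              # both wrong: stakes are returned

# Example: the modules disagree; the actual outcome is 1, so A wins the stake.
rewards = settle_bet(1, 0, outcome=1)
```

The interesting experiments are exactly those where the two learned models disagree; once both converge to the same predictions, the bets — and the intrinsic rewards — vanish.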

Variant 5 (progress in data compression): Store entire life, keep trying to compress it, reward controller for actions that yield data causing compressor improvements [1990s - 2008; e.g., refs 14-17].
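Variant 5 can be approximated with an off-the-shelf compressor standing in for the adaptive one (zlib here is an illustrative assumption; the original proposal uses a learning compressor over the agent's entire history):

```python
import random
import zlib

def explained_bytes(history: bytes, new: bytes) -> int:
    """Variant 5 proxy: how many bytes of `new` the compressor 'explains'
    given the history. Regular data yields compression progress; noise doesn't."""
    extra = len(zlib.compress(history + new)) - len(zlib.compress(history))
    return len(new) - extra

history = b"the cat sat on the mat. " * 20
patterned = b"the cat sat on the mat. " * 10                    # regular continuation
random.seed(0)
noise = bytes(random.randrange(256) for _ in range(len(patterned)))  # incompressible

reward_pattern = explained_bytes(history, patterned)
reward_noise = explained_bytes(history, noise)
```

The agent's intrinsic reward would then be given for actions whose resulting data maximizes such compression progress — which is large for the patterned continuation and near zero for the noise.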

Both art and science are by-products of the desire to create / discover more data that is compressible in hitherto unknown ways! [Refs 14-21, 35]

Greedy but practical Variant 6 (PowerPlay): Incrementally build a more and more general problem solver as follows. Systematically generate pairs of new (possibly self-invented) tasks and modifications of the current problem solver (where subjectively simple, low-complexity pairs come first), until a more powerful problem solver is found that provably solves all previously learned tasks plus the new one, while the unmodified predecessor does not. New skills may (partially) re-use previously learned skills, that is, tasks and solver modifications that used to be subjectively complex may become subjectively simple. Wow-effects are achieved by continually making previously learned skills more computationally efficient such that they require less time and storage space [2011-; e.g., refs 29-31].
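One PowerPlay increment can be caricatured in a few lines. Tasks as integers and solvers as skill sets are illustrative assumptions — the real system searches pairs of task descriptions and solver modifications in program space, simplest first:

```python
def powerplay_step(skills, solved, candidates):
    """One PowerPlay increment: scan (task, modified-solver) pairs in order of
    increasing complexity; accept the first pair whose modified solver handles
    the new task AND still solves every previously learned task."""
    for task, new_skills in candidates:
        if task in solved:
            continue  # not a new task
        if task in new_skills and all(t in new_skills for t in solved):
            return new_skills, solved + [task]   # accept: provably no forgetting
    return skills, solved                        # no acceptable pair found

skills, solved = {1}, [1]
candidates = [(2, {2}),        # rejected: this modification would forget task 1
              (2, {1, 2})]     # accepted: solves the old task and the new one
skills, solved = powerplay_step(skills, solved, candidates)
```

The "provably solves all previously learned tasks" check is what distinguishes this loop from naive task generation: the repertoire only ever grows.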

Related links:
1. Formal Theory of Creativity explains science, art, music, humor
2. Reinforcement learning
3. Recurrent network predictors
4. Learning attentive vision
5. Reinforcement learning economies
6. Learning to learn
7. Learning robots
8. Self-modeling robots
9. Hierarchical learning & subgoal generation
10. Beauty
11. Low-Complexity Art
12. Femmes Fractales
13. CoTeSys group
14. Full publication list

Recent videos / invited talks on Creativity, Curiosity, Beauty, Novel Patterns, True Surprise & Novelty, Art & Science & Humor:

13 June 2012: JS featured in Through the Wormhole with Morgan Freeman on the Science Channel. See Teaser Video and more.

20 Jan 2012: TEDx Talk (uploaded 10 March) at TEDx Lausanne: When creative machines overtake man (12:47).

15 Jan 2011: Winter Intelligence Conference, Oxford (on universal AI and the theory of fun). See the video at Vimeo (Sept 2011); also available at YouTube.

22 Sep 2010: Banquet Talk in Palau de la Musica Catalana for Joint Conferences ECML / PKDD 2010, Barcelona: Formal Theory of Fun & Creativity. 4th slide. All slides and a video of this talk (Dec 14) at videolectures.net

12 Nov 2009: Keynote in Cinema Corso (Lugano) for Multiple Ways to Design Research 09: Art & Science

3 Oct 2009: Invited talk for Singularity Summit, New York City. See original video (40 min). Or save time by watching the condensed but jagged video (20 min), also available at the ShanghAI Lectures. Save even more time by watching the short video (10 min, also at the bottom of this page).

25 Aug 2009: Dirac summer school, Leuven, Belgium

12 Jul 2009: Dagstuhl Castle Seminar on Computational Creativity

3 Sep 2008: Keynote for Knowledge-Based and Intelligent Information & Engineering Systems KES 2008, Zagreb

2 Oct 2007: Joint invited lecture for Algorithmic Learning Theory (ALT 2007) and Discovery Science (DS 2007), Sendai, Japan (the only joint invited lecture). Preprint

23 Aug 2007: Keynote for A*STAR Meeting on Expectation & Surprise, Singapore

12 July 2007: Keynote for Art Meets Science 2007: "Randomness vs simplicity & beauty in physics and the fine arts"


36. J. Schmidhuber. Maximizing Fun By Creating Data With Easily Reducible Subjective Complexity. In G. Baldassarre and M. Mirolli (eds.), Roadmap for Intrinsically Motivated Learning. Springer, 2012, in press.

35. J. Schmidhuber. A Formal Theory of Creativity to Model the Creation of Art. In J. McCormack (ed.), Computational Creativity. MIT Press, 2012. PDF of older preprint.

34. L. Pape, C. M. Oddo, M. Controzzi, C. Cipriani, A. Foerster, M. C. Carrozza, J. Schmidhuber. Learning tactile skills through curious exploration. Frontiers in Neurorobotics 6:6, 2012, doi: 10.3389/fnbot.2012.00006

33. H. Ngo, M. Luciw, A. Foerster, J. Schmidhuber. Learning Skills from Play: Artificial Curiosity on a Katana Robot Arm. Proc. IJCNN 2012. PDF. Video.

32. V. R. Kompella, M. Luciw, M. Stollenga, L. Pape, J. Schmidhuber. Autonomous Learning of Abstractions using Curiosity-Driven Modular Incremental Slow Feature Analysis. Proc. IEEE Conference on Development and Learning / EpiRob 2012 (ICDL-EpiRob'12), San Diego, 2012, in press.

31. R. K. Srivastava, B. Steunebrink, J. Schmidhuber. First Experiments with PowerPlay. Neural Networks, 2013. ArXiv preprint (2012): arXiv:1210.8385 [cs.AI].

30. R. K. Srivastava, B. R. Steunebrink, M. Stollenga, J. Schmidhuber. Continually Adding Self-Invented Problems to the Repertoire: First Experiments with POWERPLAY. Proc. IEEE Conference on Development and Learning / EpiRob 2012 (ICDL-EpiRob'12), San Diego, 2012. PDF.

29. J. Schmidhuber. POWERPLAY: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem. Frontiers in Cognitive Science, 2013. ArXiv preprint (2011): arXiv:1112.5309 [cs.AI]

28. Yi Sun, F. Gomez, J. Schmidhuber. Planning to Be Surprised: Optimal Bayesian Exploration in Dynamic Environments. In Proc. Fourth Conference on Artificial General Intelligence (AGI-11), Google, Mountain View, California, 2011. PDF.

27. V. Graziano, T. Glasmachers, T. Schaul, L. Pape, G. Cuccu, J. Leitner, J. Schmidhuber. Artificial Curiosity for Autonomous Space Exploration. Acta Futura 4:41-51, 2011 (DOI: 10.2420/AF04.2011.41). PDF.

26. G. Cuccu, M. Luciw, J. Schmidhuber, F. Gomez. Intrinsically Motivated Evolutionary Search for Vision-Based Reinforcement Learning. In Proc. Joint IEEE International Conference on Development and Learning (ICDL) and on Epigenetic Robotics (ICDL-EpiRob 2011), Frankfurt, 2011. PDF.

25. M. Luciw, V. Graziano, M. Ring, J. Schmidhuber. Artificial Curiosity with Planning for Autonomous Visual and Perceptual Development. In Proc. Joint IEEE International Conference on Development and Learning (ICDL) and on Epigenetic Robotics (ICDL-EpiRob 2011), Frankfurt, 2011. PDF.

24. T. Schaul, L. Pape, T. Glasmachers, V. Graziano, J. Schmidhuber. Coherence Progress: A Measure of Interestingness Based on Fixed Compressors. In Proc. Fourth Conference on Artificial General Intelligence (AGI-11), Google, Mountain View, California, 2011. PDF.

23. T. Schaul, Yi Sun, D. Wierstra, F. Gomez, J. Schmidhuber. Curiosity-Driven Optimization. IEEE Congress on Evolutionary Computation (CEC-2011), 2011. PDF.

22. H. Ngo, M. Ring, J. Schmidhuber. Curiosity Drive based on Compression Progress for Learning Environment Regularities. In Proc. Joint IEEE International Conference on Development and Learning (ICDL) and on Epigenetic Robotics (ICDL-EpiRob 2011), Frankfurt, 2011.

21. J. Schmidhuber. Formal Theory of Creativity, Fun, and Intrinsic Motivation (1990-2010). IEEE Transactions on Autonomous Mental Development, 2(3):230-247, 2010. IEEE link. PDF of draft.

20. J. Schmidhuber. Artificial Scientists & Artists Based on the Formal Theory of Creativity. In Proceedings of the Third Conference on Artificial General Intelligence (AGI-2010), Lugano, Switzerland. PDF.

19. J. Schmidhuber. Art & science as by-products of the search for novel patterns, or data compressible in unknown yet learnable ways. In M. Botta (ed.), Multiple ways to design research. Research cases that reshape the design discipline, Milano-Lugano, Swiss Design Network - Et al. Edizioni, 2009, pp. 98-112. (Keynote talk.) PDF of preprint.

18. J. Schmidhuber. Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes. Based on keynote talk for KES 2008 (below) and joint invited lecture for ALT 2007 / DS 2007 (below). Short version: ref 17 below. Long version in G. Pezzulo, M. V. Butz, O. Sigaud, G. Baldassarre, eds.: Anticipatory Behavior in Adaptive Learning Systems, from Sensorimotor to Higher-level Cognitive Capabilities, Springer, LNAI, 2009. Preprint (2008, revised 2009): arXiv:0812.4360. PDF (Dec 2008). PDF (April 2009).

17. J. Schmidhuber. Simple Algorithmic Theory of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes. Journal of SICE, 48(1):21-32, 2009. PDF.

16. J. Schmidhuber. Driven by Compression Progress. In Proc. Knowledge-Based Intelligent Information and Engineering Systems KES-2008, Lecture Notes in Computer Science LNCS 5177, p 11, Springer, 2008. (Abstract of invited keynote talk.) PDF.

15. J. Schmidhuber. Simple Algorithmic Principles of Discovery, Subjective Beauty, Selective Attention, Curiosity & Creativity. In V. Corruble, M. Takeda, E. Suzuki, eds., Proc. 10th Intl. Conf. on Discovery Science (DS 2007) p. 26-38, LNAI 4755, Springer, 2007. Also in M. Hutter, R. A. Servedio, E. Takimoto, eds., Proc. 18th Intl. Conf. on Algorithmic Learning Theory (ALT 2007) p. 32, LNAI 4754, Springer, 2007. (Joint invited lecture for DS 2007 and ALT 2007, Sendai, Japan, 2007.) Preprint: arxiv:0709.0674. PDF.
Curiosity as the drive to improve the compression of the lifelong sensory input stream: interestingness as the first derivative of subjective "beauty" or compressibility.

14. J. Schmidhuber. Developmental Robotics, Optimal Artificial Curiosity, Creativity, Music, and the Fine Arts. Connection Science, 18(2): 173-187, June 2006. PDF.
On mathematically optimal universal artificial curiosity, based on theoretically best possible ways of maximizing learning progress in embedded agents or robots with an intrinsic motivation to learn skills that lead to a better understanding of the world and what can be done in it. It is also pointed out how music and the arts can be formally understood as a consequence of the principle of artificial curiosity and creativity.

13. J. Schmidhuber. Self-Motivated Development Through Rewards for Predictor Errors / Improvements. Developmental Robotics 2005 AAAI Spring Symposium, March 21-23, 2005, Stanford University, CA. PDF.

12. J. Schmidhuber. Exploring the Predictable. In Ghosh, S. Tsutsui, eds., Advances in Evolutionary Computing, p. 579-612, Springer, 2002. PDF. HTML. One of the key publications - see more details under refs [8, 11, 1997-].

11. J. Schmidhuber. Artificial Curiosity Based on Discovering Novel Algorithmic Predictability Through Coevolution. In P. Angeline, Z. Michalewicz, M. Schoenauer, X. Yao, Z. Zalzala, eds., Congress on Evolutionary Computation, p. 1612-1618, IEEE Press, Piscataway, NJ, 1999.

11a. J. Schmidhuber. What's interesting? In Abstract Collection of SNOWBIRD: Machines That Learn. Utah, April 1998.

10. M. Wiering and J. Schmidhuber. Efficient model-based exploration. In R. Pfeiffer, B. Blumberg, J. Meyer, S. W. Wilson, eds., From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior, p. 223-228, MIT Press, 1998.

9. M. Wiering and J. Schmidhuber. Learning exploration policies with models. In Proc. CONALD, 1998.

8. J. Schmidhuber. What's interesting? Technical Report IDSIA-35-97, IDSIA, July 1997 (23 pages, 10 figures, 157 K, 834 K gunzipped).
Here we focus on automatic creation of predictable internal abstractions of complex spatio-temporal events: two competing, intrinsically motivated agents agree on essentially arbitrary algorithmic experiments and bet on their possibly surprising (not yet predictable) outcomes in zero-sum games, each agent potentially profiting from outwitting / surprising the other by inventing experimental protocols where both modules disagree on the predicted outcome. The focus is on exploring the space of general algorithms (as opposed to traditional simple mappings from inputs to outputs); the general system [12] focuses on the interesting things by losing interest in both predictable and unpredictable aspects of the world. Unlike the previous systems with intrinsic motivation (1990, 91, 95, see below), the system also takes into account the computational cost of learning new skills, learning when to learn and what to learn. See also refs [11, 12, 1998-2002].

7. J. Schmidhuber, J. Zhao, N. Schraudolph. Reinforcement learning with self-modifying policies. In S. Thrun and L. Pratt, eds., Learning to learn, Kluwer, pages 293-309, 1997. PDF; HTML.

6. J. Storck, S. Hochreiter, and J. Schmidhuber. Reinforcement-driven information acquisition in non-deterministic environments. In Proc. ICANN'95, vol. 2, pages 159-164. EC2 & CIE, Paris, 1995. PDF. HTML.
In this paper the curiosity reward is again proportional to the predictor's surprise / information gain, this time measured as the Kullback-Leibler distance between the learning predictor's subjective probability distributions before and after new observations - the relative entropy between its prior and posterior. (In 2005 Itti & Baldi called this "Bayesian surprise" and demonstrated experimentally that it explains certain patterns of human visual attention better than certain previous approaches.)
Note the differences to "Active Learning": the latter typically focuses on choosing which data points to evaluate next in order to maximize information gain (i.e., one-step look-ahead), assuming all data point evaluations are equally costly. The 1995 system, however, is more general and takes into account: (1) arbitrary delays between experimental actions and corresponding information gains, (2) the highly environment-dependent costs of obtaining or creating not just individual data points but entire data sequences.

5. J. Schmidhuber. On learning how to learn learning strategies. Technical Report FKI-198-94, Fakultät für Informatik, Technische Universität München, November 1994. PDF.

4. J. Schmidhuber. Curious model-building control systems. In Proc. International Joint Conference on Neural Networks, Singapore, volume 2, pages 1458-1463. IEEE, 1991. PDF. HTML.
The second peer-reviewed English-language publication on artificial curious agents with intrinsic motivation. The system uses reinforcement learning to create behaviors that lead to parts of the environment where previous experience indicates that the prediction error can be improved (not necessarily where it is high). So the agent is neither attracted by unpredictable randomness nor by totally predictable aspects of the world. Instead it likes to go where it learnt to expect additional learning progress.
(Quite a few later publications on developmental robotics and intrinsic reward took up this basic idea, e.g., Oudeyer & Kaplan (2007), whose work is restricted to one-step look-ahead though, and doesn't allow for delayed intrinsic rewards like the 1991 paper above.)

3. J. Schmidhuber. Adaptive confidence and adaptive curiosity. Technical Report FKI-149-91, Inst. f. Informatik, Tech. Univ. Munich, April 1991. PDF.

2. J. Schmidhuber. A possibility for implementing curiosity and boredom in model-building neural controllers. In J. A. Meyer and S. W. Wilson, editors, Proc. of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats, pages 222-227. MIT Press/Bradford Books, 1991. PDF. HTML.
The first peer-reviewed English-language publication on artificial curious agents with intrinsic motivation. The system uses reinforcement learning to create behaviors that lead the agent to parts of the environment where the separate predictor's prediction error is expected to be high, assuming one can learn something there.
Quite a few later publications on developmental robotics and/or intrinsic reward took up this basic idea, e.g., Singh & Barto & Chentanez (2005).

1. J. Schmidhuber. Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90, TUM, Feb 1990, revised Nov 1990. PDF. The first paper on planning with reinforcement learning recurrent neural networks (NNs) (more) and on generative adversarial networks where a generator NN is fighting a predictor NN in a minimax game (more).

1a. J. Schmidhuber. Dynamische neuronale Netze und das fundamentale raumzeitliche Lernproblem (Dynamic neural nets and the fundamental spatio-temporal credit assignment problem). Dissertation, Institut fuer Informatik, Technische Universitaet Muenchen, 1990. PDF. HTML.

Differences to Shannon / Boltzmann's notion of surprise. Since the early 1990s, the papers above have repeatedly pointed out an essential difference between our theory of surprise & novelty and Shannon's traditional information theory based on Boltzmann's entropy notion. Consider two extreme examples of uninteresting, unsurprising, boring data. A vision-based agent that always stays in the dark will experience an extremely compressible, soon totally predictable and unsurprising history of unchanging visual inputs. In front of a screen full of white noise conveying a lot of information and "novelty" and "surprise" in the traditional sense of Boltzmann (1800s) and Shannon (1948), however, it will experience highly unpredictable and fundamentally uncompressible data. In both cases the data gets boring quickly as it does not allow for learning new things or for further compression progress. Neither the arbitrary nor the fully predictable is truly novel or surprising or interesting - only data with still unknown but learnable statistical or algorithmic regularities are! That's why our theory of surprise and curiosity and creativity takes the time-varying state of the subjective, learning observer into account.

Check out related papers on adaptive visual attention with foveas (overview page):

J. Schmidhuber and R. Huber. Learning to generate artificial fovea trajectories for target detection. International Journal of Neural Systems, 2(1 & 2):135-141, 1991. Figures in overview page. PDF. HTML.

J. Schmidhuber and R. Huber. Using sequential adaptive neuro-control for efficient learning of rotation and translation invariance. In T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, editors, Artificial Neural Networks, pages 315-320. Elsevier Science Publishers B.V., North-Holland, 1991.


Right: appetizer video on the formal theory of curiosity & creativity & beauty & surprise & humor. These are excerpts (10 min) of the original talk (40 min) mentioned above.

See also Peter Redgrave's comment in Nature (473, 450, 26 May 2011): Neuroscience: What makes us laugh.