Unsupervised Neural Nets Fight Each Other in a Minimax Game where One Net Maximizes the Value Function Minimized by the Other

Jürgen Schmidhuber (2018, updated 2022)
Pronounce: You_again Shmidhoobuh
AI Blog
@SchmidhuberAI


Unsupervised Neural Networks Fight in a Minimax Game

Edit of 2022: There is now a more detailed publication [AC20] on this topic. See also the blog post [AC] on 3 decades of artificial curiosity & creativity.

One of the most important goals of research on Artificial Neural Networks (NNs) is to create algorithms that learn the statistics of given data. To achieve this, I introduced a new type of unsupervised learning in our Annus Mirabilis of 1990-1991 [MIR]. It is based on the principles of gradient descent/ascent in a minimax game where one NN minimizes the objective function maximized by another. This duel between two unsupervised adversarial NNs was called Predictability Minimization (PM) [PM3, PM1, PM2, PM4, ZIF].

[Figure: Predictability Minimization - an unsupervised minimax game where one neural network minimizes the objective function maximized by another]

PM requires an encoder network with initially random connection strengths or weights. It receives data samples (such as images) through its input units (white circles in the figure) and generates codes thereof across its output units, called code units (grey). Each code unit takes on activations in the interval [0,1]. A separate predictor network (black output units in the figure) is trained to predict each code unit from the remaining code units. Both encoder and predictor may have hidden units (not shown here). The predictor learns the conditional expectation of each code unit, given the other code units.

The encoder, however, is trained to maximize exactly the same objective function minimized by the predictor (e.g., mean squared error). That is, each code unit wants to become as unpredictable as possible. Predictor and encoder fight each other to motivate the encoder to achieve a "holy grail" of unsupervised learning, namely, an ideal, disentangled, binary, factorial code of the data, where the code units are statistically independent of each other - see the relevant proof by Dayan, Zemel & Pouget (1992) [PM5, PM2]. Ideally, after learning, the probability of a given data pattern is simply the product of the probabilities of its code units; these probabilities equal the predictor's outputs. (Pattern codes can be decoded again [PM1].)
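To make the mechanics concrete, here is a minimal sketch of PM, assuming PyTorch; the single-layer encoder, the per-unit predictors, the layer sizes, the learning rates, and the alternating SGD steps are illustrative assumptions, not taken from the original papers [PM1, PM3]:

```python
# Minimal Predictability Minimization sketch (assumes PyTorch).
# Sizes and learning rates are hypothetical, for illustration only.
import torch
import torch.nn as nn

n_in, n_code = 16, 4                     # hypothetical input / code dimensions

# Encoder maps inputs to code units with activations in [0,1].
encoder = nn.Sequential(nn.Linear(n_in, n_code), nn.Sigmoid())
# One predictor per code unit: predicts unit i from the other n_code-1 units.
predictors = nn.ModuleList(
    [nn.Sequential(nn.Linear(n_code - 1, 1), nn.Sigmoid()) for _ in range(n_code)]
)

opt_enc = torch.optim.SGD(encoder.parameters(), lr=0.1)
opt_pred = torch.optim.SGD(predictors.parameters(), lr=0.1)

def pm_value(x):
    """The shared value function: mean squared prediction error over code units."""
    code = encoder(x)
    errs = []
    for i, p in enumerate(predictors):
        others = torch.cat([code[:, :i], code[:, i + 1:]], dim=1)
        errs.append((p(others).squeeze(1) - code[:, i]) ** 2)
    return torch.stack(errs).mean()

x = torch.rand(32, n_in)                 # stand-in for a batch of data samples
for step in range(1000):
    # Predictor step: gradient descent on the value function.
    opt_pred.zero_grad()
    pm_value(x).backward()
    opt_pred.step()
    # Encoder step: gradient ascent on exactly the same value function.
    opt_enc.zero_grad()
    (-pm_value(x)).backward()
    opt_enc.step()
```

Note that both networks differentiate the same scalar: the predictor descends its gradient while the encoder ascends it, so the competition between the two networks is the sole training criterion.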

Generative Adversarial Networks (GANs, 2014) [GAN1] also use an unsupervised minimax principle to model the statistics of given data. The well-known NIPS 2014 GAN paper [GAN1] claims that PM is not based on a minimax game with a value function that one agent seeks to maximize and the other seeks to minimize, and that for GANs "the competition between the networks is the sole training criterion, and is sufficient on its own to train the network," while PM "is only a regularizer that encourages the hidden units of a neural network to be statistically independent while they accomplish some other task; it is not a primary training criterion" [GAN1]. However, good old PM is indeed a pure minimax game, too, e.g., [PM3], Equation 2 (there is no "other task"). In particular, PM was also trained [PM3, PM1] (also on images [PM3, PM4]) such that "the competition between the networks is the sole training criterion, and is sufficient on its own to train the network." The GAN principle was first published in 1990 [AC90][AC90b][AC]. See the well-known priority dispute on GANs [T22].

[Figure: Curiosity is not necessarily good for you]

Unsupervised Minimax for Artificial Curiosity in Adversarial Reinforcement Learning

I extended these ideas to the case of unsupervised adversarial Reinforcement Learning (RL), to build agents with Artificial Curiosity. The first work on this dates back to 1990 [AC90][AC90b][AC] (when Predictability Minimization was invented, too). A reward-maximizing neural control network C learns to generate action sequences or experiments in an environment. It gets intrinsic reward in proportion to the prediction errors of a separate neural network called the world model M. M learns to predict future inputs, given past inputs and actions. Again, in the absence of external reward, C is maximizing exactly the same value function that M is minimizing. This motivates C to invent and generate experiments that lead to "novel" situations where M does not yet know how to predict well.
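The loop below is a minimal sketch of this idea, again assuming PyTorch; the feedforward networks, the toy environment step_env, and the REINFORCE-style policy update are hypothetical stand-ins for the recurrent controller and world model of [AC90]:

```python
# Curiosity sketch in the spirit of [AC90]: controller C is paid M's error.
# Environment, network shapes, and the policy update are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim = 8, 2
C = nn.Sequential(nn.Linear(obs_dim, act_dim), nn.Softmax(dim=-1))  # controller
M = nn.Linear(obs_dim + act_dim, obs_dim)                           # world model

opt_C = torch.optim.Adam(C.parameters(), lr=1e-3)
opt_M = torch.optim.Adam(M.parameters(), lr=1e-3)

def step_env(obs, act):
    """Hypothetical stand-in for the environment dynamics."""
    return torch.tanh(obs + 0.1 * act.sum())

obs = torch.zeros(obs_dim)
for t in range(1000):
    dist = torch.distributions.Categorical(C(obs))
    a = dist.sample()
    act = F.one_hot(a, act_dim).float()
    next_obs = step_env(obs, act)

    # M minimizes its prediction error on the next observation...
    pred = M(torch.cat([obs, act]))
    err = ((pred - next_obs) ** 2).mean()
    opt_M.zero_grad(); err.backward(); opt_M.step()

    # ...while C receives exactly that error as intrinsic reward
    # (REINFORCE-style update on the sampled action).
    intrinsic_reward = err.detach()
    loss_C = -dist.log_prob(a) * intrinsic_reward
    opt_C.zero_grad(); loss_C.backward(); opt_C.step()

    obs = next_obs.detach()
```

Since the intrinsic reward equals M's loss, C is paid exactly what M pays: one value function, maximized by one network and minimized by the other.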

Many recent papers on curiosity for RL are based on this simple principle. Certain drawbacks of the approach were overcome in numerous follow-up papers starting in 1991, e.g., [AC91, AC91b], and many more.

In particular, in an adversarial approach of 1997 [AC97][AC99][AC02], two dueling, reward-maximizing modules (both general computers) called the left brain and the right brain can collectively design an experiment: a (probabilistic) program that defines how to execute an action sequence in the environment, and how to compute the final experimental outcome through an instruction sequence implementing a computable function (e.g., a binary yes/no classification) of the observation sequence triggered by the experiment. Both brains can predict experimental outcomes before they are known. If their predictions or hypotheses differ, after having generated and executed the experiment, the surprised loser pays an intrinsic reward to the winner in a zero sum game.

[Figure: Two reward-maximizing modules (both general computers) called the left brain and the right brain bet on computational outcomes of experimental protocols or algorithms they have agreed upon]

This motivates the unsupervised two-brain system to focus on the "interesting" things, losing interest in boring aspects of the world that are consistently predictable by both brains, as well as in seemingly random aspects of the world that are currently still hard to predict by any brain. Again, in the absence of external reward, each unsupervised module is trying to maximize the value function minimized by the other. This type of artificial curiosity & creativity may drive artificial scientists and artists, e.g., [AC09], and may also accelerate the intake of external reward, e.g., [AC02], intuitively because a better understanding of the world can help to solve certain problems faster.
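The zero-sum betting scheme can be sketched in a few lines of plain Python; the numbered experiment pool, run_experiment, and the table-based "brains" are hypothetical simplifications of the general, programmable experiments of [AC97][AC99][AC02]:

```python
# Toy sketch of the 1997 adversarial bet [AC97]: two modules predict the
# binary outcome of an agreed-upon experiment; the surprised loser pays the
# winner. Experiments here are simple noisy functions, not general programs.
import random

def run_experiment(exp_id, noise=0.2):
    """Hypothetical stand-in for executing an agreed-upon protocol."""
    true_outcome = exp_id % 2                 # hidden regularity to discover
    return true_outcome if random.random() > noise else 1 - true_outcome

left_reward = right_reward = 0.0
left_model, right_model = {}, {}              # each brain's outcome predictions

for trial in range(1000):
    exp_id = random.randrange(10)             # the agreed-upon experiment
    left_bet = left_model.get(exp_id, random.randint(0, 1))
    right_bet = right_model.get(exp_id, random.randint(0, 1))
    if left_bet == right_bet:
        continue                              # no disagreement, no bet
    outcome = run_experiment(exp_id)
    if left_bet == outcome:                   # zero-sum payoff
        left_reward += 1; right_reward -= 1
    else:
        left_reward -= 1; right_reward += 1
    left_model[exp_id] = outcome              # both brains learn from the result
    right_model[exp_id] = outcome
```

Once both brains predict an experiment's outcome equally well, they stop betting on it: the system loses interest in what has become boring, as described above.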

Today, many are using the unsupervised minimax principle, both for modeling data distributions and for artificial curiosity (and also for generating algorithmic art). Compare recent slides [ICML18ws].

References

[AC] J. Schmidhuber (AI Blog, 2021). 3 decades of artificial curiosity & creativity. Schmidhuber's artificial scientists not only answer given questions but also invent new questions. They achieve curiosity through: (1990) the principle of generative adversarial networks, (1991) neural nets that maximize learning progress, (1995) neural nets that maximize information gain (optimally since 2011), (1997) adversarial design of surprising computational experiments, (2006) maximizing compression progress like scientists/artists/comedians do, (2011) PowerPlay... Since 2012: applications to real robots.

[AC90] J. Schmidhuber. Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90, TUM, Feb 1990, revised Nov 1990. PDF. The first paper on planning with reinforcement learning recurrent neural networks (NNs) (more) and on generative adversarial networks where a generator NN is fighting a predictor NN in a minimax game (more).

[AC90b] J. Schmidhuber. A possibility for implementing curiosity and boredom in model-building neural controllers. In J. A. Meyer and S. W. Wilson, editors, Proc. of the International Conference on Simulation of Adaptive Behavior: From Animals to Animats, pages 222-227. MIT Press/Bradford Books, 1991. PDF. More.

[AC91] J. Schmidhuber. Adaptive confidence and adaptive curiosity. Technical Report FKI-149-91, Inst. f. Informatik, Tech. Univ. Munich, April 1991. PDF.

[AC91b] J. Schmidhuber. Curious model-building control systems. In Proc. International Joint Conference on Neural Networks, Singapore, volume 2, pages 1458-1463. IEEE, 1991. PDF.

[AC95] J. Storck, S. Hochreiter, and J. Schmidhuber. Reinforcement-driven information acquisition in non-deterministic environments. In Proc. ICANN'95, vol. 2, pages 159-164. EC2 & CIE, Paris, 1995. PDF.

[AC97] J. Schmidhuber. What's interesting? Technical Report IDSIA-35-97, IDSIA, July 1997. Focus on automatic creation of predictable internal abstractions of complex spatio-temporal events: two competing, intrinsically motivated agents agree on essentially arbitrary algorithmic experiments and bet on their possibly surprising (not yet predictable) outcomes in zero-sum games, each agent potentially profiting from outwitting / surprising the other by inventing experimental protocols where both modules disagree on the predicted outcome. The focus is on exploring the space of general algorithms (as opposed to traditional simple mappings from inputs to outputs); the general system focuses on the interesting things by losing interest in both predictable and unpredictable aspects of the world. Unlike our previous systems with intrinsic motivation,[AC90-AC95] the system also takes into account the computational cost of learning new skills, learning when to learn and what to learn. See later publications.[AC99][AC02]

[AC99] J. Schmidhuber. Artificial Curiosity Based on Discovering Novel Algorithmic Predictability Through Coevolution. In P. Angeline, Z. Michalewicz, M. Schoenauer, X. Yao, Z. Zalzala, eds., Congress on Evolutionary Computation, p. 1612-1618, IEEE Press, Piscataway, NJ, 1999.

[AC02] J. Schmidhuber. Exploring the Predictable. In A. Ghosh, S. Tsutsui, eds., Advances in Evolutionary Computing, p. 579-612, Springer, 2002. PDF.

[AC09] J. Schmidhuber. Art & science as by-products of the search for novel patterns, or data compressible in unknown yet learnable ways. In M. Botta (ed.), Et al. Edizioni, 2009, pp. 98-112. PDF. (More on artificial scientists and artists.)

[AC20] J. Schmidhuber. Generative Adversarial Networks are Special Cases of Artificial Curiosity (1990) and also Closely Related to Predictability Minimization (1991). Neural Networks, Volume 127, pp. 58-66, 2020. Preprint arXiv/1906.04493.

[GAN0] O. Niemitalo. A method for training artificial neural networks to generate missing data within a variable context. Blog post, Internet Archive, 2010. A blog post describing basic ideas of GANs [AC][AC90, AC90b][AC20].

[GAN1] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio. Generative adversarial nets. NIPS 2014, pp. 2672-2680, Dec 2014. A description of GANs that does not cite Schmidhuber's original GAN principle of 1990 [AC][AC90, AC90b][AC20][R2][T22] (it also contains wrong claims about Schmidhuber's adversarial NNs for Predictability Minimization [PM1, PM2][AC20][T22]).

[ICML18ws] J. Schmidhuber. Unsupervised Minimax: Nets That Fight Each Other. Slides for the ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models. PDF.

[MIR] J. Schmidhuber (AI Blog, Oct 2019, revised 2021). Deep Learning: Our Miraculous Year 1990-1991. Preprint arXiv:2005.05744, 2020. The deep learning neural networks of our team have revolutionised pattern recognition and machine learning, and are now heavily used in academia and industry. In 2020-21, we celebrate that many of the basic ideas behind this revolution were published within fewer than 12 months in our "Annus Mirabilis" 1990-1991 at TU Munich.

[PhD] J. Schmidhuber. Dynamische neuronale Netze und das fundamentale raumzeitliche Lernproblem (Dynamic neural nets and the fundamental spatio-temporal credit assignment problem). Dissertation, TUM, 1990. PDF. HTML.

[PM1] J. Schmidhuber. Learning factorial codes by predictability minimization. Neural Computation, 4(6):863-879, 1992. Based on TR CU-CS-565-91, Univ. Colorado at Boulder, 1991. PDF. More.

[PM2] J. Schmidhuber. Netzwerkarchitekturen, Zielfunktionen und Kettenregel (Network architectures, objective functions, and chain rule). Habilitation (postdoctoral thesis), Institut für Informatik, Tech. Univ. Munich, 1993. PDF. HTML.

[PM3] J. Schmidhuber, M. Eldracher, B. Foltin. Semilinear predictability minimization produces well-known feature detectors. Neural Computation, 8(4):773-786, 1996. PDF. More.

[PM4] N. N. Schraudolph, M. Eldracher, J. Schmidhuber. Processing Images by Semi-Linear Predictability Minimization. Network, 10(2): 133-169, 1999. PDF.

[PM5] P. Dayan, R. Zemel, and A. Pouget, 1992. Personal Communication. See [PM2], Sec. 6.9.3.

[R2] Reddit/ML, 2019. J. Schmidhuber really had GANs in 1990.

[T22] J. Schmidhuber (AI Blog, 2022). Scientific Integrity and the History of Deep Learning: The 2021 Turing Lecture, and the 2018 Turing Award. Technical Report IDSIA-77-21 (v3), IDSIA, Lugano, Switzerland, 22 June 2022.

[ZIF] J. Schmidhuber. Neural predictors for detecting and removing redundant information. In H. Cruse, J. Dean, and H. Ritter, editors, Adaptive Behavior and Learning. Kluwer, 1998. PDF. HTML.

PS: Earlier adversarial settings include reinforcement learners playing against themselves, e.g., [SAM59], or predator-prey games in artificial evolution [HIL90]. Chess programs date back to 1945 [ZUSE45] and for many decades have used a recursive minimax procedure with continually shrinking look-ahead. Such earlier adversarial settings, however, were neither unsupervised, nor gradient-based, nor about modeling the statistics of given data, nor about intrinsic rewards for artificial curiosity:

[SAM59] A. L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal on Research and Development, 3:210-229, 1959.

[HIL90] W. D. Hillis. Co-evolving parasites improve simulated evolution as an optimization procedure. Physica D: Nonlinear Phenomena, 42(1-3):228-234, 1990.

[ZUSE45] K. Zuse. Chess programs, in The Plankalkül. Rept. No. 106, Gesellschaft für Mathematik und Datenverarbeitung, pages 201-244, 1976. Translation of German original, 1945.

