Edit of 2022: There is now a
more detailed publication
[AC20] on this topic. See also the blog post [AC] on 3 decades of artificial curiosity & creativity.
One of the most important goals of research on
Artificial Neural Networks (NNs)
is to create algorithms that learn
the statistics of given data.
To achieve this, I introduced
a new type of unsupervised learning in our
Annus Mirabilis of
1990-1991 [MIR].
It is based on the principles of gradient descent/ascent in a minimax game where one NN minimizes the objective function maximized by another.
This duel between two unsupervised adversarial NNs was called
Predictability Minimization
(PM) [PM3, PM1, PM2, PM4, ZIF].
PM requires an encoder network with initially random connection strengths or weights.
It receives data samples (such as images) through its input units
(white circles in the figure to the right)
and generates codes thereof across its output units called code units (grey).
Each code unit can be active in the interval [0,1].
A separate predictor network (black output units in the figure)
is trained to predict each
code unit
from the remaining code units. Both encoder and predictor may have hidden
units (not shown here).
The predictor learns the conditional
expectation of each code unit, given the other code units.
But the encoder is trained to maximize exactly the
same objective function
minimized by the predictor (e.g., mean squared error). That is, each
code unit wants to become as unpredictable as possible.
Predictor and encoder fight each other,
to motivate the encoder to achieve a "holy grail" of unsupervised learning, namely,
an ideal, disentangled, binary, factorial code
of the data, where the code units are statistically independent of each other
- see relevant proof by Dayan & Zemel & Pouget (1992) [PM5, PM2].
Ideally, after learning, the probability of a given data pattern is simply the product of the probabilities of its code units; these probabilities equal the predictor's outputs.
(Pattern codes can be decoded again [PM1].)
Generative Adversarial Networks
(GANs, 2014) [GAN1] also use an
unsupervised minimax principle to model the statistics of given data.
The well-known NIPS 2014 GAN paper [GAN1]
claims that PM is not
based on a minimax game with a value function that one agent seeks to maximise and the other seeks to minimise, and that for GANs
"the competition between the networks is the sole training criterion, and is sufficient on its own to train the network," while PM "is only a regularizer that encourages the hidden units of a neural network to be statistically independent while they accomplish some other task; it is not a primary training criterion" [GAN1].
However, good old
PM is indeed a pure minimax game, too, e.g.,
[PM3], Equation 2 (there is no "other task").
In particular, PM was also trained [PM3, PM1] (also on images [PM3, PM4]) such that "the competition between the networks is the sole training criterion, and is sufficient on its own to train the network."
The GAN principle was first published in 1990 [AC90][AC90b][AC].
See the well-known
priority dispute on GANs [T22].
Unsupervised Minimax for Artificial Curiosity
in Adversarial Reinforcement Learning
I extended these ideas to the case of
unsupervised adversarial Reinforcement Learning (RL),
to build agents with Artificial Curiosity.
The first work on this
dates back to 1990
[AC90][AC90b][AC] (when Predictability Minimization was invented, too). A reward-maximising neural control network C
learns to generate action sequences or experiments
in an environment. It gets intrinsic
reward in proportion to the prediction errors of a separate neural network
called the world model M. M learns to predict future inputs, given past inputs and actions.
Again, in the absence of external reward, C is maximising exactly the same value function that M is minimising. This motivates C to invent and generate experiments that lead to "novel" situations where M does not yet know how to predict well.
Many recent papers on curiosity for RL are based on this simple principle.
Certain drawbacks of the approach were overcome in numerous follow-up papers
starting in 1991, e.g., [AC91, AC91b], and
many more.
In particular, in an adversarial approach of 1997 [AC97][AC99][AC02],
two dueling, reward-maximizing modules (both general computers)
called the left brain and the right brain
can collectively design an experiment:
a (probabilistic) program that defines how to execute an
action sequence in the environment, and how to
compute the final experimental outcome through an
instruction sequence implementing
a computable function (e.g., a binary yes/no classification)
of the observation sequence triggered by the experiment.
Both brains can
predict experimental outcomes before they are known.
If their predictions or hypotheses differ,
after having generated and executed the experiment,
the surprised loser pays an intrinsic
reward to the winner in a zero sum game.
This motivates the unsupervised two brain system to focus on the "interesting" things,
losing interest in boring aspects of the world that are consistently predictable by both brains, as well as seemingly random aspects of the world that are currently still hard to predict by any brain.
Again, in the absence of external reward, each unsupervised
module is trying to maximise the value function minimised by the other.
This type of
artificial curiosity & creativity
may drive
artificial scientists and artists,
e.g., [AC09], and
may also accelerate the intake of
external reward, e.g., [AC02], intuitively because a better understanding
of the world can help to solve certain problems faster.
Today, many are using the unsupervised minimax principle, both for modeling data distributions and for artificial curiosity (and also for generating algorithmic art).
Compare recent slides [ICML18ws].
References
[AC]
J. Schmidhuber (AI Blog, 2021). 3 decades of artificial curiosity & creativity. Schmidhuber's artificial scientists not only answer given questions but also invent new questions. They achieve curiosity through: (1990) the principle of generative adversarial networks, (1991) neural nets that maximise learning progress, (1995) neural nets that maximise information gain (optimally since 2011), (1997) adversarial design of surprising computational experiments, (2006) maximizing compression progress like scientists/artists/comedians do, (2011) PowerPlay... Since 2012: applications to real robots.
[AC90]
J. Schmidhuber.
Making the world differentiable: On using fully recurrent
self-supervised neural networks for dynamic reinforcement learning and
planning in non-stationary environments.
Technical Report FKI-126-90, TUM, Feb 1990, revised Nov 1990.
PDF.
The first paper on planning with reinforcement learning recurrent neural networks (NNs) (more) and on generative adversarial networks
where a generator NN is fighting a predictor NN in a minimax game
(more).
[AC90b]
J. Schmidhuber.
A possibility for implementing curiosity and boredom in
model-building neural controllers.
In J. A. Meyer and S. W. Wilson, editors, Proc. of the
International Conference on Simulation
of Adaptive Behavior: From Animals to
Animats, pages 222-227. MIT Press/Bradford Books, 1991.
PDF.
More.
[AC91]
J. Schmidhuber. Adaptive confidence and adaptive curiosity. Technical Report FKI-149-91, Inst. f. Informatik, Tech. Univ. Munich, April 1991.
PDF.
[AC91b]
J. Schmidhuber.
Curious model-building control systems.
In Proc. International Joint Conference on Neural Networks,
Singapore, volume 2, pages 1458-1463. IEEE, 1991.
PDF.
[AC95]
J. Storck, S. Hochreiter, and J. Schmidhuber. Reinforcement-driven information acquisition in non-deterministic environments. In Proc. ICANN'95, vol. 2, pages 159-164. EC2 & CIE, Paris, 1995. PDF.
[AC97]
J. Schmidhuber.
What's interesting?
Technical Report IDSIA-35-97, IDSIA, July 1997.
Focus
on automatic creation of predictable internal
abstractions of complex spatio-temporal events:
two competing, intrinsically motivated agents agree on essentially
arbitrary algorithmic experiments and bet
on their possibly surprising (not yet predictable)
outcomes in zero-sum games,
each agent potentially profiting from outwitting / surprising
the other by inventing experimental protocols where both
modules disagree on the predicted outcome. The focus is on exploring
the space of general algorithms (as opposed to
traditional simple mappings from inputs to
outputs); the
general system
focuses on the interesting
things by losing interest in both predictable and
unpredictable aspects of the world. Unlike our previous
systems with intrinsic motivation,[AC90-AC95] the system also
takes into account
the computational cost of learning new skills, learning when to learn and what to learn.
See later publications.[AC99][AC02]
[AC99]
J. Schmidhuber.
Artificial Curiosity Based on Discovering Novel Algorithmic
Predictability Through Coevolution.
In P. Angeline, Z. Michalewicz, M. Schoenauer, X. Yao, Z.
Zalzala, eds., Congress on Evolutionary Computation, p. 1612-1618,
IEEE Press, Piscataway, NJ, 1999.
[AC02]
J. Schmidhuber.
Exploring the Predictable.
In Ghosh, S. Tsutsui, eds., Advances in Evolutionary Computing,
p. 579-612, Springer, 2002.
PDF.
[AC09]
J. Schmidhuber. Art & science as by-products of the search for novel patterns, or data compressible in unknown yet learnable ways. In M. Botta (ed.), Et al. Edizioni, 2009, pp. 98-112.
PDF. (More on
artificial scientists and artists.)
[AC20]
J. Schmidhuber. Generative Adversarial Networks are Special Cases of Artificial Curiosity (1990) and also Closely Related to Predictability Minimization (1991).
Neural Networks, Volume 127, p 58-66, 2020.
Preprint arXiv/1906.04493.
[GAN0]
O. Niemitalo. A method for training artificial neural networks to generate missing data within a variable context.
Blog post, Internet Archive, 2010.
A blog post describing basic ideas[AC][AC90,AC90b][AC20] of GANs.
[GAN1]
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair,
A. Courville, Y. Bengio.
Generative adversarial nets. NIPS 2014, 2672-2680, Dec 2014.
A description of GANs that does not cite Schmidhuber's original GAN principle of 1990[AC][AC90,AC90b][AC20][R2][T22] (also containing wrong claims about Schmidhuber's adversarial NNs for
Predictability Minimization[PM0-2][AC20][T22]).
[ICML18ws]
J. Schmidhuber. Unsupervised Minimax:
Nets That Fight Each Other.
Slides for the
ICML 2018 Workshop
on Theoretical Foundations and Applications
of Deep Generative Models.
PDF.
[MIR] J. Schmidhuber (AI Blog, Oct 2019, revised 2021). Deep Learning: Our Miraculous Year 1990-1991. Preprint
arXiv:2005.05744, 2020.
The deep learning neural networks of our team have revolutionised pattern recognition and machine learning, and are now heavily used in academia and industry. In 2020-21, we celebrate that many of the basic ideas behind this revolution were published within fewer than 12 months in our "Annus Mirabilis" 1990-1991 at TU Munich.
[PhD]
J. Schmidhuber. Dynamische neuronale Netze und das fundamentale raumzeitliche Lernproblem (Dynamic neural nets and the fundamental spatio-temporal credit assignment problem). Dissertation, TUM, 1990. PDF.
HTML.
[PM1] J. Schmidhuber. Learning factorial codes by predictability minimization. Neural Computation, 4(6):863-879, 1992. Based on TR CU-CS-565-91, Univ. Colorado at Boulder, 1991. PDF.
More.
[PM2]
J. Schmidhuber.
Netzwerkarchitekturen, Zielfunktionen und Kettenregel
(Network architectures, objective functions, and chain rule).
Habilitation (postdoctoral thesis),
Institut for informatics, Tech Univ. Munich, 1993.
PDF.
HTML.
[PM3] J. Schmidhuber, M. Eldracher, B. Foltin. Semilinear predictability minimzation produces well-known feature detectors. Neural Computation, 8(4):773-786, 1996.
PDF. More.
[PM4]
N. N. Schraudolph, M. Eldracher, J. Schmidhuber.
Processing Images by Semi-Linear Predictability Minimization.
Network, 10(2): 133-169, 1999.
PDF.
[PM5]
P. Dayan, R. Zemel, and A. Pouget, 1992. Personal Communication. See [PM2], Sec. 6.9.3.
[R2] Reddit/ML, 2019. J. Schmidhuber really had GANs in 1990.
[T22] J. Schmidhuber (AI Blog, 2022).
Scientific Integrity and the History of Deep Learning: The 2021 Turing Lecture, and the 2018 Turing Award. Technical Report IDSIA-77-21 (v3), IDSIA, Lugano, Switzerland, 22 June 2022.
[ZIF]
J. Schmidhuber.
Neural predictors for detecting and removing redundant information.
In H. Cruse, J. Dean, and H. Ritter, editors, Adaptive Behavior
and Learning. Kluwer, 1998.
PDF.
HTML.
PS: Earlier adversarial settings include reinforcement learners playing against themselves, e.g., [SAM59], or predator prey games in artificial evolution [HIL90]. Chess programs date back to 1945 [ZUSE45], and for many decades have used a recursive minimax procedure with continually shrinking look-ahead.
Such earlier adversarial settings, however, neither were unsupervised, nor gradient-based, nor about modeling the statistics of given data, nor about intrinsic rewards for artificial curiosity:
[SAM59]
A. L. Samuel.
Some studies in machine learning using the game of checkers.
IBM Journal on Research and Development, 3:210-229, 1959.
[HIL90]
W. D. Hillis.
Co-evolving parasites improve simulated evolution as an optimization
procedure.
Physica D: Nonlinear Phenomena, 42(1-3):228-234, 1990.
[ZUSE45]
K. Zuse.
Chess programs, in The Plankalkül.
Rept. No. 106, Gesellschaft für Mathematik und
Datenverarbeitung, pages 201-244, 1976. Translation of German original, 1945.
.