Juergen Schmidhuber's home page



Pronounce:
"Yirgan Shmidhoobuh"

ON THE NET
SINCE 1405

(Muslim calendar)



Jürgen Schmidhuber
IDSIA
Galleria 2, 6928 Manno-Lugano
Switzerland
juergen@idsia.ch
+41 91 610 8662


Portrait
(Nov 2002)

More pictures
(1963-2001)
Curriculum Vitae
(Feb 2003)


Online Publications
(Sept 2003)
Unordered bibfile
(2003, + stuff I cited)


TITLES
Codirector IDSIA
Prof. SUPSI
P.D. TUM 1993
Dr. rer. nat. habil.
TUM 1993 @ CU
Dr. rer. nat. 1991
Dipl. Inf. 1987

Lab
Eck
Legg
Hutter
Poland
Bakker
Graves
Gagliolo
Beringer
Zhumatiy
Salustowicz

IDSIA folk
IDSIA tech reports
IDSIA's building
(plus directions)
IDSIA's scenic environment
(pictures)
Job opening
in 2004:

Postdoc or visitor
More jobs
at IDSIA
IDSIA in the press



All cartoons / artwork
copyright © by
J. Schmidhuber


Since age 15 or so, my main scientific ambition has been to build an optimal scientist, then retire. In 2028 they will force me to retire anyway. By then I shall be able to buy hardware providing more raw computing power than my brain. Will the proper self-improving software lag far behind? If so, I'd be surprised. This optimism drives my research on mathematically sound, general-purpose universal learning machines and the New AI, which is also relevant for physics.

RECENT WORK

[Image: probabilistic OOPS search tree]

Optimal Ordered Problem Solver (OOPS). OOPS solves one task after another through search for solution-computing programs. The incremental method always optimally exploits solutions to earlier tasks when possible; compare the principles of Levin's optimal universal search. OOPS can temporarily rewrite its own search procedure, efficiently searching for faster search methods (metasearching or metalearning). It is applicable to problems of optimization or prediction. Talk slides.
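The underlying principle of such searches, allocating runtime to candidate programs in proportion to 2^(-length), can be sketched in a few lines. Below is a minimal Python sketch of Levin-style universal search, not the actual OOPS implementation (which adds incremental reuse of earlier solutions and self-rewriting of the searcher); run and solved are hypothetical task-specific callbacks.

    from itertools import product

    def all_programs(max_len):
        # enumerate all bit strings up to max_len, shortest first
        for n in range(1, max_len + 1):
            for bits in product('01', repeat=n):
                yield ''.join(bits)

    def levin_search(run, solved, max_phase=20):
        # run(p, steps): execute program p for at most `steps` steps,
        #   returning its output (or None if it did not finish);
        # solved(output): test whether the task is solved.
        # In phase i, each program p of length <= i gets a time budget
        # of 2^(i - len(p)) steps: short programs are tried harder,
        # and total effort per phase stays bounded.
        for phase in range(1, max_phase + 1):
            for p in all_programs(phase):
                out = run(p, 2 ** (phase - len(p)))
                if solved(out):
                    return p
        return None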


[Image: A. Kolmogorov with K^E superimposed]

Super Omegas and Generalizations of Kolmogorov Complexity and Algorithmic Probability. Kolmogorov's complexity K(x) of a bitstring x is the length of the shortest program that computes x and halts. Solomonoff's algorithmic probability of x is the probability of guessing a program for x. Chaitin's Omega is the halting probability of a Turing machine with random input (Omega is known as the "number of wisdom" because it compactly encodes all mathematical truth). All of this I generalized to non-halting but converging programs. This led to the shortest possible formal descriptions, to non-enumerable but limit-computable measures and Super Omegas, and even has consequences for computable universes and optimal inductive inference. Slides.
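For reference, the standard textbook definitions being generalized here, for a universal prefix Turing machine U (the generalizations replace halting by convergence in the limit):

    K(x)    = \min \,\{\, |p| \;:\; U(p) = x \ \text{and $U$ halts} \,\}

    M(x)    = \sum_{p \,:\, U(p)\ \text{starts with}\ x} 2^{-|p|}

    \Omega  = \sum_{p \,:\, U(p)\ \text{halts}} 2^{-|p|}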


Universal Learning Algorithms. There is a theoretically optimal way of predicting the future, given the past. It can be used to define an optimal (though noncomputable) rational agent that maximizes its expected reward in almost arbitrary environments sampled from computable probability distributions. This work represents the first mathematically sound theory of universal artificial intelligence; most previous work on AI was either heuristic or very limited.
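A one-formula summary of the prediction part (a textbook rendering, not the papers' exact notation): mix all environment distributions \nu in a class \mathcal{M} of computable measures, weighted by priors w_\nu (e.g. proportional to 2^{-K(\nu)}), and predict with the mixture:

    \xi(x) \;=\; \sum_{\nu \in \mathcal{M}} w_\nu \, \nu(x),
    \qquad
    \xi(x_{t+1} \mid x_{1:t}) \;=\; \frac{\xi(x_{1:t} x_{t+1})}{\xi(x_{1:t})}.

The optimal agent then chooses actions maximizing \xi-expected future reward; noncomputability enters because the sum ranges over all computable \nu.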


[Image: dice clock illustrating a natural probabilistic bias towards the quickly computable things]

Speed Prior. Occam's Razor says: prefer simple solutions to complex ones. But what exactly does "simple" mean? According to tradition, something is simple if it has a short description or program, that is, low Kolmogorov complexity. This leads to Solomonoff's and Levin's miraculous probability measure, which yields optimal though noncomputable predictions, given past observations. The Speed Prior is different, though: it is a new simplicity measure based on the fastest way of describing objects, not the shortest. Unlike the traditional measure, it leads to near-optimal computable predictions, and it provokes unusual prophecies concerning the future of our universe. Talk slides.
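Roughly (see the paper for the precise definition), the Speed Prior discounts not only program length but also computation time: outputs with prefix x produced in phase i of an optimal resource-allocating search, where phase i runs each program p of length at most i for at most 2^{i-|p|} steps, contribute with weight 2^{-i}:

    S(x) \;\approx\; \sum_{i \ge 1} 2^{-i}
        \sum_{p \,:\, p\ \text{yields}\ x\ \text{within phase}\ i} 2^{-|p|}.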


[Image: galaxy and binary code]

In the Beginning was the Code! In 1997 I founded the Religion of the Great Programmer, consistent with Zuse's thesis (1967) that the universe is computable; contrary to common belief, there is no physical evidence against this thesis. If everything is computable, then which exactly is our universe's program? It turns out that the simplest program computes all computable universes, not just ours. Later work (2000) on Algorithmic Theories of Everything analyzed all the universes with limit-computable probabilities, as well as the very limits of formal describability. This paper led to the above-mentioned generalizations of algorithmic information and probability and Super Omegas, as well as to the Speed Prior. The searchable "everything" mailing list archive contains numerous discussions of this and related work (see also comments on Wolfram's 2002 book). Talk slides.


[Image: recurrent neural network and human brain]

State-of-the-art Artificial Recurrent Neural Networks. Most work in machine learning focuses on machines with reactive behavior. RNNs, however, are more general sequence processors inspired by human brains. They have adaptive feedback connections and are in principle as powerful as any computer. Until recently, however, RNNs could not learn to look far back into the past. Our novel RNN called "Long Short-Term Memory" (LSTM) overcomes the fundamental problems of traditional RNNs and efficiently learns to solve many previously unlearnable tasks, including: recognition of certain context-sensitive languages; reinforcement learning in partially observable environments; metalearning of fast online learning algorithms; music composition. Tutorial slides.
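For concreteness, a minimal NumPy sketch of one step of a generic gated LSTM cell (a later textbook variant with forget gates, added around 2000, not the exact 1997 formulation); W and b are hypothetical stacked gate parameters:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def lstm_step(x, h, c, W, b):
        # x: input vector; h: hidden state; c: cell state (size d each).
        # W: (4*d, x.size + d) stacked gate weights; b: (4*d,) biases.
        d = h.size
        z = W @ np.concatenate([x, h]) + b
        i = sigmoid(z[0*d:1*d])   # input gate
        f = sigmoid(z[1*d:2*d])   # forget gate
        o = sigmoid(z[2*d:3*d])   # output gate
        g = np.tanh(z[3*d:4*d])   # candidate update
        c = f * c + i * g         # gated cell: the "constant error
                                  # carousel" preserving gradients
                                  # across long time lags
        h = o * np.tanh(c)
        return h, c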


[Image: reinforcement learning mouse finds a piece of cheese]

Reinforcement Learning in Partially Observable Environments. Just like humans, reinforcement learners are supposed to maximize expected pleasure and minimize expected pain. Most traditional work is limited to reactive mappings from sensory inputs to actions. Our approaches for partially observable environments are more general: they learn how to use memory and internal states. The first universal reinforcement learner is optimal if we ignore computation time, and we are trying to build one that is optimal even if we don't.
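The structural point fits in a few lines: in a partially observable world the agent must act on an internal state that summarizes the observation history, not on the current observation alone. A minimal sketch with hypothetical env, update_memory, and policy interfaces:

    def run_episode(env, update_memory, policy, memory, max_steps=1000):
        obs = env.reset()
        total_reward = 0.0
        for _ in range(max_steps):
            memory = update_memory(memory, obs)  # internal state, e.g.
                                                 # an RNN/LSTM hidden state
            obs, reward, done = env.step(policy(memory))
            total_reward += reward
            if done:
                break
        return total_reward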


[Image: one of the CSEM robots]

Learning Robots. Some hardwired robots achieve impressive feats. But they do not learn like babies do. Unfortunately, traditional reinforcement learning algorithms are limited to simple reactive behavior and do not work well for realistic robots. Hence robots must use novel methods for learning to identify important past events and memorize them until needed. We are focusing on our above-mentioned recurrent neural networks and the Optimal Ordered Problem Solver. We are collaborating with CSEM on attentive sensing and hierarchical control.

OLDIES BUT GOODIES
Nonlinear Independent Component Analysis (ICA). Pattern recognition works better on nonredundant data with independent components. My Predictability Minimization (1992) was the first nonlinear neural algorithm for learning to encode redundant inputs in this way. It is based on co-evolution of predictors and feature detectors that fight each other: each predictor tries to predict one feature from the others, while the detectors try to extract features that the predictors cannot predict. My neural history compressors (1991) compactly encode sequential data. And Lococode unifies regularization and unsupervised learning.
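In symbols (a textbook rendering of the objective, not the paper's exact notation): with code units y_i produced by the encoder and predictors P_i trained on the other units, both sides play a zero-sum game over

    V \;=\; \sum_i \mathbb{E}\!\left[ \big( P_i(y_{\neq i}) - y_i \big)^2 \right],

which the predictors minimize and the encoder maximizes; at equilibrium the code components carry no information about each other, i.e., they become statistically independent.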


Financial Forecasting. Our most lucrative neural network application employs a second-order method for finding the simplest model of stock market training data.


Metalearning Machines - Learning to Learn - Self-Improvement. Can we construct metalearning algorithms that learn better learning algorithms? This question has been a main driver of my research since my 1987 diploma thesis. In 1993 I introduced self-referential weight matrices, and in 1994 self-modifying policies trained by the "success-story algorithm" (talk slides). My first time-optimal self-improving metalearner, however, is the above-mentioned Optimal Ordered Problem Solver (2002).
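The success-story criterion itself is simple enough to sketch. A hedged Python rendering (hypothetical data layout; the real systems undo actual policy modifications): each surviving self-modification must, in hindsight, have accelerated the average reward per time.

    def ssa_checkpoint(stack, now, total_reward, undo):
        # stack holds (time, reward_at_time, modification) records,
        # oldest first. Pop and undo recent self-modifications until
        # reward-per-time since each surviving one strictly increases
        # from bottom to top of the stack.
        def rate(t, r):
            return (total_reward - r) / max(now - t, 1e-9)
        while stack:
            top = rate(stack[-1][0], stack[-1][1])
            below = (rate(stack[-2][0], stack[-2][1])
                     if len(stack) > 1
                     else total_reward / max(now, 1e-9))
            if top > below:
                break  # success story intact
            undo(stack.pop()[2])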


Reinforcement Learning Economies with Credit Conservation. In the late 1980s I developed the first credit-conserving machine learning system based on market principles, and also the first neural one.


Automatic Subgoal Generators - Hierarchical Learning. There is no teacher providing useful intermediate subgoals for our reinforcement learning systems. In the early 1990s we introduced both gradient-based (pictures) and discrete adaptive subgoal generators.


Program Evolution & Genetic Programming. As an undergrad I used Genetic Algorithms to evolve computer programs on a Symbolics LISP machine at SIEMENS AG. Two years later this was still novel: in 1987 we published the world's 2nd paper on "Genetic Programming" (the first was Cramer's in 1985) and the first on Meta-GP.


Interestingness & Active Exploration. My learning agents like to go where they expect to learn something. They lose interest in both predictable and unpredictable things.
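One concrete way to implement this (a minimal sketch, assuming a hypothetical adaptive predictor object): reward the agent not for low prediction error, but for the improvement of its predictor; then both the already-predictable (nothing left to improve) and the unpredictable (nothing improvable) yield zero intrinsic reward.

    def intrinsic_reward(predictor, observation, target):
        error_before = predictor.error(observation, target)
        predictor.update(observation, target)   # one learning step
        error_after = predictor.error(observation, target)
        return error_before - error_after       # learning progress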


Learning attentive vision. Humans and other biological systems use sequential gaze shifts for pattern recognition. This can be much more efficient than fully parallel approaches to vision. In 1990 we built an artificial fovea controlled by an adaptive neural controller. Without a teacher, it learns to find targets in a visual scene, and to track moving targets.


Complexity-Based Theory of Beauty. In 1997 I claimed: among several patterns classified as "comparable" by some subjective observer, the subjectively most beautiful is the one with the simplest description, given the observer's particular method for encoding and memorizing it. Example applications include low-complexity faces and Low-Complexity Art.
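As a one-line formalization of the claim: if C is the set of patterns the observer deems comparable and K_O(x) is the length of x's shortest description under the observer's personal encoding scheme, then the most beautiful pattern is

    x^{*} \;=\; \arg\min_{x \in C} K_O(x).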


Neural Heat Exchanger. Like a physical heat exchanger, but with neurons instead of liquid.


Fast weights instead of recurrent nets. More fast weights.


Artificial Ants. IDSIA's Artificial Ant Algorithms are multiagent optimizers that use local search techniques and communicate via artificial pheromones that evaporate over time. They broke several important benchmark world records. This work has been covered in Nature, Science, Scientific American, TIME, the NY Times, Spiegel, and elsewhere, and led to an IDSIA spin-off company called ANTOPTIMA.
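A minimal Ant-System-style sketch for the traveling salesman problem (generic textbook ant colony optimization, not IDSIA's record-breaking variants): ants build tours biased by pheromone and inverse distance; pheromone evaporates, then good tours deposit more.

    import random

    def ant_colony_tsp(dist, n_ants=20, n_iters=100,
                       alpha=1.0, beta=2.0, rho=0.5):
        n = len(dist)
        tau = [[1.0] * n for _ in range(n)]  # pheromone levels
        best_tour, best_len = None, float('inf')

        def tour_length(tour):
            return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

        for _ in range(n_iters):
            tours = []
            for _ in range(n_ants):
                start = random.randrange(n)
                tour, unvisited = [start], set(range(n)) - {start}
                while unvisited:
                    cur = tour[-1]
                    # desirability ~ pheromone^alpha * (1/distance)^beta
                    weights = [(j, (tau[cur][j] ** alpha) *
                                   ((1.0 / dist[cur][j]) ** beta))
                               for j in unvisited]
                    total = sum(w for _, w in weights)
                    r, acc = random.random() * total, 0.0
                    for j, w in weights:   # roulette-wheel selection
                        acc += w
                        if acc >= r:
                            tour.append(j)
                            unvisited.remove(j)
                            break
                tours.append((tour, tour_length(tour)))
            # evaporation, then deposition proportional to tour quality
            for i in range(n):
                for j in range(n):
                    tau[i][j] *= (1.0 - rho)
            for tour, length in tours:
                for i in range(n):
                    a, b = tour[i], tour[(i + 1) % n]
                    tau[a][b] += 1.0 / length
                    tau[b][a] += 1.0 / length
            tour, length = min(tours, key=lambda t: t[1])
            if length < best_len:
                best_tour, best_len = tour, length
        return best_tour, best_len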

LOW-COMPLEXITY ART

[Image: low-complexity butterfly]

Low-Complexity Art. In 1997 I introduced this computer-age equivalent of minimal art in the journal "Leonardo." A low-complexity artwork can be specified by a short computer algorithm. It should "look right," its algorithmic complexity should be small, and a typical observer should be able to see and understand its simplicity. This inspired a complexity-based theory of beauty.
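In the same spirit, a toy example (not one of the actual Leonardo works): the following few lines fully specify a 256x256 image of concentric rings, so the image's algorithmic description is about as short as its program.

    import math

    W = H = 256

    def shade(x, y):
        # concentric rings from a one-line rule
        r = math.hypot(x - W / 2, y - H / 2)
        return 255 if int(r / 16) % 2 == 0 else 0

    with open('rings.pgm', 'w') as f:
        f.write('P2 %d %d 255\n' % (W, H))
        for y in range(H):
            f.write(' '.join(str(shade(x, y)) for x in range(W)) + '\n')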

[Image: LEGO double ring]

Lego Art. Did you know that one can make stable rings and other curved objects from rectangular LEGO elements only?

[Image: Uli's head of stone]

Sculptures. Occasionally I make 3D portraits, such as this sculpture of my wife Uli.

[Image: robot in swamp, trying to pull itself out]

Pictures of self-improving robots:

State of the art
Future
Far future

Uli Krommer, my wife.

Julia & Leonie, my kids.

My little brother is a theoretical physicist. His most famous / most readable / best / craziest papers.

Computer history speedup. Schmidhuber's law: each new breakthrough comes in half the time of the previous one; Omega point around 2040.
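The arithmetic behind the finite "Omega point": if the next breakthrough interval is always half the previous one of length T, the total remaining time converges,

    T + \frac{T}{2} + \frac{T}{4} + \cdots \;=\; \sum_{n \ge 0} \frac{T}{2^n} \;=\; 2T,

so the breakthroughs accumulate at a finite point in time rather than stretching on forever.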

First Pow(d)ered Flight (Nature 421, p. 689)

Brave New World of Scientific Publishing

Bavarian Poetry

Other Juergens and Schmidhubers

Public bar

[Image: do not press the red button]

Dangerous red button