Search:
AI Blog
@SchmidhuberAI
What's new? 6 Dec 2021
KAUST
(17 papers at NeurIPS 2021) and its environment are now offering enormous resources to advance both fundamental and applied AI research:
we are hiring outstanding professors, postdocs, and PhD students.
(ERC Grant:
Many jobs for PhD students and PostDocs
to be hired in 2020.
Earlier jobs: 2018, 2017,
2016)
FAQ in AMA (Ask Me Anything) on
reddit (2014)
Publications (2021)
CV (2020)
Old videos (20092015)
Master's in Artificial Intelligence (Fall 2017)
Contact:
Jürgen Schmidhuber
IDSIA,
Polo universitario Lugano, Via la Santa 1,
CH6962 Lugano  Viganello,
Switzerland
Fax +41 58 666666 1
Fon +41 58 666666 2
Sec +41 58 666666 6
Send spam etc to
juergen@idsia.ch
Pronounce: You_again Shmidhoobuh (if you can say
Schwarzenegger &
Schumacher &
Schiffer,
then you can also say
Schmidhuber)
ON THE NET SINCE 1405
(muslim calendar)
Scientific Director of IDSIA,
Prof. of AI @ USI,
Prof. SUPSI,
exhead of Cog Bot Lab
@ TUM,
Dr. rer. nat. habil. 1993 @ CU,
Dr. rer. nat. 1991,
Dipl. Inf. 1987
Curriculum Vitae (2020)
Photos: Talk at Geneva Motor Show 2019:
a,b,c,d.
Talk in Football Stadium 2019:
a,b,c,d.
WEF, Davos (2018):
a,b.
National Geographic (2017):
a,d.
Others:
e,f,g. More:
2015,
2015,
2015,
2010,
more pics (19632007)
RESEARCH TOPICS (more in the columns to the right):
Feedback Neural Networks,
Deep Learning &
Computer Vision &
Pattern Recognition (numerous world records on benchmark datasets, first
superhuman results), awardwinning
GPUbased CNNs,
Gödel machines,
Universal AI,
Optimal Problem Solvers,
Evolution,
Reinforcement learning (RL),
Hierarchical RL,
MetaLearning & Recursive Self Improvement,
Artificial Curiosity & Creativity & Intrinsic Motivation & Developmental Robotics, Formal Theory of Fun & Creativity,
Theory of Beauty,
Computable Universes,
Generalized Algorithmic Information
COURSES
Machine Learning 1
Machine Learning 2
Our
Pybrain Machine Learning
Library features source code of many
new learning algorithms that cannot be found in other
libraries  see Pybrain video
ROBOTS
Learning Robots,
Elastic Robots,
Robot Population Explosion,
Statistical Robotics,
Resilient Machines,
Resilient Robots (Science 316 p 688),
CoTeSys Robots,
Cogbotlab (compare LRZ 2005),
Robot Cars,
IDSIA Robotics Lab,
also
at the EXPO21xx show room
LOWCOMPLEXITY ART
Example: Femme Fractale
(more examples),
3D Art (sculpture),
Lego Art: stable rings from LEGO bricks,
art involving JS' kids,
pics of selfimproving robots:
state of the art /
the future / the far future
HISTORY
Is history converging? Again? (2006)
Computer history speedup &
Schmidhuber's law: each new breakthrough
comes twice as fast  Omega point around 2040;
see
TEDx talk +
Transcript.
AI History.
The New AI as a formal science.
Raw computing power.
Colossus (Nature 441 p 25),
Telephone (Science 319 p 1759),
First Pow(d)ered Flight (Nature 421 p 689)
MEN who left their mark:
Einstein (general relativity, 1915),
Zuse (first computer, 193541),
Goedel (limits of math and computation, 1931),
Turing (Turing machine, 1936: Nature
429 p 501),
Gauss (mathematician of the millennium),
Leibniz (inventor of the bit),
Schickard (father of the computer age),
Solomonoff (theory of optimal prediction),
Darwin (Nature 452 p 530),
Haber & Bosch (1913:
most influential invention of the 20th century),
Archimedes (greatest scientist ever?)
NOBEL PRIZES: Evolution of national shares
by country of birth (by citizenship):
Peace (cit),
Literature (cit),
Medicine (cit),
Chemistry (cit),
Physics (cit),
Sciences (cit),
Total (cit),
English & German
OLYMPICS
London Olympics 2012: EU gold medal count,
Beijing 2008 gold count,
EU metal of Athens 2004,
Bolt,
All Time Gold Medal Counts
of 2006,
2008,
2010,
2012.
China and former empires (letters in Newsweek, 200405).
The European Union  A New Kind of Empire? (2009)
FAMILY
Ulrike Krommer (wife)
Julia & Leonie (kids)
Schmidhuber's
little brother Christof,
a theoretical physicist turned finance guru (see interview).
His papers: most
famous /
most readable /
best /
craziest;
his wife:
Prof. Beliakova, a topologist.
Closest brush with fame (1981),
Bavarian Poetry
(perfect rhyme on 8x4 syllables, and even makes sense, 1990),
Public bar,
Deutsch (rarely updated)
.




Since age 15 or so, the main goal of professor Jürgen Schmidhuber has been to build a selfimproving Artificial Intelligence (AI) smarter than himself, then retire.
His lab's Deep Learning Neural Networks
based on ideas published in the
"Annus Mirabilis" 19901991
have revolutionised machine learning and AI.
By the mid 2010s,
they were
on 3 billion devices,
and used billions of times per day
through users of the world's most valuable public companies, e.g., for greatly improved
(CTCLSTMbased) speech recognition on all Android phones, greatly improved machine translation through Google Translate and Facebook (over 4 billion LSTMbased translations per day), Apple's Siri and Quicktype on all iPhones, the answers of Amazon's Alexa, and numerous other applications.
In 2011, his team was the first to win official
computer vision contests through deep neural nets,
with
superhuman performance.
In 2012, they had the
first deep NN to win a medical imaging contest (on cancer detection).
All of this attracted enormous interest from industry.
His research group also established the fields of
mathematically rigorous universal AI and recursive selfimprovement in metalearning machines that learn to learn (since 1987). In 1990, he introduced
unsupervised adversarial neural networks that fight each other in a minimax game to achieve artificial curiosity (GANs are a special case).
In 1991, he introduced
very deep learning through unsupervised pretraining,
and
neural fast weight programmers
formally equivalent to what's now called linear Transformers.
His formal theory of creativity & curiosity & fun explains art, science, music, and humor.
He also generalized algorithmic information theory and the manyworlds theory of physics, and introduced the concept of LowComplexity Art, the information age's extreme form of minimal art.
He is recipient of numerous awards,
author of over 350 peerreviewed papers,
and Chief Scientist of the company NNAISENSE, which aims at building the first practical general purpose AI.
He is a frequent keynote speaker, and advising various governments on AI strategies.
 
Artificial Recurrent Neural Networks
(19892014).
Most work in machine learning focuses on machines
with reactive
behavior. RNNs, however, are more general sequence processors
inspired by human brains. They have adaptive
feedback connections and are
in principle as powerful as any computer.
The first RNNs could not learn to look far
back into the past. But our "Long ShortTerm
Memory" (LSTM) RNN overcomes this
fundamental problem,
and efficiently learns to solve many previously unlearnable tasks.
It can be used for
speech recognition, time series prediction, music composition, etc.
In 2009,
our LSTM RNNs became the first recurrent Deep Learning
systems to win official international competitions (with secret test set
known only to the organisers)  they
outperformed all other known methods on the difficult
problem of recognizing unsegmented cursive handwriting,
and also on aspects of speech recognition.
They learn through
gradient descent and / or
evolution or both.
Compare the RNN Book Preface.
LSTM has become popular:
Google, Apple, Microsoft, Facebook, IBM, Baidu, and many other companies
used LSTM RNNs to improve large vocabulary speech recognition, machine translation, language identification / time series prediction / texttospeech synthesis, etc.
Deep Learning & Computer Vision with
Fast Deep Neural Nets.
The future of search engines and robotics lies in image and video recognition.
Since 2009, our
Deep Learning team has won 9 (nine) first prizes
in important
and highly competitive international contests
(with secret test sets known only
to the organisers), far more than any other team.
Our neural nets also set
numerous world records, and were
the
first Deep Learners to win pattern recognition contests in general (2009),
the
first to win object detection contests (2012),
the
first to win a pure image segmentation contest (2012),
and the first machine learning methods to reach
superhuman visual recognition performance in a contest (2011).
Compare this
Google Tech Talk (2011)
and JS' first Deep Learning system of 1991,
with a Deep Learning timeline 19622013.
See also the
history of computer vision contests won by deep CNNs on GPU since 2011. And check out the amazing
Highway Networks (2015), the deepest of them all.
Gödel machine:
An old dream of computer scientists is to build an optimally
efficient universal problem solver. The
Gödel machine
can be implemented on a traditional computer and solves
any given computational problem in an optimal fashion inspired by Kurt
Gödel's celebrated selfreferential formulas (1931).
It starts with an axiomatic description of itself,
and we may plug in any utility function, such as the expected
future reward of a robot.
Using an efficient proof searcher,
the Gödel machine will rewrite any part of its software
(including the proof searcher)
as soon as it has found
a proof that this will improve its future performance,
given the utility function and the typically limited computational resources.
Selfrewrites are globally optimal (no local maxima!) since provably none
of all the alternative rewrites and proofs (those that could be found by
continuing the proof search) are worth waiting for.
The Gödel machine formalizes I. J. Good's informal remarks (1965) on
an "intelligence explosion" through selfimproving "superintelligences".
Summary. FAQ.
Optimal Ordered Problem Solver.
OOPS solves one task after another, through search for
solution computing programs. The incremental method optimally
exploits solutions to earlier tasks when possible  compare principles
of Levin's optimal universal search.
OOPS can temporarily rewrite its own search procedure, efficiently
searching for faster search methods (metasearching or
metalearning).
It is applicable to problems of optimization or prediction.
Talk slides.
Super Omegas and Generalized Kolmogorov Complexity and
Algorithmic Probability.
Kolmogorov's (left) complexity K(x) of a bitstring x is the length of the
shortest program that computes x and halts. Solomonoff's
algorithmic probability of x is the probability of guessing
a program for x. Chaitin's Omega is the halting probability
of a Turing machine with random input (Omega is known as
the "number of wisdom" because it compactly encodes all mathematical truth).
Schmidhuber generalized
all of this
to nonhalting but converging programs. This led to
the shortest possible formal descriptions and to nonenumerable but limitcomputable
measures and Super Omegas, and even has consequences for computable universes and
optimal inductive inference. Slides.
Universal Learning Algorithms.
There is a theoretically optimal way of
predicting the future, given the past.
It can be used to define an optimal (though
noncomputable)
rational agent that maximizes
its expected reward in almost arbitrary environments sampled
from computable probability distributions.
This work represents the first mathematically
sound theory of universal artificial intelligence  most
previous work on AI was either heuristic or very limited.
Speed Prior.
Occam's Razor: prefer simple solutions to complex ones.
But what exactly does "simple" mean? According to tradition something
is simple if it has a short description or program, that is,
it has low Kolmogorov complexity.
This leads to Solomonoff's & Levin's miraculous
probability measure which yields optimal though noncomputable predictions,
given past observations. The Speed Prior
is different though: it is a new simplicity measure based on
the fastest way of describing objects, not the shortest.
Unlike the traditional one, it leads to nearoptimal computable predictions,
and provokes unusual prophecies concerning the future of our universe.
Talk slides.
Transcript of
TEDx talk.
In the Beginning was the Code.
In 1996 Schmidhuber wrote the first paper
about all possible computable universes. His
`Great Programmer'
is consistent with Zuse's
thesis (1967) of computable physics, against which there is no
physical evidence, contrary to common belief. If everything is computable, then
which exactly is our universe's program? It turns out that the simplest program
computes all universes,
not just ours. Later work (2000) on
Algorithmic Theories of Everything analyzed all the
universes with limitcomputable probabilities as well as the very
limits of formal describability. This paper led to abovementioned
generalizations of algorithmic information and
probability and Super Omegas as well as the
Speed Prior.
See comments on Wolfram's 2002 book
and letter
on randomness in physics (Nature 439, 2006).
Talk slides,
TEDx video,
transcript.
Learning Robots.
Some hardwired robots achieve impressive feats.
But they do not learn like babies do.
Traditional
reinforcement learning algorithms
are limited to simple reactive behavior and do not
work well for realistic robots.
Hence robot learning requires novel methods for
learning to identify important past events and memorize them until needed.
Our group is focusing on the abovementioned
recurrent neural networks,
RNN evolution,
Compressed Network Search,
and policy gradients.
Collaborations:
with UniBW on robot cars,
with TUMAM on
humanoids learning to walk,
with DLR on artificial hands.
New IDSIA projects
on developmental robotics
with curious adaptive humanoids
have started in 2009. See
AAAI 2013 Best Student Video.
Financial Forecasting.
Our most lucrative neural network application
employs a secondorder method
for finding the simplest model of stock market training data.
Learning attentive vision.
Humans and other biological systems use sequential gaze shifts for
pattern recognition. This can be much more efficient than
fully parallel approaches to vision. In 1990 we built an
artificial fovea controlled by an adaptive neural controller. Without
a teacher, it learns to find targets
in a visual scene, and to track moving targets.
.
 
Artificial Evolution.
Stateoftheart methods for network evolution
coevolve all
neurons in parallel (excellent results in various
applications).
EVOLINO
outperforms previous methods on several
supervised learning tasks, and yields
the first recurrent support vector machines.
Probabilistic
incremental program evolution evolves
computer programs through probabilistic templates instead
of program populations (first approach to evolving entire
soccer team strategies from scratch).
As an undergrad Schmidhuber also implemented the first
genetic programming system with
loops and variable length code (1987, see below).
Our novel Natural Evolution Strategies (2008) yield excellent
results and link
policy gradients to evolution. And
while most previous algorithms can evolve only hundreds of
adaptive parameters, but not millions, our
Compressed Network Search
(1995) finds compact descriptions of huge networks. A 2013 variant was the first method to evolve neural network controllers with over a million weights.
Compare work on learning to think.
Interestingness & Active Exploration & Artificial Curiosity & Theory of Surprise
(19902010).
Schmidhuber's
curious learning agents
like to go where they
expect to learn
something. These rudimentary
artificial scientists or artists
are driven by intrinsic motivation,
losing interest in both predictable and unpredictable things.
A basis for much of the recent work in Developmental Robotics since 2004.
According to Schmidhuber's formal theory of creativity,
art and science and humor are just
byproducts of the desire to create / discover more data that is
predictable or compressible in hitherto unknown ways!
See Singularity Summit talk (2009).
Reinforcement Learning
in partially observable worlds.
Just like humans, reinforcement learners are supposed to
maximize expected pleasure and
minimize expected pain. Most traditional work is limited to
reactive mappings from sensory inputs to actions.
Our approaches (19892003) for
partially observable environments
are more general: they learn how to use memory and internal states,
sometimes through evolution of RNN.
The first universal reinforcement learner
is optimal if we ignore computation time,
and here
is one that is optimal if we don't.
The novel Natural Evolution Strategies (2008) link
policy gradients to evolution. See also
Compressed Network Search.
Unsupervised learning; nonlinear ICA; history compression.
Pattern recognition works better on nonredundant
data with independent components. Schmidhuber's
Predictability Minimization
(1992) was the first nonlinear neural algorithm for learning to encode
redundant inputs in this way. It is based on coevolution of
unsupervised adversarial neural networks that fight each other in a minimax game.
His neural
history compressors
(1991) compactly encode sequential data for
Deep Learning.
And
Lococode unifies regularization
and unsupervised learning. The feature detectors generated
by such unsupervised methods resemble those of
our more recent supervised neural computer vision systems.
Metalearning Machines / Learning to Learn / Recursive Self Improvement.
Can we construct metalearning algorithms that learn better
learning algorithms? This question has been a main drive of
Schmidhuber's research since his 1987
diploma thesis.
In 1993 he introduced
selfreferential weight matrices, and in
1994 selfmodifying policies trained by the
"successstory algorithm"
(talk slides). His first biasoptimal metalearner
was the abovementioned
Optimal Ordered Problem Solver (2002),
and the ultimate metalearner is the
Gödel machine (2003).
See also work on "learning to think."
Automatic Subgoal Generators and Hierarchical Learning.
There is no teacher providing useful intermediate subgoals
for our reinforcement learning systems. In the early 1990s
Schmidhuber introduced
gradientbased
(pictures)
adaptive subgoal generators; in 1997 also
discrete ones.
Program Evolution and Genetic Programming.
As an undergrad Schmidhuber used Genetic Algorithms
to evolve
computer
programs on a Symbolics LISP machine at
SIEMENS AG.
Two years later this was still novel: In 1987 he published
world's 2nd paper
on pure Genetic Programming
(the first was Cramer's in 1985)
and the first paper on MetaGenetic
Programming.
Learning Economies
with Credit Conservation.
In the late 1980s Schmidhuber developed the first creditconserving
reinforcement learning system based on market
principles, and also the
first neural one.
Neural Heat Exchanger.
Like a physical heat exchanger,
but with neurons instead of liquid.
Perceptions warm up, expectations cool down.
Fast weights instead of recurrent nets.
A slowly changing feedforward neural net learns to quickly
manipulate shortterm memory
in quickly changing synapses of
another net. More on fast weights.
Evolution of fast weight control.
ComplexityBased Theory of Beauty.
In 1997 Schmidhuber claimed: among several patterns classified as "comparable"
by some subjective observer, the subjectively most beautiful
is the one with the simplest description, given the
observer's particular method for encoding and memorizing it.
Exemplary applications include
lowcomplexity faces
and LowComplexity Art,
the computerage equivalent of minimal art (Leonardo, 1997).
A lowcomplexity artwork such as this Femme Fractale both
`looks right' and is computable
by a short program; a typical observer should be
able to see its simplicity. The drive to
create such art is explained by the
formal theory of creativity.
Artificial Ants & Swarm Intelligence.
IDSIA's Artificial Ant Algorithms are multiagent
optimizers that use local search techniques
and communicate via artificial
pheromones that evaporate over time. They broke several important
benchmark world records. This work got numerous
reviews in journals such as
Nature, Science, Scientific American, TIME,
NY Times, Spiegel, Economist, etc. It led to an IDSIA
spinoff company called
ANTOPTIMA.
See also the
AAAI 2011 best video on swarmbots.
All cartoons & artwork &
Fibonacci web design
templates
copyright © by Jürgen Schmidhuber
(except when indicated otherwise).
Image below taken from an interview for
Deutsche Bank during the 2019 World Economic Forum in Davos.
 

