
AI Blog (2023)


What's new? (2021)

KAUST (24 papers at NeurIPS 2022) and its environment are now offering enormous resources to advance both fundamental and applied AI research: we are hiring outstanding professors, postdocs, and PhD students.

(ERC Grant: Many jobs for PhD students and PostDocs to be hired in 2020. Earlier jobs: 2017, 2016)

FAQ in AMA (Ask Me Anything) on reddit (2014)

Publications (2022)
CV (2021)
Old videos (2009-2015)

Master's in Artificial Intelligence at the Swiss AI Lab IDSIA and USI Faculty of Informatics
Master's in Artificial Intelligence (Fall 2017)

Jürgen Schmidhuber
Director, KAUST AI Initiative

Also: IDSIA, Polo universitario Lugano, Via la Santa 1, CH-6962 Lugano - Viganello, Switzerland
Fax +41 58 666666 1
Phone +41 58 666666 2
Secretary +41 58 666666 6
Send spam etc to

Pronounce: You_again Shmidhoobuh (if you can say Schwarzenegger & Schumacher & Schiffer, then you can also say Schmidhuber)


Scientific Director of IDSIA, Prof. of AI @ USI, Prof. SUPSI, ex-head of Cog Bot Lab @ TUM, Dr. rer. nat. habil. 1993 @ CU, Dr. rer. nat. 1991, Dipl. Inf. 1987

Curriculum Vitae (2021)

Photos: Talk at Geneva Motor Show 2019: a,b,c,d. Talk in Football Stadium 2019: a,b,c,d. WEF, Davos (2018): a,b. National Geographic (2017): a,d. Others: e,f,g. More: 2015, 2015, 2015, 2010, more pics (1963-2007)

2011: First Superhuman Visual Pattern Recognition - Deep Learning

RESEARCH TOPICS (more in the columns to the right):
Feedback Neural Networks, Deep Learning & Computer Vision & Pattern Recognition (numerous world records on benchmark datasets, first superhuman results), award-winning GPU-based CNNs, Gödel machines, Universal AI, Optimal Problem Solvers, Evolution, Reinforcement Learning (RL), Hierarchical RL, Meta-Learning & Recursive Self-Improvement, Artificial Curiosity & Creativity & Intrinsic Motivation & Developmental Robotics, Formal Theory of Fun & Creativity, Theory of Beauty, Computable Universes, Generalized Algorithmic Information

Machine Learning 1
Machine Learning 2
Our PyBrain Machine Learning Library features source code of many new learning algorithms not found in other libraries - see the PyBrain video

IDSIA Robotics Lab

Learning Robots, Elastic Robots, Robot Population Explosion, Statistical Robotics, Resilient Machines, Resilient Robots (Science 316 p 688), CoTeSys Robots, Cogbotlab (compare LRZ 2005), Robot Cars, IDSIA Robotics Lab, also at the EXPO21xx show room

Femme Fractale - Low-Complexity Art computable by a very short program

Example: Femme Fractale (more examples), 3D Art (sculpture), Lego Art: stable rings from LEGO bricks, art involving JS' kids, pics of self-improving robots: state of the art / the future / the far future

Is history converging? Again? Image: 2000-year-old marble copy of the Laokoon group from the Axial Age

Is history converging? Again? (2006)
Annotated History of Modern AI and Deep Learning (2022)
Computer history speedup & Schmidhuber's law: each new breakthrough comes in half the time of the previous one, suggesting an Omega point around 2040; see TEDx talk + transcript. AI History (2000). The New AI as a formal science. Raw computing power. Colossus (Nature 441 p 25), Telephone (Science 319 p 1759), First Pow(d)ered Flight (Nature 421 p 689)

MEN who left their mark: Einstein (general relativity, 1915), Zuse (first computer, 1935-41), Gödel (limits of math and computation, 1931), Turing (Turing machine, 1936: Nature 429 p 501), Gauss (mathematician of the millennium), Leibniz (1st computer scientist), Schickard (father of the computer age), Solomonoff (theory of optimal prediction), Darwin (Nature 452 p 530), Haber & Bosch (1913: most influential invention of the 20th century), Archimedes (greatest scientist ever?)

Science Nobel Prizes 1901-2000: Evolution of cumulative national shares by citizenship at the time of the award

NOBEL PRIZES: Evolution of national shares by country of birth (by citizenship): Peace (cit), Literature (cit), Medicine (cit), Chemistry (cit), Physics (cit), Sciences (cit), Total (cit), English & German

London Olympics 2012: EU gold medal count, Beijing 2008 gold count, EU medals of Athens 2004, Bolt, All-Time Gold Medal Counts of 2006, 2008, 2010, 2012.

China and former empires (letters in Newsweek, 2004-05). The European Union - A New Kind of Empire? (2009)

Juergen Schmidhuber's Daughters and the Swiss South Sea I, 2007

Ulrike Krommer (wife)
Julia & Leonie (kids)
Schmidhuber's little brother Christof, a theoretical physicist turned finance guru (see interview). His papers: most famous / most readable / best / craziest; his wife: Prof. Beliakova, a topologist.

Closest brush with fame (1981), Bavarian Poetry (perfect rhyme on 8x4 syllables, and it even makes sense, 1990), Public bar, German version of this page (rarely updated).

Since age 15 or so, the main goal of professor Jürgen Schmidhuber has been to build a self-improving Artificial Intelligence (AI) smarter than himself, then retire. His lab's Deep Learning Neural Networks (NNs) based on ideas published in the "Annus Mirabilis" 1990-1991 have revolutionised machine learning and AI. In 2009, the CTC-trained Long Short-Term Memory (LSTM) of his team was the first recurrent NN to win international pattern recognition competitions. In 2010, his lab's fast and deep feedforward NNs on GPUs greatly outperformed previous methods, without using any unsupervised pre-training, a popular deep learning strategy that he had pioneered in 1991. In 2011, the DanNet of his team was the first feedforward NN to win computer vision contests, achieving superhuman performance. In 2012, his team had the first deep NN to win a medical imaging contest (on cancer detection). This deep learning revolution quickly spread from Europe to North America and Asia, and attracted enormous interest from industry. By the mid-2010s, his lab's NNs were on 3 billion devices and used billions of times per day by customers of the world's most valuable public companies, e.g., for greatly improved speech recognition on all Android smartphones, greatly improved machine translation through Google Translate and Facebook (over 4 billion LSTM-based translations per day), Apple's Siri and QuickType on all iPhones, the answers of Amazon's Alexa, and numerous other applications. In May 2015, his team published the Highway Net, the first working really deep feedforward NN with hundreds of layers; its open-gated version called ResNet (Dec 2015) has become the most cited NN of the 21st century, while LSTM is the most cited NN of the 20th (Bloomberg called LSTM arguably the most commercial AI achievement). His lab's NNs are now heavily used in healthcare and medicine, helping to make human lives longer and healthier.
His research group also established the fields of mathematically rigorous universal AI and recursive self-improvement in metalearning machines that learn to learn (since 1987). In 1990, he introduced unsupervised generative adversarial neural networks that fight each other in a minimax game to implement artificial curiosity (the famous GANs are instances thereof). In 1991, he introduced neural fast weight programmers, formally equivalent to what are now called Transformers with linearized self-attention (popular in natural language processing). His formal theory of creativity & curiosity & fun explains art, science, music, and humor. He also generalized algorithmic information theory and the many-worlds theory of physics, and introduced the concept of Low-Complexity Art, the information age's extreme form of minimal art. He is the recipient of numerous awards, author of about 400 peer-reviewed papers, and Chief Scientist of the company NNAISENSE, which aims at building the first practical general-purpose AI. He is a frequent keynote speaker and advises various governments on AI strategies. More on all of this in the AI Blog.

Artificial Recurrent Neural Networks (1989-2014). Most work in machine learning focuses on machines with reactive behavior. RNNs, however, are more general sequence processors inspired by human brains. They have adaptive feedback connections and are in principle as powerful as any computer. The first RNNs could not learn to look far back into the past. Our "Long Short-Term Memory" (LSTM) RNN overcomes this fundamental problem and efficiently learns to solve many previously unlearnable tasks: speech recognition, time series prediction, music composition, etc. In 2009, our LSTM RNNs became the first recurrent Deep Learning systems to win official international competitions (with secret test sets known only to the organisers): they outperformed all other known methods on the difficult problem of recognizing unsegmented cursive handwriting, and also on aspects of speech recognition. They learn through gradient descent, evolution, or both; compare the RNN Book Preface. LSTM has become popular: Google, Apple, Microsoft, Facebook, IBM, Baidu, and many other companies have used LSTM RNNs to improve large-vocabulary speech recognition, machine translation, language identification, time series prediction, text-to-speech synthesis, etc.
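For readers who want to see the mechanics, here is a minimal numpy sketch of a single LSTM step. This is a generic textbook-style cell with input, forget, and output gates; the hidden size and initialization are arbitrary, and it is not the lab's production code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W has shape (4*H, D+H), b has shape (4*H,).
    Gate order in the stacked weights: input, forget, output, candidate.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2*H])        # forget gate: lets the cell keep memory over long gaps
    o = sigmoid(z[2*H:3*H])      # output gate
    g = np.tanh(z[3*H:4*H])      # candidate cell update
    c = f * c_prev + i * g       # additive memory update ("constant error carousel")
    h = o * np.tanh(c)           # hidden state exposed to the rest of the network
    return h, c

# Run a toy sequence through the cell.
rng = np.random.default_rng(0)
D, H = 3, 5
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(10):
    h, c = lstm_step(rng.normal(size=D), h, c, W, b)
```

The multiplicative gates around the additive cell update are what let error signals flow across long time lags, i.e., what lets LSTM "look far back into the past".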

Deep Learning & Computer Vision with Fast Deep Neural Nets. The future of search engines and robotics lies in image and video recognition. Since 2009, our Deep Learning team has won 9 (nine) first prizes in important and highly competitive international contests (with secret test sets known only to the organisers), far more than any other team. Our neural nets also set numerous world records, and were the first Deep Learners to win pattern recognition contests in general (2009), the first to win object detection contests (2012), the first to win a pure image segmentation contest (2012), and the first machine learning methods to reach superhuman visual recognition performance in a contest (2011). Compare this Google Tech Talk (2011) and JS' first Deep Learning system of 1991, with a Deep Learning timeline 1962-2013. See also the history of computer vision contests won by deep CNNs on GPU since 2011. And check out the amazing Highway Networks (2015), the deepest of them all.
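The workhorse of these vision systems is convolution followed by pooling. Here is a deliberately tiny plain-numpy sketch of both operations applied to a toy image (illustrative only; the image, kernel, and sizes are invented, and the award-winning systems run many such filters in parallel on GPUs):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation: slide the kernel over the image."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
    return out

def maxpool2(x):
    """2x2 max pooling, halving each spatial dimension (cropping odd edges)."""
    H, W = x.shape
    return x[:H//2*2, :W//2*2].reshape(H//2, 2, W//2, 2).max(axis=(1, 3))

# A vertical-edge detector applied to a toy image with one vertical edge.
img = np.zeros((8, 8))
img[:, 4:] = 1.0
edge = conv2d(img, np.array([[-1.0, 1.0]]))     # responds where intensity jumps
pooled = maxpool2(np.maximum(edge, 0.0))        # rectification + pooling
```

Stacking many such convolution/pooling stages (with learned kernels) yields the deep CNNs that won the contests above; the GPU speedup over naive CPU code was a key ingredient.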

Gödel machine: An old dream of computer scientists is to build an optimally efficient universal problem solver. The Gödel machine can be implemented on a traditional computer and solves any given computational problem in an optimal fashion inspired by Kurt Gödel's celebrated self-referential formulas (1931). It starts with an axiomatic description of itself, and we may plug in any utility function, such as the expected future reward of a robot. Using an efficient proof searcher, the Gödel machine will rewrite any part of its software (including the proof searcher) as soon as it has found a proof that this will improve its future performance, given the utility function and the typically limited computational resources. Self-rewrites are globally optimal (no local maxima!) since provably none of the alternative rewrites and proofs (those that could be found by continuing the proof search) are worth waiting for. The Gödel machine formalizes I. J. Good's informal remarks (1965) on an "intelligence explosion" through self-improving "super-intelligences". Summary. FAQ.

Optimal Ordered Problem Solver. OOPS solves one task after another through search for solution-computing programs. The incremental method optimally exploits solutions to earlier tasks when possible - compare principles of Levin's optimal universal search. OOPS can temporarily rewrite its own search procedure, efficiently searching for faster search methods (metasearching or metalearning). It is applicable to problems of optimization or prediction. Talk slides.

Super Omegas and Generalized Kolmogorov Complexity and Algorithmic Probability. Kolmogorov's complexity K(x) of a bitstring x is the length of the shortest program that computes x and halts. Solomonoff's algorithmic probability of x is the probability of guessing a program for x. Chaitin's Omega is the halting probability of a Turing machine with random input (Omega is known as the "number of wisdom" because it compactly encodes all mathematical truth). Schmidhuber generalized all of this to non-halting but converging programs. This led to the shortest possible formal descriptions, to non-enumerable but limit-computable measures and Super Omegas, and even has consequences for computable universes and optimal inductive inference. Slides.
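These quantities are uncomputable in general, but a toy machine makes the definitions concrete. The sketch below is an illustrative construction (not from any of the cited papers): it brute-forces a K-like shortest-program length and a Solomonoff-style probability over a two-instruction language:

```python
from itertools import product

def run(prog):
    """Interpret a program on a toy 2-instruction machine:
    'I' increments the register, 'D' doubles it; the register starts at 0."""
    x = 0
    for op in prog:
        x = x + 1 if op == 'I' else 2 * x
    return x

def complexity_and_probability(target, max_len=12):
    """Brute-force analogues of K(target) and algorithmic probability:
    the shortest program length, and the sum of 2**-len over all
    programs of length <= max_len that compute the target."""
    shortest = None
    prob = 0.0
    for L in range(1, max_len + 1):
        for prog in product('ID', repeat=L):
            if run(prog) == target:
                if shortest is None:
                    shortest = L
                prob += 2.0 ** -L
    return shortest, prob

k64, p64 = complexity_and_probability(64)  # 'IDDDDDD': increment once, double six times
k63, p63 = complexity_and_probability(63)
```

Note that 64 has a much shorter program (length 7) and correspondingly higher algorithmic probability than its neighbor 63 (shortest program length 11): "simple" objects are the ones short programs can produce.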

Universal Learning Algorithms. There is a theoretically optimal way of predicting the future, given the past. It can be used to define an optimal (though noncomputable) rational agent that maximizes its expected reward in almost arbitrary environments sampled from computable probability distributions. This work represents the first mathematically sound theory of universal artificial intelligence - most previous work on AI was either heuristic or very limited.

Speed Prior. Occam's Razor: prefer simple solutions to complex ones. But what exactly does "simple" mean? According to tradition, something is simple if it has a short description or program, that is, low Kolmogorov complexity. This leads to Solomonoff's & Levin's miraculous probability measure, which yields optimal though noncomputable predictions, given past observations. The Speed Prior is different though: it is a new simplicity measure based on the fastest way of describing objects, not the shortest. Unlike the traditional one, it leads to near-optimal computable predictions, and provokes unusual prophecies concerning the future of our universe. Talk slides. Transcript of TEDx talk.

In the Beginning was the Code. In 1996 Schmidhuber wrote the first paper about all possible computable universes. His "Great Programmer" is consistent with Zuse's thesis (1967) of computable physics, against which there is no physical evidence, contrary to common belief. If everything is computable, then which exactly is our universe's program? It turns out that the simplest program computes all universes, not just ours. Later work (2000) on Algorithmic Theories of Everything analyzed all the universes with limit-computable probabilities as well as the very limits of formal describability. This paper led to the above-mentioned generalizations of algorithmic information and probability and Super Omegas as well as the Speed Prior. See comments on Wolfram's 2002 book and letter on randomness in physics (Nature 439, 2006). Talk slides, TEDx video, transcript.

Learning Robots. Some hardwired robots achieve impressive feats. But they do not learn like babies do. Traditional reinforcement learning algorithms are limited to simple reactive behavior and do not work well for realistic robots. Hence robot learning requires novel methods for learning to identify important past events and memorize them until needed. Our group is focusing on the above-mentioned recurrent neural networks, RNN evolution, Compressed Network Search, and policy gradients. Collaborations: with UniBW on robot cars, with TUM-AM on humanoids learning to walk, with DLR on artificial hands. New IDSIA projects on developmental robotics with curious adaptive humanoids started in 2009. See the AAAI 2013 Best Student Video.

Financial Forecasting. Our most lucrative neural network application employs a second-order method for finding the simplest model of stock market training data.

Learning attentive vision. Humans and other biological systems use sequential gaze shifts for pattern recognition, which can be much more efficient than fully parallel approaches to vision. In 1990 we built the first artificial fovea sequentially steered by an adaptive neural controller. Without a teacher, it learns to find targets in a visual scene and to track moving targets.

Artificial Evolution. State-of-the-art methods for network evolution co-evolve all neurons in parallel (excellent results in various applications). EVOLINO outperforms previous methods on several supervised learning tasks, and yields the first recurrent support vector machines. Probabilistic incremental program evolution evolves computer programs through probabilistic templates instead of program populations (first approach to evolving entire soccer team strategies from scratch). As an undergrad, Schmidhuber also implemented the first genetic programming system with loops and variable-length code (1987, see below). Our novel Natural Evolution Strategies (2008-) yield excellent results and link policy gradients to evolution. And while most previous algorithms can evolve only hundreds of adaptive parameters, our Compressed Network Search (1995-) finds compact descriptions of huge networks; a 2013 variant was the first method to evolve neural network controllers with over a million weights. Compare work on learning to think.
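The flavor of Natural Evolution Strategies can be conveyed in a few lines: sample a population around the mean of a search distribution, then move the mean along a fitness-weighted gradient estimate. This is a deliberately simplified sketch with made-up parameters (fixed step sizes, no covariance adaptation or full natural-gradient machinery of real NES):

```python
import numpy as np

def nes_minimize(f, dim, iters=200, pop=50, sigma=0.3, lr=0.05, seed=0):
    """Minimal NES-style search: estimate a search gradient from
    fitness-weighted perturbations and descend it."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=dim)                      # mean of the search distribution
    for _ in range(iters):
        eps = rng.normal(size=(pop, dim))          # population of perturbations
        fitness = np.array([f(mu + sigma * e) for e in eps])
        # standardized fitness shaping for robustness
        scores = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
        grad = (scores[:, None] * eps).mean(axis=0) / sigma
        mu -= lr * grad                            # move toward lower fitness
    return mu

sphere = lambda x: float(np.sum(x ** 2))           # toy benchmark: minimum at 0
best = nes_minimize(sphere, dim=5)
```

On the 5-dimensional sphere function this drives the mean close to the optimum; proper NES additionally adapts the covariance of the search distribution, which is what links it to natural policy gradients.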

Interestingness & Active Exploration & Artificial Curiosity & Theory of Surprise (1990-2010). Schmidhuber's curious learning agents like to go where they expect to learn something. These rudimentary artificial scientists or artists are driven by intrinsic motivation, losing interest in both predictable and unpredictable things. This is a basis for much of the work in Developmental Robotics since 2004. According to Schmidhuber's formal theory of creativity, art and science and humor are just by-products of the desire to create / discover more data that is predictable or compressible in hitherto unknown ways! See the Singularity Summit talk (2009).
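The core idea - reward learning progress, not raw prediction error - can be illustrated with a toy predictor watching three data sources. This sketch is only an illustration of the principle (the sources, predictor, and block sizes are invented), not Schmidhuber's actual formulation:

```python
import random

random.seed(0)

def curiosity_rewards(observe, T=400, lr=0.05):
    """Intrinsic reward = learning progress: the drop in the predictor's
    average error from one block of time to the next."""
    table = {}                                  # per-context adaptive predictor
    errors = []
    for t in range(T):
        ctx, value = observe(t)
        pred = table.get(ctx, 0.5)
        errors.append(abs(pred - value))
        table[ctx] = pred + lr * (value - pred)
    block = 50
    means = [sum(errors[i:i+block]) / block for i in range(0, T, block)]
    return [a - b for a, b in zip(means, means[1:])]   # progress per block

boring    = lambda t: ("same", 0.9)                    # quickly predictable
noisy     = lambda t: ("same", random.random())        # never predictable
pattern   = [1.0, 0.0] * 8
learnable = lambda t: (t % 16, pattern[t % 16])        # takes a while to learn

p_boring, p_noisy, p_learn = (curiosity_rewards(f) for f in (boring, noisy, learnable))
```

A boring source yields progress only briefly, pure noise yields none on average, while a slowly learnable pattern keeps paying out; an agent that allocates its time to the source with the highest recent progress therefore ignores both the trivial and the hopeless.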

Reinforcement Learning in partially observable worlds. Just like humans, reinforcement learners are supposed to maximize expected pleasure and minimize expected pain. Most traditional work is limited to reactive mappings from sensory inputs to actions. Our approaches (1989-2003) for partially observable environments are more general: they learn how to use memory and internal states, sometimes through evolution of RNN. The first universal reinforcement learner is optimal if we ignore computation time, and here is one that is optimal if we don't. The novel Natural Evolution Strategies (2008-) link policy gradients to evolution. See also Compressed Network Search.

Unsupervised learning; non-linear ICA; history compression. Pattern recognition works better on non-redundant data with independent components. Schmidhuber's Predictability Minimization (1992) was the first non-linear neural algorithm for learning to encode redundant inputs in this way. It is based on co-evolution of unsupervised adversarial neural networks that fight each other in a minimax game. His neural history compressors (1991) compactly encode sequential data for Deep Learning, and Lococode unifies regularization and unsupervised learning. The feature detectors generated by such unsupervised methods resemble those of our more recent supervised neural computer vision systems.
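The principle of the neural history compressor - only unexpected inputs are passed up to the next level - can be mimicked with a trivial symbol predictor (a dictionary standing in for the predictive RNN; purely illustrative):

```python
def compress_history(seq):
    """Toy history compressor: a low-level predictor guesses each symbol
    from its predecessor; only surprising symbols (prediction failures)
    are handed to the next level, shortening the sequence."""
    predict = {}            # last-seen successor for each symbol
    surprises = []
    prev = None
    for sym in seq:
        if predict.get(prev) != sym:   # unpredicted -> pass upward
            surprises.append(sym)
        predict[prev] = sym            # update the predictor
        prev = sym
    return surprises

seq = list("abcabcabcabcabdabcabc")
level2 = compress_history(seq)
```

The 21-symbol sequence shrinks to 7 surprising symbols; a higher level then works on a much shorter, more informative stream, which is what makes very deep credit assignment over time feasible.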

Metalearning Machines / Learning to Learn / Recursive Self-Improvement. Can we construct metalearning algorithms that learn better learning algorithms? This question has been a main drive of Schmidhuber's research since his 1987 diploma thesis. In 1993 he introduced self-referential weight matrices, and in 1994 self-modifying policies trained by the "success-story algorithm" (talk slides). His first bias-optimal metalearner was the above-mentioned Optimal Ordered Problem Solver (2002), and the ultimate metalearner is the Gödel machine (2003). See also work on "learning to think".

Automatic Subgoal Generators and Hierarchical Learning. There is no teacher providing useful intermediate subgoals for our reinforcement learning systems. In the early 1990s Schmidhuber introduced gradient-based adaptive subgoal generators (pictures); in 1997 also discrete ones.

Program Evolution and Genetic Programming. As an undergrad, Schmidhuber used Genetic Algorithms to evolve computer programs on a Symbolics LISP machine at SIEMENS AG. Two years later this was still novel: in 1987 he published the world's second paper on pure Genetic Programming (the first was Cramer's in 1985) and the first paper on Meta-Genetic Programming.
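A minimal tree-based genetic programming loop looks like this. It is a mutation-only sketch in the Cramer/Koza style, with the target function, operator set, and parameters invented for the example (Schmidhuber's 1987 system additionally allowed loops and variable-length code):

```python
import random

random.seed(4)

OPS = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}
TERMINALS = ['x', '1']

def random_tree(depth):
    """Grow a random expression tree over the tiny operator/terminal set."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)
    op = random.choice(list(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if tree == '1':
        return 1.0
    op, a, b = tree
    return OPS[op](evaluate(a, x), evaluate(b, x))

def error(tree):
    # fitness: squared error against the target x*x + x on sample points
    return sum((evaluate(tree, x) - (x * x + x)) ** 2 for x in range(-2, 3))

def mutate(tree, depth=2):
    # replace a random node with a fresh random subtree
    if isinstance(tree, str) or random.random() < 0.2:
        return random_tree(depth)
    op, a, b = tree
    if random.random() < 0.5:
        return (op, mutate(a, depth), b)
    return (op, a, mutate(b, depth))

pop = TERMINALS + [random_tree(3) for _ in range(58)]
err_initial = min(error(t) for t in pop)
for gen in range(30):
    pop.sort(key=error)
    survivors = pop[:20]                                   # truncation selection
    pop = survivors + [mutate(random.choice(survivors)) for _ in range(40)]
err_final = min(error(t) for t in pop)
best = min(pop, key=error)
```

Because the best individuals always survive, the best error is monotonically non-increasing over generations; real GP adds crossover between program trees, which this sketch omits.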

Learning Economies with Credit Conservation. In the late 1980s Schmidhuber developed the first credit-conserving reinforcement learning system based on market principles, and also the first neural one.

Neural Heat Exchanger. Like a physical heat exchanger, but with neurons instead of liquid. Perceptions warm up, expectations cool down.

Fast weights instead of recurrent nets. A slowly changing feedforward neural net learns to quickly manipulate short-term memory in quickly changing synapses of another net. More on fast weights. Evolution of fast weight control.
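A numpy sketch of the idea: a slow net (here just fixed projection matrices standing in for a trained network) emits keys and values whose outer products program the fast weights of another net - the construction now recognized as linearized self-attention. Sizes and initialization are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, V = 5, 4, 3   # input, key, value sizes

# The "slow" net: projections that emit a key, value, and query per input.
Wk = rng.normal(scale=0.5, size=(K, D))
Wv = rng.normal(scale=0.5, size=(V, D))
Wq = rng.normal(scale=0.5, size=(K, D))

def fast_weight_sequence(xs):
    """The slow net programs the fast net: each input adds a rank-1
    outer product v k^T to the fast weight matrix, which is then
    queried to produce the output."""
    F = np.zeros((V, K))             # fast weights = short-term memory
    outputs = []
    for x in xs:
        k, v, q = Wk @ x, Wv @ x, Wq @ x
        F = F + np.outer(v, k)       # write: rapid synaptic change
        outputs.append(F @ q)        # read: retrieve with the query
    return np.array(outputs)

xs = rng.normal(size=(8, D))
ys = fast_weight_sequence(xs)
```

By construction, each output equals the sum over past steps of values weighted by key-query dot products, i.e., unnormalized linear attention over the whole history.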

Complexity-Based Theory of Beauty. In 1997 Schmidhuber claimed: among several patterns classified as "comparable" by some subjective observer, the subjectively most beautiful is the one with the simplest description, given the observer's particular method for encoding and memorizing it. Exemplary applications include low-complexity faces and Low-Complexity Art, the computer-age equivalent of minimal art (Leonardo, 1997). A low-complexity artwork such as this Femme Fractale both "looks right" and is computable by a short program; a typical observer should be able to see its simplicity. The drive to create such art is explained by the formal theory of creativity.
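The point is easy to demonstrate: a program of a few lines can generate an intricate, self-similar arrangement of circles. This is a generic fractal illustration of the "rich pattern from a short program" idea, not the construction behind the actual Femme Fractale:

```python
def circle_fractal(x, y, r, depth):
    """Each circle spawns two half-sized circles on its horizontal
    diameter; the result is a self-similar pattern of circles."""
    circles = [(x, y, r)]
    if depth > 0:
        circles += circle_fractal(x - r / 2, y, r / 2, depth - 1)
        circles += circle_fractal(x + r / 2, y, r / 2, depth - 1)
    return circles

# 127 circles, all nested inside the unit circle, from five lines of code.
art = circle_fractal(0.0, 0.0, 1.0, depth=6)
```

The pattern's description length (the program) is tiny compared to its apparent detail, which is exactly what "low Kolmogorov complexity" means for an artwork.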

Artificial Ants & Swarm Intelligence. IDSIA's Artificial Ant Algorithms are multiagent optimizers that use local search techniques and communicate via artificial pheromones that evaporate over time. They broke several important benchmark world records. This work was reviewed in journals such as Nature, Science, Scientific American, TIME, NY Times, Spiegel, and Economist, and led to an IDSIA spin-off company called ANTOPTIMA. See also the AAAI 2011 best video on swarmbots.
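A minimal ant-colony loop for a small traveling-salesman instance shows the two key mechanisms: pheromone deposit proportional to solution quality, and evaporation that forgets stale trails. This is a generic textbook-style sketch; the city coordinates and parameters are made up for the example, and IDSIA's record-breaking variants are considerably more refined:

```python
import math
import random

random.seed(2)

cities = [(0, 0), (0, 5), (5, 0), (5, 5), (2, 8), (8, 2), (8, 8), (3, 3)]
n = len(cities)
dist = [[math.dist(a, b) for b in cities] for a in cities]
tau = [[1.0] * n for _ in range(n)]        # pheromone levels on edges

def build_tour():
    """One ant builds a tour, preferring short, pheromone-rich edges."""
    tour = [random.randrange(n)]
    while len(tour) < n:
        i = tour[-1]
        choices = [j for j in range(n) if j not in tour]
        weights = [tau[i][j] * (1.0 / dist[i][j]) ** 2 for j in choices]
        tour.append(random.choices(choices, weights=weights)[0])
    return tour

def tour_length(tour):
    return sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))

best, best_len = None, float('inf')
for it in range(100):
    tours = [build_tour() for _ in range(10)]
    for row in tau:                         # evaporation: forget old trails
        for j in range(n):
            row[j] *= 0.9
    for t in tours:                         # deposit: shorter tours leave more
        L = tour_length(t)
        if L < best_len:
            best, best_len = t, L
        for k in range(n):
            i, j = t[k], t[(k + 1) % n]
            tau[i][j] += 1.0 / L
            tau[j][i] += 1.0 / L
```

Good edges are reinforced by many short tours, while evaporation keeps the colony from locking onto early mistakes - the same local-communication principle behind the swarm results above.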

All cartoons & artwork & Fibonacci web design templates copyright © by Jürgen Schmidhuber (except when indicated otherwise).

Image below taken from an interview for Deutsche Bank during the 2019 World Economic Forum in Davos.


Do not press the red button
