★ Nov 2021: KAUST (17 full papers at NeurIPS 2021) and its environment are now offering enormous resources to advance both fundamental and applied AI research:
we are hiring outstanding professors, postdocs, and PhD students.
★ 1 Oct 2021: Starting as Director of the AI Initiative at KAUST, the university with the highest impact per faculty. Keeping current affiliations. Hiring at all levels. Great research conditions. I photographed the dolphin above on a snorkeling trip off the coast of KAUST.
★ Sept 2021: The most cited neural networks all build on work done in my labs: 1. Long Short-Term Memory (LSTM). 2. ResNet (open-gated Highway Net). 3. AlexNet & VGG Net (our similar
DanNet of 2011 won 4 image recognition challenges before them). 4. GAN (an instance of our Adversarial Artificial Curiosity of 1990). 5. Transformer variants (linear Transformers are formally equivalent to our Fast Weight Programmers of 1991).
★ Sept 2021: Scientific Integrity, the 2021 Turing Lecture, and the 2018 Turing Award for Deep Learning: This is a point-for-point critique of ACM's justification of the ACM A. M. Turing Award for deep learning, as well as a critique of the Turing Lecture given by the awardees (published by ACM in July 2021). In brief, three Europeans went to North America, where they republished methods and concepts first published by other Europeans whom they did not cite—not even in later surveys. Instead, they credited each other at the expense of the field's pioneers. Apparently, the ACM was not aware of this. This work can also be seen as a short history of deep learning, at least as far as ACM's erroneous laudation and the Turing Lecture are concerned.
Sep 2021: Turing Oversold.
Alan M. Turing made certain significant contributions to computer science. However, their importance and impact are often greatly exaggerated, at the expense of the field's pioneers. It's not Turing's fault, though.
80th anniversary celebrations: 1941: Konrad Zuse completes the first working general purpose computer, based on his 1936 patent application. (German version published in Weltwoche, 19 Aug 2021, also online.)
90th anniversary of Kurt Gödel's 1931 paper
which laid the foundations of theoretical computer science, identifying fundamental limitations of algorithmic theorem proving, computing, AI, logics, and math itself (German version published in FAZ, 16/6/2021). This reached the top position of the Hacker News front page.
★ 375th birthday of Gottfried Wilhelm Leibniz, founder of computer science (just published in FAZ, 17/5/2021, with an interview): 1st machine with a memory (1673); 1st to perform all arithmetic operations. Principles of binary computers (1679). Algebra of Thought (1686) deductively equivalent to the much later Boolean Algebra. Calculemus! (Deutsch)
★ ICLR 2021:
Our five submissions got accepted
(probability < 0.002 according to the acceptance rate). Additional papers at AAAI 2021 and ICML 2021.
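The arithmetic behind that bound can be sketched as follows, assuming (as a simplification) that each submission is accepted independently with probability equal to the overall acceptance rate. The function names and the example rates are hypothetical and chosen only for illustration; the actual acceptance rate is not stated here.

```python
def joint_acceptance_probability(rate: float, n_papers: int) -> float:
    """Probability that all n_papers are accepted, assuming independence."""
    return rate ** n_papers


def max_rate_for_bound(bound: float, n_papers: int) -> float:
    """Largest per-paper rate for which the joint probability stays at the bound."""
    return bound ** (1.0 / n_papers)


# Any acceptance rate below this threshold (about 0.29) gives a joint
# probability under 0.002 for five papers.
threshold = max_rate_for_bound(0.002, 5)

# Illustrative rates on either side of the threshold (not real conference data).
below = joint_acceptance_probability(0.28, 5)
above = joint_acceptance_probability(0.30, 5)
```

Under these assumptions, the stated bound holds whenever the acceptance rate is below roughly 29 percent.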
★ 13 April 11:15-12:00: Talk as Chief Scientist of NNAISENSE on use cases of industrial AI at Hannover Messe, the world's largest trade fair (opened by Angela Merkel). Same day 14:00-15:00: Talk on "Modern AI 1980s-2021 and Beyond" at GTC 2021 (opened by NVIDIA CEO Jensen Huang).
★ 2021: LEGO Art: Stable rings from rectangular LEGO bricks (2001).
Two decades ago,
I stole a basic LEGO kit from my daughters, and discovered several different ways of making very stable rings and other curved objects from only rectangular LEGO bricks.
Natural tilting angles between LEGO pieces define ring diameters. The resulting low-complexity artworks reflect the formal theory of beauty/creativity/curiosity.
★ 26 Mar 2021: 26 Mar 1991: Neural nets learn to program neural nets with fast weights—like today's Transformer variants. 2021: New stuff!
How can artificial neural nets process sequential data such as videos, speech, and text? Traditionally this is done with recurrent nets. 3 decades ago, however, I published a now popular alternative. A feedforward net slowly learns by gradient descent to program changes of fast weights of another net. Such Fast Weight Programmers learn to memorize past data, e.g., by computing fast weight changes through additive outer products of self-invented activation patterns (now often called keys and values for self-attention). The popular Transformers (2017) combine this with projections and softmax and are now widely used in natural language processing. For long input sequences, their efficiency was improved through linear Transformers or Performers whose core is formally equivalent to the 1991 Fast Weight Programmers. 2021: New stuff!
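The correspondence between additive outer-product fast weights and linear attention can be illustrated with a minimal sketch. It leaves out everything learned: the slow net that would generate keys, values, and queries is not modeled, and there are no projections, feature maps, or normalisation. The hard-coded vectors and function names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def outer(a: Vector, b: Vector) -> Matrix:
    """Outer product a b^T as a nested list."""
    return [[ai * bj for bj in b] for ai in a]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def fast_weight_readout(keys: List[Vector], values: List[Vector], query: Vector) -> Vector:
    """Accumulate W += v k^T over the sequence, then read out W q."""
    d_v, d_k = len(values[0]), len(keys[0])
    fast_w = [[0.0] * d_k for _ in range(d_v)]
    for k, v in zip(keys, values):
        update = outer(v, k)
        fast_w = [[w + u for w, u in zip(w_row, u_row)]
                  for w_row, u_row in zip(fast_w, update)]
    return matvec(fast_w, query)


def linear_attention_readout(keys: List[Vector], values: List[Vector], query: Vector) -> Vector:
    """Unnormalised linear attention: sum over i of v_i * (k_i . q)."""
    out = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        score = sum(ki * qi for ki, qi in zip(k, query))
        out = [o + score * vi for o, vi in zip(out, v)]
    return out


# Hypothetical toy sequence of three key/value pairs and one query.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[2.0, 0.0, 1.0], [0.0, 3.0, 1.0], [1.0, 1.0, 0.0]]
query = [1.0, 2.0]

fw = fast_weight_readout(keys, values, query)
la = linear_attention_readout(keys, values, query)
```

Both readouts give the same vector because (sum of v k^T) q equals the sum of v (k . q); the fast weight matrix simply stores the sequence in a fixed-size memory instead of keeping all keys and values.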
★ Mar 2021: 3 decades of artificial curiosity & creativity. Our artificial scientists not only answer given questions but also invent new questions. They achieve curiosity through: (1990) the principle of generative adversarial networks, (1991) neural nets that maximise learning progress, (1995) neural nets that maximise information gain (optimally since 2011), (1997) adversarial design of surprising computational experiments, (2006) maximizing compression progress like scientists/artists/comedians do, (2011) PowerPlay... Since 2012: applications to real robots.
★ Jan 2021: 30-year anniversary. 1991: First very deep learning with unsupervised pre-training. Unsupervised hierarchical predictive coding finds compact internal representations of sequential data to facilitate downstream learning. The hierarchy can be distilled into a single deep neural network (suggesting a simple model of conscious and subconscious information processing). 1993: solving problems of depth >1000.
★ Feb 2021: 10-year anniversary. In 2011, DanNet triggered the deep convolutional neural network (CNN) revolution. Named after my outstanding postdoc Dan Ciresan, it was the first deep and fast CNN to win international computer vision contests, and had a temporary monopoly on winning them, driven by a very fast implementation based on graphics processing units (GPUs). 1st superhuman result in 2011. Now everybody is using this approach.
★ 2017 (updated 2021 for 10th birthday of DanNet): History of computer vision contests won by deep CNNs since 2011. DanNet won 4 of them in a row before the similar AlexNet & VGG Net and the ResNet (a Highway Net with open gates) joined the party. Today, deep CNNs are standard in computer vision.
★ 2011 (updated 2021 for 10th birthday of DanNet): First superhuman visual pattern recognition.
At the IJCNN 2011 computer vision competition in Silicon Valley,
our artificial neural network called DanNet performed twice as well as humans, three times better than the closest artificial competitor, and six times better than the best non-neural method.
★ Dec 2020: 1/3 century anniversary of
first publication on metalearning machines that learn to learn (1987).
For its cover I drew a robot that bootstraps itself.
1992-: gradient descent-based neural metalearning. 1994-: Meta-Reinforcement Learning with self-modifying policies. 1997: Meta-RL plus artificial curiosity and intrinsic motivation.
2002-: asymptotically optimal metalearning for curriculum learning. 2003-: mathematically optimal Gödel Machine. 2020: new stuff!
★ Dec 2020: 30-year anniversary of planning & reinforcement learning with recurrent world models and artificial curiosity (1990). This work also introduced high-dimensional reward signals, deterministic policy gradients for RNNs, and
the GAN principle (widely used today). Agents with adaptive recurrent world models even suggest a simple explanation of consciousness & self-awareness
(dating back three decades).
★ Dec 2020: 1/3 century anniversary of
Genetic Programming for code of unlimited size (1987).
GP is about solving problems by applying the principles of biological evolution to computer programs.
★ Dec 2020: 10-year anniversary of our journal paper on deep reinforcement learning with policy gradients for LSTM (2007-2010). Recent famous applications: DeepMind's Starcraft player (2019) and OpenAI's dextrous robot hand & Dota player (2018)—Bill Gates called this a huge milestone in advancing AI.
★ Oct 2020: 30-year anniversary of end-to-end differentiable sequential neural attention. Plus goal-conditional reinforcement learning. An artificial fovea learned to find objects in visual scenes through sequences of saccades. We had both hard attention (1990) and soft attention (1993). Today, both types are very popular. See also my Fast Weight Programmers of 1991 which are formally equivalent to attention-based linear Transformers.
★ Nov 2020: 15-year anniversary: 1st paper with "learn deep" in the title (2005). Our deep reinforcement learning & neuroevolution solved problems of depth 1000 and more. Soon after its publication, everybody started talking about "deep learning." Causality or correlation?
★ Sep 2020: 10-year anniversary of supervised deep learning breakthrough (2010). No unsupervised pre-training.
By 2010, when compute was 100 times more expensive than today, both our feedforward NNs and our earlier recurrent NNs were able to beat all competing algorithms on important problems of that time. This deep learning revolution quickly spread from Europe to North America and Asia. The rest is history.
★ Apr/Jun 2020: Critique of ACM's justification of the 2018 Turing Award for deep learning (backed up by 200+ references).
Similar critique of 2019 Honda Prize—science must not allow corporate PR to distort the academic record.
★ Apr 2020: AI v Covid-19.
I made a little cartoon and notes with references and links to the recent ELLIS workshops & JEDI Grand Challenge & other initiatives.
AI based on Neural Networks (NNs) and Deep Learning can help to fight Covid-19 in many ways. The basic principle is simple. Teach NNs to detect patterns in data from viruses and patients and others. Use those NNs to predict future consequences of possible actions. Act to minimize damage.
★ Apr 2020: Coronavirus geopolitics. Pandemics have greatly influenced the rise and fall of empires. What will be the impact of the current pandemic?
★ Feb 2020 (revised 2021): 2010-2020: our decade of deep learning. The recent decade's most important developments and industrial applications based on our AI, with an outlook on the 2020s, also addressing privacy and data markets.
★ Oct 2019 (revised 2021): Deep learning: our Miraculous Year 1990-1991. The deep learning neural networks of our team have revolutionised pattern recognition and machine learning, and are now heavily used in academia and industry. In 2020-21, we celebrate that many of the basic ideas behind this revolution were published within fewer than 12 months in our "Annus Mirabilis" 1990-1991 at TU Munich.
★ Nov 2018: Unsupervised neural networks fight in a minimax game (1990). To build curious artificial agents, I introduced a new type of active self-supervised learning in 1990. It is based on a duel where one
neural net minimizes the objective function maximized by another.
GANs are simple special cases.
Today, this principle is widely used.
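The idea of one learner minimizing what another maximizes can be shown on a toy problem. The sketch below is not the 1990 system: two scalars stand in for the parameters of the two networks, and the saddle-shaped objective, learning rate, and starting point are hypothetical choices made so that simultaneous gradient descent-ascent visibly converges.

```python
from typing import Tuple


def objective(x: float, y: float) -> float:
    """Toy saddle objective: convex in x (the minimizer), concave in y (the maximizer)."""
    return 0.5 * x * x + x * y - 0.5 * y * y


def descent_ascent(x: float, y: float, lr: float = 0.1, steps: int = 200) -> Tuple[float, float]:
    """Simultaneous gradient steps: x descends on the objective, y ascends on it."""
    for _ in range(steps):
        grad_x = x + y  # partial derivative of the objective with respect to x
        grad_y = x - y  # partial derivative of the objective with respect to y
        x, y = x - lr * grad_x, y + lr * grad_y
    return x, y


x_final, y_final = descent_ascent(1.0, -1.0)
```

On this objective both players approach the saddle point at the origin. Real adversarial setups use neural networks and need not converge this cleanly.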
★ Aug 2017:
Our impact on the world's most valuable public companies: Apple, Google, Microsoft, Facebook, Amazon... By 2015-17, neural nets developed in my labs were on over 3 billion devices such as smartphones, and used many billions of times per day, consuming a significant fraction of the world's compute. Examples: greatly improved (CTC-based) speech recognition on all Android phones, greatly improved machine translation through Google Translate and Facebook (over 4 billion LSTM-based translations per day), Apple's Siri and Quicktype on all iPhones, the answers of Amazon's Alexa, etc. Google's 2019
on-device speech recognition
(on the phone, not the server)
is still based on
★ 2017-: Many jobs for PhD students and PostDocs
★ Jul 2016: I got the 2016 IEEE CIS Neural Networks Pioneer Award for "pioneering contributions to deep learning and neural networks."
★ May 2015: Highway Networks: First working feedforward neural networks with over 100 layers (updated 2020 for 5-year anniversary). Previous neural nets had at most a few tens of layers. Highway Nets excel at ImageNet & natural language processing & other tasks. Based on LSTM principle. Open their gates to get a famous special case called Residual Nets.
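The relation between gated highway layers and residual layers can be sketched in a few lines. This assumes the common formulation in which a layer outputs transform(x) * t(x) + x * c(x) elementwise, with a transform gate t and a carry gate c; the toy transform, the sigmoid gate, and all function names are hypothetical stand-ins for learned components.

```python
import math
from typing import Callable, List

Vector = List[float]
Fn = Callable[[Vector], Vector]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def highway_layer(x: Vector, transform: Fn, t_gate: Fn, c_gate: Fn) -> Vector:
    """Elementwise gated mix of the transformed input and the input itself."""
    h, t, c = transform(x), t_gate(x), c_gate(x)
    return [hi * ti + xi * ci for hi, ti, xi, ci in zip(h, t, x, c)]


def residual_layer(x: Vector, transform: Fn) -> Vector:
    """Residual connection: transformed input plus the input."""
    return [hi + xi for hi, xi in zip(transform(x), x)]


def toy_transform(x: Vector) -> Vector:
    """Hypothetical nonlinear transform standing in for a learned layer."""
    return [math.tanh(2.0 * xi - 0.5) for xi in x]


def learned_gate(x: Vector) -> Vector:
    """Hypothetical input-dependent gate with values between 0 and 1."""
    return [sigmoid(xi) for xi in x]


def open_gate(x: Vector) -> Vector:
    """Gate that is fully open (always 1.0) regardless of the input."""
    return [1.0] * len(x)


x = [0.3, -1.2, 2.0]
gated = highway_layer(x, toy_transform, learned_gate, learned_gate)
opened = highway_layer(x, toy_transform, open_gate, open_gate)
residual = residual_layer(x, toy_transform)
```

With both gates fixed open, the highway layer computes exactly what the residual layer computes, while input-dependent gates generally give a different output.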
★ Oct 2015: Brainstorm open source
software for neural networks. Before Google's TensorFlow dethroned our Brainstorm, this open source software made the Swiss AI Lab IDSIA the top trending Python developer on GitHub, ahead of Facebook, Google, and Dropbox.
★ 2014-15: Who invented backpropagation? (Updated 2020 for 1/2 century anniversary.) The "modern" version of backpropagation, the reverse mode of automatic differentiation, was published in 1970 by Finnish master's student Seppo Linnainmaa. 2020: 60-year anniversary of Kelley's precursor (1960).
★ Feb 2015: DeepMind's Nature paper and earlier related work
★ Jan 2015: Deep learning in neural networks: an overview. This paper of 2015 got the first best paper award ever issued by the journal Neural Networks, founded in 1988. It has also become the most cited paper of Neural Networks.
★ July 2013:
Compressed network search:
First deep learner to learn control policies directly from high-dimensional sensory input using reinforcement learning. (More.)
★ 2013: Sepp Hochreiter's fundamental deep learning problem (1991). (More.)
★ Sep 2012: First deep learner to win a medical imaging contest (cancer detection)
★ Mar 2012: First deep learner to win an image segmentation competition
★ Aug 2011: First superhuman visual pattern recognition.
Recurrent neural networks - especially Long Short-Term Memory or LSTM. See also this more recent LSTM summary of 2020.
✯ 2011: Preface of book on recurrent neural networks
✯ 2009-: First contests won by recurrent nets (2009)
and deep feedforward nets (2010)
✯ 2009-: Winning computer vision contests through deep learning
Evolving recurrent neurons - first paper with "learn deep" in the title. More. See also
15th anniversary (2020)
✯ 1991-: First working deep learner based on unsupervised pre-training + Deep Learning Timeline 1962-2013. More. See also
30th anniversary (2021)
Deep learning & neural computer vision. Our simple training algorithms for deep, wide, often recurrent, artificial neural networks similar to biological brains were the first to win competitions on a routine basis and yielded best known results on many famous benchmarks for computer vision, speech recognition, etc. Today, everybody is using them.
✯ 1991-: Unsupervised learning
✯ 1991-: Neural heat exchanger
Meta-learning or learning to learn. See also:
1/3 century anniversary of
Gödel machines as mathematically optimal general self-referential problem solvers
Asymptotically optimal curriculum learner: the optimal ordered problem solver OOPS
Theory of universal artificial intelligence
Generalized algorithmic information & Kolmogorov complexity
✯ 2000-: Speed Prior: a new simplicity measure for near-optimal computable predictions
Computable universes / theory of everything / generalized algorithmic information
Subgoal learning & hierarchical reinforcement learning. (More.)
✯ 1990-: Learning attentive vision (more) &
goal-conditional reinforcement learning.
See also 30-year anniversary (2020)
✯ 1989: Reinforcement learning economies
with credit conservation
✯ 2005-: Evolino
✯ 1987-: Genetic programming. See also 1/3 century anniversary (2020)
✯ 2004-2009: TU Munich Cogbotlab at TUM
✯ 2004-2009: CoTeSys cluster of excellence
✯ 2007: Highlights of robot car history
✯ 2004: Statistical robotics
✯ 2004: Resilient machines &
✯ 2000-: Artificial Intelligence
Artificial curiosity & creativity & intrinsic motivation & developmental robotics. (More.)
✯ 1990-: Formal theory of creativity
Evolution of national Nobel Prize shares in the 20th century.
Switzerland - best country in the world?
✯ 2010: A new kind of empire?
✯ 2000s: Einstein &
Haber & Bosch &
& Schwarzenegger &
✯ 2012: Olympic medal statistics & Bolt
✯ 2006: Is history converging? Again?
✯ 1990: Computer history speedup
Theory of beauty and
✯ 2001: Lego Art
✯ 2010: Fibonacci web design
✯ 2007: J.S.'s painting of his daughters and related work
✯ 1987-: What's new?
Old talk videos up to 2015
✯ 1981: Closest brush with fame
✯ 2010-: Master's in artificial intelligence