In 2025, we celebrate the
20-year anniversary of the
first machine learning paper
with the word combination "learn deep" in the title (2005) [DL6].
It showed how deep reinforcement learning (RL) without a teacher can
solve problems of depth 1000 [DL1] and more.
Soon after its publication, everybody started talking about "deep learning."
Causality or correlation?
In any case, it should be mentioned that the ancient term "deep learning"
was introduced to the field of Machine Learning much earlier [DL2] by
Rina Dechter in 1986 [Dec86],
and to Artificial Neural Networks (NNs) by Aizenberg et al. in 2000 [Aiz00].
That is, in 2025, we are also celebrating the 25-year anniversary of the latter.
Of course, deep learning itself started much earlier, in 1965, when Ivakhnenko & Lapa had the first working algorithm for deep learning of internal representations [DEEP1]. Ivakhnenko's 1971 paper [DEEP2] already described a deep learning net with 8 layers [DLH][DLP][NOB].
The "learn deep" work of 2005 [DL6] was
driven by my former senior researcher Faustino Gomez,
now CEO of NNAISENSE.
It was about deep RL with recurrent neural networks and neuroevolution.
An algorithm called Hierarchical
Enforced SubPopulations was used to simultaneously evolve NNs
at two levels of granularity: complete networks and their
components, i.e., individual neurons. In
partially observable environments, the method was applied to tasks
involving temporal dependencies
of up to thousands of time steps. It outperformed the
best conventional RL systems.
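To make the neuron-level half of this idea more concrete, here is a minimal, hypothetical Python sketch of ESP-style cooperative neuroevolution for a small recurrent net. It is not the code of [DL6]: the toy delayed-recall task (a stand-in for a POMDP with a long temporal dependency), the network size, the fitness-sharing scheme, and all hyperparameters are illustrative assumptions, and the hierarchical variant additionally evolves complete networks alongside the neuron subpopulations, which this sketch omits.

import numpy as np

rng = np.random.default_rng(0)

N_HIDDEN = 4        # one subpopulation per hidden neuron (illustrative size)
SUBPOP_SIZE = 20    # candidate weight vectors per subpopulation
N_TRIALS = 200      # assembled-network evaluations per generation
DELAY = 50          # temporal gap the recurrent net must bridge
GENERATIONS = 30

# Each candidate neuron is a weight vector:
# [input weight, recurrent weights..., bias, output weight]
GENE_LEN = 1 + N_HIDDEN + 1 + 1
subpops = [rng.normal(0, 1, (SUBPOP_SIZE, GENE_LEN)) for _ in range(N_HIDDEN)]

def evaluate(neurons):
    """Assemble a recurrent net from one neuron per subpopulation and
    return its fitness on a toy delayed-recall task (higher is better)."""
    W_in  = np.array([n[0] for n in neurons])               # input -> hidden
    W_rec = np.array([n[1:1 + N_HIDDEN] for n in neurons])  # hidden -> hidden
    b     = np.array([n[1 + N_HIDDEN] for n in neurons])    # biases
    W_out = np.array([n[-1] for n in neurons])              # hidden -> output
    score = 0.0
    for cue in (-1.0, 1.0):
        h = np.zeros(N_HIDDEN)
        for t in range(DELAY):
            x = cue if t == 0 else 0.0                      # cue visible only at t = 0
            h = np.tanh(W_in * x + W_rec @ h + b)
        y = np.tanh(W_out @ h)
        score += cue * y                                    # reward recalling the cue's sign
    return score

for gen in range(GENERATIONS):
    fitness = [np.zeros(SUBPOP_SIZE) for _ in range(N_HIDDEN)]
    counts  = [np.zeros(SUBPOP_SIZE) for _ in range(N_HIDDEN)]
    for _ in range(N_TRIALS):
        idx = [rng.integers(SUBPOP_SIZE) for _ in range(N_HIDDEN)]
        f = evaluate([subpops[i][idx[i]] for i in range(N_HIDDEN)])
        for i in range(N_HIDDEN):                           # credit each participating neuron
            fitness[i][idx[i]] += f
            counts[i][idx[i]] += 1
    for i in range(N_HIDDEN):                               # evolve each subpopulation separately
        avg = fitness[i] / np.maximum(counts[i], 1)
        order = np.argsort(-avg)
        elite = subpops[i][order[:SUBPOP_SIZE // 4]]
        children = elite[rng.integers(len(elite), size=SUBPOP_SIZE - len(elite))]
        children = children + rng.normal(0, 0.3, children.shape)   # mutate copies of the elite
        subpops[i] = np.vstack([elite, children])
    best = evaluate([subpops[i][0] for i in range(N_HIDDEN)])
    print(f"generation {gen:2d}: best assembled-net score ~ {best:.3f}")

Each hidden neuron is drawn from its own subpopulation, so a neuron's fitness reflects how well it cooperates with neurons sampled from the other subpopulations; evolving the subpopulations separately encourages them to specialize into complementary roles within the assembled recurrent network.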
We had many additional papers on these topics. See, e.g., the overview pages on
reinforcement learning (since 1989),
artificial evolution (since 1987),
co-evolving recurrent neurons (since 2005),
compressed network search (since 2013),
Evolino (since 2005),
genetic programming (since 1987). Even more papers on these topics can be found on my
publications page. See also [DLH], Sec. 5 of [DEC], and
Sec. 8 of [MIR].
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
References
[DEEP1]
Ivakhnenko, A. G. and Lapa, V. G. (1965). Cybernetic Predicting Devices. CCM Information Corporation. First working Deep Learners with many layers, learning internal representations.
[DEEP1a]
Ivakhnenko, A. G. (1968). The group method of data handling; a rival of the method of stochastic approximation. Soviet Automatic Control, 13:43-55.
[DEEP2]
Ivakhnenko, A. G. (1971). Polynomial theory of complex systems. IEEE Transactions on Systems, Man and Cybernetics, (4):364-378.
[DL6]
F. Gomez and J. Schmidhuber.
Co-evolving recurrent neurons learn deep memory POMDPs.
In Proc. GECCO'05, Washington, D. C.,
pp. 1795-1802, ACM Press, New York, NY, USA, 2005.
[DL1] J. Schmidhuber (2015).
Deep Learning in neural networks: An overview. Neural Networks, 61:85-117.
[DL2] J. Schmidhuber (2015).
Deep Learning.
Scholarpedia, 10(11):32832.
[DLH]
J. Schmidhuber (2022).
Annotated History of Modern AI and Deep Learning. Technical Report IDSIA-22-22, IDSIA, Lugano, Switzerland, 2022.
Preprint arXiv:2212.11279.
[DLP]
J. Schmidhuber (2023).
How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. Technical Report IDSIA-23-23, Swiss AI Lab IDSIA, 14 Dec 2023.
[Dec86]
R. Dechter (1986). Learning while searching in constraint-satisfaction problems. University of California, Computer Science Department, Cognitive Systems Laboratory. [First paper to introduce the term "Deep Learning" to Machine Learning.]
[Aiz00]
I. Aizenberg, N. N. Aizenberg, and J. P. L. Vandewalle (2000). Multi-Valued and Universal Binary Neurons: Theory, Learning and Applications. Springer Science & Business Media.
[First work to introduce the term "Deep Learning" to Neural Networks.]
[MIR] J. Schmidhuber (AI Blog, Oct 2019, updated 2025). Deep Learning: Our Miraculous Year 1990-1991. Preprint
arXiv:2005.05744. The deep learning neural networks (NNs) of our team have revolutionised pattern recognition & machine learning & AI. Many of the basic ideas behind this revolution were published within fewer than 12 months in our "Annus Mirabilis" 1990-1991 at TU Munich, including principles of (1)
LSTM, the most cited AI of the 20th century (based on constant error flow through residual connections); (2) ResNet, the most cited AI of the 21st century (based on our LSTM-inspired Highway Network, 10 times deeper than previous NNs); (3)
GAN (for artificial curiosity and creativity); (4) Transformer (the T in ChatGPT—see the 1991 Unnormalized Linear Transformer); (5) Pre-training for deep NNs (the P in ChatGPT); (6) NN distillation (see DeepSeek); (7) recurrent World Models, and more.
[NOB] J. Schmidhuber.
A Nobel Prize for Plagiarism.
Technical Report IDSIA-24-24 (7 Dec 2024).
Sadly, the Nobel Prize in Physics 2024 for Hopfield & Hinton is a Nobel Prize for plagiarism. They republished methodologies for artificial neural networks developed in Ukraine and Japan by Ivakhnenko and Amari in the 1960s & 1970s, as well as other techniques, without citing the original papers. Even in later surveys, they didn't credit the original inventors (thus turning what may have been unintentional plagiarism into a deliberate form). None of the important algorithms for modern Artificial Intelligence were created by Hopfield & Hinton.
[DEC] J. Schmidhuber (02/20/2020). The 2010s: Our Decade of Deep Learning / Outlook on the 2020s.