First Superhuman Visual Pattern Recognition 2011

Jürgen Schmidhuber (2011; updated 2025)
Pronounce: You_again Shmidhoobuh

2011: First Superhuman Visual Pattern Recognition

On 6 August 2011, at the IJCNN 2011 computer vision competition in Silicon Valley, our artificial neural network called DanNet [5,2,1,6] performed twice as well as humans, three times better than the closest artificial competitor [13], and six times better than the best non-neural method. Apparently, this was the first superhuman pattern recognition result in the history of computer vision. Today, many commercial applications are based on what started in 2011.

Our Deep Learning Neural Networks (NNs) were the first methods to achieve superhuman pattern recognition in an official international competition (with a secret test set known only to the organisers) [2,1]. This was made possible through the work of my postdocs Dan Claudiu Ciresan & Ueli Meier and my PhD student Jonathan Masci (co-founder of NNAISENSE).

At IJCNN 2011 in San Jose (CA), our fast and deep convolutional NN (CNN) known as DanNet achieved an error rate of 0.56% in the IJCNN Traffic Sign Recognition Competition of INI/RUB [14,14a-d]. Humans achieved 1.16% on average (over 2 times worse, although some humans will do better than that). The already impressive runner-up NN by Yann LeCun's team [13] was 3 times worse, achieving 1.69%. The best non-neural (natural or artificial) learner was over 6 times worse, achieving 3.86%.
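
The "times worse" figures above are simply ratios of error rates. A minimal Python sketch that recomputes them from the quoted percentages (the dictionary labels and the function name are illustrative, not part of the competition material):

```python
# Error rates in percent, as quoted in the paragraph above.
error_rates = {
    "DanNet": 0.56,
    "humans (average)": 1.16,
    "runner-up NN": 1.69,
    "best non-neural method": 3.86,
}


def ratio_to_winner(rates, winner="DanNet"):
    """Divide every error rate by the winner's error rate."""
    base = rates[winner]
    return {name: rate / base for name, rate in rates.items()}


ratios = ratio_to_winner(error_rates)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the error rate of DanNet")
```

The ratios come out at roughly 2.07, 3.02 and 6.89, which is where "over 2 times", "3 times" and "over 6 times" come from.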

A few months earlier, DanNet had already won the qualifying round, a first-stage online competition, albeit by a much smaller margin: 1.02% vs 1.03% for second place [2,13]. After the deadline, the organisers revealed that human performance on the test set was 1.19%. That is, the best methods already seemed human-competitive. However, during the first stage it was possible to incrementally gain information about the test set by probing it through repeated submissions. This can be seen from the better and better results obtained by various teams over time [14a] (the organisers eventually imposed a limit of ten resubmissions). In the final competition [14b] this was not possible, because each team had only a single trial.
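
Why repeated submissions matter can be illustrated with a deliberately simplified simulation. It is a sketch under stated assumptions, not a model of what any team actually did: every submission comes from an equally good model, the true error rate and test set size are hypothetical, and the only "probing" is to keep the best of several noisy scores.

```python
import random


def observed_error(rng, true_error, n_test):
    """Error rate measured on n_test items for a model with the given true error rate."""
    mistakes = sum(1 for _ in range(n_test) if rng.random() < true_error)
    return mistakes / n_test


def submit_k_times(rng, true_error, n_test, k):
    """Scores of k submissions that are all equally good in truth."""
    return [observed_error(rng, true_error, n_test) for _ in range(k)]


rng = random.Random(0)   # fixed seed so the run is repeatable
TRUE_ERROR = 0.01        # hypothetical true error rate (1%)
N_TEST = 10_000          # hypothetical test set size

scores = submit_k_times(rng, TRUE_ERROR, N_TEST, k=10)
single_trial = scores[0]      # what a one-shot competition would report
best_reported = min(scores)   # what a leaderboard with resubmissions would show

print(f"single trial:  {single_trial:.4%}")
print(f"best of ten:   {best_reported:.4%}")
```

The best of several scores can never be worse than the first one, so a leaderboard that allows resubmissions tends to look better than a single trial even when the underlying model has not improved.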

I still remember when in 1997 many thought it a big deal that human chess world champion Kasparov was beaten by an IBM computer. But back then computers could not compete at all with little kids in visual pattern recognition, which seems much harder than chess from a computational perspective.

Although kids were still better general pattern recognisers in 2011, DanNet was able to learn to rival them in important limited domains. Furthermore, with each decade we gain another factor of 100 in terms of raw computational power per unit cost. Deep learning is here to stay.
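
To get a feel for the factor of 100 per decade quoted above, it can be converted into a yearly growth factor and a doubling time. A small Python sketch, assuming smooth exponential growth (the function names are illustrative):

```python
import math

FACTOR_PER_DECADE = 100.0  # figure quoted in the text


def annual_factor(factor_per_decade=FACTOR_PER_DECADE):
    """Yearly growth factor that compounds to factor_per_decade over 10 years."""
    return factor_per_decade ** (1 / 10)


def doubling_time_years(factor_per_decade=FACTOR_PER_DECADE):
    """Years needed for a factor of 2 at the same exponential rate."""
    return 10 * math.log(2) / math.log(factor_per_decade)


def cumulative_factor(years, factor_per_decade=FACTOR_PER_DECADE):
    """Total growth factor after the given number of years."""
    return factor_per_decade ** (years / 10)


print(f"per year:      x{annual_factor():.3f}")
print(f"doubling time: {doubling_time_years():.2f} years")
print(f"after 20 yrs:  x{cumulative_factor(20):,.0f}")
```

Under this assumption, a factor of 100 per decade corresponds to about 1.58 per year, a doubling roughly every 1.5 years, and a factor of 10,000 over two decades.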

Traffic sign recognisers are obviously important for self-driving cars. In 1994, the first fully autonomous cars appeared in traffic (Ernst Dickmanns & Mercedes-Benz) [3]. For legal and safety reasons, a human had to be onboard. Superhuman pattern recognition could help to make robot taxis acceptable.

CNNs originated in Japan over 4 decades ago between 1979 and 1988 [10a-d]: the basic CNN architecture with convolutional layers and downsampling layers is due to Fukushima (1979) [10a,b]. In 1987, NNs with 1-dimensional convolutions were combined by Waibel [10c,e] with backpropagation, a technique from 1970 [4a], and with weight sharing. Waibel did not call this CNNs but TDNNs. One year later, in 1988, the first "modern" backpropagation-trained 2-dimensional CNNs were published by Zhang and colleagues [10d]. LeCun's team later applied CNNs to the MNIST dataset, e.g., [10f]. The popular downsampling variant called max-pooling was introduced by Yamaguchi et al. for TDNNs in 1990 [11] and by Weng et al. for higher-dimensional CNNs in 1993 [12]. The CNN architecture is biologically rather plausible, inspired by early neuroscience-related work [9,10a-b], although the training method is not. Additional tricks can be found in [1,2,5,6,13,15]. More on the history of CNNs in [20-21,23-26].
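
The two building blocks named above, a convolutional layer with shared weights and a downsampling layer, can be sketched in a few lines of plain Python. This is a toy illustration only, not DanNet's GPU implementation: the image and the kernel are hand-picked for the example, there is no training, and, as is usual for CNNs, the "convolution" is computed without flipping the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over the image (stride 1, no padding).

    The same weights are used at every position (weight sharing).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def pool2d(feature_map, size, reduce_fn):
    """Downsample by applying reduce_fn to non-overlapping size x size blocks."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            reduce_fn([feature_map[i * size + a][j * size + b]
                       for a in range(size) for b in range(size)])
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Max-pooling: keep the strongest response in each block."""
    return pool2d(feature_map, size, max)


def avg_pool(feature_map, size=2):
    """Spatial averaging: keep the mean response in each block."""
    return pool2d(feature_map, size, lambda values: sum(values) / len(values))


# Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hypothetical hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 1],
          [-1, 1]]

feature_map = conv2d_valid(image, kernel)
print(feature_map)            # [[0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0], [0, 2, 0, 0]]
print(max_pool(feature_map))  # [[2, 0], [2, 0]]
print(avg_pool(feature_map))  # [[1.0, 0.0], [1.0, 0.0]]
```

In this example max-pooling passes the full edge response of 2 on to the next layer, while spatial averaging dilutes it to 1.0 by mixing it with the zeros in the same block.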

Our deep and wide DanNet was also the first system with human-competitive performance [6] of around 0.2% error rate on MNIST handwritten digits [12c], once the most famous benchmark of Machine Learning. This represented a dramatic improvement, since until then the MNIST record had hovered around 0.4% for almost a decade.

In 2011-2012, DanNet won every contest it entered. In fact, it won four important computer vision competitions in a row before similar NNs won any [17,17a]. Most if not all leading IT companies and research labs are now using our combination of techniques, too. Compare [15-24].

The contents of this article may be used for educational and non-commercial purposes, including articles for Wikipedia and similar sites.

References

[1] D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber. Multi-Column Deep Neural Network for Traffic Sign Classification. Neural Networks 32: 333-338, 2012.

[2] D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber. A Committee of Neural Networks for Traffic Sign Classification. International Joint Conference on Neural Networks (IJCNN-2011, San Jose), 2011. [First superhuman performance in a computer vision contest, with half the error rate of humans, and one third the error rate of the closest competitor. This led to massive interest from industry.]

[3] J. Schmidhuber. Highlights of robot car history, 2005.

[4] P. J. Werbos. Applications of advances in nonlinear sensitivity analysis. In R. Drenick, F. Kozin (eds): System Modeling and Optimization: Proc. IFIP, Springer, 1982. [First application of backpropagation [4a] to neural networks. Extending preliminary thoughts in his 1974 thesis.]

[4a] S. Linnainmaa. The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 1970. See chapters 6-7 and FORTRAN code on pages 58-60. See also BIT 16, 146-160, 1976. [The first publication on "modern" backpropagation, also known as the reverse mode of automatic differentiation.]

[5] D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, J. Schmidhuber. Flexible, High Performance Convolutional Neural Networks for Image Classification. International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011. ArXiv preprint (1 Feb 2011). [Speeding up deep CNNs on GPU by a factor of 60. Used to win four important computer vision competitions 2011-2012 before others won any with similar approaches.]

[6] D. C. Ciresan, U. Meier, J. Schmidhuber. Multi-column Deep Neural Networks for Image Classification. Proc. IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012, pp. 3642-3649, July 2012. Longer TR of Feb 2012: arXiv:1202.2745v1 [cs.CV].

[7] M. A. Ranzato, Y. LeCun. A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images. Proc. ICDAR, 2007.

[8] D. Scherer, A. Mueller, S. Behnke. Evaluation of pooling operations in convolutional architectures for object recognition. In Proc. ICANN 2010.

[9] Hubel, D. H., T. N. Wiesel. Receptive Fields, Binocular Interaction And Functional Architecture In The Cat's Visual Cortex. Journal of Physiology, 1962.

[10a] K. Fukushima: Neural network model for a mechanism of pattern recognition unaffected by shift in position—Neocognitron. Trans. IECE, vol. J62-A, no. 10, pp. 658-665, 1979. [The first deep convolutional neural network architecture, with alternating convolutional layers and downsampling layers. In Japanese. English version: [10b]. More in Scholarpedia.]

[10b] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4): 193-202, 1980.

[10c] A. Waibel. Phoneme Recognition Using Time-Delay Neural Networks. Meeting of IEICE, Tokyo, Japan, 1987. [First application of backpropagation [4a] and weight-sharing to a 1-dimensional convolutional network.]

[10d] W. Zhang, J. Tanida, K. Itoh, Y. Ichioka. Shift-invariant pattern recognition neural network and its optical architecture. Proc. Annual Conference of the Japan Society of Applied Physics, 1988. [First backpropagation-trained 2-dimensional CNN.]

[10e] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano and K. J. Lang. Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 3, pp. 328-339, March 1989.

[10f] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel: Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1(4):541-551, 1989.

[11] K. Yamaguchi, K. Sakamoto, A. Kenji, T. Akabane, Y. Fujimoto. A Neural Network for Speaker-Independent Isolated Word Recognition. First International Conference on Spoken Language Processing (ICSLP 90), Kobe, Japan, Nov 1990. [A 1D NN with convolutions using Max-Pooling instead of Fukushima's Spatial Averaging.]

[12] Weng, J., Ahuja, N., and Huang, T. S. (1993). Learning recognition and segmentation of 3-D objects from 2-D images. Proc. 4th Intl. Conf. Computer Vision, Berlin, Germany, pp. 121-128. [A 2D CNN whose downsampling layers use Max-Pooling (which has become very popular) instead of Fukushima's Spatial Averaging.]

[13] P. Sermanet, Y. LeCun. Traffic sign recognition with multi-scale convolutional networks. Proc. IJCNN 2011, pp. 2809-2813, IEEE, 2011.

[14] INI Benchmark Website: The German Traffic Sign Recognition Benchmark

[14a] Qualifying for IJCNN 2011 competition: results of 1st stage (January 2011)

[14b] Results for IJCNN 2011 competition (2 August 2011)

[14c] Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. (2011). The German traffic sign recognition benchmark: A multi-class classification competition. In International Joint Conference on Neural Networks (IJCNN 2011), pages 1453-1460. IEEE Press.

[14d] Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. (2012). Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32:323-332.

[15] J. Schmidhuber. Deep learning since 1991

[16] J. Schmidhuber (Sep 2020). 10-year anniversary of supervised deep learning breakthrough (2010). No unsupervised pre-training. The rest is history

[17] J. Schmidhuber. History of computer vision contests won by deep CNNs on GPU. March 2017. [How IDSIA used deep and fast GPU-based CNNs to win four important computer vision competitions 2011-2012 before others won contests using similar approaches.]

[17a] Reddit/ML, 2019. DanNet, the CUDA CNN of Dan Ciresan in J. Schmidhuber's team, won 4 image recognition challenges prior to AlexNet.

[18] J. Schmidhuber, 2017. Our impact on the world's most valuable public companies: 1. Apple, 2. Alphabet (Google), 3. Microsoft, 4. Facebook, 5. Amazon ....

[19] J. Schmidhuber (2020). The 2010s: Our Decade of Deep Learning / Outlook on the 2020s.

[20] J. Schmidhuber (2019). Deep Learning: Our Miraculous Year 1990-1991. See also arxiv:2005.05744.

[21] J. Schmidhuber (AI Blog, 2022). Scientific Integrity and the History of Deep Learning: The 2021 Turing Lecture, and the 2018 Turing Award. Technical Report IDSIA-77-21, IDSIA, Lugano, Switzerland, 2022.

[22] J. Schmidhuber (2015): Overview of Highway Networks: First working really deep feedforward neural networks with over 100 layers. (Updated 2025 for 10-year anniversary.)

[23] J. Schmidhuber, 2015. Deep Learning in neural networks: An overview. Neural Networks, 61, 85-117.

[24] J. Schmidhuber, 2015. Deep Learning. Scholarpedia, 10(11):32832.

[25] J. Schmidhuber (AI Blog, 2023). How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. Technical Report IDSIA-23-23, Swiss AI Lab IDSIA, 14 Dec 2023. The piece is aimed at people who are not aware of the numerous AI priority disputes, but are willing to check the facts.

[26] J. Schmidhuber (AI Blog, 2022). Annotated History of Modern AI and Deep Learning. Technical Report IDSIA-22-22, IDSIA, Lugano, Switzerland, 2022. Preprint arXiv:2212.11279.