First Superhuman Visual Pattern Recognition 2011: traffic signs
Jürgen Schmidhuber (2011; minor updates for 10-year anniversary 2021)
Pronounce: You_again Shmidhoobuh
AI Blog
@SchmidhuberAI

2011: First Superhuman Visual Pattern Recognition

On 6 August 2011, at the IJCNN 2011 computer vision competition in Silicon Valley, our artificial neural network called DanNet [5, 2, 1, 6] performed twice as well as humans, three times better than the closest artificial competitor [13], and six times better than the best non-neural method. Apparently, this was the first superhuman pattern recognition result in the history of computer vision. Today, many commercial applications are based on what started in 2011.


Our Deep Learning Neural Networks (NNs) were the first methods to achieve superhuman pattern recognition in an official international competition (with a secret test set known only to the organisers) [2, 1]. This was made possible through the work of my postdocs Dan Claudiu Ciresan & Ueli Meier and my PhD student Jonathan Masci (co-founder of NNAISENSE).

At IJCNN 2011 in San Jose (CA), our fast and deep convolutional NN (CNN) known as DanNet achieved a 0.56% error rate in the IJCNN Traffic Sign Recognition Competition of INI/RUB [14, 14a-d]. Humans achieved 1.16% on average (over twice DanNet's error rate, although some individual humans did better than that). The already impressive runner-up NN by Yann LeCun's team [13] was 3 times worse, achieving 1.69%. The best non-neural learner, natural or artificial, was over 6 times worse, achieving 3.86%.
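
The quoted factors follow directly from dividing the published error rates. A quick sanity check in Python, using only the numbers reported above:

# Final IJCNN 2011 traffic sign competition error rates (%), as quoted above
dannet, humans, runner_up, best_non_neural = 0.56, 1.16, 1.69, 3.86
print(round(humans / dannet, 2))           # ~2.07: humans made over twice as many errors
print(round(runner_up / dannet, 2))        # ~3.02: closest NN competitor, about 3 times worse
print(round(best_non_neural / dannet, 2))  # ~6.89: best non-neural method, over 6 times worse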

A few months earlier, DanNet had already won the qualifying 1st stage of the competition, held online, albeit by a much smaller margin: 1.02% vs 1.03% for second place [2, 13]. After the deadline, the organisers revealed that human performance on the test set was 1.19%. That is, the best methods already seemed human-competitive. However, during the 1st stage it was possible to incrementally gain information about the test set by probing it through repeated submissions. This can be seen in the steadily improving results obtained by various teams over time [14a] (the organisers eventually imposed a limit of ten resubmissions). In the final competition [14b] this was not possible, because each team had only a single trial.

I still remember when, in 1997, many thought it a big deal that human chess world champion Kasparov was beaten by an IBM computer. But back then, computers could not compete at all with little kids at visual pattern recognition, which from a computational perspective seems much harder than chess.

Although kids were still better general pattern recognisers in 2011, DanNet was able to learn to rival them in important limited domains. Furthermore, with each decade we gain another factor of 100 in raw computational power per unit of cost. Deep learning is here to stay.

Traffic sign recognisers are obviously important for self-driving cars. In 1994, the first fully autonomous cars appeared in traffic (Ernst Dickmanns & Mercedes-Benz) [3]. For legal and safety reasons, a human had to be on board. Superhuman pattern recognition could help to make robot taxis acceptable.

To achieve excellent pattern recognition, pure supervised gradient descent (the backpropagation technique of 1970 [4a, 4]) was applied [12a, 12b, 12c, 7, 8] to our GPU-based Deep and Wide Multi-Column Committees of Max-Pooling Convolutional Neural Networks [5, 6] with alternating weight-sharing convolutional layers [10a, 10b, 12a, 12b, 12c] and max-pooling layers [11, 11a, 7, 8] topped by fully connected layers [4]. This architecture is biologically rather plausible, inspired by early neuroscience-related work [9, 10a, 10b], although the training method is not. Additional tricks can be found in the papers [1, 2, 5, 6, 13, 15]. More on the history of CNNs in Sec. D & Sec. XVIII & Sec. XIV of [21] and Sec. 19 of [20].
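
To make the architecture concrete, here is a minimal sketch in modern PyTorch, not the original GPU implementation of [5, 6]: several independent CNN columns, each with alternating convolutional and max-pooling layers topped by fully connected layers, trained by plain supervised gradient descent, and combined by averaging their outputs. The layer sizes, the 48x48 input resolution, and the 43 traffic sign classes are illustrative assumptions, not the exact settings of the competition entry.

import torch
import torch.nn as nn

class Column(nn.Module):
    # One CNN column: alternating convolution and max-pooling layers, then fully connected layers.
    def __init__(self, num_classes=43):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5), nn.Tanh(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=5), nn.Tanh(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3), nn.Tanh(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(128 * 3 * 3, 200), nn.Tanh(), nn.Linear(200, num_classes),
        )

    def forward(self, x):  # x: (batch, 3, 48, 48) images
        return self.classifier(self.features(x))

class Committee(nn.Module):
    # Multi-column committee: averages the class probabilities of several independently trained columns.
    def __init__(self, num_columns=5, num_classes=43):
        super().__init__()
        self.columns = nn.ModuleList(Column(num_classes) for _ in range(num_columns))

    def forward(self, x):
        return torch.stack([col(x).softmax(dim=1) for col in self.columns]).mean(dim=0)

# Plain supervised gradient descent on labelled images, no unsupervised pre-training:
model = Column()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
images = torch.randn(8, 3, 48, 48)   # dummy mini-batch of 48x48 RGB images
labels = torch.randint(0, 43, (8,))  # dummy traffic sign labels
loss = loss_fn(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

In the actual system [1, 6], the columns were trained separately on differently preprocessed versions of the data and on randomly distorted training images, which helps make their errors partly independent before averaging.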

Our deep and wide DanNet was also the first system to achieve human-competitive performance [6] of around 0.2% error rate on MNIST handwritten digits [12c], once the most famous benchmark of Machine Learning. This was a dramatic improvement, since the MNIST record had hovered around 0.4% for almost a decade.

In 2011-2012, DanNet won every contest it entered. In fact, it won four important computer vision competitions in a row before similar NNs won any [17, 17a]. Most if not all leading IT companies and research labs are now using our combination of techniques, too. Compare [15-24].


The contents of this article may be used for educational and non-commercial purposes, including articles for Wikipedia and similar sites.


References

[1] D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber. Multi-Column Deep Neural Network for Traffic Sign Classification. Neural Networks 32: 333-338, 2012. PDF of preprint.

[2] D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber. A Committee of Neural Networks for Traffic Sign Classification. International Joint Conference on Neural Networks (IJCNN-2011, San Francisco), 2011. PDF. HTML overview. [First superhuman performance in a computer vision contest, with half the error rate of humans, and one third the error rate of the closest competitor. This led to massive interest from industry.]

[3] J. Schmidhuber. Highlights of robot car history, 2005.

[4] P. J. Werbos. Applications of advances in nonlinear sensitivity analysis. In R. Drenick, F. Kozin (eds): System Modeling and Optimization: Proc. IFIP, Springer, 1982. PDF. [First application of backpropagation [4a] to neural networks. Extending preliminary thoughts in his 1974 thesis.]

[4a] S. Linnainmaa. The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 1970. See chapters 6-7 and FORTRAN code on pages 58-60. PDF. See also BIT 16, 146-160, 1976. Link. [The first publication on "modern" backpropagation, also known as the reverse mode of automatic differentiation.]

[5] D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, J. Schmidhuber. Flexible, High Performance Convolutional Neural Networks for Image Classification. International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011. PDF. ArXiv preprint (1 Feb 2011). [Speeding up deep CNNs on GPU by a factor of 60. Used to win four important computer vision competitions 2011-2012 before others won any with similar approaches.]

[6] D. C. Ciresan, U. Meier, J. Schmidhuber. Multi-column Deep Neural Networks for Image Classification. Proc. IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012, p 3642-3649, July 2012. PDF. Longer TR of Feb 2012: arXiv:1202.2745v1 [cs.CV]. More.

[7] M. A. Ranzato, Y. LeCun: A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images. Proc. ICDAR, 2007.

[8] D. Scherer, A. Mueller, S. Behnke. Evaluation of pooling operations in convolutional architectures for object recognition. In Proc. ICANN 2010.

[9] Hubel, D. H., T. N. Wiesel. Receptive Fields, Binocular Interaction And Functional Architecture In The Cat's Visual Cortex. Journal of Physiology, 1962.

[10a] K. Fukushima: Neural network model for a mechanism of pattern recognition unaffected by shift in position—Neocognitron. Trans. IECE, vol. J62-A, no. 10, pp. 658-665, 1979. [The first deep convolutional neural network architecture, with alternating convolutional layers and downsampling layers. In Japanese. English version: [10b]. More in Scholarpedia.]

[10b] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4): 193-202, 1980. Scholarpedia.

[11] Weng, J., Ahuja, N., and Huang, T. S. (1992). Cresceptron: a self-organizing neural network which grows adaptively. In International Joint Conference on Neural Networks (IJCNN), vol 1, p 576-581.

[11a] M. Riesenhuber, T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience 2(11), p 1019-1025, 1999.

[12a] A. Waibel. Phoneme Recognition Using Time-Delay Neural Networks. Meeting of IEICE, Tokyo, Japan, 1987. [First application of backpropagation [4a, 4] and weight-sharing to a convolutional network.]

[12b] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano and K. J. Lang. Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 3, pp. 328-339, March 1989.

[12c] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel: Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1(4):541-551, 1989.

[13] P. Sermanet, Y. LeCun. Traffic sign recognition with multi-scale convolutional networks. Proc. IJCNN 2011, p 2809-2813, IEEE, 2011.

[14] INI Benchmark Website: The German Traffic Sign Recognition Benchmark

[14a] Qualifying for IJCNN 2011 competition: results of 1st stage (January 2011)

[14b] Results for IJCNN 2011 competition (2 August 2011)

[14c] Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. (2011). The German traffic sign recognition benchmark: A multi-class classification competition. In International Joint Conference on Neural Networks (IJCNN 2011), pages 1453-1460. IEEE Press.

[14d] Stallkamp, J., Schlipsing, M., Salmen, J., and Igel, C. (2012). Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, 32:323-332.

[15] J. Schmidhuber. Deep learning since 1991

[16] J. Schmidhuber (Sep 2020). 10-year anniversary of supervised deep learning breakthrough (2010). No unsupervised pre-training. The rest is history

[17] J. Schmidhuber. History of computer vision contests won by deep CNNs on GPU. March 2017. [How IDSIA used deep and fast GPU-based CNNs to win four important computer vision competitions 2011-2012 before others won contests using similar approaches.]

[17a] Reddit/ML, 2019. DanNet, the CUDA CNN of Dan Ciresan in J. Schmidhuber's team, won 4 image recognition challenges prior to AlexNet.

[18] J. Schmidhuber, 2017. Our impact on the world's most valuable public companies: 1. Apple, 2. Alphabet (Google), 3. Microsoft, 4. Facebook, 5. Amazon ....

[19] J. Schmidhuber (2020). The 2010s: Our Decade of Deep Learning / Outlook on the 2020s.

[20] J. Schmidhuber (2019). Deep Learning: Our Miraculous Year 1990-1991. See also arxiv:2005.05744.

[21] J. Schmidhuber (2020). Critique of 2018 Turing Award for deep learning.

[22] J. Schmidhuber (2015): Overview of Highway Networks: First working really deep feedforward neural networks with over 100 layers. (Updated 2020 for 5-year anniversary.)

[23] J. Schmidhuber, 2015. Deep Learning in neural networks: An overview. Neural Networks, 61, 85-117. More.

[24] J. Schmidhuber, 2015. Deep Learning. Scholarpedia, 10(11):32832.