We are currently experiencing a second Neural Network
ReNNaissance (title of JS' IJCNN 2011 keynote) - the first one happened in the 1980s and early 90s.
In many applications, our deep NNs are now outperforming all other methods
including
the theoretically less general and less powerful support vector machines
(which for a long time had the upper hand, at least in practice).
Check out the, in hindsight, not too optimistic
predictions of our RNNaissance workshop at NIPS 2003,
and compare the RNN book preface.
Computer Vision Team (ex-)members in Schmidhuber's lab(s):
Dan Ciresan,
Ueli Meier,
Jonathan Masci,
Somayeh Danafar,
Alex Graves,
Davide Migliore.
For medical imaging, we also work with
Alessandro Giusti
in the group of
Luca Maria Gambardella.
Our work builds on earlier work by great neural network pioneers including Bryson, Kelley, Dreyfus, Werbos, Fukushima, Amari, LeCun, Hinton, Williams, Rumelhart, Poggio, von der Malsburg, Kohonen, and others
(more).
SELECTED PUBLICATIONS
[24]
J. Schmidhuber.
Deep Learning in Neural Networks: An Overview.
Neural Networks, Volume 61, January 2015, Pages 85-117, published online in 2014 (DOI: 10.1016/j.neunet.2014.09.003). Draft of invited survey (88 pages, 888 references):
Preprint IDSIA-03-14 / arXiv:1404.7828 [cs.NE];
version
v4 (PDF, 8 Oct 2014);
LATEX source;
complete public BIBTEX file (888 kB).
(Older PDF versions:
v1 of 30 April;
v1.5 of 15 May;
v2 of 28 May;
v3 of 2 July.)
HTML overview page.
[23]
D. Ciresan, J. Schmidhuber. Multi-Column Deep Neural Networks for Offline Handwritten Chinese Character Classification. Preprint arXiv:1309.0261, 1 Sep 2013.
[22]
J. Masci, A. Giusti, D. Ciresan, G. Fricout, J. Schmidhuber. A Fast Learning Algorithm for Image Segmentation with Max-Pooling Convolutional Networks. ICIP 2013. Preprint arXiv:1302.1690.
On object detection in large images, now scanned by our deep networks 1500 times faster than with previous methods.
[21]
A. Giusti, D. Ciresan, J. Masci, L.M. Gambardella, J. Schmidhuber. Fast Image Scanning with Deep Max-Pooling Convolutional Neural Networks. ICIP 2013. Preprint arXiv:1302.1700.
[20]
D. Ciresan, A. Giusti, L. M. Gambardella, J. Schmidhuber. Mitosis Detection in Breast Cancer Histology Images using Deep Neural Networks. MICCAI 2013. PDF.
[19]
D. Ciresan, U. Meier, J. Schmidhuber.
Transfer Learning for Latin and Chinese Characters with Deep Neural Networks.
Proc. IJCNN 2012, p 1301-1306, 2012.
PDF.
Pretrain on one data set, profit on another.
[18]
D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber.
Multi-Column Deep Neural Network for Traffic Sign Classification.
Neural Networks 32, p 333-338, 2012.
PDF of preprint.
(First
superhuman visual pattern recognition.)
[17]
D. C. Ciresan, U. Meier, J. Schmidhuber.
Multi-column Deep Neural Networks for Image Classification.
IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012, p 3642-3649, 2012.
PDF.
Longer preprint
arXiv:1202.2745v1 [cs.CV].
[16]
J. Masci, U. Meier, D. Ciresan, G. Fricout, J. Schmidhuber.
Steel Defect Classification with Max-Pooling Convolutional Neural Networks.
Proc. IJCNN 2012. PDF.
[15]
D. Ciresan, A. Giusti, L. Gambardella, J. Schmidhuber.
Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images.
In Advances in Neural Information Processing Systems (NIPS 2012), Lake Tahoe,
2012. PDF. (See also
ISBI EM Competition Abstracts.)
[14]
J. Nagi, F. Ducatelle, G. A. Di Caro, D. Ciresan, U. Meier, A. Giusti, F. Nagi, J. Schmidhuber, L. M. Gambardella. Max-Pooling Convolutional Neural Networks for Vision-based Hand Gesture Recognition. Proc. 3rd IEEE Intl. Conf. on Signal & Image Processing and Applications (ICSIPA), Kuala Lumpur, 2011.
PDF.
[13]
J. Schmidhuber, D. Ciresan, U. Meier, J. Masci, A. Graves.
On Fast Deep Nets for AGI Vision.
In Proc. Fourth Conference on Artificial General Intelligence (AGI-11),
Google, Mountain View, California, 2011.
PDF.
Video.
[12]
D. C. Ciresan, U. Meier, L. M. Gambardella, J. Schmidhuber.
Convolutional Neural Network Committees For Handwritten Character Classification.
11th International Conference on Document Analysis and Recognition (ICDAR 2011),
Beijing, China, 2011.
PDF.
[11]
U. Meier, D. C. Ciresan, L. M. Gambardella, J. Schmidhuber.
Better Digit Recognition with a Committee of Simple Neural Nets.
11th International Conference on Document Analysis and Recognition (ICDAR 2011),
Beijing, China, 2011.
PDF.
[10]
D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber.
A Committee of Neural Networks for Traffic Sign Classification.
International Joint Conference on Neural Networks (IJCNN-2011, San Francisco), 2011.
PDF.
[9]
D. C. Ciresan, U. Meier, L. M. Gambardella, J. Schmidhuber.
Handwritten Digit Recognition with a Committee of Deep Neural Nets on GPUs.
ArXiv Preprint
arXiv:1103.4487v1 [cs.LG], 23 Mar 2011.
[8]
D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, J. Schmidhuber.
Flexible, High Performance Convolutional Neural Networks for Image Classification.
International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011. PDF.
ArXiv preprint, 1 Feb 2011.
Describes our special breed of max-pooling convolutional networks (MPCNN), now widely tested/used by research labs (e.g., Univ. Toronto/Google/Stanford) and companies (e.g., Apple) all over the world.
[7]
S. Danafar, A. Giusti, J. Schmidhuber. New State-of-the-Art Recognizers of Human Actions. EURASIP Journal on Advances in Signal Processing, doi:10.1155/2010/202768, 2010.
HTML.
[6]
D. C. Ciresan, U. Meier, L. M. Gambardella, J. Schmidhuber.
Deep Big Simple Neural Nets For Handwritten Digit Recognition.
Neural Computation 22(12): 3207-3220, 2010.
ArXiv Preprint
arXiv:1003.0358v1 [cs.NE], 1 March 2010.
[5] G. Corani, A. Giusti, D. Migliore, J. Schmidhuber.
Robust Texture Recognition Using Credal Classifiers.
Proc. BMVC, p 78.1-78.10. BMVA Press, 2010. doi:10.5244/C.24.78.
HTML.
[4]
A. Graves, J. Schmidhuber.
Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks.
Advances in Neural Information Processing Systems 22, NIPS'22, p 545-552,
Vancouver, MIT Press, 2009.
PDF.
[3]
A. Graves, S. Fernandez, J. Schmidhuber. Multi-Dimensional Recurrent
Neural Networks.
Intl. Conf. on Artificial Neural Networks ICANN'07,
2007.
Preprint: arXiv:0705.2011.
PDF.
[3a] A. Graves, S. Fernandez, F. Gomez, J. Schmidhuber. Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. ICML 06, Pittsburgh, 2006.
PDF.
[2]
M.v.d. Giessen and J. Schmidhuber.
Fast color-based object recognition independent of position and
orientation.
In W. Duch et al. (Eds.):
Proc. Intl. Conf. on Artificial Neural Networks ICANN'05,
LNCS 3696, pp. 469-474, Springer-Verlag Berlin Heidelberg, 2005.
PDF.
Ongoing work on active perception.
While the methods above tend to work fine in many applications,
they are passive learners - they do not learn to actively
search for the most informative image parts. Humans, however,
use sequential gaze shifts for pattern recognition.
This can be much more efficient than the fully parallel one-shot approach.
That's why we want to combine the algorithms above with variants of our old method of 1990 - back then
we built what to our knowledge was
the first artificial fovea sequentially steered by a learning neural controller.
Without a teacher, it used a variant of reinforcement learning
to create saccades and find targets in a visual scene (and to track moving targets), although computers were a million times slower back then:
[1]
J. Schmidhuber, R. Huber.
Learning to
generate artificial fovea trajectories for target detection.
International Journal of Neural Systems, 2(1 & 2):135-141, 1991
(figures omitted).
PDF.
HTML.
HTML overview with figures.
More on active learning without a teacher in the overview
pages on the Formal Theory of Creativity
and
Curiosity.
More Deep Learning Web Sites:
Deep Learning since 1991 (overview site derived from the present page)
Sept/Oct 2013: G+ posts on Deep Learning
Deep NN win MICCAI 2013 Grand Challenge
and 2012 ICPR Contest on Mitosis Detection (first Deep Learner to win a contest on object detection in large images)
Deep NN win 2012 Brain Image Segmentation Contest (first image segmentation competition won by a feedforward Deep Learner)
2012: 8th international pattern recognition contest won since 2009
2011: First superhuman visual pattern recognition
(twice as good as humans, three times better than the closest artificial competitor, six times better than the best non-neural method)
2009: First official international pattern recognition contests won by Deep Learning (connected handwriting through LSTM RNN: simultaneous segmentation and recognition)
1997: First purely supervised Deep Learner (LSTM RNN)
JS' first Deep Learner of 1991 + Deep Learning Timeline 1962-2013 (also summarises the origins of backpropagation, still the central algorithm of Deep Learning)
1991: Fundamental Deep Learning Problem discovered and analysed and partially solved
COMPETITION DETAILS
Links to the original datasets of competitions and benchmarks,
plus more information on the world records set by our team:
16. 22 Sept 2013: our deep and wide MCMPCNN [8,17]
won the MICCAI 2013 Grand Challenge on Mitosis Detection (important for cancer prognosis etc).
This was made possible through the efforts of Dan and Alessandro [20].
Don't confuse this with the earlier ICPR 2012 Contest below!
Comment: When we started our work on deep learning over two decades ago, limited computing power forced us to focus on tiny toy applications to illustrate the benefits of our methods. How things have changed!
It is gratifying to observe that today
our techniques may actually help to improve healthcare and save lives.
15. As of 1 Sep 2013, our Deep Learning Neural Networks are the best artificial offline recognisers of Chinese characters from the
ICDAR 2013 competition
(3755 classes), approaching human performance [23].
This is relevant for smartphone producers who want to build phones that can translate photos of foreign texts and signs.
As always in such competitions, GPU-based pure supervised gradient descent (40-year-old backprop) was applied to deep and wide multi-column networks with interleaving max-pooling layers and convolutional layers (multi-column MPCNN) [8,17]. Many leading IT companies and research labs are now using this technique, too.
14.
ICPR 2012 Contest on Mitosis Detection in Breast Cancer Histological Images (MITOS Aperio images).
There were 129 registered companies / institutes / universities from 40 countries, and 14 results.
Our team (with Alessandro & Dan) clearly
won the contest (over 20% fewer errors than the second best team). See ref [20], as well as the later MICCAI 2013 Grand Challenge above.
13. ISBI 2012
Segmentation of neuronal structures in EM stacks challenge.
See the TrakEM2 data sets of INI.
Our team won the contest on all three evaluation metrics
by a large margin,
with superhuman performance in terms of pixel error (March 2012) [15].
(Ranks 2-6 for researchers at ETHZ, MIT, CMU, Harvard.)
This is relevant for the recent huge brain projects in Europe and the US, which try to build 3D models of real brains.
12. IJCNN 2011 on-site
Traffic Sign Recognition Competition (1st rank, 2 August 2011, 0.56% error rate, the only method better than humans, who achieved 1.16% on average; 3rd place for 1.69%) [10,18]. The first method ever to achieve
superhuman visual pattern recognition on an important benchmark (with deadline and test set known only to the organisers).
This is obviously relevant for self-driving cars.
11. INI @ Univ. Bochum's online
German Traffic Sign Recognition Benchmark, won through late night efforts of Dan & Ueli & Jonathan (1st & 2nd rank; 1.02% error rate, January 2011) [10].
10. NORB object recognition dataset for stereo images, NY University, 2004.
Our team set the new record on the standard set (2.53% error rate) in January 2011 [8],
and achieved 2.7% on the full set [17] (best previous result by others: 5%).
9. The
CIFAR-10 dataset of Univ. Toronto, 2009.
Our team set the
new record (19.51% error rate) on these rather challenging data in January 2011 [8],
and improved this to 11.2% [17].
8. The MNIST dataset of NY University, 1998. Our team set the new record (0.35% error rate) in 2010 [6], tied it again
in January 2011 [8], broke it again in March 2011 (0.31%) [9], and again (0.27%, ICDAR 2011) [12],
and finally achieved the first human-competitive result: 0.23% [17] (mean of many runs; many individual runs
yield better results, of course, down to 0.17% [12]).
7. The Chinese Handwriting Recognition Competition at ICDAR 2011 (offline). Our team won 1st and 2nd rank (CR(1): 92.18% correct; CR(10): 99.29% correct) in June 2011.
Three Connected Handwriting Recognition Competitions at ICDAR 2009 were won by
our multi-dimensional LSTM recurrent neural networks [3,3a,4] through
the efforts of Alex. This was the first RNN system ever to win an official international pattern recognition competition. To our knowledge, this also was the first Deep Learning system ever (recurrent or not) to win such a contest:
6. ICDAR 2009
Arabic Connected Handwriting Competition of Univ. Braunschweig
5. ICDAR 2009 Handwritten Farsi/Arabic Character Recognition Competition
4. ICDAR 2009
French Connected Handwriting Competition (PDF) based on data from the RIMES campaign
Note that 4-8 are treated in more detail in the page on handwriting recognition.
3. The
Weizmann Human Action Dataset
of Weizmann Institute of Science, and the KTH Human Action Dataset of KTH Royal
Institute of Technology. New records set in 2010 [7], thanks to Somayeh's efforts.
2.
The Outex Texture Database, Univ. Oulu, 2002 [5].
1.
The ZuBuD database of
pictures of buildings in Zürich, ETHZ, 2003 [2].
Here is a
12 min Google Tech Talk video on fast deep / recurrent nets (only slides and voice)
at AGI 2011, summarizing results as of August 2011.
People keep asking: What is the secret of your successes?
There are two secrets:
(i) For competitions involving sequential data such as video and speech we use deep
(stacks of)
multi-dimensional [3]
Long Short-Term Memory (LSTM) recurrent networks
(1997) trained by Connectionist Temporal Classification (CTC, 2006) [3a].
This is what since 2009 has set records in recognising
connected handwriting
and speech.
(ii) For other competitions we use multi-column committees [10] of GPU-based max-pooling CNNs (2011) [8], where we apply (in the style of LeCun et al. 1989, Ranzato et al. 2007) efficient backpropagation (Linnainmaa 1970, Werbos 1981) to deep Neocognitron-like weight-sharing convolutional architectures (Fukushima 1979) with max-pooling layers (Weng 1992, Riesenhuber & Poggio 1999). Over two decades, LeCun's lab has invented many improvements of such CNNs. Our GPU-MPCNNs achieved the
first superhuman image recognition results
(2011) [18], and were the first Deep Learners to win contests in
object detection (2012)
and image segmentation (2012),
which require fast, non-redundant MPCNN image scans [21,22].
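For readers unfamiliar with the two ingredients named in (ii), here is a minimal illustrative sketch in plain Python. It is not our GPU code. The function names `max_pool_2x2` and `committee_average` are hypothetical, the pooling window is assumed to be a non-overlapping 2x2 block, and the committee is assumed to combine its columns by simply averaging their per-class output probabilities.

```python
def max_pool_2x2(image):
    """Non-overlapping 2x2 max-pooling over a 2D list of numbers.

    Each output value is the maximum of one 2x2 block of the input.
    A trailing odd row or column is dropped (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r][c], image[r][c + 1],
                image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]


def committee_average(column_outputs):
    """Average the class probabilities of several columns (networks).

    column_outputs holds one list of per-class probabilities per column.
    Returns the averaged probabilities and the index of the winning class.
    """
    n_columns = len(column_outputs)
    n_classes = len(column_outputs[0])
    averaged = [
        sum(column[k] for column in column_outputs) / n_columns
        for k in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda k: averaged[k])
    return averaged, predicted


if __name__ == "__main__":
    feature_map = [
        [1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 1, 2, 3],
        [4, 5, 6, 0],
    ]
    print(max_pool_2x2(feature_map))

    # Three hypothetical columns voting on a two-class problem.
    columns = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
    print(committee_average(columns))
```

In a real MPCNN the pooling layers alternate with learned convolutional layers, and each column is a full deep network trained by backpropagation; the sketch only shows the pooling and combination steps in isolation.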
Our algorithms not only were the first deep learning methods to win official international competitions (since 2009) and to become human-competitive, they also have numerous immediate industrial and medical applications. Apple & Google and many others adopted our techniques.
Are you an industrial company that wants to solve
interesting pattern recognition problems? Don't hesitate to contact
JS.
We have already developed:
1. State-of-the-art handwriting recognition for a software services company.
2. State-of-the-art steel defect detection for the world's largest steel maker.
3. State-of-the-art low-cost pattern recognition for a leading automotive supplier.
4. Low-power variants of our methods for apps running on cell phone chips.