## Our impact on the world's most valuable public companies: 1. Apple, 2. Alphabet (Google), 3. Microsoft, 4. Amazon ...
Jürgen Schmidhuber (pronounce: you_again shmidhoobuh)
Our deep learning methods developed since 1991 have transformed machine learning and Artificial Intelligence (AI), and are now available to billions of users through the world's four most valuable public companies: Apple, Alphabet (Google), Microsoft, and Amazon [1]. Many of the most widely used AI applications of these companies are based on our Long Short-Term Memory (LSTM) recurrent neural networks (RNNs), which learn from experience to solve previously unsolvable problems. The LSTM principle has become a foundation of what is now called deep learning (see survey [22]), especially for sequential data (but also for very deep feedforward networks [11,12]). LSTM-based systems can learn to translate languages, control robots, analyse images, summarise documents, recognise speech, videos, and handwriting, run chatbots, predict diseases, click rates, and stock markets, compose music, and much more, e.g., [22]. Most of our main peer-reviewed publications on LSTM appeared between 1997 and 2009, the year when LSTM became the first RNN to win international pattern recognition competitions, e.g., [8, 9, 9a-c, 10, 10a].
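The LSTM principle can be sketched in a few lines: a memory cell whose content is protected and updated by multiplicative gates, so that error signals can flow over long time lags. The following is a minimal illustrative sketch of one step of a standard LSTM cell with forget gate [8, 9], not the implementation used by any of the companies above; the function name, weight layout, and dimensions are my own assumptions.

```python
import numpy as np

def lstm_step(x, h_prev, c_prev, W, b):
    """One step of an LSTM cell with forget gate.
    x: input vector (length m); h_prev, c_prev: previous hidden and cell
    state (length n); W: weights of shape (4*n, m+n); b: bias (4*n,)."""
    n = len(h_prev)
    z = W @ np.concatenate([x, h_prev]) + b
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    i = sigmoid(z[0*n:1*n])   # input gate: how much new information to admit
    f = sigmoid(z[1*n:2*n])   # forget gate [9]: how much old cell content to keep
    o = sigmoid(z[2*n:3*n])   # output gate: how much cell content to expose
    g = np.tanh(z[3*n:4*n])   # candidate update
    c = f * c_prev + i * g    # gated cell state: protected error flow
    h = o * np.tanh(c)        # new hidden state
    return h, c
```

A sequence is processed by calling `lstm_step` once per time step, feeding each step's `h` and `c` into the next.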
The Chinese search giant Baidu also uses our CTC for speech recognition [2d].
Numerous other famous companies use LSTM for applications such as predictive maintenance, stock market prediction, click rate prediction, and automatic document analysis.
Another influential contribution of our lab at the Swiss AI Lab IDSIA is the fast GPU-based convolutional neural network (CNN) [18b,19], which won competitions in traffic sign recognition, handwriting recognition, and medical imaging [18c-20e], and greatly improved steel defect detection for Arcelor Mittal [3].
Even earlier, in 2009, our CTC-trained LSTM [10,10a] became the first recurrent neural network to win international pattern recognition competitions (connected handwriting recognition [10a]).
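CTC [10] lets an RNN label unsegmented sequences: at every time step the network emits a distribution over labels plus a special "blank" symbol, training marginalises over all alignments, and a simple decoder then merges repeated labels and removes blanks. A minimal sketch of that collapsing rule (best-path decoding only; the function name and blank convention are my own assumptions, not the authors' code):

```python
def ctc_best_path_decode(frame_label_ids, blank=0):
    """Collapse a per-frame argmax label sequence into a CTC output:
    merge consecutive repeats, then drop the blank symbol."""
    out, prev = [], None
    for label in frame_label_ids:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# Frames: blank a a blank b b blank b  ->  a b b
print(ctc_best_path_decode([0, 1, 1, 0, 2, 2, 0, 2]))  # [1, 2, 2]
```

Note how a blank between two identical labels (the two runs of `2` above) keeps them distinct, which is what lets CTC output repeated characters such as double letters in handwriting.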
Although our work has influenced many companies large and small, most of our pioneers of basic learning algorithms and methods for Artificial General Intelligence (AGI) are still based in Switzerland or affiliated with our company.
## References
[1] List of public corporations by market capitalization (Wikipedia, March 31, 2017). We ignore non-public companies such as Saudi Aramco, whose value was estimated (2016) at several trillion USD.
[2] Google's speech recognition for Android phones etc., based on our LSTM & CTC: Google Research Blog, Sep 2015 and Aug 2015.
[2a] Dramatic improvement of Google's speech recognition through LSTM: Alphr Technology, Jul 2015, or 9to5google, Jul 2015.
[2b] Apple's iPhone uses our LSTM, e.g., TechCrunch, Jul 2016, or noJitter, Jun 2016.
[2b+] Apple's Siri uses LSTM for various tasks, e.g., BGR.com, Jun 2016.
[2c] Microsoft's speech recognition also uses LSTM, e.g., TheRegister, Oct 2016, or Business Insider, Oct 2016.
[2d] Baidu's speech recognition also uses our CTC [10], e.g., VentureBeat, Jan 2016.
[2e] Amazon uses our LSTM for Alexa & Echo, e.g., Vogels' Blog, Nov 2016.
[2g] Google's image caption generation with LSTM: arXiv PDF, Nov 2014.
[2h] Google's automatic email answering with LSTM: WIRED, Mar 2015.
[2h+] Google's smart assistant Allo with LSTM: Google Research Blog, May 2016.
[2i] Google's dramatically improved Google Translate [10b] based on LSTM, e.g., arXiv report, Sep 2016; HotHardWare, Sep 2016; WIRED, Sep 2016; siliconAngle, Sep 2016.
[2j] IBM uses LSTM to analyze emotions (2014).
[2k] Microsoft uses LSTM for photo-real talking heads (2014).
[2m] Microsoft uses LSTM for learning to write programs (2017).
[3] Arcelor Mittal: our GPU-based CNNs for much better steel defect detection; see Masci et al., IJCNN 2012.
[4] Fukushima's CNN architecture [13] (1979) (with Max-Pooling [14], 1993) is trained [6] in the shift-invariant 1D case [15a] or 2D case [15, 16, 17] by Linnainmaa's automatic differentiation or backpropagation algorithm of 1970 [5] (extending earlier work in control theory [5a-c]).
[5] Linnainmaa, S. (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's thesis, Univ. Helsinki. (See also BIT Numerical Mathematics, 16(2):146-160, 1976.)
[5a] Kelley, H. J. (1960). Gradient theory of optimal flight paths. ARS Journal, 30(10):947-954.
[5b] Bryson, A. E. (1961). A gradient method for optimizing multi-stage allocation processes. In Proc. Harvard Univ. Symposium on digital computers and their applications.
[5c] Dreyfus, S. E. (1962). The numerical solution of variational problems. Journal of Mathematical Analysis and Applications, 5(1):30-45.
[6] Werbos, P. J. (1982). Applications of advances in nonlinear sensitivity analysis. In Proceedings of the 10th IFIP Conference, 31.8-4.9, NYC, pp. 762-770. (Extending thoughts in his 1974 thesis.)
[7a] Schmidhuber, J. (1992). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234-242. Based on TR FKI-148-91, TUM, 1991.
[7b] G. E. Hinton, R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
[7c] Raina, R., Madhavan, A., and Ng, A. (2009). Large-scale deep unsupervised learning using graphics processors. In Proc. 26th Annual International Conference on Machine Learning (ICML), pp. 873-880. ACM.
[8] Hochreiter, S. and Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8):1735-1780. Based on TR FKI-207-95, TUM (1995).
[9] Gers, F. A., Schmidhuber, J., and Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451-2471.
[9a] S. Fernandez, A. Graves, J. Schmidhuber. Sequence labelling in structured domains with hierarchical recurrent neural networks. In Proc. IJCAI 2007, pp. 774-779, Hyderabad, India, 2007.
[9b] A. Graves and J. Schmidhuber. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5-6):602-610, 2005.
[9c] J. Bayer, D. Wierstra, J. Togelius, J. Schmidhuber. Evolving memory cell structures for sequence learning. Proc. ICANN 2009, Cyprus, 2009.
[10] Graves, A., Fernandez, S., Gomez, F. J., and Schmidhuber, J. (2006). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural nets. Proc. ICML 2006, pp. 369-376.
[10a] A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, J. Schmidhuber. A novel connectionist system for improved unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5), 2009.
[10b] Y. Wu et al. (2016). Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Preprint arXiv:1609.08144.
[10b+] D. Britz et al. (2017). Massive Exploration of Neural Machine Translation Architectures. Preprint arXiv:1703.03906.
[10c] N. Jouppi et al. (2017). In-Datacenter Performance Analysis of a Tensor Processing Unit. Preprint arXiv:1704.04760.
[10d] A. Graves et al. Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471-476, 2016.
[11] Srivastava, R. K., Greff, K., Schmidhuber, J. Highway networks. Preprints arXiv:1505.00387 (May 2015) and arXiv:1507.06228 (Jul 2015). Also at NIPS 2015.
[12] He, K., Zhang, X., Ren, S., Sun, J. Deep residual learning for image recognition. Preprint arXiv:1512.03385 (Dec 2015).
[13] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4):193-202, 1980.
[14] Weng, J., Ahuja, N., and Huang, T. S. (1993). Learning recognition and segmentation of 3-D objects from 2-D images. Proc. 4th Intl. Conf. on Computer Vision, Berlin, Germany, pp. 121-128.
[15a] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, K. J. Lang. Phoneme recognition using time-delay neural networks. ATR Technical Report, 1987. (Also in IEEE TNN, 1989.)
[15] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541-551, 1989.
[16] M. A. Ranzato, Y. LeCun. A sparse and locally shift invariant feature extractor applied to document images. Proc. ICDAR 2007.
[17] D. Scherer, A. Mueller, S. Behnke. Evaluation of pooling operations in convolutional architectures for object recognition. In Proc. ICANN 2010.
[18] Ciresan, D. C., Meier, U., Gambardella, L. M., and Schmidhuber, J. (2010). Deep, big, simple neural nets for handwritten digit recognition. Neural Computation, 22(12):3207-3220.
[18b] D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, J. Schmidhuber. Flexible, High Performance Convolutional Neural Networks for Image Classification. International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011.
[18c] D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber. A Committee of Neural Networks for Traffic Sign Classification. International Joint Conference on Neural Networks (IJCNN-2011, San Francisco), 2011.
[18d] Results of the 2011 IJCNN traffic sign recognition contest.
[18e] Results of the 2011 ICDAR Chinese handwriting recognition competition.
[19] Ciresan, D. C., Meier, U., and Schmidhuber, J. (2012). Multi-column deep neural networks for image classification. Proc. CVPR, June 2012. Long preprint arXiv:1202.2745 [cs.CV], Feb 2012.
[20a] Results of the 2012 ICPR cancer detection contest.
[20b] Results of the 2013 MICCAI Grand Challenge (cancer detection).
[20c] D. C. Ciresan, A. Giusti, L. M. Gambardella, J. Schmidhuber. Mitosis Detection in Breast Cancer Histology Images using Deep Neural Networks. MICCAI 2013.
[20d] D. Ciresan, A. Giusti, L. Gambardella, J. Schmidhuber. Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images. NIPS 2012, Lake Tahoe, 2012.
[20d+] I. Arganda-Carreras, S. C. Turaga, D. R. Berger, D. Ciresan, A. Giusti, L. M. Gambardella, J. Schmidhuber, D. Laptev, S. Dwivedi, J. M. Buhmann, T. Liu, M. Seyedhosseini, T. Tasdizen, L. Kamentsky, R. Burget, V. Uher, X. Tan, C. Sun, T. Pham, E. Bas, M. G. Uzunbas, A. Cardona, J. Schindelin, H. S. Seung. Crowdsourcing the creation of image segmentation algorithms for connectomics. Frontiers in Neuroanatomy, November 2015.
[20e] J. Masci, A. Giusti, D. Ciresan, G. Fricout, J. Schmidhuber. A Fast Learning Algorithm for Image Segmentation with Max-Pooling Convolutional Networks. ICIP 2013. Preprint arXiv:1302.1690.
[22] Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61:85-117. Short version at Scholarpedia.