Our impact on the world's most valuable public companies (Google, Apple, Microsoft, Amazon, etc)

Jürgen Schmidhuber (pronounce: you_again shmidhoobuh)
The Swiss AI Lab, IDSIA (USI & SUPSI), April 2017

Our deep learning methods developed since 1991 have transformed machine learning and Artificial Intelligence (AI), and are now available to billions of users through the four most valuable public companies in the world: Apple (#1 as of 31 March 2017 with a market capitalization of USD 753bn), Google (Alphabet, #2, 573bn), Microsoft (#3, 508bn), and Amazon (#4, 423bn) [1].

Many of the most widely used AI applications of these companies are now based on our Long Short-Term Memory (LSTM) recurrent neural networks (RNNs), which learn from experience to solve all kinds of previously unsolvable problems. The LSTM principle has become a foundation of what's now called deep learning (see survey), especially for sequential data (but also for very deep feedforward networks [11,12]). LSTM-based systems can learn to translate languages, control robots, analyse images, summarise documents, recognise speech and videos and handwriting, run chat bots, predict diseases and click rates and stock markets, compose music, and much more, e.g., [22]. Most of our main peer-reviewed publications on LSTM appeared between 1997 and 2009, the year when LSTM became the first RNN to win international pattern recognition competitions, e.g., [8, 9, 9a-c, 10, 10a].

Apple explained at its WWDC 2016 developer conference how our LSTM is improving its iPhone [2b], for example, the QuickType function. Apple's Siri also uses LSTM in various ways [2b+].

Google's speech recognition [2] for over 1.5 billion Android phones and many other devices is also based on LSTM (1997) [8] with forget gates (2000) [9] trained by our "Connectionist Temporal Classification (CTC)" (2006) [10]. In 2015, this approach dramatically improved Google's recognition rate not only by 5% or 10% (which already would have been great) but by almost 50% [2a].

Google is using our rather universal LSTM also for image caption generation [2g], automatic email answering [2h], its new smart assistant Allo [2i], and its dramatically improved Google Translate [10b, 2i]. In fact, a substantial fraction of the awesome computational power in Google's datacenters is now used for LSTM [10c]. Will Google end up as one huge LSTM?

Microsoft uses LSTM not only for its own greatly improved speech recognition [2c] but also for photo-real talking heads [2k] and for learning to write code [2m], amongst other things.

Amazon's famous Echo or Alexa also speaks to you in your home [2e] through our bidirectional [9b] LSTM.

The Chinese search giant Baidu is also building [2d] on our methods such as CTC [10].

IBM used LSTM to analyze emotions [2j], amongst other things.

Numerous other famous companies are using LSTM for all kinds of applications such as predictive maintenance, stock market prediction, click rate prediction, automatic document analysis, etc.

Another influential contribution of our lab at IDSIA (since 2010) was to greatly speed up [18, 18b-d, 19, 20a-e] deep supervised feedforward neural networks (NNs) on NVIDIA's fast graphics processors (GPUs), in particular, convolutional NNs or CNNs [4]. This convinced the Machine Learning community that traditional unsupervised pre-training of NNs (1991-2009) [e.g., 7a-c] is not required. In 2011, our fast GPU-based CNNs [18b] achieved the first superhuman pattern recognition result in the history of computer vision [18c-d,19], and then kept winning contests with larger and larger images [20a-d+]. In particular, in 2012, we had the first deep NNs to win medical imaging contests [20a-d] (important for healthcare which represents 10% of the world's GDP). Our fast CNN image scanners were over 1000 times faster than previous methods [20e]. Today, many startups as well as established companies such as Facebook & IBM & Google are using such deep GPU-CNNs for numerous applications [22]. Arcelor Mittal, the world's largest steel maker, worked with us to greatly improve steel defect detection [3]. In May 2015, we also had the first working very deep NNs with hundreds of layers [11]; a special case thereof was used by Microsoft [12] to improve image recognition. NVIDIA rebranded itself as a deep learning company. BTW, thanks to NVIDIA for our 2016 NN Pioneers of AI Award, and for generously funding our research!
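The highway-net formula given in reference [11], g(x)x + t(x)h(x), and its residual special case g(x)=t(x)=1 [12] can be sketched in a few lines of Python. This is only a minimal illustration, not the published implementation: the function names (`highway_layer`, `residual_layer`, `sigmoid_gate`, `tanh_transform`, `ones`) and the particular choices for g, t and h are assumptions made for this example, and the products are taken elementwise over plain lists of floats.

```python
import math
from typing import Callable, List

Vector = List[float]
Fn = Callable[[Vector], Vector]


def highway_layer(x: Vector, g: Fn, t: Fn, h: Fn) -> Vector:
    """Compute g(x)*x + t(x)*h(x) elementwise, as in reference [11]."""
    gx, tx, hx = g(x), t(x), h(x)
    return [gi * xi + ti * hi for gi, xi, ti, hi in zip(gx, x, tx, hx)]


def ones(x: Vector) -> Vector:
    """Constant gate equal to 1 for every unit."""
    return [1.0] * len(x)


def sigmoid_gate(x: Vector) -> Vector:
    """Illustrative (assumed) gate: an elementwise logistic sigmoid."""
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def tanh_transform(x: Vector) -> Vector:
    """Illustrative (assumed) transform h: an elementwise tanh."""
    return [math.tanh(v) for v in x]


def residual_layer(x: Vector, h: Fn) -> Vector:
    """Special case g(x) = t(x) = 1, which reduces to x + h(x)."""
    return highway_layer(x, ones, ones, h)


x = [0.5, -1.0, 2.0]
# Gated highway layer with assumed sigmoid gates for both g and t.
highway_out = highway_layer(x, sigmoid_gate, sigmoid_gate, tanh_transform)
# Residual special case: every component equals x_i + tanh(x_i).
residual_out = residual_layer(x, tanh_transform)
print(highway_out)
print(residual_out)
```

With both gates fixed to 1, the input is always carried through unchanged and the layer only has to learn the correction h(x); with learned gates, the network can decide per unit how much of the input to carry and how much of the transform to add.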

[Image: Deep Learning Wins Three Connected Handwriting Recognition Competitions at ICDAR 2009]

Even earlier, in 2009, our CTC-trained LSTM [10,10a] became the first recurrent neural network to win competitions. Our lead author Alex Graves [10a] later joined DeepMind, a startup company heavily influenced by other former students of my lab: DeepMind's first PhDs in Artificial Intelligence and Machine Learning were PhD students at IDSIA, one of them DeepMind's co-founder (Shane Legg), one of them the first employee (Daan Wierstra). (The other two co-founders were not from my lab and had different backgrounds in biological neuroscience and business.) DeepMind was later bought by Google for about $600M; Alex became first author of DeepMind's recent Nature paper [10d]. BTW, thanks to Google DeepMind for generously funding our research!

Although our work has influenced many companies large and small, most of our pioneers of basic learning algorithms and methods for Artificial General Intelligence (AGI) are still based in Switzerland or affiliated with our company NNAISENSE. Its name is pronounced like "nascence," because it's about the birth of a general purpose Neural Network-based Artificial Intelligence (NNAI). It has 5 co-founders (CEO Faustino Gomez, Jan Koutnik, Jonathan Masci, Bas Steunebrink, and myself), brilliant advisors (Sepp Hochreiter, Marcus Hutter, Jaan Tallinn), outstanding employees, and revenues through ongoing state-of-the-art applications in industry and finance. We believe that the successes above are just the beginning, and that we can go far beyond what's possible today, through novel variants of learning to learn and recursive self-improvement (since 1987) and artificial curiosity and creativity and optimal program search and large reinforcement learning RNNs, to pull off the big practical breakthrough that will change everything, in line with my old motto since the 1970s: "build an AI smarter than myself such that I can retire" (e.g., H+ magazine, Jan 2010).

Related articles: long interview at NPA & ACM (Oct 2016, short version in IT World), WIRED (Nov 2016), Bloomberg (Jan 2017), Guardian (April 2017, front page), NY Times (Nov 2016, front page), Financial Times (Nov 2016, also here), Inverse (Dec 2016), Intl. Business Times (Feb 2016), BeMyApp (Mar 2016), Informilo (Jan 2016), InfoQ (Mar 2016). Also in leading German language newspapers: ZEIT (May 2016, ZEIT online in June), Spiegel (Europe's top news magazine, Feb 2016), NZZ 1 & 2 (August 2016), Tagesanzeiger (Sep 2016), Beobachter (Sep 2016), CHIP (April 2016), Computerwoche (July 2016), WiWo (Jan 2016), Spiegel (Jan 2016), Focus (Mar 2016), Welt (Mar 2016), SZ (Mar 2016), FAZ (Dec 2015, title page), NZZ (Nov 2015). More in Netzoekonom (Mar 2016), Performer (Oct 2016), WiWo (Feb 2016), Focus (Jan 2016), Bunte (Jan 2016). Earlier: Handelsblatt (Jun 2015), INNS Big Data (Feb 2015), KurzweilAI (Nov 2012), Fifth Conference (June 2010) ... Disclaimer: I am not responsible for everything that's written in these articles!


[1] We ignore non-public companies such as Saudi Aramco whose value was estimated at several trillions of USD.

[2] Google's speech recognition for Android phones etc. based on our LSTM & CTC: Google Research Blog, Sep 2015 and Aug 2015

[2a] Dramatic improvement of Google's speech recognition through LSTM: Alphr Technology, Jul 2015, or 9to5google, Jul 2015

[2b] Apple's iPhone uses our LSTM, e.g., TechCrunch, Jul 2016, or noJitter, Jun 2016

[2b+] Apple's Siri uses LSTM for various tasks, e.g., BGR.com, Jun 2016

[2c] Microsoft's speech recognition also uses LSTM, e.g., TheRegister, Oct 2016 or Business Insider, Oct 2016

[2d] Baidu's speech recognition also uses our CTC [10], e.g., VentureBeat, Jan 2016

[2e] Amazon uses our LSTM for Alexa & Echo, e.g., Vogels' Blog, Nov 2016

[2g] Google's image caption generation with LSTM: arXiv PDF, Nov 2014

[2h] Google's automatic email answering with LSTM: WIRED, Mar 2015

[2h] Google's smart assistant Allo with LSTM: Google Research Blog, May 2016

[2i] Google's dramatically improved Google Translate [10b] based on LSTM, e.g., arXiv report, Sep 2016, or HotHardWare, Sep 2016, or WIRED, Sep 2016, or siliconAngle, Sep 2016

[2j] IBM uses LSTM to analyze emotions (2014)

[2k] Microsoft uses LSTM for photo-real talking heads (2014)

[2m] Microsoft uses LSTM for learning to write programs (2017)

[3] Arcelor Mittal: our GPU-based CNNs for much better steel defect detection; see Masci et al., IJCNN 2012

[4] Fukushima's CNN architecture [13] (1979) (with Max-Pooling [14], 1993) is trained [6] in the shift-invariant 1D case [15a] or 2D case [15, 16, 17] by Linnainmaa's automatic differentiation or backpropagation algorithm of 1970 [5] (extending earlier work in control theory [5a-c]).

[5] Linnainmaa, S. (1970). The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's thesis, Univ. Helsinki. (See also BIT Numerical Mathematics, 16(2):146-160, 1976.)

[5a] Kelley, H. J. (1960). Gradient theory of optimal flight paths. ARS Journal, 30(10):947-954.

[5b] Bryson, A. E. (1961). A gradient method for optimizing multi-stage allocation processes. In Proc. Harvard Univ. Symposium on digital computers and their applications.

[5c] Dreyfus, S. E. (1962). The numerical solution of variational problems. Journal of Mathematical Analysis and Applications, 5(1):30-45.

[6] Werbos, P. J. (1982). Applications of advances in nonlinear sensitivity analysis. In Proceedings of the 10th IFIP Conference, 31.8 - 4.9, NYC, pp. 762-770. (Extending thoughts in his 1974 thesis.)

[7a] Schmidhuber, J. (1992). Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234-242. Based on TR FKI-148-91, TUM, 1991. More.

[7b] G. E. Hinton, R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, Vol. 313. no. 5786, pp. 504 - 507, 2006.

[7c] Raina, R., Madhavan, A., and Ng, A. (2009). Large-scale deep unsupervised learning using graphics processors. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML), pages 873-880. ACM.

[8] Hochreiter, S. and Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8):1735-1780. Based on TR FKI-207-95, TUM (1995). More.

[9] Gers, F. A., Schmidhuber, J., and Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451-2471.

[9a] S. Fernandez, A. Graves, J. Schmidhuber. Sequence labelling in structured domains with hierarchical recurrent neural networks. In Proc. IJCAI 07, p. 774-779, Hyderabad, India, 2007

[9b] A. Graves and J. Schmidhuber. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18:5-6, pp. 602-610, 2005.

[9c] J. Bayer, D. Wierstra, J. Togelius, J. Schmidhuber. Evolving memory cell structures for sequence learning. Proc. ICANN-09, Cyprus, 2009.

[10] Graves, A., Fernandez, S., Gomez, F. J., and Schmidhuber, J. (2006). Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural nets. Proc. ICML'06, pp. 369-376.

[10a] A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, J. Schmidhuber. A Novel Connectionist System for Improved Unconstrained Handwriting Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 5, 2009.

[10b] Wu et al (2016). Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Preprint arXiv:1609.08144

[10c] Jouppi et al (2017). In-Datacenter Performance Analysis of a Tensor Processing Unit. Preprint arXiv:1704.04760 [cs.AR]

[10d] A. Graves et al. Hybrid computing using a neural network with dynamic external memory. Nature 538.7626 (2016): 471-476.

[11] Srivastava, R. K., Greff, K., Schmidhuber, J. Highway networks. Preprints arXiv:1505.00387 (May 2015) and arXiv:1507.06228 (Jul 2015). Also at NIPS'2015. The first working very deep feedforward nets with over 100 layers. Let g, t, h denote non-linear differentiable functions. Each non-input layer of a highway net computes g(x)x + t(x)h(x), where x is the data from the previous layer. (Like LSTM [8] with forget gates [9] for RNNs.) Resnets [12] are a special case of this where g(x)=t(x)=const=1.

[Image: Microsoft dominated the ImageNet 2015 contest through a deep feedforward LSTM without gates]

[12] He, K., Zhang, X., Ren, S., Sun, J. Deep residual learning for image recognition. Preprint arXiv:1512.03385 (Dec 2015). Residual nets [12] are a special case of highway nets [11], with g(x)=1 (a typical highway net initialisation) and t(x)=1.

[13] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4): 193-202, 1980. Scholarpedia.

[14] Weng, J., Ahuja, N., and Huang, T. S. (1993). Learning recognition and segmentation of 3-D objects from 2-D images. Proc. 4th Intl. Conf. Computer Vision, Berlin, Germany, pp. 121-128.

[15a] A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, K. J. Lang. Phoneme Recognition using Time-Delay Neural Networks. ATR Tech report, 1987. (Also in IEEE TNN, 1989.)

[15] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel: Backpropagation Applied to Handwritten Zip Code Recognition, Neural Computation, 1(4):541-551, 1989.

[16] M. A. Ranzato, Y. LeCun: A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images. Proc. ICDAR, 2007

[17] D. Scherer, A. Mueller, S. Behnke. Evaluation of pooling operations in convolutional architectures for object recognition. In Proc. ICANN 2010.

[Image: Deep Learning Neural Networks are the best artificial offline recognisers of Chinese characters from the ICDAR 2013 competition (3755 classes), approaching human performance]

[18] Ciresan, D. C., Meier, U., Gambardella, L. M., and Schmidhuber, J. (2010). Deep big simple neural nets for handwritten digit recognition. Neural Computation, 22(12):3207-3220.

[18b] D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, J. Schmidhuber. Flexible, High Performance Convolutional Neural Networks for Image Classification. International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011. [Speeding up deep CNNs on GPU by a factor of 60. Basis of computer vision contest winners since 2011.]

[Image: IJCNN 2011 on-site Traffic Sign Recognition Competition (1st rank, 2 August 2011, 0.56% error rate, the only method better than humans, who achieved 1.16% on average; 3rd place for 1.69%) (Juergen Schmidhuber)]

[18c] D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber. A Committee of Neural Networks for Traffic Sign Classification. International Joint Conference on Neural Networks (IJCNN-2011, San Francisco), 2011.

[18d] Results of 2011 IJCNN traffic sign recognition contest

[18e] Results of 2011 ICDAR Chinese handwriting recognition competition: WWW site, PDF.

[19] Ciresan, D. C., Meier, U., and Schmidhuber, J. (2012c). Multi-column deep neural networks for image classification. Proc. CVPR, June 2012. Long preprint arXiv:1202.2745v1 [cs.CV], Feb 2012.

[Image: Deep Learning Wins MICCAI 2013 Grand Challenge on Mitosis Detection]

[20a] Results of 2012 ICPR cancer detection contest

[20b] Results of 2013 MICCAI Grand Challenge (cancer detection)

[20c] D. C. Ciresan, A. Giusti, L. M. Gambardella, J. Schmidhuber. Mitosis Detection in Breast Cancer Histology Images using Deep Neural Networks. MICCAI 2013.

[Image: Deep Learning Wins 2012 Brain Image Segmentation Contest]

[20d] D. Ciresan, A. Giusti, L. Gambardella, J. Schmidhuber. Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images. NIPS 2012, Lake Tahoe, 2012.

[20d+] I. Arganda-Carreras, S. C. Turaga, D. R. Berger, D. Ciresan, A. Giusti, L. M. Gambardella, J. Schmidhuber, D. Laptev, S. Dwivedi, J. M. Buhmann, T. Liu, M. Seyedhosseini, T. Tasdizen, L. Kamentsky, R. Burget, V. Uher, X. Tan, C. Sun, T. Pham, E. Bas, M. G. Uzunbas, A. Cardona, J. Schindelin, H. S. Seung. Crowdsourcing the creation of image segmentation algorithms for connectomics. Front. Neuroanatomy, November 2015.

[20e] J. Masci, A. Giusti, D. Ciresan, G. Fricout, J. Schmidhuber. A Fast Learning Algorithm for Image Segmentation with Max-Pooling Convolutional Networks. ICIP 2013. Preprint arXiv:1302.1690.

[22] Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117. More. Short version at Scholarpedia.
