Jürgen Schmidhuber's page on
Nonlinear Independent Component Analysis (ICA), Unsupervised Learning, Redundancy Reduction
Weight patterns of receptive fields from ref 13.

Feature detectors found by our old unsupervised methods, such as Predictability Minimization (1992; more: 1996) and Low-Complexity Coding and Decoding (LOCOCODE, 1999), resemble those found by our supervised, fast computer vision systems based on deep and wide neural networks.

Real-world data consists of redundant components, but most pattern recognition algorithms work much better on non-redundant data with statistically independent components. Hence we are interested in methods that re-encode redundant data by stripping away the redundancies.

Schmidhuber's Predictability Minimization (PM) [1,3,8,10,13,14] apparently was the first nonlinear neural algorithm for generating factorial codes with statistically independent components, that is, for nonlinear ICA ("independent component analysis"). The input data may consist of nonlinear mixtures of basic features. PM is a co-evolutionary, unsupervised learning algorithm (1991 - ) based on neural feature detectors and predictors that fight each other in a minimax game: the predictors try to predict detector outputs from the outputs of other detectors, while the detectors try to become unpredictable, maximising the very function that the predictors minimise.
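
The following toy sketch illustrates the PM principle in modern notation, assuming PyTorch; the network shapes, names (detector, predictors, prediction_error) and training details are illustrative stand-ins, not taken from the original papers:

import torch
import torch.nn as nn

n_in, n_code = 8, 4
detector = nn.Sequential(nn.Linear(n_in, n_code), nn.Sigmoid())
predictors = nn.ModuleList(
    [nn.Linear(n_code - 1, 1) for _ in range(n_code)]  # one predictor per code unit
)
opt_d = torch.optim.Adam(detector.parameters(), lr=1e-3)
opt_p = torch.optim.Adam(predictors.parameters(), lr=1e-3)

def prediction_error(x):
    # each predictor i tries to guess code unit i from all the OTHER code units
    code = detector(x)
    err = 0.0
    for i, p in enumerate(predictors):
        others = torch.cat([code[:, :i], code[:, i + 1:]], dim=1)
        err = err + ((p(others).squeeze(1) - code[:, i]) ** 2).mean()
    return err / n_code

for step in range(1000):
    x = torch.randn(32, n_in)            # stand-in for redundant input data
    opt_p.zero_grad()                    # the predictors MINIMIZE the error ...
    prediction_error(x).backward()
    opt_p.step()
    opt_d.zero_grad()                    # ... while the detector MAXIMIZES the
    (-prediction_error(x)).backward()    # same quantity, driving its code units
    opt_d.step()                         # towards statistical independence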

PM has various potential advantages over other neural methods for redundancy reduction. When applied to image data, PM automatically comes up with feature detectors reminiscent of those found in biological systems, such as orientation-sensitive edge detectors, on-center-off-surround detectors, and bar detectors.

Predictability Maximization (1993) [7] pursues exactly the opposite objective of PM: it extracts invariant features from parallel data streams. In the 2000s, this approach also became known as "siamese networks".
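
A minimal sketch of this opposite objective, read through the modern siamese lens: one weight-shared network encodes two noisy views of the same underlying data and is trained so that each output is predictable from the other. The variance term that blocks the trivial constant solution is my own simplification, not the mechanism of ref 7:

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 4), nn.Tanh())   # shared ("siamese") weights
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.randn(32, 8)
    view_a = x + 0.1 * torch.randn_like(x)        # two noisy views of the same x
    view_b = x + 0.1 * torch.randn_like(x)
    za, zb = net(view_a), net(view_b)
    agree = ((za - zb) ** 2).mean()               # mutual predictability of outputs
    collapse = (1.0 - za.var(dim=0)).clamp(min=0).mean()  # keep outputs informative
    loss = agree + collapse
    opt.zero_grad()
    loss.backward()
    opt.step()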

An alternative method called LOCOCODE (1995 - ) performs ICA as a by-product of discovering simple networks, with low information-theoretic complexity, that code the input data [11,12,15-18]. It can outperform previous methods for ICA and PCA, and it establishes a link between regularization and unsupervised learning.
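
As a rough illustration of the principle only: LOCOCODE trains an autoencoder whose weights are forced towards low descriptive complexity. The original uses Flat Minimum Search; in the sketch below, plain L2 weight decay stands in as a much cruder complexity penalty, and the data are a made-up linear mixture of independent sources:

import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(8, 4), nn.Sigmoid())   # code layer: 4 units
dec = nn.Linear(4, 8)
opt = torch.optim.Adam(
    list(enc.parameters()) + list(dec.parameters()),
    lr=1e-3,
    weight_decay=1e-3,   # crude stand-in for the low-complexity pressure
)
A = torch.randn(4, 8)    # fixed mixing matrix: observations x = s @ A

for step in range(2000):
    s = torch.rand(32, 4)                    # independent sources
    x = s @ A                                # observed redundant mixture
    loss = ((dec(enc(x)) - x) ** 2).mean()   # reconstruct the input ...
    opt.zero_grad()
    loss.backward()
    opt.step()
# ... under complexity pressure, the code units tend to recover
# (transformed) independent source components as a by-product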

Adaptive methods for sequence compression and sequence coding (see refs 1a-1j below) are even more general approaches to unsupervised learning. In particular, neural history compressors or hierarchical temporal memories (1991) [1d-1f] compactly encode sequential data for Deep Learning: a lower-level network learns to predict its next input, and only the unexpected inputs are handed up to the next level, as sketched below.
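
The core trick, sketched here with a deliberately trivial table-based predictor instead of the RNNs of refs 1d-1f; function and variable names are mine:

from collections import defaultdict

def unexpected_events(seq):
    # maps previous symbol -> predicted next symbol (None = no prediction yet)
    pred = defaultdict(lambda: None)
    passed_up = []      # what the next-higher level would receive
    prev = None
    for t, sym in enumerate(seq):
        if pred[prev] != sym:        # prediction failed: pass (time, symbol) up
            passed_up.append((t, sym))
        pred[prev] = sym             # learn: after `prev`, expect `sym`
        prev = sym
    return passed_up

print(unexpected_events(list("ababababab" + "c" + "ababababab")))
# only a handful of events survive once the 'ab' pattern becomes predictable;
# the higher level can in turn compress these surviving events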

The Neural Heat Exchanger [9] (presented in talks by Schmidhuber since 1990) is a supervised variant of Hinton and Dayan's 1994 unsupervised "Helmholtz machine".



Predictors and feature detectors that fight each other: the feature detectors try to become unpredictable.

Related links:

Full publication list
(with additional HTML and pdf links)

Active Exploration

Formal Theory of Fun and Creativity

Theory of beauty and Low-Complexity Art

Recurrent neural networks

Deep learning wins many pattern recognition contests

JS' first Deep Learner of 1991 + Deep Learning Timeline 1962-2013 (also summarises the origins of backpropagation, still the central algorithm of Deep Learning)

1991: Fundamental Deep Learning Problem discovered and analysed and partially solved

Deep Learning since 1991


18. S. Hochreiter and J. Schmidhuber. Source separation as a by-product of regularization. In M. S. Kearns, S. A. Solla, D. A. Cohn, eds., Advances in Neural Information Processing Systems 11 (NIPS), pages 459-465, MIT Press, Cambridge, MA, 1999. PDF. HTML.

17. S. Hochreiter and J. Schmidhuber. LOCOCODE performs nonlinear ICA without knowing the number of sources. In J.-F. Cardoso, C. Jutten, P. Loubaton, eds., Proceedings of the First International Workshop on Independent Component Analysis and Signal Separation (ICA'99), pages 149-154, Aussois, France, 1999.

16. S. Hochreiter and J. Schmidhuber. Feature extraction through LOCOCODE. Neural Computation, 11(3):679-714, 1999. PDF. HTML (some pictures missing).

15. S. Hochreiter and J. Schmidhuber. LOCOCODE versus PCA and ICA. In L. Niklasson, M. Boden, T. Ziemke, eds., Proceedings of the International Conference on Artificial Neural Networks, Sweden, pages 669-674, Springer, London, 1998.

14. J. Schmidhuber. Neural predictors for detecting and removing redundant information. In H. Cruse, J. Dean, and H. Ritter, editors, Adaptive Behavior and Learning. Kluwer, 1998. PDF. HTML.

13. N. N. Schraudolph, M. Eldracher, J. Schmidhuber. Processing images by semi-linear predictability minimization. Network, 10(2):133-169, 1999. PDF.

12. S. Hochreiter and J. Schmidhuber. Low-complexity coding and decoding. In K. M. Wong, I. King, D. Yeung, eds., Theoretical Aspects of Neural Computation: A Multidisciplinary Perspective, pages 297-306, Springer, 1997.

11. S. Hochreiter and J. Schmidhuber. Unsupervised coding with LOCOCODE. In W. Gerstner, A. Germond, M. Hasler, J.-D. Nicoud, eds., Proceedings of the International Conference on Artificial Neural Networks, Lausanne, Switzerland, pages 655-660, Springer, 1997.

10. J. Schmidhuber, M. Eldracher, B. Foltin. Semilinear predictability minimization produces well-known feature detectors. Neural Computation, 8(4):773-786, 1996. PDF. HTML.

9. J. Schmidhuber. The Neural Heat Exchanger. In S. Amari, L. Xu, L. Chan, I. King, K. Leung, eds., Progress in Neural Information Processing: Proceedings of the Intl. Conference on Neural Information Processing, pages 194-197, Springer, Hong Kong, 1996. Earlier presentations in talks at universities since 1990. PDF. HTML.

8. J. Schmidhuber and B. Foltin. Semilinear predictability minimization produces orientation sensitive edge detectors. Technical Report FKI-201-94, Fakultät für Informatik, Technische Universität München, December 1994.

7. J. Schmidhuber and D. Prelinger. Discovering predictable classifications. Neural Computation, 5(4):625-635, 1993. PDF. HTML.

6. J. Schmidhuber and D. Prelinger. Unsupervised extraction of predictable abstract features. In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 601-604. Springer, 1993.

5. J. Schmidhuber and D. Prelinger. A novel unsupervised classification method. In Proc. of the Intl. Conf. on Artificial Neural Networks, Brighton, pages 91-96. IEE, 1993.

4. J. Schmidhuber, M. C. Mozer, and D. Prelinger. Continuous history compression. In H. Hüning, S. Neuhauser, M. Raus, and W. Ritschel, editors, Proc. of Intl. Workshop on Neural Networks, RWTH Aachen, pages 87-95. Augustinus, 1993.

3. J. Schmidhuber. Learning factorial codes by predictability minimization. Neural Computation, 4(6):863-879, 1992. PDF. HTML.

2. J. Schmidhuber and D. Prelinger. Discovering predictable classifications. Technical Report CU-CS-626-92, Dept. of Comp. Sci., University of Colorado at Boulder, November 1992.

1. J. Schmidhuber. Learning factorial codes by predictability minimization. Technical Report CU-CS-565-91, Dept. of Comp. Sci., University of Colorado at Boulder, December 1991.


SEQUENCE COMPRESSION

Adaptive methods for sequence compression and sequence coding are important instances of redundancy reduction and unsupervised learning (compare the section above and the work on recurrent networks).

1j. M. Klapper-Rybicka, N. N. Schraudolph, J. Schmidhuber. Unsupervised learning in LSTM recurrent neural networks. In G. Dorffner, H. Bischof, K. Hornik, eds., Proceedings of Int. Conf. on Artificial Neural Networks ICANN'01, Vienna, LNCS 2130, pages 684-691, Springer, 2001. PDF.

1i. J. Schmidhuber and S. Heil. Compressing texts with neural nets. In Dale, Moisl and Somers, eds., Handbook of Natural Language Processing, Marcel Dekker, 1998.

1h. J. Schmidhuber and S. Heil. Sequential neural text compression. IEEE Transactions on Neural Networks, 7(1):142-146, 1996. PDF. HTML.

1g. J. Schmidhuber and S. Heil. Predictive coding with neural nets: application to text compression. In G. Tesauro, D. S. Touretzky and T. K. Leen, eds., Advances in Neural Information Processing Systems 7, pages 1047-1054. MIT Press, Cambridge, MA, 1995.

1f. J. Schmidhuber, M. C. Mozer, and D. Prelinger. Continuous history compression. In H. Hüning, S. Neuhauser, M. Raus, and W. Ritschel, editors, Proc. of Intl. Workshop on Neural Networks, RWTH Aachen, pages 87-95. Augustinus, 1993.

1e. J. Schmidhuber. Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234-242, 1992. PDF. HTML.

1d. J. Schmidhuber. Learning unambiguous reduced sequence descriptions. In J. E. Moody, S. J. Hanson, and R. P. Lippman, editors, Advances in Neural Information Processing Systems 4 (NIPS 4), pages 291-298. San Mateo, CA: Morgan Kaufmann, 1992.

1c. J. Schmidhuber. Adaptive history compression for learning to divide and conquer. In Proc. International Joint Conference on Neural Networks, Singapore, volume 2, pages 1130-1135. IEEE, 1991.

1b. J. Schmidhuber. Adaptive decomposition of time. In T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, editors, Artificial Neural Networks, pages 909-914. Elsevier Science Publishers B.V., North-Holland, 1991.

1a. J. Schmidhuber. Neural sequence chunkers. Technical Report FKI-148-91, Institut für Informatik, Technische Universität München, April 1991.
