Jürgen Schmidhuber's page on
Nonlinear Independent Component Analysis (ICA), Unsupervised Learning, Redundancy Reduction
Weight patterns of receptive fields from ref 13.

Feature detectors found by our old unsupervised methods, such as Predictability Minimization (1992; more: 1996) and Low-Complexity Coding and Decoding (LOCOCODE, 1999), resemble those found by our supervised, fast computer vision systems based on deep and wide neural networks.

Real-world data consists of redundant components, but most pattern recognition algorithms work much better on non-redundant data with statistically independent components. Hence we are interested in methods that re-encode redundant data by stripping away the redundancies.

Schmidhuber's Predictability Minimization (PM) [1,3,8,10,13,14] apparently was the first nonlinear neural algorithm for generating factorial codes with statistically independent components, that is, for nonlinear ICA ("independent component analysis"). The input data may consist of nonlinear mixtures of basic features. PM is a co-evolutionary, unsupervised learning algorithm (1991 - ) based on neural feature detectors and predictors that fight each other in a minimax game: the predictors try to predict detector outputs from the outputs of other detectors, while the detectors try to become unpredictable, maximising the very function that the predictors minimise.
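
The following toy sketch illustrates the PM principle in modern notation, assuming PyTorch; the network shapes, names (detector, predictors, prediction_error) and training details are illustrative stand-ins, not taken from the original papers:

import torch
import torch.nn as nn

n_in, n_code = 8, 4
detector = nn.Sequential(nn.Linear(n_in, n_code), nn.Sigmoid())
predictors = nn.ModuleList(
    [nn.Linear(n_code - 1, 1) for _ in range(n_code)]  # one predictor per code unit
)
opt_d = torch.optim.Adam(detector.parameters(), lr=1e-3)
opt_p = torch.optim.Adam(predictors.parameters(), lr=1e-3)

def prediction_error(x):
    # each predictor i tries to guess code unit i from all the OTHER code units
    code = detector(x)
    err = 0.0
    for i, p in enumerate(predictors):
        others = torch.cat([code[:, :i], code[:, i + 1:]], dim=1)
        err = err + ((p(others).squeeze(1) - code[:, i]) ** 2).mean()
    return err / n_code

for step in range(1000):
    x = torch.randn(32, n_in)            # stand-in for redundant input data
    opt_p.zero_grad()                    # the predictors MINIMIZE the error ...
    prediction_error(x).backward()
    opt_p.step()
    opt_d.zero_grad()                    # ... while the detector MAXIMIZES the
    (-prediction_error(x)).backward()    # same quantity, driving its code units
    opt_d.step()                         # towards statistical independence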

PM has various potential advantages over other neural methods for redundancy reduction. When applied to image data, PM automatically comes up with feature detectors reminiscent of those found in biological systems, such as orientation-sensitive edge detectors, on-center-off-surround detectors, and bar detectors.

Predictability Maximization (1993) [7] pursues exactly the opposite objective of PM: it extracts invariant features from parallel data streams. In the 2000s, this approach also became known as "siamese networks".
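
A minimal sketch of this opposite objective, read through the modern siamese lens: one weight-shared network encodes two noisy views of the same underlying data and is trained so that each output is predictable from the other. The variance term that blocks the trivial constant solution is my own simplification, not the mechanism of ref 7:

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(8, 4), nn.Tanh())   # shared ("siamese") weights
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.randn(32, 8)
    view_a = x + 0.1 * torch.randn_like(x)        # two noisy views of the same x
    view_b = x + 0.1 * torch.randn_like(x)
    za, zb = net(view_a), net(view_b)
    agree = ((za - zb) ** 2).mean()               # mutual predictability of outputs
    collapse = (1.0 - za.var(dim=0)).clamp(min=0).mean()  # keep outputs informative
    loss = agree + collapse
    opt.zero_grad()
    loss.backward()
    opt.step()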

An alternative method called LOCOCODE (1995 - ) performs ICA as a by-product of discovering simple networks, with low information-theoretic complexity, that code the input data [11,12,15-18]. It can outperform previous methods for ICA and PCA, and it establishes a link between regularization and unsupervised learning.
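
As a rough illustration of the principle only: LOCOCODE trains an autoencoder whose weights are forced towards low descriptive complexity. The original uses Flat Minimum Search; in the sketch below, plain L2 weight decay stands in as a much cruder complexity penalty, and the data are a made-up linear mixture of independent sources:

import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(8, 4), nn.Sigmoid())   # code layer: 4 units
dec = nn.Linear(4, 8)
opt = torch.optim.Adam(
    list(enc.parameters()) + list(dec.parameters()),
    lr=1e-3,
    weight_decay=1e-3,   # crude stand-in for the low-complexity pressure
)
A = torch.randn(4, 8)    # fixed mixing matrix: observations x = s @ A

for step in range(2000):
    s = torch.rand(32, 4)                    # independent sources
    x = s @ A                                # observed redundant mixture
    loss = ((dec(enc(x)) - x) ** 2).mean()   # reconstruct the input ...
    opt.zero_grad()
    loss.backward()
    opt.step()
# ... under complexity pressure, the code units tend to recover
# (transformed) independent source components as a by-product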

Adaptive methods for sequence compression and sequence coding (see refs 1a-1j below) are even more general approaches to unsupervised learning. In particular, neural history compressors or hierarchical temporal memories (1991) [1d-1f] compactly encode sequential data for Deep Learning: a lower-level network learns to predict its next input, and only the unexpected inputs are handed up to the next level, as sketched below.
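
The core trick, sketched here with a deliberately trivial table-based predictor instead of the RNNs of refs 1d-1f; function and variable names are mine:

from collections import defaultdict

def unexpected_events(seq):
    # maps previous symbol -> predicted next symbol (None = no prediction yet)
    pred = defaultdict(lambda: None)
    passed_up = []      # what the next-higher level would receive
    prev = None
    for t, sym in enumerate(seq):
        if pred[prev] != sym:        # prediction failed: pass (time, symbol) up
            passed_up.append((t, sym))
        pred[prev] = sym             # learn: after `prev`, expect `sym`
        prev = sym
    return passed_up

print(unexpected_events(list("ababababab" + "c" + "ababababab")))
# only a handful of events survive once the 'ab' pattern becomes predictable;
# the higher level can in turn compress these surviving events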

The Neural Heat Exchanger [9] (presented in talks by Schmidhuber since 1990) is a supervised variant of Hinton and Dayan's 1994 unsupervised "Helmholtz machine".



Predictors and feature detectors that fight each other: the feature detectors try to become unpredictable.

Related links:

Full publication list
(with additional HTML and pdf links)

Active Exploration

Formal Theory of Fun and Creativity

Theory of beauty and Low-Complexity Art

Recurrent neural networks

Deep learning wins many pattern recognition contests

JS' first Deep Learner of 1991 + Deep Learning Timeline 1962-2013 (also summarises the origins of backpropagation, still the central algorithm of Deep Learning)

1991: Fundamental Deep Learning Problem discovered and analysed and partially solved

Deep Learning since 1991


18. S. Hochreiter and J. Schmidhuber. Source separation as a by-product of regularization. In M. S. Kearns, S. A. Solla, D. A. Cohn, eds., Advances in Neural Information Processing Systems 11 (NIPS), pages 459-465, MIT Press, Cambridge, MA, 1999. PDF. HTML.

17. S. Hochreiter and J. Schmidhuber. LOCOCODE performs nonlinear ICA without knowing the number of sources. In J.-F. Cardoso, C. Jutten, P. Loubaton, eds., Proceedings of the First International Workshop on Independent Component Analysis and Signal Separation (ICA'99), pages 149-154, Aussois, France, 1999.

16. S. Hochreiter and J. Schmidhuber. Feature extraction through LOCOCODE. Neural Computation, 11(3):679-714, 1999. PDF. HTML (some pictures missing).

15. S. Hochreiter and J. Schmidhuber. LOCOCODE versus PCA and ICA. In L. Niklasson, M. Boden, T. Ziemke, eds., Proceedings of the International Conference on Artificial Neural Networks, Sweden, pages 669-674, Springer, London, 1998.

14. J. Schmidhuber. Neural predictors for detecting and removing redundant information. In H. Cruse, J. Dean, and H. Ritter, editors, Adaptive Behavior and Learning. Kluwer, 1998. PDF. HTML.

13. N. N. Schraudolph, M. Eldracher, J. Schmidhuber. Processing images by semi-linear predictability minimization. Network, 10(2):133-169, 1999. PDF.

12. S. Hochreiter and J. Schmidhuber. Low-complexity coding and decoding. In K. M. Wong, I. King, D. Yeung, eds., Theoretical Aspects of Neural Computation: A Multidisciplinary Perspective, pages 297-306, Springer, 1997.

11. S. Hochreiter and J. Schmidhuber. Unsupervised coding with LOCOCODE. In W. Gerstner, A. Germond, M. Hasler, J.-D. Nicoud, eds., Proceedings of the International Conference on Artificial Neural Networks, Lausanne, Switzerland, pages 655-660, Springer, 1997.

10. J. Schmidhuber, M. Eldracher, B. Foltin. Semilinear predictability minimization produces well-known feature detectors. Neural Computation, 8(4):773-786, 1996. PDF. HTML.

9. J. Schmidhuber. The Neural Heat Exchanger. In S. Amari, L. Xu, L. Chan, I. King, K. Leung, eds., Progress in Neural Information Processing: Proceedings of the Intl. Conference on Neural Information Processing, pages 194-197, Springer, Hong Kong, 1996. Earlier presentations in talks at universities since 1990. PDF. HTML.

8. J. Schmidhuber and B. Foltin. Semilinear predictability minimization produces orientation sensitive edge detectors. Technical Report FKI-201-94, Fakultät für Informatik, Technische Universität München, December 1994.

7. J. Schmidhuber and D. Prelinger. Discovering predictable classifications. Neural Computation, 5(4):625-635, 1993. PDF. HTML.

6. J. Schmidhuber and D. Prelinger. Unsupervised extraction of predictable abstract features. In Proceedings of the International Conference on Artificial Neural Networks, Amsterdam, pages 601-604. Springer, 1993.

5. J. Schmidhuber and D. Prelinger. A novel unsupervised classification method. In Proc. of the Intl. Conf. on Artificial Neural Networks, Brighton, pages 91-96. IEE, 1993.

4. J. Schmidhuber, M. C. Mozer, and D. Prelinger. Continuous history compression. In H. Hüning, S. Neuhauser, M. Raus, and W. Ritschel, editors, Proc. of Intl. Workshop on Neural Networks, RWTH Aachen, pages 87-95. Augustinus, 1993.

3. J. Schmidhuber. Learning factorial codes by predictability minimization. Neural Computation, 4(6):863-879, 1992. PDF. HTML.

2. J. Schmidhuber and D. Prelinger. Discovering predictable classifications. Technical Report CU-CS-626-92, Dept. of Comp. Sci., University of Colorado at Boulder, November 1992.

1. J. Schmidhuber. Learning factorial codes by predictability minimization. Technical Report CU-CS-565-91, Dept. of Comp. Sci., University of Colorado at Boulder, December 1991.


SEQUENCE COMPRESSION

Adaptive methods for sequence compression and sequence coding are important instances of redundancy reduction and unsupervised learning (compare the section above and the work on recurrent networks).

1j. M. Klapper-Rybicka, N. N. Schraudolph, J. Schmidhuber. Unsupervised learning in LSTM recurrent neural networks. In G. Dorffner, H. Bischof, K. Hornik, eds., Proceedings of Int. Conf. on Artificial Neural Networks ICANN'01, Vienna, LNCS 2130, pages 684-691, Springer, 2001. PDF.

1i. J. Schmidhuber and S. Heil. Compressing texts with neural nets. In Dale, Moisl and Somers, eds., Handbook of Natural Language Processing, Marcel Dekker, 1998.

1h. J. Schmidhuber and S. Heil. Sequential neural text compression. IEEE Transactions on Neural Networks, 7(1):142-146, 1996. PDF. HTML.

1g. J. Schmidhuber and S. Heil. Predictive coding with neural nets: application to text compression. In G. Tesauro, D. S. Touretzky and T. K. Leen, eds., Advances in Neural Information Processing Systems 7, pages 1047-1054. MIT Press, Cambridge, MA, 1995.

1f. J. Schmidhuber, M. C. Mozer, and D. Prelinger. Continuous history compression. In H. Hüning, S. Neuhauser, M. Raus, and W. Ritschel, editors, Proc. of Intl. Workshop on Neural Networks, RWTH Aachen, pages 87-95. Augustinus, 1993.

1e. J. Schmidhuber. Learning complex, extended sequences using the principle of history compression. Neural Computation, 4(2):234-242, 1992. PDF. HTML.

1d. J. Schmidhuber. Learning unambiguous reduced sequence descriptions. In J. E. Moody, S. J. Hanson, and R. P. Lippman, editors, Advances in Neural Information Processing Systems 4 (NIPS 4), pages 291-298. San Mateo, CA: Morgan Kaufmann, 1992.

1c. J. Schmidhuber. Adaptive history compression for learning to divide and conquer. In Proc. International Joint Conference on Neural Networks, Singapore, volume 2, pages 1130-1135. IEEE, 1991.

1b. J. Schmidhuber. Adaptive decomposition of time. In T. Kohonen, K. Mäkisara, O. Simula, and J. Kangas, editors, Artificial Neural Networks, pages 909-914. Elsevier Science Publishers B.V., North-Holland, 1991.

1a. J. Schmidhuber. Neural sequence chunkers. Technical Report FKI-148-91, Institut für Informatik, Technische Universität München, April 1991.
