LOCOCODE, our novel approach to unsupervised learning and sensory coding, does not define code optimality solely by properties of the code itself but also takes into account the information-theoretic complexity of the mappings used for coding and decoding. The resulting lococodes typically compromise between conflicting goals. They tend to be sparse and exhibit low (though not necessarily minimal) redundancy, where the cost of achieving minimal redundancy would be too high. Lococodes tend towards binary, informative feature detectors, but occasionally include ternary or continuous-valued code components where complexity considerations favor such alternatives.
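The trade-off can be illustrated with a toy sketch: an autoassociator whose objective combines reconstruction error with a penalty on the complexity of the coding and decoding mappings. For simplicity the sketch below substitutes a plain squared-weight penalty for FMS's flat-minimum complexity measure; the data, dimensions, and function names are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 inputs with 8 components, generated from 2 simple causes.
causes = rng.standard_normal((100, 2))
X = causes @ rng.standard_normal((2, 8))

# Tiny linear autoassociator with 4 code units.
W_enc = rng.standard_normal((8, 4)) * 0.1
W_dec = rng.standard_normal((4, 8)) * 0.1

def objective(W_enc, W_dec, X, lam=0.01):
    recon = (X @ W_enc) @ W_dec
    err = np.mean((X - recon) ** 2)  # reconstruction cost
    # Stand-in complexity term for both mappings; FMS itself
    # uses a flat-minimum (second-order) measure instead.
    penalty = lam * (np.sum(W_enc ** 2) + np.sum(W_dec ** 2))
    return err + penalty

def grads(W_enc, W_dec, X, lam=0.01):
    code = X @ W_enc
    diff = code @ W_dec - X
    n = X.size
    g_dec = 2.0 * code.T @ diff / n + 2.0 * lam * W_dec
    g_enc = 2.0 * X.T @ (diff @ W_dec.T) / n + 2.0 * lam * W_enc
    return g_enc, g_dec

before = objective(W_enc, W_dec, X)
for _ in range(500):  # plain gradient descent on the combined objective
    g_enc, g_dec = grads(W_enc, W_dec, X)
    W_enc -= 0.02 * g_enc
    W_dec -= 0.02 * g_dec
after = objective(W_enc, W_dec, X)
```

Minimizing the combined objective drives the loss down while the penalty discourages needlessly complex mappings; in the full method this pressure is what shapes the code itself.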
A general principle? According to our analysis, LOCOCODE essentially attempts to describe single inputs with as few and as simple features as possible. Depending on the statistical properties of the input, this can result in local, factorial, or sparse codes, although biologically plausible sparseness is the most common case. Unlike the objective functions of previous methods (e.g., Olshausen and Field 1996), however, LOCOCODE's does not contain an explicit term enforcing, say, sparse codes -- neither sparseness nor factoriality is viewed as a good thing a priori. This suggests that LOCOCODE's objective may embody a general principle of unsupervised learning going beyond previous, more specialized ones.
Regularizers and unsupervised learning. Another way of looking at our results is this: there is at least one representative (FMS) of a broad class of algorithms (regularizer algorithms that reduce net complexity) which can do optimal feature extraction as a by-product. This reveals an interesting, previously ignored connection between two important fields (regularizer research and ICA-related research), and may represent a first step towards a unification of regularization and unsupervised learning.
Advantages. LOCOCODE is appropriate if single inputs (with many input components) can be described by few features computable by simple functions. Hence, assuming that visual data can be reduced to a few simple causes, LOCOCODE is appropriate for visual coding. Unlike simple ICA, LOCOCODE (a) is not inherently limited to the linear case, and (b) does not need a priori information about the number of independent data sources. Even when the number of sources is known, however, LOCOCODE can outperform other coding methods. This has been demonstrated by our LOCOCODE implementation based on FMS-trained autoassociators (AAs), which easily solves coding tasks that other authors have described as hard, and whose input causes are not perfectly separable by standard AAs, PCA, and ICA. Furthermore, when applied to realistic visual data, LOCOCODE produces familiar on-center-off-surround receptive fields and biologically plausible sparse codes (standard AAs do not). Codes obtained by ICA, PCA, and LOCOCODE convey about the same information, as indicated by the reconstruction error, but LOCOCODE's coding efficiency is higher: it needs fewer bits per input pixel. Our experiments also demonstrate the utility of LOCOCODE-based data preprocessing for subsequent classification.
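One simple way to make the "fewer bits per input pixel" comparison concrete is to sum the empirical entropies of discretized code components and divide by the number of input pixels. The sketch below is a hypothetical illustration of this idea, not the paper's exact measure; the function name, bin count, and synthetic codes are all assumptions.

```python
import numpy as np

def bits_per_pixel(codes, n_pixels, n_bins=16):
    """Rough coding cost: sum of empirical entropies of the
    discretized code components, divided by the number of
    input pixels each code describes."""
    total_bits = 0.0
    for comp in codes.T:
        hist, _ = np.histogram(comp, bins=n_bins)
        p = hist / hist.sum()
        p = p[p > 0]
        total_bits += -np.sum(p * np.log2(p))  # bits for this component
    return total_bits / n_pixels

rng = np.random.default_rng(1)
n_patterns, n_code, n_pixels = 500, 10, 64
# Sparse code: most components inactive (zero) for any given input.
active = rng.random((n_patterns, n_code)) < 0.1
sparse = rng.standard_normal((n_patterns, n_code)) * active
# Dense code: all components continuously active.
dense = rng.standard_normal((n_patterns, n_code))
```

Under this estimate a sparse code, whose components spend most of their time in a single (zero) state, costs fewer bits per pixel than a dense continuous code conveying comparable reconstruction quality.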
Limitations. The order of FMS's computational complexity depends on the number of output units. For typical classification tasks (requiring few output units) it equals that of standard backprop. In the AA case, however, the output's dimensionality grows with the input's; this is why large-scale FMS-trained AAs seem to require parallel implementation. Furthermore, although LOCOCODE works well for visual inputs, it may be less useful for discovering input causes that can be represented only by high-complexity input transformations, or for discovering many features (causes) that collectively determine single input components. Acoustic signal separation is an example: there every source influences every input component and none is computable by a low-complexity function, yet ICA does not suffer from this.
Future work. Encouraged by the familiar lococodes obtained in our experiments with visual data, we intend to move on to higher-dimensional inputs and larger receptive fields. This may lead to even more pronounced feature detectors, like those observed by Schmidhuber et al. (1996). It will also be interesting to test whether successive LOCOCODE stages, each feeding its code into the next, will lead to complex feature detectors such as those discovered in deeper regions of the mammalian visual cortex. Finally, encouraged by our successful application to vowel classification, we intend to look at more complex pattern recognition tasks.
We also intend to examine alternative LOCOCODE implementations besides FMS-based AAs. Lastly, we would like to improve our understanding of the relationship between low-complexity codes, low-complexity art (see Schmidhuber, 1997b), and informal notions such as ``beauty'' and ``good art''.