

Low-Complexity Autoassociators

Given some data set, FMS can be used to find a low-complexity autoassociator (AA) whose hidden layer activations code the individual training exemplars. The AA can be split into two modules: one for coding, one for decoding.
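To make the coding/decoding split concrete, here is a minimal sketch in Python (not from the original text); the class name and layer sizes are hypothetical, the hidden units are sigmoid as assumed below, and the FMS flatness penalty itself is omitted:

    import numpy as np

    class Autoassociator:
        # Coding module: input -> hidden code; decoding module: code -> reconstruction.
        def __init__(self, n_in, n_hidden, rng):
            self.W1 = 0.1 * rng.standard_normal((n_in, n_hidden))   # coding weights
            self.W2 = 0.1 * rng.standard_normal((n_hidden, n_in))   # decoding weights

        def code(self, x):
            # Hidden-layer activations: the code of exemplar x.
            return 1.0 / (1.0 + np.exp(-(x @ self.W1)))

        def decode(self, h):
            # Decoding module reconstructs the input from the code.
            return h @ self.W2

        def reconstruct(self, x):
            return self.decode(self.code(x))

    aa = Autoassociator(n_in=8, n_hidden=3, rng=np.random.default_rng(0))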

Previous autoassociators (AAs). Backprop-trained AAs without a narrow hidden bottleneck (a ``bottleneck'' is a hidden layer containing fewer units than the other layers) typically produce redundant, continuous-valued codes and unstructured weight patterns. Baldi and Hornik (1989) studied linear AAs with a hidden-layer bottleneck and found that their codes are orthogonal projections onto the subspace spanned by the first principal eigenvectors of a covariance matrix associated with the training patterns; they also showed that the mean squared error (MSE) surface has a unique minimum. Nonlinear codes have been obtained with nonlinear bottleneck AAs of more than 3 (e.g., 5) layers, e.g., by Kramer (1991), Oja (1991), or DeMers and Cottrell (1993). None of these methods produces sparse, factorial, or local codes -- instead they produce the first principal components or their nonlinear equivalents (``principal manifolds''). We will see that FMS-based AAs yield quite different results.
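Baldi and Hornik's result for the linear case is easy to check numerically. The following is a minimal sketch (not from the original text) that trains a linear bottleneck AA by plain batch gradient descent on the MSE and compares the learned decoding subspace with the principal subspace; all names and hyperparameters are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 8)) @ rng.standard_normal((8, 8))  # training patterns
    X -= X.mean(axis=0)                                              # center the data

    k = 3                                        # bottleneck width
    W_enc = 0.1 * rng.standard_normal((8, k))    # coding module
    W_dec = 0.1 * rng.standard_normal((k, 8))    # decoding module

    lr = 1e-3
    for _ in range(20000):                       # batch gradient descent on MSE
        H = X @ W_enc                            # hidden (code) activations
        E = H @ W_dec - X                        # reconstruction error
        W_dec -= lr * (H.T @ E) / len(X)
        W_enc -= lr * (X.T @ (E @ W_dec.T)) / len(X)

    # Compare the projector onto the learned decoding subspace with the
    # projector onto the span of the first k principal eigenvectors.
    Q, _ = np.linalg.qr(W_dec.T)
    V = np.linalg.eigh(np.cov(X.T))[1][:, -k:]
    print(np.abs(Q @ Q.T - V @ V.T).max())       # small: the subspaces agree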

FMS-based AAs. According to subsections 3.1 and 3.2, the low-complexity coding aspect makes the codes tend to (C1) be binary for sigmoid units with activation function $f_i(x) = \frac{1}{1+\exp(-x)}$ ($f_i'(s_i)$ is small for $y^i$ near 0 or 1), (C2) require few separated code components or hidden units (HUs), and (C3) use simple component functions. Because of the low-complexity decoding part, the codes also tend to (D1) have many HU activations near zero and, therefore, be sparsely (or even locally) distributed, and (D2) have code components conveying information useful for generating as many output activations as possible. (C1), (C2) and (D2) encourage minimally redundant, binary codes; (C3), (D1) and (D2), however, encourage sparse distributed (local) codes. Taken together, (C1)-(C3) and (D1)-(D2) lead to codes with simply computable components (C1, C3) that convey a lot of information (D2), and with as few active components as possible (C2, D1). Collectively this makes the code components represent simple input features.
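Property (C1) follows directly from the shape of the sigmoid's derivative, $f_i'(x) = f_i(x)(1 - f_i(x))$, which vanishes as the activation approaches 0 or 1. A quick numerical illustration (not from the original text):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    s = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])   # net inputs s_i
    y = sigmoid(s)                               # activations y^i
    # f'(s) = y(1-y) peaks at 0.25 for y = 0.5 and is nearly zero at saturation,
    # so flatness penalties involving f' are cheapest for near-binary activations.
    print(y * (1.0 - y))                         # ~0.0025, 0.105, 0.25, 0.105, ~0.0025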

