
## Low-Complexity Autoassociators

Given some data set, FMS can be used to find a
low-complexity autoassociator (AA)
whose hidden layer activations code the individual
training exemplars.
The AA can be split into two modules: one for coding, one
for decoding.
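The coding/decoding split above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the layer sizes, weight initialization, and the choice of a sigmoid hidden layer with a linear output are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_in, n_hidden = 8, 3  # hypothetical input and code sizes

# coding module: input -> hidden code
W_enc = rng.normal(scale=0.1, size=(n_hidden, n_in))
# decoding module: hidden code -> reconstructed input
W_dec = rng.normal(scale=0.1, size=(n_in, n_hidden))

def encode(x):
    # hidden layer activations are the code for exemplar x
    return sigmoid(W_enc @ x)

def decode(h):
    # linear output layer reconstructing the input
    return W_dec @ h

x = rng.normal(size=n_in)     # one training exemplar
h = encode(x)                 # its code
x_rec = decode(h)             # its reconstruction
```

Training would minimize the reconstruction error between `x` and `x_rec` (plus, for FMS, the flatness term); only the forward pass is shown here.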
**Previous autoassociators (AAs).**
Backprop-trained
AAs *without* a narrow hidden bottleneck
(``bottleneck'' refers to a hidden layer containing fewer units than
other layers) typically produce
redundant, continuous-valued codes and unstructured weight patterns.
Baldi and Hornik (1989) studied
linear AAs *with* a hidden layer bottleneck
and found that their codes
are orthogonal projections
onto the subspace spanned by the first principal
eigenvectors of a covariance matrix associated with the
training patterns. They showed that the mean squared error
(MSE)
surface has a unique minimum.
Nonlinear codes have been obtained
by nonlinear bottleneck AAs with more than 3 (e.g., 5) layers,
e.g., Kramer (1991), Oja (1991) or DeMers and Cottrell (1993).
None of these methods produces sparse,
factorial or local
codes -- instead they produce first principal components
or their nonlinear equivalents (``principal manifolds'').
We will see that FMS-based AAs yield quite different results.
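Baldi and Hornik's result can be checked numerically: the optimal linear bottleneck AA reconstructs inputs by orthogonally projecting them onto the span of the first principal eigenvectors of the training patterns' covariance matrix. The toy data, dimensions, and bottleneck width below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# toy data: 200 centered patterns in 5 dimensions (hypothetical sizes)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
X -= X.mean(axis=0)

k = 2                                 # bottleneck width
C = X.T @ X / len(X)                  # covariance of the training patterns
eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
U = eigvecs[:, -k:]                   # first k principal eigenvectors

P = U @ U.T       # orthogonal projection onto the principal subspace
X_rec = X @ P     # reconstruction computed by an optimal linear AA
```

The mean squared reconstruction error then equals the sum of the discarded eigenvalues, which is exactly the PCA optimum the linear AA attains at its unique minimum.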

**FMS-based AAs.**
According to subsections 3.1 and 3.2,
because of the low-complexity *coding* aspect
the codes tend to
(C1) be binary for sigmoid units
with activation function $f(x) = \frac{1}{1+e^{-x}}$
($f'(x)$ is small for $f(x)$ near 0 or 1),
(C2) require few separated code components or hidden units (HUs),
and (C3) use simple component functions.
Because of the low-complexity *de*coding part,
codes also tend to
(D1) have many HU activations near zero
and, therefore, be sparsely (or even locally)
distributed,
(D2) have code components conveying information useful for
generating as many output activations as possible.
(C1), (C2) and (D2) encourage minimally redundant, binary codes.
(C3), (D1) and (D2), however, encourage sparse distributed (local)
codes. (C1) - (C3) and (D1) - (D2) lead to codes with
simply computable code components (C1, C3) that convey a lot of
information (D2), and with as
few active code components as possible (C2, D1).
*Collectively this makes code components represent simple input
features.*
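The tendencies (C1) and (D1) can be quantified on a code matrix: binariness as the fraction of code components near 0 or 1, and sparseness as the fraction near zero. The hidden codes and the threshold below are hypothetical values chosen for illustration.

```python
import numpy as np

# hypothetical hidden codes: 4 training exemplars, 5 code components each
H = np.array([
    [0.98, 0.01, 0.02, 0.99, 0.01],
    [0.02, 0.97, 0.01, 0.03, 0.02],
    [0.01, 0.02, 0.99, 0.02, 0.01],
    [0.97, 0.01, 0.01, 0.02, 0.98],
])

eps = 0.05
# (C1): fraction of components that are near-binary (close to 0 or 1)
binary_frac = np.mean((H < eps) | (H > 1 - eps))
# (D1): fraction of components near zero, i.e. inactive (sparseness)
sparse_frac = np.mean(H < eps)
```

For this code matrix every component is near-binary and most components are inactive per exemplar, i.e. the code is sparse in the sense of (D1).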

Juergen Schmidhuber
2003-02-13
