EXPERIMENT 1: local, sparse, factorial codes -- feature detectors

The following five experiments demonstrate effects of various input representations, data distributions, and architectures according to Table 1. The data always consists of 8 input vectors. Code units are initialized with a negative bias of -2.0.

**Constant Parameters.**
(2-phase learning).

**Experiment 1.1:**
We use uniformly distributed inputs and 500,000 training examples.
*Parameters:* learning rate: 0.1, the ``tolerable error''.
*Architecture:* (8-5-8) (8 input units, 5 HUs, 8 output units).

**Results: factorial codes.**
In 7 out of 10 trials, FMS effectively pruned 2 HUs,
and produced
a *factorial binary code* with statistically independent code components.
In 2 trials FMS pruned 2 HUs and produced an almost binary code --
with one trinary unit taking on values of 0.0, 0.5, 1.0.
In one trial FMS produced a binary code with only one HU pruned away.
Obviously, under certain constraints on
the input data, FMS has a strong tendency towards the compact,
nonredundant codes advocated by numerous researchers.
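
To make the factorial property concrete, the following sketch enumerates a hypothetical 3-unit binary code for 8 equiprobable inputs (the code table is a hand-constructed assumption for illustration, not the network FMS actually learned) and verifies that its components are pairwise statistically independent:

```python
import itertools

# Hypothetical factorial binary code for 8 equiprobable inputs:
# 3 surviving binary HUs, each input mapped to a unique 3-bit codeword.
codes = list(itertools.product([0, 1], repeat=3))  # 8 codewords

def p_on(u):
    """Probability that code component u is on, under uniform inputs."""
    return sum(c[u] for c in codes) / len(codes)

def p_both_on(u, v):
    """Probability that components u and v are on simultaneously."""
    return sum(c[u] * c[v] for c in codes) / len(codes)

# Factorial property: joint activation probabilities factorize,
# i.e. the code components are pairwise statistically independent.
for u, v in itertools.combinations(range(3), 2):
    assert abs(p_both_on(u, v) - p_on(u) * p_on(v)) < 1e-12
```

With 8 inputs, 3 binary units are the minimum possible code length, which is why such a code is both compact and nonredundant.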

**Experiment 1.2:**
See Table 1 for the differences from Experiment 1.1. We use 200,000 training
examples and more HUs to show that in this case fewer units are pruned.

**Results: local codes.**
10 trials were conducted. FMS always produced a binary code.
In 7 trials, only 1 HU was pruned, in the remaining
trials 2 HUs.
Unlike with standard BP,
*almost all inputs almost always were coded in an entirely local manner*,
i.e., only one HU was switched on, the others switched off.
Recall that local codes were also advocated by many researchers -- but
they are precisely ``the opposite'' of the factorial codes from the previous
experiment. How can LOCOCODE justify such different codes? How can this
apparent discrepancy be explained?

**Explanation.**
The reason is that with the different input representation, the additional
HUs do not necessarily add much complexity to the mappings for coding and
decoding. The zero-valued inputs allow for low weight precision (low coding
complexity) for connections leading to HUs (similarly for connections leading
to output units). In contrast to Experiment 1.1, it is possible to describe
the i-th possible input by the following feature: ``the i-th input component
does not equal zero''. This feature can be implemented by a low-complexity
component function. This contrasts with Experiment 1.1, where there are only
5 hidden units and no zero input components: there it is better to code with
as few code components as possible, which yields a factorial code.
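
The trade-off between the two code types can be illustrated with a toy comparison (both code tables below are hand-constructed for illustration): a factorial code spends few units but activates several per input, while a local code spends one dedicated unit per input and activates exactly one:

```python
import itertools

n_inputs = 8

# Factorial code as in Experiment 1.1: 3 binary HUs,
# one unique codeword per input.
factorial = list(itertools.product([0, 1], repeat=3))

# Local code as in Experiment 1.2: one dedicated HU per input,
# exactly one unit switched on at a time.
local = [[1 if j == i else 0 for j in range(n_inputs)]
         for i in range(n_inputs)]

mean_active_factorial = sum(map(sum, factorial)) / n_inputs
mean_active_local = sum(map(sum, local)) / n_inputs

print(len(factorial[0]), mean_active_factorial)  # 3 units, 1.5 active on average
print(len(local[0]), mean_active_local)          # 8 units, 1.0 active on average
```

Which of the two is cheaper in the lococode sense depends on the input representation, as the explanation above argues: zero-valued input components make the extra units of the local code nearly free.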

**Experiment 1.3:**
like Experiment 1.2 but with *one-dimensional* input.
*Parameters:* learning rate: 0.1.

**Results: feature detectors.**
10 trials were conducted. FMS always produced the following code:
one binary HU making a distinction between input values less than 0.5 and
input values greater than 0.5,
2 HUs with continuous values, one of which
is zero (or one) whenever the binary unit is on, while the other
is zero (one) otherwise.
All remaining HUs adopt constant values of either 1.0 or 0.0, thus
being essentially pruned away.
The binary unit serves as a binary *feature detector*,
grouping the inputs into 2 classes.

**Lococode recognizes the causes.**
The data of Experiment 1.3 may be viewed as being generated as follows:
(1) first choose with uniform probability a value from one set; then (2)
choose one from another; then (3) add the two values.
The first cause of the data is recognized perfectly, but the second is divided
among two code components, due to the non-linearity of the output unit:
adding the second cause to 0 is different from adding it to 0.75
(consider the first-order derivatives).

**Experiment 1.4:**
like Experiment 1.1 but with nonuniformly distributed inputs.
*Parameters:* learning rate: 0.005.

**Results: sparse codes.**
In 4 out of 10 trials, FMS found a binary code (no HUs pruned).
In 3 trials: a binary code with one HU pruned. In one
trial: a code with one HU removed, and a trinary unit
adopting values of 0.0, 0.5, 1.0. In 2 trials: a code with
one pruned HU and 2 trinary HUs.
Obviously, with this set-up,
FMS prefers codes known as *sparse distributed representations*.
Inputs with higher probability are coded by fewer active code
components than inputs with lower probability: typically, the most probable
inputs lead to one active code component, less probable inputs to two, and
the remaining inputs to three.
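
This probability/density relation can be sketched numerically; the probabilities and codewords below are invented for illustration only (the experiment's actual values are not reproduced here):

```python
# Hypothetical sparse code in the spirit of Experiment 1.4: the more
# probable an input, the fewer active components in its codeword.
# All numbers are illustrative assumptions, not the experiment's values.
inputs = [
    (0.25, [1, 0, 0, 0, 0]),  # high probability   -> 1 active unit
    (0.25, [0, 1, 0, 0, 0]),
    (0.15, [1, 1, 0, 0, 0]),  # medium probability -> 2 active units
    (0.15, [0, 0, 1, 1, 0]),
    (0.10, [1, 0, 1, 1, 0]),  # low probability    -> 3 active units
    (0.10, [0, 1, 0, 1, 1]),
]

assert abs(sum(p for p, _ in inputs) - 1.0) < 1e-12  # valid distribution

# Expected number of active code components per input: low, because the
# frequent inputs get the cheap (sparse) codewords.
expected_active = sum(p * sum(code) for p, code in inputs)
print(round(expected_active, 3))  # 1.7
```

The expected activity stays well below half the code length, which is the hallmark of a sparse distributed representation.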

**Explanation.**
Why is the result different from Experiment 1.1's?
To achieve equal error contributions to all inputs,
the weights for coding/decoding highly probable inputs have to be
given with higher precision than the weights for coding/decoding
inputs with low probability:
the input distribution from Experiment 1.1 will result in a
more complex network.
The next experiment will make this effect even more pronounced.

**Experiment 1.5:**
like Experiment 1.4, but with architecture (8-8-8).

**Results: sparse codes.**
In 10 trials, FMS always produced binary codes:
in 7 trials 2 HUs were pruned, in 2 trials only 1 HU, and in 1 trial 3 HUs.
Unlike with standard BP,
*almost all inputs almost always were coded in a sparse, distributed
manner:* typically, 2 HUs were switched on, the others switched off,
and most HUs responded to exactly 2 different input patterns.
The mean probability of a unit being switched on was 0.28,
and the probabilities of different HUs being switched on
tended to be equal.

Table 1 provides an overview of Experiments 1.1 -- 1.5.

**Conclusion.**
FMS always finds codes quite different from
standard BP's rather unstructured ones.
It tends to discover and represent the underlying causes.
Usually the resulting lococode is
sparse and based on informative feature detectors.
Depending on properties of the data it may become factorial or local.
This suggests that LOCOCODE may represent a general
principle of unsupervised learning subsuming
previous, COCOF-based approaches.

Feature-based lococodes automatically take into account input/output properties (binary?, local?, input probabilities?, noise?, number of zero input components?).
