Next: PREDICTABILITY MINIMIZATION AND TIME Up: EXPERIMENTS Previous: OCCAM'S RAZOR AT WORK

NON-UNIFORMLY DISTRIBUTED INPUTS

The input ensemble considered in this subsection consists of four different patterns denoted by $x^a$, $x^b$, $x^c$, and $x^d$, respectively. The probabilities of the patterns were

\begin{displaymath}
P(x^a) = \frac{1}{9}, \quad
P(x^b) = \frac{2}{9}, \quad
P(x^c) = \frac{2}{9}, \quad
P(x^d) = \frac{4}{9}.
\end{displaymath}

This ensemble allows for binary factorial codes, that is, codes whose components are statistically independent. One example is

code $F$: $y^a = (1,1)^T$, $y^b = (0,1)^T$, $y^c = (1,0)^T$, $y^d = (0,0)^T$.
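That code $F$ is factorial can be checked directly: the joint probability of every code vector must equal the product of the marginal probabilities of its components. The following is a minimal sketch of that check, using exact rational arithmetic. The helper names `joint` and `is_factorial` are illustrative and do not come from the original text.

```python
from fractions import Fraction
from itertools import product

# Pattern probabilities from the text.
P = {"a": Fraction(1, 9), "b": Fraction(2, 9), "c": Fraction(2, 9), "d": Fraction(4, 9)}

# Code F from the text: pattern name -> binary code vector y.
CODE_F = {"a": (1, 1), "b": (0, 1), "c": (1, 0), "d": (0, 0)}

def joint(code, probs):
    """Joint distribution over code vectors induced by the pattern probabilities."""
    dist = {}
    for name, y in code.items():
        dist[y] = dist.get(y, Fraction(0)) + probs[name]
    return dist

def is_factorial(code, probs):
    """True if the joint distribution equals the product of the unit marginals."""
    dist = joint(code, probs)
    n = len(next(iter(code.values())))
    # Marginal probability that unit i is switched on.
    p_on = [sum(p for y, p in dist.items() if y[i] == 1) for i in range(n)]
    for y in product((0, 1), repeat=n):
        expected = Fraction(1)
        for i, bit in enumerate(y):
            expected *= p_on[i] if bit else 1 - p_on[i]
        if dist.get(y, Fraction(0)) != expected:
            return False
    return True

print(is_factorial(CODE_F, P))  # True
```

For code $F$ both units are on with probability $\frac{1}{3}$, so for instance $P(y = (1,1)^T) = \frac{1}{3} \cdot \frac{1}{3} = \frac{1}{9} = P(x^a)$.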

With code $F$, the total objective function $V_C$ becomes $V^F_C = 2$. A non-factorial but invertible (information-preserving) code is given by

code $B$: $y^a = (0,1)^T$, $y^b = (0,0)^T$, $y^c = (1,0)^T$, $y^d = (1,1)^T$.

With code $B$, $V_C$ becomes $V^B_C = \frac{19}{10}$, which is only $\frac{1}{10}$ below $V^F_C$. This already indicates that certain local maxima of the internal state's objective function may be very close to the global maxima.
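The two values quoted above can be reproduced numerically. The sketch below rests on an assumption, since $V_C$ is not defined in this subsection: it takes $V_C$ to be half the squared prediction error summed over the 9 pattern presentations of one epoch and over all code units, with each predictor emitting the exact conditional expectation of its unit given the remaining units. Under that assumed definition the computation yields $2$ for code $F$ and $\frac{19}{10}$ for code $B$. The function names are illustrative.

```python
from fractions import Fraction

# Presentations per 9-pattern epoch, proportional to the probabilities in the text.
COUNTS = {"a": 1, "b": 2, "c": 2, "d": 4}

# Codes F and B from the text.
CODE_F = {"a": (1, 1), "b": (0, 1), "c": (1, 0), "d": (0, 0)}
CODE_B = {"a": (0, 1), "b": (0, 0), "c": (1, 0), "d": (1, 1)}

def conditional_mean(code, counts, i, context):
    """E[y_i | remaining units == context]: the output of an ideal predictor."""
    num = den = 0
    for name, y in code.items():
        if y[:i] + y[i + 1:] == context:
            num += counts[name] * y[i]
            den += counts[name]
    return Fraction(num, den)

def v_c(code, counts):
    """Assumed objective: half the summed squared prediction error over one epoch."""
    total = Fraction(0)
    for name, y in code.items():
        for i in range(len(y)):
            pred = conditional_mean(code, counts, i, y[:i] + y[i + 1:])
            total += counts[name] * (pred - y[i]) ** 2
    return total / 2

print(v_c(CODE_F, COUNTS))  # 2
print(v_c(CODE_B, COUNTS))  # 19/10
```

In code $B$ the units carry information about each other (for example, $E[y_1 \mid y_2 = 1] = \frac{4}{5}$), so the predictors do slightly better than with code $F$ and the objective ends up slightly lower.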

Experiment 1: off-line, $dim(y) = 2$, $dim(x) = 2$, distributed input representation with $x^a = (0,0)^T$, $x^b = (0,1)^T$, $x^c = (1,0)^T$, $x^d = (1,1)^T$, 1 hidden unit per predictor, 2 hidden units shared among the representational modules. Ten test runs with 2,000 epochs for the representational modules were conducted. Here one epoch consisted of the presentation of 9 patterns: $x^a$ was presented once, $x^b$ twice, $x^c$ twice, and $x^d$ four times.
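The 9-pattern epoch realizes the pattern probabilities exactly as relative frequencies. A minimal sketch of how such an epoch could be assembled is shown below. The function name `make_epoch` and the ordering of patterns within the epoch are assumptions made for illustration; the text does not specify a presentation order.

```python
from fractions import Fraction

# Distributed input representation from Experiment 1.
X = {"a": (0, 0), "b": (0, 1), "c": (1, 0), "d": (1, 1)}

# Number of presentations of each pattern per epoch, as stated in the text.
REPEATS = {"a": 1, "b": 2, "c": 2, "d": 4}

def make_epoch(inputs, repeats):
    """Return one epoch as a list of input vectors (ordering is an assumption)."""
    return [inputs[name] for name, n in repeats.items() for _ in range(n)]

epoch = make_epoch(X, REPEATS)

# Relative frequencies within the epoch equal the stated probabilities.
freq = {name: Fraction(epoch.count(x), len(epoch)) for name, x in X.items()}
print(len(epoch))  # 9
print(freq["d"])   # 4/9
```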

In 7 of the 10 runs, the system found a global maximum corresponding to a factorial code. In the remaining 3 runs the code was not invertible.

Experiment 2 (Occam's Razor): As in Experiment 1, but with $dim(y) = 3$. In all but one of the 10 test runs the system developed a factorial code (including one unused unit). In the remaining test run the code was at least invertible.

With local input representation and $dim(x) = 4$, $dim(y) = 2$, the success rate dropped below 50 percent. With $dim(y) = 3$, the system usually found invertible but rarely factorial codes. This reflects the fact that with certain input ensembles there is a trade-off between redundancy and invertibility: Superfluous degrees of freedom among the representational units may increase the probability that an information-preserving code is found, while at the same time decreasing the probability of finding an optimal factorial code.
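For readers unfamiliar with the term, a local input representation with $dim(x) = 4$ is commonly understood as one input unit per pattern, in contrast to the distributed two-bit encoding of Experiment 1. The sketch below illustrates this reading. Which unit is assigned to which pattern is an assumption, as the text does not list the vectors.

```python
PATTERNS = ("a", "b", "c", "d")

def local_representation(patterns):
    """One input unit per pattern: the k-th pattern maps to the k-th unit vector."""
    n = len(patterns)
    return {name: tuple(int(i == k) for i in range(n))
            for k, name in enumerate(patterns)}

X_LOCAL = local_representation(PATTERNS)
print(X_LOCAL["c"])  # (0, 0, 1, 0)
```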


Juergen Schmidhuber 2003-02-13

