Schmidhuber (1992)
shows how
can be defined with the help of *intra-representational*
adaptive predictors that try to predict each output unit of some classifier
from its remaining output units, while
each output unit in turn tries to extract properties of the environment that
allow it to *escape* predictability. This was called the *principle
of predictability minimization*.
This principle encourages each output unit of the classifier to represent
environmental properties
that are statistically independent from environmental properties
represented by the remaining output units.
The procedure aims at generating binary "factorial codes" (Barlow et al., 1989).
It is our preferred method because, unlike the methods
used by Linsker (1988), Becker and Hinton (1989), and Zemel
and Hinton (1991),
it has the potential to remove even non-linear statistical
dependencies^{3} among the output units of some classifier.
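
As a minimal sketch of the idea, the following NumPy snippet measures how predictable each code unit is from the remaining ones. It assumes one least-squares linear predictor per unit, which is a simplification: the adaptive predictors described above may be non-linear. The function name `predictability_errors` and the toy data are illustrative and not taken from the original. Predictors would be trained to minimize these errors, while the code units would be trained to maximize them.

```python
import numpy as np


def predictability_errors(Y):
    """Mean squared error of predicting each column of Y from the other columns.

    Y has shape (n_patterns, n_units): one row of code-unit activations per
    input pattern. Assumption: each predictor is a least-squares linear map
    with a bias term, standing in for a trainable (possibly non-linear) one.
    """
    n_patterns, n_units = Y.shape
    errors = np.empty(n_units)
    for i in range(n_units):
        # Inputs to predictor i: all code units except unit i, plus a bias column.
        others = np.delete(Y, i, axis=1)
        X = np.hstack([others, np.ones((n_patterns, 1))])
        # Fit the predictor, i.e. minimize the squared prediction error.
        w, *_ = np.linalg.lstsq(X, Y[:, i], rcond=None)
        errors[i] = np.mean((X @ w - Y[:, i]) ** 2)
    return errors


rng = np.random.default_rng(0)

# Toy binary code with three independently drawn units.
independent = rng.integers(0, 2, size=(2000, 3)).astype(float)

# Same code, but unit 2 is made a copy of unit 0 (a redundant code).
redundant = independent.copy()
redundant[:, 2] = redundant[:, 0]

err_independent = predictability_errors(independent)
err_redundant = predictability_errors(redundant)
print("independent code:", err_independent)
print("redundant code:  ", err_redundant)
```

In the redundant code, units 0 and 2 are perfectly predictable from each other, so their errors are close to zero, whereas every unit of the independent code keeps an error close to its variance. A linear predictor cannot expose non-linear dependencies, which is why this is only a sketch of the principle.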

Let us define

(8) |

To encourage even distributions in output space,
we slightly modify by
introducing a term similar to the one in equation (5), subsection 2.1
and obtain

(9) |