Schmidhuber (1992) shows how can be defined with the help of intra-representational adaptive predictors that try to predict each output unit of some classifier from its remaining output units, while each output unit in turn tries to extract properties of the environment that allow it to escape predictability. This was called the principle of predictability minimization. The principle encourages each output unit of the classifier to represent environmental properties that are statistically independent of the environmental properties represented by the remaining output units. The procedure aims at generating binary 'factorial codes' [Barlow et al., 1989]. It is our preferred method because, unlike the methods used by Linsker (1988), Becker and Hinton (1989), and Zemel and Hinton (1991), it has the potential to remove even non-linear statistical dependencies³ among the output units of some classifier.
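The quantity that predictability minimization drives upward can be illustrated with a small sketch. It is not the adaptive-predictor network of Schmidhuber (1992): as a simplifying assumption, each predictor is replaced by the exact conditional mean of a unit given the remaining units, which is the optimal squared-error predictor for a finite set of binary codes. The function name `predictability_error` and the two toy codes are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product


def predictability_error(codes):
    """Sum, over output units, of the mean squared error of the best
    predictor of that unit given the remaining units.

    Assumption: the adaptive predictors are replaced by the exact
    conditional mean, computed with a lookup table over the observed codes.
    """
    n_units = len(codes[0])
    total = 0.0
    for i in range(n_units):
        # Group the values of unit i by the pattern of the remaining units.
        groups = defaultdict(list)
        for y in codes:
            context = y[:i] + y[i + 1:]
            groups[context].append(y[i])
        # Accumulate the squared error of the conditional-mean predictor.
        sq_err = 0.0
        for values in groups.values():
            mean = sum(values) / len(values)
            sq_err += sum((v - mean) ** 2 for v in values)
        total += sq_err / len(codes)
    return total


# Factorial code: three independent, evenly distributed bits.
factorial = list(product((0, 1), repeat=3))
# Redundant code: the third bit is the XOR of the first two, so the bits
# are pairwise uncorrelated but non-linearly dependent.
redundant = [(a, b, a ^ b) for a, b in product((0, 1), repeat=2)]

print(predictability_error(factorial))
print(predictability_error(redundant))
```

In this toy setting the factorial code leaves every unit unpredictable from the others, while the XOR code is perfectly predictable even though no pair of its units is linearly correlated. This is the kind of non-linear dependency that a purely linear decorrelation criterion would miss.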
Let us define
To encourage even distributions in output space, we slightly modify by introducing a term similar to the one in equation (5), subsection 2.1