
MAXIMIZING CONSTRAINED OUTPUT VARIANCE

We write

\begin{displaymath}
D_l = - \frac{1}{2} \sum_p \sum_i (y^{p,l}_i - \bar{y}^l_i)^2
+ \frac{\lambda}{2} \sum_i \left[ \frac{1}{q} - \bar{y}^l_i \right]^2
\end{displaymath} (5)

and minimize $D_l$ subject to the constraint
\begin{displaymath}
\forall p:
\sum_i y^{p,l}_i =1.
\end{displaymath} (6)

Here, as well as throughout the remainder of this paper, subscripts of symbols denoting vectors denote vector components: $v_i$ denotes the $i$-th component of some vector $v$. $\lambda$ is a positive constant, and $\bar{y}^l_i$ denotes the mean activation of the $i$-th output unit of $T_l$. It is possible to show that, given sufficiently many output units, the first term on the right-hand side of (5) is minimized subject to (6) (that is, the output variance is maximized) if each input pattern is locally represented (just like with winner-take-all networks) by exactly one corner of the $q$-dimensional hypercube spanned by the possible output vectors [Prelinger, 1992]. Minimizing the second term drives each $\bar{y}^l_i$ towards $\frac{1}{q}$, thus encouraging each local class representation to become active in response to only $\frac{1}{q}$-th of all possible input patterns.
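For concreteness, here is a minimal NumPy sketch of objective (5). The function and array names are illustrative only (they do not appear in the paper), and the rows of Y are assumed to already satisfy constraint (6):

\begin{verbatim}
import numpy as np

def objective_D(Y, lam, q):
    """Evaluate (5) for a batch of output vectors of module T_l.

    Y   -- array of shape (P, q); row p is the output vector y^{p,l},
           assumed normalized so that each row sums to 1 (constraint (6)).
    lam -- the positive constant lambda.
    q   -- the number of output units.
    """
    y_bar = Y.mean(axis=0)                                # mean activations of the q units
    neg_variance = -0.5 * np.sum((Y - y_bar) ** 2)        # first term of (5): negative output variance
    balance = 0.5 * lam * np.sum((1.0 / q - y_bar) ** 2)  # second term of (5): pulls each mean towards 1/q
    return neg_variance + balance
\end{verbatim}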

Constraint (6) is enforced by setting

\begin{displaymath}
y^{p,j}_i = \frac{u^{p,j}_i}{\sum_k u^{p,j}_k},
\end{displaymath}

where $u^{p,j}$ is the activation vector (in response to $x^{p,j}$) of a $q$-dimensional layer of hidden units of $T_j$, which can be regarded as its unnormalized output layer.
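A corresponding sketch of this normalization, under the same illustrative naming; the hidden activations are assumed positive (e.g. logistic units), so that no row sum can vanish:

\begin{verbatim}
def normalize(U):
    """Enforce constraint (6): divide each unnormalized activation
    vector u^{p,j} (a row of U, shape (P, q)) by the sum of its
    components, so that every output vector sums to 1."""
    return U / U.sum(axis=1, keepdims=True)
\end{verbatim}

The two pieces then compose in the obvious way: given unnormalized activations U, one evaluates objective_D(normalize(U), lam, q) and minimizes it by gradient descent.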

This novel method is easy to implement; it achieves an effect similar to that of the recent entropy-based method of Bridle and MacKay (1992).

