
MAXIMIZING CONSTRAINED OUTPUT VARIANCE

We write

\begin{displaymath}
D_l = - \frac{1}{2} \sum_p \sum_i (y^{p,l}_i - \bar{y}^l_i)^2
+ \frac{\lambda}{2} \sum_i \left[ \frac{1}{q} - \bar{y}^l_i \right]^2
\end{displaymath} (5)

and minimize $D_l$ subject to the constraint
\begin{displaymath}
\forall p:
\sum_i y^{p,l}_i =1.
\end{displaymath} (6)

Here, as well as throughout the remainder of this paper, subscripts of symbols denoting vectors denote vector components: $v_i$ denotes the $i$-th component of some vector $v$. $\lambda$ is a positive constant, and $\bar{y}^l_i$ denotes the mean activation of the $i$-th output unit of $T_l$. It is possible to show that, given sufficiently many output units, the first term on the right-hand side of (5) is minimized subject to (6) (that is, the output variance is maximized) if each input pattern is locally represented (just like with winner-take-all networks) by exactly one corner of the $q$-dimensional hypercube spanned by the possible output vectors [Prelinger, 1992]. Minimizing the second term drives each $\bar{y}^l_i$ towards $\frac{1}{q}$, thus encouraging each local class representation to become active in response to only $\frac{1}{q}$-th of all possible input patterns.
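For concreteness, here is a minimal NumPy sketch of objective (5). The function and array names are illustrative only (they do not appear in the paper), and the rows of Y are assumed to already satisfy constraint (6):

\begin{verbatim}
import numpy as np

def objective_D(Y, lam, q):
    """Evaluate (5) for a batch of output vectors of module T_l.

    Y   -- array of shape (P, q); row p is the output vector y^{p,l},
           assumed normalized so that each row sums to 1 (constraint (6)).
    lam -- the positive constant lambda.
    q   -- the number of output units.
    """
    y_bar = Y.mean(axis=0)                                # mean activations of the q units
    neg_variance = -0.5 * np.sum((Y - y_bar) ** 2)        # first term of (5): negative output variance
    balance = 0.5 * lam * np.sum((1.0 / q - y_bar) ** 2)  # second term of (5): pulls each mean towards 1/q
    return neg_variance + balance
\end{verbatim}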

Constraint (6) is enforced by setting

\begin{displaymath}
y^{p,j}_i = \frac{u^{p,j}_i}{\sum_k u^{p,j}_k},
\end{displaymath}

where $u^{p,j}$ is the activation vector (in response to $x^{p,j}$) of a $q$-dimensional layer of hidden units of $T_j$, which can be regarded as its unnormalized output layer.
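A corresponding sketch of this normalization, under the same illustrative naming; the hidden activations are assumed positive (e.g. logistic units), so that no row sum can vanish:

\begin{verbatim}
def normalize(U):
    """Enforce constraint (6): divide each unnormalized activation
    vector u^{p,j} (a row of U, shape (P, q)) by the sum of its
    components, so that every output vector sums to 1."""
    return U / U.sum(axis=1, keepdims=True)
\end{verbatim}

The two pieces then compose in the obvious way: given unnormalized activations U, one evaluates objective_D(normalize(U), lam, q) and minimizes it by gradient descent.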

This novel method is easy to implement; it achieves an effect similar to that of the recent entropy-based method of Bridle and MacKay (1992).

