

SPECIAL CASE: LINEAR OUTPUT ACTIVATION.

Since our targets will usually be in the linear range of a sigmoid output activation function, let us consider the linear case in more detail. Suppose all output units $k$ use the same linear activation function $f_k\left(x\right) = C x$ (where $C$ is a real-valued constant). Then $\frac{\partial y^k}{\partial y^i}= C w_{ki}$ for hidden unit $i$. We obtain

\begin{displaymath}
T2 \ = \ W \log
\left( \left\vert O\right\vert \ left\ve...
...i\right\Vert \ \left\Vert W_u\right\Vert} \right)
\mbox{ ,}
\end{displaymath}

where $W_i$ denotes the outgoing weight vector of unit $i$ with $[W_i]_k := w_{ki}$, $\left\Vert \cdot \right\Vert$ denotes the Euclidean vector norm $\left\Vert x \right\Vert = \sqrt{\sum_j x_j^2}$, and $[\cdot]_k$ denotes the $k$th component of a vector.
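The definitions above can be checked numerically. The following is a minimal sketch, not code from the original work: the weight values, the constant `C = 0.5`, and all function names are illustrative assumptions. It compares a finite-difference estimate of $\frac{\partial y^k}{\partial y^i}$ with $C w_{ki}$ and computes the Euclidean norm of each outgoing weight vector $W_i$.

```python
import math

C = 0.5  # assumed slope of the linear output activation f_k(x) = C * x

# w[k][i] is the weight from hidden unit i to output unit k (made-up values).
w = [
    [0.8, -0.3, 0.0],
    [0.1, 0.5, 0.0],
]


def outputs(hidden):
    """Linear output layer: y^k = C * sum_i w_ki * y^i."""
    return [C * sum(w_ki * y_i for w_ki, y_i in zip(row, hidden)) for row in w]


def outgoing_vector(i):
    """W_i, the outgoing weight vector of hidden unit i, with [W_i]_k = w_ki."""
    return [row[i] for row in w]


def euclidean_norm(x):
    """Euclidean norm: square root of the sum of squared components."""
    return math.sqrt(sum(v * v for v in x))


def numerical_jacobian(hidden, eps=1e-6):
    """Central finite-difference estimate of d y^k / d y^i."""
    jac = [[0.0] * len(hidden) for _ in w]
    for i in range(len(hidden)):
        up = list(hidden)
        up[i] += eps
        down = list(hidden)
        down[i] -= eps
        y_up, y_down = outputs(up), outputs(down)
        for k in range(len(w)):
            jac[k][i] = (y_up[k] - y_down[k]) / (2 * eps)
    return jac


hidden = [0.2, -0.7, 0.4]  # arbitrary hidden activations
jac = numerical_jacobian(hidden)
norms = [euclidean_norm(outgoing_vector(i)) for i in range(len(hidden))]
```

In this toy setup the third hidden unit has an all-zero outgoing weight vector, so its norm is zero and it has no effect on any output.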

Few component functions preferred. Hidden units whose outgoing weight vectors consist of near-zero weights contribute little to $T2$, so minimizing $T2$ tends to reduce the number of component functions (CFs) in use.

Common component functions preferred. Outgoing weight vectors of hidden units are encouraged to have a large effect on the output (see the denominator of the last term in the brackets of $T2$). This implies a preference for CFs that can be used to generate many or all output components.

CF separation -- few relevant CFs per output unit. On the other hand, two hidden units whose outgoing weight vectors do not consist solely of near-zero weights are encouraged to influence the output in different ways by not representing the same input feature (see the numerator of the last term in the brackets of $T2$). In fact, FMS punishes not only outgoing weight vectors with the same or opposite directions but also vectors obtained by flipping the signs of the weights (multiple reflections from hyperplanes through the origin and orthogonal to one axis). Hence two units performing redundant tasks, such as both activating some output unit, or one activating it and the other de-activating it, will cause large contributions to $T2$. This encourages separation of CFs and the use of few CFs per output unit.
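The sign-flip behaviour described above can be illustrated with a small sketch. Because the displayed formula for $T2$ is truncated in this copy, the form of the overlap used here is an assumption inferred from the prose: a numerator $\sum_k \left\vert w_{ki}\right\vert \left\vert w_{ku}\right\vert$ divided by $\left\Vert W_i\right\Vert \left\Vert W_u\right\Vert$. The function name `abs_overlap` and the example vectors are hypothetical, and the sketch shows only the invariance under sign flips, not the full term.

```python
import math


def abs_overlap(w_i, w_u):
    """Sign-invariant, normalized overlap of two outgoing weight vectors.

    Assumed form (not taken verbatim from the source):
        sum_k |w_ki| * |w_ku| / (||W_i|| * ||W_u||)
    Requires that neither vector is all-zero.
    """
    numerator = sum(abs(a) * abs(b) for a, b in zip(w_i, w_u))
    norm_i = math.sqrt(sum(a * a for a in w_i))
    norm_u = math.sqrt(sum(b * b for b in w_u))
    return numerator / (norm_i * norm_u)


w_a = [1.0, 2.0, 0.0]

same = abs_overlap(w_a, [2.0, 4.0, 0.0])        # same direction
opposite = abs_overlap(w_a, [-1.0, -2.0, 0.0])  # opposite direction
flipped = abs_overlap(w_a, [1.0, -2.0, 0.0])    # one weight sign flipped
disjoint = abs_overlap(w_a, [0.0, 0.0, 3.0])    # drives a different output unit
```

Under this assumed measure, the same-direction, opposite-direction, and sign-flipped pairs all reach the maximal overlap, while the pair that drives disjoint output units has zero overlap. This matches the qualitative claim that redundant units are penalized whether they activate or de-activate the same output unit.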


Juergen Schmidhuber 2003-02-13

