We write
\[
V \;=\; \frac{1}{P}\sum_{p=1}^{P}\sum_{i=1}^{n}\bigl(y_i^p\bigr)^2 \;-\; \lambda\sum_{i=1}^{n}\bar{y}_i^{\,2} \tag{5}
\]
and minimize $-V$ subject to the constraint
\[
\sum_{i=1}^{n} y_i^p = 1 \quad \text{for all } p. \tag{6}
\]
Here, as well as throughout the remainder of this paper,
subscripts of symbols denoting vectors denote vector components:
$y_i$ denotes the $i$-th element of some vector $y$, and
$y^p$ denotes the output vector in response to the $p$-th of the $P$ input patterns $x^p$.
$\lambda$ is a positive constant, and
$\bar{y}_i$ denotes the mean, taken over all input patterns, of the $i$-th output unit of the network.
It is possible to show that
the first term on the right-hand side of
(5) is maximized subject to (6) if, given sufficiently many
output units, each input pattern is locally
represented (just as with winner-take-all networks) by exactly
one corner of the $n$-dimensional hypercube spanned
by the possible output vectors [Prelinger, 1992].
Maximizing the second, negative term
encourages each local class representation to
become active in response to only
$1/n$-th of all possible input patterns.
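Assuming objective (5) takes the variance-style form $V = \frac{1}{P}\sum_p\sum_i (y_i^p)^2 - \lambda\sum_i \bar{y}_i^2$ over a batch of constraint-satisfying output vectors, its two terms can be sketched as follows; the function name, $\lambda$ default, and toy data are illustrative assumptions, not taken from the source:

```python
def objective_V(ys, lam=1.0):
    """Sketch of objective (5): mean sum of squared outputs minus
    lam times the sum of squared output means. Each vector in ys is
    assumed to satisfy constraint (6), i.e. its components sum to 1."""
    n = len(ys[0])
    P = len(ys)
    # First term: (1/P) * sum over patterns p and units i of (y_i^p)^2.
    first = sum(sum(y[i] ** 2 for y in ys) for i in range(n)) / P
    # Second term: sum over units i of (mean over patterns of y_i)^2.
    second = sum((sum(y[i] for y in ys) / P) ** 2 for i in range(n))
    return first - lam * second

# One-hot "corner" outputs, with each class used for 1/n of the
# patterns, score higher than constant non-committal outputs:
corners = [[1, 0], [0, 1], [1, 0], [0, 1]]
uniform = [[0.5, 0.5]] * 4
print(objective_V(corners))  # 0.5
print(objective_V(uniform))  # 0.0
```

The corner batch maximizes the first term (each squared output vector sums to 1) while its class means $\bar{y}_i = 1/n$ simultaneously maximize the negative second term.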
Constraint (6) is enforced by setting
$y_i^p = z_i^p / \sum_{j=1}^{n} z_j^p$,
where $z^p$ is the activation vector
(in response to $x^p$) of an $n$-dimensional layer
of hidden units of the network,
which can be considered as
its unnormalized output layer.
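A minimal sketch of this normalization, assuming nonnegative hidden activations with a positive sum (the example values are illustrative):

```python
def normalize(z):
    """Map nonnegative hidden activations z to outputs y with
    sum(y) == 1, enforcing constraint (6); assumes sum(z) > 0."""
    s = sum(z)
    return [zi / s for zi in z]

y = normalize([2.0, 1.0, 1.0])  # -> [0.5, 0.25, 0.25]
```

By construction the normalized outputs sum to one for every input pattern, so the constraint needs no separate penalty term.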
This novel method is easy to implement;
it achieves an effect similar to that of the recent
entropy-based method of
Bridle and MacKay (1992).
Juergen Schmidhuber
2003-02-13