Simple component functions (CFs).
The term [...] makes (1) unit activations decrease to zero in proportion to their fan-outs, (2) first-order derivatives of activation functions decrease to zero in proportion to their fan-ins, and (3) the influence of units on the output decrease to zero in proportion to the units' fan-ins. For a detailed analysis see Hochreiter and Schmidhuber (1997a). This term is the reason why low-complexity (or simple) CFs are preferred.
Sparseness. Point (1) above favors sparse hidden unit activations (here: few active components); point (2) favors non-informative hidden unit activations that are hardly affected by small input changes. Point (3) favors sparse hidden unit activations in the sense that "few hidden units contribute to producing the output". In particular, sigmoid hidden units with activation function $\frac{1}{1+\exp(-x)}$ favor near-zero activations.
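One way to see why sigmoid units would favor near-zero activations is to tabulate activation and first-order derivative side by side. The following is a minimal sketch, assuming the standard logistic sigmoid; the function names and the sample net inputs are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-x)). Assumed form, for illustration only."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x: float) -> float:
    """First-order derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


# Hypothetical net inputs, from strongly negative to strongly positive.
net_inputs = [-6.0, -3.0, 0.0, 3.0, 6.0]

# One row per net input: (net input, activation, derivative).
rows = [(x, sigmoid(x), sigmoid_derivative(x)) for x in net_inputs]

for x, activation, derivative in rows:
    print(f"net input {x:+.1f}: activation {activation:.4f}, derivative {derivative:.4f}")
```

Under this assumption, the derivative is small in both saturated regimes, but only for strongly negative net inputs is the activation itself also close to zero. That is the one regime in which points (1) and (2) are satisfied at the same time.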