FMS: A Novel Analysis

Simple basis functions (BFs). A BF is the function determining the activation of a code component in response to a given input. Minimizing $B$'s term

\begin{displaymath}
T1 \ := \ \sum_{i,j: \ i \in O \cup H}
\log \sum_{k \in O} \left(\frac{\partial y^k}{\partial w_{ij}}\right)^{2}
\end{displaymath}

obviously reduces output sensitivity with respect to weights (and therefore units). Hence $T1$ is responsible for pruning weights (and, therefore, units). $T1$ is also one reason why low-complexity (or simple) BFs are preferred: the precision (or complexity) required of a weight is mainly determined by $\frac{\partial y^k}{\partial w_{ij}}$.
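For concreteness, here is a minimal sketch (not from the paper) of how $T1$ could be computed for a toy two-layer net in JAX; the function \texttt{forward}, the parameter names \texttt{W1}/\texttt{W2}, and the constant \texttt{EPS} are illustrative assumptions, not part of FMS.

\begin{verbatim}
import jax
import jax.numpy as jnp

EPS = 1e-12  # guards the log numerically; not part of FMS itself

# Hypothetical toy net: input -> hidden code (H units) -> output (O units).
def forward(params, x):
    h = jnp.tanh(params["W1"] @ x)      # hidden (code) activations
    return jnp.tanh(params["W2"] @ h)   # output activations y^k, k in O

def T1(params, x):
    # jac[name][k, i, j] = d y^k / d w_{ij} for weight matrix `name`
    jac = jax.jacrev(forward)(params, x)
    total = 0.0
    for name in ("W1", "W2"):                       # weights into H and O units
        sq = jnp.sum(jac[name] ** 2, axis=0)        # sum over output units k
        total = total + jnp.sum(jnp.log(sq + EPS))  # sum over all weights w_{ij}
    return total

params = {"W1": 0.1 * jnp.ones((3, 4)), "W2": 0.1 * jnp.ones((2, 3))}
x = jnp.array([1.0, 0.5, -0.5, 0.2])
print(T1(params, x))
\end{verbatim}

Driving $T1$ down means making every output insensitive to each individual weight, which is exactly the pruning pressure described above.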

Sparseness. Because $T1$ tends to drive unit activations toward zero, it favors sparse codes. But $T1$ also favors a sparse hidden layer in the sense that few hidden units contribute to producing the output. $B$'s second term

\begin{displaymath}
T2 \ := \ W \log \sum_{k \in O}
\left( \sum_{i,j: \ i \in O \cup H}
\frac{\left\vert \frac{\partial y^k}{\partial w_{ij}} \right\vert}
{\sqrt{\sum_{k \in O} \left(\frac{\partial y^k}{\partial w_{ij}}\right)^{2}}} \right)^{2}
\end{displaymath}

punishes units with similar influence on the output. We reformulate it:
\begin{eqnarray*}
T2 & = & W \log
\left( \sum_{i,j: \ i \in O \cup H} \ \ \sum_{u,v: \ u \in O \cup H} \ \ \sum_{k \in O}
\frac{\left\vert \frac{\partial y^k}{\partial y^i} \right\vert \ \left\vert \frac{\partial y^k}{\partial y^u} \right\vert}
{\sqrt{\sum_{k \in O} \left(\frac{\partial y^k}{\partial y^i}\right)^{2}} \
\sqrt{\sum_{k \in O} \left(\frac{\partial y^k}{\partial y^u}\right)^{2}}} \right) \\
& = & W \log
\left( \left\vert O \right\vert \ \left\vert O \times H \right\vert
\ + \ \ldots \ + \
\sum_{i,j: \ i \in H} \ \ \sum_{u,v: \ u \in H} \ \ \sum_{k \in O}
\frac{\left\vert \frac{\partial y^k}{\partial y^i} \right\vert \ \left\vert \frac{\partial y^k}{\partial y^u} \right\vert}
{\sqrt{\sum_{k \in O} \left(\frac{\partial y^k}{\partial y^i}\right)^{2}} \
\sqrt{\sum_{k \in O} \left(\frac{\partial y^k}{\partial y^u}\right)^{2}}} \right) .
\end{eqnarray*}

The intermediate steps, and the mixed terms elided above, are given in [15]. We observe: (1) an output unit that is very sensitive to two given hidden units will contribute heavily to $T2$ (see the numerator in the last term of $T2$); (2) this large contribution can be reduced by making both hidden units have a large impact on other output units (see the denominator in the last term of $T2$).
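Analogously, here is a hypothetical JAX sketch of $T2$ in the same toy setting as the $T1$ sketch above (again, the network \texttt{forward} and \texttt{EPS} are illustrative assumptions, not part of FMS).

\begin{verbatim}
import jax
import jax.numpy as jnp

EPS = 1e-12  # numerical guard, not part of FMS

# Same hypothetical toy net as in the T1 sketch.
def forward(params, x):
    h = jnp.tanh(params["W1"] @ x)
    return jnp.tanh(params["W2"] @ h)

def T2(params, x):
    jac = jax.jacrev(forward)(params, x)       # jac[name][k, i, j] = d y^k / d w_{ij}
    n_out = forward(params, x).shape[0]        # |O|
    W = sum(p.size for p in params.values())   # number of weights
    inner = jnp.zeros(n_out)                   # holds sum_{i,j} (...) for each k
    for name in ("W1", "W2"):
        d = jac[name].reshape(n_out, -1)       # one column per weight w_{ij}
        norm = jnp.sqrt(jnp.sum(d ** 2, axis=0) + EPS)  # sqrt(sum_k (dy^k/dw_{ij})^2)
        inner = inner + jnp.sum(jnp.abs(d) / norm, axis=1)
    return W * jnp.log(jnp.sum(inner ** 2) + EPS)

params = {"W1": 0.1 * jnp.ones((3, 4)), "W2": 0.1 * jnp.ones((2, 3))}
x = jnp.array([1.0, 0.5, -0.5, 0.2])
print(T2(params, x))
\end{verbatim}

As stated in the summary below, $T1$ and $T2$ together make up $B$; a training objective would add $B$, weighted by a hyperparameter, to the reconstruction error (the weighting and any constant factors are omitted here).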

Few separated basis functions. Hence FMS tries to find a way of (1) using as few BFs as possible to determine the activation of each output unit, while simultaneously (2) using the same BFs to determine the activations of as many output units as possible (common BFs). (1) and $T1$ separate the BFs: the force towards simplicity (see $T1$) prevents input information from being channelled through a single BF, and the force towards few BFs per output makes them non-redundant. (1) and (2) together cause few, separated BFs to determine all outputs.
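This argument is phrased in terms of the hidden-unit sensitivities $\frac{\partial y^k}{\partial y^u}$. A small, purely illustrative JAX sketch (a toy decoder with hand-picked weights \texttt{W2}, not from the paper) shows how one could inspect them after training:

\begin{verbatim}
import jax
import jax.numpy as jnp

# Hypothetical decoder half: hidden code h -> outputs y.
def decode(W2, h):
    return jnp.tanh(W2 @ h)

def hidden_sensitivities(W2, h):
    # Matrix S with S[k, u] = d y^k / d y^u, evaluated at the code h.
    return jax.jacrev(decode, argnums=1)(W2, h)

# Toy numbers: a sparse, separated code shows few strong entries per row
# (few BFs per output), and near-zero columns correspond to pruned units.
W2 = jnp.array([[0.9, 0.0, 0.1],
                [0.0, 1.1, 0.0]])
h = jnp.array([0.5, -0.3, 0.0])
print(hidden_sensitivities(W2, h))   # shape (|O|, |H|)
\end{verbatim}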

Summary. Collectively $T1$ and $T2$ (which make up $B$) encourage sparse codes based on few separated, simple basis functions producing all outputs. Due to space limitations, a more detailed analysis (e.g., of the case of linear output activation functions) is left to the technical report [15] (available on the WWW).


