next up previous
Next: FMS: A Novel Analysis Up: Source Separation as a Previous: INTRODUCTION

FLAT MINIMUM SEARCH: REVIEW AND ANALYSIS


FMS is a general gradient-based method for finding low-complexity networks with high generalization capability. FMS finds a large region in weight space such that each weight vector from that region has similar small error. Such regions are called ``flat minima''. In MDL terminology, few bits of information are required to pick a weight vector in a ``flat'' minimum (corresponding to a low-complexity network) -- the weights may be given with low precision. FMS automatically prunes weights and units, and reduces output sensitivity with respect to remaining weights and units. Previous FMS applications focused on supervised learning [13,14].

Notation. Let $O,H,I$ denote index sets for output, hidden, and input units, respectively. For $l \in O \cup H$, the activation $y^l$ of unit $l$ is $y^l = f\left(s_l\right)$, where $s_l= \sum_m w_{lm} y^m$ is the net input of unit $l$ ($m \in H$ for $l \in O$ and $m \in I$ for $l \in H$), $w_{lm}$ denotes the weight on the connection from unit $m$ to unit $l$, $f$ denotes the activation function, and for $m \in I$, $y^m$ denotes the $m$-th component of an input vector. $W=\left\vert\left(O \times H\right) \cup \left(H \times I\right)\right\vert$ is the number of weights.

Algorithm. FMS' objective function $E$ features an unconventional error term:

\begin{displaymath}
B = \sum_{i,j: \ i \in O \cup H}
\log \sum_{k \in O} \left(...
...rac{\partial y^k}{\partial w_{ij}}\right)^{2}}} \right)^{2}
.
\end{displaymath}

$E = E_q + \lambda B$ is minimized by gradient descent, where $E_q$ is the training set mean squared error (MSE), and $\lambda$ a positive ``regularization constant'' scaling $B$'s influence. Choosing $\lambda$ corresponds to choosing a tolerable error level (there is no a priori ``optimal'' way of doing so). $B$ measures the weight precision (number of bits needed to describe all weights in the net). Given a constant number of output units, FMS can be implemented efficiently, namely, with standard backprop's order of computational complexity [14].



Subsections
next up previous
Next: FMS: A Novel Analysis Up: Source Separation as a Previous: INTRODUCTION
Juergen Schmidhuber 2003-02-25