Where does $B$ come from? To discover flat minima, FMS searches for large axis-aligned hypercuboids (boxes) in weight space such that all weight vectors within a box yield similar network behavior. Boxes satisfy two flatness conditions, FC1 and FC2. FC1 enforces ``tolerable'' output variation in response to weight vector perturbations, i.e., near-flatness of the error surface around the current weight vector in all weight space directions. Among the boxes satisfying FC1, FC2 selects a unique one with minimal net output variance. $B$ is the negative logarithm of this box's volume, ignoring constant terms that have no effect on the gradient descent algorithm. Hence $B$ is the number of bits (up to a constant) required to describe the current net function, which does not change significantly when the weights are varied within the box. The box edge lengths determine the required weight precision. See Hochreiter and Schmidhuber (1997a) for details of $B$'s derivation.
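The link between box volume, description length, and FC1 can be illustrated with a small sketch. This is not FMS itself: FMS derives box edge lengths analytically and does not sample. The toy network `tiny_net`, the Monte Carlo check `satisfies_fc1`, and the tolerance `eps` are all hypothetical illustrations. The logarithm is taken in base 2 so the result reads as bits, and FC2 (choosing the box with minimal output variance) is not modeled. The box is centred on the current weight vector, and its volume is the product of its edge lengths. Narrower edges, meaning higher required weight precision, give a smaller volume and therefore more bits.

```python
import math
import random


def box_description_bits(edge_lengths):
    """Negative base-2 log of an axis-aligned box's volume.

    The volume is the product of the edge lengths, so -log2(volume) is the
    sum of -log2(edge) over all weights. Constant terms are ignored.
    """
    if any(e <= 0 for e in edge_lengths):
        raise ValueError("edge lengths must be positive")
    return -sum(math.log2(e) for e in edge_lengths)


def tiny_net(weights, x):
    """Hypothetical one-hidden-unit net: y = w2 * tanh(w1 * x)."""
    w1, w2 = weights
    return w2 * math.tanh(w1 * x)


def satisfies_fc1(net, weights, edge_lengths, inputs, eps, samples=200, seed=0):
    """Heuristic, FC1-like check (an assumption, not the FMS criterion).

    Samples weight vectors uniformly inside the box centred on `weights`
    and reports whether every output changes by at most `eps`. Sampling can
    miss worst-case corners, so a True result is only evidence, not proof.
    """
    rng = random.Random(seed)
    base = [net(weights, x) for x in inputs]
    for _ in range(samples):
        w = [wi + rng.uniform(-e / 2, e / 2) for wi, e in zip(weights, edge_lengths)]
        for x, y0 in zip(inputs, base):
            if abs(net(w, x) - y0) > eps:
                return False
    return True


if __name__ == "__main__":
    w = [0.5, 1.0]
    xs = [-1.0, 0.0, 1.0]
    for edges in ([0.5, 0.5], [0.01, 0.01]):
        print(edges, box_description_bits(edges), satisfies_fc1(tiny_net, w, edges, xs, eps=0.05))
```

With `eps=0.05`, the wide box (edges 0.5) reaches only 2 bits but violates the tolerance. The narrow box (edges 0.01) satisfies it at the cost of about 13.3 bits. FMS seeks the largest box, and hence the fewest bits, that still keeps output variation tolerable.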