
## MDL-JUSTIFICATION OF FLATNESS CONDITION 2

Let us assume a sender wants to send a description of the function induced by to a receiver who knows the inputs but not the targets , where . The MDL principle suggests that the sender wants to minimize the expected description length of the net function. Let denote the mean value of on the box. Expected description length is approximated by , where are positive constants. One way of seeing this is to apply Hinton and van Camp's "bits back" argument to a uniform weight prior ( corresponds to the output variance). However, we prefer to use a different argument: we encode each weight of the box center by a bitstring according to the following procedure ( is given):
(0) Define a variable interval .
(1) Make equal to the interval constraining possible weight values.
(2) While :
Divide into 2 equally-sized disjoint intervals and .
If then ; write '1'.
If then ; write '0'.
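
The interval-halving procedure above can be sketched in Python. This is only an illustration under stated assumptions: the names `w` (a weight of the box center), `delta` (half the box's extent along that weight), and `lo`, `hi` (the bounds constraining possible weight values) are hypothetical, and the stopping test (halve until the interval lies inside `[w - delta, w + delta]`) is an assumed reading of step (2), whose condition is not legible in the text as given.

```python
def encode_weight(w, delta, lo, hi):
    """Bisect [lo, hi) until it lies inside [w - delta, w + delta].

    Returns the bitstring and the final interval (one edge of the "bit-box").
    Assumed reading of step (2); all parameter names are hypothetical.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    if not (lo <= w < hi):
        raise ValueError("w must lie inside [lo, hi)")
    bits = []
    # Step (2): keep halving while the interval still sticks out of the box.
    while not (w - delta <= lo and hi <= w + delta):
        mid = (lo + hi) / 2.0
        if w < mid:
            hi = mid          # w is in the first half: write '1'
            bits.append("1")
        else:
            lo = mid          # w is in the second half: write '0'
            bits.append("0")
    return "".join(bits), (lo, hi)


def decode_interval(bits, lo, hi):
    """Receiver side: replay the bits to recover the final interval."""
    for b in bits:
        mid = (lo + hi) / 2.0
        if b == "1":
            hi = mid
        else:
            lo = mid
    return lo, hi


if __name__ == "__main__":
    code, interval = encode_weight(0.3, 0.1, -1.0, 1.0)
    print(code, interval, decode_interval(code, -1.0, 1.0))
```

Under this reading, each halving costs one bit, so the bitstring grows roughly with the logarithm of the ratio between the allowed weight range and the box width. A wider (flatter) box therefore needs fewer bits per weight.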
The final set corresponds to a "bit-box" within our box. This "bit-box" contains 's center and is described by a bitstring of length , where the constant is independent of the box . From ( is the center of the "bit-box") and the bitstring describing the "bit-box", the receiver can compute as follows: he selects an initialization weight vector within the "bit-box" and uses gradient descent to decrease until , where in the bit-box denotes the receiver's current approximation of ( is constantly updated by the receiver). This is like "FMS without targets": recall that the receiver knows the inputs . Since corresponds to the weight vector with the highest degree of local flatness within the "bit-box", the receiver will find the correct . is described by a Gaussian distribution with mean zero. Hence, the description length of is (Shannon, 1948). , the center of the "bit-box", cannot be known before training. However, we do know the expected description length of the net function, which is ( is a constant independent of ). Let us approximate : .

Among those that lead to equal (the negative logarithm of the box volume plus ), we want to find those with minimal description length of the function induced by . Using Lagrange multipliers (viewing the as variables), it can be shown that is minimal under the condition iff flatness condition 2 holds. To conclude: with given box volume, we need flatness condition 2 to minimize the expected description length of the function induced by .
Juergen Schmidhuber 2003-02-13
