We did not use Infomax methods in our experiments for the following reasons:

(a) There is no efficient and general method for maximizing mutual information.

(b) With our basic approach from section 1, Infomax makes sense only in situations where it automatically enforces high variance of the outputs (possibly under certain constraints). This holds for the simplifying Gaussian noise models studied by Linsker, but not in the general case.

(c) Even under appropriate Gaussian assumptions, with representations of more than one dimension, Infomax implies maximizing functions of the determinant of the covariance matrix of the output activations [Shannon, 1948]. In a small application, Linsker explicitly calculated the derivatives of this determinant. In general, however, this is clumsy.
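To make point (c) concrete, the following is a minimal sketch of the quantity involved, assuming the output activations are jointly Gaussian. Under that assumption the differential entropy of an n-dimensional output is 0.5 * (n * log(2*pi*e) + log det(C)), where C is the output covariance matrix, and the derivative of log det(C) with respect to the entries of C is the transposed inverse of C. The function names and the synthetic data are hypothetical illustrations and are not taken from Linsker's work or from our experiments.

```python
import numpy as np


def gaussian_entropy(cov):
    """Differential entropy (in nats) of a Gaussian with covariance `cov`.

    Hypothetical helper: assumes `cov` is a symmetric positive definite
    n x n matrix, as it would be under the Gaussian assumption in (c).
    """
    n = cov.shape[0]
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return 0.5 * (n * np.log(2.0 * np.pi * np.e) + logdet)


def logdet_gradient(cov):
    """Gradient of log det(cov) w.r.t. the entries of cov.

    Treats all entries as independent variables, which gives the
    transposed inverse of the matrix.
    """
    return np.linalg.inv(cov).T


# Synthetic, correlated 3-dimensional "output activations" (illustration only).
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 0.7]])
outputs = rng.normal(size=(1000, 3)) @ mixing

cov = np.cov(outputs, rowvar=False)  # 3 x 3 sample covariance of the outputs
entropy = gaussian_entropy(cov)      # the quantity a Gaussian Infomax would raise
grad = logdet_gradient(cov)          # requires a full matrix inverse

print("entropy (nats):", entropy)
print("gradient of log det:\n", grad)
```

Even in this simplified setting, each gradient evaluation needs the inverse of the full covariance matrix, and these derivatives would still have to be propagated back through the network that produces the outputs, which illustrates why the approach becomes clumsy as the representation grows.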

Juergen Schmidhuber 2003-02-13