During training, the images were presented at random according to the probabilities of the English language. The unsupervised system had 150 input units, 16 code units, and 1 ``bias'' unit. Each predictor had 15 input units (one for each of the other code units), 1 ``bias'' unit, and 1 output unit. The learning rate of the predictors was 10 times as high as the learning rate of the code units. Within 10000 pattern presentations, the system often learned to generate a loss-free code of the ensemble such that the code was much less redundant than the original data. The redundancy (see the definition in section 1.2) corresponding to the original DEC dataset is . The redundancy corresponding to a 16-bit code discovered by the system is . See [14], [13], and [24] for details.
This result corresponds to a dramatic reduction of redundant information, although the achieved value is not optimal. In many realistic cases, however, approximations of nonredundant codes should be satisfactory. The plan is to apply the method to the problem of unsupervised segmentation of real-world images. See [30] for an application to simple stereo vision.
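The network sizes and the learning-rate ratio reported above can be made concrete with a minimal sketch. It is not the original implementation. The sigmoid activations, the squared-error objective (predictors descend on it, code units ascend on it), the weight initialization range, the absolute learning rate `LR_CODE`, and the random binary test pattern are all assumptions made for illustration; only the unit counts and the factor of 10 come from the text.

```python
import math
import random

N_IN, N_CODE = 150, 16       # sizes from the text
N_PRED_IN = N_CODE - 1       # each predictor sees the 15 other code units
LR_CODE = 0.01               # assumed value; the text gives only the ratio
LR_PRED = 10 * LR_CODE       # predictors learn 10 times faster

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init(rows, cols, rng):
    # One row per unit: `cols` weights followed by one bias weight.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols + 1)]
            for _ in range(rows)]

def forward(W, V, x):
    # Code units read the input; predictor i reads every code unit except i.
    y = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in W]
    p = []
    for i, row in enumerate(V):
        others = y[:i] + y[i + 1:]
        p.append(sigmoid(sum(w * v for w, v in zip(row, others)) + row[-1]))
    return y, p

def train_step(W, V, x):
    # One presentation, with assumed objective E = sum_i (p_i - y_i)^2.
    y, p = forward(W, V, x)
    err = [p[i] - y[i] for i in range(N_CODE)]
    dE_dy = [-2.0 * e for e in err]          # direct dependence of E on y_k
    for i in range(N_CODE):
        g = 2.0 * err[i] * p[i] * (1.0 - p[i])
        others = [k for k in range(N_CODE) if k != i]
        for j, k in enumerate(others):       # dependence of E on y_k via p_i
            dE_dy[k] += g * V[i][j]
        for j, k in enumerate(others):       # predictor i descends on E
            V[i][j] -= LR_PRED * g * y[k]
        V[i][-1] -= LR_PRED * g
    for k in range(N_CODE):                  # code units ascend on E
        g = dE_dy[k] * y[k] * (1.0 - y[k])
        for j in range(N_IN):
            W[k][j] += LR_CODE * g * x[j]
        W[k][-1] += LR_CODE * g
    return sum(e * e for e in err)

rng = random.Random(0)
W = init(N_CODE, N_IN, rng)          # code-unit weights
V = init(N_CODE, N_PRED_IN, rng)     # predictor weights
x = [float(rng.random() < 0.5) for _ in range(N_IN)]  # stand-in pattern
y, p = forward(W, V, x)
loss = train_step(W, V, x)
```

In this sketch the opposing signs of the two weight updates are what let the code units ``escape'' their predictors. In a full experiment the stand-in pattern would be replaced by images drawn according to their presentation probabilities.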
One might speculate about whether the brain uses a similar principle based on ``code neurons'' trying to escape the predictions of ``predictor neurons''.