Becker and Hinton report that their system (based on binary probabilistic units) was able to extract the `shift' between two simple stereoscopic binary images only if IMAX was applied in successive `layer by layer' bootstrap stages. In addition, they heuristically tuned the learning rate during learning. Finally, they imposed an upper bound on the change of each weight during gradient ascent.
In contrast, the method described herein (based on continuous-valued units) does not rely on successive bootstrap stages or any other heuristic considerations.
We minimized (4) with defined by predictability minimization according to (9).
In a first experiment, we employed a different set of weights for each network. In ten test runs involving 100,000 training patterns, the networks always learned to extract the stereoscopic shift. This performance of our non-bootstrapped system is comparable to the performance of Becker and Hinton's bootstrapped system.
In a second experiment, we used only one set of weights for both networks (this reduces the number of free parameters). The result was a significant decrease in learning time: in ten test runs, the system needed between 20,000 and 50,000 training patterns to learn to extract the shift.
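The difference between the two experiments can be illustrated with a minimal sketch. It is not the implementation used for the experiments: the layer sizes, the logistic activation, the initialization range, and the names `make_weights`, `forward`, and `count_params` are assumptions made for illustration only. The sketch only shows how letting both networks use one weight set halves the number of free parameters.

```python
import math
import random

def make_weights(n_in, n_out, rng):
    """Random weights: n_out rows of n_in input weights plus one bias (last entry)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(weights, x):
    """One layer of continuous-valued logistic units (assumed activation)."""
    outputs = []
    for row in weights:
        # zip stops after the n_in input weights; the bias is added separately.
        net = sum(w * xi for w, xi in zip(row, x)) + row[-1]
        outputs.append(1.0 / (1.0 + math.exp(-net)))
    return outputs

def count_params(*weight_sets):
    """Count free parameters; a weight set used by several networks counts once."""
    unique = {id(w): w for w in weight_sets}
    return sum(len(row) for w in unique.values() for row in w)

rng = random.Random(0)
N_IN, N_OUT = 4, 1  # hypothetical sizes, not taken from the experiments

# First experiment: each network has its own weight set.
w_first = make_weights(N_IN, N_OUT, rng)
w_second = make_weights(N_IN, N_OUT, rng)
separate_params = count_params(w_first, w_second)

# Second experiment: both networks use the same weight set.
w_shared = make_weights(N_IN, N_OUT, rng)
shared_params = count_params(w_shared, w_shared)

# Two hypothetical binary input patches, one per network.
patch_a = [1, 0, 1, 1]
patch_b = [0, 1, 1, 0]
out_a = forward(w_shared, patch_a)
out_b = forward(w_shared, patch_b)

print(separate_params, shared_params)
```

With shared weights, a gradient-based learner would presumably add the weight changes computed for both networks to the single weight set, so every training pattern contributes two updates to the same parameters.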