EXPERIMENTS

We compare LOCOCODE to ``independent component analysis'' (ICA, e.g., [5,1,4,19]) and ``principal component analysis'' (PCA, e.g., [21]). ICA is realized by Cardoso's JADE algorithm, which is based on whitening and subsequent joint diagonalization of 4th-order cumulant matrices. To measure the information conveyed by the resulting codes, we train a standard backprop net on the training set used for code generation. Its inputs are the code components, and its task is to reconstruct the original input. The test set consists of 500 off-training-set exemplars (for real-world images we use a separate test image). Coding efficiency is the average number of bits needed to code a test set input pixel. The code components are scaled to a fixed interval and partitioned into discrete intervals. Assuming independence of the code components, we estimate the probability of each discrete code value by Monte Carlo sampling on the training set. To obtain the test set codes' bits per pixel (Shannon's optimal value), the sum of the negative logarithms of all code component probabilities is averaged over the test set and divided by the number of input components. All details necessary for reimplementation are given in [15].
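Written out, the coding-efficiency measure is $\mathrm{bpp} = \frac{1}{|T|\,N}\sum_{x \in T}\sum_{i} -\log_2 P(c_i(x))$, where $T$ is the test set, $N$ the number of input components, and $c_i(x)$ the discretized value of code component $i$ for input $x$. The Python sketch below computes it under several assumptions that are ours, not details given here (see [15] for those): components are min/max-scaled to $[0,1]$ using the training set, bins have equal width, and bins never seen during training receive a small hypothetical pseudo-count so that the logarithm stays finite.

```python
import math
from collections import Counter


def scale_and_bin(codes, lo, hi, n_intervals):
    """Scale each code component to [0, 1] using per-component min/max
    (assumed scaling), then map it to one of n_intervals equal-width bins."""
    binned = []
    for code in codes:
        row = []
        for v, a, b in zip(code, lo, hi):
            s = 0.0 if b == a else (v - a) / (b - a)
            s = min(max(s, 0.0), 1.0)  # clip test values outside the training range
            row.append(min(int(s * n_intervals), n_intervals - 1))
        binned.append(row)
    return binned


def bits_per_pixel(train_codes, test_codes, n_inputs, n_intervals, floor=0.5):
    """Average code length of a test input, in bits per input pixel.

    Bin probabilities are relative frequencies on the training set; the
    independence assumption allows one histogram per code component.
    `floor` is a hypothetical pseudo-count for bins unseen in training.
    """
    dim = len(train_codes[0])
    lo = [min(c[i] for c in train_codes) for i in range(dim)]
    hi = [max(c[i] for c in train_codes) for i in range(dim)]
    train_bins = scale_and_bin(train_codes, lo, hi, n_intervals)
    test_bins = scale_and_bin(test_codes, lo, hi, n_intervals)
    hists = [Counter(row[i] for row in train_bins) for i in range(dim)]
    n_train = len(train_bins)

    total_bits = 0.0
    for row in test_bins:
        for hist, b in zip(hists, row):
            p = max(hist[b], floor) / n_train
            total_bits += -math.log2(p)
    # average over test inputs, then divide by the number of input components
    return total_bits / len(test_bins) / n_inputs
```

A component that always falls into the same bin costs 0 bits, while a component spread evenly over two bins costs 1 bit per test input; sparse codes with many near-constant components therefore score low.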

Noisy bars adapted from [11,12]. The input is a $5 \times 5$ pixel grid with horizontal and vertical bars at random positions. The task is to extract the independent features (the bars). Each of the 10 possible bars appears with probability $\frac{1}{5}$. In contrast to [11,12], we allow bar type mixing: horizontal and vertical bars may appear in the same input, which makes the task harder. Bar intensities vary within a fixed range; input units that see a pixel of a bar are activated correspondingly, while all other units adopt a constant background activation. We add Gaussian noise with mean 0 and variance 0.05 to each pixel. For ICA and PCA we have to provide the number of independent sources (ten); versions tested with $n$ assumed sources are denoted ICA-$n$ and PCA-$n$. LOCOCODE does not require this information: starting with 25 hidden units (HUs), we expect LOCOCODE to prune the 15 superfluous HUs.
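A minimal Python sketch of how one noisy-bars input could be generated from the description above. Each of the $2 \times 5 = 10$ bars is drawn independently with probability $\frac{1}{5}$, so on average two bars appear per input, and both orientations can co-occur. The intensity range and background activation are left as parameters; the values in the usage line are hypothetical placeholders, not those of the experiments. How crossing bars combine is also our assumption (the larger intensity is kept).

```python
import math
import random


def noisy_bars(rng, intensity_range, background, size=5, p_bar=0.2, noise_var=0.05):
    """Generate one noisy-bars input.

    Returns (pixels, bars): pixels is a flat list of size*size activations,
    bars lists the (orientation, index) pairs of the bars that were drawn.
    """
    grid = [[None] * size for _ in range(size)]  # None = not covered by a bar
    bars = []
    # Horizontal and vertical bars are drawn independently, so both types can
    # appear in the same input (bar type mixing).
    for orient in ("h", "v"):
        for k in range(size):
            if rng.random() < p_bar:
                a = rng.uniform(*intensity_range)
                bars.append((orient, k))
                for j in range(size):
                    r, c = (k, j) if orient == "h" else (j, k)
                    # Assumption: where two bars cross, keep the larger intensity.
                    grid[r][c] = a if grid[r][c] is None else max(grid[r][c], a)
    sd = math.sqrt(noise_var)  # variance 0.05 -> standard deviation of about 0.224
    pixels = [(background if v is None else v) + rng.gauss(0.0, sd)
              for row in grid for v in row]
    return pixels, bars


# Hypothetical placeholder values for the intensity range and background level.
rng = random.Random(0)
pixels, bars = noisy_bars(rng, intensity_range=(0.5, 1.0), background=0.0)
```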

Results. See Table 1. While the reconstruction errors of all methods with ten code components are similar, LOCOCODE has the best coding efficiency. 15 of the 25 HUs are indeed pruned automatically: LOCOCODE finds an optimal factorial code that exactly mirrors the pattern generation process. PCA codes and ICA-15 codes, however, are unstructured and dense. ICA-10 codes are almost sparse and do recognize some sources, but the sources are not as clearly separated as with LOCOCODE; compare the weight patterns shown in [15].

Real-world images. Now we use more realistic input data, namely subsections of 1) an aerial shot of a village, 2) an image of wood cells, and 3) an image of a striped piece of wood. Each image has $150 \times 150$ pixels, each taking on one of 256 gray levels. Randomly chosen $7 \times 7$ pixel subsections serve as training inputs (for the village image, $5 \times 5$ subsections are used as well). Test sets stem from images similar to 1), 2), and 3).
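The random selection of training subsections can be sketched in Python as follows: a window of the given size is placed uniformly at random so that it lies entirely inside the image. The uniform placement, the number of samples `n`, and the helper name `sample_patches` are our assumptions; any normalization of the gray levels before coding is not shown.

```python
import random


def sample_patches(image, patch, n, rng):
    """Draw n randomly positioned patch x patch subsections from a 2-D image.

    image: list of rows of gray levels (e.g. 150 x 150 values in 0..255).
    Each subsection is returned as a flat, row-major list of patch*patch values.
    """
    h, w = len(image), len(image[0])
    patches = []
    for _ in range(n):
        # top-left corner chosen uniformly so the whole window fits in the image
        r = rng.randrange(h - patch + 1)
        c = rng.randrange(w - patch + 1)
        patches.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return patches


# Usage with a synthetic 150 x 150 gradient image (a stand-in for the real photographs).
img = [[(r + c) % 256 for c in range(150)] for r in range(150)]
train_inputs = sample_patches(img, patch=7, n=1000, rng=random.Random(0))
```

Each $7 \times 7$ subsection becomes a 49-dimensional input vector; passing `patch=5` gives the 25-dimensional village inputs.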

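Results. For the village image, LOCOCODE discovers on-center-off-surround hidden units that form a sparse code. For the other two images, LOCOCODE also finds appropriate feature detectors (see the weight patterns shown in [15]). Using its compact, low-complexity features, it codes at least as efficiently as ICA and PCA in nearly every setting; the only exception is the cell image, where PCA with the same code size is marginally (by at most 0.005 bits per pixel) more efficient.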

Table 1: Overview of experiments: name of experiment, input field size, coding method, number of relevant code components (code size), reconstruction error, and nature of the code observed on the test set. PCA's and ICA's code sizes must be prewired; LOCOCODE's, however, are found automatically (we always start with 25 HUs). The final four columns show the coding efficiency in bits per pixel, assuming the real-valued HU activations are partitioned into 10, 20, 50, and 100 discrete intervals. LOCOCODE codes most efficiently in all experiments except the cell image, where PCA with the same code size is marginally more efficient.

exp.	input field	method	code size	rec. error	code type	bits/pixel (10 int.)	bits/pixel (20 int.)	bits/pixel (50 int.)	bits/pixel (100 int.)
bars	$5 \times 5$	LOC	10	1.05	sparse	0.584	0.836	1.163	1.367
bars	$5 \times 5$	ICA	10	1.02	sparse	0.811	1.086	1.446	1.678
bars	$5 \times 5$	PCA	10	1.03	dense	0.796	1.062	1.418	1.655
bars	$5 \times 5$	ICA	15	0.71	dense	1.189	1.604	2.142	2.502
bars	$5 \times 5$	PCA	15	0.72	dense	1.174	1.584	2.108	2.469
village	$5 \times 5$	LOC	8	1.05	sparse	0.436	0.622	0.895	1.068
village	$5 \times 5$	ICA	8	1.04	sparse	0.520	0.710	0.978	1.165
village	$5 \times 5$	PCA	8	1.04	dense	0.474	0.663	0.916	1.098
village	$5 \times 5$	ICA	10	1.11	sparse	0.679	0.934	1.273	1.495
village	$5 \times 5$	PCA	10	0.97	dense	0.578	0.807	1.123	1.355
village	$7 \times 7$	LOC	10	8.29	sparse	0.250	0.368	0.547	0.688
village	$7 \times 7$	ICA	10	7.90	dense	0.318	0.463	0.652	0.796
village	$7 \times 7$	PCA	10	9.21	dense	0.315	0.461	0.648	0.795
village	$7 \times 7$	ICA	15	6.57	dense	0.477	0.694	0.981	1.198
village	$7 \times 7$	PCA	15	8.03	dense	0.474	0.690	0.972	1.189
cell	$7 \times 7$	LOC	11	0.840	sparse	0.457	0.611	0.814	0.961
cell	$7 \times 7$	ICA	11	0.871	sparse	0.468	0.622	0.829	0.983
cell	$7 \times 7$	PCA	11	0.722	sparse	0.452	0.610	0.811	0.960
cell	$7 \times 7$	ICA	15	0.360	sparse	0.609	0.818	1.099	1.315
cell	$7 \times 7$	PCA	15	0.329	dense	0.581	0.798	1.073	1.283
piece	$7 \times 7$	LOC	4	0.831	sparse	0.207	0.269	0.347	0.392
piece	$7 \times 7$	ICA	4	0.856	sparse	0.207	0.276	0.352	0.400
piece	$7 \times 7$	PCA	4	0.830	sparse	0.207	0.269	0.348	0.397
piece	$7 \times 7$	ICA	10	0.716	sparse	0.535	0.697	0.878	1.004
piece	$7 \times 7$	PCA	10	0.534	sparse	0.448	0.590	0.775	0.908