
APPLICATION: IMAGE PROCESSING

We apply PM to static black and white images of driving cars (see Figure 2). Each image consists of $566 \times 702$ square pixels. Each pixel takes on one of 16 grey levels, represented as integers centered around zero.

Input generation. Inputs are taken from a circular ``input area'' whose diameter is 64 pixel widths. There are 32 code units. Each code unit has a ``bias input unit'' with constant activation 1.0, plus a circular receptive field of 81 evenly distributed additional input units; the diameter of each receptive field is 20 pixel widths, and receptive fields partly overlap. The positions of code units and receptive fields relative to the input area are fixed (see Figure 3). The rotation of the input area is chosen randomly; its position is chosen randomly within the boundaries of the image. The activation of an input unit is the average grey level value of the closest pixel and the four adjacent pixels (see Figure 4).
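For concreteness, here is a minimal sketch of this sampling step in Python. It is an illustration, not the original code: the helper names (input_activation, sample_input_pattern), the NumPy image representation, and the unit_offsets layout are our own assumptions.

\begin{verbatim}
import numpy as np

def input_activation(image, px, py):
    # Average grey level of the closest pixel and its four
    # adjacent pixels (up, down, left, right), cf. Figure 4.
    h, w = image.shape
    x, y = int(round(px)), int(round(py))
    neighbours = [(x, y), (x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    values = [image[i, j] for i, j in neighbours
              if 0 <= i < h and 0 <= j < w]
    return float(np.mean(values))

def sample_input_pattern(image, unit_offsets, area_diameter=64):
    # unit_offsets: (n_units, 2) fixed positions of all (non-bias)
    # input units relative to the center of the input area.
    h, w = image.shape
    r = area_diameter / 2.0
    cx = np.random.uniform(r, h - r)         # random position inside the image
    cy = np.random.uniform(r, w - r)
    theta = np.random.uniform(0, 2 * np.pi)  # random rotation of the area
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    coords = unit_offsets @ rot.T + np.array([cx, cy])
    return np.array([input_activation(image, px, py) for px, py in coords])
\end{verbatim}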

Learning: heuristic simplifications. To achieve extreme computational simplicity (and also biological plausibility), we simplify the general method from section 2. The heuristic simplifications are: (1) No error signals are propagated through the predictor input units down into the code network. (2) We focus on semilinear networks as opposed to general nonlinear ones (no hidden units within predictors and the code-generating net -- see Figure 1). (3) Predictors and code units learn simultaneously (also, each code unit sees only part of the total input). These simplifications make the method local in both space and time -- to change the weight of any predictor connection, we can use the simple delta rule, which needs to know only the current activations of the two connected units and the current activation of the unit to be predicted: in response to input pattern $x_p$, each weight $w$ of predictor $P_i$ changes according to

\begin{displaymath}
w \leftarrow w - \eta_P \frac{\partial (P_i^p - y_i^p)^2}{\partial w},
\end{displaymath}

where $\eta _P$ is a positive constant. Likewise, to change the weight of any connection to a code unit, we need to know only the current activations of the two connected units, and the current activation of the corresponding predictor output unit: in response to input pattern $x_p$, each weight $v$ leading to code unit $i$ changes according to

\begin{displaymath}
v \leftarrow v + \eta_C \frac{\partial (P_i^p - y_i^p)^2}{\partial v},
\end{displaymath}

where $\eta _C$ is a positive constant.
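Note the opposite signs: each predictor performs gradient descent on its prediction error, while each code unit performs gradient ascent on the same error, trying to become unpredictable. The following sketch implements one simultaneous update step, assuming logistic activation functions for the semilinear units; the function name pm_step, the array shapes, and the use of a single shared input vector (rather than per-unit receptive fields) are illustrative assumptions.

\begin{verbatim}
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def pm_step(x, V, W, eta_P=0.1, eta_C=0.005):
    # x: input pattern (bias input folded in as a constant 1.0 entry)
    # V: (n_code, n_in) weights to the code units
    # W: (n_code, n_code) predictor weights; no self-connections
    net = V @ x
    y = sigmoid(net)                     # semilinear code units
    mask = 1.0 - np.eye(len(y))
    P = sigmoid((W * mask) @ y)          # semilinear predictors
    err = P - y
    # Predictors: gradient DESCENT on (P_i - y_i)^2 -- the delta rule.
    dP = 2.0 * err * P * (1.0 - P)
    W -= eta_P * np.outer(dP, y) * mask
    # Code units: gradient ASCENT on the same error.  Following
    # simplification (1), no error signals flow back through the
    # predictor inputs, so only the -y_i term contributes.
    dy = -2.0 * err * y * (1.0 - y)
    V += eta_C * np.outer(dy, x)
    return y, P
\end{verbatim}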

Figure 2: A typical image from the image database.
\begin{figure}\centerline{\psfig{figure=cars.ps,width=10cm}}\end{figure}

Figure 3: Small circles represent partly overlapping receptive fields of code units. Their positions are shown relative to the input area (grey). See Figure 4 for details.
\begin{figure}\centerline{\psfig{figure=receptivefields.eps}}\end{figure}
Figure 4: Details of a receptive field inside the input area. Small circles indicate pixel positions. Crosses indicate positions of rotated input units. The activation of each input unit is the average grey level value of the closest pixel and the four adjacent pixels (indicated by dotted lines).
\begin{figure}\centerline{\psfig{figure=activations.eps}}\end{figure}

Figure 5: Success rate (measuring information throughput) plotted against number of training pattern presentations (logarithmic scale). Results are shown for various pairs of predictor learning rates $\eta _P$ and code unit learning rates $\eta _C$. $a:~\eta_P = 0.001, ~ \eta_C = 0.00004$. $b:~\eta_P = 0.01, ~ \eta_C = 0.00011$. $c:~\eta_P = 0.1, ~ \eta_C = 0.005$. $d:~\eta_P = 1.0, ~ \eta_C = 0.0042$.
\begin{figure}\centerline{\psfig{figure=plot.eps,width=13cm}}\end{figure}

Figure 6: Weights corresponding to a typical receptive field (after 5000 pattern presentations). Bright (dark) circles represent positive (negative) weights. The connections are divided into two groups, one with inhibitory connections, the other one with excitatory connections. Both groups are separated by a ``fuzzy axis'' (black division line) through the center of the receptive field. Its rotation angle determines the alignment of the edge provoking maximal response.
\begin{figure}\centerline{\psfig{figure=weights.eps}}\end{figure}
Figure 7: For all receptive fields, typical post-training boundaries between inhibitory and excitatory weights are shown. Distances between field centers are ``blown up'' to avoid confusion caused by overlaps.
\begin{figure}\centerline{\psfig{figure=rotations.eps}}\end{figure}

Figure 8: Bright (dark) circles represent positive (negative) weights. With nearly 200 input units per field and a symmetric arrangement of receptive field centers (essentially, on the circular boundary of each field there are 6 other field centers), the weight patterns generated by the system tend to be either off-center-on-surround-like (see figure) or on-center-off-surround-like.
\begin{figure}\centerline{\psfig{figure=ofcos.eps,width=15.5cm}}\end{figure}

Measuring information throughput. Unsupervised learning is occasionally switched off. Then the number $N$ of pairwise different output patterns in response to 5000 randomly generated input patterns is determined (the activation of each output unit is taken to be 0 if below 0.05, 1 if above 0.95, and 0.5 otherwise). The success rate is defined by $\frac{N}{5000}$. Clearly, a success rate close to 1.0 implies high information throughput.
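A direct transcription of this measure, assuming the code unit activations for all test patterns have been collected in a NumPy array; the function name success_rate is ours.

\begin{verbatim}
import numpy as np

def success_rate(codes, lo=0.05, hi=0.95):
    # codes: (n_patterns, n_code) code unit activations in [0, 1].
    # Quantize each activation to 0, 0.5 or 1, then count the number
    # N of pairwise different quantized output patterns.
    q = np.full(codes.shape, 0.5)
    q[codes < lo] = 0.0
    q[codes > hi] = 1.0
    n_distinct = len({tuple(row) for row in q})
    return n_distinct / len(codes)
\end{verbatim}

With 5000 test patterns and 32 code units there are far more than 5000 possible quantized codes, so a rate near 1.0 means almost every input receives a distinct code.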

Results. Figure 5 plots success rate against number of training pattern presentations. Results are shown for various pairs of predictor learning rates $\eta _P$ and code unit learning rates $\eta _C$. For instance, with $\eta _P$ close to 1.0 and $\eta _C$ one or two orders of magnitude smaller, high success rates are obtained. Although the learning rates do influence learning speed, the basic shapes of the learning curves are similar.

Edge detectors. With the set-up described above, the system tends to create orientation-sensitive edge detectors in an unsupervised manner, thereby maximizing information throughput. Weights corresponding to a typical receptive field (after 5000 pattern presentations) are shown in Figure 6. The connections are divided into two groups, one with inhibitory connections, the other with excitatory connections. The two groups are separated by a ``fuzzy'' axis through the center of the receptive field; its rotation angle determines the alignment of the edge provoking the maximal response. In general, receptive fields of different code units exhibit different rotation angles (see Figure 7). Obviously, to represent the inputs in an informative but compact, efficient, redundancy-poor way, the creation of feature detectors specializing in certain rotated edges proves useful.

It is noticeable that in Figure 6 there appears to be a smooth gradient of weight strengths from strongly positive to strongly negative as one moves perpendicular to the positive/negative border. This is different from, e.g., MacKay and Miller (1990), where all weights tend to zero at the edge of the receptive field. At the moment, we don't have a good explanation for this effect.

On-center-off-surround / Off-center-on-surround. The nature of the receptive fields partly depends on receptive field size and degree of overlap. For instance, with nearly 200 input units per field and a more symmetric arrangement of receptive field centers (essentially, on the circular boundary of each field there are 6 other field centers), the system tends to generate another well-known kind of feature detector: the weight patterns become either on-center-off-surround-like or off-center-on-surround-like structures. See Figure 8 for an example.

