
FORMULATING THE PROBLEM

Let us assume $n$ different adaptive input-processing representational modules, each of which sees a single input at a time. The output of each module can be implemented as a set of neuron-like units. Throughout this paper I focus on the simplest case: one output unit (also called a representational unit) per module. The $i$-th module (or unit) produces an output value $y^p_i \in [0, 1]$ in response to the current external input vector $x^p$. In what follows, $P(A)$ denotes the probability of event $A$, $P(A \mid B)$ denotes the conditional probability of event $A$ given $B$, $\bar{y}_i$ denotes the mean activation of unit $i$, and $E$ denotes the expectation operator.
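To make this setup concrete, here is a minimal sketch in Python/NumPy. The logistic parameterization and the names `code`, `W`, and `b` are illustrative assumptions, not part of the paper; the sketch merely shows $n$ modules, each with a single output unit mapping an input vector $x^p$ to $y^p_i \in [0, 1]$.

import numpy as np

rng = np.random.default_rng(0)

n = 4        # number of representational modules (one output unit each)
dim_x = 8    # dimensionality of the external input vectors x^p

# Hypothetical parameterization: module i maps x^p to a scalar
# y_i^p in (0, 1) via a logistic unit on a learned linear projection.
W = rng.normal(size=(n, dim_x))   # one weight row per module
b = np.zeros(n)

def code(x):
    """Return the code vector (y_1, ..., y_n) for one input pattern x."""
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

x_p = rng.normal(size=dim_x)      # some external input vector x^p
y_p = code(x_p)                   # each component lies in (0, 1)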

The methods described in this paper are primarily devoted to finding binary, or at least quasi-binary, codes. Each code symbol participating in a quasi-binary code either takes on the value 0 or 1 in response to a given input pattern, or emits a constant value in response to every input pattern. Binary codes are thus a special case of quasi-binary codes. Most of our quasi-binary codes will be created by starting out from real-valued codes.
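As an illustration of this definition (the function name `is_quasi_binary` and the numerical tolerance are hypothetical choices, not from the paper), a unit passes the test if its responses over all input patterns are binary or constant:

import numpy as np

def is_quasi_binary(Y, tol=1e-6):
    """Check whether a code is quasi-binary.

    Y is a (num_patterns, n) array; column i holds unit i's responses
    to every input pattern. A unit is admissible if it is binary
    (all responses in {0, 1}) or constant across all patterns."""
    for column in Y.T:
        binary = np.all((np.abs(column) < tol) | (np.abs(column - 1) < tol))
        constant = np.all(np.abs(column - column[0]) < tol)
        if not (binary or constant):
            return False
    return True

Y_binary = np.array([[0, 1], [1, 1], [1, 0]])        # binary, hence quasi-binary
Y_quasi  = np.array([[0, 0.7], [1, 0.7], [1, 0.7]])  # unit 2 is constant
print(is_quasi_binary(Y_binary), is_quasi_binary(Y_quasi))  # True True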

Recall that there are three criteria that a binary factorial code must fulfill:

1. The binary criterion: Each code symbol should be either 1 or 0 in response to a given input pattern.

2. The invertibility criterion: It must be possible to reconstruct the input from the code. In cases where the environment is too complex (or too noisy) to be fully coded into limited internal representations (i.e., in the case of binary codes, where there are more than $2^{\dim(y)}$ input patterns), we want to relax the invertibility criterion. In that case, we still want the internal representations to convey maximal information about the inputs. The focus of this paper, however, is on situations like the ones studied in (Barlow et al., 1989): noise-free environments and sufficient representational capacity in the representational units. In the latter case, invertibility is equivalent to Infomax à la Linsker (1988).

3. The independence criterion: The occurrence of each code symbol ought to be independent of all other code symbols. If the binary criterion is fulfilled, then we may rewrite the independence criterion by requiring that

\begin{displaymath}
E(y_i \mid \{y_k, k \neq i \})
=P(y_i = 1 \mid \{y_k, k \neq i \}) = P(y_i = 1) = E(y_i).
\end{displaymath}

The latter condition implies that $y_i$ does not depend on $\{y_k, k \neq i \}$. In other words, $E(y_i \mid \{y_k, k \neq i \})$ reduces to a constant. Note that with real-valued codes the criterion $E(y_i \mid \{y_k, k \neq i \}) = E(y_i)$ does not necessarily imply that the $y_k$ are independent.
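The independence criterion can be checked empirically for a small binary code. The following sketch (the function name `independence_violation` is hypothetical, and the check assumes every pattern is equally likely under the empirical distribution) measures the largest deviation of $E(y_i \mid \{y_k, k \neq i \})$ from $E(y_i)$ over all units and all observed contexts; it vanishes for a factorial code:

import numpy as np
from itertools import product

def independence_violation(Y):
    """For a binary code Y of shape (num_patterns, n), return the largest
    absolute deviation of the conditional mean E(y_i | {y_k, k != i})
    from E(y_i), over all units i and all observed contexts.
    Zero for a factorial code (under the empirical distribution)."""
    num_patterns, n = Y.shape
    worst = 0.0
    for i in range(n):
        others = np.delete(Y, i, axis=1)
        mean_i = Y[:, i].mean()                 # E(y_i) = P(y_i = 1)
        for context in {tuple(row) for row in others}:
            mask = np.all(others == context, axis=1)
            cond_mean = Y[mask, i].mean()       # E(y_i | context)
            worst = max(worst, abs(cond_mean - mean_i))
    return worst

# A factorial code on 4 patterns: two independent, uniform bits.
Y_fact = np.array(list(product([0, 1], repeat=2)))
print(independence_violation(Y_fact))   # 0.0

# A non-factorial code: the two bits always agree.
Y_dep = np.array([[0, 0], [1, 1]])
print(independence_violation(Y_dep))    # 0.5

The second example also illustrates the final remark above in reverse: for binary codes the conditional-mean condition does capture dependence, whereas for real-valued codes equality of conditional and unconditional means is weaker than full statistical independence.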



