
A PREDICTOR OF CONDITIONAL PROBABILITIES

Assume that the alphabet contains $k$ possible characters $z_1, z_2, \ldots, z_k$. The (local) representation of $z_i$ is a binary $k$-dimensional vector $r(z_i)$ with exactly one non-zero component (at the $i$-th position). $P$ has $nk$ input units and $k$ output units. $n$ is called the ``time-window size''. We insert $n$ default characters $z_0$ at the beginning of each file. The representation of the default character, $r(z_0)$, is the $k$-dimensional zero-vector. The $m$-th character of file $f$ (starting from the first default character) is called $c^f_m$.
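The local representation and the time-window input can be sketched as follows. This is an illustrative Python sketch, not code from the paper; the names (`ALPHABET`, `r`, `make_input`) and the use of `None` for the default character $z_0$ are assumptions made for the example.

```python
# Sketch of the local (one-hot) character representation and the
# time-window input. All names here are illustrative, not from the paper.

K = 4                              # alphabet size k
ALPHABET = ['a', 'b', 'c', 'd']    # z_1 ... z_k
N = 3                              # time-window size n

def r(ch):
    """Local representation: one-hot vector for z_i, and the
    k-dimensional zero vector for the default character z_0
    (represented here by None)."""
    v = [0.0] * K
    if ch is not None:
        v[ALPHABET.index(ch)] = 1.0
    return v

def make_input(chars, m):
    """Concatenate r(c_{m-n}) ... r(c_{m-1}) into one nk-dimensional
    input vector for the predictor P (0-based indexing on `chars`)."""
    vec = []
    for j in range(m - N, m):
        vec.extend(r(chars[j]))
    return vec

# Each file is padded with n default characters at the front.
padded = [None] * N + list("abca")
x = make_input(padded, N)   # input used to predict the first real character
```

Note that the window preceding the first real character consists entirely of default characters, so the corresponding input is the $nk$-dimensional zero vector.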

For all $f \in F$ and all possible $m>n$, $P$ receives as an input

\begin{displaymath}
r(c^f_{m-n}) \circ r(c^f_{m-n+1}) \circ \ldots \circ r(c^f_{m-1}) ,
\end{displaymath} (1)

where $\circ$ denotes vector concatenation. $P$ produces as output a $k$-dimensional vector $P^f_m$. Using back-propagation [6][2][3][4], $P$ is trained to minimize


\begin{displaymath}
\frac{1}{2}
\sum_{f \in F}
\sum_{m > n}
\left\| r(c^f_{m}) - P^f_m \right\| ^2 .
\end{displaymath} (2)
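A minimal training sketch for objective (2) follows. The paper trains a multilayer network with back-propagation; for brevity this sketch uses a single linear layer, which still illustrates gradient descent on the squared-error target. All names and hyperparameters are assumptions for the example.

```python
import numpy as np

# Sketch of training P on objective (2) for one window/target pair.
# A single linear layer (W, b) stands in for the paper's multilayer net.

rng = np.random.default_rng(0)
K, N = 4, 3                     # alphabet size k, window size n
W = rng.normal(0.0, 0.1, (K, N * K))
b = np.zeros(K)

def step(x, target, lr=0.1):
    """One gradient step on (1/2)||r(c_m) - P_m||^2 for a single window."""
    global W, b
    p = W @ x + b               # network output P^f_m
    err = p - target            # gradient of the squared error w.r.t. p
    W -= lr * np.outer(err, x)
    b -= lr * err
    return 0.5 * float(err @ err)

# Toy data: a fixed window and a one-hot target r(c_m).
x = np.zeros(N * K); x[0] = 1.0
target = np.array([0.0, 1.0, 0.0, 0.0])
losses = [step(x, target) for _ in range(200)]
```

In practice the sums in (2) run over all files and all positions; the single-sample loop above only shows the shape of one update.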

Expression (2) is minimal if $ P^f_m$ always equals

\begin{displaymath}
E( r(c^f_{m}) \mid c^f_{m-n}, \ldots, c^f_{m-1}),
\end{displaymath} (3)

the conditional expectation of $r(c^f_{m})$, given $r(c^f_{m-n}) \circ r(c^f_{m-n+1}) \circ \ldots \circ r(c^f_{m-1})$. Due to the local character representation, this is equivalent to $(P^f_m)_i$ being equal to the conditional probability
\begin{displaymath}
Pr(c^f_m = z_i \mid c^f_{m-n}, \ldots, c^f_{m-1})
\end{displaymath} (4)

for all $f$ and for all appropriate $m>n$, where $(P^f_m)_j$ denotes the $j$-th component of the vector $ P^f_m$.
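The equivalence of (3) and (4) can be checked numerically: with one-hot targets, averaging $r(c_m)$ over all occurrences of a context yields exactly the vector of empirical conditional probabilities. The toy corpus and names below are made up for illustration, using a window of size 1 for simplicity.

```python
from collections import defaultdict

# Why (3) equals (4): the conditional expectation of a one-hot vector
# is the vector of conditional probabilities. Toy example, n = 1.

K = 2
ALPHABET = ['a', 'b']
text = "aababaabab"

counts = defaultdict(lambda: [0] * K)
for prev, cur in zip(text, text[1:]):
    counts[prev][ALPHABET.index(cur)] += 1

def cond_expectation(context):
    """E(r(c_m) | c_{m-1}): component i is Pr(c_m = z_i | c_{m-1})."""
    c = counts[context]
    total = sum(c)
    return [ci / total for ci in c]
```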

In general, the $(P^f_m)_i$ will not quite match the corresponding conditional probabilities. For normalization purposes, we define

\begin{displaymath}
P^f_m(i) = \frac {(P^f_m)_i}{\sum_{j=1}^k (P^f_m)_j }.
\end{displaymath} (5)

No normalization is used during training, however.
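The normalization in (5) amounts to rescaling the raw output vector so its components sum to one. A minimal sketch (the function name is an assumption):

```python
# Sketch of the normalization in (5): rescale the raw outputs so they
# sum to one. Applied only when the outputs are used as probabilities,
# not during training.

def normalize(p):
    """P^f_m(i) = (P^f_m)_i / sum_j (P^f_m)_j."""
    s = sum(p)
    return [pi / s for pi in p]
```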


Juergen Schmidhuber 2003-02-25