
USING THE PREDICTOR FOR COMPRESSION

Given a copy of $P$, an unknown file $f$ can be compressed as follows: again, $n$ default characters are inserted at the beginning. For each character $c^f_m~(m>n)$, the predictor emits its output $ P^f_m$ based on the $n$ previous characters. There is a $k$ such that $c^f_m = z_k$; the estimate of $ P(c^f_m = z_k \mid c^f_{m-n}, \ldots, c^f_{m-1})$ is given by $P^f_m(k)$. The code of $c^f_m$, the bitstring $code(c^f_m)$, is generated by feeding the predictor's probability estimates into the Arithmetic Coding algorithm (see, e.g., [41]). $code(c^f_m)$ is then written to the compressed file.
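As a concrete illustration, here is a minimal sketch of this compression step, assuming exact rational arithmetic (to sidestep the integer renormalization used by practical Arithmetic Coders), a hypothetical two-letter alphabet, and a uniform toy predictor standing in for the trained network $P$ -- all of these are assumptions for illustration, not the paper's setup:

```python
from fractions import Fraction

# Toy alphabet; in the paper, z_1, ..., z_K are the possible characters.
ALPHABET = "ab"

def predictor(history):
    # Hypothetical stand-in for the network's output P^f_m:
    # ignores the history and returns a uniform distribution.
    return {z: Fraction(1, len(ALPHABET)) for z in ALPHABET}

def encode(text, n=1):
    # Insert n default characters at the beginning, as described above.
    padded = ALPHABET[0] * n + text
    low, high = Fraction(0), Fraction(1)
    for m in range(n, len(padded)):
        P = predictor(padded[m - n:m])   # P^f_m from the n previous characters
        c = padded[m]                    # the character c^f_m = z_k to be coded
        # Arithmetic Coding step: narrow [low, high) to c's sub-interval,
        # whose width is proportional to the predicted probability P^f_m(k).
        cum = Fraction(0)
        for z in ALPHABET:
            if z == c:
                width = high - low
                low, high = low + cum * width, low + (cum + P[z]) * width
                break
            cum += P[z]
    return to_bits(low, high)

def to_bits(low, high):
    # Emit a short binary fraction whose value lies inside [low, high).
    bits, lo, hi = "", Fraction(0), Fraction(1)
    while True:
        mid = (lo + hi) / 2
        if high <= mid:
            bits, hi = bits + "0", mid
        elif mid <= low:
            bits, lo = bits + "1", mid
        else:
            return bits + "1"  # value of bits+"1" is mid, inside [low, high)
```

A practical coder would emit bits incrementally and renormalize with integer arithmetic; the essential mechanism is the interval narrowing, where likely characters (large $P^f_m(k)$) shrink the interval little and thus cost few bits.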

The compressed file contains sufficient information to reconstruct the original file. This is done by the ``uncompress'' algorithm, which works as follows: again, for each character $c^f_m~(m>n)$, the predictor sequentially emits its output $ P^f_m$ based on the $n$ previous characters, where the $c^f_l$ with $n < l < m$ were obtained sequentially by feeding the approximations $P^f_l(k)$ of the probabilities $ P(c^f_l = z_k \mid c^f_{l-n}, \ldots, c^f_{l-1})$ into the inverse Arithmetic Coding procedure (see, e.g., [41]), which correctly decodes $c^f_l$ from $code(c^f_l)$. In other words, to decode a given character, all previous characters must be decoded first.
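The sequential dependence can be made concrete with a minimal sketch of the inverse procedure, under the same illustrative assumptions as before: exact rational arithmetic, a hypothetical two-letter alphabet, and a uniform toy predictor in place of the network (the history-independent uniform model is an assumption; the paper's predictor is the trained network):

```python
from fractions import Fraction

# Toy alphabet, matching the compression side.
ALPHABET = "ab"

def predictor(history):
    # Hypothetical stand-in for P^f_l: uniform, history-independent.
    return {z: Fraction(1, len(ALPHABET)) for z in ALPHABET}

def decode(bits, length, n=1):
    # Value of the received binary fraction code(f).
    v = sum(Fraction(1, 2 ** (i + 1)) for i, b in enumerate(bits) if b == "1")
    out = ALPHABET[0] * n              # the n default characters
    low, high = Fraction(0), Fraction(1)
    for _ in range(length):
        # The predictor only sees already-decoded characters: c^f_l must
        # be decoded before the context for c^f_{l+1} exists.
        P = predictor(out[-n:])
        cum = Fraction(0)
        for z in ALPHABET:
            # Inverse Arithmetic Coding step: find the sub-interval
            # containing v; its character is the decoded c^f_l.
            width = high - low
            nlow = low + cum * width
            nhigh = nlow + P[z] * width
            if nlow <= v < nhigh:
                out += z
                low, high = nlow, nhigh
                break
            cum += P[z]
    return out[n:]                     # strip the default prefix
```

Under this toy model, the bitstring `011` decodes to the file `ab`; each step can only proceed once the interval for all earlier characters has been fixed, which is exactly the sequential dependence noted above.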


Juergen Schmidhuber 2003-02-19