ARITHMETIC CODING

In general, Arithmetic Coding works slightly better than Huffman Coding. For sufficiently long messages, Arithmetic Coding achieves expected code lengths arbitrarily close to the information-theoretic lower bound. This holds even if the character probabilities are not powers of $\frac{1}{2}$ (see e.g. [7]).

The basic idea of Arithmetic Coding is this: a message is encoded as an interval of real numbers from the unit interval $[0, 1[$. The output of Arithmetic Coding is a binary representation of the boundaries of the corresponding interval, generated incrementally while the message is processed. Starting with the unit interval, for each observed character the interval is made smaller, essentially in proportion to the probability of the character. A message with low information content (and high corresponding probability) is encoded by a comparatively large interval whose precise boundaries can be specified with comparatively few bits. A message with high information content (and low corresponding probability) is encoded by a comparatively small interval whose boundaries require comparatively many bits to specify.
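To make the interval narrowing concrete, here is a minimal Python sketch. It is illustrative only: the function names are hypothetical, exact fractions are used for clarity, and the sketch omits the incremental bit output and finite-precision arithmetic that a practical coder (such as the one described in [7]) requires.

  from fractions import Fraction
  from math import ceil, log2

  def encode(message, probs):
      # Cumulative probability mass preceding each character;
      # these cut the current interval into one slot per character.
      cum, c = {}, Fraction(0)
      for ch, p in probs.items():
          cum[ch] = c
          c += p

      low, width = Fraction(0), Fraction(1)
      for ch in message:
          low += width * cum[ch]   # move to the character's slot
          width *= probs[ch]       # shrink in proportion to p(ch)

      # Any number in [low, low + width) identifies the message;
      # roughly -log2(width) bits suffice to write one down.
      bits = ceil(-log2(width)) + 1
      return low, width, bits

  low, width, bits = encode("aab", {"a": Fraction(2, 3), "b": Fraction(1, 3)})
  print(low, low + width, bits)

For the message ``aab'' with $p(a) = \frac{2}{3}$ and $p(b) = \frac{1}{3}$, the interval shrinks to $[\frac{8}{27}, \frac{4}{9}[$ of width $\frac{4}{27}$, so about $\lceil -\log_2 \frac{4}{27} \rceil + 1 = 4$ bits suffice to specify a number inside it.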

Although the basic idea is elegant and simple, additional technical considerations are necessary to make Arithmetic Coding practicable. See [7] for details.

Neither Huffman Coding nor Arithmetic Coding requires that the probability distribution on the characters remain fixed. This makes it possible to use the ``time-varying'' conditional probability distributions generated by the neural predictor.
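As an illustration, the sketch below (again hypothetical, continuing the conventions of the previous sketch) replaces the fixed probability table by a predictor that is queried once per character. The stub predictor stands in for the neural predictor; any model works, as long as the decoder can run the same model on the same context it reconstructs during decoding.

  from fractions import Fraction

  def predictor(context):
      # Hypothetical stand-in for the neural predictor: after
      # seeing an "a" it predicts another "a" more strongly.
      if context.endswith("a"):
          return {"a": Fraction(3, 4), "b": Fraction(1, 4)}
      return {"a": Fraction(1, 2), "b": Fraction(1, 2)}

  def encode_adaptive(message):
      low, width = Fraction(0), Fraction(1)
      for i, ch in enumerate(message):
          probs = predictor(message[:i])  # fresh distribution each step
          cum, c = {}, Fraction(0)
          for sym, p in probs.items():
              cum[sym] = c
              c += p
          low += width * cum[ch]
          width *= probs[ch]
      return low, width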

