
E. COMPARISON OF METHODS 1, 2, 3

With a given probability distribution on the characters, Huffman Coding guarantees minimal expected code length among codes that assign a separate codeword to each character; its expected code length matches the information-theoretic lower bound exactly if all character probabilities are integer powers of $\frac{1}{2}$. In general, however, Arithmetic Coding works slightly better: for sufficiently long messages, it achieves expected code lengths arbitrarily close to the information-theoretic lower bound, even if the character probabilities are not powers of $\frac{1}{2}$ (see e.g. [5]).
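The gap between the two coders can be made concrete with a small sketch (an illustration only, not the paper's implementation): it builds a Huffman code for a distribution and compares its expected code length to the entropy. For a dyadic distribution (all probabilities powers of $\frac{1}{2}$) the two coincide; for a skewed non-dyadic distribution Huffman stays strictly above the bound, which is the regime where Arithmetic Coding gains.

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Build a Huffman tree over the given probabilities and
    return the codeword length assigned to each symbol index."""
    # Each heap entry: (probability, unique id, list of (symbol, depth)).
    # The unique id breaks ties so lists are never compared.
    heap = [(p, i, [(i, 0)]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    uid = len(probs)
    while len(heap) > 1:
        p1, _, l1 = heapq.heappop(heap)
        p2, _, l2 = heapq.heappop(heap)
        # Merging two subtrees pushes all their leaves one level deeper.
        merged = [(s, d + 1) for s, d in l1 + l2]
        heapq.heappush(heap, (p1 + p2, uid, merged))
        uid += 1
    return dict(heap[0][2])

def expected_length(probs):
    depths = huffman_code_lengths(probs)
    return sum(p * depths[i] for i, p in enumerate(probs))

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Dyadic distribution: Huffman meets the entropy bound exactly (1.75 bits).
dyadic = [0.5, 0.25, 0.125, 0.125]
# Non-dyadic distribution: Huffman needs 1.1 bits/character, while the
# entropy is only about 0.57 bits; Arithmetic Coding can approach the
# latter for long messages.
skewed = [0.9, 0.05, 0.05]
```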

METHOD 3 is of interest if typical files contain long sequences of predictable characters. Among the methods above, it is the only one that explicitly encodes strings of characters (as opposed to single characters). It does not make use of all the information about conditional probabilities, however.
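METHOD 3 itself is defined in an earlier section; as a hedged illustration of a scheme of the same flavor, the sketch below encodes runs of correctly predicted characters by their length and transmits only mispredicted characters literally. The predictor interface `predict(prefix)` and the toy repeat-predictor are assumptions for this example, not the paper's predictor.

```python
def encode_with_predictor(text, predict):
    """Illustrative string-oriented scheme: emit, for each run of
    correctly predicted characters, the run length followed by the
    next mispredicted character (None at end of text).
    `predict(prefix)` is an assumed interface returning the most
    likely next character given the text so far."""
    out = []
    run = 0
    for i, ch in enumerate(text):
        if predict(text[:i]) == ch:
            run += 1               # character was predictable: extend the run
        else:
            out.append((run, ch))  # flush run length plus literal character
            run = 0
    out.append((run, None))        # trailing run of predictable characters
    return out

# Toy predictor: always predicts that the previous character repeats.
repeat_predictor = lambda prefix: prefix[-1] if prefix else ''

# 'aaaab': first 'a' mispredicted, next three 'a's predicted, 'b'
# mispredicted, giving [(0, 'a'), (3, 'b'), (0, None)].
```

Long predictable runs collapse to single (length, character) pairs, which is why such a scheme pays off on files with long sequences of predictable characters; but since only the single most likely character is used per step, the rest of the conditional distribution is ignored.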

Once the current conditional probability distribution is known, the computational complexity of Huffman Coding is $O(k \log k)$, while that of both Arithmetic Coding and METHOD 3 is $O(k)$. In practical applications, however, the computational effort required for all three variants is negligible in comparison to the effort required for the predictor updates.
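The $O(k)$ cost of an Arithmetic Coding step can be seen in a minimal sketch (an illustration, not the paper's coder): narrowing the current interval to the subinterval of one character requires at most one pass over the $k$ character probabilities to form the cumulative sum.

```python
def arithmetic_step(low, high, probs, symbol):
    """One interval-narrowing step of Arithmetic Coding: restrict the
    current interval [low, high) to the subinterval assigned to
    `symbol` under the conditional distribution `probs`.  The O(k)
    cost is the cumulative sum over the k character probabilities."""
    span = high - low
    cum = 0.0
    for s, p in enumerate(probs):
        if s == symbol:
            return low + span * cum, low + span * (cum + p)
        cum += p
    raise ValueError("symbol out of range")

# Encoding 'b' then 'a' under the distribution (a: 0.5, b: 0.3, c: 0.2):
low, high = arithmetic_step(0.0, 1.0, [0.5, 0.3, 0.2], 1)   # [0.5, 0.8)
low, high = arithmetic_step(low, high, [0.5, 0.3, 0.2], 0)  # [0.5, 0.65)
```

Huffman Coding instead rebuilds its tree from the $k$ probabilities, which with a heap (as in the earlier sketch) costs $O(k \log k)$; this is the source of the complexity gap stated above.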


Juergen Schmidhuber 2003-02-19