
HUFFMAN CODING

Given a probability distribution on a set of possible characters, Huffman Coding (e.g., [1]) encodes each character by a bitstring as follows.

Characters are terminal nodes of a binary tree to be built in an incremental fashion. The probability of a terminal node is defined as the probability of the corresponding character. The probability of a non-terminal node is defined as the sum of the probabilities of its sons. Starting from the terminal nodes, a binary tree is built as follows:

Repeat as long as possible:

Among those nodes that are not yet sons of any non-terminal node created earlier, pick the two with the lowest associated probabilities, and make them the two sons of a newly generated non-terminal node.

The branch to the ``left'' son of each non-terminal node is labeled by a 0; the branch to its ``right'' son is labeled by a 1. The code of a character $c$, $code(c)$, is the bitstring obtained by following the path from the root to the corresponding terminal node. Since every character sits at a terminal node, the path to one character's node never passes through another's; hence if $c \neq d$, then $code(c)$ cannot be a prefix of $code(d)$. This makes the code uniquely decipherable.
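
A minimal Python sketch of this construction, assuming the characters are given as a dictionary mapping each character to its probability (the function name huffman_codes, the heapq-based priority queue, and the tie-breaking counter are illustrative choices, not part of the original text):

import heapq

def huffman_codes(probs):
    # probs: dictionary mapping each character to its probability.
    # Returns a dictionary mapping each character to its Huffman code (a bitstring).
    if len(probs) == 1:
        # Degenerate case: a single character gets the one-bit code '0'.
        return {c: '0' for c in probs}
    # Heap entries are (probability, tie_breaker, tree); a tree is either a
    # character (terminal node) or a pair (left_son, right_son).
    heap = [(p, i, c) for i, (c, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Among the nodes that are not yet sons of an earlier non-terminal node,
        # pick the two with the lowest probabilities ...
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        # ... and make them the sons of a newly generated non-terminal node
        # whose probability is the sum of theirs.
        heapq.heappush(heap, (p1 + p2, next_id, (left, right)))
        next_id += 1
    root = heap[0][2]
    # Follow the path from the root to each terminal node,
    # emitting 0 for a left branch and 1 for a right branch.
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + '0')
            walk(node[1], prefix + '1')
        else:
            codes[node] = prefix
    walk(root, '')
    return codes

For instance, huffman_codes({'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}) yields the codes 0, 10, 110 and 111.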

Characters with high associated probability are encoded by short bitstrings; characters with low associated probability are encoded by long bitstrings. Huffman Coding guarantees minimal expected code length among all prefix codes; if, in addition, all character probabilities are integer powers of $\frac{1}{2}$, the expected code length equals the entropy of the distribution, the theoretical optimum.
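
For example, with four characters of probabilities $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}$, the construction yields code lengths 1, 2, 3 and 3, and the expected code length is $\frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 3 + \frac{1}{8} \cdot 3 = 1.75$ bits, which equals the entropy $-\sum_c p(c) \log_2 p(c)$ of the distribution.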

