
THE DIFFICULT TASK

The information conveyed by the sequence source (the automaton above) can be measured by the entropy of the probability distribution over the possible sequences: $-\sum_{\text{all possible sequences } p} P(p)\log P(p)$. Our next step is to modify the automaton such that its entropy remains the same but the redundancy among the sequence components increases. Every transition of the form $i \rightarrow^x j$ (read: ``go from state $i$ to state $j$ and emit symbol $x$'') is replaced by a subautomaton consisting of the deterministic sequence of transitions $i \rightarrow^x i_2 \rightarrow^x i_3 \rightarrow^x \ldots \rightarrow^x i_{20} \rightarrow^x j$. The modified automaton generates redundant sequences like this one (from class 2):

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaeeeeeeeeeeeeeeeeeeeeccccccccccccccccccccccccccccccccccccccccdddddddddddddddddddd.

This sequence is redundant because many of its components are predictable from other components.
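To make the construction concrete, here is a minimal Python sketch (not from the original text) that applies the same expansion to a small, purely hypothetical stochastic automaton standing in for the one above: every transition is replaced by a deterministic chain of 20 transitions emitting the same symbol. Because the inserted chain introduces no new choice points, each original sequence maps one-to-one onto its redundant counterpart with the same probability, so the entropy $-\sum_p P(p)\log P(p)$ is unchanged while the redundancy grows.

import random

# Hypothetical stochastic automaton: state -> list of (probability, symbol, next state).
# None as the next state marks the end of a sequence.
ORIGINAL = {
    'S0': [(0.5, 'a', 'S1'), (0.5, 'b', 'S1')],
    'S1': [(0.5, 'c', 'S2'), (0.5, 'd', 'S2')],
    'S2': [(1.0, 'e', None)],
}

def expand(automaton, reps=20):
    # Replace every transition i --x--> j by the deterministic chain
    # i --x--> i_2 --x--> ... --x--> i_20 --x--> j (reps emissions of x).
    # The chain adds no choice points, so each original sequence maps
    # one-to-one onto a redundant sequence of the same probability.
    expanded = {}
    for state, transitions in automaton.items():
        new_transitions = []
        for k, (prob, symbol, nxt) in enumerate(transitions):
            chain = [f'{state}_{symbol}{k}_{step}' for step in range(2, reps + 1)]
            new_transitions.append((prob, symbol, chain[0] if chain else nxt))
            for step, inter in enumerate(chain):
                target = chain[step + 1] if step + 1 < len(chain) else nxt
                expanded[inter] = [(1.0, symbol, target)]
        expanded[state] = new_transitions
    return expanded

def sample(automaton, start='S0'):
    # Generate one sequence by walking the automaton from the start state.
    state, out = start, []
    while state is not None:
        r, acc = random.random(), 0.0
        for prob, symbol, nxt in automaton[state]:
            acc += prob
            if r <= acc:
                out.append(symbol)
                state = nxt
                break
    return ''.join(out)

print(sample(ORIGINAL))           # e.g. 'ace'
print(sample(expand(ORIGINAL)))   # e.g. 20 a's, then 20 c's, then 20 e's

Sampling the original and the expanded automaton involves exactly the same random choices; only the number of symbols emitted per choice differs, which is what makes the expanded sequences redundant.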



