
Theorems for EOMs and GTMs

Theorem 5.3   For $x \in B^{\sharp}$ with $P^E(x) > 0$,

\begin{displaymath}
KP^E(x) - 1 \leq K^E(x) \leq KP^E(x) + K^E(KP^E(x)) + O(1).
\end{displaymath} (38)

Using $K^E(y) \leq \log y + 2 \log \log y + O(1)$ for y interpreted as an integer (compare Def. 2.8), this yields

\begin{displaymath}
2^{-K^E(x)} < P^E(x) \leq O(2^{-K^E(x)}) \, (K^E(x))^2.
\end{displaymath} (39)

That is, objects that are hard to describe (in the sense that they have only long enumerating descriptions) have low probability.
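
To make the step from (38) to (39) explicit, here is a brief sketch (assuming, as in the preceding definitions, that $KP^E(x)$ equals $-\lg P^E(x)$ up to rounding). Substituting $y = KP^E(x)$ into the logarithmic bound gives

\begin{displaymath}
K^E(x) \leq KP^E(x) + \log KP^E(x) + 2 \log \log KP^E(x) + O(1).
\end{displaymath}

Exponentiating yields $2^{-KP^E(x)} \leq O(2^{-K^E(x)}) \, KP^E(x) \log^2 KP^E(x)$; since $P^E(x) = O(2^{-KP^E(x)})$ and, by the left-hand inequality of (38), $KP^E(x) \leq K^E(x) + 1$, the right-hand side of (39) follows from $y \log^2 y = O(y^2)$. The left-hand side of (39) holds because the minimal program for x alone contributes $2^{-K^E(x)}$ to the sum defining $P^E(x)$.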

Proof. The left-hand inequality follows by definition. To show the right-hand side, one can build an EOM T that computes $x \in B^{\sharp}$ using at most $KP^E(x) + K_T(KP^E(x)) + O(1)$ input bits, in a way inspired by Huffman coding [Huffman 1952]. The claim then follows from the Invariance Theorem. The trick is to arrange T's computation such that T's output converges yet never needs to decrease lexicographically. T works as follows:

(A) Emulate $U^E$ to construct a real enumerable number 0.s encoded as a self-delimiting input program r, simultaneously running all (possibly forever running) programs on $U^E$ in dovetail style. Whenever the output of a prefix q of any running program starts with some $x \in B^*$ for the first time, set $V(x) := V(x) + 2^{-l(q)}$ (if no program has ever created output starting with x, first create V(x) initialized by 0); whenever the output of some extension q' of q (obtained by possibly reading additional input bits; q' = q if none are read) increases lexicographically such that it no longer equals x, set $V(x) := V(x) - 2^{-l(q')}$.

(B) Simultaneously, starting at the right end of the unit interval [0,1), as the V(x) are being updated, keep updating a chain of disjoint, adjacent, half-open (at the right end) intervals $IV(x) = [LV(x), RV(x))$ of size $V(x) = RV(x) - LV(x)$, in alphabetic order on x, such that the right end of the IV(x) of the largest x coincides with the right end of [0,1), and IV(y) lies to the right of IV(x) if $y \succ x$. After every variable update and each change of s, replace the output of T by the x of the IV(x) with $0.s \in IV(x)$.

This will never violate the EOM constraints: the enumerable s cannot shrink, and since EOM outputs cannot decrease lexicographically, the interval boundaries RV(x) and LV(x) cannot grow (their negations are enumerable, compare Lemma 4.1), hence T's output cannot decrease.
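
The interval bookkeeping of step (B) may be illustrated by a minimal toy sketch in Python (ours, not part of the proof): floats stand in for the enumerable reals of the actual construction, and the candidate strings form a fixed finite set rather than the dovetailed enumeration.

    def build_chain(V):
        # V maps strings x to current weights V(x) > 0. Pack disjoint,
        # adjacent, half-open intervals IV(x) = [LV(x), RV(x)) of size
        # V(x) from right to left, the largest x flush against 1.0.
        chain, right = {}, 1.0
        for x in sorted(V, reverse=True):    # largest x first
            chain[x] = (right - V[x], right)
            right -= V[x]
        return chain

    def decode(chain, s):
        # T's output after an update: the x of the IV(x) with 0.s in IV(x).
        for x, (L, R) in chain.items():
            if L <= s < R:
                return x
        return None                          # 0.s not yet covered

    V = {'0': 0.25, '10': 0.25, '11': 0.5}   # toy weights V(x)
    iv = build_chain(V)                      # '11': [0.5, 1.0), '10': [0.25, 0.5), '0': [0.0, 0.25)
    print(decode(iv, 0.6))                   # -> '11'

Under the update rules of step (A), mass only arrives fresh at some x or moves from x to some $x' \succ x$ further right, so both boundaries of each IV(x) can only move left; with 0.s fixed, the decoded output therefore never decreases lexicographically.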

For $x \in B^*$ the IV(x) converge towards an interval I(x) of size $P^E(x)$. For $x \in B^{\infty}$ with $P^E(x) > 0$, we have: for any $\epsilon > 0$ there is a time $t_0$ such that for all time steps $t > t_0$ in T's computation, an interval $I_{\epsilon}(x)$ of size $P^E(x) - \epsilon$ will be completely covered by certain IV(y) satisfying $x \succ y$ and $0.x - 0.y < \epsilon$. So for $\epsilon \rightarrow 0$ the $I_{\epsilon}(x)$ also converge towards an interval I(x) of size $P^E(x)$. Hence T will output larger and larger y approximating x from below, provided $0.s \in I(x)$.

Since any interval of size c within [0,1) contains a number 0.z with $l(z) = \lceil -\lg c \rceil$, in both cases there is a number 0.s (encodable by some r satisfying $l(r) \leq l(s) + K_T(l(s)) + O(1)$) with $l(s) = -\lg P^E(x) + O(1)$ such that $T(r) \leadsto x$, and therefore $K_T(x) \leq l(s) + K_T(l(s)) + O(1)$. $\Box$
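
As an aside, the existential claim used in the last step is easy to verify numerically; a minimal sketch (the function name is ours):

    import math

    def dyadic_in_interval(L, c):
        # Any half-open interval [L, L+c) within [0,1) contains a point
        # 0.z with l(z) = ceil(-lg c): the grid of spacing 2^-l(z) is at
        # least as fine as c, so some grid point falls in the interval.
        n = math.ceil(-math.log2(c))      # l(z)
        k = math.ceil(L * 2 ** n)         # smallest multiple of 2^-n >= L
        assert k / 2 ** n < L + c         # it lies within [L, L+c)
        return format(k, 'b').zfill(n)    # the bits z, with 0.z = k * 2^-n

    print(dyadic_in_interval(0.3, 0.125)) # -> '011', i.e. 0.z = 0.375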

Less symmetric statements can also be derived in very similar fashion:

Theorem 5.4   Let TM T induce approximable $CP_T(x)$ for all $x \in B^*$ (compare Defs. 4.10 and 4.12; an EOM would be a special case). Then for $x \in B^{\sharp}$ with $P_T(x) > 0$:

 \begin{displaymath}K^G(x) \leq KP_T(x) + K^G(KP_T(x)) + O(1).
\end{displaymath} (40)

Proof. Modify the proof of Theorem 5.3 for approximable as opposed to enumerable interval boundaries and approximable 0.s. $\Box$

A similar proof, but without the complication for the case $x \in B^{\infty}$, yields:

Theorem 5.5   Let $\mu$ denote an approximable semimeasure on $x \in B^*$; that is, $\mu(x)$ is describable. Then for $\mu(x) > 0$:

\begin{displaymath}Km^G(x) \leq K\mu(x) + Km^G(K\mu(x)) + O(1);
\end{displaymath} (41)


\begin{displaymath}K^G(x) \leq K\bar{\mu}(x) + K^G(K\bar{\mu}(x)) + O(1).
\end{displaymath} (42)

As a consequence (by the same substitution that led to (39)),

\begin{displaymath}
\frac{\mu(x)}{K\mu(x) \log^2 K\mu(x)} \leq O(2^{-Km^G(x)}); \qquad
\frac{\bar{\mu}(x)}{K\bar{\mu}(x) \log^2 K\bar{\mu}(x)} \leq O(2^{-K^G(x)}).
\end{displaymath} (43)

Proof. Initialize variables $V_{\lambda} := 1$ and $IV_{\lambda} := [0,1)$. Dovetailing over all $x \succ \lambda$, approximate the GTM-computable $\bar{\mu}(x) = \mu(x) - \mu(x0) - \mu(x1)$ in variables $V_x$ initialized by zero, and create a chain of adjacent intervals $IV_x$ analogously to the proof of Theorem 5.3.

The $IV_x$ converge to intervals $I_x$ of size $\bar{\mu}(x)$. Hence x is GTM-encodable by any program r producing an output s with $0.s \in I_x$: after every update, replace the GTM's output by the x of the $IV_x$ with $0.s \in IV_x$. Similarly, if 0.s lies in the union of adjacent intervals $I_y$ of strings y starting with x, then the GTM's output will converge towards some string starting with x. The rest follows as in the final paragraph of the proof of Theorem 5.3. $\Box$
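
As with Theorem 5.3, the layout can be illustrated by a toy sketch (ours, not the paper's; Python floats stand in for limit-computable reals, and $\mu(x) = 3^{-l(x)}$ is an arbitrary example semimeasure chosen so that $\bar{\mu}(x)$ is strictly positive):

    def mu(x):                           # example semimeasure: mu(x) = 3^-l(x)
        return 3.0 ** -len(x)

    def mu_bar(x):                       # bar-mu(x) = mu(x) - mu(x0) - mu(x1)
        return mu(x) - mu(x + '0') - mu(x + '1')

    def chain(depth):
        # Adjacent intervals I_x of size bar-mu(x), laid out (left-aligned
        # here, unlike the right-aligned chain of Theorem 5.3) in the order
        # that puts every x before its own extensions; the block of
        # intervals of all strings starting with x then has total size
        # approaching mu(x) as depth grows.
        out, left = {}, 0.0
        def visit(x):
            nonlocal left
            out[x] = (left, left + mu_bar(x))
            left += mu_bar(x)
            if len(x) < depth:
                visit(x + '0')
                visit(x + '1')
        visit('')                        # '' stands for lambda
        return out

    def decode(intervals, s):
        # The GTM's output after an update: the x whose I_x contains 0.s.
        return next((x for x, (L, R) in intervals.items() if L <= s < R), None)

    iv = chain(depth=8)
    print(decode(iv, 0.4))               # -> '0', since 0.4 lies in I_x for x = '0'

The contiguous prefix blocks are what make the second part of the proof work: once 0.s falls inside the union of intervals of strings starting with x, the decoded output keeps starting with x.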

Using the basic ideas in the proofs of Theorems 5.3 and 5.5 in conjunction with Corollary 4.3 and Lemma 4.2, one can also obtain statements such as:

Theorem 5.6   Let $\mu_0$ denote the universal CEM from Theorem 4.1. For $x \in B^*$,

\begin{displaymath}
K\mu_0(x) - O(1) \leq Km^E(x) \leq K\mu_0(x) + Km^E(K\mu_0(x)) + O(1).
\end{displaymath} (44)

While $P^E$ dominates $P^M$ and $P^G$ dominates $P^E$, the reverse statements are not true. In fact, given the results from Sections 3.2 and 5, one can now make claims such as the following:

Corollary 5.1   The following functions are unbounded:

\begin{displaymath}\frac{\mu^E(x)}{\mu^M(x)}; ~~
\frac{P^E(x)}{P^M(x)}; ~~
\frac{P^G(x)}{P^E(x)}.
\end{displaymath}

Proof. For the cases $\mu^E$ and $P^E$, apply Theorems 5.2 and 5.6 and the unboundedness of (12). For the case $P^G$, apply Theorems 3.3 and 5.3. $\Box$

