Physicists, economists, and other inductive scientists make predictions based on observations. So does everybody in daily life. Did you know that there is a theoretically optimal way of predicting? Every scientist should know about it.
Normally we do not know the true conditional probability distribution
p(next event | past). But suppose we do know that p lies in some set P of
distributions. Choose a fixed weight w_q for each q in P such that the
w_q sum to 1 (for simplicity, let P be countable). Then construct
the Bayesmix M(x) = Sum_q w_q q(x), and predict using M instead of the
optimal but unknown p.
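The mixture prediction scheme above can be sketched in a few lines. The concrete model class (Bernoulli distributions with fixed biases), the uniform prior weights, and the true bias 0.7 are illustrative assumptions of mine, not part of the text; the posterior weight update w_q <- w_q q(x)/M(x) is standard Bayes:

```python
import random

# P: a finite set (standing in for a countable one) of candidate
# distributions q, here Bernoulli models with fixed biases (an assumption).
biases = [0.1, 0.3, 0.5, 0.7, 0.9]
weights = [1.0 / len(biases)] * len(biases)  # fixed prior weights w_q, sum to 1

def mix_prob_next_one(weights, biases):
    """M's predictive probability that the next bit is 1:
    M(1 | past) = Sum_q w_q(past) * q(1), with w_q(past) the posterior."""
    return sum(w * b for w, b in zip(weights, biases))

def update(weights, biases, bit):
    """Bayes-update the weights after observing one bit:
    w_q <- w_q * q(bit) / M(bit)."""
    likes = [b if bit == 1 else 1.0 - b for b in biases]
    m = sum(w * l for w, l in zip(weights, likes))
    return [w * l / m for w, l in zip(weights, likes)]

# Simulate data from the true p (bias 0.7, which happens to lie in P).
random.seed(0)
for _ in range(1000):
    bit = 1 if random.random() < 0.7 else 0
    weights = update(weights, biases, bit)

# After enough observations, M's prediction approaches the true p's.
print(mix_prob_next_one(weights, biases))  # close to the true bias 0.7
```

The point of the sketch: the predictor never needs to identify p; it just mixes, and the posterior weights do the work.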
How wrong is it to do that? The recent exciting work of
Marcus Hutter
(funded through Juergen Schmidhuber's SNF research grant "Unification of Universal
Induction and Sequential Decision Theory") provides general and sharp
loss bounds:
Let LM(n) and Lp(n) be the total expected losses of the M-predictor and
the p-predictor, respectively, for the first n events. Then LM(n)-Lp(n)
is at most of the order of sqrt[Lp(n)]. That is, M is not much worse
than p. And in general, no other predictor can do better than that!
In particular, if p is deterministic, then the M-predictor will soon
stop making errors altogether!
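A toy illustration (my own construction, assuming P is a small finite set of deterministic sequences) of why the errors stop in the deterministic case: each wrong prediction at least halves M's probability of the true prefix, which can never fall below the true model's weight w_p, so the number of errors is at most log2(1/w_p):

```python
# Candidate deterministic "distributions" q: each maps a time step to a bit.
seqs = {
    "alt":    lambda n: n % 2,              # 0,1,0,1,...
    "zeros":  lambda n: 0,
    "ones":   lambda n: 1,
    "thirds": lambda n: 1 if n % 3 == 0 else 0,
}
w = {name: 1.0 / len(seqs) for name in seqs}  # prior weights, sum to 1

true_seq = seqs["thirds"]  # the unknown deterministic p, assumed to be in P
errors = 0
for n in range(100):
    # M's probability that the next bit is 1: total weight of models saying 1.
    p1 = sum(wq for name, wq in w.items() if seqs[name](n) == 1)
    guess = 1 if p1 >= 0.5 else 0
    truth = true_seq(n)
    errors += (guess != truth)
    # Bayes update: models that assigned probability 0 to the true bit die.
    for name in list(w):
        if seqs[name](n) != truth:
            w[name] = 0.0
    total = sum(w.values())
    w = {name: wq / total for name, wq in w.items()}

print(errors)  # finitely many errors; here at most log2(4) = 2
```

After the few inevitable early mistakes, all weight sits on the true sequence and every later prediction is correct.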
If P contains ALL computable distributions, then M becomes the celebrated
enumerable universal prior. That is, after decades of somewhat stagnating
research we now have sharp loss bounds for
Ray Solomonoff's
universal (but incomputable) induction scheme (1964,
1978).
Alternatively, reduce M to what you get by simply adding up the weighted
probability estimates of future finance data generated by 1000 commercial
stock-market prediction software packages. If just one of them works well
(but you do not know which), you should still get rich.
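The stock-package variant can be simulated directly. Everything concrete here is an assumption for illustration: 1000 "packages" that output a probability of the market going up, 999 of them uninformative, one tracking the true up-probability 0.8. The Bayes mixture quickly concentrates its weight on the one good package:

```python
import random

random.seed(1)
N = 1000  # number of prediction packages (an assumption)

def forecast(i, day):
    """Package i's forecast probability that tomorrow's price goes up.
    Package 0 knows the true probability; the rest guess at random."""
    return 0.8 if i == 0 else random.random()

w = [1.0 / N] * N  # prior weights over the packages
for day in range(200):
    up = 1 if random.random() < 0.8 else 0  # true market move, p(up) = 0.8
    f = [forecast(i, day) for i in range(N)]
    likes = [fi if up else 1.0 - fi for fi in f]
    m = sum(wi * li for wi, li in zip(w, likes))  # M's prob of what happened
    w = [wi * li / m for wi, li in zip(w, likes)]

print(w[0])  # the one good package ends up with nearly all the weight
```

Note that no single package ever has to be identified in advance; betting according to M is almost as good as betting according to the best package.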
Note that this approach is much more general than traditional statistical
learning theory, which usually rests on the quite unrealistic assumption
that the observations are statistically independent.