next up previous
Next: EIRA FOR ALS Up: Solving POMDPs with Levin Previous: LEVIN SEARCH (LS)


As mentioned above, LS is not necessarily optimal for ``incremental'' learning problems, where experience with previous problems may help to reduce future search costs. To make an incremental search method out of non-incremental LS, we introduce a simple, heuristic, adaptive LS extension (ALS) that uses experience with previous problems to adaptively modify LS's underlying probability distribution. ALS essentially works as follows: whenever LS finds a program $q$ that computes a solution for the current problem, the probabilities of $q$'s instructions $q_1, q_2, \ldots, q_{l(q)}$ are increased (here $q_i \in \{p_1,\ldots,p_r\}$ denotes $q$'s $i$-th instruction, and $l(q)$ denotes $q$'s length; if LS did not find a solution, $q$ is the empty program and $l(q)$ is defined to be 0). The probability adjustment is controlled by a learning rate $\gamma$ ($0 < \gamma < 1$). ALS is related to the linear reward-inaction algorithm (e.g., [Kaelbling, 1993]); the main difference is that ALS uses LS to search through program space rather than through a single-action space. As in section 2, the probability distribution $P_M$ is determined by the matrix $M$. Initially, all $M_{ij} = \frac{1}{r}$. Given a sequence of problems $(N_1, N_2, \ldots, N_k)$, however, the $M_{ij}$ may undergo changes caused by ALS:

ALS (problems $(N_1, N_2, ..., N_k)$, variable matrix $M$)

  for $i$ := 1 to $k$ do:
    $q$ := Levin search($N_i$, $M$); Adapt($q$, $M$).

where the procedure Adapt works as follows:

Adapt(program $q$, variable matrix $M$)

  for $i$ := 1 to $l(q)$, $j$ := 1 to $r$ do:
    if ($q_i = p_j$) then $M_{ij} := M_{ij} + \gamma (1 - M_{ij})$
    else $M_{ij} := (1 - \gamma) M_{ij}$
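
For concreteness, the Adapt update can be sketched in Python (an illustrative sketch, not the authors' implementation: instructions are encoded as indices 0, ..., r-1, and $M$ is stored as one row of instruction probabilities per program position):

```python
def make_matrix(max_len, r):
    """Initial probability matrix: M_ij = 1/r for every position i and
    instruction j, as in the uniform initialization above."""
    return [[1.0 / r] * r for _ in range(max_len)]

def adapt(q, M, gamma):
    """Adapt(q, M): for each position i of the solution program q, increase
    the probability of the instruction q_i used there and scale down the
    alternatives (linear reward-inaction update, 0 < gamma < 1).
    If q is the empty program (l(q) = 0), M is left unchanged."""
    for i, qi in enumerate(q):
        for j in range(len(M[i])):
            if j == qi:
                M[i][j] += gamma * (1.0 - M[i][j])   # reward used instruction
            else:
                M[i][j] *= 1.0 - gamma               # shrink the alternatives

# Example: r = 4 instructions, programs of length up to 3.
M = make_matrix(3, 4)
adapt([0, 2], M, gamma=0.1)   # solution program q = (p_1, p_3)
```

Note that each row of $M$ remains a probability distribution: the mass $\gamma (1 - M_{i q_i})$ added to the used instruction exactly equals the total mass removed from the other entries of row $i$.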

Critique of adaptive LS. Although ALS seems a reasonable first step towards making LS adaptive (and actually leads to very nice experimental results -- see section 5), there is no theoretical guarantee that it generates only probability modifications that speed up the search for solutions to new tasks: sometimes ALS may produce harmful rather than beneficial modifications. To address this issue, in the next section we augment ALS with a recent backtracking technique called ``Environment-Independent Reinforcement Acceleration'' (EIRA). EIRA ensures that the system keeps only those probability modifications representing a lifelong history of performance improvements.

Juergen Schmidhuber 2003-02-25
