
Adaptive Levin Search (ALS)

LS is not necessarily optimal for ``incremental'' learning problems, where experience with previous problems may help to reduce future search costs. To turn non-incremental LS into an incremental search method, we introduce a simple, heuristic, adaptive LS extension (ALS) that uses experience with previous problems to adaptively modify LS's underlying probability distribution. ALS essentially works as follows: whenever LS finds a program $q$ that computes a solution for the current problem, the probabilities of $q$'s instructions $q_1, q_2, \ldots, q_{l(q)}$ are increased (here $q_i \in \{b_1,\ldots,b_{n_{ops}}\}$ denotes $q$'s $i$-th instruction, and $l(q)$ denotes $q$'s length; if LS did not find a solution, then $q$ is the empty program and $l(q)$ is defined to be 0). This increases the probability of the entire program. The size of the adjustment is controlled by a learning rate $\gamma$ ($0 < \gamma < 1$). ALS is related to the linear reward-inaction algorithm, e.g., [Narendra and Thathachar, 1974; Kaelbling, 1993]; the main difference is that ALS uses LS to search through program space as opposed to single-action space. As in the previous section, the probability distribution $D_P$ is determined by $P$. Initially, all $P_{ij} = \frac{1}{n_{ops}}$. Given a sequence of problems $(N_1, N_2, \ldots, N_r)$, however, the $P_{ij}$ may undergo changes caused by ALS:

ALS(problems $(N_1, N_2, \ldots, N_r)$, variable matrix $P$)

  for $i$ := 1 to $r$ do:
    $q$ := Levin search($N_i$, $P$); Adapt($q$, $P$).

where the procedure Adapt works as follows:

Adapt(program $q$, variable matrix $P$)

  for $i$ := 1 to $l(q)$, $j$ := 1 to $n_{ops}$ do:
    if $q_i = b_j$ then $P_{ij} := P_{ij} + \gamma (1 - P_{ij})$
    else $P_{ij} := (1 - \gamma) P_{ij}$
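For example, with $n_{ops} = 4$ and $\gamma = 0.1$, a still-uniform row $(0.25, 0.25, 0.25, 0.25)$ whose position was occupied by $b_2$ in the solution becomes $(0.225, 0.325, 0.225, 0.225)$. Each updated row of $P$ remains normalized: the chosen entry becomes $(1-\gamma)P_{ij} + \gamma$ while every other entry in the row is scaled by $(1-\gamma)$, so the row sum stays $(1-\gamma) \cdot 1 + \gamma = 1$.

For concreteness, here is a minimal Python sketch of Adapt and the ALS loop. It is an illustration under assumptions, not the original implementation: levin_search stands in for the LS procedure of the previous section, and l_max (the number of rows of $P$, i.e., the maximum program length considered) is an assumed parameter.

  import numpy as np

  def adapt(q, P, gamma=0.05):
      # q: solving program as a list of instruction indices (q[i] = j
      #    means the i-th instruction is b_j); empty list if LS failed,
      #    in which case l(q) = 0 and nothing is updated.
      # P: l_max x n_ops matrix; row i is the distribution over the
      #    instruction at program position i.
      for i, j in enumerate(q):
          # Shift probability mass toward the instruction actually used:
          # P_ij := P_ij + gamma * (1 - P_ij)
          P[i, j] += gamma * (1.0 - P[i, j])
          # Scale the alternatives at position i down:
          # P_ik := (1 - gamma) * P_ik for all k != j
          for k in range(P.shape[1]):
              if k != j:
                  P[i, k] *= 1.0 - gamma

  def als(problems, levin_search, l_max, n_ops, gamma=0.05):
      # Initially all P_ij = 1 / n_ops (uniform distribution).
      P = np.full((l_max, n_ops), 1.0 / n_ops)
      for N in problems:
          q = levin_search(N, P)  # placeholder for LS; returns a solving
                                  # program, or [] if none was found
          adapt(q, P, gamma)
      return P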

