
Primitive Learning Algorithms

All the module-modifying LIs below save the module columns they change on a stack. The first LEFT-modifying instruction executed after some EnableSSALEFT() instruction will start a new block of LEFT-modifications, to be ended by the next EnableSSALEFT(). Analogously for RIGHT.


IncProbLEFT($x_1,x_2,y_1$):

1. Call SSALEFT().

2. If there is not already some $(x,.)$ entry above the most recent (checkpoint, $RL$(checkpoint)) pair in Stack${\sc Left }$, then push the pair $(x, {\sc Left}_x)$ onto Stack${\sc Left }$.

3. Set ${\sc Left}_{x,k} \leftarrow \lambda {\sc Left}_{x,k}$ $\forall k \neq y_1$. Set ${\sc Left}_{x,y_1} \leftarrow 1 - \lambda (1 - {\sc Left}_{x,y_1})$.

Here $ 0 < \lambda < 1$ is a real-valued constant. In the experiments I arbitrarily use $\lambda = 0.3$. Note that step 3. includes renormalization.
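The renormalization claim can be checked directly. Assuming the entries of column $x$ sum to 1 before the update (the sum is written with a prime after the update; this notation is introduced here only for illustration):

$$\sum_k {\sc Left}'_{x,k} = \lambda \sum_{k \neq y_1} {\sc Left}_{x,k} + 1 - \lambda (1 - {\sc Left}_{x,y_1}) = \lambda (1 - {\sc Left}_{x,y_1}) + 1 - \lambda (1 - {\sc Left}_{x,y_1}) = 1.$$

For example, with $\lambda = 0.3$ an entry ${\sc Left}_{x,y_1} = 0.2$ becomes $1 - 0.3 \cdot 0.8 = 0.76$, while every other entry of the column shrinks to 0.3 times its previous value.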

IncProbLEFT has no effect, however, if the corresponding module modifications would push at least one LEFT value below MinProb, a small positive real value (I use MinProb $=0.004$).
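Steps 2 and 3, together with the MinProb rule, might be sketched in Python as follows. This is only an illustration under several assumptions that are not part of the original text: a module is stored as a dict mapping a column index to a list of probabilities, the stack is a plain list whose checkpoint entries are tuples beginning with the string "checkpoint", and a refused update pushes nothing. The name inc_prob and its parameters are hypothetical, and step 1 (the SSALEFT() call) is left out.

```python
LAMBDA = 0.3      # lambda as used in the experiments
MIN_PROB = 0.004  # MinProb as given in the text


def inc_prob(module, stack, x, y1, lam=LAMBDA, min_prob=MIN_PROB):
    """Hypothetical sketch of steps 2 and 3 of IncProbLEFT for one module.

    module: dict mapping a column index to a list of probabilities (assumed layout).
    stack:  list of tuples; ("checkpoint", ...) marks a checkpoint,
            (x, saved_column) holds a saved column (assumed layout).
    Returns True if the column was modified, False if the update was refused.
    """
    old = module[x]
    # Step 3, computed on a copy first so that the MinProb test can veto it.
    new = [lam * p for p in old]
    new[y1] = 1.0 - lam * (1.0 - old[y1])
    if min(new) < min_prob:
        return False  # no effect: some value would fall below MinProb

    # Step 2: save column x unless it was already saved above the
    # most recent checkpoint entry.
    already_saved = False
    for entry in reversed(stack):
        if entry[0] == "checkpoint":
            break
        if entry[0] == x:
            already_saved = True
            break
    if not already_saved:
        stack.append((x, list(old)))

    module[x] = new
    return True
```

Starting from a uniform column of four entries, three successive calls with the same arguments succeed and save the column only once; under the values above, a fourth call would be refused because the three shrinking entries would drop below MinProb.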

Comment: IncProbLEFT is an LI that permits LEFT to modify itself (provided RIGHT agrees). IncProbLEFT instructions may be used in conjunction with other instructions to form complex probabilistic learning algorithms (running between subsequent checkpoints).

DecProbLEFT($x_1,x_2,y_1$): like IncProbLEFT, but step 3. is different:

3. ${\sc Left}_{x,y_1} \leftarrow \lambda {\sc Left}_{x,y_1}$; $\forall k \neq y_1: {\sc Left}_{x,k} \leftarrow \frac{1 - \lambda {\sc Left}_{x,y_1}}{1 - {\sc Left}_{x,y_1}} {\sc Left}_{x,k}$.
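A minimal Python sketch of this step, assuming the intended reading is that entry $y_1$ is scaled by $\lambda$ and all other entries of the column are rescaled by a common factor so that the column still sums to 1. The function name dec_prob_column and the list representation of a column are assumptions made for illustration; the stack handling and the MinProb test are omitted here.

```python
def dec_prob_column(column, y1, lam=0.3):
    """Hypothetical sketch of step 3 of DecProbLEFT on a single column.

    column: list of probabilities summing to 1, with column[y1] < 1.
    Returns a new list; the input is not modified.
    """
    p = column[y1]
    # Common factor for all entries other than y1; it redistributes the
    # probability mass removed from entry y1.
    scale = (1.0 - lam * p) / (1.0 - p)
    new = [scale * q for q in column]
    new[y1] = lam * p
    return new
```

The division is safe whenever some other entry of the column is positive, which the MinProb rule would guarantee if it is enforced for every modification.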

MoveDistLEFT($x_1,x_2,y_1,y_2$): like IncProbLEFT, but step 3. is different:

3. ${\sc Left}_{x} \leftarrow {\sc Left}_{y}$.

IncProbRIGHT($x_1,x_2,y_1$): analogous to IncProbLEFT.

DecProbRIGHT($x_1,x_2,y_1$): analogous to DecProbLEFT.

MoveDistRIGHT($x_1,x_2,y_1,y_2$): analogous to MoveDistLEFT.

IncProbBOTH($x_1,x_2,y_1$): call IncProbLEFT and IncProbRIGHT in random order.

DecProbBOTH($x_1,x_2,y_1$): call DecProbLEFT and DecProbRIGHT in random order.

SSAandCopy($x_1$): If $x_1 \geq 5$ then exit (the value 5 is chosen arbitrarily). Set BlockSSALEFT and BlockSSARIGHT $=$ FALSE. Call SSALEFT() and SSARIGHT() in random order. Test if one of the modules has received more reward per time (since the most recent checkpoint still in its stack) than the other. If so:

Find those columns in the superior module that differ from the corresponding columns in the ``loser.'' (In my implementation this is done efficiently by using a separate stack and a marker array tracing module differences as they occur.) Push the loser's different columns onto the loser's stack (just like with IncProbLEFT and all other LIs). Then copy the winner's different columns onto the loser's.
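The copy step might look roughly like the Python sketch below. It makes several simplifying assumptions that are not taken from the original: modules are dicts mapping a column index to a list of probabilities, both modules have the same column indices, the reward-per-time values are passed in as plain numbers, and differing columns are found by a naive scan instead of the separate stack and marker array mentioned above. Every differing column is pushed unconditionally, so the "already saved in this block" test is skipped. All function and parameter names are hypothetical.

```python
def copy_winner_columns(winner, loser, loser_stack):
    """Save the loser's differing columns on its stack, then overwrite them.

    Returns the list of column indices that were copied.
    """
    changed = []
    for x in winner:
        if winner[x] != loser[x]:
            loser_stack.append((x, list(loser[x])))  # save the old column first
            loser[x] = list(winner[x])               # then copy the winner's column
            changed.append(x)
    return changed


def ssa_and_copy_step(left, right, left_stack, right_stack, left_rate, right_rate):
    """Copy from the module with the higher reward per time, if there is one."""
    if left_rate > right_rate:
        return copy_winner_columns(left, right, right_stack)
    if right_rate > left_rate:
        return copy_winner_columns(right, left, left_stack)
    return []  # neither module is superior: nothing is copied
```

Because the overwritten columns are saved on the loser's stack first, the copy is recorded like any other module modification.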

Comment: the LI SSAandCopy() allows for ending ``unfair'' matches when one module consistently outperforms the other. SSAandCopy will make both modules identical, although their stacks will in general be quite different and reflect quite different histories of successful module modifications.

Juergen Schmidhuber 2003-03-10
