SSA: Exemplary Application
Stochastic self-modifying policy with actions that can edit the policy itself: metalearning
Stochastic multiagent environment
A priori unknown reward delays up to 300,000 time steps.
Back to J. Schmidhuber's Metalearning page