Test problem 2: partially observable, multi-mode pole balancing
State of the environment:
- Observation:
- must be learned
- 1st second of episode (50 iterations): “mode of operation”
- mode A: action 1 is left, action 2 is right
- mode B: action 2 is left, action 1 is right
Requires a combination of continuous and discrete internal state, and the ability to remember the “mode of operation” indefinitely
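The mode-dependent action mapping described above can be sketched roughly as follows. This is only an illustration, not the original experimental code: the class name `ModeWrapper`, the signal encoding (+1 for mode A, -1 for mode B, 0 afterwards), and the assumption that the mode cue disappears after the first 50 iterations are all hypothetical. The pole dynamics themselves are omitted.

```python
import random

MODE_SIGNAL_STEPS = 50  # "1st second of episode (50 iterations)" from the text


class ModeWrapper:
    """Hypothetical helper: holds the hidden mode and maps actions to push directions."""

    def __init__(self, mode=None, rng=None):
        rng = rng or random.Random()
        # The mode is fixed for the whole episode; chosen at random if not given.
        self.mode = mode if mode is not None else rng.choice(["A", "B"])
        self.t = 0  # number of steps taken so far

    def mode_signal(self):
        # Assumed encoding: +1 for mode A, -1 for mode B, 0 once the cue is gone.
        if self.t < MODE_SIGNAL_STEPS:
            return 1 if self.mode == "A" else -1
        return 0

    def direction(self, action):
        # Returns -1 for a push to the left, +1 for a push to the right.
        if action not in (1, 2):
            raise ValueError("action must be 1 or 2")
        # Mode A: action 1 is left. Mode B: action 2 is left.
        left_action = 1 if self.mode == "A" else 2
        return -1 if action == left_action else 1

    def step(self, action):
        # Resolve the action under the current mode, then advance time.
        d = self.direction(action)
        self.t += 1
        return d
```

Under these assumptions, a controller that sees `mode_signal()` return 0 after step 50 can only pick the correct action if it has stored the earlier cue in its internal state, which is the memory requirement the task is meant to test.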