MARKET MODELS FOR MACHINE LEARNING -
REINFORCEMENT LEARNING ECONOMIES
Holland invented the first
reinforcement learning
(RL) economy (Proc. Intl. Conf. on
Genetic Algorithms, Hillsdale, NJ, 1985). His "bucket brigade
algorithm" for multiagent systems tries to
solve a given complex task as follows.
The world pays money to agents that happen to
execute the final step of a solution to the problem. In any given round,
agents can bid for and buy the right to act from other agents.
By acting they may achieve desirable subgoals,
setting the stage for subsequently active agents.
They may then sell the right to act to the
highest-bidding agents of the next round,
hopefully making a profit. Bankrupt agents are
removed and replaced by mutated ones endowed
with an initial amount of money. The entire
system learns in the sense that useful and profitable
specialists for subtasks survive.
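The round structure described above can be sketched in a few lines. The fixed agent chain, the 10% bidding rule, and the starting wealth are illustrative assumptions, not part of Holland's formulation:

```python
class Agent:
    def __init__(self, name, wealth=100.0):
        self.name = name
        self.wealth = wealth

    def bid(self):
        # Bid a fixed fraction of current wealth (hypothetical bidding rule).
        return 0.1 * self.wealth

def bucket_brigade_round(chain, external_reward):
    """One pass through a fixed chain of agents: each agent pays its bid
    to the agent that acted before it (the 'bucket brigade'), and the
    world pays the agent executing the final step."""
    previous = None
    for agent in chain:
        b = agent.bid()
        if previous is not None:
            agent.wealth -= b       # buyer pays ...
            previous.wealth += b    # ... the seller of the right to act
        previous = agent
    previous.wealth += external_reward  # reward for the final step

agents = [Agent(f"a{i}") for i in range(4)]
total_before = sum(a.wealth for a in agents)
bucket_brigade_round(agents, external_reward=5.0)
total_after = sum(a.wealth for a in agents)
# Internal bid transfers cancel out; only the reward changes the total.
assert abs(total_after - total_before - 5.0) < 1e-9
```

Note that the bid transfers themselves are conserving; the lack of conservation in Holland's full system stems from other mechanisms, as discussed next.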
Holland's economy, however, suffered from
certain drawbacks. For instance, there is no credit conservation
law: money can be generated out of nothing.
This was overcome in 1987:
Pages 23-51 of ref [1] are devoted to a reinforcement learning
approach called "prototypical self-referential learning
mechanisms" (PSALM 1 - PSALM 3).
PSALMs use competing "metalearning" agents
with actions for generating and connecting agents and for assigning credit to
agents, subject to the constraint that total credit is conserved
(except for external reward and consumption).
Apparently this was the first credit-conserving RL economy.
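The conservation constraint can be illustrated with a toy ledger. The class and method names here are hypothetical; PSALM's actual machinery for generating and connecting agents is not modeled:

```python
class Ledger:
    """Toy credit ledger enforcing conservation: transfers between agents
    sum to zero, and only external reward and consumption change the
    total, mirroring the constraint described above."""
    def __init__(self, agents):
        self.credit = {a: 0.0 for a in agents}

    def total(self):
        return sum(self.credit.values())

    def transfer(self, src, dst, amount):
        # An agent can never assign more credit than it owns.
        assert 0.0 <= amount <= self.credit[src], "no credit out of nothing"
        self.credit[src] -= amount
        self.credit[dst] += amount

    def reward(self, agent, amount):   # external reward enters the system
        self.credit[agent] += amount

    def consume(self, agent, amount):  # consumption leaves the system
        self.credit[agent] -= min(amount, self.credit[agent])

ledger = Ledger(["meta1", "meta2"])
ledger.reward("meta1", 10.0)
ledger.transfer("meta1", "meta2", 4.0)
assert ledger.total() == 10.0  # internal transfer conserved total credit
```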
Refs [2,3] describe a related but less general credit-conserving
RL economy of neurons. External reward
pays incoming weights of currently active output units. Active unit U's
outgoing weights to other active units pay to U's incoming weights (money
= weight substance). Competition stems from partitioning the set of units
into winner-take-all subsets.
Apparently this was the second credit-conserving RL economy.
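A minimal numerical sketch of the weight-substance idea, assuming a small fully connected net; the payment fraction and the equal split of each payment across incoming weights are my assumptions, not the precise rule of refs [2,3]:

```python
import numpy as np

# Money = weight substance: W[i, j] is the weight from unit i to unit j.
n = 4
rng = np.random.default_rng(0)
W = rng.uniform(0.1, 1.0, size=(n, n))
active = [0, 1, 3]          # indices of currently active units
output_units = [3]          # active output units
reward, rate = 1.0, 0.1

total_before = W.sum()

# Each active unit U's outgoing weights to other active units pay a
# fraction of their substance to U's incoming weights.
for u in active:
    for v in active:
        if v == u:
            continue
        payment = rate * W[u, v]
        W[u, v] -= payment            # outgoing weight pays ...
        W[:, u] += payment / n        # ... U's incoming weights share it

# External reward pays the incoming weights of active output units.
for o in output_units:
    W[:, o] += reward / n

# Internal transfers conserve total weight substance; only reward adds to it.
assert abs(W.sum() - total_before - reward) < 1e-9
```

Competition via winner-take-all subsets (which determines the `active` set) is not modeled here; the sketch only shows the conserving flow of weight substance.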
[4] I. Kwee, M. Hutter, J. Schmidhuber.
Market-Based Reinforcement Learning in Partially Observable Worlds.
In G. Dorffner, H. Bischof, K. Hornik, eds.,
Proceedings of the Int. Conf. on Artificial Neural Networks
ICANN'01, Vienna, LNCS 2130, pages 865-873. Springer, 2001.
[3] J. Schmidhuber.
A local learning algorithm for dynamic feedforward and
recurrent networks.
Connection Science, 1(4):403-412, 1989.
(The neural bucket brigade; figures omitted.)
[2] J. Schmidhuber.
The neural bucket brigade.
In R. Pfeifer, Z. Schreter,
F. Fogelman, and L. Steels, editors,
Connectionism in Perspective, pages 439-446. Elsevier,
North-Holland, Amsterdam, 1989.
[1] J. Schmidhuber.
Evolutionary principles in self-referential learning, or on learning
how to learn: The meta-meta-... hook. Diploma thesis,
Institut für Informatik, Technische Universität München, 1987.