Reinforcement Learning Economies


Holland invented the first reinforcement learning (RL) economy (Proc. Intl. Conf. on Genetic Algorithms, Hillsdale, NJ, 1985). His "bucket brigade algorithm" for multiagent systems tries to solve a given complex task as follows. The world gives money to agents that happen to execute the final step of a solution to the problem. In any given round, agents can bid for and buy the right to act from other agents. By acting they may achieve desirable subgoals, setting the stage for subsequently active agents. They may then sell the right to act to the highest-bidding agents of the next round, hopefully making a profit. Bankrupt agents are removed and replaced by mutated ones endowed with an initial amount of money. The entire system learns in the sense that useful, profitable specialists for subtasks survive.
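The money flow described above can be sketched in a few lines. This is a minimal illustration, not Holland's original classifier system: the `Agent` class, the fixed-fraction bidding policy, and all parameter values are assumptions chosen for brevity.

```python
import random

class Agent:
    """An agent that bids a fixed fraction of its wealth (illustrative policy)."""
    def __init__(self, money=10.0, bid_fraction=None):
        self.money = money
        self.bid_fraction = bid_fraction if bid_fraction is not None else random.uniform(0.05, 0.5)

    def bid(self):
        return self.bid_fraction * self.money

def bucket_brigade_round(agents, external_reward=100.0, chain_length=5):
    """One episode: a chain of agents buys the right to act from one another.

    Each buyer pays its bid to the previous (selling) agent; the world pays
    the external reward to the agent executing the final step."""
    chain = []
    seller = None
    for _ in range(chain_length):
        # The highest bidder among agents not yet in the chain wins the right to act.
        candidates = [a for a in agents if a not in chain]
        winner = max(candidates, key=lambda a: a.bid())
        if seller is not None:
            payment = winner.bid()
            winner.money -= payment
            seller.money += payment   # the seller profits from setting the stage
        chain.append(winner)
        seller = winner
    chain[-1].money += external_reward  # the world pays the final step
    # Bankrupt agents are replaced by mutated copies with fresh initial capital.
    for i, a in enumerate(agents):
        if a.money <= 0:
            agents[i] = Agent(money=10.0,
                              bid_fraction=min(1.0, a.bid_fraction * random.uniform(0.8, 1.2)))
    return chain
```

Note that internal payments merely redistribute money among agents; only the external reward injects new money into the system.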
Holland's economy suffered from certain drawbacks, though. For instance, there is no credit conservation law: money can be created out of nothing. This was overcome in 1987: pages 23-51 of ref [1] are devoted to a reinforcement learning approach called "prototypical self-referential learning mechanisms" (PSALM 1 - PSALM 3). PSALMs use competing "metalearning" agents with actions for generating and connecting agents and for assigning credit to agents, subject to the constraint that total credit is conserved (except for external reward and consumption). Apparently this was the first credit-conserving RL economy.
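The conservation constraint can be illustrated with a toy ledger: internal credit assignments are pure transfers, so the total can change only through explicitly booked external reward (or consumption). The `Ledger` class and its method names are hypothetical, not taken from ref [1].

```python
class Ledger:
    """Credit-conserving accounts: internal transfers never create or destroy credit."""
    def __init__(self, balances):
        self.balances = dict(balances)
        self.external = 0.0  # net inflow from the world (reward minus consumption)

    def transfer(self, src, dst, amount):
        # Internal credit assignment: a zero-sum move between two accounts.
        if amount < 0 or self.balances[src] < amount:
            raise ValueError("insufficient credit")
        self.balances[src] -= amount
        self.balances[dst] += amount

    def external_reward(self, dst, amount):
        # Only the world may inject credit; the injection is booked explicitly.
        self.balances[dst] += amount
        self.external += amount

    def total(self):
        return sum(self.balances.values())
```

The invariant is that `total()` always equals the initial total plus `external`, no matter how many internal transfers occur.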

Refs [2,3] describe a related but less general credit-conserving RL economy of neurons. External reward pays the incoming weights of currently active output units. An active unit U's outgoing weights to other active units pay U's incoming weights (money = weight substance). Competition stems from partitioning the set of units into winner-take-all subsets. Apparently this was the second credit-conserving RL economy.
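One step of such a weight-substance economy can be sketched as follows. This is a simplified illustration assuming a proportional pay rate and omitting the winner-take-all competition; the function and parameter names are not from refs [2,3].

```python
def neural_bucket_brigade_step(W, active, output_units, reward, pay_rate=0.1):
    """One payment step of a weight-substance economy.

    W[i][j] is the weight ("money") on the connection from unit i to unit j.
    Each active unit's outgoing weights to other active units pay a fraction
    of their substance to that unit's incoming weights; external reward pays
    the incoming weights of active output units."""
    n = len(W)
    W = [row[:] for row in W]  # work on a copy
    for u in range(n):
        if not active[u]:
            continue
        partners = [v for v in range(n) if v != u and active[v]]
        if not partners:
            continue
        # u's outgoing weights to other active units pay a fraction of their substance...
        paid = 0.0
        for v in partners:
            amount = pay_rate * W[u][v]
            W[u][v] -= amount
            paid += amount
        # ...which is redistributed over u's incoming weights from active units.
        share = paid / len(partners)
        for s in partners:
            W[s][u] += share
    # External reward pays the incoming weights of active output units.
    rewarded = [o for o in output_units if active[o]]
    for o in rewarded:
        sources = [s for s in range(n) if s != o and active[s]]
        for s in sources:
            W[s][o] += (reward / len(rewarded)) / len(sources)
    return W
```

Because internal payments merely shift weight substance from outgoing to incoming connections, the total substance changes only by the external reward, mirroring the credit conservation law above.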

[1] J. Schmidhuber. Evolutionary principles in self-referential learning, or on learning how to learn: The meta-meta-... hook. Diploma thesis, Institut für Informatik, Technische Universität München, 1987.

[2] J. Schmidhuber. A local learning algorithm for dynamic feedforward and recurrent networks. Connection Science, 1(4):403-412, 1989. (The neural bucket brigade; figures omitted.)

[3] J. Schmidhuber. The neural bucket brigade. In R. Pfeifer, Z. Schreter, Z. Fogelman, and L. Steels, editors, Connectionism in Perspective, pages 439-446. Amsterdam: Elsevier, North-Holland, 1989.

[4] I. Kwee, M. Hutter, J. Schmidhuber. Market-based reinforcement learning in partially observable worlds. In G. Dorffner, H. Bischof, K. Hornik, editors, Proceedings of the Int. Conf. on Artificial Neural Networks ICANN'01, Vienna, LNCS 2130, pages 865-873. Springer, 2001.
