Developmental Robotics and Artificial Curiosity

Developmental Robotics. A developmental program is embodied in a robot, enabling it to learn by interacting with its environment. Developmental programs should be simple, general, and potentially powerful, driven by the (formally verified) insight that a simpler solution to a problem is more likely to generalize well to slightly different problems than a complex one. Instead of programming every minute detail, some (ideally) basic (but, in practice, much more) structure is provided, which the robot can improve upon by learning. Very complex (and very impressive) robot demonstrations can break when the environment changes just a little. More intelligent robots could adapt to a wide range of environments by ``absorbing structure'' from their interactions with their environments.

Artificial Curiosity. Developmental algorithms must be capable of compressing massive amounts of sensory and motor data into a useful form. The robot collects these data incrementally, often through its own actions. It is useful for the robot to have a drive to find the best data for its developmental program, i.e., the data that let it learn fastest. The robot should be curious. Curiosity-driven agents use reinforcement learning to quickly adapt control policies to maximize intrinsic reward, defined as a measurable improvement of the compressor, predictor, or world model (A brief review of artificial curiosity).
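A minimal sketch of this notion of intrinsic reward, assuming an invented toy world and a one-parameter linear predictor as the ``world model'': the reward for each observation is how much the predictor improves by training on it, which is large for novel data and shrinks as the model masters its world.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy world: the next observation is a noisy linear
# function of the current one, with true coefficient 0.8.
true_w = 0.8

def step(obs):
    return true_w * obs + rng.normal(scale=0.05)

w_hat = 0.0   # the agent's current world-model parameter
lr = 0.1      # learning rate
obs = 1.0

intrinsic_rewards = []
for t in range(50):
    nxt = step(obs)
    err_before = (nxt - w_hat * obs) ** 2    # prediction error before learning
    w_hat += lr * (nxt - w_hat * obs) * obs  # one gradient step on the model
    err_after = (nxt - w_hat * obs) ** 2     # prediction error after learning
    # Intrinsic reward = measurable improvement of the predictor on this data.
    intrinsic_rewards.append(err_before - err_after)
    obs = nxt

# The intrinsic reward decays as the world model converges: early
# observations teach the model a lot, later ones teach it little.
print(intrinsic_rewards[0] > intrinsic_rewards[-1])
```

A curious agent would choose actions that steer it toward observations with high expected improvement, rather than toward unpredictable noise (which yields error but no improvement).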


Our Katana robot arm curiously plays with wooden blocks, using vision, reaching, and grasping. It is intrinsically motivated to explore its world. As a by-product, it learns how to place blocks stably, and how to stack blocks.

The Upper Confidence Weighted Learning (UCWL) framework computes intrinsic rewards by estimating confidence intervals for the agent's predictions, which allows for efficient exploration in human-robot interaction scenarios with binary feedback.
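The UCWL details are not spelled out here, so the following is only a hedged sketch of the underlying idea: with binary feedback, the uncertainty of a success-probability estimate can be summarized by a normal-approximation confidence interval, and the interval's width can serve as the intrinsic signal that directs exploration. The gesture names and counts below are invented for illustration.

```python
import math

def interval_width(successes, trials, z=1.96):
    """Approximate 95% confidence-interval width for a success probability."""
    if trials == 0:
        return 1.0  # maximal uncertainty before any feedback
    p = successes / trials
    return 2 * z * math.sqrt(p * (1 - p) / trials + 1e-9)

# (successes, trials) of binary human feedback per candidate behavior.
counts = {"wave": (3, 10), "point": (0, 0), "nod": (45, 50)}

# The agent probes the behavior whose outcome it is least certain about:
# a wide interval means an uncertain prediction, hence high intrinsic reward.
best = max(counts, key=lambda k: interval_width(*counts[k]))
print(best)  # the untried behavior has the widest interval
```

As feedback accumulates, intervals shrink and the exploration bonus fades, so behavior gradually shifts from probing to exploiting what was learned.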

Curiosity-Driven Modular Incremental Slow Feature Analysis (CD-MISFA) is a hierarchical curiosity-driven learning system that learns multiple abstract slow-feature-based representations, in order of increasing complexity, from a robot's high-dimensional visual input stream. These abstractions encode raw pixel inputs in a form useful for the robot to learn skills, in a manner inspired by continual learning.
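The slow-feature principle at the core of this system can be illustrated with a minimal batch linear sketch (the incremental, modular, curiosity-driven machinery of CD-MISFA is omitted, and the two-dimensional signal below is invented): extract the projection of the input whose value changes most slowly over time.

```python
import numpy as np

# Invented signal: a slow and a fast latent source, linearly mixed.
t = np.linspace(0, 4 * np.pi, 500)
slow = np.sin(t)        # slowly varying latent source
fast = np.sin(20 * t)   # quickly varying latent source
X = np.column_stack([slow + 0.5 * fast, 0.5 * slow - fast])

# Center and whiten so all input directions have unit variance.
X = X - X.mean(axis=0)
cov = X.T @ X / len(X)
evals, evecs = np.linalg.eigh(cov)
W = evecs @ np.diag(evals ** -0.5) @ evecs.T
Z = X @ W

# The slowest feature is the minor eigenvector of the covariance of
# the temporal differences of the whitened signal.
dZ = np.diff(Z, axis=0)
dcov = dZ.T @ dZ / len(dZ)
dvals, dvecs = np.linalg.eigh(dcov)
y = Z @ dvecs[:, 0]  # eigh sorts ascending: column 0 = slowest direction

# The extracted feature should recover the slow source (up to sign).
corr = abs(np.corrcoef(y, slow)[0, 1])
print(corr > 0.9)
```

Sorting the difference-covariance eigenvectors by eigenvalue yields features ordered from slowest to fastest, which is how a hierarchy of representations of increasing temporal complexity can be extracted.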

Planning to achieve intrinsic reward with ``perceptual'' and ``cognitive'' learning. While external reward signals are typically stationary, intrinsic signals based on artificial curiosity change rapidly. Model-based RL using intrinsic rewards adapts the state values quickly enough to respond effectively to an ever-changing intrinsic reward landscape.
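Why model-based RL helps here can be shown with a deliberately tiny sketch (the two-state chain, transition probabilities, and reward vectors are invented): because the transition model is known, a fresh plan can be computed immediately whenever the intrinsic reward changes, instead of waiting for slow model-free value updates.

```python
import numpy as np

# P[a, s, s']: known transition model of a 2-state world with 2 actions.
P = np.array([[[0.9, 0.1],   # action 0: tend to stay
               [0.1, 0.9]],
              [[0.1, 0.9],   # action 1: tend to switch
               [0.9, 0.1]]])
gamma = 0.9

def plan(r):
    """Value iteration against the known model P for state-reward vector r."""
    v = np.zeros(2)
    for _ in range(200):
        v = (r + gamma * P @ v).max(axis=0)
    return (r + gamma * P @ v).argmax(axis=0)  # greedy action per state

# Early on, state 1 is novel and carries intrinsic reward, so the plan
# steers toward it; once learned, the bonus vanishes and the plan flips.
print(plan(np.array([0.0, 1.0])).tolist())
print(plan(np.array([1.0, 0.0])).tolist())
```

Re-running `plan` is cheap relative to re-experiencing the world, which is what lets state values keep pace with a rapidly shifting curiosity signal.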

Neuroevolution is combined with an unsupervised sensory pre-processor or compressor that is trained on images generated from the environment by the population of evolving recurrent neural network controllers. The compressor not only reduces the input dimension for the controllers, but also biases the search toward novelty by rewarding controllers that discover images it reconstructs poorly.
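The novelty signal itself is easy to sketch. Below, a linear PCA compressor stands in for the unsupervised pre-processor (an assumption for illustration; the data are synthetic): images from the familiar distribution reconstruct well, while images from an unexplored part of the environment reconstruct poorly and thus earn a larger bonus for the controller that produced them.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic "familiar" images: 64-pixel vectors lying near a 16-dim subspace.
familiar = rng.normal(size=(200, 16)) @ rng.normal(size=(16, 64)) * 0.1

# Fit a low-dimensional linear compressor (PCA) on previously seen images.
mean = familiar.mean(axis=0)
U, S, Vt = np.linalg.svd(familiar - mean, full_matrices=False)
codebook = Vt[:8]  # keep the 8 leading principal components

def novelty(img):
    """Reconstruction error of an image under the current compressor."""
    code = (img - mean) @ codebook.T  # compress to 8 numbers
    recon = code @ codebook + mean    # decompress
    return float(np.sum((img - recon) ** 2))

seen = familiar[0]           # image from the training distribution
novel = rng.normal(size=64)  # image unlike anything compressed before
print(novelty(seen) < novelty(novel))  # novel images score a larger bonus
```

Because the compressor keeps training on whatever the population discovers, yesterday's novelty fades, continually pushing the evolutionary search toward genuinely new sensory experiences.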