Hung Ngo

Deep Learning of Behavioral Hierarchy for Autonomous Visual Navigation

DeepSkill Demo

Throughout their lifetimes, humans never stop learning new, increasingly complex skills, continually refining and building upon prior abilities. One of the ultimate goals of Artificial Intelligence is thus to build adaptive agents, including robots, that can learn skills in a similar way through autonomous lifelong activity. An intelligent agent should be able to progressively acquire a variety of skills in its complex environment. This raises at least two important issues: how to equip an agent with intrinsic motivation (a curiosity drive) to efficiently explore its environment and learn how the world works, and how to autonomously construct, retain, refine, and master novel, complex skills.

We develop a novel deep, sparse, unsupervised learning framework that progressively constructs a behavioral hierarchy from the sensorimotor experience of a curious agent. As in existing work on deep Reinforcement Learning (RL), we combine deep learning on sensory data for basis construction with reinforcement learning for control. The novelty of our approach lies in its mechanism for self-generating goals, which exploits the structure and regularities of the environment.

In the current implementation, a sparse coding layer is stacked on top of a deep, unsupervised feature learning module (whose outputs also serve as the bases for our RL modules) to produce a set of sparse coding features (SCFs). The key insight is that each SCF is highly selective to the input, encoding a particular structure or regularity of the input space that we can exploit. We use the response of each SCF as an intrinsic reward signal for the associated RL module, called a curion. Each curion thus learns to navigate in the representation space (and therefore also in the physical environment) so as to reach a particular "landmark" encoded by the maximum activation of the associated SCF. All the curions learn simultaneously from a single curious exploration behavior, thanks to recent convergent off-policy TD-learning methods.
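
The learning scheme described above can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes a toy 1-D corridor with one-hot position features in place of the learned deep features, binary stand-ins for the SCF responses, uniformly random exploration in place of the curiosity-driven behavior, and plain linear Q-learning as a simple off-policy stand-in for the convergent off-policy TD methods mentioned in the text. All function names and parameters (`update_curions`, `train_demo`, `alpha`, `gamma`) are hypothetical.

```python
import random

def one_hot(index, size):
    """Toy feature vector; stands in for the learned deep features."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def q_values(weights, features):
    """Linear action values: one dot product per action."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]

def update_curions(curions, features, action, scf_responses, next_features,
                   alpha=0.1, gamma=0.9):
    """Update every curion from one shared transition.

    curions[k][a] is the weight vector of curion k for action a.
    scf_responses[k] is the SCF activation after the transition and is
    used directly as the intrinsic reward of curion k.
    """
    for weights, reward in zip(curions, scf_responses):
        q_now = q_values(weights, features)[action]
        # Greedy bootstrap target: independent of the exploring behavior,
        # which is what makes the update off-policy.
        q_next = max(q_values(weights, next_features))
        td_error = reward + gamma * q_next - q_now
        weights[action] = [w + alpha * td_error * f
                           for w, f in zip(weights[action], features)]

def train_demo(steps=5000, n_pos=5, seed=0):
    """Train two curions in a corridor; landmarks are the two ends."""
    rng = random.Random(seed)
    landmarks = [0, n_pos - 1]
    # Two actions per curion: 0 = move left, 1 = move right.
    curions = [[[0.0] * n_pos for _ in range(2)] for _ in landmarks]
    pos = n_pos // 2
    for _ in range(steps):
        action = rng.randrange(2)  # assumption: random exploration
        step = 1 if action == 1 else -1
        nxt = min(n_pos - 1, max(0, pos + step))
        # Assumption: each SCF responds only at its own landmark.
        scf = [1.0 if nxt == lm else 0.0 for lm in landmarks]
        update_curions(curions, one_hot(pos, n_pos), action, scf,
                       one_hot(nxt, n_pos))
        pos = nxt
    return curions

curions = train_demo()
middle = one_hot(2, 5)
q_left_landmark = q_values(curions[0], middle)
q_right_landmark = q_values(curions[1], middle)
```

In this sketch a single stream of random transitions trains both curions at once. From the middle of the corridor, the curion whose landmark is the left end comes to prefer moving left, and the other prefers moving right.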

This video demonstrates our initial results on an autonomous visual navigation task: the agent exhibits a set of skills for reaching specific locations, each encoded by a specific controller, after an initial phase of exploration (for representation learning and autonomous skill discovery, not shown). The bottom image in the video shows the responses of all 32 curions tiled in a rectangle, where a redder color corresponds to a higher SCF activation and hence a higher probability that the agent is at the location encoded by the associated SCF. Note that all the curions use the same feature bases, learned by a deep temporal coherence network after the exploration phase.
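
As a rough illustration of how such a display can be read, the sketch below tiles 32 responses into a grid and locates the strongest one. The 4 x 8 layout, the example response values, and the helper names `tile_responses` and `most_active` are assumptions made for illustration; the text states only that the responses are tiled in a rectangle.

```python
def tile_responses(responses, n_cols=8):
    """Arrange a flat list of SCF responses into rows of n_cols values."""
    if len(responses) % n_cols != 0:
        raise ValueError("number of responses must be a multiple of n_cols")
    return [responses[i:i + n_cols] for i in range(0, len(responses), n_cols)]

def most_active(responses):
    """Index of the strongest SCF response, i.e. the most likely landmark."""
    return max(range(len(responses)), key=lambda k: responses[k])

# Hypothetical responses: curion 13 is strongly active, curion 5 weakly.
responses = [0.0] * 32
responses[13] = 0.9
responses[5] = 0.2

grid = tile_responses(responses)
best = most_active(responses)
row, col = divmod(best, 8)  # tile position of the "reddest" cell
```

Here the strongest response sits in the second row of the grid, which under the stated assumptions would mark the landmark the agent is most likely at.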

Last modified Feb. 27, 2014.