One day there will be humanoid robots among us doing our boring, time-consuming, or dangerous tasks. They might cook a delicious meal for us or do the groceries. For this to become reality, many advances need to be made to the artificial intelligence of humanoid robots. The ever-increasing available computational processing power opens new doors for such advances. In this thesis we develop novel algorithms for humanoid control and vision that harness this power. We apply these methods on an iCub humanoid upper-body with 41 degrees of freedom.
For control, we develop Natural Gradient Inverse Kinematics (NGIK), a sampling-based optimiser that applies natural evolution strategies to perform inverse kinematics. The resulting algorithm makes very few assumptions and gives much more freedom in definable constraints than its Jacobian-based counterparts. A special graph-building procedure is introduced to build Task-Relevant Roadmaps (TRM) by iteratively applying NGIK and storing the results. TRMs form searchable graphs of kinematic configurations on which a wide range of task-relevant humanoid movements can be planned. Through coordinating several instances of NGIK, a fast parallelised version of the TRM building algorithm is developed. To contrast the offline TRM algorithms, we also develop Natural Gradient Control which directly uses the optimisation pass in NGIK as an online control signal.
For vision, we develop dynamic vision algorithms that form cyclic information flows that affect their own processing. Deep Attention Selective Networks (dasNet) implement feedback in convolutional neural networks through a gating mechanism that is steered by a policy. Through this feedback, dasNet can focus on different features in the image in light of previously gathered information and improve classification, with state-of-the-art results at the time of publication. Then, we develop PyraMiD-LSTM, which processes 3D volumetric data by employing a novel convolutional Long Short-Term Memory network (C-LSTM) to compute pyramidal contexts for every voxel, and combine them to perform segmentation. This resulted in state-of-the-art performance on a segmentation benchmark.
The work on control and vision is integrated into an application on the iCub robot. A Fast-Weight PyraMiD-LSTM is developed that dynamically generates weights for a C-LSTM layer given actions of the robot. An explorative policy using NGC generates a stream of data, which the Fast-Weight PyraMiD-LSTM has to predict. The resulting integrated system learns to model the effects of head and hand movements and their effects on future visual input. To our knowledge, this is the first effective visual prediction system on an iCub.