Abstract

We present a combined machine learning and computer vision approach for robots to localize objects. It allows our iCub humanoid to quickly learn to provide accurate 3D position estimates (in the centimetre range) of the objects it sees. Biologically inspired methods, namely Artificial Neural Networks (ANN) and Genetic Programming (GP), are trained to provide these position estimates from the two camera images and the joint encoder readings. No camera calibration or explicit knowledge of the robot's kinematic model is needed. We find that ANN and GP are not only faster and of lower complexity than traditional techniques, but also learn without the need for extensive calibration procedures. Moreover, the approach robustly localizes objects placed at arbitrary positions in the robot's workspace, even while the robot is moving its torso, head, and eyes.
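To make the setup concrete, the following is a minimal illustrative sketch (not the paper's implementation): a small feed-forward network regresses a 3D object position directly from stereo pixel coordinates and joint encoder readings, with no camera calibration or kinematic model. The feature layout, network size, and synthetic training data are all assumptions made here for the example.

```python
# Minimal sketch of the learned stereo-plus-encoders -> 3D position mapping.
# All specifics (feature layout, network size, data) are illustrative assumptions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for collected training data:
# features = [u_left, v_left, u_right, v_right, six joint angles (torso/head/eyes)]
n_samples, n_features = 2000, 10
X = rng.uniform(-1.0, 1.0, size=(n_samples, n_features))

# Placeholder ground-truth 3D positions (metres); in practice these would come
# from objects placed at known positions in the robot's workspace.
W = rng.normal(size=(n_features, 3))
y = np.tanh(X @ W) * 0.3  # arbitrary smooth nonlinear mapping for the demo

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Small feed-forward ANN standing in for the trained position estimator.
net = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
net.fit(X_train, y_train)

# Report mean Euclidean error in centimetres on held-out samples.
pred = net.predict(X_test)
err_cm = np.linalg.norm(pred - y_test, axis=1).mean() * 100.0
print(f"mean 3D position error: {err_cm:.2f} cm")
```

Because the mapping is learned end to end from observed inputs to measured positions, any fixed camera or kinematic parameters are absorbed into the regressor's weights, which is why no explicit calibration step is required.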