People Detection and Tracking

Open-source implementation of RGB-D-based people detection and tracking from small-footprint ground robots



Small-footprint mobile ground robots, such as the popular Turtlebot and Kobuki platforms, are by necessity equipped with sensors that lie close to the ground. Reliably detecting and tracking people from this viewpoint is a challenging problem, and solving it is a key requirement for many applications involving shared spaces and close human-robot interaction. Here you can find a robust solution for cluttered indoor environments, using an inexpensive RGB-D sensor such as the Microsoft Kinect or Asus Xtion. A real-time, ROS-enabled MATLAB implementation and evaluation datasets are available on this webpage.


Please refer to the following publications describing our system.

Kinect-based People Detection and Tracking from Small-Footprint Ground Robots
A. Pesenti Gritti, O. Tarabini, J. Guzzi, G. A. Di Caro, V. Caglioti, L. M. Gambardella, A. Giusti
In Proc. International Conference on Intelligent Robots and Systems (IROS) 2014. [preprint PDF]

  booktitle={Proc. International Conference on Intelligent Robots and Systems (IROS) 2014},
  title={Kinect-based People Detection and Tracking from Small-Footprint Ground Robots},
  author={Armando Pesenti Gritti and Oscar Tarabini and Jerome Guzzi and Gianni A. Di Caro and Vincenzo Caglioti and Luca M. Gambardella and Alessandro Giusti}

Video: Perceiving People from a Low-Lying Viewpoint
A. Pesenti Gritti, O. Tarabini, A. Giusti, J. Guzzi, G. A. Di Caro, V. Caglioti, L. M. Gambardella
In Proc. Human Robot Interaction (HRI) 2014. [1-page abstract PDF (preprint)]


  booktitle={Proc. Human Robot Interaction (HRI) 2014},
  title={Video: Perceiving People from a Low-lying Viewpoint},
  author={Armando Pesenti Gritti and Oscar Tarabini and Alessandro Giusti and Jerome Guzzi and Gianni A. Di Caro and Vincenzo Caglioti and Luca M. Gambardella}

Getting Started

The system is implemented in MATLAB, with the most computationally expensive tasks written as MEX functions that exploit multi-core CPUs through OpenMP.



The implementation has been tested under Mac OS X and Ubuntu Linux. To build and use the system, the following are required:


(Note: on some Linux distributions you may run into linking problems with libstdc++, which show up as an error message when running the code. In this case, force MATLAB to compile using the libstdc++ of your system and not its own version. One way to do so is to temporarily make the symbolic link in MATLABDIR/sys/os/ARCH/* point to the system libstdc++ (typically under /usr/lib/).)
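The symbolic-link workaround in the note above can be sketched as a small Python helper. This is only an illustration under assumptions: the function name repoint_symlink and the ".bak" backup suffix are hypothetical and not part of the distributed system, and the exact link and library paths depend on your MATLAB installation and Linux distribution, so they are left as parameters.

```python
import os
from pathlib import Path


def repoint_symlink(link_path, new_target):
    """Make link_path a symbolic link to new_target.

    The previous entry at link_path (hypothetically MATLAB's own libstdc++
    link) is kept next to it with a ".bak" suffix, so the change can be
    undone later by renaming the backup over the new link.
    """
    link = Path(link_path)
    target = Path(new_target)
    if not target.exists():
        raise FileNotFoundError(f"target library not found: {target}")

    backup = link.with_name(link.name + ".bak")
    if link.is_symlink() or link.exists():
        if backup.is_symlink() or backup.exists():
            raise FileExistsError(f"backup already exists: {backup}")
        # os.rename acts on the link itself, not on the file it points to.
        os.rename(link, backup)

    os.symlink(target, link)
    return backup
```

To revert the change, delete the new link and rename the ".bak" entry back to its original name. Writing inside the MATLAB installation directory usually requires administrator privileges.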

At this stage you can already use the system:

If you want to integrate the system in a ROS environment, you need to perform some additional steps. In this configuration the system reads sensor data from the topics "/camera/depth_registered/image_raw" and "/camera/rgb/image_raw", reads odometry data from the topic "/odom", and publishes the tracked people information on the topic "/people" with the custom message type "people_msg/People". The additional steps are:

(Note: the ROS MATLAB bridge used by our system needs an updated version of the "google-collect.jar" library. You must replace the file MATLABDIR/java/jarext/google-collect.jar with the file that can be downloaded here: rename it from "guava-13.0.1.jar" to "google-collect.jar" and copy it to the MATLABDIR/java/jarext/ directory.)
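The replacement described in the note above can be scripted. The following is a minimal sketch, not part of the distributed system: the function name install_guava_as_google_collect and the ".orig" backup suffix are assumptions made for illustration, and both the MATLAB directory and the location of the downloaded jar must be supplied by you.

```python
import shutil
from pathlib import Path


def install_guava_as_google_collect(matlab_dir, guava_jar):
    """Copy the downloaded guava jar over MATLAB's google-collect.jar.

    The original jar is saved once as google-collect.jar.orig (a
    hypothetical backup name) so that the change can be reverted.
    """
    jarext = Path(matlab_dir) / "java" / "jarext"
    source = Path(guava_jar)
    if not jarext.is_dir():
        raise NotADirectoryError(f"jarext directory not found: {jarext}")
    if not source.is_file():
        raise FileNotFoundError(f"downloaded jar not found: {source}")

    destination = jarext / "google-collect.jar"
    backup = jarext / "google-collect.jar.orig"
    # Keep the very first original; never overwrite an existing backup.
    if destination.exists() and not backup.exists():
        shutil.copy2(destination, backup)

    # Copying under the new name performs the rename and the copy in one step.
    shutil.copy2(source, destination)
    return destination
```

A call would pass your MATLAB installation directory as matlab_dir and the path of the downloaded "guava-13.0.1.jar" as guava_jar. MATLAB has to be restarted for the new library to be picked up.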


The files in the "examples" directory contain detailed explanations of how to use the system with the various source types (live, recorded videos, ROS). To obtain more information about a particular function, use the MATLAB command help.

Testing Datasets

Three testing datasets with associated ground truth are available as supplementary material, to promote quantitative comparisons with future systems.
The testing scenarios are the following:

We manually labelled every frame, marking each visible leg and indicating which person it belongs to. For each visible leg, the resulting ground truth contains the unique id of the person it belongs to and its x, y, z coordinates in the sensor reference system (in millimeters). The id assigned to a person remains the same throughout the whole video.
An example of CSV ground truth representation is the following:

Frame 1
Frame 2
Frame 3

In the first frame two legs of person 1 are visible; in the second frame no leg is in the field of view; in the third frame one leg of person 1 and one leg of person 2 are visible.
A ".mat" MATLAB file is also available, containing a structured version of the same data found in the CSV.
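As an illustration of how such ground truth could be loaded for a quantitative comparison, here is a minimal Python sketch. The exact CSV layout is not reproduced on this page, so the sketch assumes one line per frame, with four comma-separated values per visible leg (person id, x, y, z) and an empty line for a frame without legs; check this assumption against the downloaded files. The coordinates in SAMPLE are made up, and the names Leg and parse_ground_truth are hypothetical.

```python
from typing import List, NamedTuple


class Leg(NamedTuple):
    person_id: int
    x_mm: float
    y_mm: float
    z_mm: float


def parse_ground_truth(text: str) -> List[List[Leg]]:
    """Parse ground truth text into one list of legs per frame.

    Assumed layout (not confirmed by the source): one line per frame,
    four values per visible leg, empty line when no leg is visible.
    """
    frames = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = [f.strip() for f in line.split(",") if f.strip()]
        if len(fields) % 4 != 0:
            raise ValueError(f"line {line_no}: expected groups of 4 values")
        legs = [
            Leg(int(fields[i]), float(fields[i + 1]),
                float(fields[i + 2]), float(fields[i + 3]))
            for i in range(0, len(fields), 4)
        ]
        frames.append(legs)
    return frames


# Made-up coordinates mirroring the three-frame example described above.
SAMPLE = "1,-250,100,1800,1,-100,110,1850\n\n1,-240,100,1790,2,400,90,2500\n"

frames = parse_ground_truth(SAMPLE)
for number, legs in enumerate(frames, start=1):
    ids = sorted({leg.person_id for leg in legs})
    print(f"Frame {number}: {len(legs)} visible leg(s), person ids {ids}")
```

Keeping the person id on every leg makes it straightforward to group legs per person in each frame and to follow one id across the whole video.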

Scenario    | .oni video      | CSV ground truth | .mat ground truth
S-Easy      | S-Easy.oni      | S-Easy.csv       | S-Easy.mat
S-Medium    | S-Medium.oni    | S-Medium.csv     | S-Medium.mat
S-Difficult | S-Difficult.oni | S-Difficult.csv  | S-Difficult.mat

The following videos show the results obtained by the current version of the system:





This video shows the behaviour of our system deployed on the quadruped robot StarlETH, developed at the Autonomous Systems Lab, ETH Zürich.