RoboComp Logo

A simple robotics framework.

GSoC’20 RoboComp project: Human recognition (identification) using multi-modal perception system

4th June 2020

About me

I am Shubh Maheshwari, in my final year at IIIT-H. I am intrested in Machine Learning, Computer Vision, Graphics. I am doing honors under Prof. C.V. Jawahar. I have also been teaching assistant for SMAI (Statistical Methods In AI) and Optimization Methods.

I created a tour guide robot for our lab. The agent was able to track, tell jokes and interact with users. Currently, I am working on skeleton-based action recognition & synthesis.

About the Project

In Robotics, it is crucial to identify humans and efficiently distinguish them. This gives the ability to perform further challenging tasks such as personalized interactions, social navigation and surveillance. Currently, Robocomp only uses face recognition for human identification. The aim of this project is to integrate other modalities, such as rgb image, skeleton-based pose, and silhouette of the body. Using these modalities, different human identification components (such as gait recognition & person re-identification) will be evaluated. Finally a multi-modal pipeline will be created to integrate these components into Robocomp.


Fig-1. Pipeline for human identification. (a) Detect humans and assign them a tracking id. (b) Segment human and extract gait features (c) Detect face and extract features (d) Combine features & search identity from the database.


As shown in figure-1, 3 seperate techniques capture different modalites of the input video.

a) Multi-Person Tracking Component

This component detects persons at every rgb frames. Then it associates them with humans detected in previous frames. Thus, it tracks humans present and assign them a tracking id(eg. XXXXX). This is also essential for short-term human identification.

b) Gait Recognition Component

Gait analysis is a soft biometric for long-term human identification. The segmentation mask for every human is extracted and stored across several frames. Based on the walking pattern, a deep learning model extractes features for recognition.

c) Face Recognition Component

Face recognition is the most widely used method for human-identication. First step is to detect the face followed by extracting features using a deep learning model.

d) Multi Modal Human Identification Component

This component will make a call to different components, combine their features and search the database/gallery to identify the person. If no person is identified, the data is added to the gallery as an unknown person.

Back To The Top

Shubh Maheshwari