16 June 2022
Human-Like Perception in Surveillance
For Mehul S Raval, Professor and Associate Dean - Experiential Learning at the School of Engineering and Applied Science, the big question is always how to integrate human-like perception and reasoning abilities into machines so that humans and machines can collaborate more effectively. One of his current areas of research is person retrieval in surveillance videos, which asks whether a machine, such as a CCTV system, can be taught to identify individuals solely from broad descriptions, or soft biometrics, given to it as natural language descriptors. In other words, the next time you describe a person as a tall, middle-aged woman wearing grey trousers and a red sweatshirt, will the CCTV system be able to understand the input and present you with a shortlist of probable matches?
“Well, one thinks about how agonising it can be to identify a lost child in a crowded area even if there are multiple CCTVs monitoring the space. Or can one lock in on a criminal based on the description by an eyewitness of a crime scene?” asks Professor Raval, speaking about his new paper and research on the topic. His paper, titled Person Retrieval in Surveillance Videos using Attribute Recognition, has been accepted for publication in Springer’s Journal of Ambient Intelligence and Humanized Computing.
He has also received a research grant as Principal Investigator for Semantic Person Retrieval in Surveillance (2022-2025) from the Gujarat Council on Science and Technology (GUJCOST). Unlike the existing approaches for identifying people, the proposed method uses a single deep network and fewer attributes to achieve state-of-the-art average results with reduced errors and quicker turnaround time.
The proposed approach uses five attributes: age, upper body (uBody) clothing colour, uBody clothing type, lower body (lBody) clothing colour, and lBody clothing type. Mask R-CNN is used for person detection, and the approach weighs each attribute to generate a ranking score for every detected person. “In Person Attribute Recognition (PAR), an individual is described by his or her appearance. PAR-based person retrieval is a crossmodal problem where the input is a textual description of the person’s appearance and the output is an image of the person. The human describable features are used to automatically retrieve the person from the recorded surveillance video,” he explains.
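The attribute weighting described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the weight values, the `Detection` class, the function names, and the sample probabilities are all hypothetical, and the sketch assumes that person detection and attribute classification have already produced per-attribute probabilities for each detected person.

```python
from dataclasses import dataclass

# Hypothetical weights for the five attributes named in the article.
# The article does not state the actual values used in the paper.
WEIGHTS = {
    "age": 0.10,
    "ubody_colour": 0.30,
    "ubody_type": 0.15,
    "lbody_colour": 0.30,
    "lbody_type": 0.15,
}


@dataclass
class Detection:
    """One detected person with per-attribute class probabilities."""
    person_id: str
    probs: dict  # attribute name -> {label: probability}


def ranking_score(det, query, weights=WEIGHTS):
    """Weighted sum of the probabilities assigned to the queried labels."""
    score = 0.0
    for attr, wanted in query.items():
        # A missing attribute or label contributes nothing to the score.
        score += weights[attr] * det.probs.get(attr, {}).get(wanted, 0.0)
    return score


def retrieve(detections, query, top_k=3):
    """Return the top_k detections, highest ranking score first."""
    ranked = sorted(detections, key=lambda d: ranking_score(d, query), reverse=True)
    return ranked[:top_k]


# Made-up example: the description from the article as a structured query.
query = {
    "age": "middle-aged",
    "ubody_colour": "red",
    "ubody_type": "sweatshirt",
    "lbody_colour": "grey",
    "lbody_type": "trousers",
}

detections = [
    Detection("p3", {"age": {"middle-aged": 0.2}, "ubody_colour": {"blue": 0.9}}),
    Detection("p2", {
        "age": {"middle-aged": 0.6},
        "ubody_colour": {"red": 0.1},
        "ubody_type": {"sweatshirt": 0.5},
        "lbody_colour": {"grey": 0.7},
        "lbody_type": {"trousers": 0.8},
    }),
    Detection("p1", {
        "age": {"middle-aged": 0.8},
        "ubody_colour": {"red": 0.9},
        "ubody_type": {"sweatshirt": 0.7},
        "lbody_colour": {"grey": 0.8},
        "lbody_type": {"trousers": 0.9},
    }),
]

shortlist = [d.person_id for d in retrieve(detections, query)]
```

In this sketch a higher weight on clothing colour means a confident colour match moves a person up the shortlist more than a confident age match would; whether the published method weights attributes this way is not stated in the article.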
Professor Raval co-authored the Springer paper with students Hiren Galiyawala, a PhD scholar at Ahmedabad University, and Meet Patel of LD College of Engineering. The research grant is likewise a collaborative effort between him and Co-Investigator Paawan Sharma, Associate Professor at the School of Technology, Pandit Deendayal Energy University.