Human Interaction with Machine Vision Systems

Automated visual scene understanding remains a very hard problem in uncontrolled environments, such as security, surveillance, and disaster-response operations. For the foreseeable future, it is natural to expect that humans and computer vision systems will work together in applications like disaster response. This project explores the fundamental scientific challenges in coordinating humans and machine vision systems. We study two problems: training recognition systems from limited labeled data, and video summarization for efficient presentation of data to human analysts.


Active Learning

Theme 1

Most state-of-the-art approaches to human activity recognition in video require an intensive training stage and assume that all training examples are labeled and available beforehand; many also rely on hand-crafted features. These assumptions are unrealistic for applications that must deal with streaming videos, where newly observed activities can be leveraged to improve the current activity recognition models. In this project, we develop an incremental activity learning framework that continuously updates existing activity models and learns new ones as more videos are seen. Our approach builds on state-of-the-art machine learning tools, most notably active learning and deep learning, and does not require tedious manual labeling of every incoming example of each activity class.
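To make the idea concrete, below is a minimal sketch of an uncertainty-based active-learning loop over streaming clips. The sklearn classifier, the precomputed clip features, and the entropy threshold are illustrative assumptions, not the project's actual pipeline.

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    def entropy(p):
        """Shannon entropy of a class-probability vector (prediction uncertainty)."""
        p = np.clip(p, 1e-12, 1.0)
        return float(-np.sum(p * np.log(p)))

    # Incremental classifier over precomputed clip features (hypothetical setup:
    # 5 known activity classes, 64-dimensional features, random seed examples).
    classes = np.arange(5)
    model = SGDClassifier(loss="log_loss")
    rng = np.random.default_rng(0)
    model.partial_fit(rng.normal(size=(5, 64)), classes, classes=classes)

    def process_stream(clip_features, oracle, threshold=1.0):
        """Query the human annotator only for clips the model is uncertain about."""
        for x in clip_features:
            x = x.reshape(1, -1)
            probs = model.predict_proba(x)[0]
            if entropy(probs) > threshold:
                y = oracle(x)              # costly human label, requested sparingly
                model.partial_fit(x, [y])  # update the activity model incrementally
            # confident clips are skipped; they could also be used for self-training

Only the clips whose predictions carry high entropy reach the human, which is what keeps the labeling burden far below exhaustive annotation.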

Sample Publications


Camera Network Summarization

Theme 2
Figure: summarization over a camera network. Numbers above the frames (E1, …, E26) represent the event number, whereas numbers below (V1, …, V4) indicate the view from which each event is detected.

With the recent explosion of big video data, it is becoming increasingly important to automatically extract brief yet informative summaries of these videos in order to enable a more efficient and engaging viewing experience. Video summarization automates this process by providing a succinct representation of a video or a set of videos. The goal of this project is to summarize multiple videos captured by a network of surveillance cameras without requiring any prior knowledge of the cameras' fields of view. We formulate multi-view summarization as sparse representative selection over a learned subspace shared by the multiple videos.
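The exact optimization used in the project is not reproduced here; the following is a rough sketch of the sparse representative-selection idea, assuming frame features from all views have already been mapped into a shared d-dimensional subspace and stacked as columns of X. The row-sparse objective min_C 0.5·||X − XC||²_F + λ·||C||_{2,1}, the proximal-gradient solver, and the λ value are illustrative choices.

    import numpy as np

    def select_representatives(X, lam=1.0, n_iter=300):
        """Rank frames by representativeness via row-sparse self-expression.

        X : (d, n) array, n frames embedded in a shared d-dim subspace.
        Solves min_C 0.5*||X - X C||_F^2 + lam*||C||_{2,1}; frames whose
        rows of C have large l2-norm reconstruct the collection well.
        """
        d, n = X.shape
        C = np.zeros((n, n))
        lr = 1.0 / (np.linalg.norm(X, 2) ** 2)       # step size from Lipschitz bound
        for _ in range(n_iter):
            grad = X.T @ (X @ C - X)                 # gradient of the quadratic term
            Z = C - lr * grad
            norms = np.linalg.norm(Z, axis=1, keepdims=True)
            shrink = np.maximum(0.0, 1.0 - lr * lam / np.maximum(norms, 1e-12))
            C = shrink * Z                           # row-wise soft threshold (l2,1 prox)
        scores = np.linalg.norm(C, axis=1)
        return np.argsort(scores)[::-1], scores      # frames ranked by representativeness

Taking the top-k ranked frames across all views yields a single summary of the camera network, since the shared subspace lets frames from one view act as representatives for redundant content seen in the others.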

Sample Publications