To date, my research spans two broad areas: Privacy Protected Video Surveillance and Multimedia Archival & Retrieval Systems.


Privacy Protected Video Surveillance:

The recent widespread deployment and increasing sophistication of video surveillance systems have raised many concerns about their potential threat to individuals' right to privacy. The combination of ubiquitous cameras, wireless connectivity and powerful recognition algorithms makes it easier than ever to monitor every aspect of our daily activities. The goal of this project is to develop the technologies needed to protect the privacy of individuals without compromising the benefits of modern video surveillance.

The architecture of the privacy-protected surveillance system we are developing is shown below.


The pipeline consists of several units. It starts with an RFID system that detects the users whose privacy needs to be protected. Next, an object identification unit locates these users in the video using image processing techniques such as background subtraction followed by Kalman-filter-based object tracking. The next unit, the Object Obfuscation Unit, hides the identified objects using techniques ranging from simple black-boxing to complete object removal by video inpainting.
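For the tracking step, a minimal constant-velocity Kalman filter for a single image coordinate might look like the Python sketch below. It is illustrative only: the class name KalmanCV1D, the [position, velocity] state, the diagonal process noise q and the measurement noise r are assumptions rather than details of the system, background subtraction is not shown, and a 2-D tracker could run one such filter per axis.

```python
class KalmanCV1D:
    """Constant-velocity Kalman filter for one image axis (hypothetical name).

    State x = [position, velocity]; only the position is measured.
    """

    def __init__(self, x0: float, dt: float = 1.0, q: float = 1e-2, r: float = 1.0):
        self.x = [x0, 0.0]                   # state estimate
        self.P = [[1.0, 0.0], [0.0, 1.0]]    # state covariance
        self.dt, self.q, self.r = dt, q, r   # time step, process and measurement noise

    def predict(self) -> float:
        dt, P = self.dt, self.P
        # x <- F x with F = [[1, dt], [0, 1]]
        self.x = [self.x[0] + dt * self.x[1], self.x[1]]
        # P <- F P F^T + q * I (simplified diagonal process noise)
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1]
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        self.P = [[p00 + self.q, p01], [p10, P[1][1] + self.q]]
        return self.x[0]

    def update(self, z: float) -> float:
        P = self.P
        s = P[0][0] + self.r                 # innovation variance, H = [1, 0]
        k0, k1 = P[0][0] / s, P[1][0] / s    # Kalman gain
        y = z - self.x[0]                    # innovation (measured - predicted)
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        return self.x[0]
```

In a tracker, z would be the centroid of a foreground blob returned by background subtraction, and calling predict() alone can bridge frames in which the blob is missed.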

Up to this stage, the pipeline is, at a high level, the same as in most privacy protection systems. One drawback of these techniques, however, is that once the video has been modified for privacy protection, the original cannot be retrieved: it is lost completely. In video surveillance, the modification process needs to be authenticated, and the private information needs to be preserved as potential evidence. We therefore propose using data hiding to store the private video within the modified video itself, in a way that is imperceptible and keeps the stream codec-compliant.

In the video above, the top left shows the original surveillance video, the top right shows the extracted private object, and the bottom left shows the modified video. The private object is then compressed, encrypted and embedded into the compressed domain of the host video, shown in the bottom right.
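The payload preparation described above (compress, encrypt, then serialize for embedding) might be sketched as follows. This is an assumption-laden illustration, not the system's code: zlib stands in for the actual object codec, the SHA-256 counter-mode keystream is a toy placeholder for a real cipher, and the function names are hypothetical.

```python
import hashlib
import zlib
from typing import List


def _toy_keystream(key: bytes, n: int) -> bytes:
    # Toy SHA-256 counter-mode keystream. Placeholder only: a real system
    # should use a vetted cipher (e.g. AES-CTR from a crypto library).
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])


def prepare_payload(private_data: bytes, key: bytes) -> List[int]:
    """Compress, encrypt and serialize the private object into a bit list."""
    compressed = zlib.compress(private_data, 9)
    ks = _toy_keystream(key, len(compressed))
    encrypted = bytes(a ^ b for a, b in zip(compressed, ks))
    # MSB-first bit serialization, ready for the embedding stage.
    return [(byte >> (7 - i)) & 1 for byte in encrypted for i in range(8)]


def recover_payload(bits: List[int], key: bytes) -> bytes:
    """Inverse of prepare_payload: bits -> bytes -> decrypt -> decompress."""
    encrypted = bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )
    ks = _toy_keystream(key, len(encrypted))
    return zlib.decompress(bytes(a ^ b for a, b in zip(encrypted, ks)))
```

Compressing before encrypting matters here: encrypted data looks random and would not compress afterwards, so the order keeps the hidden payload, and hence the embedding cost, small.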

Privacy Preservation using Data Hiding

The main technical challenge is to embed a large amount of information without significantly distorting the host video or raising its bit rate. Embedding watermark bits in fixed high-frequency coefficients undermines the compression algorithm and sharply increases the output bit rate. To tackle this, I developed a new rate-distortion (R-D) based watermarking algorithm that dynamically chooses the parity embedding locations according to the characteristics of each DCT block. The proposed algorithm minimizes both the perceptual distortion and the output bit rate by exploiting the features of the compression technique used in the codec. It does so by analyzing the Lagrangian cost of embedding the watermark at various locations in the image, and it provides control over the trade-off between rate and distortion according to the requirements. The block diagram of the data hiding module, together with a regular DCT-based video codec, is shown below.


As shown in the block diagram above, data hiding is performed within the motion compensation loop of the codec, in the DCT domain of the residue frame. Embedding is done in two different ways. The first is irreversible parity embedding based on quantization index modulation (QIM), which offers high embedding capacity but introduces some distortion. The second is reversible embedding based on histogram shifting, which offers limited embedding capacity, but its data hiding distortion can be reversed at the frame level at the decoder. The R-D optimization framework outputs the best possible embedding locations for a given rate-distortion requirement.
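Both embedding modes can be illustrated on plain lists of numbers. The sketch below assumes that parity QIM means forcing the parity of each quantized level to the hidden bit, and it uses the textbook peak/zero-bin histogram shift for the reversible mode. It ignores the DCT, motion compensation and entropy coding, and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def qim_parity_embed(coeffs: Sequence[float], bits: Sequence[int], step: float) -> List[int]:
    """Irreversible mode: quantize, then force each level's parity to the bit."""
    levels = []
    for c, b in zip(coeffs, bits):
        q = round(c / step)
        if q % 2 != b:
            # Move to the neighbouring level on the side of the original value,
            # so the reconstruction error stays within one step.
            q = q + 1 if c / step > q else q - 1
        levels.append(q)
    return levels


def qim_parity_extract(levels: Sequence[int]) -> List[int]:
    return [q % 2 for q in levels]           # Python's % is non-negative here


def hs_embed(values: Sequence[int], bits: Sequence[int]) -> Tuple[List[int], int, int]:
    """Reversible mode: peak/zero-bin histogram shifting on integer values."""
    hist = Counter(values)
    peak = max(hist, key=lambda v: (hist[v], -v))   # most frequent value
    zero = peak + 1
    while hist[zero] > 0:                    # first empty bin above the peak
        zero += 1
    it = iter(bits)
    marked = []
    for v in values:
        if peak < v < zero:
            marked.append(v + 1)             # shift right to free bin peak + 1
        elif v == peak:
            marked.append(peak + next(it, 0))  # one bit per peak sample
        else:
            marked.append(v)
    return marked, peak, zero                # peak/zero are side information


def hs_extract(marked: Sequence[int], peak: int, zero: int,
               n_bits: int) -> Tuple[List[int], List[int]]:
    bits, restored = [], []
    for v in marked:
        if v in (peak, peak + 1):
            bits.append(v - peak)
            restored.append(peak)
        elif peak + 1 < v <= zero:
            restored.append(v - 1)           # undo the shift exactly
        else:
            restored.append(v)
    return bits[:n_bits], restored
```

The contrast in capacity is visible directly: the QIM sketch carries one bit per coefficient, while the histogram shift carries only as many bits as there are samples at the peak value, in exchange for restoring the original values exactly.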

The R-D framework uses Watson's perceptual model to quantify the embedding distortion, and the rate models are obtained using a simulated dummy entropy coder. Since an exact exhaustive optimization is impractical, the problem is first solved at the block level under some simplifying assumptions and then extended to the entire frame by treating it as a bit allocation problem, as shown above.
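The two-level structure can be sketched with a Lagrangian cost J = D + lam * R. In this hypothetical Python sketch, each candidate position in a block has a distortion value (standing in for a Watson-model score) and a rate value (standing in for the dummy entropy coder's estimate); both inputs, the function names, and the greedy frame-level solver are assumptions, not the framework's actual models.

```python
import heapq
from itertools import accumulate
from typing import List, Sequence


def block_cost_curve(dist: Sequence[float], rate: Sequence[float],
                     lam: float) -> List[float]:
    """J(n) for hiding n = 0..len(dist) bits in one block.

    Position i costs dist[i] + lam * rate[i]; the n cheapest positions are
    used, so J is a running sum of sorted costs and is therefore convex.
    """
    per_position = sorted(d + lam * r for d, r in zip(dist, rate))
    return [0.0] + list(accumulate(per_position))


def allocate_bits(curves: Sequence[Sequence[float]], total_bits: int) -> List[int]:
    """Greedy frame-level allocation: always add the cheapest next bit.

    For convex per-block curves, this marginal-cost greedy is optimal.
    """
    alloc = [0] * len(curves)
    heap = [(c[1] - c[0], b) for b, c in enumerate(curves) if len(c) > 1]
    heapq.heapify(heap)
    for _ in range(total_bits):
        if not heap:
            raise ValueError("payload exceeds the frame's embedding capacity")
        _, b = heapq.heappop(heap)
        alloc[b] += 1
        n = alloc[b]
        if n + 1 < len(curves[b]):           # block b can still take a bit
            heapq.heappush(heap, (curves[b][n + 1] - curves[b][n], b))
    return alloc
```

Setting lam = 0 ranks positions by distortion alone, while a large lam favours positions that add few bits, which mirrors the rate-distortion control described earlier.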

Results of the algorithm under varying rate-distortion parameters are shown below. From left to right in the figure, the bit rate decreases (i.e., compression efficiency improves) while the embedding distortion becomes more perceptible.

Privacy Management Architecture

As explained earlier and illustrated in the figure above, the privacy system offers different levels of security to different subjects in the video and outputs video to each client accordingly. In particular, the original video can be released to a client only after the subject has granted permission. The privacy information and other metadata are managed securely by an architecture of three software agents, shown below.


Video Similarity Search using Audio-Visual Features

During a summer project, I developed a video similarity search engine for a database of more than 3 million videos. Its main application is to detect copyrighted videos that have been uploaded to the database but cannot be found through traditional metadata such as textual tags. Other applications include removing redundant results for a search query and offering the user an alternate version of a video when the server hosting the exact video fails.

This software uses ordinal features as the visual feature and the power difference over consecutive frequency bands and time windows as the audio feature. The features of an entire video are summarized into compact signatures using a random-projection-based technique called VISIG. Each video yields about 100 such signatures per feature, which are stored in the corresponding feature database. Using the traditional inverted file concept and some post-processing, each query video is scored for similarity against the entire database. The chosen features and post-processing provide the required robustness against minor variations in the video such as compression, logo insertion, geometric distortion, changes in brightness, contrast and color, temporal reordering, deletion and insertion, and transition effects. Video sharing hosts can use this system to detect copyrighted videos, as shown below.

Apart from direct matching, the software also offers an offline video clustering step. Clustering is performed on the similarity scores using a minimum spanning forest, and clusters are obtained by thresholding edge density scores. This step improves recall for certain videos, because search results are drawn from the clusters as well as from direct matching.
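As a sketch of the offline clustering step, assume pairwise similarity scores in [0, 1] between videos. The Python code below builds a maximum spanning forest over similarity (Kruskal with union-find, equivalent to a minimum spanning forest over the distance 1 - s) and then cuts forest edges whose similarity falls below a threshold. This plain similarity cut is a stand-in: the edge-density scoring used in the actual system is not reproduced, and the function name and threshold are hypothetical.

```python
from typing import Dict, List, Tuple


def msf_clusters(edges: List[Tuple[float, str, str]], threshold: float) -> List[List[str]]:
    """edges: (similarity, video_a, video_b). Returns clusters as sorted lists."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    # Kruskal on descending similarity builds the spanning forest.
    forest = []
    for s, a, b in sorted(edges, reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            forest.append((s, a, b))

    # Cut weak forest edges, then read off the connected components.
    parent.clear()
    nodes = {v for _, a, b in edges for v in (a, b)}
    for v in nodes:
        find(v)
    for s, a, b in forest:
        if s >= threshold:
            parent[find(a)] = find(b)
    groups: Dict[str, List[str]] = {}
    for v in nodes:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())
```

Cutting the forest at a threshold yields the same clusters as thresholding the full similarity graph (single linkage), so the forest can be computed once offline and re-cut cheaply at different thresholds.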

The video search attained over 90% in both recall and precision, with query response times as low as 100 ms. The software is implemented in C++ and Perl, using the OpenCV and FFmpeg libraries for media handling. It is scalable in that videos can be processed on multiple machines, each with its own feature database; only the clustering database needs to be updated on a single server.
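The matching path described earlier (ordinal visual features, compact per-video signatures, and an inverted file for similarity scoring) can be illustrated in Python; this is not the C++/Perl implementation. Assumptions: each frame is a grayscale 2-D list, a signature is the rank pattern of 3x3 grid-cell mean intensities, and the similarity score is the fraction of query signatures found in a database video. The VISIG summarization, the audio features and the post-processing are not reproduced, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Signature = Tuple[int, ...]


def ordinal_signature(frame: Sequence[Sequence[float]], grid: int = 3) -> Signature:
    """Rank pattern of the mean intensities of a grid x grid partition."""
    h, w = len(frame), len(frame[0])
    means = []
    for gy in range(grid):
        rows = frame[gy * h // grid:(gy + 1) * h // grid]
        for gx in range(grid):
            cell = [v for row in rows for v in row[gx * w // grid:(gx + 1) * w // grid]]
            means.append(sum(cell) / len(cell))
    order = sorted(range(len(means)), key=means.__getitem__)
    ranks = [0] * len(means)
    for rank, cell_index in enumerate(order):
        ranks[cell_index] = rank
    return tuple(ranks)


class InvertedIndex:
    """Maps each signature to the set of database videos containing it."""

    def __init__(self) -> None:
        self.postings: Dict[Signature, Set[str]] = defaultdict(set)

    def add(self, video_id: str, signatures: Sequence[Signature]) -> None:
        for sig in set(signatures):
            self.postings[sig].add(video_id)

    def query(self, signatures: Sequence[Signature]) -> List[Tuple[float, str]]:
        """Return (score, video_id) pairs, best first; score = matched / queried."""
        query_sigs = set(signatures)
        votes: Dict[str, int] = defaultdict(int)
        for sig in query_sigs:
            for video_id in self.postings.get(sig, ()):
                votes[video_id] += 1
        return sorted(((n / len(query_sigs), vid) for vid, n in votes.items()),
                      reverse=True)
```

Because only the ranking of cell means is kept, any strictly increasing brightness or contrast change leaves the signature unchanged, which is one source of the robustness described above.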