Chidansh Amitkumar Bhatt

Probabilistic temporal multimedia datamining

Supervisor(s) and Committee member(s): Mohan S. Kankanhalli (supervisor)


Advances in data acquisition and storage technology have led to the growth of very large multimedia databases. Analyzing this huge amount of multimedia data to discover useful knowledge is a challenging problem. This challenge has opened the opportunity for research in Multimedia Data Mining (MDM), the process of finding interesting patterns in media data such as audio, video, image and text that are not ordinarily accessible through basic queries and their associated results. The motivation for MDM is to use the discovered patterns to improve decision making. MDM has therefore attracted significant research effort in developing methods and tools to organize, manage, search and perform specific tasks on data from domains such as surveillance, meetings, broadcast television, sports, archives, movies, medical data, as well as personal and online media collections.

Existing MDM methods consider either low-level content features (e.g., color and texture) or high-level text meta-data features (e.g., objects and actions) for mining purposes. While the low-level features describe the actual content of the signal data, they are unable to provide high-level semantics of the mined data. Such high-level semantics are essential for applications like behavior analysis and semantic similarity. On the other hand, high-level text meta-data (e.g., tags and comments) are capable of providing semantic interpretation for mining, but they are noisy and require manual effort. Moreover, existing MDM techniques assume that the labels (e.g., concepts and events) obtained automatically from detectors are accurate. In reality, detectors label the events/concepts from different modalities with a certain confidence measure over a time interval. Therefore, it is important to consider the uncertainties associated with the detected concepts over time in the process of multimedia data mining.

This thesis proposes a framework for multimedia data mining which leverages the probabilistic, temporal and multimodal characteristics of multimedia data. The proposed Probabilistic Temporal Multimodal (PTM) data mining framework for multimedia applications effectively handles issues such as the incorporation of semantic knowledge, data sparsity in the semantic representation of multimedia data, the inaccuracy of binary concept detectors, and dynamic temporal correlation. The utility of the proposed framework is demonstrated in the following three multimedia applications:

  • Frequent event patterns for group meeting behavior analysis.
  • Concept-based near-duplicate video clip clustering for novelty re-ranking of web video search results.
  • Adaptive ontology rule-based classification for composite concept detection.

Towards the end of the thesis, we present our conclusions and future research directions.

Multimedia Analysis and Synthesis Lab


Our research philosophy is to do problem-driven research in multimedia systems that explores the development of novel algorithms and techniques. There are two main themes of our research work: content-based multimedia information processing and multimedia information security. Both are systems research areas which have fundamental conceptual issues arising out of real-world problems. Their flavor is therefore a blend of both basic and applied research.

The long-term goal of this research program is to develop fundamental techniques, algorithms and applications which can allow multimedia data to be utilized with as much ease as text data can be on today's computers. The medium-term aim of this research is to develop techniques for semantic content-based processing of image, video and audio, to provide intuitive access and retrieval to image, video and audio information, and to provide tools for processing, analyzing & synthesizing images, video and audio. In the short term, this research will have a major impact on many areas such as consumer electronics, web-based services, social media, video surveillance and media security & privacy. The long-term impact will be on the advancement of the state of understanding of sensory information processing.
