In Search of Video Event Semantics
Supervisors and committee members: Arnold W.M. Smeulders (promotor), Cees G.M. Snoek (co-promotor).
In this thesis we aim to represent an event in a video using semantic features. We start from a bank of concept detectors for representing events in video.
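As an illustration of the concept-bank representation, the sketch below pools per-frame concept-detector scores into a single semantic vector per video by averaging. Average pooling is an assumption for illustration; the thesis explores richer variants of this representation.

```python
def video_representation(frame_scores):
    """Pool per-frame concept-detector scores into one semantic
    vector per video by averaging over frames.

    frame_scores: list of frames, each a list of detector scores
    (one score per concept in the bank)."""
    n_frames = len(frame_scores)
    n_concepts = len(frame_scores[0])
    return [sum(frame[c] for frame in frame_scores) / n_frames
            for c in range(n_concepts)]
```

The resulting vector has one entry per concept in the bank, so a video is described by what it depicts rather than by raw pixels.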
First, we consider the relevance of concepts to the event within the video representation. We address the problem of video event classification using a bank of concept detectors. Different from existing work, which simply relies on a bank containing all available detectors, we propose an algorithm that learns from examples which concepts in the bank are most informative per event.
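A minimal sketch of per-event concept selection, using a simple stand-in criterion (not the thesis algorithm): rank concepts by how strongly their mean detector score differs between positive and negative examples of the event, and keep the top-ranked ones.

```python
def select_informative_concepts(pos_videos, neg_videos, top_k=2):
    """Rank concepts by the gap between their mean detector score
    on positive vs. negative event examples; return the top_k
    concept indices. Each video is a vector of concept scores."""
    n_concepts = len(pos_videos[0])

    def mean_score(concept, videos):
        return sum(v[concept] for v in videos) / len(videos)

    gaps = [abs(mean_score(c, pos_videos) - mean_score(c, neg_videos))
            for c in range(n_concepts)]
    ranked = sorted(range(n_concepts), key=lambda c: -gaps[c])
    return ranked[:top_k]
```

Concepts whose responses do not separate the event from other videos are discarded, shrinking the bank to the informative subset per event.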
Second, we concentrate on the accuracy of concept detectors. Different from existing work, which obtains a semantic representation by training concepts over entire video clips, we propose an algorithm that learns a set of relevant frames as concept prototypes from web video examples, without the need for frame-level annotations, and uses them to represent an event video.
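The idea of finding relevant frames from video-level labels only can be sketched in a multiple-instance style (an illustrative simplification, not the thesis method): among all frames of videos tagged with a concept, keep those with the highest detector response as prototypes.

```python
def learn_concept_prototypes(tagged_videos, n_prototypes=2):
    """Pick prototype frames for a concept from videos that carry
    the concept only as a video-level tag (no frame annotations).

    tagged_videos: list of videos, each a list of
    (frame_id, detector_score) pairs."""
    # Pool all candidate frames across the tagged videos, then keep
    # the frames where the detector responds most strongly.
    candidates = [frame for video in tagged_videos for frame in video]
    candidates.sort(key=lambda pair: -pair[1])
    return [frame_id for frame_id, _ in candidates[:n_prototypes]]
```

The selected prototypes then stand in for the concept when representing a new event video, sidestepping the noise of training on entire clips.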
Third, we consider the problem of searching video events with concepts. We aim to query web videos for events using only a handful of video query examples, where the standard approach learns a ranker from hundreds of examples. We consider a semantic representation, built from off-the-shelf concept detectors, to capture the variance in the semantic appearance of events.
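Few-shot event search over a semantic representation can be sketched as follows (a stand-in for the thesis ranker, under the assumption that videos are concept-score vectors): average the handful of query examples into one semantic profile and rank the gallery by cosine similarity to it.

```python
import math

def rank_by_query_examples(queries, gallery):
    """Rank gallery videos by cosine similarity of their concept-score
    vector to the mean vector of a few query example videos.
    Returns gallery indices, best match first."""
    dim = len(queries[0])
    mean_query = [sum(q[i] for q in queries) / len(queries)
                  for i in range(dim)]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(x * x for x in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    return sorted(range(len(gallery)),
                  key=lambda i: -cosine(gallery[i], mean_query))
```

Because the matching happens in concept space rather than pixel space, a few examples suffice to describe what the event looks like semantically.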
Finally, we consider the problem of video event search without semantic concepts. The prevailing solutions in the literature rely on a semantic video representation obtained from thousands of pre-trained concept detectors. Different from these, we propose a new semantic video representation based solely on freely available socially tagged videos, without the need to train any intermediate concept detectors.
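A minimal sketch of a detector-free semantic representation, assuming only the social tags attached to videos are available (the vocabulary and overlap measure here are illustrative choices, not the thesis formulation):

```python
def tag_embedding(video_tags, vocabulary):
    """Represent a video directly by its social tags as a binary
    vector over a fixed tag vocabulary -- no detectors trained."""
    tagset = set(video_tags)
    return [1.0 if tag in tagset else 0.0 for tag in vocabulary]

def tag_similarity(tags_a, tags_b):
    """Jaccard overlap between two videos' tag sets, usable as a
    semantic match score for event search."""
    a, b = set(tags_a), set(tags_b)
    return len(a & b) / len(a | b) if a | b else 0.0
```

The semantics come for free from the tags users already assign, so the expensive intermediate step of training thousands of concept detectors disappears.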
Intelligent Sensory Information Systems group
The world is full of digital images and videos. In this deluge of visual information, the grand challenge is to unlock its content. This quest is the central research aim of the Intelligent Sensory Information Systems group. We address the complete knowledge chain of image and video retrieval by machine and human. Topics of study are semantic understanding, image and video mining, interactive picture analytics, and scalability. Our research strives for automation that matches human visual cognition, interaction surpassing man and machine intelligence, visualization blending it all into interfaces that give instant insight, and database architectures for extreme-sized visual collections. Our research culminates in state-of-the-art image and video search engines, which we evaluate in leading benchmarks, often as the best performer, in user studies, and in challenging applications.