Geometry-aware analysis of high-dimensional visual information set
Supervisor(s) and Committee member(s): Pascal Frossard (supervisor)
Over the past few decades we have been experiencing a data explosion; massive amounts of data are increasingly collected and multimedia databases, such as YouTube and Flickr, are rapidly expanding. At the same time rapid technological advancements in mobile devices and vision sensors have led to the emergence of novel multimedia mining architectures. These produce even more multimedia data, which are possibly captured under geometric transformations and need to be efficiently stored and analyzed. It is also common in such systems that data are collected distributively. This very fact poses great challenges in the design of effective methods for analysis and knowledge discovery from multimedia data. In this thesis, we study various instances of the problem of classification of visual data under the view-point of modern challenges. Roughly speaking, classification corresponds to the problem of categorizing an observed object to a particular class (or category), based on previously seen examples. We address important issues related to classification, namely flexible data representation for joint coding and classification, robust classification in the case of large geometric transformations and classification with multiple object observations in both centralized and distributed settings.
We start by identifying the need for flexible data representation methods that are efficient in both storage and classification of multimedia data. Such flexible schemes offer the potential to significantly impact the efficiency of current multimedia mining systems, as they permit the classification of multimedia patterns directly in their compressed form, without the need for decompression. We propose a framework, called semantic approximation, which is based on sparse data representations. It formulates dimensionality reduction as a matrix factorization problem, under hybrid criteria that are posed as a trade-off between approximation for efficient compression and discrimination for effective classification. We demonstrate that the proposed methodology competes with state-of-the-art solutions in image classification and face recognition, implying that compression and classification can be performed jointly without performance penalties with respect to expensive disjoint solutions.
Next, we allow the multimedia patterns to be geometrically transformed and we focus on transformation invariance issues in pattern classification. When a pattern is transformed, it spans a manifold in a high dimensional space. We focus on the problem of computing the distance of a certain test pattern from the manifold, which is also closely related to the image alignment problem. This is a hard non-convex problem that has only been sub-optimally addressed before. We represent transformation manifolds based on sparse geometric expansions, which results in a closed-form representation of the manifold equation with respect to the transformation parameters. When the transformation consists of a synthesis of translations, rotations and scalings, we prove that the objective function of this problem can be decomposed as a difference of convex functions (DC). This very property allows us to solve optimally our optimization problem with a cutting plane algorithm, which is well known to successfully find the global minimizer in practice. We showcase applications in robust face recognition and image alignment.
The classification problem with multiple observations is addressed next. Multiple observations are typically produced in practice when an object is observed over successive time instants or under different viewpoints. In particular, we focus on the problem of classifying an object when multiple geometrically transformed observations of it are available. These multiple observations typically belong to a manifold and the classification problem resides in determining which manifold the observations belong to. We show that this problem can be viewed as a special case of semi-supervised learning, where all unlabelled examples belong to the same class. We design a graph-based algorithm, which exploits the structure of the manifold. Estimating the unknown object class then results into a discrete optimization problem that can be solved efficiently. We show the performance of our algorithm in classification of multiple handwritten digit images and in video-based face recognition. Next, we study the problem of classification of multiple observations in distributed scenarios, such as camera networks. In this case the classification is performed iteratively and distributively, without the presence of a central coordinator node. The main goal is to reach a global classification decision using only local computation and communication, while ensuring robustness to changes in network topology. We propose to use consensus algorithms in order to design a distributed version of the aforementioned graph-based algorithm. We show that the distributed classification algorithm has similar performance as its centralized counterpart, provided that the training set is sufficiently large. Finally, we delve further into the convergence properties of consensus-based distributed algorithms and we propose an acceleration methodology for fast convergence that uses the memory of the sensors. Our simulations show that the convergence is indeed accelerated in both static and dynamic networks, and that distributed classification in sensor networks can significantly benefit from them.
Overall, the present thesis addresses a few important issues related to pattern analysis and classification in modern multimedia systems. Our solutions for semantic approximation and transformation invariance can impact the efficiency and robustness of classification in multimedia systems. Furthermore, our graph-based framework for multiple observations is able to perform effective classification in both centralized and distributed environments. Finally, our fast consensus algorithms can significantly contribute to the accelerated convergence of distributed classification algorithms in sensor networks.