Conformal Predictions in Multimedia Pattern Recognition
Supervisor(s) and Committee member(s): Sethuraman Panchanathan (advisory), Jieping Ye, Baoxin Li, Vladimir Vovk (opponents)
The field of multimedia pattern recognition is on a fundamental quest to design intelligent systems that can learn and behave the way humans do. One important aspect of human intelligence that has so far not received sufficient attention in this field is the human capability to hedge decisions: humans can express when they are certain about a decision they have made, and when they are not. Unfortunately, machine learning techniques today are not yet fully equipped to be trusted with this critical task. This work seeks to address this fundamental knowledge gap. Existing approaches that provide a measure of confidence in a learning algorithm's predictions, such as those based on Bayesian theory or Probably Approximately Correct (PAC) learning theory, require strong assumptions or often produce results that are neither practical nor reliable. However, the recently developed Conformal Predictions (CP) framework – which is based on the principles of hypothesis testing, transductive inference and algorithmic randomness – provides a game-theoretic approach to the estimation of confidence with several desirable properties, such as online calibration and generalizability to all classification and regression methods.
This dissertation builds on the theory of Conformal Predictions to compute reliable confidence measures that aid decision-making in real-world multimedia problems. The theory behind the CP framework guarantees that the confidence values obtained using this transductive inference framework manifest as the actual error frequencies in the online setting, i.e., they are well-calibrated. Further, the framework can be used with any classifier, meta-classifier or regressor (such as Support Vector Machines, k-Nearest Neighbors, AdaBoost, ridge regression, etc.). The key contributions of this dissertation (outlined below) are validated on four problems from the domains of healthcare and assistive technologies: two classification-based applications (risk prediction in cardiac decision support and multimodal person recognition), and two regression-based applications (head pose estimation and saliency prediction in radiological images). The cost of errors in decision-making is often high in these application domains, which is why these problems were selected to validate the contributions. The key contributions of this work are summarized below:
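The transductive inference at the heart of the CP framework can be illustrated with a minimal sketch: for each candidate label, the test example is provisionally added to the training set with that label, nonconformity scores are computed for every example in the augmented set, and the p-value for that label is the fraction of scores at least as large as the test example's own score. The sketch below assumes a simple 1-nearest-neighbor nonconformity measure and uses hypothetical function names; it is an illustration of the general technique, not the dissertation's implementation.

```python
import numpy as np

def nonconformity_1nn(X, y, xi, yi):
    """1-NN nonconformity: distance to the nearest example of the same
    class divided by distance to the nearest example of another class.
    Larger values indicate a more 'strange' (nonconforming) example."""
    d = np.linalg.norm(X - xi, axis=1)
    same = d[(y == yi) & (d > 0)]    # exclude the point itself
    other = d[y != yi]
    if len(same) == 0 or len(other) == 0:
        return np.inf
    return same.min() / other.min()

def conformal_p_values(X_train, y_train, x_new, labels):
    """Transductive conformal p-value for each candidate label: the
    fraction of nonconformity scores in the augmented set that are at
    least as large as the test example's score."""
    p = {}
    for lab in labels:
        Xa = np.vstack([X_train, x_new])
        ya = np.append(y_train, lab)
        scores = np.array([nonconformity_1nn(Xa, ya, Xa[i], ya[i])
                           for i in range(len(ya))])
        p[lab] = np.mean(scores >= scores[-1])
    return p
```

A label that fits the data well yields many training scores above the test score, hence a large p-value; an implausible label yields a small one.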
(1) Efficiency Maximization in Conformal Predictors: The CP framework has two important properties that define its utility: validity and efficiency. Validity refers to keeping the frequency of errors within a pre-specified error threshold. Since the framework outputs a set of possible predictions, it is also essential that these prediction sets be as small as possible; this property is called efficiency. An ideal implementation of the framework would therefore provide high efficiency along with validity. However, this is not straightforward: efficiency depends on both the learning algorithm (classification or regression, as the case may be) and the non-conformity measure chosen in a given context. In this work, a novel framework is proposed to learn a kernel (or distance metric) that maximizes efficiency in a given context, and it is validated on different risk-sensitive applications.
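The validity/efficiency trade-off described above can be made concrete. Assuming per-label p-values are available (here as a dict mapping each label to its p-value), the prediction set at significance level epsilon contains every label whose p-value exceeds epsilon; validity guarantees the true label is excluded with frequency at most epsilon, and efficiency is commonly proxied by the average set size (smaller is better). The function names below are hypothetical illustrations:

```python
def prediction_set(p_values, epsilon):
    """Conformal prediction set at significance level epsilon: all
    labels whose p-value exceeds epsilon."""
    return {lab for lab, p in p_values.items() if p > epsilon}

def average_set_size(p_value_dicts, epsilon):
    """Efficiency proxy: mean prediction-set size over a batch of
    test examples; an efficient conformal predictor keeps this small
    while remaining valid."""
    sizes = [len(prediction_set(p, epsilon)) for p in p_value_dicts]
    return sum(sizes) / len(sizes)
```

Learning a kernel or distance metric, as proposed in this contribution, amounts to shaping the nonconformity scores so that this average set size shrinks at a fixed epsilon.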
(2) Conformal Predictions for Information Fusion: The CP framework ensures the calibration property in the estimation of confidence in pattern recognition. Most of the existing work in this context has been carried out using single classification systems or ensemble classifiers (such as boosting). However, there has been recent growth in the use of multimodal fusion algorithms and multiple classifier systems. This work presents a study of statistical approaches to combining p-values from multiple classifiers and regressors, which revealed the usefulness of quantile combination methods for obtaining calibrated confidence values in information fusion contexts.
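One classical quantile-based p-value combination method is Stouffer's method: each p-value is mapped through the inverse standard-normal CDF to a z-score, the z-scores are averaged with a sqrt(n) scaling, and the result is mapped back to a combined p-value. This is shown below only as a representative quantile combination technique, not necessarily the specific method studied in the dissertation; p-values are clipped away from 0 and 1 to keep the inverse CDF finite.

```python
from math import sqrt
from statistics import NormalDist

def stouffer_combine(p_values):
    """Quantile (Stouffer) combination of p-values from multiple
    classifiers: p -> z via the normal quantile function, average
    with sqrt(n) scaling, then z -> p via the normal CDF."""
    nd = NormalDist()
    eps = 1e-12
    z = [nd.inv_cdf(1 - min(max(p, eps), 1 - eps)) for p in p_values]
    return 1 - nd.cdf(sum(z) / sqrt(len(z)))
```

Agreement among sources is rewarded: several moderately small p-values combine into a p-value smaller than any of them, while uninformative p-values near 0.5 leave the result near 0.5.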
(3) Online Active Learning using Conformal Predictors: As new data are encountered, it becomes essential to select appropriate instances for labeling and to update the classifier accordingly, so that the system can learn continuously. Using the p-values computed by the CP framework, a novel online active learning approach has been proposed and validated. This active learning method also extends to information fusion settings with multiple information sources or modalities.
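A natural way to use conformal p-values for active learning is margin-based selection: when the two largest p-values for an instance are close, no single label stands out, so the instance is ambiguous and worth querying for a label. The sketch below uses this heuristic with hypothetical function names; it illustrates the general idea of p-value-driven selection rather than the dissertation's specific criterion.

```python
def query_score(p_values):
    """Margin between the two largest conformal p-values; a small
    margin means no label clearly dominates, i.e. an ambiguous
    instance that is informative to label."""
    top = sorted(p_values.values(), reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]

def select_queries(batch, budget):
    """From a batch of (instance_id, p_values) pairs, pick the
    `budget` instances with the smallest margins for labeling."""
    ranked = sorted(batch, key=lambda item: query_score(item[1]))
    return [idx for idx, _ in ranked[:budget]]
```

In a fusion setting, the per-source p-values could first be combined and the same margin criterion applied to the combined values.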
The results obtained in this work demonstrate promise and potential in using these contributions to provide reliable measures of confidence in multimedia pattern recognition problems in real-world settings.
Center for Cognitive Ubiquitous Computing, Arizona State University
The Center for Cognitive Ubiquitous Computing (CUbiC) at Arizona State University is an inter-disciplinary research center focused on human-centered multimedia computing in the domains of assistive, rehabilitative and healthcare technologies. CUbiC employs a transdisciplinary research approach, bringing together computer scientists, cognitive scientists, psychologists, healthcare professionals, engineers, and designers to solve the challenges in human-centered multimedia computing. Existing approaches have largely relied on the so-called "able" population to derive the insights that shape efforts towards achieving human-centeredness. In contrast, CUbiC has proposed a new archetype for human-centered multimedia computing inspired by the needs of individuals with disabilities. The study of sensory, motor, perceptual and cognitive disabilities helps us understand the subtleties of human capabilities and limitations, thereby necessitating the design of new methodologies for data capture, information processing and multimodal delivery. This approach not only results in the design and development of innovative multimedia solutions for enriching the lives of individuals with disabilities and disorders, but is also valuable for gaining a deeper understanding towards realizing unique solutions for mainstream multimedia applications.
The focal application domains (assistive, rehabilitative and healthcare technologies) represent unique facets of human-machine interaction, which provide unique perspectives to our research. The healthcare domain primarily deals with how a disability or deficit, in the broader sense of the term, is diagnosed in a user (and further treated appropriately); the rehabilitative domain deals with how a technology is closely associated with the user for a temporary period of time to help the user overcome the disability and regain normalcy; and the assistive domain deals with how a technology is associated with a user for long periods of time (sometimes an entire lifetime) to support and enrich daily activities in the presence of a chronic disability. This disability-inspired approach to multimedia computing has led to fundamental research advancements in various fields including multimodal sensing, signal processing, pattern recognition, machine learning, human-computer interaction and multimodal delivery. These advances have taken the shape of several projects under the umbrella of iCARE (information technology Centric Assistive and Rehabilitative Environments), including the Reader, Note Taker, Information Assistant, Environment Perception, Multimodal (audio and haptic) Interfaces, and the Interaction Assistant. Our work thus far has demonstrated that research centered on individuals with disabilities and deficits has far-reaching implications for the general population and for advancing the core principles of human-centered multimedia computing.