Jia Li

Learning-based Visual Saliency Computation

Supervisor(s) and Committee member(s): Wen Gao

URL: http://www.jdl.ac.cn/doc/JiaLiPhdThesis.pdf

With the rapid development of the Internet, the volume of images and videos is growing explosively, posing many new challenges for image and video processing. On the one hand, the processing capability of computers is limited, so computational resources should be allocated first to the most important visual information. On the other hand, the analysis results produced by computers should be consistent with human cognition. To address these two problems, this thesis focuses on learning-based visual saliency computation; its main objective is to predict, locate and mine the important visual information that is consistent with human cognition. The main contributions of this thesis can be summarized as follows:

Firstly, this thesis presents a probabilistic multi-task learning approach for computing visual saliency by simultaneously integrating bottom-up and top-down factors. To the best of our knowledge, it is the first approach that addresses visual saliency computation with a multi-task learning algorithm. In our approach, the bottom-up and top-down factors are considered jointly in a probabilistic framework. In this framework, a bottom-up component simulates the low-level processes of the human visual system using multi-scale wavelet decomposition, while a top-down component simulates the high-level processes that bias the competition among the input visual stimuli. Moreover, we propose a multi-task learning algorithm to optimize the models and the model fusion strategies for various scenes. Extensive experiments on several datasets show that this approach is robust and effective in computing visual saliency.
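
As a rough illustration of how a bottom-up map and a top-down bias might be combined, the sketch below uses the detail energy of a single-level Haar decomposition on 2x2 blocks as a stand-in for the bottom-up component, multiplies it by an assumed top-down prior, and renormalizes the result so that it sums to one. The function names, the single decomposition level, the toy image and prior, and the multiplicative fusion rule are all illustrative assumptions; the thesis's actual probabilistic model and learned fusion strategies are not reproduced here.

```python
def haar_detail_energy(img):
    """Single-level 2D Haar transform on 2x2 blocks; returns detail energy per block."""
    energy = []
    for r in range(0, len(img) - 1, 2):
        row = []
        for c in range(0, len(img[0]) - 1, 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            horiz = (a - b + d - e) / 2.0  # left column vs. right column
            vert = (a + b - d - e) / 2.0   # top row vs. bottom row
            diag = (a - b - d + e) / 2.0   # diagonal difference
            row.append(horiz ** 2 + vert ** 2 + diag ** 2)
        energy.append(row)
    return energy


def normalize(m):
    """Scale a non-negative map so that its entries sum to one."""
    total = sum(sum(row) for row in m)
    if total == 0:
        n = len(m) * len(m[0])
        return [[1.0 / n for _ in row] for row in m]
    return [[v / total for v in row] for row in m]


def fuse(bottom_up, top_down):
    """Assumed fusion rule: element-wise product, then renormalization."""
    product = [[b * t for b, t in zip(br, tr)]
               for br, tr in zip(bottom_up, top_down)]
    return normalize(product)


# Toy 4x4 grayscale image with two equally strong local contrasts.
image = [[1, 0, 0, 1],
         [0, 0, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
# Hypothetical top-down prior that favours the top-left block.
prior = [[0.8, 0.2],
         [0.5, 0.5]]

bottom_up = normalize(haar_detail_energy(image))
fused = fuse(bottom_up, prior)
```

In this toy case the two contrasts receive equal bottom-up saliency, and the prior shifts most of the fused saliency toward the top-left block.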

Secondly, this thesis proposes a cost-sensitive rank learning approach for visual saliency computation. To the best of our knowledge, it is the first approach that formulates visual saliency computation in a rank learning framework. For video datasets with sparse eye fixations, this approach avoids the explicit selection of reliable positive and negative samples. Instead, all the positive and unlabeled data are directly integrated into a cost-sensitive rank learning framework. Experimental results show that the rank learning framework can simultaneously take into account the influence of local visual attributes and pair-wise “target-distractor” correlations, resulting in better performance on video datasets with sparse eye fixations.
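
The idea of ranking positive samples above unlabeled ones with pair-specific costs can be sketched as follows. This is a minimal example under stated assumptions, not the thesis's algorithm: it uses a linear scoring function, a pairwise hinge loss with plain stochastic gradient steps, and hand-picked toy features and costs. The names `train_rank` and `cost`, and the choice of a lower cost for an unlabeled sample that may in fact be salient, are hypothetical.

```python
import random


def score(w, x):
    """Linear saliency score of feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_rank(positives, unlabeled, cost, epochs=50, lr=0.1, seed=0):
    """Learn w so that fixated (positive) samples outrank unlabeled ones.

    cost[i][j] weights the penalty for mis-ordering positive i and
    unlabeled j; a small cost tolerates unlabeled samples that may be salient.
    """
    rng = random.Random(seed)
    w = [0.0] * len(positives[0])
    pairs = [(i, j) for i in range(len(positives))
             for j in range(len(unlabeled))]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for i, j in pairs:
            p, u = positives[i], unlabeled[j]
            if score(w, p) - score(w, u) < 1.0:  # hinge loss is active
                step = lr * cost[i][j]
                w = [wk + step * (pk - uk) for wk, pk, uk in zip(w, p, u)]
    return w


# Toy two-dimensional features (assumed): [local contrast, clutter].
positives = [[1.0, 0.2], [0.9, 0.1]]
unlabeled = [[0.2, 0.3], [0.1, 0.8], [0.8, 0.3]]
# The third unlabeled sample resembles a target, so its pairs cost less.
cost = [[1.0, 1.0, 0.2],
        [1.0, 1.0, 0.2]]

w = train_rank(positives, unlabeled, cost)
```

No sample is ever declared a reliable negative here; the unlabeled data enter only through pairwise comparisons whose weight reflects how costly a wrong ordering is assumed to be.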

Thirdly, this thesis presents a multi-task rank learning approach for visual saliency computation. In this approach, the problem of visual saliency computation is formulated in a multi-task rank learning framework to infer multiple saliency models that apply to different scene clusters. In the training process, this approach can infer multiple visual saliency models simultaneously. With an appropriate sharing of information across models, the generalization ability of each model can be greatly improved. Extensive experiments on the eye-fixation dataset show that our approach is highly effective in computing visual saliency in various scenes.
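
One common way to share information across per-cluster models is to split each model into a shared part and a task-specific part. The sketch below illustrates that idea for pairwise ranking; it is an assumption made for illustration, not the formulation used in the thesis. The scene cluster names, the toy features, the `share_penalty` parameter, and the simple shrinkage step are all hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_multitask(tasks, dim, epochs=100, lr=0.05, share_penalty=0.1):
    """Jointly train one ranking model per scene cluster.

    tasks maps a cluster name to (salient, distractor) feature pairs.
    Each model is shared + specific[name]; shrinking the specific part
    pushes the clusters to rely on the shared weights where possible.
    """
    shared = [0.0] * dim
    specific = {name: [0.0] * dim for name in tasks}
    for _ in range(epochs):
        for name, pairs in tasks.items():
            v = specific[name]
            for pos, neg in pairs:
                w = [s + t for s, t in zip(shared, v)]
                diff = [p - n for p, n in zip(pos, neg)]
                if dot(w, diff) < 1.0:  # pair violates the ranking margin
                    for k in range(dim):
                        shared[k] += lr * diff[k]
                        v[k] += lr * diff[k]
            for k in range(dim):  # shrink the task-specific weights
                v[k] -= lr * share_penalty * v[k]
    models = {name: [s + t for s, t in zip(shared, specific[name])]
              for name in tasks}
    return shared, models


# Toy features (assumed): [contrast, motion, color].
tasks = {
    "indoor": [([1.0, 0.8, 0.0], [0.2, 0.1, 0.0]),
               ([0.9, 0.6, 0.0], [0.3, 0.2, 0.0])],
    "outdoor": [([1.0, 0.0, 0.9], [0.1, 0.0, 0.2]),
                ([0.8, 0.0, 0.7], [0.2, 0.0, 0.1])],
}
shared, models = train_multitask(tasks, dim=3)
```

In this toy setup both clusters depend on contrast, so that weight is learned from the pairs of both clusters through the shared part, while motion and color are each informed by a single cluster.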

Fourthly, the thesis proposes a novel approach for salient object extraction using complementary saliency maps. A video advertising system is then developed to demonstrate its feasibility. This system consists mainly of two modules: the pull advertising module and the push advertising module. In these two modules, the interesting/salient objects are extracted through simple user interactions or complementary saliency maps, respectively. These interesting/salient objects, along with the user preferences, are used to provide content-related and user-targeted ads in a minimally intrusive way. In the future, this system will be integrated by HuaWei, a well-known global telecommunications company, into its intelligent streaming media service products.

In summary, this thesis investigates three important issues in learning-based visual saliency computation. In addition, tentative studies have been carried out on salient object extraction and its application in saliency-based video advertising. To the best of our knowledge, this thesis presents the first systematic study of how to apply machine learning to visual saliency computation, and it demonstrates the feasibility and effectiveness of learning-based visual saliency computation. We expect this work to stimulate further research in the related communities in the years to come.
