An Adaptive Framework for Scalable Multi-view Video Coding in H.264/AVC
Supervisor(s) and Committee member(s): Mahmoud Reza Hashemi (Supervisor), Shervin Shirmohammadi (Advisor)
With the growing demand for 3D video, efforts are underway to incorporate it in the next generation of broadcast and streaming applications and standards. Scalability is one possible solution to reduce the amount of data in multi-view/3D video in heterogeneous environments. However, applying Scalable Multi-view Video Coding (SMVC) to multi-view/3D video still poses many unresolved challenges. In this thesis, we propose an adaptive framework for using SMVC effectively in various 3D video applications.
To this end, the proper scalable modality must first be selected according to the application at hand and its features and requirements. To the best of our knowledge, no work so far has systematically defined new and proper scalable modalities specifically for multi-view 3D video. Hence, as the first step of the proposed framework, we suggest a methodology to extract the proper scalable modalities for multi-view/3D video.
In addition, while SMVC can help support heterogeneous receivers, a question remains: how should the 3D video content be scaled, for a given type of scalability and a specified application, in order to achieve the highest performance and satisfy the receivers’ constraints as much as possible? In other words, the proper mechanism for assigning SMVC data to the various layers should be clearly determined. This issue is addressed in the second step of our proposed framework. The proposed method uses the concepts of inter-layer and intra-layer disparity. Note that these concepts must be defined separately for each scalable modality, using the specific features of that modality. Simulation results indicate that the proposed method achieves a relatively better compression rate for each layer, with much less overhead.
In the next step of our proposed framework, we propose an analytical view-level rate model for multi-view video coding. Our rate model takes into account both previous theoretical results and new results specifically obtained for multi-view video and confirmed by comprehensive practical experiments. Simulation results show that our model can predict the rate of each view with relatively high precision, with an average estimation error of 12% for the tested sequences.
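As a rough illustration of how an average estimation error of this kind can be computed, the sketch below takes the mean relative error between predicted and measured per-view bitrates. The function name and the bitrate values are hypothetical, and the abstract does not state the exact error definition used in the thesis, so this is one plausible reading only.

```python
def mean_relative_error(actual, predicted):
    """Mean of |predicted - actual| / actual over all views."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical per-view bitrates in kbit/s (illustrative values, not thesis data).
actual_kbps = [1000.0, 800.0, 500.0]
predicted_kbps = [1100.0, 720.0, 550.0]

error = mean_relative_error(actual_kbps, predicted_kbps)
print(f"average estimation error: {error:.1%}")
```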
In addition, evaluating the overall visual quality of scalable multi-view video requires a new objective perceptual quality measure specifically designed for scalable multi-view/3D video. Although several subjective and objective quality assessment methods have been proposed for multi-view/3D sequences, no comparable attempt has so far been made for quality assessment of scalable multi-view/3D video. Hence, in this framework, we propose a new methodology to build suitable objective quality assessment metrics for different scalable modalities in multi-view/3D video. Our proposed methodology considers the importance of each layer and its content as a quality of experience factor in the overall quality. Furthermore, in addition to the quality of each layer, the concept of inter-layer and intra-layer disparity is considered as an effective feature to evaluate the overall perceived quality more accurately. Our simulation results indicate that the correlation coefficient between our extracted objective quality evaluation metric and subjective quality assessment is 0.8 on average for the tested video sequences.
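To make the reported correlation figure concrete, the following minimal sketch computes a correlation coefficient between objective metric values and subjective scores. It assumes the Pearson linear correlation coefficient, which the abstract does not specify, and the function name and sample scores are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical objective metric values and subjective scores for four clips.
objective = [0.62, 0.71, 0.80, 0.93]
subjective = [2.9, 3.4, 3.3, 4.5]

r = pearson(objective, subjective)
print(f"correlation coefficient: {r:.2f}")
```

A value close to 1 means the objective metric ranks and spaces the clips much as the human viewers did.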
In the last step of our proposed framework, we present a novel method for rate-distortion optimization in scalable multi-view video that tries to minimize the perceptual distortion of the decoded video under the constraint that the total number of bits generated from the different views stays within a given bit budget. Since such constrained optimization problems are usually computationally intensive, our proposed approach uses the concept of intra-layer and inter-layer disparity to reduce the computational complexity. Experimental results show that the proposed approach uses on average 24% and 42% less bitrate than the H.264/AVC rate-distortion optimization for the base layer and the base plus enhancement layers, respectively.
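The general shape of such a budget-constrained problem can be sketched as follows. This is not the method of the thesis: it is a plain exhaustive search over hypothetical (rate, distortion) operating points per view, with a generic distortion number standing in for perceptual distortion. It is included only to show the constraint and why the search grows quickly with the number of views.

```python
from itertools import product


def allocate(points_per_view, budget):
    """Pick one (rate, distortion) point per view to minimize total distortion.

    Returns (total_distortion, total_rate, choice), or None if no combination
    fits within the bit budget. The search is exhaustive, so its cost grows
    exponentially with the number of views.
    """
    best = None
    for choice in product(*points_per_view):
        total_rate = sum(rate for rate, _ in choice)
        total_dist = sum(dist for _, dist in choice)
        if total_rate <= budget and (best is None or total_dist < best[0]):
            best = (total_dist, total_rate, choice)
    return best


# Hypothetical operating points (rate in kbit/s, distortion) for two views.
views = [
    [(100, 9.0), (200, 5.0), (300, 3.0)],
    [(100, 8.0), (200, 4.0), (300, 2.5)],
]

best = allocate(views, budget=400)
print(best)
```

With three operating points per view, the number of combinations is 3 to the power of the number of views, which is the kind of cost a complexity-reduction strategy would aim to avoid.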
Although the thesis is in Farsi (Persian), the following English papers capture most of its essence:
H. Roodaki, M.R. Hashemi, and S. Shirmohammadi, “A New Methodology to Derive Objective Quality Assessment Metrics for Scalable Multi-view 3D Video Coding”, ACM Transactions on Multimedia Computing, Communications, and Applications, Vol. 8, No. 3S, Article 44, September 2012, 25 pages.
H. Roodaki, M.R. Hashemi, and S. Shirmohammadi, “Rate-Distortion Optimization for Scalable Multi-View Video Coding”, Proc. IEEE International Conference on Multimedia and Expo, Chengdu, China, July 14-18 2014, 6 pages.
H. Roodaki, Z. Iravani, M.R. Hashemi, S. Shirmohammadi, and M. Gabbouj, “A New Rate Distortion Model for Multi-View/3D Video Coding”, Proc. IEEE International Workshop on Hot Topics in 3D, in Proc. IEEE International Conference on Multimedia and Expo, San Jose, USA, July 15-19 2013, 6 pages.
H. Roodaki, M.R. Hashemi, and S. Shirmohammadi, “New Scalable Modalities in Multi-view 3D Video”, Proc. ACM Workshop on Mobile Video, Oslo, Norway, February 27 2013, pp. 25-30.
Multimedia Processing Laboratory (MPL), and Distributed and Collaborative Virtual Environment Research (DISCOVER) Lab
Research at the DISCOVER Lab is directed towards the enhancement of next generation human-human and human-information communication and interaction through advanced multimedia technology and virtual environments. Through our many projects we are developing new ideas and technology that will make easy-to-use multimedia environments and systems a reality. Research projects at the DISCOVER lab typically fall into the following categories:
• Networked Games and Collaborative Virtual Environments
• Multimedia Systems and Applications
• 3D Physical Modelling
• Ambient Intelligent Multimedia Environments
• Intelligent Sensor Networks and Ubiquitous Computing
• Haptics and Teleoperation
• Multimedia-Assisted Rehabilitation Engineering
The Multimedia Processing Laboratory (MPL) at the University of Tehran hosts research projects in Multimedia Systems and Networking, specifically:
• Receiver aware video encoding and adaptation
• Scalable multi-view video coding
• Cloud media and Cloud gaming
• Dynamic mapping of multimedia applications on a cloud of MPSoCs
• Reconfigurable hardware architectures for multimedia processing
• Hardware implementation of multimedia applications