JPEG Column: 84th JPEG Meeting in Brussels, Belgium

The 84th JPEG meeting was held in Brussels, Belgium.

This meeting was characterised by significant progress in most of the JPEG projects as well as in exploratory studies. JPEG XL, the new image coding system, has issued its Committee Draft, giving shape to this new and effective solution for the future of image coding. JPEG Pleno, the standard for new imaging technologies, has also reached Draft International Standard status for Part 1 (Framework) and Part 2 (Light field coding).

Moreover, exploration studies are ongoing in the domain of media blockchain and on the application of learning-based solutions to image coding (JPEG AI). Both have triggered a number of activities, providing new knowledge and opening new possibilities for the use of these technologies in future JPEG standards.

The 84th JPEG meeting had the following highlights:

  • JPEG XL issues the Committee Draft
  • JPEG Pleno Parts 1 and 2 reach Draft International Standard status
  • JPEG AI defines Common Test Conditions
  • JPEG exploration studies on Media Blockchain
  • JPEG Systems – JLINK working draft
  • JPEG XS

In the following, a short description of the most significant activities is presented.

 

JPEG XL

The JPEG XL Image Coding System (ISO/IEC 18181) has completed the Committee Draft of the standard. The new coding technique allows storage of high-quality images at about one-third the size of the legacy JPEG format. Moreover, JPEG XL can losslessly transcode existing JPEG images to about 80% of their original size, simplifying interoperability and accelerating wider deployment.

The JPEG XL reference software, ready for mobile and desktop deployments, will be available in Q4 2019. The current contributors have committed to releasing it publicly under a royalty-free and open source license.

 

JPEG Pleno

A significant milestone was reached during this meeting: the Draft International Standard (DIS) texts for both JPEG Pleno Part 1 (Framework) and Part 2 (Light field coding) have been completed. A draft architecture of the Reference Software (Part 4) and development plans have also been discussed and defined.

In addition, JPEG has completed an in-depth analysis of existing point cloud coding solutions and a new version of the use-cases and requirements document has been released reflecting the future role of JPEG Pleno in point cloud compression. A new set of Common Test Conditions has been released as a guideline for the testing and evaluation of point cloud coding solutions with both a best practice subjective testing protocol and a set of objective metrics.

JPEG Pleno holography activities made significant advances on the definition of use cases and requirements and on the description of Common Test Conditions. New quality assessment methodologies for holographic data were established in the framework of a collaboration between JPEG and Qualinet. Moreover, JPEG Pleno continues collecting microscopic and tomographic holographic data.

 

JPEG AI

The JPEG Committee continues to carry out exploration studies on deep learning-based image compression solutions, typically built around an auto-encoder architecture. The promise that these types of codecs hold, especially in terms of coding efficiency, will be evaluated in several studies. At this meeting, a Common Test Conditions document was produced, which includes a plan for subjective and objective quality assessment experiments as well as coding pipelines for anchor and learning-based codecs. Moreover, a JPEG AI dataset was proposed and discussed, and a double stimulus impairment scale (side-by-side) experiment was performed with a mix of experts and non-experts in a controlled environment.

 

JPEG exploration on Media Blockchain

Fake news, copyright violation, media forensics, privacy and security are emerging challenges in digital media. JPEG has determined that blockchain and distributed ledger technologies (DLT) have great potential as a technology component to address these challenges in transparent and trustable media transactions. However, blockchain and DLT need to be integrated closely with a widely adopted standard to ensure broad interoperability of protected images. JPEG calls for industry participation to help define use cases and requirements that will drive the standardization process. In order to clearly identify the impact of blockchain and distributed ledger technologies on JPEG standards, the committee has organised several workshops to interact with stakeholders in the domain.

The 4th public workshop on media blockchain was organized in Brussels on Tuesday the 16th of July 2019 during the 84th ISO/IEC JTC 1/SC 29/WG1 (JPEG) Meeting. The presentations and program of the workshop are available on jpeg.org.

The JPEG Committee has issued an updated version of the white paper entitled “Towards a Standardized Framework for Media Blockchain” that elaborates on the initiative, exploring relevant standardization activities, industrial needs and use cases.

To keep informed and to get involved in this activity, interested parties are invited to register for the ad hoc group’s mailing list.

 

JPEG Systems – JLINK

At the 84th meeting, IS text reviews for ISO/IEC 19566-5 JUMBF and ISO/IEC 19566-6 JPEG 360 were completed; IS publication will be forthcoming. Work began on adding functionality to JUMBF, Privacy & Security, and JPEG 360, along with initial planning towards developing software implementations of these parts of the JPEG Systems specification. Work also began on the new ISO/IEC 19566-7 Linked media images (JLINK) with the development of a working draft.

 

JPEG XS

The JPEG Committee is pleased to announce new Core Experiments and Exploration Studies on the compression of raw image sensor data. The JPEG XS project aims at the standardization of a visually lossless, low-latency and lightweight compression scheme that can be used as a mezzanine codec in various markets. Video transport over professional video links (SDI, IP, Ethernet), real-time video storage in and outside of cameras, memory buffers, machine vision systems, and data compression on board autonomous vehicles are among the targeted use cases for raw image sensor compression. This new work on raw sensor data will pave the way towards highly efficient close-to-sensor image compression workflows with JPEG XS.

 

Final Quote

“Completion of the Committee Draft of JPEG XL, the new standard for image coding, is an important milestone. It is hoped that JPEG XL can become an excellent replacement for the widely used JPEG format, which has been in service for more than 25 years,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

About JPEG

The Joint Photographic Experts Group (JPEG) is a Working Group of ISO/IEC, the International Organisation for Standardization / International Electrotechnical Commission, (ISO/IEC JTC 1/SC 29/WG 1) and of the International Telecommunication Union (ITU-T SG16), responsible for the popular JPEG, JPEG 2000, JPEG XR, JPSearch, JPEG XT and more recently, the JPEG XS, JPEG Systems, JPEG Pleno and JPEG XL families of imaging standards.

More information about JPEG and its work is available at www.jpeg.org.

Future JPEG meetings are planned as follows:

  • No 85, San Jose, California, U.S.A., November 2 to 8, 2019
  • No 86, Sydney, Australia, January 18 to 24, 2020

MPEG Column: 127th MPEG Meeting in Gothenburg, Sweden

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

Plenary of the 127th MPEG Meeting in Gothenburg, Sweden.


The 127th MPEG meeting concluded on July 12, 2019 in Gothenburg, Sweden with the following topics:

  • Versatile Video Coding (VVC) enters formal approval stage, experts predict 35-60% improvement over HEVC
  • Essential Video Coding (EVC) promoted to Committee Draft
  • Common Media Application Format (CMAF) 2nd edition promoted to Final Draft International Standard
  • Dynamic Adaptive Streaming over HTTP (DASH) 4th edition promoted to Final Draft International Standard
  • Carriage of Point Cloud Data Progresses to Committee Draft
  • JPEG XS carriage in MPEG-2 TS promoted to Final Draft Amendment of ISO/IEC 13818-1 7th edition
  • Genomic information representation – WG11 issues a joint call for proposals on genomic annotations in conjunction with ISO TC 276/WG 5
  • ISO/IEC 23005 (MPEG-V) 4th Edition – WG11 promotes the Fourth edition of two parts of “Media Context and Control” to the Final Draft International Standard (FDIS) stage

The corresponding press release of the 127th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/127

Versatile Video Coding (VVC)

The Moving Picture Experts Group (MPEG) is pleased to announce that Versatile Video Coding (VVC) has progressed to Committee Draft; experts predict a 35-60% improvement over HEVC.

The development of the next major generation of video coding standard has achieved excellent progress, such that MPEG has approved the Committee Draft (CD, i.e., the text for formal balloting in the ISO/IEC approval process).

The new VVC standard will be applicable to a very broad range of applications and will also provide additional functionalities. VVC will deliver a substantial improvement in coding efficiency relative to existing standards, expected to be in the range of a 35–60% bit rate reduction relative to HEVC, although it has not yet been formally measured. This comparison with HEVC refers to equivalent subjective video quality at picture resolutions such as 1080p HD, 4K or 8K UHD, for either standard dynamic range video or high dynamic range and wide color gamut content, at levels of quality appropriate for use in consumer distribution services. The focus during the development of the standard has primarily been on 10-bit 4:2:0 content; the 4:4:4 chroma format will also be supported.

The VVC standard is being developed in the Joint Video Experts Team (JVET), a group established jointly by MPEG and the Video Coding Experts Group (VCEG) of ITU-T Study Group 16. In addition to a text specification, the project also includes the development of reference software, a conformance testing suite, and a new standard ISO/IEC 23002-7 specifying supplemental enhancement information messages for coded video bitstreams. The approval process for ISO/IEC 23002-7 has also begun, with the issuance of a CD consideration ballot.

Research aspects: VVC represents the next-generation video codec to be deployed in 2020 and beyond, and basically the same research aspects apply as for previous generations, i.e., coding efficiency, performance/complexity, and objective/subjective evaluation. Luckily, JVET documents are freely available, including the actual standard (committee draft), software (and its description), and common test conditions. Thus, researchers utilizing these resources are able to conduct reproducible research when contributing their findings and code improvements back to the community at large.

Essential Video Coding (EVC)

MPEG-5 Essential Video Coding (EVC) promoted to Committee Draft

Interestingly, at the same meeting as VVC, MPEG promoted MPEG-5 Essential Video Coding (EVC) to Committee Draft (CD). The goal of MPEG-5 EVC is to provide a standardized video coding solution to address business needs in some use cases, such as video streaming, where existing ISO video coding standards have not been as widely adopted as might be expected from their purely technical characteristics.

The MPEG-5 EVC standard includes a baseline profile that contains only technologies that are over 20 years old or are otherwise expected to be royalty-free. Additionally, a main profile adds a small number of additional tools, each providing a significant performance gain. All main profile tools can be individually switched off or individually switched over to a corresponding baseline tool. Organizations making proposals for the main profile have agreed to publish applicable licensing terms within two years of the FDIS stage, either individually or as part of a patent pool.

Research aspects: Similar research aspects apply to EVC, and from a software engineering perspective it could also be interesting to further investigate this switching mechanism for individual tools and/or the fallback option to baseline tools. Naturally, a comparison with next-generation codecs such as VVC is interesting per se. The licensing aspects themselves are probably interesting for other disciplines, but that is another story…

Common Media Application Format (CMAF)

MPEG ratified the 2nd edition of the Common Media Application Format (CMAF)

The Common Media Application Format (CMAF) enables efficient encoding, storage, and delivery of digital media content (incl. audio, video, subtitles among others), which is key to scaling operations to support the rapid growth of video streaming over the internet. The CMAF standard is the result of widespread industry adoption of an application of MPEG technologies for adaptive video streaming over the Internet, and widespread industry participation in the MPEG process to standardize best practices within CMAF.

The 2nd edition of CMAF adds support for a number of specifications that were a result of significant industry interest. Those include

  • Advanced Audio Coding (AAC) multi-channel;
  • MPEG-H 3D Audio;
  • MPEG-D Unified Speech and Audio Coding (USAC);
  • Scalable High Efficiency Video Coding (SHVC);
  • IMSC 1.1 (Timed Text Markup Language Profiles for Internet Media Subtitles and Captions); and
  • additional HEVC video CMAF profiles and brands.

This edition also introduces CMAF supplemental data handling as well as new structural brands for CMAF that reflect the common practice of the significant deployment of CMAF in industry. Companies adopting CMAF technology will find the specifications introduced in the 2nd edition particularly useful for further adoption and proliferation of CMAF in the market.

Research aspects: see below (DASH).

Dynamic Adaptive Streaming over HTTP (DASH)

MPEG approves the 4th edition of Dynamic Adaptive Streaming over HTTP (DASH)

The 4th edition of MPEG-DASH comprises the following features:

  • a service description that conveys how the service provider expects the service to be consumed;
  • a method to indicate the times corresponding to the production of associated media;
  • a mechanism to signal DASH profiles and features, employed codec and format profiles; and
  • supported protection schemes present in the Media Presentation Description (MPD).

It is expected that this edition will be published later this year. 

Research aspects: The CMAF 2nd and DASH 4th editions come along with a rich feature set enabling a plethora of use cases. The underlying principles are still the same, and research issues arise from updated application and service requirements with respect to content complexity, time aspects (mainly delay/latency), and quality of experience (QoE). The DASH-IF presents the excellence in DASH award at the ACM Multimedia Systems conference, and an overview of its academic efforts can be found here.

Carriage of Point Cloud Data

MPEG progresses the Carriage of Point Cloud Data to Committee Draft

At its 127th meeting, MPEG promoted the carriage of point cloud data to the Committee Draft stage, the first milestone of the ISO standard development process. This standard is the first to introduce support for volumetric media in the industry-famous ISO base media file format family of standards.

This standard supports the carriage of point cloud data comprising individually encoded video bitstreams within multiple file format tracks in order to support the intrinsic nature of video-based point cloud compression (V-PCC). It additionally allows the carriage of point cloud data in a single file format track for applications requiring multiplexed content (i.e., the video bitstreams of the multiple components interleaved into one bitstream).

This standard is expected to support efficient access and delivery of portions of a point cloud object, considering that in many cases the entire point cloud object may not be visible to the user, depending on the viewing direction or the location of the point cloud object relative to other objects. It is currently expected that the standard will reach its final milestone by the end of 2020.

Research aspects: MPEG’s Point Cloud Compression (PCC) comes in two flavors, video-based and geometry-based, but both still require packaging into file and delivery formats. MPEG’s choice here is the ISO base media file format, and the efficient carriage of point cloud data is characterized by both functionality (i.e., enabling the required use cases) and performance (such as low overhead).

MPEG 2 Systems/Transport Stream

JPEG XS carriage in MPEG-2 TS promoted to Final Draft Amendment of ISO/IEC 13818-1 7th edition

At its 127th meeting, WG11 (MPEG) has extended ISO/IEC 13818-1 (MPEG-2 Systems) – in collaboration with WG1 (JPEG) – to support ISO/IEC 21122 (JPEG XS) in order to support industries using still image compression technologies for broadcasting infrastructures. The specification defines a JPEG XS elementary stream header and specifies how the JPEG XS video access unit (specified in ISO/IEC 21122-1) is put into a Packetized Elementary Stream (PES). Additionally, the specification also defines how the System Target Decoder (STD) model can be extended to support JPEG XS video elementary streams.

Genomic information representation

WG11 issues a joint call for proposals on genomic annotations in conjunction with ISO TC 276/WG 5

The introduction of high-throughput DNA sequencing has led to the generation of large quantities of genomic sequencing data that have to be stored, transferred and analyzed. So far, WG 11 (MPEG) and ISO TC 276/WG 5 have addressed the representation, compression and transport of genome sequencing data by developing the ISO/IEC 23092 standard series, also known as MPEG-G. The series provides a file and transport format, compression technology, metadata specifications, protection support, and standard APIs for the access of sequencing data in the native compressed format.

An important element in the effective usage of sequencing data is the association of the data with the results of the analysis and the annotations that are generated by processing pipelines and analysts. At the moment, such association happens as a separate step; standard and effective ways of linking data and meta-information derived from sequencing data are not available.

At its 127th meeting, MPEG and ISO TC 276/WG 5 issued a joint Call for Proposals (CfP) addressing this problem. The call seeks submissions of technologies that can provide efficient representation and compression solutions for the processing of genomic annotation data.

Companies and organizations are invited to submit proposals in response to this call. Responses are expected to be submitted by the 8th January 2020 and will be evaluated during the 129th WG 11 (MPEG) meeting. Detailed information, including how to respond to the call for proposals, the requirements that have to be considered, and the test data to be used, is reported in the documents N18648, N18647, and N18649 available at the 127th meeting website (http://mpeg.chiariglione.org/meetings/127). For any further question about the call, test conditions, required software or test sequences please contact: Joern Ostermann, MPEG Requirements Group Chair (ostermann@tnt.uni-hannover.de) or Martin Golebiewski, Convenor ISO TC 276/WG 5 (martin.golebiewski@h-its.org).

ISO/IEC 23005 (MPEG-V) 4th Edition

WG11 promotes the Fourth edition of two parts of “Media Context and Control” to the Final Draft International Standard (FDIS) stage

At its 127th meeting, WG11 (MPEG) promoted the 4th edition of two parts of ISO/IEC 23005 (MPEG-V; Media Context and Control) to the Final Draft International Standard (FDIS) stage. The new edition of ISO/IEC 23005-1 (architecture) enables ten new use cases, which can be grouped into four categories: 3D printing, olfactory information in virtual worlds, virtual panoramic vision in cars, and adaptive sound handling. The new edition of ISO/IEC 23005-7 (conformance and reference software) is updated to reflect the changes made by the introduction of new tools defined in other parts of ISO/IEC 23005. More information on MPEG-V and its parts 1-7 can be found at https://mpeg.chiariglione.org/standards/mpeg-v.


Finally, the unofficial highlight of the 127th MPEG meeting we certainly found while scanning the scene in Gothenburg on Tuesday night…

(Photo: MPEG127_Metallica)

The V3C1 Dataset: Advancing the State of the Art in Video Retrieval

Download

In order to download the video dataset as well as its provided analysis data, please follow the instructions described here:

https://github.com/klschoef/V3C1Analysis/blob/master/README.md

Introduction

Standardized datasets are of vital importance in multimedia research, as they form the basis for reproducible experiments and evaluations. In the area of video retrieval, widely used datasets such as the IACC [5], which has formed the basis for the TRECVID Ad-Hoc Video Search Task and other retrieval-related challenges, have started to show their age. For example, IACC is no longer representative of video content as it is found in the wild [7]. This is illustrated by the figures below, showing the distribution of video age and duration across various datasets in comparison with a sample drawn from Vimeo and YouTube.

(Figures: distributions of video age and video duration across datasets, compared with a Vimeo/YouTube sample)

Its recently released spiritual successor, the Vimeo Creative Commons Collection (V3C) [3], aims to remedy this discrepancy by offering a collection of freely reusable content sourced from the video hosting platform Vimeo (https://vimeo.com). The figures below show the age and duration distributions of the Vimeo sample from [7] in comparison with the properties of the V3C.

(Figures: age and duration distributions of the Vimeo sample from [7] compared with the V3C)

The V3C comprises three shards, consisting of 1000h, 1200h and 1500h of video content, respectively. It consists not only of the original videos themselves, but also comes with video shot-boundary annotations, as well as representative keyframes and thumbnail images for every such video shot. In addition, all the technical and semantic video metadata that was available on Vimeo is provided as well. The V3C has already been used in the 2019 edition of the Video Browser Showdown [2] and will also be used for the TRECVID AVS Tasks (https://www-nlpir.nist.gov/projects/tv2019/) starting in 2019, with a plan for continued usage in the coming years.

Dataset & Collections

The three shards of V3C (V3C1, V3C2, and V3C3) contain Creative Commons videos sourced from the video hosting platform Vimeo. For this reason, the elements of the dataset may be freely used and publicly shared. The following table presents the composition of the dataset and the characteristics of its shards, as well as information on the dataset as a whole.

| Partition | V3C1 | V3C2 | V3C3 | Total |
| --- | --- | --- | --- | --- |
| File Size (videos) | 1.3TB | 1.6TB | 1.8TB | 4.8TB |
| File Size (total) | 2.4TB | 3.0TB | 3.3TB | 8.7TB |
| Number of Videos | 7’475 | 9’760 | 11’215 | 28’450 |
| Combined Video Duration | 1’000 hours, 23 minutes, 50 seconds | 1’300 hours, 52 minutes, 48 seconds | 1’500 hours, 8 minutes, 57 seconds | 3’801 hours, 25 minutes, 35 seconds |
| Mean Video Duration | 8 minutes, 2 seconds | 7 minutes, 59 seconds | 8 minutes, 1 second | 8 minutes, 1 second |
| Number of Segments | 1’082’659 | 1’425’454 | 1’635’580 | 4’143’693 |

Similar to IACC, V3C contains a master shot reference, which segments every video into non-overlapping shots based on the visual content of the videos. For every single shot, a representative keyframe is included, as well as a thumbnail version of that keyframe. Furthermore, for each video, identified by a unique ID, a metadata file is available that contains both technical and semantic information, such as the categories. Vimeo categorizes every video into categories and subcategories. Some of the categories were determined to be non-relevant for visual-based multimedia retrieval and analytical tasks and were dropped during the sourcing process of V3C. For simplicity, subcategories were generalized into their parent categories and are, for this reason, not included. The remaining Vimeo categories are:

  • Arts & Design
  • Cameras & Techniques
  • Comedy
  • Fashion
  • Food
  • Instructionals
  • Music
  • Narrative
  • Reporting & Journals

Ground Truth and Analysis Data

As described above, the ground truth of the dataset consists of (deliberately over-segmented) shot boundaries as well as keyframes. Additionally, for the first shard of the V3C, the V3C1, we have already performed several analyses of the video content and metadata in order to provide an overview of the dataset [1].

In particular, we have analyzed specific content characteristics of the dataset, such as:

  • Bitrate distribution of the videos
  • Resolution distribution of the videos
  • Duration of shots
  • Dominant color of the keyframes
  • Similarity of the keyframes in terms of color layout, edge histogram, and deep features (weights extracted from the last fully-connected layer of GoogLeNet).
  • Confidence range distribution of the best class for shots detected by NasNet (using the best result out of the 1000 ImageNet classes) 
  • Number of different classes for a video detected by NasNet (using the best result out of the 1000 ImageNet classes)
  • Number of shots/keyframes for a specific content class
  • Number of shots/keyframes for a specific number of detected faces

This additional analysis data is available via GitHub, so that other researchers can take advantage of it. For example, one could use a specific subset of the dataset (only shots with blue keyframes, only videos with a specific bitrate or resolution, etc.) for performing further evaluations (e.g., for multimedia streaming and video coding, but of course also for image and video retrieval). Additionally, thanks to the public dataset and the analysis data, one could easily create an image and video retrieval system and use it either for participation in competitions like the Video Browser Showdown [2] or for submitting other evaluation runs (e.g., the TRECVID Ad-hoc Video Search Task).
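As a minimal sketch of such subset selection, the Python snippet below filters shots by dominant keyframe colour and source-video resolution and counts the qualifying shots per video. The file name (v3c1_analysis.csv) and the column names (video_id, shot_id, dominant_color, width, height) are hypothetical placeholders; the actual layout of the files in the V3C1Analysis repository may differ.

```python
import pandas as pd

# Hypothetical file and column names; adapt them to the actual layout of the
# analysis data provided in the V3C1Analysis GitHub repository.
analysis = pd.read_csv("v3c1_analysis.csv")

# Keep only shots whose keyframe has a blue dominant colour and whose
# source video is at least Full HD.
subset = analysis[
    (analysis["dominant_color"] == "blue")
    & (analysis["width"] >= 1920)
    & (analysis["height"] >= 1080)
]

# Count how many qualifying shots each video contributes, e.g., to build a
# smaller evaluation collection for a retrieval or streaming experiment.
shots_per_video = (
    subset.groupby("video_id")["shot_id"].count().sort_values(ascending=False)
)

print(f"{len(subset)} shots from {shots_per_video.size} videos match the filter")
print(shots_per_video.head(10))
```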

Conclusion

In the broad field of multimedia retrieval and analytics, one of the key components of research is having useful and appropriate datasets in place to evaluate multimedia systems’ performance and benchmark their quality. The usage of standard and open datasets enables researchers to reproduce analytical experiments based on these datasets and thus validate their results. In this context, the V3C dataset proves to be very diverse in several useful aspects (upload time, visual concepts, resolutions, colors, etc.). It also has no dominating characteristics and exhibits low self-similarity (i.e., few near duplicates) [3].

Further, the richness of V3C in terms of content diversity and content attributes enables benchmarking multimedia systems in close-to-reality test environments. In contrast to other video datasets (cf. YouTube-8M [4] and IACC [5]), V3C also provides a vast variety of video encodings and bitrates, so that it enables research focusing on video retrieval and analytical tasks regarding those attributes. The large number of different video resolutions (and, to a lesser extent, frame rates) makes this dataset interesting for video transport and storage applications, such as the development of novel encoding schemes, streaming mechanisms or error-correction techniques. Finally, in contrast to many current datasets, V3C also provides support for creating queries for evaluation competitions, such as VBS and TRECVID [6].

References

[1] Fabian Berns, Luca Rossetto, Klaus Schoeffmann, Christian Beecks, and George Awad. 2019. V3C1 Dataset: An Evaluation of Content Characteristics. In Proceedings of the 2019 on International Conference on Multimedia Retrieval (ICMR ’19). ACM, New York, NY, USA, 334-338.

[2] Jakub Lokoč, Gregor Kovalčík, Bernd Münzer, Klaus Schöffmann, Werner Bailer, Ralph Gasser, Stefanos Vrochidis, Phuong Anh Nguyen, Sitapa Rujikietgumjorn, and Kai Uwe Barthel. 2019. Interactive Search or Sequential Browsing? A Detailed Analysis of the Video Browser Showdown 2018. ACM Trans. Multimedia Comput. Commun. Appl. 15, 1, Article 29 (February 2019), 18 pages.

[3] Rossetto, L., Schuldt, H., Awad, G., & Butt, A. A. (2019). V3C–A Research Video Collection. In International Conference on Multimedia Modeling (pp. 349-360). Springer, Cham.

[4] Abu-El-Haija, S., Kothari, N., Lee, J., Natsev, P., Toderici, G., Varadarajan, B., & Vijayanarasimhan, S. (2016). Youtube-8m: A large-scale video classification benchmark. arXiv preprint arXiv:1609.08675.

[5] Paul Over, George Awad, Alan F. Smeaton, Colum Foley, and James Lanagan. 2009. Creating a web-scale video collection for research. In Proceedings of the 1st workshop on Web-scale multimedia corpus (WSMC ’09). ACM, New York, NY, USA, 25-32. 

[6] Smeaton, A. F., Over, P., and Kraaij, W. 2006. Evaluation campaigns and TRECVid. In Proceedings of the 8th ACM International Workshop on Multimedia Information Retrieval (Santa Barbara, California, USA, October 26 – 27, 2006). MIR ’06. ACM Press, New York, NY, 321-330.

[7] Luca Rossetto & Heiko Schuldt (2017). Web video in numbers-an analysis of web-video metadata. arXiv preprint arXiv:1707.01340.

JPEG Column: 83rd JPEG Meeting in Geneva, Switzerland

The 83rd JPEG meeting was held in Geneva, Switzerland.

The meeting was very dense due to the multiple activities taking place. Beyond the many standardization activities, such as the new JPEG XL, JPEG Pleno, JPEG XS, HTJ2K and JPEG Systems, the 83rd JPEG meeting featured the report and discussion of a new exploration study on the use of learning-based methods applied to image coding, as well as two successful workshops, one on digital holography applications and systems and the third in a series on media blockchain technology.

The new exploration study on the use of learning-based methods applied to image coding was initiated at the previous, 82nd JPEG meeting in Lisbon, Portugal. The initial approach provided very promising results and might establish a new alternative for future image representations.

The workshop on digital holography applications and systems revealed the state of the art in industry applications and current technical solutions. It covered applications such as holographic microscopy, tomography, printing and display. Moreover, insights were provided into state-of-the-art holographic coding technologies and quality assessment procedures. The workshop allowed a very fruitful exchange of ideas between the different invited parties and JPEG experts.

The 3rd workshop in a series organized around media blockchain technology featured several talks where academia and industry shared their views on this emerging solution. The workshop ended with a panel where multiple questions were further elaborated by different panelists, providing the ground for a better understanding of the possible role of blockchain in media technology in the near future.

Two new logos, for JPEG Pleno and JPEG XL, were approved and released during the Geneva meeting.

(The two new logos, for JPEG Pleno and JPEG XL)

The 83rd JPEG meeting had the following highlights:

  • New exploration studies of JPEG AI
  • The new Image Coding System JPEG XL
  • JPEG Pleno
  • JPEG XS
  • HTJ2K
  • JPEG Media Blockchain Technology
  • JPEG Systems – Privacy, Security & IPR, JPSearch and JPEG in HEIF

In the following, a short summary of the most relevant achievements of the 83rd meeting in Geneva, Switzerland, is presented.

 

JPEG AI

The JPEG Committee is pleased to announce that it has started exploration studies on the use of learning-based solutions for its standards.

In the last few years, several efficient learning-based image coding solutions have been proposed, mainly based on improved neural network models. These advances exploit the availability of large image datasets and special hardware, such as highly parallelizable graphics processing units (GPUs). Recognizing that this area has received many contributions recently and is considered critical for the future of a rich multimedia ecosystem, JPEG has created the JPEG AI AhG group to study promising learning-based image codecs with a precise and well-defined quality evaluation methodology.

At this meeting, a taxonomy was proposed and available solutions from the literature were organized along different dimensions. In addition, a list of promising learning-based image compression implementations and potential datasets to be used in the future was gathered.

JPEG XL

The JPEG Committee continues to develop the JPEG XL Image Coding System, a standard for image coding that offers substantially better compression efficiency than relevant alternative image formats, along with features desirable for web distribution and efficient compression of high quality images.

Software for the JPEG XL verification model has been implemented. A series of experiments showed promising results for lossy, lossless and progressive coding. In particular, photos can be stored with significant savings in size compared to equivalent-quality JPEG files. Additionally, existing JPEG files can also be considerably reduced in size (for faster download) while retaining the ability to later reproduce the exact JPEG file. Moreover, lossless storage of images is possible with major savings in size compared to PNG. Further refinements to the software and experiments (including enhancement of existing JPEG files, and animations) will follow.

JPEG Pleno

The JPEG Committee has three activities in JPEG Pleno: Light Field, Point Cloud, and Holographic image coding. A generic box-based syntax has been defined that allows for signaling of these modalities, independently or composing a plenoptic scene represented by different modalities. The JPEG Pleno system also includes a reference grid system that supports the positioning of the respective modalities. The generic file format and reference grid system are defined in Part 1 of the standard, which is currently under development. Part 2 of the standard covers light field coding and supports two encoding mechanisms. The launch of specifications for point cloud and holographic content is under study by the JPEG committee.

JPEG XS

The JPEG Committee is pleased to announce the creation of an Amendment to the JPEG XS Core Coding System defining the use of the codec for raw image sensor data. The JPEG XS project aims at the standardization of a visually lossless, low-latency and lightweight compression scheme that can be used as a mezzanine codec in various markets. Among the targeted use cases for raw image sensor compression, one can cite video transport over professional video links (SDI, IP, Ethernet), real-time video storage in and outside of cameras, memory buffers, machine vision systems, and data compression on board autonomous cars. One of the most important benefits of the JPEG XS codec is an end-to-end latency ranging from less than one line to a few lines of the image.

HTJ2K

The JPEG committee is pleased to announce a significant milestone, with ISO/IEC 15444-15 High-Throughput JPEG 2000 (HTJ2K) submitted to ISO for immediate publication as International Standard. HTJ2K opens the door to higher encoding and decoding throughput for applications where JPEG 2000 is used today.

The HTJ2K algorithm has demonstrated an average tenfold increase in encoding and decoding throughput compared to the algorithm currently defined by JPEG 2000 Part 1. This increase in throughput results in an average coding efficiency loss of 10% or less in comparison to the most efficient modes of the block coding algorithm in JPEG 2000 Part 1 and enables mathematically lossless transcoding to and from JPEG 2000 Part 1 codestreams.

JPEG Media Blockchain Technology

In order to clearly identify the impact of blockchain and distributed ledger technologies on JPEG standards, the committee has organized several workshops to interact with stakeholders in the domain. The programs and proceedings of these workshops are accessible on the JPEG website:

  1. 1st JPEG Workshop on Media Blockchain Proceedings, ISO/IEC JTC1/SC29/WG1, Vancouver, Canada, October 16th, 2018

  2. 2nd JPEG Workshop on Media Blockchain Proceedings, ISO/IEC JTC1/SC29/WG1, Lisbon, Portugal, January 22nd, 2019

  3. 3rd JPEG Workshop on Media Blockchain Proceedings, ISO/IEC JTC1/SC29/WG1, Geneva, Switzerland, March 20th, 2019

A 4th workshop is planned during the 84th JPEG meeting to be held in Brussels, Belgium, on July 16th, 2019. The JPEG Committee invites experts to participate in this upcoming workshop.

JPEG Systems – Privacy, Security & IPR, JPSearch, and JPEG-in-HEIF.

At the 83rd meeting, JPEG Systems achieved significant progress towards improving users’ privacy with the completion of the DIS text of ISO/IEC 19566-4 “Privacy, Security, and IPR Features”, which will be released for ballot. JPEG Systems continued to progress on image search and retrieval with the FDIS text release of JPSearch ISO/IEC 24800 Part 2, 2nd edition. Finally, support for JPEG 2000, JPEG XR, and JPEG XS images encapsulated in ISO/IEC 15444-12 is progressing towards IS stage; this enables these JPEG images to be encapsulated in ISO base media file formats, such as the ISO/IEC 23008-12 High Efficiency Image File Format (HEIF).

Final Quote

“Intelligent codecs might redesign the future of media compression. JPEG can accelerate this trend by producing the first AI based image coding standard.” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

About JPEG

The Joint Photographic Experts Group (JPEG) is a Working Group of ISO/IEC, the International Organisation for Standardization / International Electrotechnical Commission, (ISO/IEC JTC 1/SC 29/WG 1) and of the International Telecommunication Union (ITU-T SG16), responsible for the popular JPEG, JPEG 2000, JPEG XR, JPSearch, JPEG XT and more recently, the JPEG XS, JPEG Systems, JPEG Pleno and JPEG XL families of imaging standards.

The JPEG Committee nominally meets four times a year, in different world locations. The 82nd JPEG Meeting was held on 19-25 January 2019, in Lisbon, Portugal. The next, 84th JPEG Meeting will be held on 13-19 July 2019, in Brussels, Belgium.

More information about JPEG and its work is available at jpeg.org or by contacting Antonio Pinheiro or Frederik Temmermans of the JPEG Communication Subgroup.

If you would like to stay posted on JPEG activities, please subscribe to the jpeg-news mailing list.

Future JPEG meetings are planned as follows:

  • No 84, Brussels, Belgium, July 13 to 19, 2019
  • No 85, San Jose, California, U.S.A., November 2 to 8, 2019
  • No 86, Sydney, Australia, January 18 to 24, 2020

MPEG Column: 126th MPEG Meeting in Geneva, Switzerland

The original blog post can be found at the Bitmovin Techblog and has been modified/updated here to focus on and highlight research aspects.

The 126th MPEG meeting concluded on March 29, 2019 in Geneva, Switzerland with the following topics:

  • Three Degrees of Freedom Plus (3DoF+) – MPEG evaluates responses to the Call for Proposal and starts a new project on Metadata for Immersive Video
  • Neural Network Compression for Multimedia Applications – MPEG evaluates responses to the Call for Proposal and kicks off its technical work
  • Low Complexity Enhancement Video Coding – MPEG evaluates responses to the Call for Proposal and selects a Test Model for further development
  • Point Cloud Compression – MPEG promotes its Geometry-based Point Cloud Compression (G-PCC) technology to the Committee Draft (CD) stage
  • MPEG Media Transport (MMT) – MPEG approves 3rd Edition of Final Draft International Standard
  • MPEG-G – MPEG-G standards reach Draft International Standard for Application Program Interfaces (APIs) and Metadata technologies

The corresponding press release of the 126th MPEG meeting can be found here: https://mpeg.chiariglione.org/meetings/126

Three Degrees of Freedom Plus (3DoF+)

MPEG evaluates responses to the Call for Proposal and starts a new project on Metadata for Immersive Video

MPEG’s support for 360-degree video — also referred to as omnidirectional video — is achieved using the Omnidirectional Media Format (OMAF) and Supplemental Enhancement Information (SEI) messages for High Efficiency Video Coding (HEVC). It basically enables the utilization of the tiling feature of HEVC to implement 3DoF applications and services, e.g., users consuming 360-degree content using a head mounted display (HMD). However, rendering flat 360-degree video may generate visual discomfort when objects close to the viewer are rendered. The interactive parallax feature of Three Degrees of Freedom Plus (3DoF+) will provide viewers with visual content that more closely mimics natural vision, but within a limited range of viewer motion.

At its 126th meeting, MPEG received five responses to the Call for Proposals (CfP) on 3DoF+ Visual. Subjective evaluations showed that adding the interactive motion parallax to 360-degree video will be possible. Based on the subjective and objective evaluation, a new project was launched, which will be named Metadata for Immersive Video. A first version of a Working Draft (WD) and corresponding Test Model (TM) were designed to combine technical aspects from multiple responses to the call. The current schedule for the project anticipates Final Draft International Standard (FDIS) in July 2020.

Research aspects: Subjective evaluations in the context of 3DoF+ but also immersive media services in general are actively researched within the multimedia research community (e.g., ACM SIGMM/SIGCHI, QoMEX) resulting in a plethora of research papers. One apparent open issue is the gap between scientific/fundamental research and standards developing organizations (SDOs) and industry fora which often address the same problem space but sometimes adopt different methodologies, approaches, tools, etc. However, MPEG (and also other SDOs) often organize public workshops and there will be one during the next meeting, specifically on July 10, 2019 in Gothenburg, Sweden which will be about “Coding Technologies for Immersive Audio/Visual Experiences”. Further details are available here.

Neural Network Compression for Multimedia Applications

MPEG evaluates responses to the Call for Proposal and kicks off its technical work

Artificial neural networks have been adopted for a broad range of tasks in multimedia analysis and processing, such as visual and acoustic classification, extraction of multimedia descriptors or image and video coding. The trained neural networks for these applications contain a large number of parameters (i.e., weights), resulting in a considerable size. Thus, transferring them to a number of clients using them in applications (e.g., mobile phones, smart cameras) requires compressed representation of neural networks.

At its 126th meeting, MPEG analyzed nine technologies submitted by industry leaders as responses to the Call for Proposals (CfP) for Neural Network Compression. These technologies address compressing neural network parameters in order to reduce their size for transmission and improve the efficiency of using them, while not or only moderately reducing their performance in specific multimedia applications.

After a formal evaluation of submissions, MPEG identified three main technology components in the compression pipeline, which will be further studied in the development of the standard. A key conclusion is that with the proposed technologies, a compression to 10% or less of the original size can be achieved with no or negligible performance loss, where this performance is measured as classification accuracy in image and audio classification, matching rate in visual descriptor matching, and PSNR reduction in image coding. Some of these technologies also result in the reduction of the computational complexity of using the neural network or can benefit from specific capabilities of the target hardware (e.g., support for fixed point operations).

Research aspects: This topic has been addressed already in previous articles here and here. An interesting observation after this meeting is that apparently the compression efficiency is remarkable, specifically as the performance loss is negligible for specific application domains. However, results are based on certain applications and, thus, general conclusions regarding the compression of neural networks as well as how to evaluate its performance are still subject to future work. Nevertheless, MPEG is certainly leading this activity which could become more and more important as more applications and services rely on AI-based techniques.

Low Complexity Enhancement Video Coding

MPEG evaluates responses to the Call for Proposal and selects a Test Model for further development

MPEG started a new work item referred to as Low Complexity Enhancement Video Coding (LCEVC), which will be added as part 2 of the MPEG-5 suite of codecs. The new standard is aimed at bridging the gap between two successive generations of codecs by providing a codec-agnostic extension to existing video codecs that improves coding efficiency and can be readily deployed via software upgrade and with sustainable power consumption.

The target is to achieve:

  • coding efficiency close to High Efficiency Video Coding (HEVC) Main 10 by leveraging Advanced Video Coding (AVC) Main Profile and
  • coding efficiency close to upcoming next generation video codecs by leveraging HEVC Main 10.

This coding efficiency should be achieved while maintaining overall encoding and decoding complexity lower than that of the leveraged codecs (i.e., AVC and HEVC, respectively) when used in isolation at full resolution. This target has been met, and one of the responses to the CfP will serve as starting point and test model for the standard. The new standard is expected to become part of the MPEG-5 suite of codecs and its development is expected to be completed in 2020.

Research aspects: In addition to VVC and EVC, LCEVC is now the third video coding project within MPEG addressing requirements and needs that go beyond HEVC. As usual, research mainly focuses on compression efficiency, but a general trend in video coding is observable that favors software-based solutions rather than pure hardware coding tools. As such, complexity at both encoder and decoder, as well as power efficiency, are becoming important additional factors to be taken into account. Other issues are related to business aspects, which are typically discussed elsewhere, e.g., here.

Point Cloud Compression

MPEG promotes its Geometry-based Point Cloud Compression (G-PCC) technology to the Committee Draft (CD) stage

MPEG’s Geometry-based Point Cloud Compression (G-PCC) standard addresses lossless and lossy coding of time-varying 3D point clouds with associated attributes such as color and material properties. This technology is appropriate especially for sparse point clouds.

MPEG’s Video-based Point Cloud Compression (V-PCC) addresses the same problem but for dense point clouds, by projecting the (typically dense) 3D point clouds onto planes, and then processing the resulting sequences of 2D images with video compression techniques.

G-PCC provides a generalized approach, which directly codes the 3D geometry to exploit any redundancy found in the point cloud itself and is complementary to V-PCC and particularly useful for sparse point clouds representing large environments.

Point clouds are typically represented by extremely large amounts of data, which is a significant barrier for mass market applications. However, the relative ease of capturing and rendering spatial information compared to other volumetric video representations makes point clouds increasingly popular for presenting immersive volumetric data. The current implementation of a lossless, intra-frame G-PCC encoder provides a compression ratio of up to 10:1, and acceptable-quality lossy coding achieves ratios of up to 35:1.

Research aspects: After V-PCC, MPEG has now promoted G-PCC to CD but, in principle, the same research aspects are relevant as discussed here. Thus, coding efficiency is the number one performance metric, but coding complexity and power consumption also need to be considered to enable industry adoption. Systems technologies and adaptive streaming are actively researched within the multimedia research community, specifically at ACM MM and ACM MMSys.

MPEG Media Transport (MMT)

MPEG approves 3rd Edition of Final Draft International Standard

MMT 3rd edition will introduce two aspects:

  • enhancements for mobile environments and
  • support of Contents Delivery Networks (CDNs).

The support for multipath delivery will enable delivery of services over more than one network connection concurrently, which is specifically useful for mobile devices that can support more than one connection at a time.

Additionally, support for intelligent network entities involved in media services (i.e., a Media Aware Network Entity (MANE)) will make MMT-based services adapt to changes of the mobile network faster and better. Since support for load balancing is an important feature of CDN-based content delivery, messages for DNS management, media resource updates, and media requests are being added in this edition.

Ongoing developments within MMT will add support for the usage of MMT over QUIC (Quick UDP Internet Connections) and support for FCAST in the context of MMT.

Research aspects: Multimedia delivery/transport is still an important issue, specifically as multimedia data on the internet is increasing much faster than network bandwidth. In particular, the multimedia research community (i.e., ACM MM and ACM MMSys) is looking into novel approaches and tools utilizing existing/emerging protocols/techniques like HTTP/2, HTTP/3 (QUIC), WebRTC, and Information-Centric Networking (ICN). One question, however, remains: what is the next big thing in multimedia delivery/transport, given that we are currently in a phase where tools like HTTP adaptive streaming (HAS) have reached maturity and the multimedia research community is eager to work on new topics in this domain.

Report from ACM MM 2018 – by Ana García del Molino

Seoul, what a beautiful place to host the premier conference on multimedia! Living in never-ending summer Singapore, I fell in love with the autumn colours of this city. The 26th edition of the ACM International Conference on Multimedia was held on October 22-26 of 2018 at the Lotte Hotel in Seoul, South Korea. It packed a full program including a very diverse range of workshops and tutorials, oral and poster presentations, art exhibits, interactive demos, competitions, industrial booths, and plenty of networking opportunities.

For me, this edition was a special one. About to graduate, with my thesis half written, I was presenting two papers. So of course, I was both nervous and excited. I had to fly to Seoul a few days ahead just to prepare myself! I was so motivated, I somehow managed to get myself a Best Social Media Reporter Award (who would have said… Me! A reporter!).

So, enough with the intro. Let’s get to the juice. What happened in Seoul between the 22nd and 26th of October 2018?

The first and last days of the conference were dedicated to workshops and tutorials. These were a mix of deep-learning themes and social applications of multimedia. The sessions included tutorials like “Interactive Video Search: Where is the User in the Age of Deep Learning?”, which discussed the importance of the user in the collection of datasets, evaluation, and also interactive search, as opposed to using deep learning to solve challenges with big labelled datasets. In “Deep Learning Interpretation”, Jitao Sang presented the main multimedia problems that can’t be addressed using deep learning. On the other hand, new and important trends related to social media (analysis of information diffusion and contagion, user activities and networking, prediction of real-world events, etc.) were discussed in the tutorial “Social and Political Event Analysis using Rich Media”. The workshops were mainly user-centred, with special interest in affective computing and emotion analysis and their use for multimedia (EE-USAD, ASMMC – MMAC 2018, AVEC 2018).

The conference kick-started with a wonderful keynote by Marianna Obrist. With “Don’t just Look – Smell, Taste, and Feel the Interaction” she showed us how to bring art into 4D by using technology, driving us through a full sensory experience that let us see, hear, and almost touch and smell. Ernest Edmonds also delved into how to mix art and multimedia in “What has art got to do with it?” but this time the other way around: what can multimedia research learn from the artists? Three industry speakers completed the keynote program. Xian-Sheng Hua from Alibaba Group shared their efforts towards visual Intelligence in “Challenges and Practices of Large-Scale Visual Intelligence in the Real-World”. Gary Geunbae Lee shared Samsung’s AI user experience strategy in “Living with Artificial Intelligence Technology in Connected Devices around Us.” And Bowen Zhou presented JD.com’s brand-new concept of Retail as a Service in “Transforming Retailing Experiences with Artificial Intelligence”.

This year’s program included 209 full papers, from a total of 757 submissions. 64 papers were allocated 15-minute oral presentations, while the others got a 90-second spotlight slot in the fast-forward sessions. The poster sessions and the oral sessions ran at the same time. While this was an inconvenience for poster presenters, who had to either leave the poster to attend the oral sessions or miss them, the coffee breaks took place at the same location as the posters, so that was a win-win: chit-chat while having cookies and fruits? I’m in! In terms of content, half of the submissions went to only two areas: Multimedia and Vision, and Deep Learning for Multimedia. But who am I to judge, when I had two of those myself! Many members of the community noted that the conference is becoming more and more deep learning, and less multimodal. To compensate, the workshops, tutorials and demos were mostly pure multimedia.

The challenges, competitions, art exhibits and demos happened in the afternoons, so at times it was hard to choose where to head to. So many interesting things happening all around the place! The art exhibit had some really cool interactive art installations, such as “Cellular Music”, that created music from visual motion. Among the demos, I found particularly interesting AniDance, an LSTM-based algorithm that made 3D models dance to the given music; SoniControl, an ultrasonic firewall for NFC protection; MusicMapp, a platform to augment how we experience music; and The Influence Map project, to explore who has influenced each scientist, and who did they most influence through their career.

Regarding diversity, I feel there is still a long way to go. Being in Asia, it makes sense that almost half of the attendees came from China. However, the submission numbers speak for themselves: less than 20% of submissions came from outside Asia, with just one submission from Africa (that’s 0.13%!). Diversity is not only about gender, folks! I feel like more efforts are needed to facilitate the integration of more collectives in the multimedia community. One step at a time.

The next edition will take place at the NICE ACROPOLIS Convention Center in Nice, France from 21-25 October 2019. The ACM reproducibility badge system will be implemented for the first time at this 27th edition, so we may be seeing many more open-sourced projects. I am so looking forward to this!

On System QoE: Merging the system and the QoE perspectives

With Quality of Experience (QoE) research having made significant advances over the years, increased attention is being put on exploiting this knowledge from a service/network provider perspective in the context of the user-centric evaluation of systems. Current research investigates the impact of system/service mechanisms, their implementation or configurations on the service performance and how it affects the corresponding QoE of its users. Prominent examples address adaptive video streaming services, as well as enabling technologies for QoE-aware service management and monitoring, such as SDN/NFV and machine learning. This is also reflected in the latest edition of conferences such as the ACM Multimedia Systems Conference (MMSys ‘19); see the selected exemplary papers below.

  • “ERUDITE: a Deep Neural Network for Optimal Tuning of Adaptive Video Streaming Controllers” by De Cicco, L., Cilli, G., & Mascolo, S.
  • “An SDN-Based Device-Aware Live Video Service For Inter-Domain Adaptive Bitrate Streaming” by Khalid, A., Zahran, H. & Sreenan C.J.
  • “Quality-aware Strategies for Optimizing ABR Video Streaming QoE and Reducing Data Usage” by Qin, Y., Hao, S., Pattipati, K., Qian, F., Sen, S., Wang, B., & Yue, C.
  • “Evaluation of Shared Resource Allocation using SAND for Adaptive Bitrate Streaming” by Pham, S., Heeren, P., Silhavy, D., Arbanowski, S.
  • “Requet: Real-Time QoE Detection for Encrypted YouTube Traffic” by Gutterman, C., Guo, K., Arora, S., Wang, X., Wu, L., Katz-Bassett, E., & Zussman, G.

For the evaluation of systems, proper QoE models are of utmost importance, as they provide a mapping of various parameters to QoE. One of the main research challenges faced by the QoE community is deriving QoE models for various applications and services, whereby ratings collected from subjective user studies are used to model the relationship between tested influence factors and QoE. Below is a selection of papers dealing with this topic from QoMEX 2019, the main scientific venue for the QoE community.

  • “Subjective Assessment of Adaptive Media Playout for Video Streaming” by Pérez, P., García, N., & Villegas, A.
  • “Assessing Texture Dimensions and Video Quality in Motion Pictures using Sensory Evaluation Techniques” by Keller, D., Seybold, T., Skowronek, J., & Raake, A.
  • “Tile-based Streaming of 8K Omnidirectional Video: Subjective and Objective QoE Evaluation” by Schatz, R., Zabrovskiy, A., & Timmerer, C.
  • “SUR-Net: Predicting the Satisfied User Ratio Curve for Image Compression with Deep Learning” by Fan, C., Lin, H., Hosu, V., Zhang, Y., Jiang, Q., Hamzaoui, R., & Saupe, D.
  • “Analysis and Prediction of Video QoE in Wireless Cellular Networks using Machine Learning” by Minovski, D., Åhlund, C., Mitra, K., & Johansson, P.

System-centric QoE

When considering the whole service, the question arises of how to properly evaluate QoE in a systems context, i.e., how to quantify system-centric QoE. The paper [1] provides fundamental relationships for deriving system-centric QoE, which are the basis for this article.

In the QoE community, subjective user studies are conducted to derive relationships between influence factors and QoE. Typically, the results of these studies are presented in terms of Mean Opinion Scores (MOS). However, these MOS results mask user diversity, which leads to specific distributions of user scores for particular test conditions. In a systems context, QoE can be better represented as a random variable Q|t for a fixed test condition. Such models are commonly exploited by service/network providers to derive various QoE metrics [2] in their system, such as expected QoE, or the percentage of users rating above a certain threshold (Good-or-Better ratio GoB).
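
In symbols (using the notation that this article adopts below for the mapping functions), the two metrics mentioned above can be written as follows:

```latex
% QoE metrics for a fixed test condition t, with Q|t the random user rating
% on a 5-point ACR scale; f and g are the mapping functions introduced below.
\begin{align*}
  \text{expected QoE (MOS):}   \quad f(t) &= \mathbb{E}[\,Q \mid t\,],\\
  \text{Good-or-Better ratio:} \quad g(t) &= \mathbb{P}(\,Q \mid t \ge 4\,).
\end{align*}
```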

Across the whole service, users will experience different performance, measured by, e.g., response times or throughput, which depends on the system’s (and services’) configuration and implementation. In turn, this leads to users experiencing different quality levels. As an example, we consider the response time of a system that offers a certain web service, such as access to a static web site. In this case, the system’s performance can be represented by a random variable R for the response time. In the systems community, research aims at deriving such distributions of the performance, R.

The user-centric evaluation of the system combines the system’s perspective and the QoE perspective, as illustrated in the figure below. We consider service/network providers interested in deriving various QoE metrics in their system, given (a) the system’s performance, and (b) QoE models available from user studies. The main question we need to answer is how to combine (a) user rating distributions obtained from subjective studies and (b) system performance condition distributions, so as to obtain the actual QoE distribution observed in the system. Moreover, how can various QoE metrics of interest in the system be derived?

System-centric QoE – merging the system and the QoE perspectives


Model of System-centric QoE

A service provider is interested in the QoE distribution Q in the system, which includes the following stochastic components: 1) the system performance condition (i.e., the response time R in our example), and 2) the user diversity Q|t for a given condition t. This system-centric QoE distribution allows us to derive various QoE metrics, such as the expected QoE or the expected GoB in the system.

Some basic mathematical transformations allow us to derive the expected system-centric QoE E[Q], as shown below. As a result, we show that the expected system QoE is equal to the expected Mean Opinion Score (MOS) in the system! Hence, to derive the system QoE, it is necessary to measure the response time distribution R and to have a proper QoS-to-MOS mapping function f(t) obtained from subjective studies. From the subjective studies, we obtain the MOS mapping function for a response time t, f(t)=E[Q|t]. The system QoE then follows as E[Q] = E[f(R)] = E[M]. Note: the distribution of the MOS M in the system only allows us to derive the expected MOS, i.e., the expected system-centric QoE.

Expected system QoE E[Q] in the system is equal to the expected MOS
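
Written out, the relationship illustrated above is an application of the law of total expectation; this is a compact restatement of the derivation in [1], using the notation of this article:

```latex
% Expected system-centric QoE, with R the (random) response time in the system,
% f(t) = E[Q|t] the QoS-to-MOS mapping obtained from subjective studies,
% and M = f(R) the MOS random variable in the system.
\begin{equation*}
  \mathbb{E}[Q]
    \;=\; \mathbb{E}\big[\,\mathbb{E}[\,Q \mid R\,]\,\big]
    \;=\; \mathbb{E}\big[f(R)\big]
    \;=\; \mathbb{E}[M].
\end{equation*}
```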

Let us consider another system-centric QoE metric, such as the GoB ratio. On a typical 5-point Absolute Category Rating (ACR) scale (1: bad quality, 5: excellent quality), the system-centric GoB is defined as GoB[Q]=P(Q>=4). We find that it is not possible to use a MOS mapping function f and the MOS distribution M=f(R) to derive GoB[Q] in the system! Instead, it is necessary to use the corresponding QoS-to-GoB mapping function g. This mapping function g can also be derived from the same subjective studies as the MOS mapping function; it maps the response time (tested in the subjective experiment) to the ratio of users rating “good or better” QoE, i.e., g(t)=P(Q|t >= 4). We may thus derive, in a similar way, GoB[Q]=E[g(R)]: in the system, the GoB ratio is the expected value of the response times R mapped through g. Similar observations lead to analogous results for other QoE metrics, such as quantiles or variances (see [1]).
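
As a quick numerical sanity check of these relationships, here is a minimal Monte Carlo sketch (our own illustration, not code from [1]); the user rating model, the MOS curve, and the response-time distribution are all invented assumptions. It contrasts the GoB obtained with the proper QoS-to-GoB mapping g against the inadequate shortcut of thresholding the MOS distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical user model (assumption, for illustration only): for a response
# time t (in seconds), an individual rating on the 5-point ACR scale is drawn
# around a mean opinion that decays with t, then rounded and clipped to 1..5.
def rate(t, rng):
    mos = 1.0 + 4.0 * np.exp(-0.5 * t)       # hypothetical MOS curve
    return int(np.clip(np.round(rng.normal(mos, 0.8)), 1, 5))

# Mapping functions as a subjective study would report them:
# f(t) = E[Q|t] (QoS-to-MOS), g(t) = P(Q|t >= 4) (QoS-to-GoB).
ts = np.linspace(0.0, 10.0, 101)
f = np.array([np.mean([rate(t, rng) for _ in range(2000)]) for t in ts])
g = np.array([np.mean([rate(t, rng) >= 4 for _ in range(2000)]) for t in ts])

# System performance: response times R observed in the deployed system
# (assumed exponential with mean 2 s, purely for illustration).
R = rng.exponential(scale=2.0, size=50000)
idx = np.clip(np.searchsorted(ts, R), 0, len(ts) - 1)   # nearest tested condition

expected_qoe = f[idx].mean()          # E[Q] = E[f(R)] = E[M]
gob_correct  = g[idx].mean()          # GoB[Q] = E[g(R)]
gob_from_mos = (f[idx] >= 4).mean()   # inadequate shortcut: thresholding the MOS

print(f"E[Q]             = {expected_qoe:.2f}")
print(f"GoB via g(R)     = {gob_correct:.2f}")
print(f"GoB via MOS >= 4 = {gob_from_mos:.2f}  (typically differs: MOS masks user diversity)")
```

Under these assumptions the two GoB estimates differ, illustrating why reporting only MOS mappings is not enough to recover GoB in the system.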

Conclusions

The reported fundamental relationships provide an important link between the QoE community and the systems community. If researchers conducting subjective user studies provide the different QoS-to-QoE mapping functions for the QoE metrics of interest (e.g., MOS or GoB), this is enough to derive the corresponding QoE metrics from a system’s perspective. This holds for any QoS (e.g., response time) distribution in the system, as long as the corresponding QoS values are captured in the reported QoE models. As a result, we encourage QoE researchers to report not only MOS mappings, but the entire rating distributions from conducted subjective studies. As an alternative, researchers may report QoE metrics and corresponding mapping functions beyond just those relying on MOS!

We draw the attention of the systems community to the fact that the actual QoE distribution in a system is not (necessarily) equal to the MOS distribution in the system (see [1] for numerical examples). Simply applying MOS mapping functions and then using the observed MOS distribution to derive other QoE metrics, such as GoB, is not adequate. The current systems literature, however, indicates a clear lack of common understanding of the implications of using MOS distributions rather than actual QoE distributions.

References

[1] Hoßfeld, T., Heegaard, P.E., Skorin-Kapov, L., & Varela, M. (2019). Fundamental Relationships for Deriving QoE in Systems. 2019 Eleventh International Conference on Quality of Multimedia Experience (QoMEX). IEEE.

[2] Hoßfeld, T., Heegaard, P. E., Varela, M., & Möller, S. (2016). QoE beyond the MOS: an in-depth look at QoE via better metrics and their relation to MOS. Quality and User Experience, 1(1), 2.

Authors

  • Tobias Hoßfeld (University of Würzburg, Germany) is heading the chair of communication networks.
  • Poul E. Heegaard (NTNU – Norwegian University of Science and Technology) is heading the Networking Research Group.
  • Lea Skorin-Kapov (University of Zagreb, Faculty of Electrical Engineering and Computing, Croatia) is heading the Multimedia Quality of Experience Research Lab.
  • Martin Varela is working in the analytics team at callstats.io focusing on understanding and monitoring QoE for WebRTC services.

Multidisciplinary Column: An Interview with Max Mühlhäuser

 


Could you tell us a bit about your background, and what the road to your current position was?

Well, this road is marked by wonderful people who inspired me and sparked my interest in the research fields I pursued. In addition, it is marked by two of my major deficiencies: I cannot stop investigating the role of my research in the larger context of systems and disciplines, and I have the strong desire to see “inventions” by researchers make their way into practice, i.e., turn into “innovations”. The first of these deficiencies led to the unusually broad research interests of my lab and myself, and the second one made me spend a substantial part of my career conceptualizing and leading technology transfer organizations, for the most part industry-funded ones.

More precisely, I started to cooperate with Digital Equipment Corp. (DEC) already during my Diploma thesis. DEC was then the second largest computer manufacturer and the spearhead of the efforts to build affordable “computers for every engineering group”. My boss, the late Professor Krüger, gave me a lot of freedom, so I was able to turn the research cooperation into the first funded European research project of DEC and later into their first research center in Europe, conceived as a campus-based organization that worked very closely with academia. I am proud to say that I was allowed to conceptualize this academia-industry cooperation and that it was later on copied – often with my help and consultancy – many times across the globe, by several companies and governments. I acted as the founding director of the first such center, but at that time I was already determined to follow the academic career path. At the age of 32, I was appointed professor at the University of Kaiserslautern. Over the years, I was offered positions at prestigious universities in Canada, France, and the Netherlands, and I accepted positions in Austria and Germany (Karlsruhe, Darmstadt). My sabbaticals led me to Australia, France and Canada, and for the most part to California (San Diego and four times Palo Alto). In retrospect, it was exciting to start at a new academic position every couple of years in the beginning, but it was also exciting to “finally settle” in Darmstadt and to build the strengths and connections there that were necessary to drive even larger cooperative projects than before.

The Telecooperation Lab embraces many different disciplines. Celebrating its 20th birthday next year, how did these disciplines evolve over the years?

It started with my excitement for distributed systems, based on solid knowledge about computer networks. At the time (the early 1980s), little more than point-to-point communication between file transfer or e-mail agents existed, and neither client-server nor multi-party systems were common. My early interest in this field concerned software engineering for distributed systems, ranging from design and specification support via programming and simulation to debugging and testing. Soon, multimedia became feasible due to advancements in computer hardware and peripherals – think of the late laser disc, a clumsy predecessor of today’s DVDs and BDs. Multimedia grabbed my immediate attention since numerous problems arose from the interest in enabling it in a distributed manner. Almost at the same time, e-learning became my favorite application field since I saw the great potential of distributed multimedia for this domain, given the challenges of global education and of the knowledge society. I believe that technology has come a long way with respect to e-learning, but we are still far from mastering the challenges of technology-supported education and knowledge work.

Soon came the time when computers left the desk and became ubiquitous. From my experience in multimedia and e-learning, it was obvious to me that human-computer interaction would be key to the success of ubiquitous computing. Simply extrapolating the keyboard-mouse-monitor-based interaction paradigm to a future where tens, hundreds, or thousands of computers would surround an individual – what a nightmare! This threat of a dystopia made us work on implicit and tangible interaction, hybrid cyber-physical knowledge work, novel mobile and workspace interaction, augmented and virtual reality, and custom 3D-printed interaction – HCI became our “new multimedia”.

Regarding application domains, our research in supporting the knowledge society evolved towards supporting ‘smart environments and spaces’, a natural consequence of the evolution of our core research towards networked ubiquitous computers. My continued interest in turning inventions into innovations made us work on urgent problems of industry – mainly revolving around business processes – and on computers that expect the unexpected: emergencies and disasters. Both these domains were a nice fit since they could benefit from appropriate smart spaces. Looking at smart spaces of ever larger scale, we naturally hit the challenge of supporting smart cities and critical infrastructures.

Finally, a bit more than ten years ago, our ubiquitous computing research made us encounter and realize the “ubiquity” of related cybersecurity threats, in particular threats to privacy, the challenge of appropriate trustworthiness estimation, and the detection of networked attacks. These cybersecurity research activities were, like those in HCI, natural consequences of my aforementioned deficiency: my desire to take a holistic look at systems – in my case, ubiquitous computing systems.

Finally, the fact that we adapt, apply and sometimes further machine learning concepts in our research is nothing but a natural consequence of the utility of those concepts for our purposes.

How would you describe the interrelationship between those disciplines? Do these benefit from cross-fertilization effects and if so, how?

In my answer to your last question, I unwillingly used the word “natural” several times. This shows already that research on ubiquitous computing and smart spaces with a holistic slant almost inevitably leads you to looking at the different aspects we investigate. These aspects just happen to concern different research disciplines in computer science. The starting point is the fact that ubiquitous computing devices are much less general-purpose computers than dedicated components. Networking and distributed systems support are therefore a prerequisite for orchestrating these dedicated skills, forming what can be called a truly smart space. Such spaces are usually meant to assist humans, so that multimedia – conveying “humane” information representations – and HCI – for interacting with many cooperating dedicated components – are indispensable. Next, how can a smart space assist a human if it is subject to cyber-vulnerabilities? Instead, it has to enforce its users’ concerns with respect to privacy, trust, and intended behavior. Finally, true smartness is by nature bound to adopting and adapting best-of-breed AI techniques.

You also asked for cross-fertilizing effects. Let me share just three of the many examples in this respect. (i) Our AI-related work cross-fertilized our cyberattack defense. (ii) On the other hand, the AI work introduced new challenges in distributed and networked systems, driving our research on edge computing forward. (iii) New requirements are added to this edge computing research by HCI, since we want to support collaborative AR applications at large, i.e., city-wide, scale.

Moreover, cross-fertilizing goes beyond the research fields of computer science that we integrate in my own lab. As you know, I was and am heading highly interdisciplinary doctoral schools, formerly on e-learning, and now on privacy and trust for mobile users. When you work with researchers from sociology, law, economics, and psychology on topics like privacy protecting Smartphones, you first consider these topics as pertaining to computer science. Soon, you realize that the other disciplines dealt with issues like privacy and trust long before computers existed. Not only can you learn a lot from the deep and concise findings brought forth by these disciplines for decades or centuries, you can quickly establish a very fruitful cooperation with researchers from these disciplines who address the new challenges of mobile and ubiquitous computing from their perspective. I am convinced that the unique role of Xerox PARC in the history of computer science, with so many of the most fundamental innovations originating there, is mainly a consequence of their highly interdisciplinary approaches, combining the “science of computers” with the “sciences concerned with humans”.

Please tell us about the main challenges you faced when uniting such diverse topics under the Telecooperation Lab’s multi-disciplinary umbrella?

The major challenge lies in a balancing act for each PhD thesis and researcher. On the one hand, the work must be strictly anchored in a narrow academic field; as a young researcher, you are lucky if you can make yourself a bit of a name in a single narrow community – which is a prerequisite for any further academic career steps, for many reasons. Trying to get rooted in more than one community during a PhD would be what I call academic suicide. The other side of the balancing act, for us, is the challenge of keeping that narrow and focused PhD well connected to the multi-area context of my lab – and for the members of the doctoral schools, even connected to the respective multi-disciplinary context. While this second side is not a prerequisite for a PhD, it is an inexhaustible source of both new challenges for, and new approaches to, the respective narrow PhD fields. In fact, reaching out to other fields while mastering your own field costs some additional time; in my experience, however, this additional time is easily recovered in the search for the original scientific contributions that will earn you a PhD. The reason is that the cross-fertilizing from a multi-area or even multi-disciplinary setting will lead you to original contributions much faster, due to a fresh look at both challenges and approaches.

When it comes to Postdoctoral researchers, things are a bit different since they are already rooted in a field, which means that they can reach out a bit further to other areas and disciplines, thereby creating a unique little research domain in which they can make themselves a name for their further career. My aim for my postdocs is to help them attain a status where, when I mention their name in a pertinent academic circle, my colleagues would say “oh, I know, that’s the guy who is working on XYZ”, with XYZ being a concise subdomain of research which that postdoc was instrumental in shaping.

The Telecooperation Lab is part of CRISP, the National Research Center for Applied Cybersecurity in Germany, which embraces many disciplines as well. Can you give us some insights into multidisciplinarity in such an environment?

Let me start by explaining that we started the first large cybersecurity research center in Darmstadt more than ten years ago; CRISP in its current form as a national center has only recently come into existence. By the way, CRISP will have to be renamed again for legal reasons (sigh!). Therefore, let me address our cybersecurity research in general. This research involved a very broad spectrum of disciplines, from physicists who address quantum-related aspects to psychologists who investigate usable security and mental models. The most fruitful cooperations always concern areas that establish a “mutual benefits and challenges” relationship with the computer science side of cybersecurity. Two examples that come to my mind are law and economics. Computer science solutions to security and privacy always have limits. For instance, cryptographic solutions are always linked to trust at their boundaries (cf. trusted certificate authorities, trusted implementations of theoretically “proven-secure” protocols, trust in the absence of insider threats, etc.). At such boundaries, law must punish what technology cannot guarantee, otherwise the systems remain insecure. In the reverse direction, new technical possibilities and solutions must be reflected in law. A prominent example is the power of AI: privacy law, such as the European Union’s GDPR, holds data-processing organizations liable if they process personally identifiable information, PII for short. If data is not considered to be PII, it can be released. Now what if, three years later, a novel AI algorithm can link that data to some background data and infer PII from it? Privacy law needs a considerable update due to these new technical possibilities. I could talk about these mutual benefits and challenges on and on, but let me just quickly mention one more example from economics: if technology comes up with new privacy-preserving schemes, then these schemes may open up new opportunities for privacy-respecting services. In order for such services to succeed in the market, we need to learn about possible corresponding business models. This kind of economics research may lead to new challenges for technical approaches, and so on. Such “cycles of innovation” across different disciplines are among the most exciting facets of interdisciplinary research.

Could you name a grand challenge of multidisciplinary research in the Multimedia community?

Oh, I think I have a quite decided opinion on this one! We clearly live in the era of the fusion of bits and atoms – and this metaphor is of course just one way to characterize what is going on. Firstly, in the cyber-physical society that we are currently creating, the digital components are becoming the “brains” of complex real-world systems such as the transport system, energy grids, industrial production, etc. This development already creates significant challenges concerning our future society, but beyond this trend and directly related to multimedia, there is an even more striking development: we increasingly feed the human senses by means of digitally created or processed signals – and hence, basically, by means of multimedia. TV and telephone, social media and Web-based information, Skype conversations and meetings, you name it: our perception of objects, spaces, and of our conversation partners – in other words, of the physical world – is conveyed, augmented, altered, and filtered by means of computers and computer networks. Now, you will ask what I consider the challenge in this development, which has been going on for decades. Consider that this field is “jumping forward” in our days due to AI and other advancements: it is the challenge for interdisciplinary multimedia research to properly conserve the distinction between “real” and “imaginary” in all cases where we would or should conserve it. To cite a field that is only marginally concerned here, let me mention games: in games, it is – mostly – desired to blur the distinction between the real and the virtual. However, if you think of fake news or of highly persuasive governmental election campaigns on social media, you get an idea of what I mean. The challenge here is highly multidisciplinary: for instance, many computer science areas already have to come together in order to check where in the media processing chain we can intervene in order to keep a handle on the real-versus-virtual distinction. Way beyond that, we need many disciplines to work hand-in-hand in order to figure out what we want and how we can achieve it. We have to recognize that many long-existing trends are on the verge of jumping forward to an unprecedented level of perfection. We must figure out what society needs and wants. It is reckless to leave this development to economic or even malicious forces, or to tech nerds who invent their own ethics. The examples are endless; let me cite a few in addition to those mentioned above, highlighting fake news and manipulative election campaigns.

Machine learning experts may call me paranoid, hinting at the fact that the detection of manipulated photos or deep fake videos is still a much simpler machine learning task than creating them. While this is true, I fear that it may change in the future. Moreover, alluding to the multidisciplinary challenges mentioned, let me remind you that we currently don’t have processes in place that would sufficiently check content for authenticity in a systematic way.

As another example, humans are told they are “valued customers”, but they have long been considered consumers at best. More recently, they are downgraded to mass objects in which purchase desires are first created and then directed – by sophisticated algorithms and with ever more convincing multimedia content. Meanwhile, in the background, price discrimination is rising to new levels of sophistication. In a different field, questionable political powers are more and more capable of destabilizing democracies from a safe seat across the Internet, using curated and increasingly machine-created influential media.

As a next big wave, we are witnessing a giants’ race among global IT players for the crown in the augmented and virtual reality markets. What is still a niche area may become widespread technology tomorrow – consider that the first successful smartphone was introduced only a little more than a decade ago and that, meanwhile, the majority of the world’s population uses smartphones to access the Internet. A similar success story may lie ahead for AR/VR: at the latest when a generation grows up wearing AR contact lenses, noise-cancelling earplugs and haptics-augmented clothes, reality will not be threatened by fake information any more; rather, digitally created, imaginary content will be reality, rendering the question “what is real?” obsolete. Of course, the list of technologies and application domains mentioned here is by far non-exhaustive.

The problem is that all these trends appear to be evolutionary, not disruptive as they really are. Marketing influenced customers centuries ago already, fake news has existed even longer, and the movie industry has always had a leading role in imaginary technology, from chroma keying to the most advanced animation techniques. Therefore, the new and upcoming AI-powered multimedia technology is not (yet) recognized as disruptive and hence as a considerable threat to the fundamental rules of our society. This is a key reason why I consider this field a grand interdisciplinary research challenge. We definitely need far more than technology solutions. At the outset, we need to come to grips with appropriate ethical and socio-political norms. To what extent do we want to keep and protect the governing rules of society and humankind? Which changes do we want, which ones not? What does all that mean in terms of governing rules for AI-powered multimedia, for the merging of the real and the virtual? Apart from basic research, we need a participatory approach that involves society in general and the rising generations in particular. Since we cannot expect these fundamental societal processes to lead to a final decision, we have to advance the other research challenges in parallel. For instance, we need a better understanding of the social implications and of the psychological factors related to the merging of the real and the virtual. Technology-related research must be intertwined with these efforts; as to the technology fields concerned, multimedia research must go hand-in-hand with others like AI, cybersecurity, privacy, etc. – the selection depends on the particular questions addressed. This research must be further intertwined with human-related fields such as law: laws must again regulate what technology cannot solve, and reflect what technology can achieve for good or evil. In all this, I did not yet mention further related issues such as biometric access control: as we try to make access control more user-friendly, we rely on biometric data, most of which are variants of multimedia, namely speech, face or iris photos, gait and others. The distinction between real and virtual remains important here, and we can expect enormous malicious efforts to blur it. You see, there really is a multidisciplinary grand challenge for multimedia.

How and in what form do you feel we as academics can be most impactful?

During the first half of my career, computer science was still in that wonderful gold diggers’ era: if you had a good idea and just decent skills to convey it to your academic peers, you could count on that idea being heard, valued, and – if it was socially and economically viable – realized. Since then, we have moved to a state in which good research results are not even half the story. Many seemingly marginal factors drive innovation today. No wonder we have reached a point at which many industry players think that innovation should be driven by the company’s product groups in a closed loop with customers, or by startups that can be acquired if successful, or – for the small part that requires long-term research – by a few top research institutions. I am confident that this opinion will be replaced by a new craze among CEOs in a few years. Meanwhile, academics should do their homework in three ways. (a) They should look for the kernel of truth in the current anti-academic trend and improve academic research accordingly. (b) They should orient their research towards the unique strengths of academia, like the possibility to carry out true interdisciplinary research at universities. (c) They should tune their role, their words and deeds to the much-increased societal responsibilities highlighted above.

Academics from computer science trigger confusion in, and the reshaping of, our society to an ever greater extent; it is time for them to live up to their responsibility.


Bios

Prof. Dr. Max Mühlhäuser is head of the Telecooperation Lab at Technische Universität Darmstadt, Informatics Dept. His Lab conducts research on smart ubiquitous computing environments for the ‘pervasive Future Internet’ in three research fields: middleware and large network infrastructures, novel multimodal interaction concepts, and human protection in ubiquitous computing (privacy, trust, & civil security). He heads or co-supervises various multilateral projects, e.g., on the Internet-of-Services, smart products, ad-hoc and sensor networks, and civil security; these projects are funded by the National Funding Agency DFG, the EU, German ministries, and industry. Max is heading the doctoral school Privacy and Trust for Mobile Users and serves as deputy speaker of the collaborative research center MAKI on the Future Internet. Max has also led several university wide programs that fostered E-Learning research and application. In his career, Max put a particular emphasis on technology transfer, e.g., as the founder and mentor of several campus-based industrial research centers.

Max has over 30 years of experience in research and teaching in areas related to Ubiquitous Computing (UC), Networks, Distributed Multimedia Systems, E-Learning, and Privacy&Trust. He held permanent or visiting professorships at the Universities of Kaiserslautern, Karlsruhe, Linz, Darmstadt, Montréal, Sophia Antipolis (Eurecom), and San Diego (UCSD). In 1993, he founded the TeCO institute (www.teco.edu) in Karlsruhe, Germany, which became one of the pace-makers for Ubiquitous Computing research in Europe. Max regularly publishes in Ubiquitous and Distributed Computing, HCI, Multimedia, E-Learning, and Privacy&Trust conferences and journals and authored or co-authored more than 400 publications. He was and is active in numerous conference program committees, as organizer of several annual conferences, and as member of editorial boards or guest editor for journals like Pervasive Computing, ACM Multimedia, Pervasive and Mobile Computing, Web Engineering, and Distance Learning Technology.

Editor Biographies

Dr. Cynthia C. S. Liem is an Assistant Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. She initiated and co-coordinated the European research project PHENICX (2013-2016), focusing on technological enrichment of symphonic concert recordings with partners such as the Royal Concertgebouw Orchestra. Her research interests consider music and multimedia search and recommendation, and increasingly shift towards making people discover new interests and content which would not trivially be retrieved. Beyond her academic activities, Cynthia gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, and a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach.

 

 

 

Dr. Jochen Huber is a Senior User Experience Researcher at Synaptics. Previously, he was an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and the ACM International Conference on Interactive Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage: http://jochenhuber.com

An interview with Professor Pål Halvorsen

Describe your journey into research from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

I remember when I was about 14 years old and had an 8th grade project where we were to identify what we wanted to do in the future and the road to get there. I had just recently discovered the world of computers and so reported several ways to become a computer scientist. After following the identified path to the University of Oslo and graduating with a Bachelor’s in computer science, my way into research came more by chance, or maybe even by accident. At that time, I spent a lot of time on sports and was not sure what to do for my master thesis. However, I was lucky. I found an interesting topic in the area of system support for multimedia, mainly video. I guess my supervisors liked the work because they later offered me a PhD position (thanks!) where they brought me deeper into the world of multimedia systems research.

My supervisors then helped me to get an associate professor position at the university (thanks again!). I got to know more colleagues, all inspiring me to continue research in the area of multimedia. After a couple of years performing research as a continuation of my PhD and teaching system related courses, I got an opportunity to join Simula Research Laboratory together with Carsten Griwodz. A bit later, we started our own small research group at Simula, and it is still a great place to be.

I think it is safe to say my path has to a large degree been influenced by some of the great people that I have met. You cannot do everything yourself, and I have been blessed with a lot of very good colleagues and friends. As a PhD student, I was told that after a year I should know more about my topic than my supervisors. It sounded impossible, but after having supervised a number of students myself, I believe it is true! Another friend and colleague also said that he had learned everything he knew from his students. Again, very correct – my students (and colleagues) have taught me a lot (thanks!). Thus, my main take-home message is to find an area that interests you and nice people to work with! You can accomplish a lot as a good team!

Regarding my research interests, I initially found an interest in how efficient a computer system could be. I became fascinated by the delivery of continuous media early on, and “system support for multimedia” quickly became my area. After years of reporting an X% improvement of component Y, an interest in the complete end-to-end system arose. I have had a wish to build complete systems. So today, our research group aims to improve not only individual components but also the entire pipeline in a holistic system – especially in the areas of sports and medicine – where we can see the effects of the systems we deploy.

Pål Halvorsen at the beginning of his career as a computer scientist

Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?

Currently, I have several roles. My main position is with SimulaMet, a research center established by Simula Research Laboratory and Oslo Metropolitan University (OsloMet). I also recently moved my main university affiliation to OsloMet while still having a small adjunct professor position at University of Oslo. Both my research and teaching activities are related to my previously stated interests, and the combination of universities and research center is a perfect match for me, enabling a good mix of students and seniors.

I hope to be able to deliver results back into real systems, so that our results are not only published and then forgotten in a dark drawer somewhere. In this respect, we have contact with several real-life “problem owners”, mainly in sports and medicine. To bring our results beyond research prototypes, we have also spun off both a sports company and a medical company, achieving the vision of having real impact. The fact that we now run our systems for the two top soccer leagues in both Norway and Sweden is an example of our aims being fulfilled. Hopefully, we can soon say similar things in the medical scenario – that medical experts are assisted by our research-based systems!

Can you profile your current research, its challenges, opportunities, and implications?

Having the end-to-end view, it is hard to give a short answer. We are trying to optimize both single components and the entire pipeline of components. Thus, we are doing a lot of different things. Our challenges are not only related to a specific requirement or component, but also to its integration into a system as a whole. We also address a number of real-world applications. As you can see, the variety in our research is large.

However, there are also large opportunities in that the systems are researched and developed with real requirements and wishes in mind. Thus, if we succeed, there is a chance that we might actually have some impact. For example, in sports, we have three deployed systems in use.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

Together with colleagues at Simula, the University of Oslo and the University of Tromsø, we have been lucky to find some interesting and usable solutions. For example, at the system level, we have solutions (code) included in the Linux kernel, and at the application level, as efficient complete systems providing functionality beyond existing systems, we have running (prototype) systems in both sports and medicine.

Pål Halvorsen in his office in 2019

Over your distinguished career, what are your top lessons you want to share with the audience?

Well, first, I do not think you can call it “distinguished”. This is your description.

The most important thing for me is to have some fun. You must like what you do, and you must find people you enjoy working with. There are a lot of interesting challenges out there. You must just find yours.

What is the best joke you know?

Hehe, I am so bad at jokes. Every ten years, I might have a catchy comment, but I hardly ever tell jokes.

If you were conducting this interview, what questions would you ask, and what would be your answers?

Haha, I am not a man of many words, so I would probably just stick to the set of questions I was given and hope it would soon be finished 😉

So, maybe this one last question:

Q: Anything to add?

A: No. (Both short, since I have to do both the Q and the A.)


Bios

Professor Pål Halvorsen: 

Pål Halvorsen is a chief research scientist at SimulaMet, a professor at OsloMet – Oslo Metropolitan University, an adjunct professor at the University of Oslo, Norway, and the CEO of ForzaSys AS. He received his doctoral degree (Dr.Scient.) in 2001. His research focuses mainly on complete distributed multimedia systems, including operating systems, processing, storage and retrieval, communication and distribution, from a performance and efficiency point of view. He is a member of the IEEE and ACM. More information can be found at http://home.ifi.uio.no/paalh

Pia Helén Smedsrud: 

Pia Helén Smedsrud is a PhD student at Simula Research Laboratory in Oslo, Norway. She has a medical degree from UiO (University of Oslo), and worked as a medical doctor before starting as a research trainee in the field of computer science at Simula. She also has a background from journalism. Her research interests include medical multimedia, clinical implementation and machine learning. Currently, she is doing her PhD in the intersection between informatics and medicine, on machine learning in endoscopy.

Opinion Column: Evolution of Topics in the Multimedia Community

For this edition of the SIGMM Opinion Column, we asked members of the Multimedia community to share their impressions about the shift of scientific topics in the community over the years, namely the evolution of “traditional” and “emerging” Multimedia topics. 

This subject has emerged in several conversations over the two years of history of the SIGMM Opinion Column, and we report here a summary of recent and older discussions that took place over different channels – our Facebook group, the SIGMM LinkedIn group, and in-person conversations between the column editors and MM researchers – with hopes, fears and opinions around this topic. We want to thank all participants for their precious contribution to these discussions.

Historical Perspective of Topics in ACM MM

This year, ACM Multimedia turns 27. Today, MM is a large premium conference with hundreds of paper submissions every year, organized into 12 different thematic areas spanning the wide spectrum of multimedia topics. But back at the beginning of MM’s history, the scale of the topic range was very different.

In the first editions of the conference, a general call for papers encouraged submissions about “technology, tools and techniques for the construction and delivery of high quality, innovative multimedia systems and interfaces”. Already in its 3rd edition, MM featured an Arts and Multimedia program. Starting from 2004, the conference offered three tracks for paper submissions: Content (multimedia analysis, processing, and retrieval), Systems (multimedia networking and system support), and Applications (multimedia tools, end-systems, and applications), plus a “Brave New Topics” track for work-in-progress submissions. Later on, the Human-Centered Multimedia track was added to the program. In 2011, after a conference review, the ACM MM program went beyond the notion of “tracks”, and the concept of areas was introduced to allow the community to “solicit papers from a wide range of timely multimedia-related topics” (see the ACMM11 website). In 2014, the areas became 14, including, among others, Music, Speech and Audio Processing in Multimedia, and Social Media and Collective Online Presence. Following a retreat in 2014, the areas have, from 2015 onwards, been grouped into larger “Themes”, the core thematic areas of ACM Multimedia. Since that last retreat, no major changes have been introduced in the thematic structure of the conference.

Dynamics of Evolution of Emerging Topics

Emerging topics and less mature works are generally welcome at conference workshops. In our discussions, most members of the community agree that “you’ll see great work there, and very fruitful discussions due to the common focus on the workshop theme”. When emerging topics become more popular, they can be promoted to conference areas, as happened for the “music, speech and audio” theme.

It was observed in our community conversations that, while this upgrade to the main conference is great for visibility, being a separate, relatively novel area could lead to isolation: the workload for reviewers specialized in emerging topics could become too high, given that they are also assigned works in other areas; and the flat acceptance rate across all conference themes could mean that even accepting two submissions from an emerging-topic area would give an ‘unreasonably’ high acceptance rate, thus leading to many good papers (even with three accepts) having to be rejected. Participants in our forums noticed that these dynamics somehow “counteract the ‘Multimedia’ and multidisciplinary nature of the field”: they prevent conferences from growing and eventually hurt emerging topics. One solution proposed to balance this effect is to “maintain a solid specialized reviewer pool (where needed managed by someone from the field), which however would be distributed over relevant MM areas”, rather than forming a new area.

It was also noted that some emerging topics, in their early stage, would most likely not have an appropriate workshop. Therefore, it is important for the main conference to have places to accept such early works, making tracks such as the short paper track or the Brave New Ideas track absolutely crucial for the development of novel topics.

The Near-Future of Multimedia

On multiple occasions, MM community members shared their thoughts about how they would like to see the Multimedia community evolve around new topics.

There are a few topics that emerged in the past and that the community wishes would continue growing; these include interactive Multimedia applications, as well as music-related Multimedia technology, Multimedia in cooking spaces, and arts and Multimedia. It was also pointed out that, although very important for Multimedia applications, topics around compression technology are often given low weight in Multimedia venues, and that MM should encourage submissions in the domain of machine learning concepts applied to compression.

There are also a few areas that are emerging across different sub-communities in computer science and that, according to our community members, we should encourage to grow within the Multimedia field as well. These include work in digital health exploring the power of Multimedia for health care and monitoring, research around applications of Multimedia for good, understanding how the technologies we develop can have a real impact on society, and discussions around the ethics and responsibility of Multimedia technologies, encouraging fair, transparent, inclusive and accountable Multimedia tools.

The Future of Multimedia

The future of MM, according to the participants in the discussion, goes beyond the forms we know today, as new technologies could significantly broaden and shake up the current application paradigm of Multimedia.

The upcoming 5G technology will enable a plethora of applications that are currently severely limited by the lack of bandwidth. This could range from mobile virtual reality to interconnection with objects and, of course, smart cities. To extract meaningful information to be presented to the user, various and highly diverse data streams will need to be treated consistently. And Multimedia researchers will develop the methods, applications, systems and models needed to understand how to properly develop and shape this field. Likewise, this technology will push the limits of what is currently possible in terms of content demand and interaction with connected objects. We will see technologies for hyper-personalization, dynamic user interaction and real-time video personalization. These technologies will be enabled by the study of how film grammar and storytelling work for novel content types like AR, VR, panoramic and 360° video, by research around novel immersive media experiences, and by the design of new media formats with novel consumption paradigms.

Multimedia has a bright future, with new, exciting emerging topics to be discussed and encouraged. Perhaps time for a new retreat or for a conference review?