MPEG Column: 101st MPEG Meeting

The 101st MPEG meeting in Sweden

MPEG news: a report from the 101st meeting, Stockholm, Sweden

The 101st MPEG meeting was held in Stockholm, Sweden, July 16-20, 2012. The official press release can be found here and I would like to highlight the following topics:

  • MPEG Media Transport (MMT) reaches Committee Draft (CD)
  • High-Efficiency Video Coding (HEVC) reaches Draft International Standard (DIS)
  • MPEG and ITU-T establish JCT-3V
  • Call for Proposals: HEVC scalability extensions
  • 3D audio workshop
  • Green MPEG

MMT goes CD

The Committee Draft (CD) of MPEG-H part 1 referred to as MPEG Media Transport (MMT) has been approved and will be publicly available after an editing period which will end Sep 17th. MMT comprises the following features:

  • Delivery of coded media by concurrently using more than one delivery medium (e.g., as is the case in heterogeneous networks).
  • Logical packaging structure and composition information to support multimedia mash-ups (e.g., multiscreen presentation).
  • Seamless and easy conversion between storage and delivery formats.
  • Cross layer interface to facilitate communication between the application layers and underlying delivery layers.
  • Signaling of messages to manage the presentation and optimized delivery of media.

This list of ‘features’ may sound very high-level, but as the CD usually comprises stable technology and is publicly available, the research community is more than welcome to evaluate MPEG’s new way of media transport. Having said this, I would like to refer to the Call for Papers of JSAC’s special issue on adaptive media streaming, which mainly focuses on DASH, although investigating its relationship to MMT is definitely within scope.

HEVC’s next step towards completion: DIS

The approval of the Draft International Standard (DIS) brought the HEVC standard one step closer to completion. As reported previously, HEVC shows superior performance gains compared to its predecessor, and real-time software decoding on the iPad 3 (720p, 30Hz, 1.5 Mbps) was demonstrated during the Friday plenary [12]. The Final Draft International Standard (FDIS) is expected to be approved at the 103rd MPEG meeting on January 21-25, 2013. If the market need for HEVC is similar to what it was when AVC was finally approved, I am wondering if one can expect first products by mid/end 2013. From a research point of view we know, and history is our witness, that improvements are still possible even if the standard has been approved some time ago. For example, the AVC standard is now available in its 7th edition as a consolidation of various amendments and corrigenda.


After the Joint Video Team (JVT), which successfully developed standards such as AVC, SVC, and MVC, and the Joint Collaborative Team on Video Coding (JCT-VC), MPEG and ITU-T have established the Joint Collaborative Team on 3D Video coding extension development (JCT-3V). That is, from now on MPEG and ITU-T also join forces in developing 3D video coding extensions for existing codecs as well as the ones under development (i.e., AVC, HEVC). The current standardization plan includes the development of AVC multi-view extensions with depth, to be completed this year, and I assume HEVC will be extended with 3D capabilities once the 2D version is available.

In this context it is interesting that a call for proposals for MPEG Frame Compatible (MFC) has been issued to address current deployment issues of stereoscopic videos. The requirements are available here.

Call for Proposals: SVC for HEVC

In order to address the need for higher resolutions – Ultra HDTV – and subsets thereof, JCT-VC issued a call for proposals for HEVC scalability extensions. Similar to AVC/SVC, the requirements include that the base layer should be compatible with HEVC and enhancement layers may include temporal, spatial, and fidelity scalability. The actual call, the use cases, and the requirements shall become available on the MPEG Web site.

MPEG hosts 3D Audio Workshop

Part 3 of MPEG-H will be dedicated to audio, specifically 3D audio. The call for proposals will be issued at the 102nd MPEG meeting in October 2012 and submissions will be due at the 104th meeting in April 2013. At this meeting, MPEG hosted a 2nd workshop on 3D audio with the following speakers:

  • Frank Melchior, BBC R&D: “3D Audio? – Be inspired by the Audience!”
  • Kaoru Watanabe, NHK and ITU: “Advanced multichannel audio activity and requirements”
  • Bert Van Daele, Auro Technologies: “3D audio content production, post production and distribution and release”
  • Michael Kelly, DTS: “3D audio, objects and interactivity in games”

The report of this workshop including the presentations will be publicly available by end of August at the MPEG Web site.

What’s new: Green MPEG

Impressions from the 101st meeting

Finally, MPEG is starting to explore a new area which is currently referred to as Green MPEG addressing technologies to enable energy-efficient use of MPEG standards. Therefore, an Ad-hoc Group (AhG) was established with the following mandates:

  1. Study the requirements and use-cases for energy efficient use of MPEG technology.
  2. Solicit further evidence for the energy savings.
  3. Develop reference software for Green MPEG experimentation and upload any such software to the SVN.
  4. Survey possible solutions for energy-efficient video processing and presentation.
  5. Explore the relationship between metadata types and coding technologies.
  6. Identify new metadata that will enable additional power savings.
  7. Study system-wide interactions and implications of energy-efficient processing on mobile devices.

AhGs are usually open to the public and all discussions take place via email. To subscribe please feel free to join the email reflector.

ACM TOMCCAP Nicolas D. Georganas Best Paper Award


In its initial year the ACM Transactions on Multimedia Computing, Communications and Applications (TOMCCAP) Nicolas D. Georganas Best Paper Award goes to the paper Video Quality for Face Detection, Recognition and Tracking (TOMCCAP vol. 7, no. 3) by Pavel Korshunov and Wei Tsang Ooi.

The winning paper is pioneering because it is the very first study which tries to determine an objective quality threshold value for videos used in automated video processing (AVP). The paper proves that if a video’s quality is below a certain threshold (it gives the actual values for this threshold based on video context), it cannot be used in AVP systems. Further, it is shown that AVP systems still work with reasonable accuracy even when the video quality is low from a human’s perspective. This is an important finding because it means we can reduce quality and bit rate of the video without sacrificing accuracy, leading to reduced costs, greater scalability, and faster processing. What is unique about the paper is that it distinguishes between quality as perceived by humans, versus quality as perceived by AVP systems. In essence, the paper proposes that for AVP systems we should design machine-consumable video coding standards, not human-consumable codes.

The purpose of the award is to recognize the most significant work in ACM TOMCCAP in a given calendar year. The whole readership of ACM TOMCCAP was invited to nominate articles which were published in Volume 7 (2011). Based on the nominations the winner has been chosen by the TOMCCAP Editorial Board. The main assessment criteria have been quality, novelty, timeliness, clarity of presentation, in addition to relevance to multimedia computing, communications, and applications.

The award honors the founding Editor-in-Chief of TOMCCAP, Nicolas D. Georganas, for his contributions to the field of multimedia computing and his significant contributions to ACM.  He influenced the research and the multimedia community exceedingly.

The Editor-in-Chief Prof. Dr.-Ing. Ralf Steinmetz and the Editorial Board of ACM TOMCCAP cordially congratulate the winner. The award will be presented to the authors on November 1st, 2012 at ACM Multimedia 2012 in Nara, Japan and includes travel expenses for the winning authors.


Dear Member of the SIGMM Community, welcome to the third issue of the SIGMM Records in 2012.

As you can see, the format of the Records has changed dramatically with this issue, and the migration is going to be completed in the coming months. The new system is meant to make the Records more valuable and interactive for your benefit, and we hope the long wait for the third issue was worth your while. First of all, your submissions will become visible on the front page of the Records as soon as they have been approved by one of the editors, and they will be included in the following issue. The submission of standard content formats has become much easier than before: select your contribution from the pulldown menu, add your information and submit it, and see immediately on the submission page that your submission was successful. You can of course also send your contributions and any questions to

We are furthermore inviting a new category: Please tell us about your ongoing research projects! What are your goals and achievements? Who are your partners? How are your publications connected?

Of course, this issue also has some content: SIGMM’s 2012 awards have been handed out at ACM Multimedia in October in Nara. In this issue you can read about the SIGMM award for outstanding contributions to multimedia, the best PhD thesis award 2012 and the first ever awarded Nicolas D. Georganas Award for the best TOMCCAP paper.

TOMCCAP announces a major policy change; you can read about the startup foodQuest, the Open Source project Ambulant, and the latest MPEG meeting. You can read PhD thesis summaries provided by two candidates who have recently passed their doctoral exams.

Last but most certainly not least, you find pointers to the latest issues of TOMCCAP and MMSJ, and several job announcements.

We hope that you enjoy this issue of the Records.

The Editors
Stephan Kopf
Viktor Wendel
Lei Zhang
Pradeep Atrey
Christian Timmerer
Pablo Cesar
Carsten Griwodz

Ambulant – a multimedia playback platform


Distributed multimedia is a field that depends on many technologies, including networking, coding and decoding, scheduling, rendering and user interaction. Often, this leads to multimedia researchers in one of those fields expending a lot of implementation effort to build a complete media environment when they actually only want to demonstrate an advance within their own field. In 2004 the authors, having gone through this process more than once themselves, decided to design an open source extensible and embeddable multimedia platform that could serve as a central research resource. The NLNet Foundation graciously provided initial funding for the resulting Ambulant project.

Ambulant was designed from the outset to be usable for experimentation in a wide range of fields, not only in a laboratory setting but also as a deployed player for end users. However, it was not intended to compete with general end-user playback systems such as the (then popular) RealPlayer, QuickTime or the Windows Media Player. Our goal was to build a glue environment where various research groups could plug in new approaches to media scheduling, rendering and distribution. While some effort was spent on things like ease of installation, multi-platform compatibility and user interface issues, Ambulant has never hoped to usurp commercial media players. The user interface on three different platforms can be seen in the figure below.

The first deployment of the platform was during the W3C standardization of SMIL 2.1 and 3.0 [2, 3], when Ambulant was used to test the specification and create an open reference implementation. The fact that Ambulant supports SMIL out of the box means that it is not only useful to “low-level” multimedia researchers who want to experiment with replacing system components, but also to people interested in semantics or server-side document generation: by using SMIL as their output format they can use Ambulant to render their documents on any platform, including inside a web browser.

Design and Implementation

Ambulant is designed so that all key components are replaceable and extensible. This follows from the requirement that it be usable as an experimentation vehicle: if someone wants to replace the scheduler by one of their own design, this should be possible and have little or no impact on the rest of the system. To ensure wide deployability it was decided to create a portable platform. However, runtime efficiency is also an issue in multimedia playback, especially for audio and video decoding and rendering, so we decided to implement the core engine in C++. This allowed us to use platform-native decoding and rendering toolkits such as QuickTime and DirectShow, and gave us the added benefit of being able to use the native GUI toolkit on each platform, which makes life easier for end users and integrators. Using the native GUI required some extra effort up front to find the right spot to separate platform-independent and platform-dependent code, but by now porting to a new GUI toolkit takes about three man-months. About 8 GUI toolkits have been supported over time (or 11 if you count browser plugin APIs as a GUI toolkit). The current version of Ambulant runs natively on MacOSX, Linux, Windows and iOS, and a browser plugin is available for all major browsers on all desktop platforms (including Internet Explorer on Windows).
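
The replaceability requirement described above (swapping in a scheduler of one's own design without touching the rest of the system) can be illustrated with a small sketch. This is not Ambulant's actual API: the class names, the registry and the `play` function below are all hypothetical and only show the general pattern of an abstract interface combined with factory functions.

```python
from abc import ABC, abstractmethod


class Scheduler(ABC):
    """Hypothetical abstract interface; NOT Ambulant's real scheduler API."""

    @abstractmethod
    def order(self, items):
        """Return the media items in the order they should be played."""


class DocumentOrderScheduler(Scheduler):
    """Default behaviour: play items in the order the document lists them."""

    def order(self, items):
        return list(items)


class StartTimeScheduler(Scheduler):
    """Experimental replacement: play items sorted by their begin time."""

    def order(self, items):
        return sorted(items, key=lambda item: item["begin"])


# Registry of factory functions keyed by name (illustrative assumption).
_scheduler_factories = {}


def register_scheduler(name, factory):
    """Make a scheduler implementation available under a name."""
    _scheduler_factories[name] = factory


def create_scheduler(name):
    """Instantiate the scheduler registered under the given name."""
    return _scheduler_factories[name]()


def play(items, scheduler_name="document-order"):
    """The rest of the 'player' only sees the abstract Scheduler interface."""
    scheduler = create_scheduler(scheduler_name)
    return [item["src"] for item in scheduler.order(items)]


register_scheduler("document-order", DocumentOrderScheduler)
register_scheduler("start-time", StartTimeScheduler)

items = [
    {"src": "video.mp4", "begin": 5.0},
    {"src": "audio.mp3", "begin": 0.0},
]
```

Calling `play(items, "start-time")` instead of `play(items)` selects the experimental scheduler; the playback code itself does not change, which is the property the design aims for.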
Various old platforms (WM5, Maemo) were supported in the past and, while no longer maintained, the code is still available.

The design of Ambulant is shown in the figure above. On the document level there is a parser which reads external documents and converts them into a representation that the scheduler and layout engine will handle during document playout time. On the intermediate level there are datasources that read documents and media streams and hand them to the playout components. On the lower level there are the machine-dependent implementations of those stream readers and renderers. For each of these components there are multiple implementations, and those can easily be replaced or extended. The design relies largely on factory functions and abstract interfaces, and the implementation uses a plugin architecture to allow easy replacement of components at runtime without having to rebuild the complete application.

To make life even simpler, the API to the core Ambulant engine is available not only in C++ but also in Python. The Python bridge is complete and bidirectional: all classes that are accessible from C++ are just as accessible from Python and vice versa, and sending an object back and forth through the Python-C++ bridge results in the original object, not a new double-wrapped object. Moreover, not only can C++ classes be subclassed in Python but also the reverse. This means both extending Ambulant through a plugin and embedding Ambulant can be done in pure Python, without having to write any C/C++ code and without having to rebuild Ambulant.

Applications

Over the years, Ambulant has been used extensively for experimentation, both within our group and externally. In this section we will highlight some of these applications. The overview is not complete, but it highlights the breadth of applications of Ambulant.
One of the interests of the authors is maintaining the temporal scheduling integrity of dynamically modified multimedia presentations. In the Ambulant Annotator [4], we experimented with using secondary screens during playback, allowing user interaction on those secondary screens to modify existing shared presentations on the main screen. The modification and sharing interface was implemented as a plugin in Ambulant, which is also used to drive the main screen. In Ta2 MyVideos [5] we looked at a different form of live modification: a personalized video mashup that is created while the user is viewing it.

Integration of live video conferencing and multimedia documents is another area in which we work. For the Ta2 Family Game project [6] we augmented Ambulant with renderers to do low-delay live video rendering and digitizing, and a Flash engine. The resulting platform was used to play a cooperative action game in multiple locations. We are also using Ambulant to investigate protocols for synchronizing media playback at remote locations.

In a wholly different application area, the Daisy Consortium has used Ambulant as the basis of AMIS, software that reads Daisy Books, which are the international standard for digital talking books for the visually impaired. For this project Ambulant was only a small part of the solution. The main program allows the end user, who may be blind or dyslexic, to select books and navigate them. Timed playback is then handled by Ambulant, with added functionality to highlight paragraphs on-screen as the content is read out, etc.

At a higher level, an instrumented version of Ambulant has also been deployed to indirectly evaluate social media systems. In 2004, Ambulant was submitted to the first ACM Multimedia Open Source Software Competition [1].
Obtaining and Using Ambulant

Ambulant is available via, in three different forms: as a stable distribution (source and installers), as a nightly build (source and installers) and through Mercurial. Unfortunately, the stable distribution is currently lagging quite a bit behind, due to restricted manpower. We also maintain full API documentation, sample documents and community mailing lists. Ambulant is distributed under the LGPL2 license. This allows the platform to be used with commercial plugins developed by industry partners who provide proprietary software intended for limited distribution. We are considering a switch to dual licensing (GPL/BSD), but a concrete need has yet to arise.

The Bottom Line

Ambulant is a full open source media rendering pipeline. It provides an open, plug-in environment in which researchers from a wide variety of (sub)disciplines can test new algorithms and media sharing approaches without having to write mountains of less-relevant framework code. It can serve as an open environment for experimentation, validation and distribution. You are welcome to give it a try and to contribute to its growth.

References

[1] Bulterman, D. et al. 2004. Ambulant: a fast, multi-platform open source SMIL player. In Proceedings of the 12th annual ACM international conference on Multimedia (MULTIMEDIA ’04). ACM, New York, NY, USA, 492-495. DOI=10.1145/1027527.1027646
[2] Bulterman, D. et al. 2008. Synchronized Multimedia Integration Language (SMIL 3.0). W3C. URL=
[3] Bulterman, D. and Rutledge, L. 2008. Interactive Multimedia for the Web, Mobile Devices and Daisy Talking Books. Springer-Verlag, Heidelberg, Germany, ISBN: 3-540-20234-X.
[4] Cesar, P. et al. 2009. Fragment, tag, enrich, and send: Enhancing social sharing of video. Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP), vol. 5 (3). DOI=10.1145/1556134.1556136
[5] Jansen, J. et al. 2012. Just-in-time personalized video presentations. In Proceedings of the 2012 ACM symposium on Document engineering (DocEng ’12). ACM, New York, NY, USA, 59-68. DOI=10.1145/2361354.2361368
[6] Jansen, J. et al. 2011. Enabling Composition-Based Video-Conferencing for the Home. IEEE Transactions on Multimedia, vol. 13 (5), pp. 869-881. DOI=10.1109/TMM.2011.2159369

Outstanding PhD Thesis in Multimedia Computing, Communications and Applications

Dr. Wanmin Wu

The SIGMM Ph.D. Award Committee is pleased to recommend this year’s award for the outstanding Ph.D. thesis in multimedia computing, communications and applications to Dr. Wanmin Wu.

Wu’s dissertation documents fundamental work in the area of unifying systems and user-centric approaches to managing information flows for supporting 3D tele-immersive environments. She has developed a theoretical framework for modeling and measuring Quality of Experience (QoE), and for correlating QoE with Quality of Service (QoS) in distributed multi-modal interactive environments. This work has been significant in that it introduced the importance of the user-centric approach to modelling and managing complex three-dimensional data exchanges in time-constrained systems.

The committee considered the main innovations of this work to be:

  1. Identifying and incorporating human psycho-physical factors along with traditional QoS to improve experience;
  2. Proposing new methods and theory for QoS in interactive multi-camera environments that have served as a catalyst for enabling work in distributed education, medicine and conferencing;
  3. Developing new methods for video coding that incorporate users’ psycho-physical understanding of color and depth.

These new methods have significantly reduced the impact of sharing tele-immersive information and are likely to have a longer-term benefit that is similar to that of selective audio encoding.

The committee has considered this contribution as worthy of the award as it tackles a new problem, proposes new theory and practice as a solution to this problem area, and opens the way for further research into effective distributed three-dimensional immersive systems.

SIGMM Award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications


Dr. HongJiang Zhang

The 2012 winner of the prestigious Association for Computing Machinery (ACM) Special Interest Group on Multimedia (SIGMM) award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications is Dr. HongJiang Zhang. He is currently Chief Executive Officer at Kingsoft. He also holds guest professorships at Tsinghua University and Harbin Institute of Technology. The ACM SIGMM Technical Achievement award, given in recognition of outstanding contributions over a researcher’s career, cites Dr. Zhang’s “pioneering contributions to and leadership in media computing including content-based media analysis and retrieval, and their applications.” The SIGMM award will be presented at the ACM International Conference on Multimedia 2012 that will be held Oct 29 – Nov 2 2012 in Nara, Japan.

In the early 1990s, Dr. Zhang began his pioneering work on content analysis and content-based abstraction, browsing, and retrieval of video, when these research areas were about to emerge. He established the foundations of this new research area by his numerous seminal contributions. Dr. Zhang’s most noteworthy early works include the first algorithm for reliably detecting gradual video scene transitions and content-based video key-frame extraction, one of the first works on compressed domain video content analysis, as well as his structured video analysis framework and algorithms. These pioneering works have had tremendous impact on the directions, methodologies and advancements of the media computing field.

Dr. Zhang’s research contributions also made a profound impact on the establishment of the ISO (International Organization for Standardization) MPEG-7 standard, which is the international standard that defines multimedia content descriptions.

In addition to his scholarly contributions, Dr. Zhang has significantly shaped the video indexing and editing software industry through his seminars, publications, patents and technology licensing and transfers to a number of companies and successful HP and Microsoft products. Most significant are:

  1. Video cataloging tools licensed to Image Bank, Inc. (1995);
  2. Video structure parsing technologies licensed to Intel (1996);
  3. Media metadata definition and extraction algorithms in Window Imaging Platform;
  4. Image search in Microsoft Digital Image Pro (2003) and web search releases; and
  5. Automated video editing, a technology breakthrough that gained Microsoft MovieMaker 2.0 a five star rating by About.Com Desktop Video (2003).

In summary, Dr. Zhang’s accomplishments include pioneering and extraordinary contributions to media computing and outstanding service to the computing community.

ACM is the professional society of computer scientists, and SIGMM is the special interest group on multimedia.




TU Darmstadt spin-off „foodQuest“ is developing an iPhone app for personalized restaurant recommendations

What do a student, a top-manager, a loving couple and a family have in common? Indeed, they all are hungry! And: They always want to discover great restaurants. The requirements, however, vary a lot: Low price, suitable for business talks, romantic or child-friendly. Not every restaurant is suitable for every guest and every occasion. foodQuest is the first app for restaurant recommendations that caters to the individual needs of its hungry users. The suggestions are derived through a hybrid model of automated analysis, crowdsourcing, and editorial content. In the current version, foodQuest supplies culinary assistance for the cities of Hanover and Frankfurt. The next release will feature recommendations for all of Germany. The app is available for free in Apple’s App Store.

GameDays & Edutainment 2012

Opening of the GameDays 2012

On behalf of the conference co-chairs, we wish to provide a report of the eighth GameDays, which were held from September 18th to 20th at Technische Universität Darmstadt and on the premises of Fraunhofer IGD.

The GameDays are initiated and mainly organized by Dr. Stefan Göbel, the head of the Serious Games group at the Multimedia Communications Lab at TU Darmstadt. The GameDays have taken place annually since 2005 as a “Science meets Business” event in the field of Serious Games, in cooperation with Hessen-IT, the Forum for Interdisciplinary Research of TU Darmstadt and other partners from science and industry.