Multimedia Grand Challenge


Submission Deadline: June 15th 2009

Date: Oct 20th 2009

Location: Beijing, China

Chair(s): Mor Naaman, Tat-Seng Chua

MM2009: Introducing The Multimedia Grand Challenge

This year’s ACM Multimedia Conference is introducing a new program: the Multimedia Grand Challenge (or, MMGC). Despite its name, the Grand Challenge is in fact a set of problems and challenges from various industry leaders. The challenges, from Google, Yahoo!, Nokia, and other leaders, are summarized below, and are geared to engage the multimedia research community in solving relevant, interesting, and challenging questions about the industry’s 2-5 year horizon for multimedia. While the Multimedia Grand Challenge is initially presented as part of ACM Multimedia 2009, we hope to continue the engagement with existing and new industry partners in future Multimedia conferences.

The plan is simple. Each corporate partner posted a challenge (some posted two) on the Grand Challenge website. Researchers, in turn, use the challenges as motivation, inspiration, or source material for research, and submit short papers describing working systems that significantly address the challenges. The top submissions, selected by the MMGC program committee of academics and the industry partners, will be presented and demonstrated in a live session during the ACM Multimedia conference. Finally, the best submission(s), as chosen by the corporate partners, win the challenge (prize details will be announced soon on the Multimedia Grand Challenge blog).

We believe that the MMGC will help strengthen the (currently rather frail) connection between industry leaders and multimedia researchers, to both sides’ benefit. Industry partners will be able to:

- Engage talented researchers in working on their problems (and perhaps data);
- Gain new insights into problems that look ahead of their current roadmap;
- Attract and excite great graduate students who are working on relevant problems (and maybe hire them later as interns or full-time employees); and, finally,
- Build an image of innovation and demonstrate support for innovative projects and efforts (a surprisingly significant driver for many companies).

On the other hand, multimedia researchers will hopefully gain exciting new challenges, along with other benefits:

- a stronger potential alignment with industry challenges, metrics, and evaluation methodologies;
- potential real-world applications for their expertise and innovations;
- enhanced exposure to the public and to relevant companies;
- eventually, access to data or sources of data to use for research;
- and finally, the potential for future funding and some good old prize-winning!

OK, enough with the introduction: let’s get to the challenges! Listed below are overview summaries of the challenges from the different companies. URLs are provided for each challenge, or you can view all of the challenges at, you guessed it, the MMGC web site.

Google Challenge

Robust, As-Accurate-As-Human Genre Classification for Video.
Browsing is naturally associated with video collections. Classifying videos into a pre-existing hierarchy of genres is one way to make the browsing task easier. The goal of this task is to take user-generated videos (along with their sparse and noisy metadata) and automatically classify them into genres.
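To make the task concrete, here is a minimal sketch of a metadata-only baseline: a multinomial naive Bayes classifier over title/tag tokens. The genre labels and training examples are invented for illustration; a serious solution would also exploit audio and visual features, since user-supplied metadata is sparse and noisy.

```python
from collections import Counter, defaultdict
import math

class MetadataGenreClassifier:
    """Tiny multinomial naive Bayes over video title/tag tokens,
    with Laplace (add-one) smoothing for unseen words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # genre -> token counts
        self.genre_counts = Counter()            # genre -> #videos
        self.vocab = set()

    def train(self, examples):
        for text, genre in examples:
            tokens = text.lower().split()
            self.genre_counts[genre] += 1
            self.word_counts[genre].update(tokens)
            self.vocab.update(tokens)

    def classify(self, text):
        tokens = text.lower().split()
        total = sum(self.genre_counts.values())
        best, best_score = None, float("-inf")
        for genre, count in self.genre_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.word_counts[genre].values()) + len(self.vocab)
            for tok in tokens:               # smoothed log likelihoods
                score += math.log((self.word_counts[genre][tok] + 1) / denom)
            if score > best_score:
                best, best_score = genre, score
        return best

# Toy training data; labels are hypothetical, not a real genre hierarchy.
clf = MetadataGenreClassifier()
clf.train([
    ("funny cat compilation", "comedy"),
    ("stand up comedy clip", "comedy"),
    ("champions league goal highlights", "sports"),
    ("marathon race finish", "sports"),
])
print(clf.classify("cat doing stand up"))  # comedy
```

Note how the sparse-metadata problem surfaces even here: a one-word title gives the classifier almost nothing to score, which is exactly why the challenge is hard.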

HP Challenge

Robust Identification of Informative Multimedia Content in Web Pages.
This challenge invites solutions for the robust identification and extraction of informative multimedia content from web pages. Multimedia content on a web page is classified as either informative or “auxiliary.” Advertisements, navigation aids, decorative graphics, and any other content only peripherally related to the informative portions of the page are considered auxiliary content. Can a solution correctly detect informative multimedia content with 99% accuracy for any web page in any language?
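As a trivial strawman (nowhere near the 99% goal), one can imagine a rule-based filter over image elements using size, aspect ratio, and URL cues. The thresholds and keyword list below are purely illustrative assumptions, and a real solution would need to be language- and layout-independent.

```python
def looks_informative(img):
    """Heuristic guess at whether an image element on a page is
    informative content vs. auxiliary (ads, icons, decoration).
    `img` is a dict with "width", "height", and "src" keys;
    all thresholds are illustrative, not tuned values."""
    w, h = img["width"], img["height"]
    if w < 100 or h < 100:            # icons, bullets, spacers
        return False
    aspect = w / h
    if aspect > 4 or aspect < 0.25:   # banner-ad-like shapes
        return False
    url = img.get("src", "").lower()
    if any(k in url for k in ("/ads/", "banner", "sprite", "logo")):
        return False
    return True
```

A standard 728x90 leaderboard ad, for example, fails the aspect-ratio test; the point of the challenge is precisely that such brittle heuristics break on arbitrary pages.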

Nokia Challenge

Where was this Photo Taken, and How?

This challenge focuses on capture-device location and orientation, one dimension of content metadata. The problem can be stated simply: derive the exact camera pose (location and orientation) of given photos that lack location annotations. This kind of technology could potentially be used to add metadata to existing or newly captured photos.
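One family of approaches matches the query photo against a corpus of geotagged reference photos and transfers the best match’s location (nearest-neighbor scene matching). The sketch below assumes feature vectors have already been extracted by some image descriptor; the vectors and coordinates are made up, and orientation recovery is left out entirely.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def estimate_location(query_feat, references):
    """Nearest-neighbor location transfer.

    references: list of (feature_vector, (lat, lon)) pairs for
    geotagged photos. Returns the (lat, lon) of the reference photo
    most similar to the query. Feature extraction and camera
    orientation recovery are out of scope for this sketch."""
    best_loc, best_sim = None, -1.0
    for feat, loc in references:
        sim = cosine(query_feat, feat)
        if sim > best_sim:
            best_loc, best_sim = loc, sim
    return best_loc
```

This only recovers a coarse location, not a full pose; the challenge asks for the harder step of estimating orientation as well.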

Radvision Challenges

1. Video Conferencing To Surpass “In-Person” Meeting Experience.
This challenge focuses on developing new technologies and ideas to surpass the “in-person” meeting experience. In the process, a set of subjective and objective measures for evaluating the “meeting” experience will be developed. With these measures, alternative solutions can be compared to each other and to in-person meetings, and optimized accordingly.

2. Real-time Data Collaboration Adaptation for Multi-Device Video Conferencing.
This challenge focuses on adapting, in real time, the data collaboration channel in a video conferencing system to different receiving devices, in a way that users would regard as perceptually optimal.

Current TV Challenge

Media Production in the Age of Community.
Current TV is seeking a connection between streaming social media and multimedia content. What kinds of social streams (e.g., Twitter) can be aligned in real time with live media? What video content features align with social streams? How could social streams be used to find highlights and summaries of events? Why is a given video segment important to a community? What deeper insight can the analysis of social streams add to news reportage?
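As one illustrative angle on the highlight question, a crude baseline aligns tweets (matched to the event, say, by hashtag) to the broadcast timeline and flags minutes where tweet volume spikes. The minute-level bucketing and the two-sigma threshold are arbitrary choices for this sketch; real systems would also analyze tweet text.

```python
from statistics import mean, stdev

def highlight_minutes(tweet_times, threshold_sigma=2.0):
    """Find minutes of a live broadcast where tweet volume spikes.

    tweet_times: seconds-from-broadcast-start for each matching tweet.
    A volume spike is only a crude proxy for a highlight."""
    if not tweet_times:
        return []
    last = int(max(tweet_times) // 60)
    counts = [0] * (last + 1)
    for t in tweet_times:                 # bucket tweets per minute
        counts[int(t // 60)] += 1
    mu = mean(counts)
    sigma = stdev(counts) if len(counts) > 1 else 0.0
    return [m for m, c in enumerate(counts)
            if c > mu + threshold_sigma * sigma]
```

For a ten-minute broadcast with a steady trickle of tweets and one burst at minute five, the function returns `[5]`, a candidate highlight to align with the video.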

Yahoo Challenges

1. Robust Automatic Segmentation of Video According to Narrative Themes.
The challenge is to develop methods, techniques, and algorithms to automatically generate narrative themes for a given video, as well as present the content in an easy-to-consume manner to end-users in a search engine experience.

2. Robust Clustering Guided by User Intent in Image Search.
The challenge involves developing a robust way of understanding user intent in image search and generating highly relevant result clusters for the given intent and query.

Accenture Challenge

Analysis of Video Footage Captured in Uncontrolled Environments.
The proliferation of cameras has led to an explosion of video content. Often, it is necessary to analyze this corpus after an event. A system should take a video corpus, such as footage from surveillance cameras, along with any available knowledge about the camera network, identify objects and events based on those objects and their interactions, and generate both categories and a good representation for these objects and events.

CeWe Challenge

The Next Generation of Tangible Multimedia Products: Thematic Photo Story Generation from Personal Photo Collections.
The challenge is to help the user select a meaningful subset of photos from a collection, one that best summarizes and represents a specific event. Researchers should take realistic user photo sets as a basis and (semi-)automatically determine the photos that best summarize the underlying event, such as a two-week holiday. The solution should not only provide an approach for the selection but could also be embedded in an authoring system that keeps the user in the loop.

To summarize, we encourage multimedia researchers to consider the challenges and submit contributions or solutions to the ACM Multimedia 2009 Grand Challenge track by June 15th. Specific submission instructions will be made available on the Grand Challenge blog. Submitting an extended description of the work to other MM 2009 tracks is highly recommended; Grand Challenge submissions that are accompanied by other accepted MM 2009 publications will receive priority. At the conference, accepted presenters will briefly introduce their idea to the audience, give a quick demo, and take short questions (Multimedia Idol-style) from the judges. Based on the presentations, as mentioned above, a team of judges and the attending crowd will select the top contributors and prize winners. Get more of the technical details right here.

See you in Beijing!
