Report from ACM MMSys 2017

–A report from Christian Timmerer, AAU/Bitmovin Austria

The ACM Multimedia Systems Conference (MMSys) provides a forum for researchers to present and share their latest research findings in multimedia systems. It is a unique event targeting “multimedia systems” from various angles and views across all domains, instead of focusing on a specific aspect or data type. ACM MMSys’17 was held in Taipei, Taiwan, on June 20-23, 2017.

MMSys is a single-track conference which also hosts a series of workshops, namely NOSSDAV, MMVE, and NetGames. Since 2016, the conference kicks off with overview talks, and in 2017 we saw the following: “Geometric representations of 3D scenes” by Geraldine Morin; “Towards Understanding Truly Immersive Multimedia Experiences” by Niall Murray; “Rate Control In The Age Of Vision” by Ketan Mayer-Patel; “Humans, computers, delays and the joys of interaction” by Ragnhild Eg; “Context-aware, perception-guided workload characterization and resource scheduling on mobile phones for interactive applications” by Chung-Ta King and Chun-Han Lin.

Additionally, industry talks were introduced: “Virtual Reality – The New Era of Future World” by WeiGing Ngang; “The innovation and challenge of Interactive streaming technology” by Wesley Kuo; “What challenges are we facing after Netflix revolutionized TV watching?” by Shuen-Huei Guan; “The overview of app streaming technology” by Sam Ding; “Semantic Awareness in 360 Streaming” by Shannon Chen; “On the frontiers of Video SaaS” by Sega Cheng.

An interesting set of keynotes presented different aspects related to multimedia systems at the conference and its co-located workshops:

  • Henry Fuchs, The AR/VR Renaissance: opportunities, pitfalls, and remaining problems
  • Julien Lai, Towards Large-scale Deployment of Intelligent Video Analytics Systems
  • Dah Ming Chiu, Smart Streaming of Panoramic Video
  • Bo Li, When Computation Meets Communication: The Case for Scheduling Resources in the Cloud
  • Polly Huang, Measuring Subjective QoE for Interactive System Design in the Mobile Era – Lessons Learned Studying Skype Calls

The program included a diverse set of topics such as immersive experiences in AR and VR, network optimization and delivery, multisensory experiences, processing, rendering, interaction, cloud-based multimedia, IoT connectivity, infrastructure, media streaming, and security. A vital aspect of MMSys is its dedicated sessions for showcasing the latest developments in the area of multimedia systems and presenting datasets, which are important for enabling reproducibility and sustainability in multimedia systems research.

The social events were a perfect venue for networking and in-depth discussions on how to advance the state of the art. A welcome reception was held at “LE BLE D’OR (Miramar)”, the conference banquet at the Taipei World Trade Center Club, and finally a tour to the Shilin Night Market was organized.

ACM MMSys 2017 issued the following awards:

  • The Best Paper Award goes to “A Scalable and Privacy-Aware IoT Service for Live Video Analytics” by Junjue Wang (Carnegie Mellon University), Brandon Amos (Carnegie Mellon University), Anupam Das (Carnegie Mellon University), Padmanabhan Pillai (Intel Labs), Norman Sadeh (Carnegie Mellon University), and Mahadev Satyanarayanan (Carnegie Mellon University).
  • The Best Student Paper Award goes to “A Measurement Study of Oculus 360 Degree Video Streaming” by Chao Zhou (SUNY Binghamton), Zhenhua Li (Tsinghua University), and Yao Liu (SUNY Binghamton).
  • The NOSSDAV’17 Best Paper Award goes to “A Comparative Case Study of HTTP Adaptive Streaming Algorithms in Mobile Networks” by Theodoros Karagkioules (Huawei Technologies France/Telecom ParisTech), Cyril Concolato (Telecom ParisTech), Dimitrios Tsilimantos (Huawei Technologies France), Stefan Valentin (Huawei Technologies France).

The Excellence in DASH Award (sponsored by the DASH-IF) was presented as follows:

  • 1st place: “SAP: Stall-Aware Pacing for Improved DASH Video Experience in Cellular Networks” by Ahmed Zahran (University College Cork), Jason J. Quinlan (University College Cork), K. K. Ramakrishnan (University of California, Riverside), and Cormac J. Sreenan (University College Cork)
  • 2nd place: “Improving Video Quality in Crowded Networks Using a DANE” by Jan Willem Kleinrouweler, Britta Meixner and Pablo Cesar (Centrum Wiskunde & Informatica)
  • 3rd place: “Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP” by Mario Graf (Bitmovin Inc.), Christian Timmerer (Alpen-Adria-Universität Klagenfurt / Bitmovin Inc.), and Christopher Mueller (Bitmovin Inc.)

Finally, student travel grants were sponsored by SIGMM. All details, including nice pictures, can be found here.

ACM MMSys 2018 will be held in Amsterdam, The Netherlands, on June 12-15, 2018, and includes the following tracks:

  • Research track: Submission deadline on November 30, 2017
  • Demo track: Submission deadline on February 25, 2018
  • Open Dataset & Software Track: Submission deadline on February 25, 2018

MMSys’18 co-locates the following workshops (with submission deadline on March 1, 2018):

  • MMVE2018: 10th International Workshop on Immersive Mixed and Virtual Environment Systems,
  • NetGames2018: 16th Annual Workshop on Network and Systems Support for Games,
  • NOSSDAV2018: 28th ACM SIGMM Workshop on Network and Operating Systems Support for Digital Audio and Video,
  • PV2018: 23rd Packet Video Workshop

MMSys’18 includes the following special sessions (submission deadline on December 15, 2017):

Report from ICMR 2017

ACM International Conference on Multimedia Retrieval (ICMR) 2017

ACM ICMR 2017 in “Little Paris”

ACM ICMR is the premier International Conference on Multimedia Retrieval and, since 2011, it “illuminates the state of the arts in multimedia retrieval”. This year, ICMR was held in a wonderful location: Bucharest, Romania, also known as “Little Paris”. Every year at ICMR I learn something new, and here is what I learnt this year.


Final Conference Shot at UP Bucharest

UNDERSTANDING THE TANGIBLE: object, scenes, semantic categories – everything we can see.

1) Objects (and YODA) can be easily tracked in videos.

Arnold Smeulders delivered a brilliant keynote on “things” retrieval: given an object in an image, can we find (and retrieve) it in other images, videos, and beyond? He presented a very interesting technique for tracking objects (e.g. Yoda) in videos based on similarity learnt through Siamese networks.

Tracking Yoda with Siamese Networks

2) Wearables + computer vision help explore cultural heritage sites.

As shown in his keynote, at MICC, University of Florence, Alberto del Bimbo and his amazing team have designed smart audio guides for indoor and outdoor spaces. The system detects, recognises, and describes landmarks and artworks from wearable camera inputs (and GPS coordinates, in the case of outdoor spaces).

3) We can finally quantify how much images provide complementary semantics compared to text [BEST MULTIMODAL PAPER AWARD].

For ages, the community has asked how relevant different modalities are for multimedia analysis: this paper finally proposes a solution to quantify information gaps between different modalities.

4) Exploring news corpora is now very easy: news graphs are easy to navigate and aware of the type of relations between articles.

Remi Bois and his colleagues presented this framework, made for professional journalists and the general public, for seamlessly browsing through large-scale news corpora. They built a graph whose nodes are articles in a news corpus. The most relevant items for each article are chosen (and linked) based on an adaptive nearest-neighbour technique. Each link is then characterised according to the type of relation between the two linked nodes.

5) Panorama outdoor images are much easier to localise.

In his beautiful work, Ahmet Iscen from Inria developed an algorithm for location prediction from StreetView images, outperforming the state of the art thanks to an intelligent stitching pre-processing step: predicting locations from panoramas (stitched individual views) instead of individual street images improves performance dramatically!

UNDERSTANDING THE INTANGIBLE: artistic aspects, beauty, intent – everything we can perceive.

1) Image search intent can be predicted by the way we look.

In his best paper candidate research work, Mohammad Soleymani showed that image search intent (seeking information, finding content, or re-finding content) can be predicted from physiological responses (eye gaze) and implicit user interaction (mouse movements).

2) Real-time detection of fake tweets is now possible using user and textual cues.

Another best paper candidate, this time from CERTH. The team collected a large dataset of fake/real sample tweets spanning 17 events and built an effective model for misleading content detection based on tweet content and user characteristics. A live demo is also available online.

3) Music tracks have different functions in our daily lives.

Researchers from TU Delft have developed an algorithm which classifies music tracks according to their purpose in our daily activities: relax, study, and workout.

4) By transferring image style we can make images more memorable!

The team at the University of Trento built an automatic framework to improve image memorability. A selector finds the style seeds (i.e. abstract paintings) which are likely to increase the memorability of a given image and, after style transfer, the image becomes more memorable!

5) Neural networks can help retrieve and discover children’s book illustrations.

In this amazing work, motivated by real children’s experiences, Pinar and her team from Hacettepe University collected a large dataset of children’s book illustrations and found that neural networks can predict and transfer style, making it possible to give many other illustrations a “Winnie the Witch”-like look.

Winnie the Witch

6) Locals perceive their neighborhood as less interesting, more dangerous and dirtier compared to non-locals.

In this wonderful work, presented by Darshan Santain from IDIAP, researchers asked locals and crowd-workers to look at pictures from various neighborhoods in Guanajuato and rate them according to interestingness, cleanliness, and safety.

THE FUTURE: What’s Next?

1) We will be able to anonymize images of outdoor spaces thanks to Instagram filters, as proposed by this work in the Brave New Idea session. When an image of an outdoor space is manipulated with appropriate Instagram filters, the location of the image can be masked from vision-based geolocation classifiers.

2) Soon we will be able to embed watermarks in our Deep Neural Network models in order to protect our intellectual property [BEST PAPER AWARD]. This is a disruptive, novel idea, and that is why this work from KDDI Research and Japan National Institute of Informatics won the best paper award. Congratulations!

3) Given an image view of an object, we will predict the other side of things (from Smeulders’ keynote). In the picture: predicting the other side of chairs. Beautiful.

Predicting the other side of things

THANKS: To the organisers, to the volunteers, and to all the authors for their beautiful work :)

EDITORIAL NOTE: A more extensive report from ICMR 2017 by Miriam is available on Medium

Report from MMM 2017

Harpa, the venue of the conference banquet.

MMM 2017 — 23rd International Conference on MultiMedia Modeling

MMM is a leading international conference for researchers and industry practitioners to share new ideas, original research results, and practical development experiences from all MMM-related areas. The 23rd edition of MMM took place on January 4-6, 2017, on the modern campus of Reykjavik University. In this short report, we outline the major aspects of the conference, including: technical program; best paper session; video browser showdown; demonstrations; keynotes; special sessions; and social events. We end by acknowledging the contributions of the many excellent colleagues who helped us organize the conference. For more details, please refer to the MMM 2017 web site.

Technical Program

The MMM conference calls for research papers reporting original investigation results and demonstrations in all areas related to multimedia modeling technologies and applications. Special sessions were also held that focused on addressing new challenges for the multimedia community.

This year, 149 regular full paper submissions were received, of which 36 were accepted for oral presentation and 33 for poster presentation, for a 46% acceptance rate. Overall, MMM received 198 submissions across all tracks and accepted 107 for oral and poster presentation, for an overall acceptance rate of 54%. For more details, please refer to the table below.

MMM2017 Submissions and Acceptance Rates

Best Paper Session

Four best paper candidates were selected for the best paper session, which was a plenary session at the start of the conference.

The best paper, by unanimous decision, was “On the Exploration of Convolutional Fusion Networks for Visual Recognition” by Yu Liu, Yanming Guo, and Michael S. Lew. In this paper, the authors propose an efficient multi-scale fusion architecture, called convolutional fusion networks (CFN), which can generate the side branches from multi-scale intermediate layers while consuming few parameters.

Phoebe Chen, Laurent Amsaleg and Shin’ichi Satoh (left) present the Best Paper Award to Yu Liu and Yanming Guo (right).

The best student paper, partially chosen due to the excellent presentation of the work, was “Cross-modal Recipe Retrieval: How to Cook This Dish?” by Jingjing Chen, Lei Pang, and Chong-Wah Ngo. In this work, the problem of sharing food pictures from the viewpoint of cross-modality analysis was explored. Given a large number of image and recipe pairs acquired from the Internet, a joint space is learnt to locally capture the ingredient correspondence from images and recipes.

Phoebe Chen, Laurent Amsaleg and Shin’ichi Satoh (left) present the Best Student Paper Award to Jingjing Chen and Chong-Wah Ngo (right).

The two runners-up were “Spatio-temporal VLAD Encoding for Human Action Recognition in Videos” by Ionut Cosmin Duta, Bogdan Ionescu, Kiyoharu Aizawa, and Nicu Sebe, and “A Framework of Privacy-Preserving Image Recognition for Image-Based Information Services” by Kojiro Fujii, Kazuaki Nakamura, Naoko Nitta, and Noboru Babaguchi.

Video Browser Showdown

The Video Browser Showdown (VBS) is an annual live video search competition, which has been organized as a special session at MMM conferences since 2012. In VBS, researchers evaluate and demonstrate the efficiency of their exploratory video retrieval tools on a shared data set in front of the audience. The participating teams start with a short presentation of their system and then perform several video retrieval tasks with a moderately large video collection (about 600 hours of video content). This year, seven teams registered for VBS, although one team could not compete for personal and technical reasons. For the first time in 2017, live judging was included, in which a panel of expert judges made decisions in real-time about the accuracy of the submissions for ⅓ of the tasks.

Teams and spectators in the Video Browser Showdown.

On the social side, three changes were also made from previous conferences. First, VBS was held in a plenary session, to avoid conflicts with other schedule items. Second, the conference reception was held at VBS, which meant that attendees had extra incentives to attend VBS, namely food and drink. Third, Alan Smeaton served as “color commentator” during the competition, interviewing the organizers and participants, and helping explain to the audience what was going on. All of these changes worked well, and contributed to a very well attended VBS session.

The winners of VBS 2017, after a very even and exciting competition, were Luca Rossetto, Ivan Giangreco, Claudiu Tanase, Heiko Schuldt, Stephane Dupont and Omar Seddati, with their IMOTION system.


Demonstrations

Five demonstrations were presented at MMM. As in previous years, the best demonstration was selected using both a popular vote and a selection committee. And, as in previous years, both methods produced the same winner, which was: “DeepStyleCam: A Real-time Style Transfer App on iOS” by Ryosuke Tanno, Shin Matsuo, Wataru Shimoda, and Keiji Yanai.

The winners of the Best Demonstration competition hard at work presenting their system.


Keynotes

The first keynote, held in the first session of the conference, was “Multimedia Analytics: From Data to Insight” by Marcel Worring, University of Amsterdam, Netherlands. He reported on a novel multimedia analytics model based on an extensive survey of over eight hundred papers. In the analytics model, the need for semantic navigation of the collection is emphasized and multimedia analytics tasks are placed on an exploration-search axis. Categorization is then proposed as a suitable umbrella task for realizing the exploration-search axis in the model. In the end, he considered the scalability of the model to collections of 100 million images, moving towards methods which truly support interactive insight gain in huge collections.

Björn Þór Jónsson introduces the first keynote speaker, Marcel Worring (right).

The second keynote, held in the last session of the conference, was “Creating Future Values in Information Access Research through NTCIR” by Noriko Kando, National Institute of Informatics, Japan. She reported on NTCIR (NII Testbeds and Community for Information access Research), which is a series of evaluation workshops designed to enhance the research in information access technologies, such as information retrieval, question answering, and summarization using East-Asian languages, by providing infrastructures for research and evaluation. Prof Kando provided motivations for the participation in such benchmarking activities and she highlighted the range of scientific tasks and challenges that have been explored at NTCIR over the past twenty years. She ended with ideas for the future direction of NTCIR.


Noriko Kando presents the second MMM keynote.

Special Sessions

During the conference, four special sessions were held. Special sessions are mini-venues, each focusing on one state-of-the-art research direction within the multimedia field. The sessions are proposed and chaired by international researchers, who also manage the review process, in coordination with the Program Committee Chairs. This year’s sessions were:
– “Social Media Retrieval and Recommendation” organized by Liqiang Nie, Yan Yan, and Benoit Huet;
– “Modeling Multimedia Behaviors” organized by Peng Wang, Frank Hopfgartner, and Liang Bai;
– “Multimedia Computing for Intelligent Life” organized by Zhineng Chen, Wei Zhang, Ting Yao, Kai-Lung Hua, and Wen-Huang Cheng; and
– “Multimedia and Multimodal Interaction for Health and Basic Care Applications” organized by Stefanos Vrochidis, Leo Wanner, Elisabeth André, Klaus Schoeffmann.

Social Events

This year, there were two main social events at MMM 2017: a welcome reception at the Video Browser Showdown, as discussed above, and the conference banquet. Optional tours then allowed participants to further enjoy their stay on the unique and beautiful island.

The conference banquet was held in two parts. First, we visited the exotic Blue Lagoon, which is widely recognised as one of the modern wonders of the world and one of the most popular tourist destinations in Iceland. MMM participants had the option of bathing for two hours in this extraordinary spa, and applying the healing silica mud to their skin, before heading back for the banquet in Reykjavík.

The banquet itself was then held at the Harpa Reykjavik Concert Hall and Conference Centre in downtown Reykjavík. Harpa is one of Reykjavik’s newest, yet greatest and most distinguished landmarks. It is a cultural and social centre in the heart of the city and features stunning views of the surrounding mountains and the North Atlantic Ocean.

Harpa, the venue of the conference banquet.

During the banquet, Steering Committee Chair Phoebe Chen gave a historical overview of the MMM conferences and announced the venues for MMM 2018 (Bangkok, Thailand) and MMM 2019 (Thessaloniki, Greece), before awards for the best contributions were presented. Finally, participants were entertained by a small choir, and were even asked to participate in singing a traditional Icelandic folk song.

MMM 2018 will be held at Chulalongkorn University in Bangkok, Thailand.


Acknowledgements

There are many people who deserve appreciation for their invaluable contributions to MMM 2017. First and foremost, we would like to thank our Program Committee Chairs, Laurent Amsaleg and Shin’ichi Satoh, who did excellent work in organizing the review process and helping us with the organization of the conference; indeed they are still hard at work with an MTAP special issue for selected papers from the conference. The Proceedings Chair, Gylfi Þór Guðmundsson, and Local Organization Chair, Marta Kristín Lárusdóttir, were also tirelessly involved in the conference organization and deserve much gratitude.

Other conference officers contributed to the organization and deserve thanks: Frank Hopfgartner and Esra Acar (demonstration chairs); Klaus Schöffmann, Werner Bailer and Jakub Lokoč (VBS Chairs); Yantao Zhang and Tao Mei (Sponsorship Chairs); all the Special Session Chairs listed above; the 150-strong Program Committee, who did an excellent job with the reviews; and the MMM Steering Committee, for entrusting us with the organization of MMM 2017.

Finally, we would like to thank our student volunteers (Atli Freyr Einarsson, Bjarni Kristján Leifsson, Björgvin Birkir Björgvinsson, Caroline Butschek, Freysteinn Alfreðsson, Hanna Ragnarsdóttir, Harpa Guðjónsdóttir), our hosts at Reykjavík University (in particular Arnar Egilsson, Aðalsteinn Hjálmarsson, Jón Ingi Hjálmarsson and Þórunn Hilda Jónasdóttir), the CP Reykjavik conference service, and all others who helped make the conference a success.

Report from ICACNI 2015


Report from the 3rd International Conference on Advanced Computing, Networking, and Informatics


Inauguration of 3rd ICACNI 2015

The 3rd International Conference on Advanced Computing, Networking and Informatics (ICACNI-2015), organized by the School of Computer Engineering, KIIT University, Odisha, India, was held during June 23-25, 2015.


Prof. Nikhil R. Pal during his keynote

The conference commenced with a keynote by Prof. Nikhil R. Pal (Fellow IEEE, Indian Statistical Institute, Kolkata, India) on ‘A Fuzzy Rule-Based Approach to Single Frame Super Resolution’.

Authors listening to technical presentations

Apart from the three regular tracks on advanced computing, networking, and informatics, the conference hosted three invited special sessions. While a total of more than 550 articles were received across the different tracks of the conference, 132 articles were finally selected for presentation and publication in Springer’s Smart Innovation, Systems and Technologies series as Volumes 43 and 44.

Prof. Nabendu Chaki during his technical talk

Extended versions of a few outstanding articles will be published in special issues of the Egyptian Informatics Journal and Innovations in Systems and Software Engineering (a NASA journal). The conference also showcased a technical talk by Prof. Nabendu Chaki (Senior Member IEEE, Calcutta University, India) on ‘Evolution from Web-based Applications to Cloud Services: A Case Study with Remote Healthcare’.

A click from award giving ceremony

The conference identified some wonderful works and gave away eight awards in different categories. The conference succeeded in bringing together academic scientists, professors, research scholars, and students to share and disseminate knowledge and scientific research related to the conference. The 4th ICACNI 2016 is scheduled to be held at the National Institute of Technology Rourkela, Odisha, India.

Summary of the 5th BAMMF


Bay Area Multimedia Forum (BAMMF)

BAMMF is a Bay Area Multimedia Forum series. Experts from both academia and industry are invited to exchange ideas and information through talks, tutorials, posters, panel discussions, and networking sessions. Topics of the forum include emerging areas in vision, audio, touch, speech, text, various sensors, human-computer interaction, natural language processing, machine learning, media-related signal processing, communication, and cross-media analysis. Talks at the event may cover advances in algorithms and development, demonstrations of new inventions, product innovation, business opportunities, etc. If you are interested in giving a presentation at the forum, please contact us.

The 5th BAMMF

The 5th BAMMF was held in the George E. Pake Auditorium in Palo Alto, CA, USA on November 20, 2014. The slides and videos of the speakers at the forum have been made available on the BAMMF web page, and we provide here an overview of their talks. For speakers’ bios, the slides and videos, please visit the web page.

Industrial Impact of Deep Learning – From Speech Recognition to Language and Multimodal Processing

Li Deng (Deep Learning Technology Center, Microsoft Research, Redmond, USA)

Since 2010, deep neural networks have started making real impact in speech recognition industry, building upon earlier work on (shallow) neural nets and (deep) graphical models developed by both speech and machine learning communities. This keynote will first reflect on the historical path to this transformative success. The role of well-timed academic-industrial collaboration will be highlighted, so will be the advances of big data, big compute, and seamless integration between application-domain knowledge of speech and general principles of deep learning. Then, an overview will be given on the sweeping achievements of deep learning in speech recognition since its initial success in 2010 (as well as in image recognition since 2012). Such achievements have resulted in across-the-board, industry-wide deployment of deep learning. The final part of the talk will focus on applications of deep learning to large-scale language/text and multimodal processing, a more challenging area where potentially much greater industrial impact than in speech and image recognition is emerging.

Brewing a Deeper Understanding of Images

Yangqing Jia (Google)

In this talk I will introduce the recent developments in the image recognition fields from two perspectives: as a researcher and as an engineer. For the first part I will describe our recent entry “GoogLeNet” that won the ImageNet 2014 challenge, including the motivation of the model and knowledge learned from the inception of the model. For the second part, I will dive into the practical details of Caffe, an open-source deep learning library I created at UC Berkeley, and show how one could utilize the toolkit for a quick start in deep learning as well as integration and deployment in real-world applications.

Applied Deep Learning

Ronan Collobert (Facebook)

I am interested in machine learning algorithms which can be applied in real-life applications and which can be trained on “raw data”. Specifically, I prefer to trade simple “shallow” algorithms with task-specific handcrafted features for more complex (“deeper”) algorithms trained on raw features. In that respect, I will present several general deep learning architectures which excel in performance on various Natural Language, Speech and Image Processing tasks. I will look into specific issues related to each application domain, and will attempt to propose general solutions for each use case.

Compositional Language and Visual Understanding

Richard Socher (Stanford)

In this talk, I will describe deep learning algorithms that learn representations for language that are useful for solving a variety of complex language tasks. I will focus on 3 projects:

  • Contextual sentiment analysis (e.g. having an algorithm that actually learns what’s positive in this sentence: “The Android phone is better than the IPhone”)
  • Question answering to win trivia competitions (like IBM Watson’s Jeopardy system but with one neural network)
  • Multimodal sentence-image embeddings to find images that visualize sentences and vice versa (with a fun demo!)

All three tasks are solved with a similar type of recursive neural network algorithm.


Report from SLAM 2014


ISCA/IEEE Workshop on Speech, Language and Audio in Multimedia

Following SLAM 2013 in Marseille, France, SLAM 2014 was the second edition of the workshop, held in Malaysia as a satellite of Interspeech 2014. The workshop was organized over two days, one for science and one for socializing and community building. With about 15 papers and 30 attendees, the highly-risky second edition of the workshop showed the will to build a strong scientific community at the frontier of speech and audio processing, natural language processing and multimedia content processing.

The first day featured talks covering various topics related to speech, language and audio processing applied to multimedia data. Two keynotes from Shri Narayanan (University of Southern California) and Min-Yen Kan (National University of Singapore) nicely completed the program.
The second day took us on a tour of Penang, followed by a visit to the campus of Universiti Sains Malaysia, home of the local organizers. The tour offered plenty of opportunities to strengthen the links between participants and build a stronger community, as expected. Most participants later went on to Singapore to attend Interspeech, the main conference in the domain of speech communication, where further discussions continued.

We hope to co-locate the next SLAM edition with a multimedia conference such as ACM Multimedia in 2015. Stay posted!

Report from ACM Multimedia 2013


Conference/Workshop Program Highlights

ACM Multimedia 2013 was held at the CCIB (Centre de Conventions Internacional de Barcelona) from October 21st to October 25th, 2013, in Barcelona. The Art Exhibition was held for the entire duration of the conference at the FAD (Forment de les Arts i del Disseny) in the center of the city, while the workshops were held in the Universitat Pompeu Fabra – Balmes building during the first two days of the conference (Oct. 21-Oct. 22). It was the first time the conference was held in Spain, and it offered a high-quality program and a few notable innovations. Dr. Nozha Boujemaa from INRIA, France, Dr. Alejandro Jaimes from Yahoo! Labs, Spain, and Prof. Nicu Sebe from the University of Trento, Italy, were the general co-chairs of the conference. Dr. Daniel Gatica-Perez from IDIAP & EPFL, Switzerland, Dr. David A. Shamma from Yahoo! Labs, USA, Prof. Marcel Worring from the University of Amsterdam, The Netherlands, and Prof. Roger Zimmermann from the National University of Singapore, Singapore, were the program co-chairs. The entire organization committee is listed in Appendix A. The number of participants was 544: the main conference was attended by 476 participants, of which 425 paid and 51 were special cases (sponsors, student volunteers, etc.), while 68 participants attended workshops only. The tutorials, which were free of charge, had 312 advance registrations. The multimedia art exhibition was open to the public from Oct. 21 to Oct. 28, and visited by more than 2,000 visitors. The total revenue of the conference was $318,151, and the surplus was $25,430.

The venue (CCIB)

Below is the list of the program components of Multimedia 2013.

  • Technical Papers: Full and Short papers
  • Keynote Talks
  • SIGMM Achievement Award Talk, Ph.D. Thesis Award Talk
  • Panel
  • Brave New Ideas
  • Multimedia Grand Challenge Solutions
  • Technical Demos
  • Open Source Software Competition
  • Doctoral Symposium
  • Art Exhibition and Reception
  • Tutorials
  • Workshops
  • Awards and Banquet

Innovations made for Multimedia 2013:

In an attempt to continuously improve ACM Multimedia and ensure its vibrant role for the multimedia community, we made a number of enhancements for this year’s conference:

  • The Technical Program Committee defined twelve Technical Areas as the major foci for this year’s conference, including new Technical Areas for Music & Audio and Crowdsourcing to reflect their growing interest and promise. We also changed the names of some traditional Technical Areas and provided an extensive description of each area to help authors choose the most appropriate Technical Area for their manuscripts.
  • We introduced a new role in the organization of the conference: the author’s advocate, whose explicit role was to listen to the authors and to help them if reviews were clearly below average quality. The authors could request the mediation of the author’s advocate after the reviews had been sent to them, and they had to clearly justify why such mediation was needed (i.e., that the reviews or the meta-review were below average quality). The task of the advocate was to investigate the matter carefully and to request an additional review or a reexamination of the decision on the particular manuscript. This year, the author’s advocate was Pablo Cesar from CWI, The Netherlands.
  • We decided to keep a couple of plenary sessions that brought singular focus to conference activities: keynotes, the Multimedia Grand Challenge competition, the Best Paper session, and the Technical Achievement Award and Best PhD Award sessions. The other technical sessions were held in parallel to allow pursuit of more specialized interests at the conference. We limited the number of parallel sessions to no more than 3 to minimize the risk of overlapping interests.
  • We used video spotlights to advertise the works to be presented. These were meant to offer all attendees an opportunity to become aware of the content of each paper, and thus to be attracted to attend the corresponding poster or talk.
  • Workshops and Tutorials were held on separate days from the main conference in order to reduce conflicts with the regular Technical Program.
  • The Multimedia Art Exhibition featured both invited and selected artists. It was open for the duration of the conference in the satellite venue located in the center of the city.
  • Following the last two years’ precedent, Tutorials were made free for all participants.
  • Recognizing that students are the lifeblood of our next generation of multimedia thinkers, this year’s Student Travel Grant was greatly expanded. We had a total amount of $26,000 received from SIGMM ($16,000) and NSF ($10,000) that supported 35 students.
  • Finally, we decided to provide open access for the community to the proceedings in the ACM Digital Library. As such, no USB proceedings were handed out to participants, encouraging everyone to get online access.

Technical Program

Following the guidelines of the ACM Multimedia Review Committee, the conference was structured into 12 Areas, with a two-tier TPC, a double-blind review process, and a target acceptance rate of 20% for long papers and 27.7% for short papers. Based on the experience from ACM Multimedia 2012 and the responses to our “Call for Areas” that we issued to the community, we selected the following Areas.

  1. Art, Entertainment, and Culture
  2. Authoring and Collaboration
  3. Crowdsourcing
  4. Media Transport and Delivery
  5. Mobile & Multi-device
  6. Multimedia Analysis
  7. Multimedia HCI
  8. Music & Audio
  9. Search, Browsing, and Discovery
  10. Security and Forensics
  11. Social Media & Presence
  12. Systems and Middleware

The Technical Program Committee was first created by appointing Area Chairs (ACs). A total of 29 colleagues agreed to serve in this role. Each Area was represented by two ACs, with the exception of two Areas (Multimedia Analysis and Search, Browsing, and Discovery) whose scope has traditionally attracted the largest proportion of papers and so required further coordination. The added topic diversity brought an increase in gender diversity to the ACs, from approximately 12% in previous years to 22% for 2013. We also made a conscious effort to bring new talent and excellence into the community and to better represent emerging trends in the field. For this, we appointed many young and well-recognized ACs who served in this role for the first time. For each junior AC, we co-appointed a senior researcher as their co-AC to aid in their shepherding. In a second step, the Area Chairs were responsible for appointing the TPC members (reviewers) for their coordinated areas. This was a large effort to grow the TPC base for the conference as well as to ensure that proper expertise was represented in each area. We coupled this with a hard goal of limiting the number of submissions assigned to each TPC member for review. For example, two years ago, the average number of papers assigned to a reviewer was 9, with over 38% of the approximately 225 TPC members receiving 10 or more papers to review. With our design, we had a total of 398 reviewers receiving an average of 4.13 papers per reviewer. While we were unable to keep a hard ceiling, only 2.51% of the TPC received 10 or more papers to review, all of them TPC members who had agreed to serve in more than one area. The Area Chairs were in charge of assigning all papers for review, and each submission was reviewed double-blind by three TPC members.
Reviews and reviewer assignments of papers co-authored by Area Chairs, Program Chairs, and General Chairs were handled by Program Chairs who had no conflict of interest in each specific case. Another novelty introduced in the reviewing process was to set the paper submission deadline significantly earlier than in previous years, in order to allocate more time for reviews, rebuttals, discussions, and final decisions. Despite the reduced time given to authors, the response to the Call for Papers was enthusiastic, with a total of 235 long papers and 278 short papers going through review. The authors of long papers were asked to write a rebuttal after receiving the reviews. A new element in the reviewing process was the introduction of the Author’s Advocate, created to provide authors with an independent channel to express concerns about the quality of the reviews of their papers and to raise a flag about these reviews. All cases were brought to the attention of the corresponding Area Chair. After evaluating each case reported to him (16 reviews out of 761 long paper reviews), the Author’s Advocate recommended in 5 cases that new reviews be generated and added to the discussion. The reviewers had a period for on-line discussion of reviews and rebuttals, after which the Area Chairs drafted a meta-review for each paper. Decisions on long and short papers were made at the TPC meeting held at the University of Amsterdam on June 11, 2013. The meeting was physically attended by one of the General Chairs, three of the Program Chairs, the Author’s Advocate, and 86% of the ACs. Many of the ACs who were unable to attend were tele-present online for the discussions. On the first half day of the TPC meeting, the Area Chairs worked in breakout sessions to discuss the papers that were weak accepts and weak rejects, with the exception of conflict-of-interest papers, which were handled out of band as previously mentioned.
In the second half of the first day, the ACs met in a plenary session where they reviewed the clear accepts and defended the decisions on the borderline papers based on the papers themselves, reviews, meta-reviews, on-line discussions, and authors’ rebuttal comments. In many cases, an emergency reviewer was added if there was a clear intersection with a related submission area. If a paper discussed during the plenary session posed a conflict of interest with an Area, Program, or General Chair, that chair was excused from the room. On June 12, 2013, the Program Chairs finalized the process and the conference program in a separate meeting, arranging the sessions by thematic narratives rather than by submission area to promote cross-area conversations during the conference itself. The review process resulted in an overall acceptance rate of 20.0% for long papers and 27.7% for short papers (the distribution of submissions and the acceptance rate for each of the 12 areas is shown in the graph below). All accepted long papers were shepherded by the Area Chairs themselves or by qualified TPC members, who were in charge of verifying that the revised papers adequately addressed the concerns raised by the reviewers and the changes promised by the authors in their rebuttals. This step ensured that all of the accepted papers were of the highest possible quality. In addition, four papers with high review scores were nominated at the TPC meeting as candidates for the Best Paper Award. Each nominated paper had to be successfully championed and defended by the ACs from its area. The winner was announced at the Conference Banquet.

ACM Multimedia 2013 Program at a Glance

The entire program of ACM Multimedia 2013 is shown below.

Workshop session

Conference venue

Opening ceremony

Keynote presentation

Poster/Demo session

SIGMM Achievement Award Talk

Keynote Talks

Multimedia Framed
Dr. Elizabeth F. Churchill (eBay Research Labs)
Wednesday, Oct. 23, 2013

Abstract: Multimedia is the combination of several media forms. Information designers, educationalists and artists are concerned with questions such as: Is text, or audio or video, or a combination of all three, the best format for the message? Should another modality (e.g., haptics/touch, olfaction) be invoked instead to make the message more effective and/or the experience more engaging? How does the setting affect perception/reception? How does framing affect people’s experience of multimedia? How is the artifact changed through interaction with audience members? In this presentation, I will talk about people’s experience of multimedia artifacts like videos. I will discuss the ways in which framing affects how we experience multimedia. Framing can be intentional: scripted creations produced with clear intent by technologists, designers, media producers, media artists, film-makers, archivists, documentarians and architects. Framing can also be unintentional. Everyday acts of interest and consumption turn us, the viewers, into co-producers of the experiences of the multimedia artifacts we have viewed. We download, annotate, comment and share multimedia artifacts online. Our actions are reflected in view counts, displayed comments and content ranking. Our actions therefore change how multimedia artifacts are interpreted and understood by others. Drawing on examples from the history of film and of performance art, from current social media research and from research conducted with collaborators over the past 16 years, I will illustrate how content understanding is modulated by context, by the “framing” of the content.
I will consider three areas of research that are addressing the issue of framing, and that have implications for our understanding of ‘multimedia’ consumption, now and in the future: (1) the psychology and psychophysiology of multimedia as multimodal experience; (2) emerging practices with contemporary social media capture and sharing from personal devices; and (3) innovations in social media and audience analytics focused on more deeply understanding media consumption. I will conclude with some technical excitements, design/development challenges and experiential possibilities that lie ahead.

Dr. Elizabeth Churchill is Director of Human Computer Interaction at eBay Research Labs (ERL) in San Jose, California. Formerly a Principal Research Scientist at Yahoo! Research, she founded, staffed and managed the Internet Experiences Group. Until September of 2006, she worked at the Palo Alto Research Center (PARC), California, in the Computing Science Lab (CSL). Prior to that she formed and led the Social Computing Group at FX Palo Alto Laboratory, Fuji Xerox’s research lab in Palo Alto. Originally a psychologist by training, throughout her career Elizabeth has focused on understanding people’s social and collaborative interactions in their everyday digital and physical contexts. With over 100 peer-reviewed publications and 5 edited books, topics she has written about include implicit learning, human-agent systems, mixed-initiative dialogue systems, social aspects of information seeking, digital archive and memory, and the development of emplaced media spaces. She has been a regular columnist for ACM interactions since 2008. Elizabeth has a BSc in Experimental Psychology and an MSc in Knowledge Based Systems, both from the University of Sussex, and a PhD in Cognitive Science from the University of Cambridge. In 2010, she was recognised as a Distinguished Scientist by the Association for Computing Machinery (ACM).
Elizabeth is the current Executive Vice President of ACM SIGCHI (the Human Computer Interaction Special Interest Group). She is a Distinguished Visiting Scholar at Stanford University’s Media X, the industry affiliate program to Stanford’s H-STAR Institute.

The Space between the Images
Leonidas J. Guibas (Stanford University)
Thursday, Oct. 24, 2013

Abstract: Multimedia content has become a ubiquitous presence on all our computing devices, spanning the gamut from live content captured by device sensors such as smartphone cameras to immense databases of images, audio and video stored in the cloud. As we try to maximize the utility and value of all these petabytes of content, we often do so by analyzing each piece of data individually and foregoing a deeper analysis of the relationships between the media. Yet with more and more data, there will be more and more connections and correlations, because the data captured comes from the same or similar objects, or because of particular repetitions, symmetries or other relations and self-relations that the data sources satisfy. This is particularly true for media of a geometric character, such as GPS traces, images, videos, 3D scans, 3D models, etc. In this talk we focus on the “space between the images”, that is, on expressing the relationships between different multimedia data items. We aim to make such relationships explicit, tangible, first-class objects that themselves can be analyzed, stored, and queried, irrespective of the media they originate from. We discuss mathematical and algorithmic issues on how to represent and compute relationships or mappings between media data sets at multiple levels of detail. We also show how to analyze and leverage networks of maps and relationships, small and large, between inter-related data. The network can act as a regularizer, allowing us to benefit from the “wisdom of the collection” in performing operations on individual data sets or in map inference between them.
We will illustrate these ideas using examples from the realm of 2D images and 3D scans/shapes, but these notions are more generally applicable to the analysis of videos, graphs, acoustic data, biological data such as microarrays, homeworks in MOOCs, etc. This is an overview of joint work with multiple collaborators, as will be discussed in the talk.

Prof. Leonidas Guibas obtained his Ph.D. from Stanford under the supervision of Donald Knuth. His main subsequent employers were Xerox PARC, DEC/SRC, MIT, and Stanford. He is currently the Paul Pigott Professor of Computer Science (and by courtesy, Electrical Engineering) at Stanford University. He heads the Geometric Computation group and is part of the Graphics Laboratory, the AI Laboratory, the Bio-X Program, and the Institute for Computational and Mathematical Engineering. Professor Guibas’ interests span geometric data analysis, computational geometry, geometric modeling, computer graphics, computer vision, robotics, ad hoc communication and sensor networks, and discrete algorithms. Some well-known past accomplishments include the analysis of double hashing, red-black trees, the quad-edge data structure, Voronoi-Delaunay algorithms, the Earth Mover’s distance, Kinetic Data Structures (KDS), Metropolis light transport, and the Heat-Kernel Signature. Professor Guibas is an ACM Fellow, an IEEE Fellow and winner of the ACM Allen Newell Award.


SIGMM Achievement Award Talk
Dick Bulterman, CWI, The Netherlands
Friday, Oct. 25, 2013

The 2013 winner of the SIGMM award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications is Prof. Dr. Dick Bulterman. The ACM SIGMM Technical Achievement award is given in recognition of outstanding contributions over a researcher’s career. Prof. Dick Bulterman was selected for his outstanding technical contributions in multimedia authoring, media annotation, and social sharing, from research through standardization to entrepreneurship, and in particular for promoting international Web standards for multimedia authoring and presentation (SMIL) in the W3C Synchronized Multimedia Working Group, as well as for his dedicated involvement in the SIGMM research community over many years. Dr. Bulterman has been a long-time intellectual leader in the area of temporal modeling and support for complex multimedia systems. His research has led to the development of several widely used multimedia authoring systems and players. He developed the Amsterdam Hypermedia Model, the CMIF document structure, the CMIFed authoring environment, the GRiNS editor and player, and a host of multimedia demonstrator applications. In 1999, he started the CWI spinoff company Oratrix Development BV, working as its CEO to deliver this software widely. He currently heads the Distributed and Interactive Systems group at Centrum Wiskunde & Informatica (CWI) in Amsterdam, The Netherlands, and is also a Full Professor of Computer Science at Vrije Universiteit, Amsterdam. His research interests are multimedia authoring and document processing. Dick has a strong international reputation for the development of the domain-specific temporal language for multimedia (SMIL). Much of this software has been incorporated into the widely used Ambulant Open Source SMIL Player, which has served to encourage the development and use of time-based multimedia content.
His conference publications and book on SMIL have helped to promote SMIL and its acceptance as a W3C standard. Dick’s recent work on social sharing of video will likely prove influential in upcoming Interactive TV products. This work has already been recognized in the academic community, earning the ACM SIGMM best paper award at ACM MM 2008 as well as a best paper award at the EUROITV conference.

SIGMM Ph.D. Thesis Award Talk
Xirong Li, Renmin University, China
Friday, Oct. 25, 2013

The SIGMM Ph.D. Thesis Award Committee recommended that this year’s award for the outstanding Ph.D. thesis in multimedia computing, communications and applications be given to Dr. Xirong Li. The committee considered Dr. Li’s dissertation, titled “Content-based visual search learned from social media”, worthy of the award as it substantially extends the boundaries for developing content-based multimedia indexing and retrieval solutions. In particular, it provides fresh new insights into the possibilities for realizing image retrieval solutions in the presence of the vast information that can be drawn from social media. The committee considered the main innovation of Dr. Li’s work to be the development of theory and algorithms providing answers to the following challenging research questions: (a) what determines the relevance of a social tag with respect to an image, (b) how to fuse tag relevance estimators, (c) which social images are informative negative examples for concept learning, (d) how to exploit socially tagged images for visual search, and (e) how to personalize automatic image tagging with respect to a user’s preferences. The significance of the developed theory and algorithms lies in their power to enable effective and efficient deployment of the information collected from social media to enhance the datasets that can be used to learn automatic image indexing mechanisms (visual concept detection) and to make this learning more personalized for the user. Dr. Xirong Li received the B.Sc. and M.Sc.
degrees from Tsinghua University, China, in 2005 and 2007, respectively, and the Ph.D. degree from the University of Amsterdam, The Netherlands, in 2012, all in computer science. The title of his thesis is “Content-based visual search learned from social media”. He is currently an Assistant Professor in the Key Lab of Data Engineering and Knowledge Engineering, Renmin University of China. His research interests are image search and multimedia content analysis. Dr. Li received the IEEE Transactions on Multimedia Prize Paper Award 2012, a Best Paper Nomination at the ACM International Conference on Multimedia Retrieval 2012, the Chinese Government Award for Outstanding Self-Financed Students Abroad 2011, and the Best Paper Award of the ACM International Conference on Image and Video Retrieval 2010. He served as publicity co-chair for ICMR 2013.

Panel: Cross-Media Analysis and Mining
Wednesday, Oct. 23, 2013
Panelists: Mark Zhang, Alberto del Bimbo, Selcuk Candan, Alexander Hauptmann, Ramesh Jain, Alexis Joly, Yueting Zhuang

Motivation: Today there are lots of heterogeneous and homogeneous media data from multiple sources, such as news media websites, microblogs, mobile phones, social networking websites, and photo/video sharing websites. Integrated together, these media data represent different aspects of the real world and help document its evolution. Consequently, it is impossible to correctly conceive and appropriately understand the world without exploiting the data available from these different sources of rich multimedia content simultaneously and synergistically. Cross-media analysis and mining is a research area in the general field of multimedia content analysis which focuses on the exploitation of data with different modalities from multiple sources simultaneously and synergistically to discover knowledge and understand the world.
Specifically, we emphasize two essential elements in the study of cross-media analysis that help differentiate it from the rest of the research in multimedia content analysis or machine learning. The first is the simultaneous co-existence of data from two or more different data sources. This element reflects the concept of “cross”, e.g., cross-modality, cross-source, and cross-space from cyberspace to reality. Cross-modality means that heterogeneous features are obtained from data in different modalities; cross-source means that the data may be obtained across multiple sources (domains or collections); cross-space means that the virtual world (i.e., cyberspace) and the real world (i.e., reality) complement each other. The second is the leverage of different types of data across multiple sources for strengthening knowledge discovery, for example, discovering the (latent) correlation or synergy between data with different modalities across multiple sources, transferring the knowledge learned from one domain (e.g., a modality or a space) to generate knowledge in another related domain, and generating a summary from the data of multiple sources. These two essential elements help promote cross-media analysis and mining as a new, emerging, and important research area in today’s multimedia research. With its emphasis on knowledge discovery, cross-media analysis differs from traditional research areas such as cross-lingual translation. On the other hand, by leveraging different types of data across multiple sources to strengthen knowledge discovery, cross-media analysis and mining addresses a broader series of problems than traditional research areas such as transfer learning. Overall, cross-media analysis and mining is beneficial for many applications in data mining, causal inference, machine learning, multimedia, and public security.
Like other emerging hot topics in multimedia research, cross-media analysis and mining also has a number of fundamental and controversial issues that must be addressed in order to reach a full and complete understanding of research in this topic. These issues include, but are not limited to: whether there exists a unified representation or modeling for the same semantic concept across different media, and if so, what that unified representation or modeling is; whether there exists any “law” that governs topic evolution and development over time in different media, and if so, what that “law” is and how it is formulated; and whether there exists a mapping for a conceptual or semantic activity between cyberspace and the real world, and if so, what that mapping is and how it is developed and formulated.

Brave New Ideas Program

Brave New Ideas addressed long-term research challenges, pointed to new research directions, or provided new insights or brave perspectives that pave the way to innovation. The selection process was different from that for regular papers. First, the submission of a 2-page abstract was requested. After a first selection, a full paper was required for the selected abstracts, which was then reviewed. We received 38 submissions for the first stage, 14 were invited to submit a full paper for the second reviewing stage, and finally 6 papers were accepted, forming two sessions of oral presentations.

Multimedia Grand Challenge Solutions

We received the six challenges shown below for the Multimedia Grand Challenge Solutions program.

  1. NHK – Where is beauty? Grand Challenge
  2. Technicolor – Rich Multimedia Retrieval from Input Videos Grand Challenge
  3. Yahoo! – Large-scale Flickr-tag Image Classification Grand Challenge
  4. Huawei/3DLife – 3D human reconstruction and action recognition Grand Challenge
  5. MediaMixer/VideoLectures.NET – Temporal Segmentation and Annotation Grand Challenge
  6. Microsoft: MSR – Bing Image Retrieval Grand Challenge

We received 34 proposals for this program, and 14 of them were accepted for presentation. To promote submissions, all presentations in this program were recognized as Multimedia Grand Challenge Finalists. The best prize and two second-best prizes were chosen and awarded. At the request of Technicolor, a Grand Challenge Multimodal Prize was also chosen and awarded.

Technical Demonstrations

We received 80 excellent technical demonstration proposals, a number in line with the demonstrations received in the previous year. Three reviewers were assigned to each demo proposal, and 40 proposals were finally chosen. The best demo prize was awarded.

Open Source Software Competition

This year marked the 6th edition of the Open Source Software Competition as part of the ACM Multimedia program. The goal of this competition is to praise the invaluable contribution of researchers and software developers who advance the field by providing the community with implementations of codecs, middleware, frameworks, toolkits, libraries, applications, and other multimedia software. This year we received 16 submissions; after assigning three reviewers to each, we selected 11 for the competition. The best open source software was awarded.

Doctoral Symposium

The Doctoral Symposium was meant as a forum for mentoring graduate students. It was held in the afternoon of Oct. 25 in both oral and poster formats. We received 19 proposals and accepted 13 presentations (6 oral + poster and 7 additional posters). Additionally, a Doctoral Symposium lunch was organized, at which the students had the opportunity to talk to their assigned mentors. Finally, the best doctoral symposium paper was awarded.

Multimedia Art Exhibition and Reception

ACM Multimedia provided a rich Multimedia Art Exhibition to stimulate artists and researchers alike to meet and discover the frontiers of multimedia artistic communication.
The Art Exhibition attracted significant work from a variety of digital artists collaborating with research institutions. We endeavored to select exhibits that achieved an interesting balance between technology and artistic intent. The techniques underpinning these artworks are relevant to several technical tracks of the conference, in particular those dealing with human-centered and interactive media. We had a satellite venue for the art exhibition, FAD (Foment de les Arts i del Disseny), located in the center of the city, with very good public access. The exhibition was open from Oct. 21 to Oct. 28 and was visited by more than 2,000 visitors. A reception event was held with the artists on Oct. 23. We selected 10 artworks for the exhibition:

  1. Emotion Forecast, Maurice Benayoun (City University of Hong Kong)
  2. Critical, Anabela Costa (France)
  3. Smile-Wall, Shen-Chi Chen, He-Lin Luo, Kuan-Wen Chen, Yu-Shan Lin, Hsiao-Lun Wang, Che-Yao Chan, Kai-Chih Huang, Yi-Ping Hung (National Taiwan University)
  4. SOMA, Guillaume Faure (France)
  5. A Feast of Shadow Puppetry, Zhenzhen Hu, Min Lin, Si Liu, Jiangguo Jiang, Meng Wang, Richang Hong, Shuicheng Yan (Hefei University of Technology and NUS)
  6. Tele Echo Tube, Hill Hiroki Kobayashi, Kaoru Saito, Akio Fujiwara (University of Tokyo)
  7. 3D-Stroboscopy, Sujin Lee (Sogang University, South Korea)
  8. The Qi of Calligraphy, He-Lin Luo, Yi-Ping Hung (National Taiwan University), I-Chun Chen (Tainan National University of the Arts)
  9. Gestural Pen Animation, Sheng-Ying Pao and Kent Larson (MIT Media Lab, USA)
  10. MixPerceptions, Jose San Pedro (Telefonica Research, Spain), Aurelio San Pedro (Escola Massana, Barcelona), Juan Pablo Carrascal (UPF, Barcelona), Matylda Szmukier (Telefonica Research, Spain)

Attending the Art Exhibition

San Pedro’s Mix Perceptions


Tutorials

We received 14 tutorial proposals and selected 8 for the main program. All tutorials were half-day and were held on Oct. 21 and 22, in parallel with the workshops, in the Universitat Pompeu Fabra – Balmes building. Tutorials were made free for all participants, and we received 312 pre-registrations.

Tutorial 1: Foundations and Applications of Semantic Technologies for Multimedia Content
Ansgar Scherp (Uni Mannheim, Germany)

Tutorial 2: Towards Next-Generation Multimedia Recommendation Systems
Jialie Shen (SMU, Singapore), Shuicheng Yan (NUS), Xian-Sheng Hua (Microsoft)

Tutorial 3: Crowdsourcing for Multimedia Research
Mohammad Soleymani (Imperial College London), Martha Larson (TU Delft)

Tutorial 4: Massive-Scale Multimedia Semantic Modeling
John R. Smith (IBM Research), Liangliang Cao (IBM Research)

Tutorial 5: Social Interactions over Geographic-Aware Multimedia Systems
Roger Zimmermann (NUS), Yi Yu (NUS)

Tutorial 6: Multimedia Information Retrieval: Music and Audio
Markus Schedl (JKU Linz), Emilia Gomez (UPF), Masataka Goto (AIST)

Tutorial 7: Blending the Physical and the Virtual in Musical Technology: From Interface Design to Multimodal Signal Processing
George Tzanetakis (U Victoria, Canada), Sidney Fels (UBC), Michael Lyons (Ritsumeikan U, JP)

Tutorial 8: Privacy Concerns of Sharing Multimedia in Social Networks
Gerald Friedland (ICSI)


Workshops have always been an important part of the conference. Below is the list of workshops held in conjunction with ACM Multimedia 2013. We had 9 full-day workshops and 4 half-day workshops, held on Oct. 21-22 in parallel with the tutorials. Following last year's rule, two complimentary workshop-only registrations were provided for the invited speakers of each workshop, to encourage the participation of notable speakers.

Full Day Workshops (9)

  1. 2nd International Workshop on Socially-Aware Multimedia (SAM 2013) Organizers: Pablo Cesar (CWI, NL), Matthew Cooper (FXPAL), David A. Shamma (Yahoo!), Doug Williams (BT)
  2. 4th ACM/IEEE ARTEMIS 2013 International Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams Organizers: Marco Bertini (University of Florence, Italy), Anastasios Doulamis (TU Crete, Greece), Nikolaos Doulamis (Cyprus University of Technology, Cyprus), Jordi Gonzàlez (Universitat Autònoma de Barcelona, Spain), Thomas Moeslund (University of Aalborg, Denmark)
  3. 5th International Workshop on Multimedia for Cooking and Eating Activities (CEA2013) Organizer: Kiyoharu Aizawa (Univ. of Tokyo, JP)
  4. 4th International Workshop on Human Behavior Understanding (HBU 2013) Organizers: Albert Ali Salah (Boğaziçi Univ., Turkey), Hayley Hung (Delft Univ. of Technology, The Netherlands), Oya Aran (Idiap Research Institute, Switzerland), Hatice Gunes (Queen Mary Univ. of London (QMUL), UK)
  5. International ACM Workshop on Crowdsourcing for Multimedia 2013 (CrowdMM 2013) Organizers: Wei-Ta Chu (National Chung Cheng University, TW), Martha Larson (Delft University of Technology, NL), Kuan-Ta Chen (Academia Sinica, TW)
  6. First ACM MM Workshop on Multimedia Indexing and Information Retrieval for Healthcare (ACM MM MIIRH) Organizers: Jenny Benois-Pineau (University of Bordeaux 1, France), Alexia Briasouli (CERTH-ITI), Alex Hauptman (Carnegie Mellon University, USA)
  7. Workshop on Personal Data Meets Distributed Multimedia Organizers: Vivek Singh (MIT, USA), Tat-Seng Chua (NUS), Ramesh Jain (University of California, Irvine, USA), Alex (Sandy) Pentland (MIT, USA)
  8. Workshop on Immersive Media Experiences Organizers: Teresa Chambel (University of Lisbon, Portugal), V. Michael Bove (MIT Media Lab, USA), Sharon Strover (University of Texas at Austin, USA), Paula Viana (Polytechnic of Porto and INESC TEC, Portugal), Graham Thomas (BBC, UK)
  9. Workshop on Event-based Media Integration and Processing Organizers: Fausto Giunchiglia (University of Trento, Italy), Sang “Peter” Chin (Johns Hopkins University, US), Giulia Boato (University of Trento, Italy), Bogdan Ionescu (University Politehnica of Bucharest, Romania), Yiannis Kompatsiaris (Centre for Research and Technology Hellas, Greece)

Half Day Workshops (4)

  1. ACM Multimedia Workshop on Geotagging and Its Applications Organizers: Liangliang Cao (IBM T. J. Watson Research Center, USA), Gerald Friedland (International Computer Science Institute, USA), Pascal Kelm (Technische Universitaet of Berlin, Germany)
  2. Data-driven challenge-based workshop at ACM MM 2013 (AVEC 2013) Organizers: Björn Schuller (TUM, Germany), Michel Valstar (University of Nottingham, UK), Roddy Cowie (Queen’s University Belfast, UK), Maja Pantic (Imperial College London, UK), Jarek Krajewski (University of Wuppertal, Germany)
  3. 2nd ACM International Workshop on Multimedia Analysis for Ecological Data (MAED 2013) Organizers: Concetto Spampinato (University of Catania, Italy), Vasileios Mezaris (CERTH, Greece), Jacco van Ossenbruggen (CWI, The Netherlands)
  4. 3rd International Workshop on Interactive Multimedia on Mobile and Portable Devices (IMMPD’13) Organizers: Jiebo Luo (University of Rochester, USA), Caifeng Shan (Philips Research, The Netherlands), Ling Shao (The University of Sheffield, UK), Minoru Etoh (NTT DOCOMO, Japan)


Awards were presented during the banquet, which was organized at the conference venue, for almost all program components except short papers. The following awards were given:

  • Best Paper Award: Luoqi Liu, Hui Xu, Junliang Xing, Si Liu, Xi Zhou and Shuicheng Yan, National University of Singapore (NUS), “Wow! You Are So Beautiful Today!”
  • Best Student Paper Award: Hanwang Zhang, Zheng-Jun Zha, Yang Yang, Shuicheng Yan, Yue Gao and Tat-Seng Chua, National University of Singapore (NUS), “Attributes-augmented Semantic Hierarchy for Image Retrieval”
  • Grand Challenge 1st Place Award [Sponsored by Technicolor]: Brendan Jou, Hongzhi Li, Joseph G. Ellis, Daniel Morozoff-Abegauz and Shih-Fu Chang, Digital Video & Multimedia (DVMM) Lab, Columbia University, “Structured Exploration of Who, What, When, and Where in Heterogeneous Multimedia News Sources”
  • Grand Challenge 2nd Place Award [Sponsored by Technicolor]: Subhabrata Bhattacharya, Behnaz Nojavanasghari, Tao Chen, Dong Liu, Shih-Fu Chang, Mubarak Shah, University of Central Florida and Columbia University, “Towards a Comprehensive Computational Model for Aesthetic Assessment of Videos”
  • Grand Challenge 3rd Place Award [Sponsored by Technicolor]: Shannon Chen, Penye Xia, and Klara Nahrstedt, UIUC, “Activity-Aware Adaptive Compression: A Morphing-Based Frame Synthesis Application in 3DTI”

Program chairs during the banquet

Award ceremony

Banquet venue

Social program

  • Grand Challenge Multimodal Award [Sponsored by Technicolor]: Chun-Che Wu, Kuan-Yu Chu, Yin-Hsi Kuo, Yan-Ying Chen, Wen-Yu Lee, Winston H. Hsu, National Taiwan University, Taiwan, “Search-Based Relevance Association with Auxiliary Contextual Cues”
  • Best Demo Award: Duong-Trung-Dung Nguyen, Mukesh Saini, Vu-Thanh Nguyen, Wei Tsang Ooi, National University of Singapore (NUS), “Jiku director: An online mobile video mashup system”
  • Best Doctoral Symposium Paper: Jules Francoise, Institut de Recherche et Coordination Acoustique/Musique (IRCAM), “Gesture-Sound Mapping by Demonstration in Interactive Music Systems”
  • Best Open Source Software Award: Dmitry Bogdanov, Nicolas Wack, Emilia Gómez, Sankalp Gulati, Perfecto Herrera, Oscar Mayor, Gerard Roma, Justin Salamon, Jose Zapata, Xavier Serra (UPF), “ESSENTIA: An Audio Analysis Library for Music Information Retrieval”

Prize amounts:

Best Paper Award 500 euro
Best Student Paper Award 250 euro
Grand Challenge 1st Prize 750 euro
Grand Challenge 2nd Prize 500 euro
Grand Challenge 3rd Prize 200 euro
Grand Challenge Multimodal Prize 500 euro
Best Technical Demo Award 250 euro
Best Doctoral Symposium Paper 250 euro
Best Open Source Software Award 250 euro
Student Travel Grant (35 students) $26,000 ($10,000 NSF, $16,000 SIGMM)

Sponsors: We received incredible support from industry and funding organizations (38.5k euro). All sponsors and institutional supporters are listed in Appendix B. The sponsoring amount for each individual sponsor is as follows:

Sponsor Amount
FXPAL 5000 euro
Google 5000 euro
Huawei 5000 euro
Yahoo!Labs 5000 euro
Technicolor 4000 euro
Media Mixer 3500 euro
INRIA 3000 euro
Facebook 2000 euro
IBM 2000 euro
Telefonica 2000 euro
Microsoft 2000 euro
Total 38500 euro

The benefits for the sponsors were complimentary registrations and publicity: the company logo was published on the conference website, in the proceedings, and in the booklet. On top of these amounts, we received $16,000 from SIGMM and $10,000 from NSF for student travel grants.

Geographical distribution of the participants

We had 544 participants at the main conference and workshops. The main conference was attended by 476 participants, of which 425 paid and 51 were special cases (sponsors, student volunteers, etc.); 68 participants attended only the workshops. The tutorials, which were free of charge, received 312 advance registrations. The country-wise distribution is shown below. As the list shows, the geographical distribution was wide, meaning that we managed to attract participants from a large number of countries.

Total number of participants: 544

USA 75            Switzerland 20
Singapore 48      Germany 20
China 45          Portugal 20
Japan 40          Taiwan 18
UK 35             Korea 15
Italy 29          Australia 15
France 28         Greece 14
Netherlands 26    Turkey 14
Spain 26          25 other countries 56


In order to gather opinions from the participants of ACM Multimedia 2013, we performed a post-conference survey; the results are summarized in Appendix C. Here we summarize the 10 most important issues compiled after analyzing the answers received. This effort is the first of its kind at ACM Multimedia, and we hope the tradition will be continued in the future. In our opinion, the survey results are a very good source of information for future organizers.

  1. Poster space too small
  2. Many people still want USB proceedings!!
  3. Oral topics in the same time slot overlapped too much. Need to diversify.
  4. Need to attract more multimedia niche topics. Should not become a second-rate CV conference
  5. First day location hard to find. Workshop/tutorial better to be co-located with main conference
  6. Senior members of MM community should participate in paper sessions more
  7. Need to update web site program content and make it available earlier
  8. Consider offering short spotlight talks for poster papers
  9. Keep 15 mins for oral, but have them presented again in poster session for more discussion
  10. SIGMM business meeting too long. Not enough time for Q&A.


ACM Multimedia 2013 was a great success with a great number of submissions, an excellent technical program, attractive program components, and stimulating events. As a result, we welcomed a large number of participants, in line with our initial expectations. There were a few problems (see above), but this is only natural. We greatly acknowledge those who have contributed to the success of ACM Multimedia 2013. We thank the organizers of ACM Multimedia 2012 for their useful suggestions and comments, which helped us improve the organization of the 2013 edition, and for giving us the template for the conference booklet. We thank the many paper authors and proposal contributors for the various technical and program components. We thank the large number of volunteers, including the Organizing Committee members and Technical Program Committee members, who worked very hard to create this year’s outstanding conference. Every aspect of the conference was also aided by local committee members and by the hard work of Grupo Pacifico, to whom we are very grateful. We also thank the ACM staff and Sheridan Printing Company for their constant support. This success was clearly due to the integration of their efforts.


General Co-Chairs: Alejandro (Alex) Jaimes (Yahoo Labs, Spain), Nicu Sebe (University of Trento, Italy), Nozha Boujemaa (INRIA, France)
Technical Program Co-Chairs: Daniel Gatica-Perez (IDIAP & EPFL, Switzerland), David A. Shamma (Yahoo Labs, USA), Marcel Worring (University of Amsterdam, The Netherlands), Roger Zimmermann (National University of Singapore, Singapore)
Author’s Advocate: Pablo Cesar (CWI, The Netherlands)
Multimedia Grand Challenge Co-Chairs: Yiannis Kompatsiaris (CERTH, Greece), Neil O’Hare (Yahoo Labs, Spain)
Interactive Arts Co-Chairs: Antonio Camurri (University of Genova, Italy), Marc Cavazza (Teesside University, UK)
Local Arrangement Chair: Mari-Carmen Marcos (Pompeu Fabra University, Spain)
Sponsorship Chairs: Ricardo Baeza-Yates (Yahoo Labs, Spain), Bernard Merialdo (Eurecom, France)
Panel Co-Chairs: Yong Rui (Microsoft, China), Winston Hsu (National Taiwan University, Taiwan), Michael Lew (University of Leiden, The Netherlands)
Video Program Chairs: Alexis Joly (INRIA, France), Giovanni Maria Farinella (University of Catania, Italy), Julien Champ (INRIA/LIRMM, France)
Brave New Ideas Co-Chairs: Jiebo Luo (University of Rochester, USA), Shuicheng Yan (National University of Singapore, Singapore)
Doctoral Symposium Chairs: Hayley Hung (Technical University of Delft, The Netherlands), Marco Cristani (University of Verona, Italy)
Open Source Competition Chairs: Ioannis (Yiannis) Patras (Queen Mary University, UK), Andrea Vedaldi (Oxford University, UK)
Tutorial Co-Chairs: Kiyoharu Aizawa (University of Tokyo, Japan), Lexing Xie (Australian National University, Australia)
Workshop Co-Chairs: Maja Pantic (Imperial College, UK), Vladimir Pavlovic (Rutgers University, USA)
Student Travel Grants Co-Chairs: Ramanathan Subramanian (ADSC, Singapore), Jasper Uijlings (University of Trento, Italy)
Publicity Co-Chairs: Marco Bertini (University of Florence, Italy), Ichiro Ide (Nagoya University, Japan)
Technical Demo Co-Chairs: Yi Yang (Carnegie Mellon University, USA), Xavier Anguera (Telefonica Research, Spain)
Proceedings Co-Chairs: Bogdan Ionescu (University Politehnica of Bucharest, Romania), Qi Tian (University of Texas San Antonio, USA)
Web Chair: Michele Trevisol (Web Research Group UPF & Yahoo Labs, Spain)

Appendix B. ACM MM 2013 Sponsors & Supporters

Report from SLAM 2013

Intl. Workshop on Speech, Language and Audio in Multimedia

The International Workshop on Speech, Language and Audio in Multimedia (SLAM) is a yearly workshop series that brings together researchers working in the broad field of speech, language, and audio processing applied to the analysis, indexing, and use of any type of multimedia data (e.g., broadcast, social media, audiovisual archives, online courses, music). Its goal is to share recent research results and to discuss ongoing and future projects as well as benchmarking initiatives and applications.

The very first edition of SLAM was held in Marseille, Aug. 22–23, 2013, as a satellite event of Interspeech 2013. Jointly patronized by the ISCA SIG on Speech and Language in Multimedia and the IEEE SIG on Audio and Speech Processing for Multimedia, the workshop was locally organized by the Laboratoire d’Informatique Fondamentale (LIF) of Aix-Marseille University in a gorgeous location, the Parc du Pharo. SLAM received financial support from local institutions, from national and international associations, and from national projects in the field of multimedia. This support, combined with low-cost organization within a university setting, made it possible to keep registration fees very low, in particular targeting students.

Group picture of the SLAM 2013 attendees

SLAM 2013 gathered 56 participants from around the world over a day and a half. The workshop was held in a very friendly atmosphere, with plenty of time for side discussions and a warm-hearted social event, yet it featured high-profile scientific communications on a number of topics and a keynote speech by Sam Davies on the BBC World Archives. Contributions were organized in five sessions, namely

  • audio & video event detection and segmentation
  • ASR in multimedia documents
  • multimedia person recognition
  • speaker & speaker roles recognition
  • multimedia applications and corpus

covering most of the topics targeted by the workshop. Major results from a vast number of projects were presented, generating fundamental discussions on the future of speech, language, and audio in the multimedia sphere. We nevertheless hope for more contributions on non-speech audio processing in the future. Proceedings are available online in open-access mode.

The SLAM workshop intends to establish itself as a yearly event at the frontier between the audio processing, speech communication, and multimedia communities. The second edition will be held in Penang, Malaysia, Sep. 11–12, 2014, as a satellite event of Interspeech 2014. We hope to see SLAM organized as a satellite of multimedia conferences in the near future, and we welcome bids for 2015.

GameDays & Edutainment 2012

Opening of the GameDays 2012

On behalf of the conference co-chairs, we wish to provide a report on the eighth GameDays, which were held from September 18 to 20, 2012, at Technische Universität Darmstadt and in the premises of Fraunhofer IGD.


The GameDays were initiated and are mainly organized by Dr. Stefan Göbel, head of the Serious Games group at the Multimedia Communications Lab at TU Darmstadt. They have taken place annually since 2005 as a “Science meets Business” event in the field of Serious Games, in cooperation with Hessen-IT, the Forum for Interdisciplinary Research of TU Darmstadt, and other partners from science and industry.

Report from NOSSDAV 2012

Setting for NOSSDAV 2012

NOSSDAV 2012, the 22nd SIGMM Workshop on Network and Operating Systems Support for Digital Audio and Video, was held in Toronto, Canada, on June 7-8, 2012. As in previous years, the workshop focused on both established and emerging research topics, high-risk high-return ideas and proposals, and future research directions in multimedia networking and systems, in a single-track format that encourages active participation and discussion among academic and industry researchers and practitioners.