Report from ACM Multimedia 2017 – by Benoit Huet


Best #SIGMM Social Media Reporter Award! Me? Really??

This was my reaction after being informed by the SIGMM Social Media Editors that I was one of the two recipients following ACM Multimedia 2017! #ACMMM What a wonderful idea this is to encourage our community to communicate, both internally and to other related communities, about our events, our key research results and all the wonderful things the multimedia community stands for!  I have always been surprised by how limited social media engagement is within the multimedia community. Your initiative has all my support! Let’s disseminate our research interest and activities on social media! @SIGMM #Motivated


The SIGMM flagship conference took place on October 23-27 at the Computer History Museum in Mountain View, California, USA. For its 25th edition, the organizing committee had prepared an attractive program, cleverly mixing expected classics (e.g. the Best Paper session, Grand Challenges, the Open Source Software Competition) and brand new sessions (such as Fast Forward and Thematic Workshops, Business Idea Venture, and the Novel Topics Track). For this edition, the conference adopted a single paper length, removing the boundary between long and short papers. The TPC Co-Chairs and Area Chairs had the responsibility of directing accepted papers to either an oral session or a thematic workshop.

Thematic workshops took the form of poster presentations. Presenters were asked to provide a short video briefly motivating their work, with the intention of making the videos available online for reference after the conference (possibly with a link to the full paper and the poster!). However, this did not come to pass, as publication permissions were not cleared in time; still, the idea is interesting and should be considered for future editions. Fast Forward sessions (or Thematic Workshop pitches) are short, targeted presentations aimed at attracting the audience to the Thematic Workshop where the papers are presented (as posters, in this case). While such short presentations allow conference attendees to efficiently identify which posters are relevant to them, it is crucial for presenters to be well prepared and to concentrate on highlighting one key research idea, as time is very limited. It also gives posters more exposure. I would be in favor of keeping such sessions in future ACM Multimedia editions.

The 25th edition of ACM MM wasn’t short of keynotes: no fewer than six industry keynotes punctuated the conference’s half days. The first keynote, by Achin Bhowmik from Starkey, focused on audio as a means of “Enhancing and Augmenting Human Perception with Artificial Intelligence”. Bill Dally from NVidia presented “Efficient Methods and Hardware for Deep Learning” (in short, why we all need GPUs!). “Building Multi-Modal Interfaces for Smartphones” was the topic presented by Injong Rhee (Samsung Electronics). Scott Silver (YouTube) discussed the difficulties in “Bringing a Billion Hours to Life” (referring to the vast quantities of videos uploaded and viewed on the sharing platform, and the long tail). Edward Chang from HTC presented “DeepQ: Advancing Healthcare Through AI and VR” and demonstrated how healthcare is benefiting, and will continue to benefit, from AR, VR and AI. Danny Lange from Unity Technologies highlighted how important machine learning and deep learning are in the game industry in ”Bringing Gaming, VR, and AR to Life with Deep Learning”. Personally, I would have preferred a mix of industry and academic keynotes, as I found some of the keynotes not targeted at an audience of computer scientists.

Arnold W. M. Smeulders received the SIGMM Technical Achievement Award for his outstanding and pioneering contribution to defining and bridging the semantic gap in content-based image retrieval (his lecture is available online). His talk was sharp, enlightening and very well received by the audience.

The @sigmm Rising Star Award went to Dr. Liangliang Cao for his contribution to large-scale multimedia recognition and social media mining.

The conference was noticeably flavored with trendy topics such as AI, human-augmentation technologies, virtual and augmented reality, and machine (deep) learning, as can be seen from the various award-winning works.

The Best Paper award was given to Bokun Wang, Yang Yang, Xing Xu, Alan Hanjalic, Heng Tao Shen for their work on “Adversarial Cross-Modal Retrieval“.

Yuan Tian, Suraj Raghuraman, Thiru Annaswamy, Aleksander Borresen, Klara Nahrstedt, Balakrishnan Prabhakaran received the Best Student Paper award for the paper “H-TIME: Haptic-enabled Tele-Immersive Musculoskeletal Examination“.

The Best demo award went to “NexGenTV: Providing Real-Time Insight during Political Debates in a Second Screen Application” by Olfa Ben Ahmed, Gabriel Sargent, Florian Garnier, Benoit Huet, Vincent Claveau, Laurence Couturier, Raphaël Troncy, Guillaume Gravier, Philémon Bouzy and Fabrice Leménorel.

The Best Open source software award was received by Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, Simiao Yu, Yike Guo for “TensorLayer: A Versatile Library for Efficient Deep Learning Development“.

The Best Grand Challenge Video Captioning Paper award went to “Knowing Yourself: Improving Video Caption via In-depth Recap“, by Qin Jin, Shizhe Chen, Jia Chen, Alexander Hauptmann.

The Best Grand Challenge Social Media Prediction Paper award went to Chih-Chung Hsu, Ying-Chin Lee, Ping-En Lu, Shian-Shin Lu, Hsiao-Ting Lai, Chihg-Chu Huang, Chun Wang, Yang-Jiun Lin, Weng-Tai Su for “Social Media Prediction Based on Residual Learning and Random Forest“.

Finally, the Best Brave New Idea Paper award was conferred to John R Smith, Dhiraj Joshi, Benoit Huet, Winston Hsu and Zef Cota for the paper “Harnessing A.I. for Augmenting Creativity: Application to Movie Trailer Creation“.

A few years back, the multimedia community was concerned with the lack of truly multimedia publications. In my opinion, those days are behind us. The technical program has evolved into a richer and broader one, let’s keep the momentum!

The location was a wonderful opportunity for many of the attendees to take a stroll down memory lane and see computers and devices (VT100, PC, etc.) from the past, thanks to the complimentary entrance to the museum exhibitions. The “isolated” location of the conference venue meant that going out for lunch was out of the question given the duration of the lunch break. As a solution, the organizers catered buffet lunches. This resulted in the majority of the attendees interacting and mixing over the lunch break while eating, which could be an effective way to better integrate new participants and strengthen the community. Both the welcome reception and the banquet were held successfully within the Computer History Museum. Both events offered yet another opportunity for new connections to be made and for further interaction between attendees. Indeed, the atmosphere on both occasions was relaxed, lively and joyful.

All in all, ACM MM 2017 was another successful edition of our flagship conference, many thanks to the entire organizing team and see you all in Seoul for ACM MM 2018 and follow @sigmm on Twitter!

Report from ACM Multimedia 2017 – by Conor Keighrey


My name is Conor Keighrey; I’m a PhD candidate at the Athlone Institute of Technology in Athlone, Co. Westmeath, Ireland. The focus of my research is to understand the key influencing factors that affect Quality of Experience (QoE) in emerging immersive multimedia experiences, with a specific focus on applications in the speech and language therapy domain. This research is funded by the Irish Research Council’s Government of Ireland Postgraduate Scholarship Programme. I’m delighted to have been asked to present this report to the SIGMM community as a result of my social media activity at the ACM Multimedia Conference.

Launched in 1993, the ACM Multimedia (ACMMM) Conference held its 25th anniversary event in Mountain View, California. The conference was located in the heart of Silicon Valley, at the inspirational Computer History Museum.

Under five focal themes, the conference called for papers on topics relating to multimedia: Experience, Systems and Applications, Understanding, Novel Topics, and Engagement.

Keynote addresses were delivered by high-profile, industry-leading experts from the field of multimedia. These talks provided insight into active developments from the following speakers:

  • Achin Bhowmik (CTO & EVP, Starkey, USA)
  • Bill Dally (Senior Vice President and Chief Scientist, NVidia, USA)
  • Injong Rhee (CTO & EVP, Samsung Electronics, Korea)
  • Edward Y. Chang (President, HTC, Taiwan)
  • Scott Silver (Vice President, Google, USA)
  • Danny Lange (Vice President, Unity Technologies, USA)

Some keynote highlights include Bill Dally’s talk on “Efficient Methods and Hardware for Deep Learning”. Bill provided insight into the work NVidia is doing with neural networks, the hardware which drives them, and the techniques the company is using to make them more efficient. He also highlighted that AI should be thought of not as a mechanism which replaces humans, but as one which empowers us, allowing us to explore more intellectual activities.

Danny Lange of Unity Technologies discussed the application of the Unity game engine to create scenarios in which machine learning models can be trained. His presentation, entitled “Bringing Gaming, VR, and AR to Life with Deep Learning”, described the capture of data to prepare self-driving cars for unexpected occurrences in the real world (e.g. pedestrian activity or other cars behaving in unpredictable ways).

A number of the Keynotes were captured by FXPAL (an ACMMM Platinum Sponsor) and are available here.

With an acceptance rate of 27.63% (684 reviewed, 189 accepted), the main track at ACMMM showcased a diverse collection of research from academic institutes around the globe. An abundance of work was presented in the ever-expanding area of deep/machine learning, virtual/augmented/mixed realities, and the traditional multimedia field.


The importance of gender equality and diversity with respect to advancing the careers of women in STEM has never been greater. Sponsored by SIGMM, the Women/Diversity in MM lunch took place on the first day of ACMMM. Speakers such as Prof. Noel O’Connor discussed the significance of initiatives such as Athena SWAN (Scientific Women’s Academic Network) within Dublin City University (DCU). Katherine Breeden (pictured left), an Assistant Professor in the Department of Computer Science at Harvey Mudd College (HMC), presented a fantastic talk on gender balance at HMC. Katherine’s discussion highlighted the key changes which have resulted in more women than men graduating with a degree in computer science at the college.

Other highlights from day 1 include a paper presented at the Experience 2 (Perceptual, Affect, and Interaction) session, chaired by Susanne Boll (University of Oldenburg). Researchers from the National University of Singapore presented the results of a multisensory virtual cocktail (Vocktail) experience which was well received. 


Through the stimulation of three sensory modalities, Vocktails aim to create virtual flavors and augment taste experiences through a customizable, interactive drinking utensil. Controlled by a mobile device, participants of the study experienced augmented taste (electrical stimulation of the tongue), smell (micro air-pumps), and visual (RGB light projected onto the liquid) stimuli as they used the system. For more information, check out the paper, entitled “Vocktail: A Virtual Cocktail for Pairing Digital Taste, Smell, and Color Sensations”, on the ACM Digital Library.

Day 3 of the conference included a session entitled Brave New Ideas. The session presented a fantastic variety of work focused on the use of multimedia technologies to enhance or create intelligent systems. Demonstrating AI as an assistive tool, and winning the Best Brave New Idea Paper award, the paper entitled “Harnessing A.I. for Augmenting Creativity: Application to Movie Trailer Creation” (ACM Digital Library) describes the first-ever human-machine collaboration to create a real movie trailer. Through multi-modal semantic extraction, inclusive of audio-visual and scene analysis, combined with a statistical approach, key moments which characterize horror films were defined. On this basis, the AI selected 10 scenes from a feature-length film, which were then developed alongside a professional filmmaker into a finished movie trailer. Officially released by 20th Century Fox, the complete AI trailer for the horror movie “Morgan” can be viewed here.
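As a rough illustration of this kind of pipeline (and emphatically not the authors’ actual method), scene selection can be sketched as scoring candidate scenes across several modalities and keeping the top-ranked ones; every feature name, weight and score below is hypothetical:

```python
# Hypothetical sketch of multi-modal scene ranking for trailer creation.
# Feature names, weights and scores are illustrative only -- they do not
# reproduce the model from the "Harnessing A.I." paper.

def scene_score(features, weights):
    """Combine per-modality scores into one 'trailer-worthiness' score."""
    return sum(weights[k] * features.get(k, 0.0) for k in weights)

def select_scenes(scenes, weights, k=10):
    """Rank scenes by combined score, keep the top k, restore film order."""
    ranked = sorted(scenes, key=lambda s: scene_score(s["features"], weights),
                    reverse=True)[:k]
    return sorted(ranked, key=lambda s: s["start"])  # back to chronology

# Toy example: three scenes with made-up audio/visual tension scores.
weights = {"audio_tension": 0.5, "visual_suspense": 0.3, "scene_novelty": 0.2}
scenes = [
    {"start": 12.0, "features": {"audio_tension": 0.9, "visual_suspense": 0.7}},
    {"start": 45.5, "features": {"audio_tension": 0.2, "visual_suspense": 0.1}},
    {"start": 80.3, "features": {"audio_tension": 0.6, "scene_novelty": 0.8}},
]
picked = select_scenes(scenes, weights, k=2)
print([s["start"] for s in picked])  # -> [12.0, 80.3]
```

The design choice worth noting is the final re-sort by start time: a trailer cut still benefits from presenting its highest-scoring moments in narrative order.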

A new addition to this edition of ACMMM was the inclusion of thematic workshops. Four individual workshops (outlined below) provided an opportunity for papers which could not be accommodated within the main track to be presented to the multimedia research community. A total of 495 papers were reviewed, from which 64 were accepted (12.93%). Authors of accepted papers presented their work via on-stage thematic workshop pitches, which were followed by poster presentations on Monday the 23rd and Friday the 27th. The workshop themes were as follows:

  • Experience (Organised by Wanmin Wu)
  • Systems and Applications (Organised by Roger Zimmermann & He Ma)
  • Engagement (Organised by Jianchao Yang)
  • Understanding (Organised by Qi Tian)

Presented as part of the thematic workshop pitches, one of the most fascinating demos at the conference was a body of work carried out by Audrey Ziwei Hu (University of Toronto). Her paper, entitled “Liquid Jets as Logic-Computing Fluid-User-Interfaces”, describes a fluid (water) user interface which is presented as a logic-computing device. Water jets form a medium for tactile interaction and control, creating a musical instrument known as a hydraulophone.

Steve Mann (pictured left) from Stanford University, who is regarded as “The Father of Wearable Computing”, provided a fantastic live demonstration of the device. The full paper can be found on the ACM Digital Library, and a live demo can be seen here.

At large-scale events such as ACMMM, the importance of social media reporting and interaction has never been greater. More than 250 social media interactions (tweets, retweets, and likes) were monitored using the #SIGMM and #ACMMM hashtags, as outlined by the SIGMM Records prior to the event. Descriptive (and multimedia-enhanced) social media reports give those who face an unavoidable schedule overlap a chance to gather some insight into the other work presented at the conference.

From my own perspective (as a PhD student), the most important aspect of social media interaction is that reports often serve as a conversational piece. Developing a social presence throughout the many coffee breaks and social events during the conference is key to building a network of contacts within any community. As a newcomer this can often be a daunting task; recognizing other social media reporters offers the perfect ice-breaker, providing an opportunity to discuss and inform each other of the ongoing work within the multimedia community. As a result of my own online reporting, I was recognized numerous times throughout the conference. Staying active on social media often leads to the development of a research audience and a social media presence among peers. Engaging with such an audience is key to the success of those who wish to follow a path in academia or research.

Building on my own personal experience, continued attendance at SIGMM conferences (irrespective of paper submission) has many advantages. While the predominant role of a conference is to disseminate work, the informative aspect of attending such events is often overlooked. The area of multimedia research is moving at a fast pace, and thus having the opportunity to engage directly with researchers in your field of expertise is of the utmost importance. Attendance at ACMMM and other SIGMM conferences, such as ACM Multimedia Systems, has inspired me to explore alternative methodologies within my own research. Without a doubt, continued attendance will inspire my research as I move forward.

ACM Multimedia ’18 (October 22nd – 26th) – Seoul, South Korea, with its diverse landscape of modern skyscrapers mixed with traditional Buddhist temples and palaces, will host the 26th annual ACMMM. The 2018 event will without a doubt present a variety of work from the multimedia research community. Regular paper abstracts are due on the 30th of March (full manuscripts are due on the 8th of April). For more information on next year’s ACM Multimedia conference, check out the following link:

The Deep Learning Indaba Report


Given the increasing focus on deep learning and machine learning, there is a need to address the problem of the low participation of Africans in data science and artificial intelligence. The Deep Learning Indaba was born to stimulate the participation of Africans within the research and innovation landscape surrounding deep learning and machine learning. This column reports on the Deep Learning Indaba event, a 5-day series of introductory lectures on deep learning, held from 10-15 September 2017, coupled with tutorial sessions where participants gained practical experience with deep learning software packages. The column also includes interviews with some of the organisers about the origin and future plans of the Deep Learning Indaba.


Africans have a low participation in the areas of science called deep learning and machine learning, as illustrated by the fact that, at the 2016 Neural Information Processing Systems (NIPS’16) conference, none of the accepted papers had at least one author from a research institution in Africa.

Given the increasing focus on deep learning, and the more general area of machine learning, there is a need to address this problem of the low participation of Africans in the technology that underlies the recent advances in data science and artificial intelligence, advances that are set to transform the way the world works. The Deep Learning Indaba was thus born, aiming to be a series of master classes on deep learning and machine learning for African researchers and technologists. Its purpose was to stimulate the participation of Africans within the research and innovation landscape surrounding deep learning and machine learning.

What is an ‘indaba’?

According to the organisers, ‘indaba’ is a Zulu word that simply means gathering or meeting. There are several words for such meetings, which are held throughout southern Africa, including an imbizo (in Xhosa), an intlanganiso, a lekgotla (in Sesotho), a baraza (in Kiswahili) in Kenya and Tanzania, and a padare (in Shona) in Zimbabwe. Indabas have several functions: to listen to and share news of members of the community, to discuss common interests and issues facing the community, and to give advice and coach others. Using the word ‘indaba’ for the Deep Learning event connects it to similar community gatherings held by cultures throughout the world. The spirit of coming together, of sharing and learning, is one of the core values of the event.

The Deep Learning Indaba

After a couple of months of furious activity by the organisers, roughly 300 students, researchers and machine learning practitioners from all over Africa gathered for the first Deep Learning Indaba from 10-15 September 2017 at the University of the Witwatersrand, Johannesburg, South Africa. More than 30 African countries were represented for an intense week of immersion in deep learning.

The Deep Learning Indaba consisted of a 5-day series of introductory lectures on Deep Learning, coupled with tutorial sessions where participants gained practical experience with deep learning software packages such as TensorFlow. The format of the Deep Learning Indaba was based on the intense summer school experience of NIPS. Presenters at the Indaba included prominent figures in the machine learning community such as Nando de Freitas, Ulrich Paquet and Yann Dauphin. The lecture sessions were all recorded and all the practical tutorials are also available online: Lectures and Tutorials.
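To give a flavour of the kind of hands-on exercise such practical sessions involve (this sketch is illustrative only and not taken from the Indaba tutorial material, which used packages such as TensorFlow), here is a framework-free example of training a single sigmoid neuron by gradient descent:

```python
import math
import random

# Illustrative practical: train one sigmoid neuron on the OR function by
# plain stochastic gradient descent. Not from the Indaba tutorials --
# just the style of exercise a deep learning practical typically covers.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]  # OR truth table
random.seed(0)
w = [random.uniform(-1, 1), random.uniform(-1, 1)]
b = 0.0
lr = 0.5

for _ in range(2000):
    for (x1, x2), y in data:
        p = sigmoid(w[0] * x1 + w[1] * x2 + b)
        grad = p - y          # d(loss)/d(pre-activation) for cross-entropy
        w[0] -= lr * grad * x1
        w[1] -= lr * grad * x2
        b -= lr * grad

preds = [round(sigmoid(w[0] * x1 + w[1] * x2 + b)) for (x1, x2), _ in data]
print(preds)  # recovers the OR function: [0, 1, 1, 1]
```

Frameworks like TensorFlow automate exactly the two steps written out by hand here: computing the gradient of the loss and applying the parameter update.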

After organising the first successful Deep Learning Indaba in Africa (a report on the outcomes can be found online), the organisers have already started planning the next two Deep Learning Indabas, which will take place in 2018 and 2019. More information can be found at the Deep Learning Indaba website.

Having been privileged to attend this first Deep Learning Indaba, I interviewed a number of the organisers to learn more about the origin and future plans of the Deep Learning Indaba. The interviewed organisers were Ulrich Paquet and Stephan Gouws.

Question 1: What was the origin of the Deep Learning Indaba?

Ulrich Paquet: We’d have to dig into history a bit here, as the dream of taking ICML (International Conference on Machine Learning) to South Africa has been around for a while. The topic was again raised at the end of 2016, when Shakir and I sat at NIPS (Conference on Neural Information Processing Systems), and said “let’s find a way to make something happen in 2017.” We were waiting for the right opportunity. Stephan has been thinking along these lines, and so has George Konidaris. I met Benjamin Rosman in January or February over e-mail, and within a day we were already strategizing what to do.

We didn’t want to take a big conference to South Africa, as people parachute in and out, without properly investing in education. How can we make the best possible investment in South African machine learning? We thought a summer school would be the best vehicle, but more than that, we wanted a summer school that would replicate the intense NIPS experience in South Africa: networking, parties, high-octane teaching, poster sessions, debates and workshops…

Shakir asked Demis Hassabis for funding in February this year, and Demis was incredibly supportive. And that got the ball rolling…

Stephan Gouws: It began with a question that was whispered amongst many South Africans in the machine learning industry: “how can we bring ICML to South Africa?” Early in 2017, Ulrich Paquet and Shakir Mohamed (both from Google DeepMind) began a discussion about how a summer-school-like event could be held in South Africa. Such an event was chosen as it typically has a bigger impact after the event than a typical conference does. Benjamin Rosman (from the South African Council for Scientific and Industrial Research) and Nando de Freitas (also from Google DeepMind) joined the discussion in February. A fantastic group of researchers from South Africa was gathered who shared the vision of making the event a reality. I suggested the name “Deep Learning Indaba”, we registered a domain, and from there we got the ball rolling!

Question 2: What did the organisers want to achieve with the Indaba?

Ulrich Paquet: Strengthening African Machine Learning

“a shared space to learn, to share, and to debate the state-of-the-art in machine learning and artificial intelligence”

  • Teaching and mentoring
  • Building a strong research community
  • Overcoming isolation

We also wanted to work towards inclusion, build a community, build confidence, and influence government policy.

Stephan Gouws: Our vision is to strengthen machine learning in Africa. Machine learning experts, workshops and conferences are mostly concentrated in North America and Western Europe. Africans do not easily get the opportunity to be exposed to such events, as they are far away, expensive to attend, etc. Furthermore, with a conference, a group of experts fly in, discuss the state of the art of the field, and then fly away. A conference does not easily allow for a transfer of expertise, and therefore the local community does not gain much from it. With the Indaba, we hoped to facilitate knowledge transfer (for which a summer-school-like event is better suited), and also to create networking opportunities for students, industry, academics and the international presenters.

Question 3: Why was the Indaba held in South Africa?

Ulrich Paquet: All of the (original) organizers are South African, and really care about the development of their own country. We want to reach beyond South Africa, though, and tried to include as many institutions as possible (more than 20 African countries were represented).

But, one has to remember that the first Indaba was essentially an experiment. We had to start somewhere! We benefit by having like-minded local organizers :)

Stephan Gouws: All the organisers are originally from South Africa and want to support and strengthen the machine learning field in South Africa (and eventually in the rest of Africa).

Question 4: What were the expectations beforehand for the Indaba? (For example, how many people did the organisers expect would attend?)

Ulrich Paquet: Well, we originally wanted to run a series of master classes for 40 students. We had ABSOLUTELY NO idea how many students would apply, or if any would even apply. We were very surprised when we hit more than 700 applications by our deadline, and by then, the whole game changed. We couldn’t take 40 out of 700, and decided to go for the largest lecture hall we could possibly find (for 300 people).

There are then other logistics of scale that come into play: feeding everyone, transporting everyone, running practical sessions, etc. And it has to be within budget!! The cap at 300 seemed to work well.

Question 5: Are there any plans for the future of the Indaba? Are you planning on making it an annual event?

Ulrich Paquet: Yes, definitely.

Stephan Gouws: Nothing official yet, but the plan from the beginning was to make it an annual event.

[Editor]: The Deep Learning Indaba 2018 has since been announced, and more information can be found at the following link:  The organisers have also announced locally organised, one-day IndabaX events, to be held from 26 March to 6 April 2018 with the aim of strengthening the African machine learning community. Details on obtaining support for organising an IndabaX event can be found at the main site:

Question 6: How can students, researchers and people from industry still get and stay involved after the Indaba?

Ulrich Paquet: There are many things that could be changed with enough critical mass. One thing we’re hoping for is to ensure that the climate for research in sub-Saharan Africa is as fertile as possible. This will only happen through lots of collaboration and cross-pollination. Some things stand in the way of this kind of collaboration. One is the set of government KPIs (key performance indicators) that reward research: for AI, they do not rightly reward collaboration, and do not rightly reward publication on top-tier platforms, which are all conferences (NIPS, ICML). Therefore, they do not reward playing in and contributing to the most competitive playing field. These are all things that the AI community in SA should seek to creatively address and change.

We have seen organic South African papers published at UAI and ICML for the first time this year; the next platforms should be JMLR and NIPS, and then Nature. There have never been any organic African AI or machine learning papers in any of the latter venues. Students should be encouraged to collaborate and submit to them! The nature of the game is that the barrier to entry at these venues is so high that one has to collaborate… This of course brings me to my point about why research grants (in SA) should be revisited to reflect these outcomes.

Stephan Gouws: In short, yes. All the practicals, lectures and videos are publicly available. There are also Facebook and WhatsApp groups, and we hope that the discussion and networking will not stop after the 15th of September. As a side note: I am working on ideas (aimed more at postgraduate students) to eventually put a mentor system in place, as well as other types of support for postgraduate students after the Indaba. But it is still early days, and only time will tell.

Biographies of Interviewed Organisers

Ulrich Paquet (Research Scientist, DeepMind, London):

Ulrich Paquet

Dr. Ulrich Paquet is a Research Scientist at DeepMind, London. He really wanted to be an artist before stumbling onto machine learning in a third-year course at the University of Pretoria (South Africa), where he eventually obtained a Master’s degree in Computer Science. In April 2007 Ulrich obtained his PhD from the University of Cambridge with the dissertation “Bayesian Inference for Latent Variable Models”. After his PhD he worked with a start-up called Imense, focusing on face recognition and image similarity search. He then joined Microsoft’s FUSE Labs, based at Microsoft Research Cambridge, where he eventually worked on the Xbox One launch as part of the Xbox Recommendations team. In 2015 he joined another Cambridge start-up, VocalIQ, which was acquired by Apple, before he joined DeepMind in April 2016.

Stephan Gouws (Research Scientist, Google Brain Team):

Stephan Gouws

Dr. Stephan Gouws is a Research Scientist at Google and part of the Google Brain Team that developed TensorFlow and Google’s Neural Machine Translation system. His undergraduate studies were a double major in Electronic Engineering and Computer Science at Stellenbosch University (South Africa). His postgraduate studies in Electronic Engineering were also completed at the MIH Media Lab at Stellenbosch University. He obtained his Master’s degree cum laude in 2010 and his PhD in 2015 with the dissertation “Training Neural Word Embeddings for Transfer Learning and Translation”. During his PhD he spent one year at the Information Sciences Institute (ISI) at the University of Southern California in Los Angeles, and one year at the Montreal Institute for Learning Algorithms, where he worked closely with Yoshua Bengio. He also worked as a Research Intern at both Microsoft Research and Google Brain during this period.

The Deep Learning Indaba Organisers:

Shakir Mohamed (Research Scientist, DeepMind, London)
Nyalleng Moorosi (Researcher, Council for Scientific and Industrial Research, South Africa)
Ulrich Paquet (Research Scientist, DeepMind, London)
Stephan Gouws (Research Scientist, Google Brain Team, London)
Vukosi Marivate (Researcher, Council for Scientific and Industrial Research, South Africa)
Willie Brink (Senior Lecturer, Stellenbosch University, South Africa)
Benjamin Rosman (Researcher, Council for Scientific and Industrial Research, South Africa)
Richard Klein (Associate Lecturer, University of the Witwatersrand, South Africa)

Advisory Committee:

Nando De Freitas (Research Scientist, DeepMind, London)
Ben Herbst (Professor, Stellenbosch University)
Bonolo Mathibela (Research Scientist, IBM Research South Africa)
George Konidaris (Assistant Professor, Brown University)
Bubacarr Bah (Research Chair, African Institute for Mathematical Sciences, South Africa)

Report from ACM MMSys 2017

–A report from Christian Timmerer, AAU/Bitmovin Austria

The ACM Multimedia Systems Conference (MMSys) provides a forum for researchers to present and share their latest research findings in multimedia systems. It is a unique event targeting “multimedia systems” from various angles and views across all domains, instead of focusing on a specific aspect or data type. ACM MMSys’17 was held in Taipei, Taiwan, on June 20-23, 2017.

MMSys is a single-track conference which also hosts a series of workshops, namely NOSSDAV, MMVE, and NetGames. Since 2016, it has kicked off with overview talks, and in 2017 we saw the following: “Geometric representations of 3D scenes” by Geraldine Morin; “Towards Understanding Truly Immersive Multimedia Experiences” by Niall Murray; “Rate Control In The Age Of Vision” by Ketan Mayer-Patel; “Humans, computers, delays and the joys of interaction” by Ragnhild Eg; and “Context-aware, perception-guided workload characterization and resource scheduling on mobile phones for interactive applications” by Chung-Ta King and Chun-Han Lin.

Additionally, industry talks have been introduced: “Virtual Reality – The New Era of Future World” by WeiGing Ngang; “The innovation and challenge of Interactive streaming technology” by Wesley Kuo; “What challenges are we facing after Netflix revolutionized TV watching?” by Shuen-Huei Guan; “The overview of app streaming technology” by Sam Ding; “Semantic Awareness in 360 Streaming” by Shannon Chen; “On the frontiers of Video SaaS” by Sega Cheng.

An interesting set of keynotes presented different aspects related to multimedia systems and its co-located workshops:

  • Henry Fuchs, The AR/VR Renaissance: opportunities, pitfalls, and remaining problems
  • Julien Lai, Towards Large-scale Deployment of Intelligent Video Analytics Systems
  • Dah Ming Chiu, Smart Streaming of Panoramic Video
  • Bo Li, When Computation Meets Communication: The Case for Scheduling Resources in the Cloud
  • Polly Huang, Measuring Subjective QoE for Interactive System Design in the Mobile Era – Lessons Learned Studying Skype Calls

The program included a diverse set of topics such as immersive experiences in AR and VR, network optimization and delivery, multisensory experiences, processing, rendering, interaction, cloud-based multimedia, IoT connectivity, infrastructure, media streaming, and security. A vital aspect of MMSys is its dedicated sessions for showcasing the latest developments in the area of multimedia systems and for presenting datasets, which is important for enabling reproducibility and sustainability in multimedia systems research.

The social events were a perfect venue for networking and in-depth discussions on how to advance the state of the art. A welcome reception was held at “LE BLE D’OR (Miramar)”, the conference banquet at the Taipei World Trade Center Club, and finally a tour to the Shilin Night Market was organized.

ACM MMSys 2017 issued the following awards:

  • The Best Paper Award  goes to “A Scalable and Privacy-Aware IoT Service for Live Video Analytics” by Junjue Wang (Carnegie Mellon University), Brandon Amos (Carnegie Mellon University), Anupam Das (Carnegie Mellon University), Padmanabhan Pillai (Intel Labs), Norman Sadeh (Carnegie Mellon University), and Mahadev Satyanarayanan (Carnegie Mellon University).
  • The Best Student Paper Award goes to “A Measurement Study of Oculus 360 Degree Video Streaming” by Chao Zhou (SUNY Binghamton), Zhenhua Li (Tsinghua University), and Yao Liu (SUNY Binghamton).
  • The NOSSDAV’17 Best Paper Award goes to “A Comparative Case Study of HTTP Adaptive Streaming Algorithms in Mobile Networks” by Theodoros Karagkioules (Huawei Technologies France/Telecom ParisTech), Cyril Concolato (Telecom ParisTech), Dimitrios Tsilimantos (Huawei Technologies France), Stefan Valentin (Huawei Technologies France).

Excellence in DASH award sponsored by the DASH-IF 

  • 1st place: “SAP: Stall-Aware Pacing for Improved DASH Video Experience in Cellular Networks” by Ahmed Zahran (University College Cork), Jason J. Quinlan (University College Cork), K. K. Ramakrishnan (University of California, Riverside), and Cormac J. Sreenan (University College Cork)
  • 2nd place: “Improving Video Quality in Crowded Networks Using a DANE” by Jan Willem Kleinrouweler, Britta Meixner and Pablo Cesar (Centrum Wiskunde & Informatica)
  • 3rd place: “Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP” by Mario Graf (Bitmovin Inc.), Christian Timmerer (Alpen-Adria-Universität Klagenfurt / Bitmovin Inc.), and Christopher Mueller (Bitmovin Inc.)

Finally, student travel grants were sponsored by SIGMM. All details, including nice pictures, can be found here.

ACM MMSys 2018 will be held in Amsterdam, The Netherlands, June 12 – 15, 2018 and includes the following tracks:

  • Research track: Submission deadline on November 30, 2017
  • Demo track: Submission deadline on February 25, 2018
  • Open Dataset & Software Track: Submission deadline on February 25, 2018

MMSys’18 co-locates the following workshops (with submission deadline on March 1, 2018):

  • MMVE2018: 10th International Workshop on Immersive Mixed and Virtual Environment Systems,
  • NetGames2018: 16th Annual Workshop on Network and Systems Support for Games,
  • NOSSDAV2018: 28th ACM SIGMM Workshop on Network and Operating Systems Support for Digital Audio and Video,
  • PV2018: 23rd Packet Video Workshop

MMSys’18 also includes special sessions, with a submission deadline on December 15, 2017.

Report from ICMR 2017

ACM International Conference on Multimedia Retrieval (ICMR) 2017

ACM ICMR 2017 in “Little Paris”

ACM ICMR is the premier International Conference on Multimedia Retrieval; since 2011 it has “illuminated the state of the art in multimedia retrieval”. This year, ICMR took place in a wonderful location: Bucharest, Romania, also known as “Little Paris”. Every year at ICMR I learn something new, and here is what I learnt this year.


Final Conference Shot at UP Bucharest

UNDERSTANDING THE TANGIBLE: objects, scenes, semantic categories – everything we can see.

1) Objects (and YODA) can be easily tracked in videos.

Arnold Smeulders delivered a brilliant keynote on “things” retrieval: given an object in an image, can we find (and retrieve) it in other images, videos, and beyond? He presented a very interesting technique for tracking objects (e.g. Yoda) in videos, based on similarity learnt through Siamese networks.

Tracking Yoda with Siamese Networks
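As a loose illustration only (not the system from the keynote), the tracking-by-similarity idea can be sketched in a few lines: embed the target template and each candidate patch with the same shared-weight mapping, then pick the candidate whose embedding is closest to the template. In this toy NumPy sketch a random linear projection stands in for the learned Siamese embedding, and all names and numbers are my own:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the learned Siamese embedding: both branches share
# the same projection matrix (the weight sharing is what makes it "Siamese").
W = rng.standard_normal((64, 16))

def embed(patch):
    v = patch.ravel() @ W          # shared-weight branch
    return v / np.linalg.norm(v)   # L2-normalise so the dot product is cosine similarity

template = rng.standard_normal((8, 8))                  # appearance of the target (e.g. Yoda)
candidates = [rng.standard_normal((8, 8)) for _ in range(5)]
candidates[3] = template + 0.05 * rng.standard_normal((8, 8))  # near-copy hidden at index 3

t = embed(template)
scores = [float(embed(c) @ t) for c in candidates]
best = int(np.argmax(scores))      # the tracker picks the most similar candidate
print(best)                        # the near-copy at index 3 should win
```

In a real tracker the candidates would be patches from the next video frame around the previous target location, and the embedding would be a trained convolutional network rather than a random projection.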

2) Wearables + computer vision help explore cultural heritage sites.

As shown in his keynote, Alberto del Bimbo and his amazing team at MICC, University of Florence, have designed smart audio guides for indoor and outdoor spaces. The system detects, recognises, and describes landmarks and artworks from wearable camera inputs (and GPS coordinates, in the case of outdoor spaces).

3) We can finally quantify how much images provide complementary semantics compared to text [BEST MULTIMODAL PAPER AWARD].

For ages, the community has asked how relevant different modalities are for multimedia analysis: this paper finally proposes a solution to quantify information gaps between different modalities.

4) Exploring news corpora is now very easy: news graphs are easy to navigate and aware of the types of relations between articles.

Remi Bois and his colleagues presented this framework, made for professional journalists and the general public, for seamlessly browsing through large-scale news corpora. They built a graph whose nodes are articles in a news corpus. The most relevant items for each article are chosen (and linked) based on an adaptive nearest-neighbor technique. Each link is then characterised according to the type of relation between the two linked nodes.
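As a toy sketch of the graph-construction step (my own minimal version: a fixed k and cosine similarity over bag-of-words vectors, whereas the actual work uses adaptive neighbor selection and then types each link):

```python
import numpy as np

# Toy "articles" as bag-of-words count vectors over a tiny shared vocabulary.
articles = {
    "election_results":  np.array([3, 1, 0, 0, 2]),
    "election_polls":    np.array([2, 2, 0, 0, 1]),
    "football_final":    np.array([0, 0, 4, 1, 0]),
    "football_transfer": np.array([0, 0, 3, 2, 0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Link each article to its k most similar neighbours (k=1 here for brevity;
# the paper adapts the neighbourhood size per node instead of fixing k).
k = 1
graph = {}
for name, vec in articles.items():
    sims = {other: cosine(vec, ovec)
            for other, ovec in articles.items() if other != name}
    graph[name] = sorted(sims, key=sims.get, reverse=True)[:k]

print(graph["election_results"])   # → ['election_polls']
```

The resulting adjacency lists are what a browsing interface would navigate; the typed-relation step would then label each edge (e.g. follow-up, background, same event).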

5) Panorama outdoor images are much easier to localise.

In his beautiful work, Ahmet Iscen from Inria developed an algorithm for location prediction from StreetView images, outperforming the state of the art thanks to an intelligent stitching pre-processing step: predicting locations from panoramas (stitched individual views) instead of individual street images improves performance dramatically!

UNDERSTANDING THE INTANGIBLE: artistic aspects, beauty, intent: everything we can perceive

1) Image search intent can be predicted by the way we look.

In his research work, a best paper candidate, Mohammad Soleymani showed that image search intent (seeking information, finding content, or re-finding content) can be predicted from physiological responses (eye gaze) and implicit user interaction (mouse movements).

2) Real-time detection of fake tweets is now possible using user and textual cues.

Another best paper candidate, this time from CERTH. The team collected a large dataset of fake/real sample tweets spanning 17 events and built an effective model for misleading content detection from tweet content and user characteristics. A live demo is available here.

3) Music tracks have different functions in our daily lives.

Researchers from TU Delft have developed an algorithm which classifies music tracks according to their purpose in our daily activities: relaxing, studying, and working out.

4) By transferring image style we can make images more memorable!

The team at the University of Trento built an automatic framework to improve image memorability. A selector finds the style seeds (i.e. abstract paintings) that are likely to increase the memorability of a given image; after style transfer, the image becomes more memorable!

5) Neural networks can help retrieve and discover child book illustrations.

In this amazing work, motivated by real children's experiences, Pinar and her team from Hacettepe University collected a large dataset of children's book illustrations and found that neural networks can predict and transfer style, making it possible to render many other illustrations in the style of “Winnie the Witch”.

Winnie the Witch

6) Locals perceive their neighborhood as less interesting, more dangerous and dirtier compared to non-locals.

In this wonderful work, presented by Darshan Santani from IDIAP, researchers asked locals and crowd-workers to look at pictures from various neighborhoods in Guanajuato and rate them according to interestingness, cleanliness, and safety.

THE FUTURE: What’s Next?

1) We will be able to anonymize images of outdoor spaces thanks to Instagram filters, as proposed by this work in the Brave New Idea session. When an image of an outdoor space is manipulated with appropriate Instagram filters, the location of the image can be masked from vision-based geolocation classifiers.

2) Soon we will be able to embed watermarks in our Deep Neural Network models in order to protect our intellectual property [BEST PAPER AWARD]. This is a disruptive, novel idea, and that is why this work from KDDI Research and Japan National Institute of Informatics won the best paper award. Congratulations!

3) Given an image view of an object, we will predict the other side of things (from Smeulders’ keynote). In the pic: predicting the other side of chairs. Beautiful.

Predicting the other side of things

THANKS: To the organisers, to the volunteers, and to all the authors for their beautiful work :)

EDITORIAL NOTE: A more extensive report from ICMR 2017 by Miriam is available on Medium

Report from MMM 2017

Harpa, the venue of the conference banquet.

MMM 2017 — 23rd International Conference on MultiMedia Modeling

MMM is a leading international conference for researchers and industry practitioners to share new ideas, original research results and practical development experiences from all MMM-related areas. The 23rd edition of MMM took place on January 4-6 of 2017, on the modern campus of Reykjavik University. In this short report, we outline the major aspects of the conference, including: technical program; best paper session; video browser showdown; demonstrations; keynotes; special sessions; and social events. We end by acknowledging the contributions of the many excellent colleagues who helped us organize the conference. For more details, please refer to the MMM 2017 web site.

Technical Program

The MMM conference calls for research papers reporting original investigation results and demonstrations in all areas related to multimedia modeling technologies and applications. Special sessions were also held that focused on addressing new challenges for the multimedia community.

This year, 149 regular full paper submissions were received, of which 36 were accepted for oral presentation and 33 for poster presentation, for a 46% acceptance rate. Overall, MMM received 198 submissions across all tracks and accepted 107 for oral or poster presentation, for an overall acceptance rate of 54%. For more details, please refer to the table below.

MMM2017 Submissions and Acceptance Rates


Best Paper Session

Four best paper candidates were selected for the best paper session, which was a plenary session at the start of the conference.

The best paper, by unanimous decision, was “On the Exploration of Convolutional Fusion Networks for Visual Recognition” by Yu Liu, Yanming Guo, and Michael S. Lew. In this paper, the authors propose an efficient multi-scale fusion architecture, called convolutional fusion networks (CFN), which can generate the side branches from multi-scale intermediate layers while consuming few parameters.

Phoebe Chen, Laurent Amsaleg and Shin’ichi Satoh (left) present the Best Paper Award to Yu Liu and Yanming Guo (right).


The best student paper, partially chosen due to the excellent presentation of the work, was “Cross-modal Recipe Retrieval: How to Cook This Dish?” by Jingjing Chen, Lei Pang, and Chong-Wah Ngo. In this work, the problem of sharing food pictures from the viewpoint of cross-modality analysis was explored. Given a large number of image and recipe pairs acquired from the Internet, a joint space is learnt to locally capture the ingredient correspondence from images and recipes.

Phoebe Chen, Laurent Amsaleg and Shin’ichi Satoh (left) present the Best Student Paper Award to Jingjing Chen and Chong-Wah Ngo (right).


The two runners-up were “Spatio-temporal VLAD Encoding for Human Action Recognition in Videos” by Ionut Cosmin Duta, Bogdan Ionescu, Kiyoharu Aizawa, and Nicu Sebe, and “A Framework of Privacy-Preserving Image Recognition for Image-Based Information Services” by Kojiro Fujii, Kazuaki Nakamura, Naoko Nitta, and Noboru Babaguchi.

Video Browser Showdown

The Video Browser Showdown (VBS) is an annual live video search competition, which has been organized as a special session at MMM conferences since 2012. In VBS, researchers evaluate and demonstrate the efficiency of their exploratory video retrieval tools on a shared data set in front of the audience. The participating teams start with a short presentation of their system and then perform several video retrieval tasks with a moderately large video collection (about 600 hours of video content). This year, seven teams registered for VBS, although one team could not compete for personal and technical reasons. For the first time in 2017, live judging was included, in which a panel of expert judges made decisions in real-time about the accuracy of the submissions for ⅓ of the tasks.

Teams and spectators in the Video Browser Showdown.


On the social side, three changes were also made from previous conferences. First, VBS was held in a plenary session, to avoid conflicts with other schedule items. Second, the conference reception was held at VBS, which meant that attendees had extra incentives to attend VBS, namely food and drink. And third, Alan Smeaton served as “color commentator” during the competition, interviewing the organizers and participants, and helping explain to the audience what was going on. All of these changes worked well, and contributed to a very well attended VBS session.

The winners of VBS 2017, after a very even and exciting competition, were Luca Rossetto, Ivan Giangreco, Claudiu Tanase, Heiko Schuldt, Stephane Dupont and Omar Seddati, with their IMOTION system.



Demonstrations

Five demonstrations were presented at MMM. As in previous years, the best demonstration was selected using both a popular vote and a selection committee. And, as in previous years, both methods produced the same winner: “DeepStyleCam: A Real-time Style Transfer App on iOS” by Ryosuke Tanno, Shin Matsuo, Wataru Shimoda, and Keiji Yanai.

The winners of the Best Demonstration competition hard at work presenting their system.



Keynotes

The first keynote, held in the first session of the conference, was “Multimedia Analytics: From Data to Insight” by Marcel Worring, University of Amsterdam, Netherlands. He reported on a novel multimedia analytics model based on an extensive survey of over eight hundred papers. In the analytics model, the need for semantic navigation of the collection is emphasized and multimedia analytics tasks are placed on an exploration-search axis. Categorization is then proposed as a suitable umbrella task for realizing the exploration-search axis in the model. In the end, he considered the scalability of the model to collections of 100 million images, moving towards methods which truly support interactive insight gain in huge collections.

Björn Þór Jónsson introduces the first keynote speaker, Marcel Worring (right).


The second keynote, held in the last session of the conference, was “Creating Future Values in Information Access Research through NTCIR” by Noriko Kando, National Institute of Informatics, Japan. She reported on NTCIR (NII Testbeds and Community for Information access Research), which is a series of evaluation workshops designed to enhance the research in information access technologies, such as information retrieval, question answering, and summarization using East-Asian languages, by providing infrastructures for research and evaluation. Prof Kando provided motivations for the participation in such benchmarking activities and she highlighted the range of scientific tasks and challenges that have been explored at NTCIR over the past twenty years. She ended with ideas for the future direction of NTCIR.


Noriko Kando presents the second MMM keynote.

Special Sessions

During the conference, four special sessions were held. Special sessions are mini-venues, each focusing on one state-of-the-art research direction within the multimedia field. The sessions are proposed and chaired by international researchers, who also manage the review process, in coordination with the Program Committee Chairs. This year’s sessions were:
– “Social Media Retrieval and Recommendation” organized by Liqiang Nie, Yan Yan, and Benoit Huet;
– “Modeling Multimedia Behaviors” organized by Peng Wang, Frank Hopfgartner, and Liang Bai;
– “Multimedia Computing for Intelligent Life” organized by Zhineng Chen, Wei Zhang, Ting Yao, Kai-Lung Hua, and Wen-Huang Cheng; and
– “Multimedia and Multimodal Interaction for Health and Basic Care Applications” organized by Stefanos Vrochidis, Leo Wanner, Elisabeth André, Klaus Schoeffmann.

Social Events

This year, there were two main social events at MMM 2017: a welcome reception at the Video Browser Showdown, as discussed above, and the conference banquet. Optional tours then allowed participants to further enjoy their stay on the unique and beautiful island.

The conference banquet was held in two parts. First, we visited the exotic Blue Lagoon, which is widely recognised as one of the modern wonders of the world and one of the most popular tourist destinations in Iceland. MMM participants had the option of bathing for two hours in this extraordinary spa, and applying the healing silica mud to their skin, before heading back for the banquet in Reykjavík.

The banquet itself was then held at the Harpa Reykjavik Concert Hall and Conference Centre in downtown Reykjavík. Harpa is one of Reykjavik‘s most recent, yet greatest and most distinguished landmarks. It is a cultural and social centre in the heart of the city and features stunning views of the surrounding mountains and the North Atlantic Ocean.

Harpa, the venue of the conference banquet.


During the banquet, Steering Committee Chair Phoebe Chen gave a historical overview of the MMM conferences and announced the venues for MMM 2018 (Bangkok, Thailand) and MMM 2019 (Thessaloniki, Greece), before awards for the best contributions were presented. Finally, participants were entertained by a small choir, and were even asked to participate in singing a traditional Icelandic folk song.

MMM 2018 will be held at Chulalongkorn University in Bangkok, Thailand.



Acknowledgements

There are many people who deserve appreciation for their invaluable contributions to MMM 2017. First and foremost, we would like to thank our Program Committee Chairs, Laurent Amsaleg and Shin’ichi Satoh, who did excellent work in organizing the review process and helping us with the organization of the conference; indeed they are still hard at work with an MTAP special issue for selected papers from the conference. The Proceedings Chair, Gylfi Þór Guðmundsson, and Local Organization Chair, Marta Kristín Lárusdóttir, were also tirelessly involved in the conference organization and deserve much gratitude.

Other conference officers contributed to the organization and deserve thanks: Frank Hopfgartner and Esra Acar (demonstration chairs); Klaus Schöffmann, Werner Bailer and Jakub Lokoč (VBS Chairs); Yantao Zhang and Tao Mei (Sponsorship Chairs); all the Special Session Chairs listed above; the 150 strong Program Committee, who did an excellent job with the reviews; and the MMM Steering Committee, for entrusting us with the organization of MMM 2017.

Finally, we would like to thank our student volunteers (Atli Freyr Einarsson, Bjarni Kristján Leifsson, Björgvin Birkir Björgvinsson, Caroline Butschek, Freysteinn Alfreðsson, Hanna Ragnarsdóttir, Harpa Guðjónsdóttir), our hosts at Reykjavík University (in particular Arnar Egilsson, Aðalsteinn Hjálmarsson, Jón Ingi Hjálmarsson and Þórunn Hilda Jónasdóttir), the CP Reykjavik conference service, and all others who helped make the conference a success.

Report from ICACNI 2015


Report from the 3rd International Conference on Advanced Computing, Networking, and Informatics


Inauguration of 3rd ICACNI 2015

The 3rd International Conference on Advanced Computing, Networking and Informatics (ICACNI-2015), organized by the School of Computer Engineering, KIIT University, Odisha, India, was held on June 23-25, 2015.


Prof. Nikhil R. Pal during his keynote

The conference commenced with a keynote by Prof. Nikhil R. Pal (Fellow IEEE, Indian Statistical Institute, Kolkata, India) on ‘A Fuzzy Rule-Based Approach to Single Frame Super Resolution’.

Authors listening to technical presentations


Apart from three regular tracks on advanced computing, networking, and informatics, the conference hosted three invited special sessions. While a total of more than 550 articles were received across the different tracks of the conference, 132 articles were finally selected for presentation and publication in Springer's Smart Innovation, Systems and Technologies series as Volumes 43 and 44.

Prof. Nabendu Chaki during his technical talk


Extended versions of a few outstanding articles will be published in special issues of the Egyptian Informatics Journal and Innovations in Systems and Software Engineering (a NASA journal). The conference also showcased a technical talk by Prof. Nabendu Chaki (Senior Member IEEE, Calcutta University, India) on ‘Evolution from Web-based Applications to Cloud Services: A Case Study with Remote Healthcare’.

A click from award giving ceremony


The conference identified some wonderful works and gave away eight awards in different categories. The conference succeeded in bringing together academic scientists, professors, research scholars and students to share and disseminate scientific research related to the conference topics. The 4th ICACNI 2016 is scheduled to be held at the National Institute of Technology Rourkela, Odisha, India.

Summary of the 5th BAMMF


Bay Area Multimedia Forum (BAMMF)

BAMMF is a Bay Area Multimedia Forum series. Experts from both academia and industry are invited to exchange ideas and information through talks, tutorials, posters, panel discussions and networking sessions. Topics of the forum will include emerging areas in vision, audio, touch, speech, text, various sensors, human computer interaction, natural language processing, machine learning, media-related signal processing, communication, and cross-media analysis etc. Talks in the event may cover advancement in algorithms and development, demonstration of new inventions, product innovation, business opportunities, etc. If you are interested in giving a presentation at the forum, please contact us.

The 5th BAMMF

The 5th BAMMF was held in the George E. Pake Auditorium in Palo Alto, CA, USA on November 20, 2014. The slides and videos of the speakers at the forum have been made available on the BAMMF web page, and we provide here an overview of their talks. For speakers’ bios, the slides and videos, please visit the web page.

Industrial Impact of Deep Learning – From Speech Recognition to Language and Multimodal Processing

Li Deng (Deep Learning Technology Center, Microsoft Research, Redmond, USA)

Since 2010, deep neural networks have started making real impact in the speech recognition industry, building upon earlier work on (shallow) neural nets and (deep) graphical models developed by both the speech and machine learning communities. This keynote will first reflect on the historical path to this transformative success. The role of well-timed academic-industrial collaboration will be highlighted, as will the advances in big data, big compute, and the seamless integration between application-domain knowledge of speech and general principles of deep learning. Then, an overview will be given on the sweeping achievements of deep learning in speech recognition since its initial success in 2010 (as well as in image recognition since 2012). Such achievements have resulted in across-the-board, industry-wide deployment of deep learning. The final part of the talk will focus on applications of deep learning to large-scale language/text and multimodal processing, a more challenging area where potentially much greater industrial impact than in speech and image recognition is emerging.

Brewing a Deeper Understanding of Images

Yangqing Jia (Google)

In this talk I will introduce the recent developments in the image recognition fields from two perspectives: as a researcher and as an engineer. For the first part I will describe our recent entry “GoogLeNet” that won the ImageNet 2014 challenge, including the motivation of the model and knowledge learned from the inception of the model. For the second part, I will dive into the practical details of Caffe, an open-source deep learning library I created at UC Berkeley, and show how one could utilize the toolkit for a quick start in deep learning as well as integration and deployment in real-world applications.

Applied Deep Learning

Ronan Collobert (Facebook)

I am interested in machine learning algorithms which can be applied in real-life applications and which can be trained on “raw data”. Specifically, I prefer to trade simple “shallow” algorithms with task-specific handcrafted features for more complex (“deeper”) algorithms trained on raw features. In that respect, I will present several general deep learning architectures, which excel in performance on various Natural Language, Speech and Image Processing tasks. I will look into specific issues related to each application domain, and will attempt to propose general solutions for each use case.

Compositional Language and Visual Understanding

Richard Socher (Stanford)

In this talk, I will describe deep learning algorithms that learn representations for language that are useful for solving a variety of complex language tasks. I will focus on 3 projects:

  • Contextual sentiment analysis (e.g. having an algorithm that actually learns what’s positive in this sentence: “The Android phone is better than the IPhone”)
  • Question answering to win trivia competitions (like IBM Watson’s Jeopardy system but with one neural network)
  • Multimodal sentence-image embeddings to find images that visualize sentences and vice versa (with a fun demo!)

All three tasks are solved with a similar type of recursive neural network algorithm.


Report from SLAM 2014


ISCA/IEEE Workshop on Speech, Language and Audio in Multimedia

Following SLAM 2013 in Marseille, France, SLAM 2014 was the second edition of the workshop, held in Malaysia as a satellite of Interspeech 2014. The workshop was organized over two days, one for science and one for socializing and community building. With about 15 papers and 30 attendees, the highly-risky second edition of the workshop showed the will to build a strong scientific community at the frontier of speech and audio processing, natural language processing and multimedia content processing.

The first day featured talks covering various topics related to speech, language and audio processing applied to multimedia data. Two keynotes from Shri Narayanan (University of Southern California) and Min-Yen Kan (National University of Singapore) nicely completed the program.
The second day took us on a tour of Penang, followed by a visit to the campus of Universiti Sains Malaysia, home of the local organizers. The tour offered plenty of opportunities to strengthen the links between participants and build a stronger community, as intended. Most participants later went on to Singapore to attend Interspeech, the main conference in the domain of speech communication, where further discussions continued.

We hope to co-locate the next SLAM edition with a multimedia conference such as ACM Multimedia in 2015. Stay posted!

Report from ACM Multimedia 2013


Conference/Workshop Program Highlights

ACM Multimedia 2013 was held at the CCIB (Centre de Convencions Internacional de Barcelona) from October 21st to October 25th, 2013, in Barcelona. The Art Exhibition was held for the entire duration of the conference at the FAD (Foment de les Arts i del Disseny) in the center of the city, while the workshops were held in the Universitat Pompeu Fabra – Balmes building during the first two days of the conference (Oct. 21-22). It was the first time the conference was held in Spain, and it offered a high-quality program and a few notable innovations. Dr. Nozha Boujemaa from INRIA, France, Dr. Alejandro Jaimes from Yahoo! Labs, Spain, and Prof. Nicu Sebe from the University of Trento, Italy, were the general co-chairs of the conference. Dr. Daniel Gatica-Perez from IDIAP & EPFL, Switzerland, Dr. David A. Shamma from Yahoo! Labs, USA, Prof. Marcel Worring from the University of Amsterdam, The Netherlands, and Prof. Roger Zimmermann from the National University of Singapore, Singapore, were the program co-chairs. The entire organization committee is listed in Appendix A. The number of participants was 544: the main conference was attended by 476 participants (425 paid and 51 special cases such as sponsors and student volunteers), and 68 participants attended workshops only. The tutorials, which were free of charge, had 312 advance registrations. The multimedia art exhibition was open to the public from Oct. 21 to Oct. 28 and was visited by more than 2,000 visitors. The total revenue of the conference was $318,151, and the surplus was $25,430.

The venue (CCIB)

Below is the list of the program components of Multimedia 2013.

  • Technical Papers: Full and Short papers
  • Keynote Talks
  • SIGMM Achievement Award Talk, Ph.D Thesis Award Talk
  • Panel
  • Brave New Ideas
  • Multimedia Grand Challenge Solutions
  • Technical Demos
  • Open Source Software Competition
  • Doctoral Symposium
  • Art Exhibition and Reception
  • Tutorials
  • Workshops
  • Awards and Banquet

Innovations made for Multimedia 2013:

In an attempt to continuously improve ACM Multimedia and ensure its vibrant role in the multimedia community, we made a number of enhancements for this year’s conference:

  • The Technical Program Committee defined twelve Technical Areas as the major foci of this year’s conference, including new Technical Areas for Music & Audio and Crowdsourcing to reflect their growing interest and promise. We also changed the names of some traditional Technical Areas and provided an extensive description of each area to help authors choose the most appropriate Technical Area for their manuscripts.
  • We introduced a new role in the organization of the conference: the author’s advocate. The advocate’s explicit role was to listen to the authors and to help them when reviews were clearly below average quality. Authors could request the mediation of the author’s advocate after the reviews had been sent to them, and they had to clearly justify why such mediation was needed (i.e., that the reviews or the meta-review were below average quality). The advocate’s task was to investigate the matter carefully and to request additional reviews or a reexamination of the decision on the particular manuscript. This year, the author’s advocate was Pablo Cesar from CWI, The Netherlands.
  • We kept a few plenary sessions that brought singular focus to conference activities: the keynotes, the Multimedia Grand Challenge competition, the Best Paper session, and the Technical Achievement Award and Best PhD Award sessions. The other technical sessions were held in parallel to allow pursuit of more specialized interests at the conference. We limited the number of parallel sessions to no more than three to minimize the risk of overlapping interests.
  • We used video spotlights to advertise the works to be presented. These were meant to give all attendees an opportunity to become aware of the content of each paper, and thus to attract them to the corresponding poster or talk.
  • Workshops and Tutorials were held on separate days from the main conference in order to reduce conflicts with the regular Technical Program.
  • The Multimedia Art Exhibition featured both invited and selected artists. It was open for the duration of the conference in the satellite venue located in the center of the city.
  • Following the precedent of the last two years, Tutorials were made free for all participants.
  • Recognizing that students are the lifeblood of our next generation of multimedia thinkers, this year’s Student Travel Grant was greatly expanded: a total of $26,000, received from SIGMM ($16,000) and NSF ($10,000), supported 35 students.
  • Finally, we decided to provide the community with open access to the proceedings in the ACM Digital Library. Accordingly, no USB proceedings were handed to participants, encouraging everyone to access the proceedings online.

Technical Program

Following the guidelines of the ACM Multimedia Review Committee, the conference was structured into 12 Areas, with a two-tier TPC, a double-blind review process, and a target acceptance rate of 20% for long papers and 27.7% for short papers. Based on the experience from ACM Multimedia 2012 and the responses to our “Call for Areas” that we issued to the community, we selected the following Areas.

  1. Art, Entertainment, and Culture
  2. Authoring and Collaboration
  3. Crowdsourcing
  4. Media Transport and Delivery
  5. Mobile & Multi-device
  6. Multimedia Analysis
  7. Multimedia HCI
  8. Music & Audio
  9. Search, Browsing, and Discovery
  10. Security and Forensics
  11. Social Media & Presence
  12. Systems and Middleware

The Technical Program Committee was first created by appointing Area Chairs (ACs). A total of 29 colleagues agreed to serve in this role. Each Area was represented by two ACs, with the exception of two Areas (Multimedia Analysis and Search, Browsing, and Discovery) whose scope had traditionally attracted the largest proportion of papers and so required further coordination. The added topic diversity brought an increase in gender diversity among the ACs, which rose from approximately 12% in previous years to 22% for 2013. We also made a conscious effort to bring new talent and excellence into the community and to better represent emerging trends in the field. To this end, we appointed many young and well-recognized ACs who served in this role for the first time. For each junior AC, we co-appointed a senior researcher as their co-AC to aid in their shepherding. In a second step, the Area Chairs were responsible for appointing the TPC members (reviewers) for the areas they coordinated. This was a large effort to grow the TPC base for the conference as well as to ensure that proper expertise was represented in each area. We coupled this with a hard goal of limiting the number of submissions assigned to each TPC member for review. For example, two years ago, the average number of papers assigned to a reviewer was 9, with over 38% of the approximately 225 TPC members receiving 10 or more papers to review. With our design, we had a total of 398 reviewers receiving an average of 4.13 papers per reviewer. While we were unable to keep a hard ceiling, only 2.51% of the TPC received 10 or more papers to review, all of them TPC members who had agreed to serve in more than one area. The Area Chairs were in charge of assigning all papers for review, and each submission was reviewed double-blind by three TPC members.
Reviews and reviewer assignments of papers co-authored by Area Chairs, Program Chairs, and General Chairs were handled by Program Chairs who had no conflicts of interest for each specific case. Another novelty in the reviewing process was setting the paper submission deadline significantly earlier than in previous years, in order to allocate more time for reviews, rebuttals, discussions, and final decisions. Despite the reduced time given to authors, the response to the Call for Papers was enthusiastic, with a total of 235 long papers and 278 short papers going through review. The authors of long papers were asked to write a rebuttal after receiving the reviews. A new element in the reviewing process was the introduction of the Author’s Advocate, created to provide authors with an independent channel to express concerns about the quality of the reviews of their papers and to raise a flag about these reviews. All cases were brought to the attention of the corresponding Area Chair. After evaluating each case reported to him (16 reviews out of 761 long-paper reviews), the Author’s Advocate recommended in 5 cases that new reviews be generated and added to the discussion. The reviewers had a period for online discussion of reviews and rebuttals, after which the Area Chairs drafted a meta-review for each paper. Decisions on long and short papers were made at the TPC meeting held at the University of Amsterdam on June 11, 2013. The meeting was physically attended by one of the General Chairs, three of the Program Chairs, the Author’s Advocate, and 86% of the ACs. Many of the ACs who were unable to attend joined the discussions remotely. During the first half day of the TPC meeting, the Area Chairs worked in breakout sessions to discuss the papers that were weak accepts and weak rejects, with the exception of conflict-of-interest papers, which were handled out of band as previously mentioned.
In the second half of the first day, the ACs met in a plenary session where they reviewed the clear accepts and defended the decisions on the borderline papers based on the papers themselves, the reviews, meta-reviews, online discussions, and the authors’ rebuttal comments. In many cases, an emergency reviewer was added if there was a clear intersection with a related submission area. If a paper discussed during the plenary session presented a conflict of interest with an Area, Program, or General Chair, the chair concerned was excused from the room. On June 12, 2013, the Program Chairs finalized the process and the conference program in a separate meeting, arranging the sessions by thematic narratives rather than by submission area to promote cross-area conversations during the conference itself. The review process resulted in an overall acceptance rate of 20.0% for long papers and 27.7% for short papers (the distribution of submissions and the acceptance rate for each of the 12 areas is shown in the graph below). All accepted long papers were shepherded by the Area Chairs themselves or by qualified TPC members, who were in charge of verifying that the revised papers adequately addressed the concerns raised by the reviewers and the changes promised by the authors in their rebuttals. This step ensured that all accepted papers were of the highest quality possible. In addition, four papers with high review scores were nominated at the TPC meeting as candidates for the Best Paper Award. Each nominated paper had to be successfully championed and defended by the ACs from its area. The winner was announced at the Conference Banquet.
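As a quick sanity check on the figures above, the reported submission totals and acceptance rates imply the approximate accepted-paper counts below (an illustrative back-of-the-envelope calculation; the report itself does not state the exact accepted counts):

```python
# Illustrative check: accepted-paper counts implied by the reported
# submission totals (235 long, 278 short) and acceptance rates
# (20.0% long, 27.7% short). Approximate, since the exact accepted
# counts are not stated in the report.
long_submitted, short_submitted = 235, 278
long_rate, short_rate = 0.200, 0.277

long_accepted = round(long_submitted * long_rate)
short_accepted = round(short_submitted * short_rate)

print(long_accepted, short_accepted)  # roughly 47 long and 77 short papers
```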

ACM Multimedia 2013 Program at a Glance

The entire program of ACM Multimedia 2013 is shown below.

Workshop session

Conference venue

Opening ceremony

Keynote presentation

Poster/Demo session

SIGMM Achievement Award Talk

Keynote Talks

Multimedia Framed
Dr. Elizabeth F. Churchill (eBay Research Labs)
Wednesday, Oct. 23, 2013

Abstract: Multimedia is the combination of several media forms. Information designers, educationalists and artists are concerned with questions such as: Is text, or audio or video, or a combination of all three, the best format for the message? Should another modality (e.g., haptics/touch, olfaction) be invoked instead to make the message more effective and/or the experience more engaging? How does the setting affect perception/reception? How does framing affect people’s experience of multimedia? How is the artifact changed through interaction with audience members? In this presentation, I will talk about people’s experience of multimedia artifacts like videos. I will discuss the ways in which framing affects how we experience multimedia. Framing can be intentional: scripted creations produced with clear intent by technologists, designers, media producers, media artists, film-makers, archivists, documentarians and architects. Framing can also be unintentional. Everyday acts of interest and consumption turn us, the viewers, into co-producers of the experiences of the multimedia artifacts we have viewed. We download, annotate, comment and share multimedia artifacts online. Our actions are reflected in view counts, displayed comments and content ranking. Our actions therefore change how multimedia artifacts are interpreted and understood by others. Drawing on examples from the history of film and of performance art, from current social media research and from research conducted with collaborators over the past 16 years, I will illustrate how content understanding is modulated by context, by the “framing” of the content. I will consider three areas of research that are addressing the issue of framing, and that have implications for our understanding of ‘multimedia’ consumption, now and in the future: (1) the psychology and psychophysiology of multimedia as multimodal experience; (2) emerging practices with contemporary social media capture and sharing from personal devices; and (3) innovations in social media and audience analytics focused on more deeply understanding media consumption. I will conclude with some technical excitements, design/development challenges and experiential possibilities that lie ahead.

Dr. Elizabeth Churchill is Director of Human Computer Interaction at eBay Research Labs (ERL) in San Jose, California. Formerly a Principal Research Scientist at Yahoo! Research, she founded, staffed and managed the Internet Experiences Group. Until September of 2006, she worked at the Palo Alto Research Center (PARC), California, in the Computing Science Lab (CSL). Prior to that she formed and led the Social Computing Group at FX Palo Alto Laboratory, Fuji Xerox’s research lab in Palo Alto. Originally a psychologist by training, throughout her career Elizabeth has focused on understanding people’s social and collaborative interactions in their everyday digital and physical contexts. With over 100 peer-reviewed publications and 5 edited books, topics she has written about include implicit learning, human-agent systems, mixed initiative dialogue systems, social aspects of information seeking, digital archive and memory, and the development of emplaced media spaces. She has been a regular columnist for ACM interactions since 2008. Elizabeth has a BSc in Experimental Psychology and an MSc in Knowledge Based Systems, both from the University of Sussex, and a PhD in Cognitive Science from the University of Cambridge. In 2010, she was recognised as a Distinguished Scientist by the Association for Computing Machinery (ACM).
Elizabeth is the current Executive Vice President of ACM SIGCHI (the Special Interest Group on Computer-Human Interaction). She is a Distinguished Visiting Scholar at Stanford University’s Media X, the industry affiliate program to Stanford’s H-STAR Institute.

The Space between the Images
Prof. Leonidas J. Guibas (Stanford University)
Thursday, Oct. 24, 2013

Abstract: Multimedia content has become a ubiquitous presence on all our computing devices, spanning the gamut from live content captured by device sensors such as smartphone cameras to immense databases of images, audio and video stored in the cloud. As we try to maximize the utility and value of all these petabytes of content, we often do so by analyzing each piece of data individually and foregoing a deeper analysis of the relationships between the media. Yet with more and more data, there will be more and more connections and correlations, because the data captured comes from the same or similar objects, or because of particular repetitions, symmetries or other relations and self-relations that the data sources satisfy. This is particularly true for media of a geometric character, such as GPS traces, images, videos, 3D scans, 3D models, etc. In this talk we focus on the “space between the images”, that is, on expressing the relationships between different multimedia data items. We aim to make such relationships explicit, tangible, first-class objects that themselves can be analyzed, stored, and queried, irrespective of the media they originate from. We discuss mathematical and algorithmic issues on how to represent and compute relationships or mappings between media data sets at multiple levels of detail. We also show how to analyze and leverage networks of maps and relationships, small and large, between inter-related data. The network can act as a regularizer, allowing us to benefit from the “wisdom of the collection” in performing operations on individual data sets or in map inference between them. We will illustrate these ideas using examples from the realm of 2D images and 3D scans/shapes, but these notions are more generally applicable to the analysis of videos, graphs, acoustic data, biological data such as microarrays, homeworks in MOOCs, etc. This is an overview of joint work with multiple collaborators, as will be discussed in the talk.

Prof. Leonidas Guibas obtained his Ph.D. from Stanford under the supervision of Donald Knuth. His main subsequent employers were Xerox PARC, DEC/SRC, MIT, and Stanford. He is currently the Paul Pigott Professor of Computer Science (and by courtesy, Electrical Engineering) at Stanford University. He heads the Geometric Computation group and is part of the Graphics Laboratory, the AI Laboratory, the Bio-X Program, and the Institute for Computational and Mathematical Engineering. Professor Guibas’ interests span geometric data analysis, computational geometry, geometric modeling, computer graphics, computer vision, robotics, ad hoc communication and sensor networks, and discrete algorithms. Some well-known past accomplishments include the analysis of double hashing, red-black trees, the quad-edge data structure, Voronoi-Delaunay algorithms, the Earth Mover’s distance, Kinetic Data Structures (KDS), Metropolis light transport, and the Heat-Kernel Signature. Professor Guibas is an ACM Fellow, an IEEE Fellow and winner of the ACM Allen Newell award.


SIGMM Achievement Award Talk
Dick Bulterman (CWI, The Netherlands)
Friday, Oct. 25, 2013

The 2013 winner of the SIGMM award for Outstanding Technical Contributions to Multimedia Computing, Communications and Applications is Prof. Dr. Dick Bulterman. The ACM SIGMM Technical Achievement award is given in recognition of outstanding contributions over a researcher’s career. Prof. Dick Bulterman was selected for his outstanding technical contributions in multimedia authoring, media annotation, and social sharing, from research through standardization to entrepreneurship, and in particular for promoting international Web standards for multimedia authoring and presentation (SMIL) in the W3C Synchronized Multimedia Working Group, as well as for his dedicated involvement in the SIGMM research community over many years. Dr. Bulterman has long been an intellectual leader in the area of temporal modeling and support for complex multimedia systems. His research has led to the development of several widely used multimedia authoring systems and players. He developed the Amsterdam Hypermedia Model, the CMIF document structure, the CMIFed authoring environment, the GRiNS editor and player, and a host of multimedia demonstrator applications. In 1999, he started the CWI spin-off company Oratrix Development BV and worked as its CEO to deliver this software widely. He is currently head of the Distributed and Interactive Systems research group at Centrum Wiskunde & Informatica (CWI) in Amsterdam, The Netherlands, and a Full Professor of Computer Science at Vrije Universiteit, Amsterdam. His research interests are multimedia authoring and document processing. Dick has a strong international reputation for the development of the domain-specific temporal language for multimedia (SMIL). Much of this software has been incorporated into the widely used Ambulant Open Source SMIL Player, which has served to encourage the development and use of time-based multimedia content.
His conference publications and book on SMIL have helped to promote SMIL and its acceptance as a W3C standard. Dick’s recent work on social sharing of video will likely prove influential in upcoming Interactive TV products. This work has already been recognized in the academic community, earning the ACM SIGMM best paper award at ACM MM 2008 and a best paper award at the EUROITV conference.

SIGMM Ph.D. Thesis Award Talk
Xirong Li (Renmin University, China)
Friday, Oct. 25, 2013

The SIGMM Ph.D. Thesis Award Committee recommended this year’s award for the outstanding Ph.D. thesis in multimedia computing, communications and applications to Dr. Xirong Li. The committee considered Dr. Li’s dissertation, titled “Content-based visual search learned from social media”, worthy of the award as it substantially extends the boundaries for developing content-based multimedia indexing and retrieval solutions. In particular, it provides fresh new insights into the possibilities for realizing image retrieval solutions in the presence of the vast information that can be drawn from social media. The committee considered the main innovation of Dr. Li’s work to be the development of theory and algorithms providing answers to the following challenging research questions: (a) what determines the relevance of a social tag with respect to an image, (b) how to fuse tag relevance estimators, (c) which social images are informative negative examples for concept learning, (d) how to exploit socially tagged images for visual search, and (e) how to personalize automatic image tagging with respect to a user’s preferences. The significance of the developed theory and algorithms lies in their power to enable effective and efficient deployment of the information collected from social media to enhance the datasets that can be used to learn automatic image indexing mechanisms (visual concept detection) and to make this learning more personalized for the user. Dr. Xirong Li received the B.Sc. and M.Sc. degrees from Tsinghua University, China, in 2005 and 2007, respectively, and the Ph.D. degree from the University of Amsterdam, The Netherlands, in 2012, all in computer science. He is currently an Assistant Professor in the Key Lab of Data Engineering and Knowledge Engineering, Renmin University of China. His research interests are image search and multimedia content analysis. Dr. Li received the IEEE Transactions on Multimedia Prize Paper Award 2012, was a Best Paper Nominee at the ACM International Conference on Multimedia Retrieval 2012, received the Chinese Government Award for Outstanding Self-Financed Students Abroad 2011, and won the Best Paper Award of the ACM International Conference on Image and Video Retrieval 2010. He served as publicity co-chair for ICMR 2013.

Panel: Cross-Media Analysis and Mining
Wednesday, Oct. 23, 2013
Panelists: Mark Zhang, Alberto del Bimbo, Selcuk Candan, Alexander Hauptmann, Ramesh Jain, Alexis Joly, Yueting Zhuang

Motivation: Today there are lots of heterogeneous and homogeneous media data from multiple sources, such as news media websites, microblogs, mobile phones, social networking websites, and photo/video sharing websites. Integrated together, these media data represent different aspects of the real world and help document the evolution of the world. Consequently, it is impossible to correctly conceive and appropriately understand the world without exploiting the data available from these different sources of rich multimedia content simultaneously and synergistically. Cross-media analysis and mining is a research area in the general field of multimedia content analysis which focuses on exploiting data of different modalities from multiple sources simultaneously and synergistically to discover knowledge and understand the world.
Specifically, we emphasize two essential elements in the study of cross-media analysis that help differentiate it from the rest of the research in multimedia content analysis or machine learning. The first is the simultaneous co-existence of data from two or more different data sources. This element captures the notion of “cross”, e.g., cross-modality, cross-source, and cross cyberspace-to-reality. Cross-modality means that heterogeneous features are obtained from data in different modalities; cross-source means that the data may be obtained across multiple sources (domains or collections); cross-space means that the virtual world (i.e., cyberspace) and the real world (i.e., reality) complement each other. The second is the leverage of different types of data across multiple sources to strengthen knowledge discovery, for example, discovering the (latent) correlation or synergy between data of different modalities across multiple sources, transferring the knowledge learned in one domain (e.g., a modality or a space) to generate knowledge in another related domain, and generating a summary from the data of multiple sources. These two essential elements help promote cross-media analysis and mining as a new, emerging, and important research area in today’s multimedia research. With its emphasis on knowledge discovery, cross-media analysis differs from traditional research areas such as cross-lingual translation. On the other hand, with its general scenario of leveraging different types of data across multiple sources to strengthen knowledge discovery, cross-media analysis and mining addresses a broader series of problems than traditional research areas such as transfer learning. Overall, cross-media analysis and mining is beneficial for many applications in data mining, causal inference, machine learning, multimedia, and public security.
Like other emerging hot topics in multimedia research, cross-media analysis and mining also has a number of fundamental and controversial issues that must be addressed in order to have a full and complete understanding of research in this topic. These issues include, but are not limited to: whether there exists a unified representation or modeling for the same semantic concept across different media and, if so, what that unified representation or modeling is; whether there exists any “law” that governs topic evolution and development over time in different media and, if so, what that “law” is and how it is formulated; and whether there exists a mapping for a conceptual or semantic activity between cyberspace and the real world and, if so, what that mapping is and how it is developed and formulated.

Brave New Ideas

The Brave New Ideas program addressed long-term research challenges, pointed to new research directions, or provided new insights or brave perspectives that pave the way to innovation. The selection process differed from that of regular papers. First, a 2-page abstract was submitted. After a first selection, a full paper was required for each selected abstract, which then went through a second round of review. We received 38 submissions in the first stage, 14 were invited to submit a full paper for the second reviewing stage, and finally 6 papers were accepted, forming two sessions of oral presentations.

Multimedia Grand Challenge Solutions

We received the six challenges shown below for the Multimedia Grand Challenge Solutions program.

  1. NHK – Where is beauty? Grand Challenge
  2. Technicolor – Rich Multimedia Retrieval from Input Videos Grand Challenge
  3. Yahoo! – Large-scale Flickr-tag Image Classification Grand Challenge
  4. Huawei/3DLife – 3D human reconstruction and action recognition Grand Challenge
  5. MediaMixer/VideoLectures.NET – Temporal Segmentation and Annotation Grand Challenge
  6. Microsoft: MSR – Bing Image Retrieval Grand Challenge

We received 34 proposals for this program, and 14 of them were accepted for presentation. To encourage submissions, all accepted presentations in this program were recognized as Multimedia Grand Challenge Finalists. The best prize and two second-best prizes were chosen and awarded. At the request of Technicolor, a Grand Challenge Multimodal Prize was also chosen and awarded.

Technical Demonstrations

We received 80 excellent technical demonstration proposals, a number in line with the demonstrations received the previous year. Three reviewers were assigned to each demo proposal, and 40 proposals were finally chosen. The best demo was awarded a prize.

Open Source Software Competition

This year marked the 6th edition of the Open Source Software Competition as part of the ACM Multimedia program. The goal of this competition is to praise the invaluable contribution of researchers and software developers who advance the field by providing the community with implementations of codecs, middleware, frameworks, toolkits, libraries, applications, and other multimedia software. This year we received 16 submissions and, after assigning three reviewers to each, selected 11 for the competition. The best open source software was awarded a prize.

Doctoral Symposium

The Doctoral Symposium was meant as a forum for mentoring graduate students. It was held in the afternoon of Oct. 25 in both oral and poster formats. We received 19 proposals and accepted 13 presentations (6 oral + poster and 7 additional posters). In addition, a Doctoral Symposium lunch was organized, at which the students had the opportunity to talk to their assigned mentors. Finally, the best doctoral symposium paper was awarded a prize.

Multimedia Art Exhibition and Reception

ACM Multimedia provided a rich Multimedia Art Exhibition to stimulate artists and researchers alike to meet and discover the frontiers of multimedia artistic communication.
The Art Exhibition attracted significant work from a variety of digital artists collaborating with research institutions. We endeavored to select exhibits that achieved an interesting balance between technology and artistic intent. The techniques underpinning these artworks are relevant to several technical tracks of the conference, in particular those dealing with human-centered and interactive media. The exhibition was held in a satellite venue, the FAD (Foment de les Arts i del Disseny), located in the center of the city and with very good public access. It was open from Oct. 21 to Oct. 28 and was visited by more than 2,000 visitors. A reception event was held with the artists on Oct. 23. We selected 10 artworks for the exhibition:

  1. Emotion Forecast, Maurice Benayoun (City University of Hong Kong)
  2. Critical, Anabela Costa (France)
  3. Smile-Wall, Shen-Chi Chen, He-Lin Luo, Kuan-Wen Chen, Yu-Shan Lin, Hsiao-Lun Wang, Che-Yao Chan, Kai-Chih Huang, Yi-Ping Hung (National Taiwan University)
  4. SOMA, Guillaume Faure (France)
  5. A Feast of Shadow Puppetry, Zhenzhen Hu, Min Lin, Si Liu, Jiangguo Jiang, Meng Wang, Richang Hong, Shuicheng Yan (Hefei University of Technology and NUS)
  6. Tele Echo Tube, Hill Hiroki Kobayashi, Kaoru Saito, Akio Fujiwara (University of Tokyo)
  7. 3D-Stroboscopy, Sujin Lee (Sogang University, South Korea)
  8. The Qi of Calligraphy, He-Lin Luo, Yi-Ping Hung (National Taiwan University), I-Chun Chen (Tainan National University of the Arts)
  9. Gestural Pen Animation, Sheng-Ying Pao and Kent Larson (MIT Media Lab, USA)
  10. MixPerceptions, Jose San Pedro (Telefonica Research, Spain), Aurelio San Pedro (Escola Massana, Barcelona), Juan Pablo Carrascal (UPF, Barcelona), Matylda Szmukier (Telefonica Research, Spain)

Attending the Art Exhibition

San Pedro’s Mix Perceptions


Tutorials

We received 14 tutorial proposals and selected 8 tutorials for the main program. All tutorials were half-day and were held on Oct. 21 and 22, in parallel with the workshops, in the Universitat Pompeu Fabra – Balmes building. Tutorials were made free for all participants, and we received 312 pre-registrations.

Tutorial 1 Foundations and Applications of Semantic Technologies for Multimedia Content
Ansgar Scherp (Uni Mannheim, Germany)
Tutorial 2 Towards Next-Generation Multimedia Recommendation Systems
Jialie Shen, (SMU Singapore)
Shuicheng Yan (NUS)
Xian-Sheng Hua (Microsoft)
Tutorial 3 Crowdsourcing for Multimedia Research
Mohammad Soleymani (Imperial College London)
Martha Larson (TU Delft)
Tutorial 4 Massive-Scale Multimedia Semantic Modeling
John R. Smith (IBM Research)
Liangliang Cao (IBM Research)
Tutorial 5 Social Interactions over Geographic-Aware Multimedia Systems
Roger Zimmermann (NUS)
Yi Yu (NUS)
Tutorial 6 Multimedia Information Retrieval: Music and Audio
Markus Schedl (JKU Linz)
Emilia Gomez (UPF)
Masataka Goto (AIST)
Tutorial 7 Blending the Physical and the Virtual in Musical Technology: From interface design to multimodal signal processing
George Tzanetakis (U Victoria, Canada)
Sidney Fels (UBC)
Michael Lyons (Ritsumeikan U, JP)
Tutorial 8 Privacy Concerns of Sharing Multimedia in Social Networks
Gerald Friedland (ICSI)


Workshops

Workshops have always been an important part of the conference. Below is the list of workshops held in conjunction with ACM Multimedia 2013. We had 9 full-day workshops and 4 half-day workshops, held on Oct. 21-22 in parallel with the tutorials. Following last year’s rule, two complimentary workshop-only registrations were provided for the invited talks of each workshop, to encourage the participation of notable speakers.

Full Day Workshops (9)

  1. 2nd International Workshop on Socially-Aware Multimedia (SAM 2013) Organizers: Pablo Cesar (CWI, NL) Matthew Cooper (FXPAL) David A. Shamma (Yahoo!) Doug Williams (BT)
  1. 4th ACM/IEEE ARTEMIS 2013 International Workshop on Analysis and Retrieval of Tracked Events and Motion in Imagery Streams Organizers: Marco Bertini (University of Florence, Italy) Anastasios Doulamis (TU Crete, Greece) Nikolaos Doulamis (Cyprus University of Technology, Cyprus) Jordi Gonzàlez (Universitat Autònoma de Barcelona, Spain) Thomas Moeslund (University of Aalborg, Denmark)
  1. 5th International Workshop on Multimedia for Cooking and Eating Activities (CEA2013) Organizer: Kiyoharu Aizawa(Univ. of Tokyo, JP)
  1. 4th International Workshop on Human Behavior Understanding (HBU 2013) Organizers: Albert Ali Salah, Boğaziçi Univ., Turkey Hayley Hung, Delft Univ. of Technology, The Netherlands Oya Aran, Idiap Research Institute, Switzerland Hatice Gunes, Queen Mary Univ. of London (QMUL), UK
  1. International ACM Workshop on Crowdsourcing for Multimedia 2013 (CrowdMM 2013) Organizers: Wei-Ta Chu (National Chung Cheng University, TW) Martha Larson (Delft University of Technology, NL) Kuan-Ta Chen (Academia Sinica, TW)
  1. First ACM MM Workshop on Multimedia Indexing and Information Retrieval for Healthcare (ACM MM MIIRH) Organizers: Jenny Benois-Pineau, University of Bordeaux 1, France Alexia Briasouli, CERTH -ITI Alex Hauptman, Carnegie-Mellon University, USA
  1. Workshop on Personal Data Meets Distributed Multimedia Organizers: Vivek Singh, MIT, USA Tat-Seng Chua, NUS Ramesh Jain, University of California, Irvine, USA Alex (Sandy) Pentland, MIT, USA
  1. Workshop on Immersive Media Experiences Organizers: Teresa Chambel, University of Lisbon, Portugal V. Michael Bove, MIT Media Lab, USA Sharon Strover, University of Texas at Austin, USA Paula Viana, Polytechnic of Porto and INESC TEC, Portugal Graham Thomas, BBC, UK
  1. Workshop on Event-based Media Integration and Processing Organizers: Fausto Giunchiglia, University of Trento, Italy Sang “Peter” Chin, Johns Hopkins University, US Giulia Boato, University of Trento, Italy Bogdan Ionescu, University Politehnica of Bucharest, Romania Yiannis Kompatsiaris, Centre for Research and Technology Hellas, Greece

Half Day Workshops (4)

  1. ACM Multimedia Workshop on Geotagging and Its Applications Organizers: Liangliang Cao, IBM T. J. Watson Research Center, USA Gerald Friedland, International Computer Science Institute, USA, Pascal Kelm, Technische Universitaet of Berlin, Germany
  1. Data-driven challenge-based workshop at ACM MM 2013 (AVEC 2013) Organizers: Björn Schuller, TUM, Germany Michel Valstar, University of Nottingham, UK Roddy Cowie, Queen’s University Belfast, UK Maja Pantic, Imperial College London, UK Jarek Krajewski, University of Wuppertal, Germany
  1. 2nd ACM International Workshop on Multimedia Analysis for Ecological Data (MAED 2013) Organizers: Concetto Spampinato, University of Catania, Italy Vasileios Mezaris, CERTH, Greece Jacco van Ossenbruggen, CWI, The Netherlands
  1. 3rd International Workshop on Interactive Multimedia on Mobile and Portable Devices(IMMPD’13) Organizers: Jiebo Luo, University of Rochester, USA Caifeng Shan, Philips Research, The Netherlands Ling Shao, The University of Sheffield, UK Minoru Etoh, NTT DOCOMO, Japan


Awards were presented during the banquet, which was held at the conference venue, for almost all program components except short papers. The following awards were given:

Best Paper Award: Luoqi Liu, Hui Xu, Junliang Xing, Si Liu, Xi Zhou and Shuicheng Yan, National University of Singapore (NUS), “Wow! You Are So Beautiful Today!”

Best Student Paper Award: Hanwang Zhang, Zheng-Jun Zha, Yang Yang, Shuicheng Yan, Yue Gao and Tat-Seng Chua, National University of Singapore (NUS), “Attributes-augmented Semantic Hierarchy for Image Retrieval”

Grand Challenge 1st Place Award [Sponsored by Technicolor]: Brendan Jou, Hongzhi Li, Joseph G. Ellis, Daniel Morozoff-Abegauz and Shih-Fu Chang, Digital Video & Multimedia (DVMM) Lab, Columbia University, “Structured Exploration of Who, What, When, and Where in Heterogeneous Multimedia News Sources”

Grand Challenge 2nd Place Award [Sponsored by Technicolor]: Subhabrata Bhattacharya, Behnaz Nojavanasghari, Tao Chen, Dong Liu, Shih-Fu Chang, Mubarak Shah, University of Central Florida and Columbia University, “Towards a Comprehensive Computational Model for Aesthetic Assessment of Videos”

Grand Challenge 3rd Place Award [Sponsored by Technicolor]: Shannon Chen, Penye Xia, and Klara Nahrstedt, UIUC, “Activity-Aware Adaptive Compression: A Morphing-Based Frame Synthesis Application in 3DTI”

Program chairs during the banquet

Award ceremony

Banquet venue

Social program

Grand Challenge Multimodal Award [Sponsored by Technicolor]: Chun-Che Wu, Kuan-Yu Chu, Yin-Hsi Kuo, Yan-Ying Chen, Wen-Yu Lee, Winston H. Hsu, National Taiwan University, Taiwan, “Search-Based Relevance Association with Auxiliary Contextual Cues”

Best Demo Award: Duong-Trung-Dung Nguyen, Mukesh Saini, Vu-Thanh Nguyen, Wei Tsang Ooi, National University of Singapore (NUS), “Jiku director: An online mobile video mashup system”

Best Doctoral Symposium Paper: Jules Francoise, Institut de Recherche et Coordination Acoustique/Musique (IRCAM), “Gesture-Sound Mapping by Demonstration in Interactive Music Systems”

Best Open Source Software Award: Dmitry Bogdanov, Nicolas Wack, Emilia Gómez, Sankalp Gulati, Perfecto Herrera, Oscar Mayor, Gerard Roma, Justin Salamon, Jose Zapata, Xavier Serra (UPF), “ESSENTIA: An Audio Analysis Library for Music Information Retrieval”

Prize amounts:

Best Paper Award 500 euro
Best Student Paper Award 250 euro
Grand Challenge 1st Prize 750 euro
Grand Challenge 2nd Prize 500 euro
Grand Challenge 3rd Prize 200 euro
Grand Challenge Multimodal Prize 500 euro
Best Technical Demo Award 250 euro
Best Doctoral Symposium Paper 250 euro
Best Open Source Software Award 250 euro
Student Travel Grant (35 students) $26,000 ($10,000 NSF, $16,000 SIGMM)

Sponsors: We received incredible support from industry and funding organizations (38,500 euro in total). All sponsors and institutional supporters are listed in Appendix B. The amount contributed by each individual sponsor is as follows:

Sponsor Amount
FXPAL 5000 euro
Google 5000 euro
Huawei 5000 euro
Yahoo!Labs 5000 euro
Technicolor 4000 euro
Media Mixer 3500 euro
INRIA 3000 euro
Facebook 2000 euro
IBM 2000 euro
Telefonica 2000 euro
Microsoft 2000 euro
Total 38500 euro

The benefits for the sponsors were honorary registrations and publicity: the company logo was published on the conference website, in the Proceedings, and in the Booklet. On top of these amounts, we received $16,000 from SIGMM and $10,000 from NSF for student travel grants.

Geographical distribution of the participants

We had 544 participants at the main conference and workshops. The main conference was attended by 476 participants, of which 425 paid and 51 were special cases (sponsors, student volunteers, etc.); 68 participants attended only the workshops. The tutorials, which were free of charge, had 312 advance registrations. The country-wise distribution is shown below; the wide geographical spread indicates that we managed to attract participants from a large number of countries.

Total  # of participants 544      
USA 75 Switzerland 20
Singapore 48 Germany 20
China 45 Portugal 20
Japan 40 Taiwan 18
UK 35 Korea 15
Italy 29 Australia 15
France 28 Greece 14
Netherlands 26 Turkey 14
Spain 26 25 other countries 56


In order to gather opinions from the participants of ACM Multimedia 2013, we performed a post-conference survey; the results are summarized in Appendix C. Here we summarize the 10 most important issues compiled from the answers received. An effort of this kind is the first at ACM Multimedia, and we hope the tradition will be continued in the future. In our opinion, the survey results are a very good source of information for future organizers.

  1. Poster space too small
  2. Many people still want USB proceedings!!
  3. Oral topics in the same time slot overlapped too much. Need to diversify.
  4. Need to attract more multimedia niche topics. Should not become a second rate CV conference
  5. First day location hard to find. Workshop/tutorial better to be co-located with main conference
  6. Senior members of MM community should participate in paper sessions more
  7. Need to update web site program content and make it available earlier
  8. Consider offering short spotlight talks for poster papers
  9. Keep 15 mins for oral, but have them presented again in poster session for more discussion
  10. SIGMM business meeting too long. Not enough time for Q&A.


ACM Multimedia 2013 was a great success, with a large number of submissions, an excellent technical program, attractive program components, and stimulating events. As a result, we welcomed a large number of participants, in line with our initial expectations. There were a few problems (see above), but this is only natural. We greatly acknowledge those who contributed to the success of ACM Multimedia 2013. We thank the organizers of ACM Multimedia 2012 for their useful suggestions and comments, which helped us improve the organization of the 2013 edition, and for providing the template for the conference booklet. We thank the many paper authors and proposal contributors for the various technical and program components. We thank the large number of volunteers, including the Organizing Committee members and Technical Program Committee members, who worked very hard to create this year’s outstanding conference. Every aspect of the conference was also aided by the local committee members and by the hard work of Grupo Pacifico, to whom we are very grateful. We also thank the ACM staff and Sheridan Printing Company for their constant support. This success was clearly due to the combination of all their efforts.


General Co-Chairs: Alejandro (Alex) Jaimes (Yahoo Labs, Spain), Nicu Sebe (University of Trento, Italy), Nozha Boujemaa (INRIA, France)
Technical Program Co-Chairs: Daniel Gatica-Perez (IDIAP & EPFL, Switzerland), David A. Shamma (Yahoo Labs, USA), Marcel Worring (University of Amsterdam, The Netherlands), Roger Zimmermann (National University of Singapore, Singapore)
Author’s Advocate: Pablo Cesar (CWI, The Netherlands)
Multimedia Grand Challenge Co-Chairs: Yiannis Kompatsiaris (CERTH, Greece), Neil O’Hare (Yahoo Labs, Spain)
Interactive Arts Co-Chairs: Antonio Camurri (University of Genova, Italy), Marc Cavazza (Teesside University, UK)
Local Arrangement Chair: Mari-Carmen Marcos (Pompeu Fabra University, Spain)
Sponsorship Chairs: Ricardo Baeza-Yates (Yahoo Labs, Spain), Bernard Merialdo (Eurecom, France)
Panel Co-Chairs: Yong Rui (Microsoft, China), Winston Hsu (National Taiwan University, Taiwan), Michael Lew (University of Leiden, The Netherlands)
Video Program Chairs: Alexis Joly (INRIA, France), Giovanni Maria Farinella (University of Catania, Italy), Julien Champ (INRIA/LIRMM, France)
Brave New Ideas Co-Chairs: Jiebo Luo (University of Rochester, USA), Shuicheng Yan (National University of Singapore, Singapore)
Doctoral Symposium Chairs: Hayley Hung (Technical University of Delft, The Netherlands), Marco Cristani (University of Verona, Italy)
Open Source Competition Chairs: Ioannis (Yiannis) Patras (Queen Mary University, UK), Andrea Vedaldi (Oxford University, UK)
Tutorial Co-Chairs: Kiyoharu Aizawa (University of Tokyo, Japan), Lexing Xie (Australian National University, Australia)
Workshop Co-Chairs: Maja Pantic (Imperial College, UK), Vladimir Pavlovic (Rutgers University, USA)
Student Travel Grants Co-Chairs: Ramanathan Subramanian (ADSC, Singapore), Jasper Uijlings (University of Trento, Italy)
Publicity Co-Chairs: Marco Bertini (University of Florence, Italy), Ichiro Ide (Nagoya University, Japan)
Technical Demo Co-Chairs: Yi Yang (Carnegie Mellon University, USA), Xavier Anguera (Telefonica Research, Spain)
Proceedings Co-Chairs: Bogdan Ionescu (University Politehnica of Bucharest, Romania), Qi Tian (University of Texas San Antonio, USA)
Web Chair: Michele Trevisol (Web Research Group UPF & Yahoo Labs, Spain)

Appendix B. ACM MM 2013 Sponsors & Supporters