Report from ICMR 2017

ACM International Conference on Multimedia Retrieval (ICMR) 2017

ACM ICMR 2017 in “Little Paris”

ACM ICMR is the premier International Conference on Multimedia Retrieval, and from 2011 it “illuminates the state of the arts in multimedia retrieval”. This year, ICMR was in an wonderful location: Bucharest, Romania also known as “Little Paris”. Every year at ICMR I learn something new. And here is what I learnt this year.


Final Conference Shot at UP Bucharest

UNDERSTANDING THE TANGIBLE: object, scenes, semantic categories – everything we can see.

1) Objects (and YODA) can be easily tracked in videos.

Arnold Smeulders delivered a brilliant keynote on “things” retrieval: given an object in an image, can we find (and retrieve) it in other images, videos, and beyond? Very interesting technique for tracking objects (e.g. Yoda) in videos based on similarity learnt through siamese networks.

Tracking Yoda with Siamese Networks

2) Wearables + computer vision help explore cultural heritage sites.

As showed in his keynote, at MICC University of Florence, Alberto del Bimbo and his amazing team have designed smart audio guides for indoor and outdoor spaces. The system detects, recognises, and describes landmarks and artworks from wearable camera inputs (and GPS coordinates, in case of outdoor spaces).

3) We can finally quantify how much images provide complementary semantics compared to text [BEST MULTIMODAL PAPER AWARD].

For ages, the community has asked how relevant different modalities are for multimedia analysis: this paper ( finally proposes a solution to quantify information gaps between different modalities.

4) Exploring news corpuses is now very easy: news graphs are easy to navigate and aware of the type of relations between articles.

Remi Bois and his colleagues presented this framework (, made for professional journalists and the general public, for seamlessly browsing through large-scale news corpus. They built a graph where nodes are articles in a news corpus. The most relevant items to each article are chosen (and linked) based on an adaptive nearest neighbor technique. Each link is then characterised according to the type of relation of the 2 linked nodes.

5) Panorama outdoor images are much easier to localise.

In his beautiful work (, Ahmet Iscen from Inria developed an algorithm for location prediction from StreetView images, outperforming the state of the art thanks to an intelligent stitching pre-processing step: predicting locations from panoramas (stitched individual views) instead of individual street images improves performances dramatically!

UNDERSTANDING THE INTANGIBLE: artistic aspects, beauty, intent: everything we can perceive

1) Image search intent can be predicted by the way we look.

In his best paper candidate research work (, Mohammad Soleymani showed that image search intent (seeking information, finding content, or re-finding content) can be predicted from physisological responses (eye gaze) and implicit user interaction (mouse movements).

2) Real-time detection of fake tweets is now possible using user and textual cues.

Another best paper candidate (, this time from CERTH. The team collected a large dataset of fake/real sample tweets spanning 17 events and built an effective model from misleading content detection from tweet content and user characteristics. A live demo here:

3) Music tracks have different functions in our daily lives.

Researchers from TU Delft have developed an algorithm ( which classifies music tracks according to their purpose in our daily activities: relax, study and workout.

4) By transferring image style we can make images more memorable!

The team at University of Trento built an automatic framework ( to improve image memorability. A selector finds the style seeds (i.e. abstract paintings) which are likely to increase memorability of a given image, and after style transfer, the image will be more memorable!

5) Neural networks can help retrieve and discover child book illustrations.

In this amazing work (, motivated by real children experiences, Pinar and her team from Hacettepe University collected a large dataset of children book illustrations and found that neural networks can predict and transfer style, allowing to make “Winnie the witch”-like many other illustrations.

Winnie the Witch

6) Locals perceive their neighborhood as less interesting, more dangerous and dirtier compared to non-locals.

In this wonderful work (, presented by Darshan Santain from IDIAP, researchers asked locals and crowd-workers to look at pictures from various neighborhoods in Guanajuato and rate them according to interestingness, cleanliness, and safety.

THE FUTURE: What’s Next?

1) We will be able to anonymize images of outdoor spaces thanks to Instagram filters, as proposed by this work ( in the Brave New Idea session.  When an image of an outdoor space is manipulated with appropriate Instagram filters, the location of the image can be masked from vision-based geolocation classifiers.

2) Soon we will be able to embed watermarks in our Deep Neural Network models in order to protect our intellectual property [BEST PAPER AWARD]. This is a disruptive, novel idea, and that is why this work from KDDI Research and Japan National Institute of Informatics won the best paper award. Congratulations!

3) Given an image view of an object, we will predict the other side of things (from Smeulders’ keynote). In the pic: predicting the other side of chairs. Beautiful.

Predicting the other side of things

THANKS: To the organisers, to the volunteers, and to all the authors for their beautiful work :)

EDITORIAL NOTE: A more extensive report from ICMR 2017 by Miriam is available on Medium

Awarding the Best Social Media Reporters

The SIGMM Records team has adopted a new strategy to encourage the publication of information, and thus increase the chances to reach the community, increase knowledge and foster interaction. It consists of awarding the best Social Media reporters for each SIGMM conference, being the award a free registration to one of the SIGMM conference within a period of one year. All SIGMM members are welcome to participate and contribute, and are candidates to receive the award.

The Social Media Editors will issue a new open Call for Reports (CfR) via the Social Media channels every time a new SIGMM conference takes place, so the community can remember or be aware of this initiative, as well as can refresh its requirements and criteria.

The CfR will encourage activity on Social Media channels, posting information and contents related to the SIGMM conferences, with the proper hashtags (see our Recommendations). The reporters will be encouraged to mainly use Twitter, but other channels and innovative forms or trends of dissemination will be very welcome!

The Social Media Editors will be the jury for deciding the best reports (i.e., collection of posts) on Social Media channels, and thus will not qualify for this award. The awarded reporters will be additionally asked to provide a post-summary of the conference. The number of awards for each SIGMM conference is indicated in the table below. The awarded reporter will get a free registration to one of the SIGMM conferences (of his/her choice) within a period of one year.

Read more

Posting about SIGMM on Social Media

In Social Media, a common and effective mechanism to associate the publications about a specific thread, topic or event is to use hashtags. Therefore, the Social Media Editors believe in the convenience of recommending standards or basic rules for the creation and usage of the hashtags to be included in the publications related to the SIGMM conferences.

In this context, a common doubt is whether to include the ACM word and the year in the hashtags for conferences. Regarding the year, our recommendation is to not include it, as the date is available for the publications themselves and, this way, a single hashtag can be used to gather all the publications for all the editions of a specific SIGMM conference. Regarding the ACM word, our recommendation is to include it in the hashtag only if the conference acronym contains less than four letters (i.e., #acmmm, #acmtvx) and otherwise not (i.e., #mmsys, #icmr). Although consistency is important, not including ACM for MM (and for TVX) is clearly not a good identifier, and including it for MMSYS and ICMR results in a too long hashtag. Indeed, the #acmmmsys and #acmicmr hashtags have not been used before, contrarily to the wide use of #acmmm (and also of #acmtvx). Therefore, our recommendations for the usage and inclusion of hashtags can be summarized as:

Conference Hashtag

Include #ACM and #SIGMM?

MM #acmmm Yes
MMSYS #mmsys Yes
ICMR #icmr Yes



Report from MMM 2017

Harpa, the venue of the conference banquet.

MMM 2017 — 23rd International Conference on MultiMedia Modeling

MMM is a leading international conference for researchers and industry practitioners for sharing new ideas, original research results and practical development experiences from all MMM related areas. The 23rd edition of MMM took place on January 4-6 of 2017, on the modern campus of Reykjavik University. In this short report, we outline the major aspects of the conference, including: technical program; best paper session; video browser showdown; demonstrations; keynotes; special sessions; and social events. We end by acknowledging the contributions of the many excellent colleagues who helped us organize the conference. For more details, please refer to the MMM 2017 web site.

Technical Program

The MMM conference calls for research papers reporting original investigation results and demonstrations in all areas related to multimedia modeling technologies and applications. Special sessions were also held that focused on addressing new challenges for the multimedia community.

This year, 149 regular full paper submissions were received, of which 36 were accepted for oral presentation and 33 for poster presentation, for a 46% acceptance ratio. Overall, MMM received 198 submissions for all tracks, and accepted 107 for oral and poster presentation, for a total of 54% acceptance rate. For more details, please refer to the table below.

MMM2017 Submissions and Acceptance Rates

MMM2017 Submissions and Acceptance Rates

Best Paper Session

Four best paper candidates were selected for the best paper session, which was a plenary session at the start of the conference.

The best paper, by unanimous decision, was “On the Exploration of Convolutional Fusion Networks for Visual Recognition” by Yu Liu, Yanming Guo, and Michael S. Lew. In this paper, the authors propose an efficient multi-scale fusion architecture, called convolutional fusion networks (CFN), which can generate the side branches from multi-scale intermediate layers while consuming few parameters.

Phoebe Chen, Laurent Amsaleg and Shin’ichi Satoh (left) present the Best Paper Award to Yu Liu and Yanming Guo (right).

Phoebe Chen, Laurent Amsaleg and Shin’ichi Satoh (left) present the Best Paper Award to Yu Liu and Yanming Guo (right).

The best student paper, partially chosen due to the excellent presentation of the work, was “Cross-modal Recipe Retrieval: How to Cook This Dish?” by Jingjing Chen, Lei Pang, and Chong-Wah Ngo. In this work, the problem of sharing food pictures from the viewpoint of cross-modality analysis was explored. Given a large number of image and recipe pairs acquired from the Internet, a joint space is learnt to locally capture the ingredient correspondence from images and recipes.

Phoebe Chen, Laurent Amsaleg and Shin’ichi Satoh (left) present the Best Student Paper Award to Jingjing Chen and Chong-Wah Ngo (right).

Phoebe Chen, Shin’ichi Satoh and Laurent Amsaleg (left) present the Best Student Paper Award to Jingjing Chen and Chong-Wah Ngo (right).

The two runners-up were “Spatio-temporal VLAD Encoding for Human Action Recognition in Videos” by Ionut Cosmin Duta, Bogdan Ionescu, Kiyoharu Aizawa, and Nicu Sebe, and “A Framework of Privacy-Preserving Image Recognition for Image-Based Information Services” by Kojiro Fujii, Kazuaki Nakamura, Naoko Nitta, and Noboru Babaguchi.

Video Browser Showdown

The Video Browser Showdown (VBS) is an annual live video search competition, which has been organized as a special session at MMM conferences since 2012. In VBS, researchers evaluate and demonstrate the efficiency of their exploratory video retrieval tools on a shared data set in front of the audience. The participating teams start with a short presentation of their system and then perform several video retrieval tasks with a moderately large video collection (about 600 hours of video content). This year, seven teams registered for VBS, although one team could not compete for personal and technical reasons. For the first time in 2017, live judging was included, in which a panel of expert judges made decisions in real-time about the accuracy of the submissions for ⅓ of the tasks.

Teams and spectators in the Video Browser Showdown.

Teams and spectators in the Video Browser Showdown.

On the social side, two changes were also made from previous conferences. First, VBS was held in a plenary session, to avoid conflicts with other schedule items. Second, the conference reception was held at VBS, which meant that attendees had extra incentives to attend VBS, namely food and drink. And third, Alan Smeaton served as “color commentator” during the competition, interviewing the organizers and participants, and helping explain to the audience what was going on. All of these changes worked well, and contributed to a very well attended VBS session.

The winners of VBS 2017, after a very even and exciting competition, were Luca Rossetto, Ivan Giangreco, Claudiu Tanase, Heiko Schuldt, Stephane Dupont and Omar Seddati, with their IMOTION system.

The winners of VBS 2017, after a very even and exciting competition, were Luca Rossetto, Ivan Giangreco, Claudiu Tanase, Heiko Schuldt, Stephane Dupont and Omar Seddati, with their IMOTION system.


Five demonstrations were presented at MMM. As in previous years, the best demonstration was selected using both a popular vote and a selection committee. And, as in previous years, both methods produced the same winner, which was: “DeepStyleCam: A Real-time Style Transfer App on iOS” by Ryosuke Tanno, Shin Matsuo, Wataru Shimoda, and Keiji Yanai.

The winners of the Best Demonstration competition hard at work presenting their system.

The winners of the Best Demonstration competition hard at work presenting their system.


The first keynote, held in the first session of the conference, was “Multimedia Analytics: From Data to Insight” by Marcel Worring, University of Amsterdam, Netherlands. He reported on a novel multimedia analytics model based on an extensive survey of over eight hundred papers. In the analytics model, the need for semantic navigation of the collection is emphasized and multimedia analytics tasks are placed on an exploration-search axis. Categorization is then proposed as a suitable umbrella task for realizing the exploration-search axis in the model. In the end, he considered the scalability of the model to collections of 100 million images, moving towards methods which truly support interactive insight gain in huge collections.

Björn Þór Jónsson introduces the first keynote speaker, Marcel Worring (right).

Björn Þór Jónsson introduces the first keynote speaker, Marcel Worring (right).

The second keynote, held in the last session of the conference, was “Creating Future Values in Information Access Research through NTCIR” by Noriko Kando, National Institute of Informatics, Japan. She reported on NTCIR (NII Testbeds and Community for Information access Research), which is a series of evaluation workshops designed to enhance the research in information access technologies, such as information retrieval, question answering, and summarization using East-Asian languages, by providing infrastructures for research and evaluation. Prof Kando provided motivations for the participation in such benchmarking activities and she highlighted the range of scientific tasks and challenges that have been explored at NTCIR over the past twenty years. She ended with ideas for the future direction of NTCIR.


Noriko Kando presents the second MMM keynote.

Special Sessions

During the conference, four special sessions were held. Special sessions are mini-venues, each focusing on one state-of-the-art research direction within the multimedia field. The sessions are proposed and chaired by international researchers, who also manage the review process, in coordination with the Program Committee Chairs. This year’s sessions were:
– “Social Media Retrieval and Recommendation” organized by Liqiang Nie, Yan Yan, and Benoit Huet;
– “Modeling Multimedia Behaviors” organized by Peng Wang, Frank Hopfgartner, and Liang Bai;
– “Multimedia Computing for Intelligent Life” organized by Zhineng Chen, Wei Zhang, Ting Yao, Kai-Lung Hua, and Wen-Huang Cheng; and
– “Multimedia and Multimodal Interaction for Health and Basic Care Applications” organized by Stefanos Vrochidis, Leo Wanner, Elisabeth André, Klaus Schoeffmann.

Social Events

This year, there were two main social events at MMM 2017: a welcome reception at the Video Browser Showdown, as discussed above, and the conference banquet. Optional tours then allowed participants to further enjoy their stay on the unique and beautiful island.

The conference banquet was held in two parts. First, we visited the exotic Blue Lagoon, which is widely recognised as one of the modern wonders of the world and one of the most popular tourist destinations in Iceland. MMM participants had the option of bathing for two hours in this extraordinary spa, and applying the healing silica mud to their skin, before heading back for the banquet in Reykjavík.

The banquet itself was then held at the Harpa Reykjavik Concert Hall and Conference Centre in downtown Reykjavík. Harpa is one of Reykjavik‘s most recent, yet greatest and most distinguished landmarks. It is a cultural and social centre in the heart of the city and features stunning views of the surrounding mountains and the North Atlantic Ocean.

Harpa, the venue of the conference banquet.

Harpa, the venue of the conference banquet.

During the banquet, Steering Committee Chair Phoebe Chen gave a historical overview of the MMM conferences and announced the venues for MMM 2018 (Bangkok, Thailand) and MMM 2019 (Thessaloniki, Greece), before awards for the best contributions were presented. Finally, participants were entertained by a small choir, and were even asked to participate in singing a traditional Icelandic folk song.

MMM 2018 will be held at Chulalongkorn University in Bangkok, Thailand.  See

MMM 2018 will be held at Chulalongkorn University in Bangkok, Thailand. See


There are many people who deserve appreciation for their invaluable contributions to MMM 2017. First and foremost, we would like to thank our Program Committee Chairs, Laurent Amsaleg and Shin’ichi Satoh, who did excellent work in organizing the review process and helping us with the organization of the conference; indeed they are still hard at work with an MTAP special issue for selected papers from the conference. The Proceedings Chair, Gylfi Þór Guðmundsson, and Local Organization Chair, Marta Kristín Lárusdóttir, were also tirelessly involved in the conference organization and deserve much gratitude.

Other conference officers contributed to the organization and deserve thanks: Frank Hopfgartner and Esra Acar (demonstration chairs); Klaus Schöffmann, Werner Bailer and Jakub Lokoč (VBS Chairs); Yantao Zhang and Tao Mei (Sponsorship Chairs); all the Special Session Chairs listed above; the 150 strong Program Committee, who did an excellent job with the reviews; and the MMM Steering Committee, for entrusting us with the organization of MMM 2017.

Finally, we would like to thank our student volunteers (Atli Freyr Einarsson, Bjarni Kristján Leifsson, Björgvin Birkir Björgvinsson, Caroline Butschek, Freysteinn Alfreðsson, Hanna Ragnarsdóttir, Harpa Guðjónsdóttir), our hosts at Reykjavík University (in particular Arnar Egilsson, Aðalsteinn Hjálmarsson, Jón Ingi Hjálmarsson and Þórunn Hilda Jónasdóttir), the CP Reykjavik conference service, and all others who helped make the conference a success.