Volume 9, Issue 2, June 2017 (ISSN 1947-4598)

Table of Contents

  1. @sigmm Records: serving the community
  2. Standards Column: JPEG and MPEG
  3. Datasets and Benchmarks Column: Introduction
  4. Hello Multidisciplinary Column!
  5. Open Source Column – Introduction
  6. Introduction to the Opinion Column
  7. Interview Column – Introduction
  8. @sigmm on #SocialMedia
  9. An interview with David Ayman Shamma
  10. MPEG Column: 118th MPEG Meeting
  11. JPEG Column: 75th JPEG Meeting in Sydney, Australia
  12. Report from MMM 2017
  13. Posting about SIGMM on Social Media
  14. Awarding the Best Social Media Reporters
  15. Report from ACM ICMR 2017
  16. PhD thesis abstracts
    1. Sucheta Ghosh
  17. Journal issue TOCs
  18. Job opportunities
  19. Back Matter
    1. Call for Contributions
    2. Notice to Contributing Authors to SIG Newsletters
    3. Impressum

@sigmm Records: serving the community

The SIGMM Records are being renewed, with the continued ambition of being a useful resource for the multimedia community. We want to provide a forum for (open) discussion, but also to become the primary source of information for our community.

Firstly, I would like to thank Carsten, who has run, single-handedly, the whole Records for many, many years. We all agree that he has done an amazing job, and that his service deserves our gratitude, and possibly some beers, when you meet him at conferences and meetings.

As you are probably aware, a number of changes in the records are underway. We want your opinions and suggestions to make this resource the best it can be. Hence, we need your help to make this a success, so please drop us a line if you want to join the team.

The two main visible changes are:

We have an amazing new team to lead the Records in the coming years. I am so glad to have their help:

We have reorganized the Records and their structure into three main clusters:

More changes to come. Stay tuned!

Pablo (Editor in Chief) + Carsten and Mario (Information Directors)

Dr. Pablo Cesar leads the Distributed and Interactive Systems group at Centrum Wiskunde & Informatica (CWI) in the Netherlands. Pablo’s research focuses on modeling and controlling complex collections of media objects (including real-time media and sensor data) that are distributed in time and space. His fundamental interest is in understanding how different customizations of such collections affect the user experience. Pablo is the PI of Public Private Partnership projects with Xinhuanet and ByBorre, and of very successful EU-funded projects like 2-IMMERSE, REVERIE and Vconect. He has (co-)authored over 100 articles. He is a member of the editorial board of, among others, ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM). Pablo has given tutorials about multimedia systems at prestigious conferences such as ACM Multimedia, CHI, and the WWW conference. He acted as an invited expert at the European Commission’s Future Media Internet Architecture Think Tank and participates in standardisation activities at MPEG (point-cloud compression) and ITU (QoE for multi-party tele-meetings). Webpage:

Dr. Carsten Griwodz is Chief Research Scientist at the Media Department of the Norwegian research company Simula Research Laboratory AS, Norway, and professor at the University of Oslo. He is also co-founder of ForzaSys AS, a social media startup for sports. He is a steering committee member of ACM MMSys and ACM/IEEE NetGames. He is associate editor of the IEEE MMTC R-Letter and was previously editor-in-chief of the ACM SIGMM Records and editor of ACM TOMM.


Dr. Mario Montagud (@mario_montagud) was born in Montitxelvo (Spain). He received a BSc in Telecommunications Engineering in 2011, an MSc degree in “Telecommunication Technologies, Systems and Networks” in 2012 and a PhD degree in Telecommunications (Cum Laude Distinction) in 2015, all of them at the Polytechnic University of Valencia (UPV). During his PhD and after completing it, he did three research stays (accumulating 18 months) at CWI (The National Research Institute for Mathematics and Computer Science in the Netherlands). He also has experience as a postdoc researcher at UPV. His topics of interest include Computer Networks, Interactive and Immersive Media, Synchronization, and QoE (Quality of Experience). Mario is (co-)author of over 50 scientific and teaching publications, and has contributed to standardization within the IETF (Internet Engineering Task Force). He is a member of the Technical Committee of several international conferences (e.g., ACM MM, MMSys and TVX), co-organizer of the international MediaSync Workshop series, and member of the Editorial Board of international journals. He is also lead editor of “MediaSync: Handbook on Multimedia Synchronization” (Springer, 2017) and Communication Ambassador of ACM SIGCHI (Special Interest Group on Computer-Human Interaction). Webpage:

Standards Column: JPEG and MPEG


The area of work of ISO/IEC JTC 1/SC 29 comprises “the standardization of coded representation of audio, picture, multimedia and hypermedia information and sets of compression and control functions for use with such information”. SC29 basically hosts two working groups responsible for the development of international standards for the compression, decompression, processing, and coded representation of media content, in order to satisfy a wide variety of applications: WG1, targeting “digital still pictures” — also known as JPEG — and WG11, targeting “moving pictures, audio, and their combination” — also known as MPEG. The earliest SC29 standards, namely JPEG, MPEG-1 and MPEG-2, received Technology & Engineering Emmy Awards in 1995-96.

The standards columns within the ACM SIGMM Records provide timely updates about the most recent developments within JPEG and MPEG, respectively. The JPEG column is edited by Antonio Pinheiro and the MPEG column is edited by Christian Timmerer. This article introduces the editors and highlights recent JPEG and MPEG achievements as well as future plans.

Antonio Pinheiro received the BSc (Licenciatura) from I.S.T., Lisbon in 1988 and the PhD in Electronic Systems Engineering from the University of Essex in 2002. He has been a lecturer at U.B.I. (Universidade da Beira Interior), Covilhã, Portugal since 1988 and a researcher at I.T. (Instituto de Telecomunicações), Portugal. Currently, his research interests include Image Processing, namely Multimedia Quality Evaluation and Medical Image Analysis. He was a Portuguese representative of the European Union Actions COST IC1003 – QUALINET, COST IC1206 – DE-ID and COST 292, and currently of COST BM1304 – MYO-MRI. He is currently involved in the project EmergIMG, funded by the Portuguese funding agency and H2020, and he is a Portuguese delegate to JPEG, where he is currently the Communication Subgroup chair and involved with the JPEG Pleno project.



Christian Timmerer received his M.Sc. (Dipl.-Ing.) in January 2003 and his Ph.D. (Dr.techn.) in June 2006 (for research on the adaptation of scalable multimedia content in streaming and constrained environments), both from the Alpen-Adria-Universität (AAU) Klagenfurt. He joined the AAU in 1999 (as a system administrator) and is currently an Associate Professor at the Institute of Information Technology (ITEC) within the Multimedia Communication Group. His research interests include immersive multimedia communications, streaming, adaptation, Quality of Experience, and Sensory Experience. He was general chair of WIAMIS 2008, QoMEX 2013, and MMSys 2016, and has participated in several EC-funded projects, notably DANAE, ENTHRONE, P2P-Next, ALICANTE, SocialSensor, COST IC1003 QUALINET, and ICoSOLE. He also participated in ISO/MPEG work for several years, notably in the areas of MPEG-21, MPEG-M, MPEG-V, and MPEG-DASH, where he also served as standard editor. In 2012 he co-founded Bitmovin to provide professional services around MPEG-DASH, where he holds the position of Chief Innovation Officer (CIO).

Major JPEG and MPEG Achievements

In this section we would like to highlight major JPEG and MPEG achievements without claiming to be exhaustive.

JPEG developed the well-known digital picture coding standard, known as the JPEG image format, almost 25 years ago. Due to the recent increase in social network usage, the number of JPEG-encoded images shared online grew to an impressive 1.8 billion per day in 2014. JPEG 2000 is another successful JPEG standard, which received the 2015 Technology and Engineering Emmy Award. This standard uses state-of-the-art compression technology, providing higher compression and a wider application domain. It is widely used at the professional level, namely in movie production and medical imaging. JPEG also developed the JBIG2, JPEG-LS, JPSearch and JPEG-XR standards. More recently, JPEG launched JPEG-AIC, JPEG Systems and JPEG-XT. JPEG-XT defines backward-compatible extensions of JPEG, adding support for HDR, lossless/near-lossless, and alpha coding. An overview of the JPEG family of standards is shown in the figure below.

An overview of existing MPEG standards and achievements is shown in the figure below (taken from here).


A first major milestone and success was the development of MP3, which revolutionized digital audio content, resulting in a sustainable change of the digital media ecosystem. The same holds for MPEG-2 Video & Systems, where the latter, i.e., the MPEG-2 Transport Stream, received the Technology & Engineering Emmy Award. The mobile era within MPEG was introduced with the MPEG-4 standard, resulting in the development of AVC (which received yet another Emmy Award), AAC, and also the MP4 file format, all of which have been deployed widely. Finally, streaming over the open internet is addressed by DASH, and new forms of digital television, including ultra-high-definition and immersive services, are targeted by MPEG-H, comprising MMT, HEVC, and 3D Audio.

Roadmap for Future JPEG and MPEG Standards

In this section we would like to highlight a roadmap for future JPEG and MPEG standards.

A roadmap for future JPEG standards is represented in the figure above. The main efforts are towards the JPEG Pleno project, which aims to standardize new immersive technologies like light fields, point clouds and digital holography. Moreover, JPEG is launching JPEG-XS for low-latency and lightweight coding, while JPEG Systems is also developing a new part to add privacy and security protection to its standards. Furthermore, JPEG continuously seeks new technological developments and is committed to providing new standardized image coding solutions.


The future roadmap of MPEG standards is shown in the Figure below (taken from here).


MPEG’s roadmap for future standards comprises a variety of tools ranging from traditional audio-video coding to new forms of compression technologies like genome compression and light fields. The systems aspects will cover application domains that require media orchestration, as well as focus on becoming the enabler for immersive media experiences.


In this article we briefly highlighted achievements and future plans of JPEG and MPEG, but the future is not yet defined and requires participation from both industry and academia. We hope that our JPEG and MPEG columns will stimulate research and development within the multimedia domain, and we are open to any kind of feedback. Contact Antonio Pinheiro or Christian Timmerer for any further questions or comments.

Datasets and Benchmarks Column: Introduction

Datasets are critical for research and development as, rather obviously, data is required for performing experiments, validating hypotheses, analyzing designs, and building applications. Over the years a plurality of multimedia collections have been put together, ranging from one-off instances created exclusively to support the work presented in a single paper or demo to those created with multiple related or separate endeavors in mind. Unfortunately, the collected data is often not made publicly available. In some cases, it may not be possible to make a public release due to the proprietary or sensitive nature of the data, but other forces are also at work. For example, one might be reluctant to share data freely, as it has a value from the often substantial amount of time, effort, and money that was invested in collecting it.

Once a dataset has been made public though, it becomes possible to perform validations of results reported in the literature and to make comparisons between methods using the same source of truth, although matters are complicated when the source code of the methods is not published or the ground truth labels are not made available. Benchmarks offer a useful compromise by offering a particular task to solve along with the data that one is allowed to use and the evaluation metrics that dictate what is considered success and failure. While benchmarks may not offer the cutting edge of research challenges for which utilizing the freshest data is an absolute requirement, they are a useful sanity check to ensure that methods that appear to work on paper also work in practice and are indeed as good as claimed.

Several efforts are underway to stimulate sharing of datasets and code, as well as to promote the reproducibility of experiments. These efforts provide encouragement to overcome the reluctance to share data by underlining the ways in which data becomes more valuable with community-wide use. They also offer insights on how researchers can put publicly available datasets to the best possible use. We provide here a couple of key examples of ongoing efforts. At the MMSys conference series, there is a special track for papers on datasets, and Qualinet maintains an index of known multimedia collections. The ACM Artifact Review and Badging policy proposal recommends that journals and conferences adopt a reviewing procedure in which submitted papers can be granted special badges to indicate to what extent the performed experiments are repeatable, replicable, and reproducible. For example, the “Artifacts Evaluated – Reusable” badge would indicate that artifacts associated with the research are found to be documented, consistent, complete, exercisable, and include appropriate evidence of verification and validation to the extent that reuse and repurposing is facilitated.

In future posts appearing in this column, we will highlight new public datasets and upcoming benchmarks through a series of invited guest posts, as well as provide insights and updates on the latest developments in this area. The columns are edited by Bart Thomee and Martha Larson (see our bios at the end of this post).

To establish a baseline of popular multimedia datasets and benchmarks that have been used over the years by the research community, refer to the table below to see what the state of the art was as of 2015, when the data was compiled by Bart for his paper on the YFCC100M dataset. We can see the sizes of the datasets steadily increasing over the years, licenses becoming less restrictive, and it is now the norm to also release additional metadata, precomputed features, and/or ground truth annotations together with the dataset. The last three entries in the table are benchmarks that include tasks such as video surveillance and object localization (TRECVID), diverse image search and music genre recognition (MediaEval), and life-logging event search and medical image analysis (ImageCLEF), to name just a few. The table is most certainly not exhaustive, although it is reflective of the evolution of datasets over the last two decades. We will use this table to provide context for the datasets and benchmarks that we will cover in our upcoming columns, so stay tuned for our next post!


Bart Thomee is a Software Engineer at Google/YouTube in San Bruno, CA, USA, where he focuses on web-scale real-time streaming and batch techniques to fight abuse, spam, and fraud. He was previously a Senior Research Scientist at Yahoo Labs and Flickr, where his research centered on the visual and spatiotemporal dimensions of media, in order to better understand how people experience and explore the world, and how to better assist them with doing so. He led the development of the YFCC100M dataset released in 2014, and previously was part of the efforts leading to the creation of both MIRFLICKR datasets. He has furthermore been part of the organization of the ImageCLEF photo annotation tasks 2012–2013, the MediaEval placing tasks 2013–2016, and the ACM MM Yahoo-Flickr Grand Challenges 2015–2016. In addition, he has served on the program committees of, amongst others, ACM MM, ICMR, SIGIR, ICWSM and ECIR. He was part of the Steering Committee of the Multimedia COMMONS 2015 workshop at ACM MM and co-chaired the workshop in 2016; he also co-organized the TAIA workshop at SIGIR 2015.

Martha Larson is professor in the area of multimedia information technology at Radboud University in Nijmegen, Netherlands. Previously, she researched and lectured in the area of audio-visual retrieval at Fraunhofer IAIS, Germany, and at the University of Amsterdam, Netherlands. Larson is co-founder of the MediaEval international benchmarking initiative for Multimedia Evaluation. She has contributed to the organization of various other challenges, including CLEF NewsREEL 2015-2017, ACM RecSys Challenge 2016, and TRECVid Video Hyperlinking 2016. She has served on the program committees of numerous conferences in the areas of information retrieval, multimedia, recommender systems, and speech technology. Other forms of service have included: Area Chair at ACM Multimedia 2013, 2014, 2017, and TPC Chair at ACM ICMR 2017. Currently, she is an Associate Editor for IEEE Transactions on Multimedia. She is a founding member of the ISCA Special Interest Group on Speech and Language in Multimedia and serves on the IAPR Technical Committee 12 Multimedia and Visual Information Systems. Together with Hayley Hung she developed and currently teaches an undergraduate course in Multimedia Analysis at Delft University of Technology, where she maintains a part-time membership in the Multimedia Computing Group.

Hello Multidisciplinary Column!

There is ‘multi’ in multimedia. Every day, an increasing amount of extremely diverse multimedia content has meaning and purpose for an increasing number of extremely diverse human users, under extremely diverse use cases. As multimedia professionals, we work in an extremely diverse set of focus areas to enable this, ranging from systems aspects to user factors, each of which has its own methodologies and related communities outside of the multimedia field.

In our multimedia publication venues, we see all this work coming together. However, are we already sufficiently aware of the multidisciplinary potential in our field? Do we make sufficient effort to consider our daily challenges from the perspectives and methodologies of disciplines radically different from our own? Do we make sufficient use of existing experience with problems related to our own, but studied in neighboring communities? And how can an increased multidisciplinary awareness help and inspire us to take the field further?

Feeling the need for a stage for multi- and interdisciplinary dialogue within the multimedia community—and beyond its borders—we are excited to serve as editors of this newly established multidisciplinary column of the SIGMM Records. This column will be published as part of the Records, in four issues per year. Content-wise, we foresee a mix of opinion-based articles on multidisciplinary aspects of multimedia and interviews with peers whose work sits at the intersection of disciplines.

Call for contributions
We can only truly highlight the multidisciplinary merit of our field if the extreme diversity of our community is properly reflected in the contributions to this column. Therefore, in addition to invited articles, we are continuously looking for contributions from the community. Do you work at the junction of multimedia and another discipline? Did you get any important professional insights by interacting with neighboring communities? Do you want to share experiences on bridging towards other communities, or user audiences who are initially unfamiliar with our common interest areas? Can you contribute meta-perspectives on common case studies and challenges in our field? Do you know someone who should be interviewed or featured for this column? Then, please do not hesitate to reach out to us!

We see this column as a great opportunity to shape the multimedia community and raise awareness for multidisciplinary work, as well as neighboring communities. Looking forward to your input!

Cynthia and Jochen


Editor Biographies

Dr. Cynthia C. S. Liem is an Assistant Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. She initiated and co-coordinated the European research project PHENICX (2013-2016), focusing on technological enrichment of symphonic concert recordings with partners such as the Royal Concertgebouw Orchestra. Her research interests consider music and multimedia search and recommendation, and increasingly shift towards making people discover new interests and content which would not trivially be retrieved. Beyond her academic activities, Cynthia gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, and a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach.


Dr. Jochen Huber is a Senior User Experience Researcher at Synaptics. Previously, he was an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and the ACM International Conference on Interactive Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage:

Open Source Column – Introduction

“Open source software is software that can be freely accessed, used, changed, and shared (in modified or unmodified form) by anyone”. So open source software (OSS) is actually something that one or more people can work on, improve, refine, change, adapt, and share or use. Why would anyone support such an approach? Examples from industry show that this is a valid approach for many software products. Prominent open source projects are in use worldwide on an everyday basis, including the Apache Web Server, the Linux Kernel, the GNU Compiler Collection, Samba, OpenSSL, and MySQL. For industry this means not only re-using components and libraries, but also being able to fix them, adapt them to their needs, and hire people who are already familiar with the tools. Business models based on open source software focus more on services than products and ensure the longevity of the software: even if companies vanish, the open source software is here to stay.

In academia, open source provides a way to employ well-known methods as a baseline or a starting point without having to re-invent the wheel by programming algorithms and methods all over again. This is especially popular in multimedia research, which would not be as agile and forward-looking if it weren’t for OpenCV, FFmpeg, Caffe, SciPy and NumPy, just to name a few. In research, the need for publishing source code and data along with the scientific publication to ensure reproducibility has been identified recently (cp. ACM Artifact Review and Badging). This of course includes stronger support for releasing software and data artifacts based on open licenses.

The SIGMM community has been very active in this regard: the ACM International Conference on Multimedia has hosted the Open Source Software Competition since 2004. This competition has attracted an increasing number of submissions in recent years and, according to Google Scholar, two of the three currently most cited papers from the last five years of the conference were submitted to this competition. This year the ACM International Conference on Multimedia Retrieval has also introduced an OSS track.

Our aim for the SIGMM Records is to point out recent developments, announce interesting releases, share insights from the community, and actively support knowledge transfer from research to industry based on open source software and open data, four times a year. If you are interested in writing for the open source column, or have something you would like to know more about in this area, please do not hesitate to contact the editors. Examples are articles on open source frameworks or projects like the Menpo project, the SIVA Suite, or the Yael library.

The SIGMM Records editors responsible for the open source column are dedicated to the cause and have quite some history with open source in academia and industry.

Marco Bertini is an associate professor at the University of Florence and a long-term open source supporter, especially through having served as chair and co-chair of the Open Source Software Competition at the ACM International Conference on Multimedia.




Mathias Lux has participated in that very same competition with several open source projects. He is an associate professor at Klagenfurt University, dedicated to open source in research and teaching, and a main contributor to several open source projects.

Introduction to the Opinion Column

Welcome to the SIGMM Community Discussion Column! In this very first edition we would like to introduce the column to the community, its objectives and main operative characteristics.

Given the exponential amount of multimedia data shared online and offline every day, research in multimedia is of unprecedented importance. We might now be facing a new era of our research field, and we would like the whole community to be involved in the improvement and evolution of our domain.

The column has two main goals. First, we will promote dialogue on topics of interest to the MM community, by providing tools for continuous discussion among the members of the multimedia community. Every quarter, we will discuss (usually) one topic via online tools. Topics will include “What is Multimedia, and what is the role of the Multimedia community in science?”; “Diversity and minorities in the community”; “The ACM code of ethics”; etc.

Second, we will monitor and summarize on-going discussions, and spread their results within and outside the community. Every edition of this column will then summarize the discussion, highlighting popular and non-popular opinions, agreed action points and future work.

To foster the discussion, we have set up an online discussion forum in which all members of the multimedia community (all areas of expertise and levels of seniority) can participate: the Facebook MM Community Discussion group. For every edition of the column, we will choose an initial set of topics of high relevance for the community. We will include, for example, topics that have been previously discussed at ACM meetings (e.g., the code of ethics), or in related events (e.g., the Diversity at MM Women lunch), or popular offline discussions among MM researchers (e.g., review processes, the vision of the scientific community, etc.). In the first 15 days of the quarter, the members of the community will choose one topic from this shortlist via an online poll shared through the MM Facebook group. We will then select the topic that receives the highest number of votes as the subject for the quarterly discussion.

Volunteers or selected members of the MM group will start the discussion via Facebook posts on the group page. The discussion will then be open for a period of one month. All members of the community can participate by replying to posts or by posting directly on the group page, describing their point of view on the subject while being concise and clear. During this period, we will monitor and moderate (when needed) the discussion. At the end of the month, we will summarize the discussion by describing its evolution, exposing major and minor opinions, and outlining highlights and lowlights. A final text with the summary and some relevant discussion extracts will be prepared and will appear in the SIGMM Records and on the Facebook MM Community page.

Hopefully, the community will benefit from this initiative by either reaching some consensus or by pointing out important topics that are not mature enough and require further exploration. In the long term, we hope this process will make the community evolve through broad consensus and bottom-up discussions.

Let’s contribute and foster research around topics of high interest for the community!

Xavi and Miriam

Dr. Xavier Alameda-Pineda (Xavi) is a research scientist at INRIA. Xavi’s interdisciplinary background (MSc in Mathematics, Telecommunications and Computer Science) led him to pursue his PhD in Mathematics and Computer Science, followed by a postdoc at the University of Trento. His research interests are signal processing, computer vision and machine learning for scene and behavior understanding using multimodal data. He is the winner of the best paper award of ACM MM 2015, the best student paper award at IEEE WASPAA 2015 and the best scientific paper award at IAPR ICPR 2016.



Dr. Miriam Redi is a research scientist in the Social Dynamics team at Bell Labs Cambridge. Her research focuses on content-based social multimedia understanding and culture analytics. In particular, Miriam explores ways to automatically assess visual aesthetics, sentiment, and creativity, and to exploit the power of computer vision in the context of web, social media, and online communities. Previously, she was a postdoc in the Social Media group at Yahoo Labs Barcelona and a research scientist at Yahoo London. Miriam holds a PhD from the Multimedia group at EURECOM, Sophia Antipolis.

Interview Column – Introduction

The interviews in the SIGMM Records aim to provide the community with the insights, visions, and views of outstanding researchers in multimedia. With the interviews we particularly try to find out what makes these researchers outstanding, and also, to a certain extent, what is going on in their minds, what their visions are, and what they think about current topics. Examples from the last issues include interviews with Judith Redi, Klara Nahrstedt, and Wallapak Tavanapong.

The interviews are conducted via Skype or — even better — in person, by meeting the interviewees at conferences or other community events. We aim to publish three to four interviews a year. If you have suggestions for whom to interview, please feel free to contact one of the column editors, who are:

Michael Alexander Riegler is a scientific researcher at Simula Research Laboratory. He received his Master’s degree from Klagenfurt University with distinction and finished his PhD at the University of Oslo in two and a half years. His PhD thesis topic was efficient processing of medical multimedia workloads.
His research interests are medical multimedia data analysis and understanding, image processing, image retrieval, parallel processing, gamification and serious games, crowdsourcing, social computing and user intentions. Furthermore, he is involved in several initiatives like the MediaEval Benchmarking initiative for Multimedia Evaluation, which this year runs the Medico task (automatic analysis of colonoscopy videos).


Herman Engelbrecht is one of the directors of the MIH Electronic Media Laboratory at Stellenbosch University. He is a lecturer in Signal Processing at the Department of Electrical and Electronic Engineering. His responsibilities in the Electronic Media Laboratory are the following: managing the immediate objectives and research activities of the Laboratory; regularly meeting with postgraduate researchers and their supervisors to assist in steering their research efforts towards the overall research goals of the Laboratory; ensuring that the Laboratory infrastructure is developed and maintained; managing interaction with external contractors and service providers; managing the capital expenditure of the Laboratory; and managing the University’s relationship with the postgraduate researchers.


Mathias Lux is associate professor at the Institute for Information Technology (ITEC) at Klagenfurt University. He is working on user intentions in multimedia retrieval and production and on emergent semantics in social multimedia computing. In his scientific career he has (co-)authored more than 80 scientific publications, has served in multiple program committees and as a reviewer for international conferences, journals and magazines, and has organized multiple scientific events. Mathias Lux is also well known for the development of the award-winning and popular open source tools Caliph & Emir and LIRe for multimedia information retrieval. Dr. Mathias Lux received his M.S. in Mathematics in 2004 and his Ph.D. in Telematics in 2006 from Graz University of Technology, both with distinction, and his Habilitation (venia docendi) from Klagenfurt University in 2013.


@sigmm on #SocialMedia

The new SIGMM Records team aims to extend the reach of relevant SIGMM-related news and events. It will also provide forums to stimulate discussion, interaction and collaboration between members of our community.

The use of Social Media will be key to achieving this mission. Initially, Twitter and Facebook will be used as the main Social Media channels for SIGMM. YouTube and LinkedIn will be added at a later stage.

Twitter (@sigmm) will be the main Social Media channel for publishing information of interest in a variety of formats.

Facebook. A Facebook page, ACM SIGMM, will be used in a very similar manner to the Twitter account. In addition, a Facebook group will be created as a forum for interaction, discussion and collaboration.

The SIGMM and SIGMM Records websites will include Social Media icons, so the audience can share the contents on them via their personal Social Media channels.

Through the SIGMM Social Media and the personal communication channels, the Editors will encourage the community to contribute with interesting and relevant contents to be disseminated (e.g., outstanding contributions, Summer Schools, open positions, etc.).

The SIGMM Social Media channels will be particularly active during SIGMM sponsored events.

To promote community interaction, we recommend some policies for the creation and usage of hashtags in publications related to the SIGMM conferences. They are summarized in the table below and can be accessed at this link. This will help the editors track contributions from the community and understand their impact.


Conference   Hashtag   Include #ACM and #SIGMM?
MM           #acmmm    Yes
MMSYS        #mmsys    Yes
ICMR         #icmr     Yes

Apart from the SIGMM channels, many members of the community will contribute to publishing/sharing information of interest through their personal accounts (and ideally through their institutions’ accounts), acting as Social Media reporters/advocates. The team includes: Christian Timmerer, Miriam Redi, Gwendal Simon, Michael Riegler, Wei Tsang Ooi, D. Ayman Shamma, and many others. The list is expected to grow, so please drop us a line if you are interested in joining us! Everybody is welcome to participate and contribute.

A further strategy to encourage the publication of information, and thus increase the chances to reach the community, increase knowledge and foster interaction, will consist of awarding those SIGMM members who provide the best posts and reports on Social Media for each SIGMM conference. The award will consist of a free registration to the next edition of the conference, and any SIGMM member is a candidate to get it. The awardees will be asked to provide a post-summary of the conference, which will be published on SIGMM Records. More details about the number of awards, their requirements and criteria can be found at this link.

We look forward to seeing you in our #SIGMM community! Follow us! 😉

Dr. Niall Murray is a lecturer and researcher with the Faculty of Engineering and Informatics and the Software Research Institute in the Athlone Institute of Technology (AIT), Ireland. He received his BE (Electronic and Computer Engineering) from National University of Ireland, Galway (2003), MEng (Computer and Communication Systems) from the University of Limerick (2004) and PhD in 2014. Since 2004, he has worked in R&D roles across a number of industries: telecommunications, finance, health and education. In 2014 he founded the Truly Immersive and Interactive Multimedia Experiences lab (TIIMEx). His research interests include immersive multimedia communication, multimedia synchronization, multisensory multimedia, quality of experience (QoE) and wearable sensor systems. In this context, TIIMEx builds and evaluates, from a user-perceived quality perspective, end-to-end communication systems and novel immersive and interactive applications.

Xavier Giro-i-Nieto is an associate professor at the Universitat Politecnica de Catalunya (UPC). He graduated in Telecommunications Engineering at ETSETB (UPC) in 2000, after completing his master thesis on image compression at the Vrije Universiteit Brussel (VUB). He obtained his PhD on image retrieval in 2012, under the supervision of Professor Ferran Marqués from UPC and Professor Shih-Fu Chang from Columbia University. He was a visiting scholar during the Summers of 2008 to 2014 at the Digital Video and MultiMedia laboratory at Columbia University in New York. He has served as area chair at ACM Multimedia 2016 and is currently a member of the editorial board of IEEE Transactions on Multimedia. His current research interests focus on applying deep learning to multimodal applications, such as video analytics, eye gaze prediction and lifelogging.


Lexing Xie is Associate Professor in the Research School of Computer Science at the Australian National University, where she leads the ANU Computational Media lab. Her research interests are in machine learning, multimedia, and social media. Of particular recent interest are stochastic time series models, neural networks for sequences, and active learning, applied to diverse problems such as multimedia knowledge graphs, modeling popularity in social media, joint optimization and structured prediction problems, and social recommendation. Her research is supported by the US Air Force Office of Scientific Research, Data61, Data to Decisions CRC and the Australian Research Council. Lexing’s research has received six best student paper and best paper awards in ACM and IEEE conferences between 2002 and 2015. She is an IEEE Circuits and Systems Society Distinguished Lecturer 2016-2017. She currently serves as an associate editor of ACM Trans. MM, ACM TiiS and PeerJ Computer Science. Her service roles include the program and organizing committees of major multimedia, machine learning, web and social media conferences. She was a research staff member at IBM T.J. Watson Research Center in New York from 2005 to 2010.

Dr. Mario Montagud (@mario_montagud) was born in Montitxelvo (Spain). He received a BSc in Telecommunications Engineering in 2011, an MSc degree in “Telecommunication Technologies, Systems and Networks” in 2012 and a PhD degree in Telecommunications (Cum Laude Distinction) in 2015, all of them at the Polytechnic University of Valencia (UPV). During and after his PhD, he did three research stays (accumulating 18 months) at CWI (the National Research Institute for Mathematics and Computer Science in the Netherlands). He also has experience as a postdoc researcher at UPV. His topics of interest include Computer Networks, Interactive and Immersive Media, Synchronization, and QoE (Quality of Experience). Mario is (co-)author of over 50 scientific and teaching publications, and has contributed to standardization within the IETF (Internet Engineering Task Force). He is a member of the Technical Committee of several international conferences (e.g., ACM MM, MMSYS and TVX), co-organizer of the international MediaSync Workshop series, and member of the Editorial Board of international journals. He is also lead editor of “MediaSync: Handbook on Multimedia Synchronization” (Springer, 2017) and Communication Ambassador of ACM SIGCHI (Special Interest Group on Computer-Human Interaction).

An interview with David Ayman Shamma

David Ayman Shamma was interviewed by Michael Riegler.

About David Ayman Shamma:

I am a Principal Investigator and Senior Scientist at Centrum Wiskunde & Informatica (CWI) where I lead a team looking at Social Computing, Internet of Things (IoT), and fashion. Formerly, I was Director of Research at Yahoo Labs where I ran the HCI Research Group and was the scientific liaison to Flickr (where I co-founded the Data Science group). Broadly speaking, I design and prototype systems for multimedia-mediated communication, as well as develop targeted methods and metrics for understanding how people communicate online in small environments and at web scale. Additionally, I create media art installations that have been reviewed by The New York Times, International Herald Tribune, and Chicago Magazine and exhibited internationally, including Second City Chicago, the Berkeley Art Museum, SIGGRAPH ETECH, Chicago Improv Festival, and Wired NextFest/NextMusic.

I have a Ph.D. in Computer Science from the Intelligent Information Laboratory at Northwestern University and a B.S./M.S. from the Institute for Human and Machine Cognition at The University of West Florida. Before Yahoo!, I was an instructor at the Medill School of Journalism; I have also taught courses in Computer Science and Studio Art departments. Prior to receiving my Ph.D., I was a visiting research scientist for the Center for Mars Exploration at NASA Ames Research Center.

Michael Alexander Riegler: 

Michael is a scientific researcher at Simula Research Laboratory. He received his Master's degree from Klagenfurt University with distinction and finished his PhD at the University of Oslo in two and a half years. His PhD thesis topic was efficient processing of medical multimedia workloads.

His research interests are medical multimedia data analysis and understanding, image processing, image retrieval, parallel processing, gamification and serious games, crowdsourcing, social computing and user intentions. Furthermore, he is involved in several initiatives like the MediaEval Benchmarking initiative for Multimedia Evaluation, which this year runs the Medico task (automatic analysis of colonoscopy videos).



Describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

I’ve always been curious about solving problems.  Not so much the answer but actually I like to know how a problem can be broken down into parts, abstracted, and reasoned with—which often drives us to think about abstraction (is there a non-specific instance of this problem?), theory (is there some known literature from the mathematical or social sciences that will help us frame what’s happening?), and analogy (can we solve this because its structure is like another problem?).  My education included classes in psychology, philosophy, math, and engineering; eventually I realized Computer Science and specifically Artificial Intelligence embodied everything I was looking for: understanding people, modeling problems, and building new systems.

Interestingly enough, as an undergrad I took a job in an art department at the local state college as a technician; my job was to keep their Macs running with Adobe products. While I was there, I was allowed to audit studio art classes.  I began to see how artistic and creative processes were influenced by the tools we have—be it a 1:50 D-76 bath with fiber based paper in a darkroom or masking layers in Photoshop.  This connection between creative and constructive processes carried into my work at NASA’s Center for Mars Exploration, where I worked on diagrammatic knowledge tools, and then into my Ph.D. on community-driven multimedia systems. It was around this time that I saw ACM Multimedia 2004 had a call for technical papers in the Interactive Arts.  Since then I’ve been active in the community, mostly focused on the Arts track, but as my work began to include social computing in 2009 I started to think about hybrid social-visual systems.  In 2013, I was the Technical Program Co-chair, and we started to look critically at the broad technical areas and the review process, and started some inclusion and diversity initiatives.

The main foundational lesson for me is to continue asking the right questions, even if you’re branching out of some smaller, under-represented area or track.  In many cases, you’ll find new exciting research questions.  That said, I found I need to couple this with a personal understanding of the outside domain; only then can a truly functional hybrid system work; it’s not enough to look at divergent sources as just a big bag of the same data—pixels, tags, comments, clicks, they all carry an explicit or tacit semantic implication; respect that.

Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?

My Ph.D. dealt with social computing and community semantics: the objects in a photo carry a broader semantic conversation context of the online site sharing that photo. When I graduated, I joined an industry research lab. I spent 10 years there through a few organizational shifts. In my last 4 years there I founded the HCI Research group with a charter on investigating what our research meant to people.  My group’s research spanned across several domains: multimedia, computer vision, information visualisation, social computing, ethnography, and physical computing; this gave me deep perspective across many areas.  Personally, understanding how things are connected and what those connections meant became a focus of my research.  Data is created for a reason and structured link data can carry a tacit semantic that helps us understand people and tasks in the world. Lately, I’ve been thinking about physical spaces where people interact and create content. What sort of camera do you have on you? How does it change your practice of photography? What sensors might be in your clothes or in the world? These questions have been part of my current focus at Centrum Wiskunde & Informatica.  We’ve been working with a Dutch fashion designer in Amsterdam investigating how fashion and technology can be used in various situational tasks and environments through instrumenting clothing and creating structured data to understand people’s activity and flocking.  What’s exciting beyond the research is connecting goals of a fashion designer and computer science research; it’s an exciting bridge to create. Once all the fabric and sensors are accounted for, it becomes a social computing problem again…that’s where I like to live, creating bridges.

Can you profile your current research, its challenges, opportunities, and implications?

Now more than ever, we are a function of our own data.  Data drives much of computing today, be it data science or machine learning driven.  I like to emphasize how we collect and label data as it has direct consequences on what we can analyze, predict, and create.  For many, this means harvesting data for use.  For me, it means understanding how people act, behave, and communicate through those signals.  For example, at CSCW 2016 I published some work where we looked at the browsing behavior of millions of people on Flickr, which we matched against a relatively small set of editorial judgements to surface high quality geo-tagged weather photos.  The alternate approach, which they did attempt at first, was to just train a neural net to find photos of storms or lightning or sunny days. While that’s recall optimistic, the editors were quick to point out everyone takes crummy photos of lightning, so conventional approaches didn’t work. My research took a different approach: instead of training generic aesthetics into the system, we modeled a community-centric approach. Using the tacit aesthetic judgments from the Flickr community, we coupled the structured link data with a CNN to surface high quality photos. It’s not a case of active learning; in fact, it’s a supervised model where that supervision comes from implicit community actions and explicit editorial judgements.  We have some similar work to be published at CHI 2017 later this year where we were surfacing deviant/abuse images on Tumblr; a task that was even harder as the image may not be representative of such behavior, so the social-visual system was a necessity.

Taking your interest in AI and fashion into account, I am wondering what you generally think about the current hype around deep learning, also in the context of fashion research. Do you think AI-based systems will ever be able to understand context, which is an important factor in fashion?

You know, I remember when DeepBlue beat Kasparov back in the 90s and while it was great, I didn’t think much of it as an AI victory (nor did IBM if I recall). The recent win by AlphaGo  is different and something amazing.  I don’t think it’s hype as things work and work well—however we still face many of the same limitations. With regard to fashion, it’s a great time to be excited about AI. I mean we see solutions to many of the older research and fashion issues (like point your camera at someone and find the clothes they are wearing to buy online) but I think smart electronics, AI and fashion is the new sweet spot.  There have been many advancements in textiles like pixel to stitch knitting and small electronics make for a fun new playground for AI, sensors, and IoT. We’re just now starting to explore how clothes and fashion can sense, detect, and respond to people and to the environment.  I get what you’re saying by AI hype and that’s another discussion, but right now I’m excited to build the next generation of wearable tech.

How generalizable is data from sources like Flickr? For example, are your insights on Flickr also valid in non-western countries?

I certainly have had reviewers ask me how generalizable research is because it used Flickr data or Yelp data or Twitter data or whatever; I see it as the hallmark of a bad review.  On one hand, there is no reason to believe that any slice of a specific social media dataset should be generalizable. People act differently on Flickr than they do on Instagram or on Snapchat.  The application/website dictates an interaction, and really that’s what we are studying—as a research community we need to move beyond just studying naive pixels and examine what the application is doing.  Ok, if you’re just looking for indoor vs outdoor shots in Yelp photos, then maybe.  But have you ever tried to find a restaurant in Japan versus Italy versus America? Store fronts look completely different. Internationalization is rarely studied by multimedia researchers and I think multimedia mediated cultural communication is more important than website generalization.

I think it would be very interesting if you could also answer about what do you think is the role or responsibility of multimedia researchers in context of all the fake news/alternative new debate. Do you think we should focus on it?

In 2009, I began publishing work on multimedia summarization using aggregated Twitter feeds from the Obama-McCain debate. Back then, people really really wanted to tweet and it was a narrow interest community.  A few years later, during the Egyptian revolution of 2011, I ran my methods against the Twitter firehose and saw some misinformation (like a bus on fire that was reported which was actually from another country years ago). Delayed information is a systemic problem, where something happened hours or days ago and it gets propagated as fresh information. I don’t believe we had widespread purposeful propagation of misinformation (at least not like what we see in today’s world). So today, we have misplaced information, delayed information, fake/alt information, and the field of multimedia is ripe to handle this problem. For example, take a fake news story with a photo.  Has the photo been altered to retell a story? Is the photo from a different news story? Are there clusters of other news sources that contradict? There’s a whole world of multimedia problems in finding fake news, many of which large companies are struggling to get a grip on, but the hard problem will be the explanation. Identifying fake is half of the problem; explaining to people why it’s fake is the other.  News, now more than ever, is highly visual (photos/video) and social; dealing with a plurality of signals is the core of multimedia research.

In this context do you think that fake news are a problem of social network platforms or should newspapers also be investigated?

Can you name a news source that does not rely on social network platforms?  Conversely, have you seen Twitter deliver news?  Their streaming video with tweet interfaces speaks to research we did 10 years back.  I don’t think we can decouple the two, but we’ve seen how social media sites tend to amplify things by propagating clickable content.  So for a news agency, it starts with the title and snippet of a story and its related photo.  But then there are also the fake news agencies gaming the social sites.  There’s been some great work from UW cracking the problem, but I think it’s time for multimedia research to step up here as visual content always carries more engagement.

How would you describe the role of women especially in the field of multimedia?

Diversity of all types—gender, nationality, race—is critically important to the future of multimedia research.  When I was on the TPC for Multimedia in 2013 I did some data analytics of the past several years of the conference series; the gender stats were abysmal.  We worked hard to increase the gender diversity in the area chairs and in the conference.  To the former, following some advice from Maria Klawe I heard in a lecture maybe 10 years prior, we pushed on topic diversity for the conference.  The idea here is legacy areas can carry legacy diversity problems; so newer areas (social computing, affect, crowdsourcing, music, etc.) are more likely to have better gender leadership ratios.  It was the correct approach and we doubled the number of women in leadership roles in the ACs, but still there was much room to grow.  We coupled this with finding corporate support for a women’s & diversity lunch—a practice that I’m happy the conference has continued.  Diversity brings an expanded set of ideas, methods, and approaches in research.  We’ve come a ways since 2013 and I’m very happy to see the 2017 program also similarly expand its diversity, but we have a very long way to go to catch up to some other SIGs.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

Impact happens where research connects to people. For me, it usually revolves around creative practice in multimedia.  How do online broadcasters DJing house and hip-hop connect with their audience online, and how does it differ from when they are in a club?  If you have an iPad and an iPhone and want to take a picture, when do you reach for the iPad to take the photo?  If you’re posting a photo to Instagram, what filter will you use to enhance the photo?  The most valuable research includes method, system, and people. Let’s take that last one as an example.  One could build a prediction model to automatically apply filters based on a training set of what got likes and the types of transformation, but would that change people’s creative practice?  We found people enjoyed the process of selection (despite usually picking the same filter over and over again). So the question becomes how do we optimize the experience without hindering it.

In my time as Director of Research at Flickr, we enjoyed looking at the full stack: data, machine learning, engineering, visualization, and all the components that affect people and media experience. We knew there was an advantage to easily dive into 13 billion photos and 100 million people, but felt, even inside a corporation, there should be more open data for all researchers.  This led to the creation of the YFCC100M: 100 million Creative Commons images in a single dataset for open research.  Beyond the data itself, we found ourselves reviewing small technical Creative Commons details to ensure legal and privacy concerns were met while still opening the data for wide academic and corporate use.  The impact has been incredible.  Outside of the multimedia and computer vision communities, in the first year since release we’ve seen published work using our dataset from the HCI, Data Science, and Visualization communities, and we were even featured by the Library of Congress.  All driven by the idea to share data we felt was too locked up; fortunately Flickr, Creative Commons, and Yahoo Legal shared our vision and we’ll look to see more impact to come.

Over your distinguished career, what are your top lessons you want to share with the audience?

Really nothing happens in a vacuum. Partnerships and collaborations make things interesting as they make one malleable and push one to think full stack. This was shaped by my 10 years in an industry lab, where connecting with academia through hosting interns, collaborative work, and sponsorships really fueled my work.  I’d say still a good 70% of our work was internally driven but that 30% outreach was really valuable.  Now at an academic lab, I’m doing the reverse.  We partnered with a fashion designer to stay connected to their goals and their problems while we think about the wearable and social Internet of Things.  It’s great to think without constraints, but really adapting to the real world and thinking end-to-end is a critical driver for me.  At the end of the day, I want to use it. Build what you love and make it real.  This was easier when I was at a corporation, but there are still plenty of ways to collaborate depending on scope. And really think full stack in system and evaluation.  You’ll find yourself evaluating your work on multiple levels, from F-1 metrics to Likert scale surveys. What we do is develop new systems and methods, but work with real impact will affect applications and design. My favorite research (of mine or others) always critically engages with the bigger picture.

Since you are active researcher in both US and in Europe, what do you think are the main differences? What is positive and what is negative? And what could we learn from each other?

I did a semester sabbatical at the Keio-NUS CUTE center in Singapore a few years back, so it’s not my first dive outside of industry.  I’m reminded that in La Nausée Sartre wrote that anyplace you live feels the same after two weeks; the idea being once you get back to job and life, it becomes the same again. I can’t say I quite agree in this case. The move from an industry lab in California to an academic one in the Netherlands was a bit of a culture and cadence shift.  After almost a year, it’s clear to me that it’s the pace that differs, as we share a research culture.  We tend to sprint constantly in industry, and the sprinting seems to come and go in academia. Each style has its pros and cons; there have been times I wanted everyone to be running and times I was happy I could dive into something because we weren’t running. I don’t think it’s something to enumerate positive and negative points, just a different state of being.  I’m not sure why I gave you an existential response either.

MPEG Column: 118th MPEG Meeting

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects.

The entire MPEG press release can be found here, comprising the following topics:

  • Coded Representation of Immersive Media (MPEG-I): new work item approved and call for test data issued
  • Common Media Application Format (CMAF): FDIS approved
  • Beyond High Efficiency Video Coding (HEVC): call for evidence for “beyond HEVC” and verification tests for screen content coding extensions of HEVC

Coded Representation of Immersive Media (MPEG-I)

MPEG started to work on the new work item referred to as ISO/IEC 23090 with the “nickname” MPEG-I, targeting future immersive applications. The goal of this new standard is to enable various forms of audio-visual immersion, including panoramic video with 2D and 3D audio, with various degrees of true 3D visual perception. It currently comprises five parts: (pt. 1) a technical report describing the scope of this new standard and a set of use cases and applications; (pt. 2) an application format for omnidirectional media (aka OMAF) to address the urgent need of the industry for a standard in this area; (pt. 3) immersive video, which is a kind of placeholder for the successor of HEVC (if at all); (pt. 4) immersive audio, as a placeholder for the successor of 3D audio (if at all); and (pt. 5) point cloud compression. The point cloud compression standard targets lossy compression for point clouds in real-time communication, six Degrees of Freedom (6 DoF) virtual reality, and dynamic mapping for autonomous driving, cultural heritage applications, etc. Part 2 is related to OMAF, which I’ve discussed in my previous blog post.

MPEG also established an Ad-hoc Group (AhG) on immersive media quality evaluation with the following mandates: 1. Produce a document on VR QoE requirements; 2. Collect test material with immersive video and audio signals; 3. Study existing methods to assess human perception and reaction to VR stimuli; 4. Develop test methodology for immersive media, including simultaneous video and audio; 5. Study VR experience metrics and their measurability in VR services and devices. AhGs are open to everybody and mostly discussed via mailing lists. Interestingly, a Joint Qualinet-VQEG team on Immersive Media (JQVIM) has been recently established with similar goals, and the VR Industry Forum (VRIF) has also issued a call for VR360 content. It seems there’s a strong need for a dataset similar to the one we created for MPEG-DASH a long time ago.

The JQVIM has been created as part of the QUALINET task force on “Immersive Media Experiences (IMEx)”, which aims at providing end users with the sensation of being part of the particular media, resulting in a worthwhile, informative user and quality of experience. The main goals are providing datasets and tools (hardware/software), subjective quality evaluations, field studies, and cross-validation, including a strong theoretical foundation alongside the empirical databases and tools, which will hopefully result in a framework, methodology, and best practices for immersive media experiences.

Common Media Application Format (CMAF)

The Final Draft International Standard (FDIS) has been issued at the 118th MPEG meeting, which concludes the formal technical development process of the standard. At this point in time national bodies can only vote Yes/No and only editorial changes are allowed (if any) before the International Standard (IS) becomes available. The goal of CMAF is to define a single format for the transport and storage of segmented media including audio/video formats, subtitles, and encryption — it is derived from the ISO Base Media File Format (ISOBMFF). As it is a combination of various MPEG standards, it is referred to as an Application Format (AF), which mainly takes existing formats/standards and glues them together for a specific target application. The CMAF standard clearly targets dynamic adaptive streaming (over — but not limited to — HTTP) but focuses on the media format only, excluding the manifest format. Thus, the CMAF standard shall be compatible with other formats such as MPEG-DASH and HLS. In fact, HLS was extended some time ago to support ‘fragmented MP4’, which we have also demonstrated, and this has been interpreted as a first step towards the harmonization of MPEG-DASH and HLS, at least on the segment format. The delivery of CMAF contents with DASH will be described in part 7 of MPEG-DASH, which basically comprises a mapping of CMAF concepts to DASH terms.
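As a concrete illustration of this harmonization on the segment format, a minimal HLS media playlist referencing fragmented-MP4 (CMAF-style) segments might look as follows. This is an illustrative sketch only; the file names and durations are invented:

```
#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:4
#EXT-X-MAP:URI="init.mp4"
#EXTINF:4.0,
seg1.m4s
#EXTINF:4.0,
seg2.m4s
#EXT-X-ENDLIST
```

The EXT-X-MAP tag points to the initialization segment, and the same fMP4 media segments could in principle be referenced from a DASH MPD, which is what makes a shared segment format attractive.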

From a research perspective, it would be interesting to explore how certain CMAF concepts are able to address current industry needs, specifically in the context of low-latency streaming which has been demonstrated recently.

Beyond HEVC…

The preliminary call for evidence (CfE) on video compression with capability beyond HEVC has been issued and is addressed to interested parties that have technology providing better compression capability than the existing standard, either for conventional video material, or for other domains such as HDR/WCG or 360-degree (“VR”) video. Test cases are defined for SDR, HDR, and 360-degree content. This call has been made jointly by ISO/IEC MPEG and ITU-T SG16/Q6 (VCEG). The evaluation of the responses is scheduled for July 2017 and depending on the outcome of the CfE, the parent bodies of the Joint Video Exploration Team (JVET) of MPEG and VCEG collaboration intend to issue a Draft Call for Proposals by the end of the July meeting.

Finally, verification tests have been conducted for the Screen Content Coding (SCC) extensions to HEVC showing exceptional performance. Screen content is video containing a significant proportion of rendered (moving or static) graphics, text, or animation rather than, or in addition to, camera-captured video scenes. For scenes containing a substantial amount of text and graphics, the tests showed a major benefit in compression capability for the new extensions over both the Advanced Video Coding standard and the previous version of the newer HEVC standard without the new SCC features.

The question whether and how new codecs like (beyond) HEVC compete with AV1 is subject to research and development. It has been discussed in the scientific literature, but a vendor-neutral comparison is still lacking; such a comparison is difficult to achieve without comparing apples with oranges (due to the high number of different coding tools and parameters). An important aspect that always needs to be considered is that one typically compares specific implementations of a coding format and not the standard itself, as the encoding process is usually not defined: only the bitstream syntax, which implicitly defines the decoder, is.

Publicly available documents from the 118th MPEG meeting can be found here (scroll down to the end of the page). The next MPEG meeting will be held in Torino, Italy, July 17-21, 2017. Feel free to contact us for any questions or comments.

JPEG Column: 75th JPEG Meeting in Sydney, Australia

The 75th JPEG meeting was held at National Standards Australia in Sydney, Australia, from 26 to 31 March. Multiple activities ensued, pursuing the development of new standards that meet the current requirements and challenges in imaging technology. JPEG is continuously trying to provide new reliable solutions for different image applications. The 75th JPEG meeting featured mainly the following highlights:

  • JPEG issues a Call for Proposals on Privacy & Security;
  • New draft Call for Proposal for a Part 15 of JPEG 2000 standard on High Throughput coding;
  • JPEG Pleno defines methodologies for proposals evaluation;
  • A test model for the upcoming JPEG XS standard was created;
  • A new standardisation effort on Next generation Image Formats was initiated.

In the following an overview of the main JPEG activities at the 75th meeting is given.

JPEG Privacy & Security – JPEG Privacy & Security is a work item (ISO/IEC 19566-4) aiming at developing a standard that provides technical solutions to ensure privacy, maintain data integrity, and protect intellectual property rights (IPR). JPEG Privacy & Security is exploring how to design and implement the necessary features without significantly impacting coding performance, while ensuring scalability, interoperability, and forward & backward compatibility with current JPEG standard frameworks.
Since the JPEG committee intends to interact closely with actors in this domain, public workshops on JPEG Privacy & Security were organised at previous JPEG meetings. The first workshop was organized on October 13, 2015 during the JPEG meeting in Brussels, Belgium. The second workshop was organized on February 23, 2016 during the JPEG meeting in La Jolla, CA, USA. Following the great success of these workshops, a third and final workshop was organized on October 18, 2016 during the JPEG meeting in Chengdu, China. These workshops targeted understanding industry, user, and policy needs in terms of technology and supported functionalities. The proceedings of these workshops are published on the Privacy and Security page of the JPEG website, under the Systems section.
The JPEG Committee released a Call for Proposals that invites contributions on adding new capabilities for protection and authenticity features to the JPEG family of standards. Interested parties and content providers are encouraged to participate in this standardization activity and submit proposals. The deadline for expressions of interest and submissions of proposals has been set to October 6th, 2017, as detailed in the Call for Proposals. The Call for Proposals on JPEG Privacy & Security is publicly available on the JPEG website.

High Throughput JPEG 2000 – The JPEG committee is working towards the creation of a new Part 15 to the JPEG 2000 suite of standards, known as High Throughput JPEG 2000 (HTJ2K). The goal of this project is to identify and standardize an alternate block coding algorithm that can be used as a drop-in replacement for the algorithm defined in JPEG 2000 Part 1. Based on existing evidence, it is believed that large increases in encoding and decoding throughput (e.g., 10X or beyond) should be possible on modern software platforms, subject to small sacrifices in coding efficiency. An important focus of this activity is interoperability with existing systems and content repositories. In order to ensure this, the alternate block coding algorithm that will be the subject of this new Part of the standard should support mathematically lossless transcoding between HTJ2K and JPEG 2000 Part 1 codestreams at the code-block level. A draft Call for Proposals (CfP) on HTJ2K has been issued for public comment, and is available on the JPEG website.

JPEG Pleno – The responses to the JPEG Pleno Call for Proposals on Light Field Coding will be evaluated at the July JPEG meeting in Torino. During the 75th JPEG meeting, the quality assessment procedure for this highly challenging type of large-volume data was defined. In addition to light fields, JPEG Pleno is also addressing point cloud and holographic data. Currently, the committee is undertaking in-depth studies to prepare standardization efforts on coding technologies for these image data types, encompassing the collection of use cases and requirements, but also investigations towards accurate and appropriate quality assessment procedures for the associated representation and coding technologies. The JPEG committee is seeking input from the involved industrial and academic communities.

JPEG XS – This project aims at the standardization of a visually lossless low-latency lightweight compression scheme that can be used as a mezzanine codec for the broadcast industry and Pro-AV markets. Targeted use cases are professional video links, IP transport, Ethernet transport, real-time video storage, video memory buffers, and omnidirectional video capture and rendering. After a Call for Proposal and the assessment of the submitted technologies, a test model for the upcoming JPEG XS standard was created and results of core experiments have been reviewed during the 75th JPEG meeting in Sydney. More core experiments are on their way to further improve the final standard: JPEG committee therefore invites interested parties – in particular coding experts, codec providers, system integrators and potential users of the foreseen solutions – to contribute to the further specification process.

Next generation Image Formats – The JPEG Committee is exploring a new activity, which aims to develop an image compression format that demonstrates higher compression efficiency at equivalent subjective quality of currently available formats, and that supports features for both low-end and high-end use cases.  On the low end, the new format addresses image-rich user interfaces and web pages over bandwidth-constrained connections. On the high end, it targets efficient compression for high-quality images, including high bit depth, wide color gamut and high dynamic range imagery.

Final Quote

“JPEG is committed to accommodate reliable and flexible security tools for JPEG file formats without compromising legacy usage of our standards,” said Prof. Touradj Ebrahimi, the Convener of the JPEG committee.

About JPEG

The Joint Photographic Experts Group (JPEG) is a Working Group of ISO/IEC, the International Organisation for Standardization / International Electrotechnical Commission (ISO/IEC JTC 1/SC 29/WG 1), and of the International Telecommunication Union (ITU-T SG16), responsible for the popular JBIG, JPEG, JPEG 2000, JPEG XR, JPSearch and, more recently, the JPEG XT, JPEG XS, JPEG Systems and JPEG Pleno families of imaging standards.

The JPEG group meets nominally three times a year, in Europe, North America and Asia. The latest (75th) meeting was held on March 26-31, 2017, in Sydney, Australia. The next (76th) JPEG meeting will be held on July 15-21, 2017, in Torino, Italy.

More information about JPEG and its work is available on the JPEG website or by contacting Antonio Pinheiro or Frederik Temmermans of the JPEG Communication Subgroup.

If you would like to stay posted on JPEG activities, please subscribe to the jpeg-news mailing list. Moreover, you can follow the JPEG Twitter account.

Future JPEG meetings are planned as follows:

  • No. 76, Torino, IT, 17 – 21 July, 2017
  • No. 77, Macau, CN, 23 – 27 October 2017

Report from MMM 2017

Date: 4.-6. January 2017
Place: Reykjavik University, Reykjavik, Iceland
Reporters: Björn Þór Jónsson (Reykjavik University) and Cathal Gurrin (Dublin City University), MMM 2017 General Chairs

MMM 2017 — 23rd International Conference on MultiMedia Modeling

MMM is a leading international conference for researchers and industry practitioners for sharing new ideas, original research results and practical development experiences from all MMM related areas. The 23rd edition of MMM took place on January 4-6 of 2017, on the modern campus of Reykjavik University. In this short report, we outline the major aspects of the conference, including: technical program; best paper session; video browser showdown; demonstrations; keynotes; special sessions; and social events. We end by acknowledging the contributions of the many excellent colleagues who helped us organize the conference. For more details, please refer to the MMM 2017 web site.

Technical Program

The MMM conference calls for research papers reporting original investigation results and demonstrations in all areas related to multimedia modeling technologies and applications. Special sessions were also held that focused on addressing new challenges for the multimedia community.

This year, 149 regular full paper submissions were received, of which 36 were accepted for oral presentation and 33 for poster presentation, for a 46% acceptance ratio. Overall, MMM received 198 submissions for all tracks, and accepted 107 for oral and poster presentation, for a total of 54% acceptance rate. For more details, please refer to the table below.

MMM2017 Submissions and Acceptance Rates


Best Paper Session

Four best paper candidates were selected for the best paper session, which was a plenary session at the start of the conference.

The best paper, by unanimous decision, was “On the Exploration of Convolutional Fusion Networks for Visual Recognition” by Yu Liu, Yanming Guo, and Michael S. Lew. In this paper, the authors propose an efficient multi-scale fusion architecture, called convolutional fusion networks (CFN), which can generate the side branches from multi-scale intermediate layers while consuming few parameters.

Phoebe Chen, Laurent Amsaleg and Shin’ichi Satoh (left) present the Best Paper Award to Yu Liu and Yanming Guo (right).


The best student paper, partially chosen due to the excellent presentation of the work, was “Cross-modal Recipe Retrieval: How to Cook This Dish?” by Jingjing Chen, Lei Pang, and Chong-Wah Ngo. In this work, the problem of sharing food pictures from the viewpoint of cross-modality analysis was explored. Given a large number of image and recipe pairs acquired from the Internet, a joint space is learnt to locally capture the ingredient correspondence from images and recipes.

Phoebe Chen, Laurent Amsaleg and Shin’ichi Satoh (left) present the Best Student Paper Award to Jingjing Chen and Chong-Wah Ngo (right).


The two runners-up were “Spatio-temporal VLAD Encoding for Human Action Recognition in Videos” by Ionut Cosmin Duta, Bogdan Ionescu, Kiyoharu Aizawa, and Nicu Sebe, and “A Framework of Privacy-Preserving Image Recognition for Image-Based Information Services” by Kojiro Fujii, Kazuaki Nakamura, Naoko Nitta, and Noboru Babaguchi.

Video Browser Showdown

The Video Browser Showdown (VBS) is an annual live video search competition, which has been organized as a special session at MMM conferences since 2012. In VBS, researchers evaluate and demonstrate the efficiency of their exploratory video retrieval tools on a shared data set in front of the audience. The participating teams start with a short presentation of their system and then perform several video retrieval tasks with a moderately large video collection (about 600 hours of video content). This year, seven teams registered for VBS, although one team could not compete for personal and technical reasons. For the first time in 2017, live judging was included, in which a panel of expert judges made decisions in real-time about the accuracy of the submissions for ⅓ of the tasks.

Teams and spectators in the Video Browser Showdown.


On the social side, three changes were also made from previous conferences. First, VBS was held in a plenary session, to avoid conflicts with other schedule items. Second, the conference reception was held at VBS, which meant that attendees had extra incentives to attend VBS, namely food and drink. And third, Alan Smeaton served as “color commentator” during the competition, interviewing the organizers and participants, and helping explain to the audience what was going on. All of these changes worked well, and contributed to a very well attended VBS session.

The winners of VBS 2017, after a very even and exciting competition, were Luca Rossetto, Ivan Giangreco, Claudiu Tanase, Heiko Schuldt, Stephane Dupont and Omar Seddati, with their IMOTION system.



Demonstrations

Five demonstrations were presented at MMM. As in previous years, the best demonstration was selected using both a popular vote and a selection committee. And, as in previous years, both methods produced the same winner, which was: “DeepStyleCam: A Real-time Style Transfer App on iOS” by Ryosuke Tanno, Shin Matsuo, Wataru Shimoda, and Keiji Yanai.

The winners of the Best Demonstration competition hard at work presenting their system.



Keynotes

The first keynote, held in the first session of the conference, was “Multimedia Analytics: From Data to Insight” by Marcel Worring, University of Amsterdam, Netherlands. He reported on a novel multimedia analytics model based on an extensive survey of over eight hundred papers. In the analytics model, the need for semantic navigation of the collection is emphasized and multimedia analytics tasks are placed on an exploration-search axis. Categorization is then proposed as a suitable umbrella task for realizing the exploration-search axis in the model. In the end, he considered the scalability of the model to collections of 100 million images, moving towards methods that truly support interactive insight gain in huge collections.

Björn Þór Jónsson introduces the first keynote speaker, Marcel Worring (right).


The second keynote, held in the last session of the conference, was “Creating Future Values in Information Access Research through NTCIR” by Noriko Kando, National Institute of Informatics, Japan. She reported on NTCIR (NII Testbeds and Community for Information access Research), which is a series of evaluation workshops designed to enhance the research in information access technologies, such as information retrieval, question answering, and summarization using East-Asian languages, by providing infrastructures for research and evaluation. Prof Kando provided motivations for the participation in such benchmarking activities and she highlighted the range of scientific tasks and challenges that have been explored at NTCIR over the past twenty years. She ended with ideas for the future direction of NTCIR.


Noriko Kando presents the second MMM keynote.

Special Sessions

During the conference, four special sessions were held. Special sessions are mini-venues, each focusing on one state-of-the-art research direction within the multimedia field. The sessions are proposed and chaired by international researchers, who also manage the review process, in coordination with the Program Committee Chairs. This year’s sessions were:
– “Social Media Retrieval and Recommendation” organized by Liqiang Nie, Yan Yan, and Benoit Huet;
– “Modeling Multimedia Behaviors” organized by Peng Wang, Frank Hopfgartner, and Liang Bai;
– “Multimedia Computing for Intelligent Life” organized by Zhineng Chen, Wei Zhang, Ting Yao, Kai-Lung Hua, and Wen-Huang Cheng; and
– “Multimedia and Multimodal Interaction for Health and Basic Care Applications” organized by Stefanos Vrochidis, Leo Wanner, Elisabeth André, Klaus Schoeffmann.

Social Events

This year, there were two main social events at MMM 2017: a welcome reception at the Video Browser Showdown, as discussed above, and the conference banquet. Optional tours then allowed participants to further enjoy their stay on the unique and beautiful island.

The conference banquet was held in two parts. First, we visited the exotic Blue Lagoon, which is widely recognised as one of the modern wonders of the world and one of the most popular tourist destinations in Iceland. MMM participants had the option of bathing for two hours in this extraordinary spa, and applying the healing silica mud to their skin, before heading back for the banquet in Reykjavík.

The banquet itself was then held at the Harpa Reykjavik Concert Hall and Conference Centre in downtown Reykjavík. Harpa is one of Reykjavik‘s most recent, yet greatest and most distinguished landmarks. It is a cultural and social centre in the heart of the city and features stunning views of the surrounding mountains and the North Atlantic Ocean.

Harpa, the venue of the conference banquet.


During the banquet, Steering Committee Chair Phoebe Chen gave a historical overview of the MMM conferences and announced the venues for MMM 2018 (Bangkok, Thailand) and MMM 2019 (Thessaloniki, Greece), before awards for the best contributions were presented. Finally, participants were entertained by a small choir, and were even asked to participate in singing a traditional Icelandic folk song.

MMM 2018 will be held at Chulalongkorn University in Bangkok, Thailand.



Acknowledgements

There are many people who deserve appreciation for their invaluable contributions to MMM 2017. First and foremost, we would like to thank our Program Committee Chairs, Laurent Amsaleg and Shin’ichi Satoh, who did excellent work in organizing the review process and helping us with the organization of the conference; indeed, they are still hard at work with an MTAP special issue for selected papers from the conference. The Proceedings Chair, Gylfi Þór Guðmundsson, and Local Organization Chair, Marta Kristín Lárusdóttir, were also tirelessly involved in the conference organization and deserve much gratitude.

Other conference officers contributed to the organization and deserve thanks: Frank Hopfgartner and Esra Acar (Demonstration Chairs); Klaus Schöffmann, Werner Bailer and Jakub Lokoč (VBS Chairs); Yantao Zhang and Tao Mei (Sponsorship Chairs); all the Special Session Chairs listed above; the 150-strong Program Committee, who did an excellent job with the reviews; and the MMM Steering Committee, for entrusting us with the organization of MMM 2017.

Finally, we would like to thank our student volunteers (Atli Freyr Einarsson, Bjarni Kristján Leifsson, Björgvin Birkir Björgvinsson, Caroline Butschek, Freysteinn Alfreðsson, Hanna Ragnarsdóttir, Harpa Guðjónsdóttir), our hosts at Reykjavík University (in particular Arnar Egilsson, Aðalsteinn Hjálmarsson, Jón Ingi Hjálmarsson and Þórunn Hilda Jónasdóttir), the CP Reykjavik conference service, and all others who helped make the conference a success.

Posting about SIGMM on Social Media

In Social Media, a common and effective mechanism to associate publications about a specific thread, topic or event is the use of hashtags. Therefore, the Social Media Editors believe it is worth recommending standards or basic rules for the creation and usage of the hashtags to be included in publications related to the SIGMM conferences.

In this context, a common doubt is whether to include the ACM word and the year in the hashtags for conferences. Regarding the year, our recommendation is not to include it, as the date is available from the publications themselves and, this way, a single hashtag can be used to gather the publications for all editions of a specific SIGMM conference. Regarding the ACM word, our recommendation is to include it in the hashtag only if the conference acronym contains fewer than four letters (i.e., #acmmm, #acmtvx) and otherwise not (i.e., #mmsys, #icmr). Although consistency is important, leaving out ACM for MM (and TVX) would yield a hashtag that is clearly not a good identifier, while including it for MMSYS and ICMR would result in too long a hashtag. Indeed, the #acmmmsys and #acmicmr hashtags have not been used before, contrary to the wide use of #acmmm (and also of #acmtvx). Therefore, our recommendations for the usage and inclusion of hashtags can be summarized as:

Conference   Hashtag   Include #ACM and #SIGMM?
MM           #acmmm    Yes
MMSYS        #mmsys    Yes
ICMR         #icmr     Yes
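The rule above is simple enough to express in code. A tiny illustrative Python sketch (the function name is ours, not an official SIGMM tool):

```python
def sigmm_hashtag(acronym: str) -> str:
    """Recommended hashtag for a SIGMM conference acronym, per the rule
    above: prefix 'acm' only when the acronym has fewer than four letters."""
    tag = acronym.lower()
    return ("#acm" if len(tag) < 4 else "#") + tag

# Examples matching the table above:
for conf in ("MM", "TVX", "MMSYS", "ICMR"):
    print(conf, "->", sigmm_hashtag(conf))
```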



Awarding the Best Social Media Reporters

The SIGMM Records team has adopted a new strategy to encourage the publication of information, and thus increase the chances of reaching the community, increasing knowledge and fostering interaction. It consists of awarding the best Social Media reporters for each SIGMM conference, the award being a free registration to one of the SIGMM conferences within a period of one year. All SIGMM members are welcome to participate and contribute, and are candidates to receive the award.

The Social Media Editors will issue a new open Call for Reports (CfR) via the Social Media channels every time a new SIGMM conference takes place, so the community can remember or become aware of this initiative, as well as refresh its requirements and criteria.

The CfR will encourage activity on Social Media channels, posting information and contents related to the SIGMM conferences, with the proper hashtags (see our Recommendations). The reporters will be encouraged to mainly use Twitter, but other channels and innovative forms or trends of dissemination will be very welcome!

The Social Media Editors will be the jury for deciding the best reports (i.e., collection of posts) on Social Media channels, and thus will not qualify for this award. The awarded reporters will be additionally asked to provide a post-summary of the conference. The number of awards for each SIGMM conference is indicated in the table below. The awarded reporter will get a free registration to one of the SIGMM conferences (of his/her choice) within a period of one year.

Read more

Report from ACM ICMR 2017

ACM ICMR 2017 in “Little Paris”

ACM ICMR is the premier International Conference on Multimedia Retrieval and, since 2011, it has “illuminated the state of the art in multimedia retrieval”. This year, ICMR was held in a wonderful location: Bucharest, Romania, also known as “Little Paris”. Every year at ICMR I learn something new. And here is what I learnt this year.


Final Conference Shot at UP Bucharest

UNDERSTANDING THE TANGIBLE: objects, scenes, semantic categories – everything we can see.

1) Objects (and YODA) can be easily tracked in videos.

Arnold Smeulders delivered a brilliant keynote on “things” retrieval: given an object in an image, can we find (and retrieve) it in other images, videos, and beyond? He presented a very interesting technique for tracking objects (e.g., Yoda) in videos, based on similarity learnt through Siamese networks.

Tracking Yoda with Siamese Networks


2) Wearables + computer vision help explore cultural heritage sites.

As shown in his keynote, at MICC, University of Florence, Alberto del Bimbo and his amazing team have designed smart audio guides for indoor and outdoor spaces. The system detects, recognises, and describes landmarks and artworks from wearable camera inputs (and GPS coordinates, in the case of outdoor spaces).

3) We can finally quantify how much images provide complementary semantics compared to text [BEST MULTIMODAL PAPER AWARD].

For ages, the community has asked how relevant different modalities are for multimedia analysis: this paper finally proposes a solution to quantify the information gaps between different modalities.

4) Exploring news corpuses is now very easy: news graphs are easy to navigate and aware of the type of relations between articles.

Remi Bois and his colleagues presented this framework, made for professional journalists and the general public, for seamlessly browsing through large-scale news corpora. They built a graph whose nodes are articles in a news corpus. The most relevant items for each article are chosen (and linked) based on an adaptive nearest-neighbour technique. Each link is then characterised according to the type of relation between the two linked nodes.

5) Panorama outdoor images are much easier to localise.

In his beautiful work, Ahmet Iscen from Inria developed an algorithm for location prediction from Street View images, outperforming the state of the art thanks to an intelligent stitching pre-processing step: predicting locations from panoramas (stitched individual views) instead of individual street images improves performance dramatically!

UNDERSTANDING THE INTANGIBLE: artistic aspects, beauty, intent – everything we can perceive.

1) Image search intent can be predicted by the way we look.

In his best paper candidate research work, Mohammad Soleymani showed that image search intent (seeking information, finding content, or re-finding content) can be predicted from physiological responses (eye gaze) and implicit user interactions (mouse movements).

2) Real-time detection of fake tweets is now possible using user and textual cues.

Another best paper candidate, this time from CERTH. The team collected a large dataset of fake/real sample tweets spanning 17 events and built an effective model for misleading content detection from tweet content and user characteristics. A live demo is also available.

3) Music tracks have different functions in our daily lives.

Researchers from TU Delft have developed an algorithm which classifies music tracks according to their purpose in our daily activities: relaxing, studying and working out.

4) By transferring image style we can make images more memorable!

The team at the University of Trento built an automatic framework to improve image memorability. A selector finds the style seeds (i.e., abstract paintings) that are likely to increase the memorability of a given image; after style transfer, the image becomes more memorable!

5) Neural networks can help retrieve and discover child book illustrations.

In this amazing work, motivated by real children's experiences, Pinar and her team from Hacettepe University collected a large dataset of children's book illustrations and found that neural networks can predict and transfer style, making it possible to generate many other “Winnie the Witch”-like illustrations.

Winnie the Witch


6) Locals perceive their neighborhood as less interesting, more dangerous and dirtier compared to non-locals. 

In this wonderful work presented by Darshan Santani from Idiap, researchers asked locals and crowd-workers to look at pictures from various neighborhoods in Guanajuato.

THE FUTURE: What’s Next?

1) We will be able to anonymize images of outdoor spaces thanks to Instagram filters, as proposed by this work in the Brave New Idea session.

When an image of an outdoor space is manipulated with appropriate Instagram filters, the location of the image can be masked from vision-based geolocation classifiers.

2) Soon we will be able to embed watermarks in our Deep Neural Network models in order to protect our intellectual property [BEST PAPER AWARD].

This is a disruptive, novel idea, and that is why this work from KDDI Research and Japan National Institute of Informatics won the best paper award. Congratulations!

3) Given an image view of an object, we will predict the other side of things (from Smeulders’ keynote). In the pic: predicting the other side of chairs. Beautiful.

Predicting the other side of things



THANKS: To the organisers, to the volunteers, and to all the authors for their beautiful work :)

Sucheta Ghosh

End-to-End Discourse Parsing with Cascaded Structured Prediction

Supervisor(s) and Committee member(s): Firstname Lastname (supervisor), ...


Parsing discourse is a challenging natural language processing task. In this research work we first take a data-driven approach to identifying the arguments of explicit discourse connectives. In contrast to previous work, we do not make any assumptions about the span of arguments and treat parsing as a token-level sequence labeling task. We design the argument segmentation task as a cascade of decisions based on conditional random fields (CRFs). We train the CRFs on lexical, syntactic and semantic features extracted from the Penn Discourse Treebank and evaluate feature combinations on the commonly used test split. We show that the best combination of features includes both syntactic and semantic features. A comparative error analysis investigates the variability in performance across connective types and argument positions. We also compare the results of the cascaded pipeline with a non-cascaded structured prediction setting, which shows that cascaded structured prediction is the better-performing method for discourse parsing.

We present a novel end-to-end discourse parser that, given a plain-text document as input, identifies the discourse relations in the text, assigns them a semantic label and detects discourse argument spans. The parsing architecture is based on a cascade of decisions supported by conditional random fields (CRFs). We train and evaluate three different parsers using the PDTB corpus. The three system versions are compared to evaluate their robustness with respect to deep versus shallow and automatically extracted syntactic features.

Next, we describe two constraint-based methods that can be used to improve the recall of a shallow discourse parser based on conditional random field chunking. These methods use a set of natural structural constraints as well as others that follow from the annotation guidelines of the Penn Discourse Treebank. We evaluated the resulting systems on the standard test set of the PDTB and achieved a rebalancing of precision and recall, with improved F-measures across the board. This was especially notable when we used evaluation metrics taking partial matches into account; for these measures, we achieved F-measure improvements of several points.

Finally, we address the problem of optimization in discourse parsing. A good model for discourse structure analysis needs to account both for local dependencies at the token level and for global dependencies and statistics. We present techniques for using inter-sentential or sentence-level (global), data-driven, non-grammatical features in the task of parsing discourse. The parser model builds on a previous approach that used token-level (local) features with conditional random fields for shallow discourse parsing, an approach which lacks structural knowledge of discourse. The parser adopts a two-stage approach: first the local constraints are applied, and then global constraints are used on a reduced, weighted search space (n-best). In the latter stage we experiment with different rerankers trained on the first-stage n-best parses, which are generated using lexico-syntactic local features. The two-stage parser yields significant improvements over the best-performing discourse parser model on the PDTB corpus.
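To make the token-level view of argument segmentation concrete, here is a minimal sketch (not the thesis implementation): each token receives a BIO label for the argument spans, and a hard structural constraint — an I-* label must continue a B-*/I-* span of the same type — is enforced at decoding time with Viterbi. The label set, emission scores and toy sentence below are hand-picked illustrations standing in for what a trained CRF would produce from lexical, syntactic and semantic features.

```python
LABELS = ["O", "B-Arg1", "I-Arg1", "B-Arg2", "I-Arg2"]

def transition(prev, cur):
    """Score of moving from label `prev` to `cur` (structural constraint only)."""
    if cur.startswith("I-") and prev not in ("B-" + cur[2:], "I-" + cur[2:]):
        return float("-inf")  # forbid an I-* that does not continue a span
    return 0.0

def viterbi(emissions):
    """Decode the best label sequence; emissions[t][label] is the token score."""
    # best[label] = (score of best path ending in label, that path)
    best = {lab: (emissions[0][lab], [lab]) for lab in LABELS}
    for em in emissions[1:]:
        new = {}
        for lab in LABELS:
            prev, (p_score, p_path) = max(
                best.items(), key=lambda kv: kv[1][0] + transition(kv[0], lab)
            )
            new[lab] = (p_score + transition(prev, lab) + em[lab], p_path + [lab])
        best = new
    return max(best.values(), key=lambda v: v[0])[1]

# Toy example: the connective "because" stays outside the span,
# and the rest of the clause is labeled as Arg2.
tokens = ["because", "it", "rained", "hard"]
emissions = [
    {"O": 2.0, "B-Arg1": 0.0, "I-Arg1": 0.0, "B-Arg2": 0.0, "I-Arg2": 0.0},
    # the model slightly prefers I-Arg2 here, but the transition
    # constraint forces the span to open with B-Arg2 instead
    {"O": 0.0, "B-Arg1": 0.0, "I-Arg1": 0.0, "B-Arg2": 1.5, "I-Arg2": 2.0},
    {"O": 0.0, "B-Arg1": 0.0, "I-Arg1": 0.0, "B-Arg2": 0.0, "I-Arg2": 2.0},
    {"O": 0.0, "B-Arg1": 0.0, "I-Arg1": 0.0, "B-Arg2": 0.0, "I-Arg2": 2.0},
]
path = viterbi(emissions)
```

In the cascaded setting described above, a decoder like this would be one stage, with later stages (and, in the two-stage parser, the rerankers) operating on its n-best outputs.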

Journal issue TOCs

  1. IJMIR Volume 6, Issue 2
  2. MMSJ Volume 23, Issue 3
  3. MMSJ Volume 23, Issue 4
  4. TOMM Volume 13, Issue 3s
  5. TOMM Volume 13, Issue 3
  6. TOMM Volume 13, Issue 2
  7. MTAP Volume 76, Issue 7
  8. MTAP Volume 76, Issue 8
  9. MTAP Volume 76, Issue 9
  10. MTAP Volume 76, Issue 10
  11. MTAP Volume 76, Issue 11
  12. MTAP Volume 76, Issue 12
  13. MTAP Volume 76, Issue 13
  14. MTAP Volume 76, Issue 14

IJMIR Volume 6, Issue 2

MMSJ Volume 23, Issue 3

Editor-in-Chief: Thomas Plagemann


Published: June 2017


MMSJ Volume 23, Issue 4

Editor-in-Chief: Thomas Plagemann


Published: July 2017

TOMM Volume 13, Issue 3s

Editor-in-Chief: Alberto Del Bimbo


Published: July 2017


TOMM Volume 13, Issue 3

Editor-in-Chief: Alberto Del Bimbo


Published: July 2017


TOMM Volume 13, Issue 2

Editor-in-Chief: Alberto Del Bimbo


Published: May 2017


MTAP Volume 76, Issue 7

Editor-in-Chief: Borko Furht


Published: April 2017


MTAP Volume 76, Issue 8

Editor-in-Chief: Borko Furht


Published: April 2017


MTAP Volume 76, Issue 9

Editor-in-Chief: Borko Furht


Published: May 2017


MTAP Volume 76, Issue 10

Editor-in-Chief: Borko Furht


Published: May 2017


MTAP Volume 76, Issue 11

Editor-in-Chief: Borko Furht


Published: June 2017


MTAP Volume 76, Issue 12

Editor-in-Chief: Borko Furht


Published: June 2017


MTAP Volume 76, Issue 13

Editor-in-Chief: Borko Furht


Published: July 2017


MTAP Volume 76, Issue 14

Editor-in-Chief: Borko Furht


Published: July 2017


Back Matter

Call for Contributions

Contributions are welcome at any time. Please take a closer look at the information required for a particular type of submission to the newsletter.

Notice to Contributing Authors to SIG Newsletters

By submitting your article for distribution in this Special Interest Group publication, you hereby grant to ACM the following non-exclusive, perpetual, worldwide rights:

  • to publish in print on condition of acceptance by the editor
  • to digitize and post your article in the electronic version of this publication
  • to include the article in the ACM Digital Library and in any Digital Library related services
  • to allow users to copy and distribute the article for noncommercial, educational or research purposes

However, as a contributing author, you retain copyright to your article and ACM will refer requests for republication directly to you.



Pablo Cesar, CWI


  • Mathias Lux, Klagenfurt University
  • Marco Bertini, University of Florence
  • Michael Riegler, Simula Research Laboratory
  • Christian Timmerer, Klagenfurt University
  • Mario Montagud, UPV (Spain)
  • Niall Murray, Athlone Institute of Technology (AIT)
  • Jochen Huber, Synaptics
  • Herman Engelbrecht, Stellenbosch University
  • Antonio Pinheiro, University Beira Interior
  • Bart Thomee, Google/YouTube
  • Miriam Redi, Nokia
  • Xavier Alameda-Pineda, INRIA Grenoble Rhône-Alpes
  • Carsten Griwodz, Simula Research Laboratory