Opinion Column: Privacy and Multimedia

 


For this edition of the SIGMM Opinion Column, we carefully selected the discussion's main topic, looking for an appealing and urgent problem arising for our community. Given the recent Cambridge Analytica scandal and the upcoming enforcement of the General Data Protection Regulation (GDPR) in EU countries, we thought we should have a collective reflection on 'privacy and multimedia'.

The discussion: multimedia data is affected by new forms of privacy threats, let’s learn, protect, and engage our users.

Users often share their data unintentionally. One could indeed observe a diffuse sense of surprise and anger following the data leaks from Cambridge Analytica. As mentioned in a recent blog post by one of the participants, large-scale data leaks have so far mainly affected the private textual and social data of social media users. However, images and videos also contain private user information. There was a general consensus that it is time for our community to start thinking about how to protect private visual and multimedia data.

It was noted that computer vision technologies are now able to infer sensitive information from images (see, for example, recent work on sexual orientation detection from social media profile pictures). However, few technologies exist that defend users against the automatic inference of private information from their visual data. We will need to design techniques that protect users' privacy for images as well, beyond simple face de-identification. We might also want users to engage and have fun with image privacy preserving tools, and this is the aim of the Pixel Privacy project.

But in multimedia, we go beyond image analysis. By nature, as multimedia researchers, we combine different sources of information to design better media retrieval or content serving technologies, or to 'get more than the sum of parts'. While this is what makes our research so special, participants in the discussion noted that multimodal approaches might also generate new forms of privacy threats. Each individual source of data comes with its own privacy dimension, and we should be careful about the multiple privacy breaches we generate by analyzing each modality. At the same time, by combining different media and their privacy dimensions, and performing massive inference on the resulting multimodal knowledge, we might also be generating new forms of threats to user privacy that the individual streams do not pose.

Finally, we should also inform users about these new potential threats: as experts who are doing 'awesome cutting-edge work', we also have a responsibility to make sure people know what the potential consequences are.

A note on the new format, the response rate, and a call for suggestions!

This quarter, we experimented with a new, slimmer format, hoping to reach out to more members of the community, beyond Facebook subscribers.

We extended the outreach beyond Facebook: we used the SIGMM LinkedIn group for our discussion, and we directly contacted senior community members. To engage community members with limited time for long debates, we also lightened the format, asking anyone interested in giving us their opinion on the topic to send us, or share with the group, a one-liner reflecting their view on privacy in multimedia.

Despite the new format, we received a limited number of replies. We will keep trying new formats. Our aim is to generate fruitful discussions and gather opinions on crucial problems in a bottom-up fashion. We hope, edition after edition, to get better at giving voice to more and more members of the multimedia community.

We are happy to hear your thoughts on how to improve, so please reach out to us!

Multidisciplinary Community Spotlight: Assistive Augmentation

 

Emphasizing the importance of neighboring communities for our work in the field of multimedia was one of the primary objectives we set out with when we started this column about a year ago. In past issues, we gave related communities a voice through interviews and personal accounts. For instance, in the third issue of 2017, Cynthia shared personal insights from the International Society of Music Information Retrieval [4]. This issue continues the spotlight series.

I have been involved with the Assistive Augmentation community since its inception; it is a multidisciplinary field that sits at the intersection of accessibility, assistive technologies, and human augmentation. In this issue, I briefly reflect on my personal experiences and research work within the community.

First, let me provide a high-level view of Assistive Augmentation and its general idea, which is that of cross-domain assistive technology. Instead of putting sensorial capability in individual silos, the approach places it on a continuum of usability for a specific technology. As an example, a reading aid for people with visual impairments enables access to printed text. At the same time, the reading aid can also be used by those with an unimpaired visual sense for other applications like language learning. In essence, the field is concerned with the design, development, and study of technology that substitutes, recovers, empowers or augments physical, sensorial or cognitive capabilities, depending on specific user needs (see Figure 1).


Figure 1.  Assistive Augmentation Continuum

Now let us take a step back. I joined the MIT Media Lab as a postdoctoral fellow in 2013, pursuing research on multi-sensory cueing for mobile interaction. With my background in user research and human-computer interaction, I was immediately attracted by an ongoing project at the lab, led by Roy Shilkrot, Suranga Nanayakkara and Pattie Maes, that involved studying how the MIT Visually Impaired and Blind User Group (VIBUG) uses assistive technology. People in that group are particularly tech-savvy. I came to know products like the OrCam MyEye, which is priced at about 2,500-4,500 USD and aims at recognizing text, objects and so forth. Back in 2013 it had a large footprint and made its users really stand out. Our general observations were, to briefly summarize, that many tools we got to know during regular VIBUG meetings were highly specialized for this very target group. The latter is, of course, a good thing, since it focuses directly on the actual end user. However, we also concluded that it locks the products in silos of usability defined by their end users' sensorial capabilities.

These anecdotal observations bring me back to the general idea of Assistive Augmentation. To explore this idea further, we proposed to hold a workshop at a conference, jointly with colleagues in neighboring communities. With ACM CHI attracting folks from different fields of research, we felt it would be a good fit to test the waters and see whether we could get enough interest from different communities. Our proposal was successful: the workshop was held in 2014 and set the stage for thinking about, discussing and sketching out facets of Assistive Augmentation. As intended, our workshop attracted a very diverse crowd from different fields. Being able to discuss the opportunities and the potential of Assistive Augmentation with such a group was immensely helpful and contributed significantly to our ongoing efforts to define the field. This is a practice I would encourage everyone at a similar stage to follow.

As a tangible outcome of this very workshop, our community decided to pursue a jointly edited volume, which Springer published earlier this year [3]. The book illustrates two main areas of Assistive Augmentation by example: (i) sensory enhancement and substitution and (ii) design for Assistive Augmentation. Peers contributed comprehensive reports on case studies which serve as lighthouse projects exemplifying Assistive Augmentation research practice. In addition, the book features field-defining articles that introduce each of the two main areas.

Many relevant areas have yet to be touched upon, for instance ethical issues, the quality of augmentations and their appropriations. Augmenting human perception, another important research thrust, has recently been discussed in both the SIGCHI and SIGMM communities. Last year, a workshop on "Amplification and Augmentation of Human Perception" was held by Albrecht Schmidt, Stefan Schneegass, Kai Kunze, Jun Rekimoto and Woontack Woo at ACM CHI [5]. Also, one of last year's keynotes at ACM Multimedia, by Achin Bhowmik, focused on "Enhancing and Augmenting Human Perception with Artificial Intelligence" [1]. These ongoing discussions in academic communities underline the importance of investigating, shaping and defining the intersection of assistive technologies and human augmentation. Academic research is one avenue that must be pursued, with work being disseminated at dedicated conference series such as Augmented Human [6]. Other avenues that highlight and demonstrate the potential of Assistive Augmentation technology include, for instance, sports, as discussed within the Superhuman Sports Society [7]. Most recently, the Cybathlon was held for the very first time in 2016, where athletes with "disabilities or physical weakness use advanced assistive devices […] to compete against each other" [8].

Looking back at how the community came about, I conclude that organizing a workshop at a large academic venue like CHI was an excellent first step for establishing the community. In fact, the workshop created fantastic momentum within the community. However, focusing entirely on a jointly edited volume as the main tangible outcome of the workshop had several drawbacks. In retrospect, the publication timeline was far too long, rendering it impossible to capture the dynamics of an emerging field. But indeed, this cannot be the objective of a book publication; it should have been the objective of follow-up workshops in neighboring communities (e.g., at ACM Multimedia) or special issues in a journal with a much shorter turn-around. With our book project now concluded, we aim to pick up on this past momentum with a forthcoming special issue on Assistive Augmentation in MDPI's Multimodal Technologies and Interaction journal. I am eagerly looking forward to what is next and to our communities' joint work across disciplines towards pushing our physical, sensorial and cognitive abilities.

References

[1]       Achin Bhowmik. 2017. Enhancing and Augmenting Human Perception with Artificial Intelligence Technologies. In Proceedings of the 2017 ACM on Multimedia Conference (MM ’17), 136–136.

[2]       Ellen Yi-Luen Do. 2018. Design for Assistive Augmentation—Mind, Might and Magic. In Assistive Augmentation. Springer, 99–116.

[3]       Jochen Huber, Roy Shilkrot, Pattie Maes, and Suranga Nanayakkara (Eds.). 2018. Assistive Augmentation. Springer Singapore.

[4]       Cynthia Liem. 2018. Multidisciplinary column: inclusion at conferences, my ISMIR experiences. ACM SIGMultimedia Records 9, 3 (2018), 6.

[5]       Albrecht Schmidt, Stefan Schneegass, Kai Kunze, Jun Rekimoto, and Woontack Woo. 2017. Workshop on Amplification and Augmentation of Human Perception. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, 668–673.

[6]       Augmented Human Conference Series. Retrieved June 1, 2018 from http://www.augmented-human.com/

[7]       Superhuman Sports Society. Retrieved June 1, 2018 from http://superhuman-sports.org/

[8]       Cybathlon. Cybathlon – moving people and technology. Retrieved June 1, 2018 from http://www.cybathlon.ethz.ch/

 


About the Column

The Multidisciplinary Column is edited by Cynthia C. S. Liem and Jochen Huber. Every other edition, we will feature an interview with a researcher performing multidisciplinary work, or a column of our own hand. For this edition, we feature a column by Jochen Huber.

Editor Biographies

Dr. Cynthia C. S. Liem is an Assistant Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. She initiated and co-coordinated the European research project PHENICX (2013-2016), focusing on technological enrichment of symphonic concert recordings with partners such as the Royal Concertgebouw Orchestra. Her research interests consider music and multimedia search and recommendation, and increasingly shift towards making people discover new interests and content which would not trivially be retrieved. Beyond her academic activities, Cynthia gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, and a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach.

 

Dr. Jochen Huber is a Senior User Experience Researcher at Synaptics. Previously, he was an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and the ACM International Conference on Interactive Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage: http://jochenhuber.com

 

Sharing and Reproducibility in ACM SIGMM

 

This column discusses the efforts of ACM SIGMM towards sharing and reproducibility. Apart from the specific sessions dedicated to open source and datasets, ACM Multimedia Systems started last year to provide official ACM badges for articles that make artifacts available. This year has marked a record, with 45% of the articles acquiring such a badge.


Without data it is impossible to put theories to the test. Moreover, without running code it is tedious at best to (re)produce and evaluate any results. Yet collecting data and writing code can be a road full of pitfalls, ranging from datasets containing copyrighted materials to algorithms containing bugs. The ideal datasets and software packages are those that are open and transparent for the world to look at, inspect, and use with no or only limited restrictions. Such "artifacts" make it possible to establish public consensus on their correctness, or otherwise to start a dialogue on how to fix any identified problems.

In our interconnected world, storing and sharing information has never been easier. Despite the temptation for researchers to keep datasets and software to themselves, a growing number are willing to share their resources with others. To further promote this sharing behavior, conferences, workshops, publishers, non-profit and even for-profit companies are increasingly recognizing and supporting these efforts. For example, the ACM Multimedia conference has hosted an open source software competition since 2004, and the ACM Multimedia Systems conference has included an open datasets and software track since 2011. The ACM Digital Library now also hands out badges to papers whose artifacts have been made available and, optionally, reviewed and verified by members of the community. At the same time, organizations such as Zenodo and Amazon host open datasets for free. Sharing ultimately pays off: the citation statistics for ACM Multimedia Systems conferences over the past five years, for example, show that half of the 20 most cited papers shared data and code, although such papers have represented only a small fraction of the published papers so far.


Good practices are increasingly being adopted. In this year's edition of the ACM Multimedia Systems conference, 69 works (papers, demos, datasets, software) were accepted, out of which 31 (45%) were awarded an ACM badge. This is a large increase compared to last year, when only 13 out of 42 works (31%) received one. This greatly advances one of the core objectives of both the conference and SIGMM towards open science. At this moment, the ACM Digital Library does not separately index which papers received a badge, making it challenging to find all papers that have one. It further appears that not many other ACM conferences are aware of the badges yet; for example, while ACM Multimedia accepted 16 open source papers in 2016 and 6 papers in 2017, none applied for a badge. This year at ACM Multimedia Systems only "artifacts available" badges have been awarded. For next year our intention is to ensure all dataset and software submissions can receive the "artifacts evaluated" badge. This would require several committed community members to spend time working with the authors to get the artifacts running on all major platforms with corresponding detailed documentation.

The accepted artifacts this year are diverse in nature: several submissions focus on releasing artifacts related to quality of experience of (mobile/wireless) streaming video, while others center on making datasets and tools related to images, videos, speech, sensors, and events available; in addition, there are a number of contributions in the medical domain. It is great to see such a range of interests in our community!

SIGMM Annual Report (2018)

 

Dear Readers,

Each year SIGMM, like all ACM SIGs, produces an annual report summarising our activities, which includes our sponsored and in-cooperation conferences and also the initiatives we are undertaking to support our community and broaden participation. The report also includes our significant papers, the awards we have given, and the major issues that face us going forward. Below is the main text of the SIGMM report 2017-2018, which is augmented by further details on our conferences provided by the ACM Office. We hope you enjoy reading this and learning about what SIGMM does.


SIGMM Annual Report (2018)
Prepared by SIGMM Chair (Alan Smeaton),
Vice Chair (Nicu Sebe), and Conference Director (Gerald Friedland)
August 6th, 2018

Mission: SIGMM provides an international interdisciplinary forum for researchers, engineers, and practitioners in all aspects of multimedia computing, communication, storage and application.

1. Awards:
SIGMM gives out three awards each year and these were as follows:

  • SIGMM Technical Achievement Award for lasting contributions to multimedia computing, communications and applications was presented to Arnold W.M. Smeulders, University of Amsterdam, the Netherlands. The award was given in recognition of his outstanding and pioneering contributions to defining and bridging the semantic gap in content-based image retrieval.
  • SIGMM 2016 Rising Star Award was given to Dr Liangliang Cao of HelloVera.AI for his significant contributions in large-scale multimedia recognition and social media mining.
  • SIGMM Outstanding PhD Thesis in Multimedia Computing Award was given to Chien-Nan (Shannon) Chen for a thesis entitled Semantics-Aware Content Delivery Framework For 3D Tele-Immersion at the University of Illinois at Urbana-Champaign, US.

2. Significant Papers:

The SIGMM flagship conference, ACM Multimedia 2017, was held in Mountain View, California, and presented the following awards, plus other awards for Best Grand Challenge Video Captioning Paper, Best Grand Challenge Social Media Prediction Paper, and Best Brave New Idea Paper:

  • Best paper award to “Adversarial Cross-Modal Retrieval”, by Bokun Wang, Yang Yang, Xing Xu, Alan Hanjalic, Heng Tao Shen
  • Best student paper award to “H-TIME: Haptic-enabled Tele-Immersive Musculoskeletal Examination”, by Yuan Tian, Suraj Raghuraman, Thiru Annaswamy, Aleksander Borresen, Klara Nahrstedt, Balakrishnan Prabhakaran
  • Best demo award to “NexGenTV: Providing Real-Time Insight during Political Debates in a Second Screen Application” by Olfa Ben Ahmed, Gabriel Sargent, Florian Garnier, Benoit Huet, Vincent Claveau, Laurence Couturier, Raphaël Troncy, Guillaume Gravier, Philémon Bouzy  and Fabrice Leménorel.
  • Best Open source software award to “TensorLayer: A Versatile Library for Efficient Deep Learning Development” by Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, Simiao Yu, Yike Guo.

The 9th ACM International Conference on Multimedia Systems (MMSys 2018) was held in Amsterdam, the Netherlands, and presented a range of awards including:

  • Best paper award to “Dynamic Adaptive Streaming for Multi-Viewpoint Omnidirectional Videos” by Xavier Corbillon, Francesca De Simone, Gwendal Simon and Pascal Frossard.
  • Best student-paper award to “Want to Play DASH? A Game Theoretic Approach for Adaptive Streaming over HTTP” by Abdelhak Bentaleb, Ali C. Begen, Saad Harous and Roger Zimmermann.

The ACM International Conference on Multimedia Retrieval (ICMR) 2018 was held in Yokohama, Japan, and presented a range of awards including:

  • Best paper award to “Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval” by Niluthpol Mithun, Juncheng Li, Florian Metze and Amit Roy-Chowdhury.

The best paper and best student paper from each of these three conferences were then reviewed by a specially convened committee to select one paper, which has been nominated for the Communications of the ACM Research Highlights and is presently under consideration.

In addition to the above, SIGMM presented the 2017 ACM Transactions on Multimedia Computing, Communications and Applications (TOMM) Nicolas D. Georganas Best Paper Award to the paper “Automatic Generation of Visual-Textual Presentation Layout” (TOMM vol. 12, Issue 2) by Xuyong Yang, Tao Mei, Ying-Qing Xu, Yong Rui, and Shipeng Li.

3. Significant Programs that Provide a Springboard for Further Technical Efforts

  • SIGMM provided support for student travel through grants, at all of our SIGMM-sponsored conferences.
  • Apart from the specific sessions dedicated to open source and datasets, the ACM Multimedia Systems Conference (MMSys) has started to provide official ACM badging for articles that make artifacts available. This year, our second year for doing this, has marked a record with 45% of the articles published at the conference acquiring such a reproducibility badge.

4. Innovative Programs Providing Service to Some Part of Our Technical Community

  • A large part of our research area in SIGMM is driven by the availability of large datasets, usually used for training purposes.  Recent years have shown a large growth in the emergence of openly available datasets coupled with grand challenge events at our conferences and workshops. Mostly these are driven by our corporate researchers but this allows all of our researchers the opportunity to carry out their research at scale.  This provides great opportunities for our community.
  • Following the lead of SIGARCH we have commissioned a study of gender distribution among the SIGMM conferences, conference organization and awards. This report will be completed and presented at our flagship conference in October.  We have also commissioned a study of the conferences and journals which mostly influence, and are influenced by, our own SIGMM conferences as an opportunity for some self-reflection on our origins, and our future.  Both these follow an open call for new initiatives to be supported by SIGMM. 
  • SIGMM Conference Director Gerald Friedland worked with several volunteers from SIGMM to improve the content and organization of ACM Multimedia and connected conferences. Volunteer David Ayman Shamma used data science methods to analyze several ACM MM conferences from the past five years with the goal of identifying biases and patterns of irregularities. Some results were presented at the ACM MM TPC meeting. Volunteers Hayley Hung and Martha Larson gave an account of their expectations and experiences with ACM Multimedia, and Dr. Friedland himself volunteered as a reviewer for conferences of similar size and importance, including NIPS and CSCW, and approached the chairs to get external feedback on what can be improved in the review process. Furthermore, in September, Dr. Friedland will travel to Berlin to visit Lutz Prechelt, who invented a review quality management system. The results of this work will be included in a conference handbook that will set down standard recommendations of best practices for future organizers of SIGMM conferences. We expect the handbook to be finished by the end of 2018.
  • Last year SIGMM made a decision to try to co-locate conferences and other events as much as possible, and the ACM Multimedia conference was co-located with the European Conference on Computer Vision (ECCV) in 2016 with joint workshops and tutorials. This year the ACM Multimedia Systems (MMSys) conference was co-located with the 10th International Workshop on Immersive Mixed and Virtual Environment Systems (MMVE 2018), the 16th Annual Workshop on Network and Systems Support for Games (NetGames 2018), the 28th ACM SIGMM Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2018) and the 23rd Packet Video Workshop (PV 2018). In addition, the Technical Program Committee meeting for the Multimedia Conference was co-located with the ICMR conference.

5. Events or Programs that Broaden Participation

  • SIGMM has approved the launch of a new conference series called Multimedia Asia which will commence in 2019. This will be run by the SIGMM China Chapter and consolidates two existing multimedia-focused conferences in Asia under the sponsorship and governance of SIGMM. This follows a very detailed review and the successful location for the inaugural conference in 2019 will be announced at our flagship conference in October 2018.
  • The Women / Diversity in Multimedia Lunch at ACM MULTIMEDIA 2017 (previously the Women’s Lunch) continued this year with an enlarged program of featured speakers and discussion which led to the call for the gender study in Multimedia mentioned earlier.
  • SIGMM continues to pursue an active approach to nurturing the careers of our early stage researchers. The “Emerging Leaders” event (formerly known as Rising Stars) skipped a year in 2017 but will be happening again in 2018 at the Multimedia Conference.  Giving these early career researchers the opportunity to showcase their vision helps to raise their visibility and helps SIGMM to enlarge the pool of future volunteers.
  • The expansion we put in place in our social media communication team has proven to be a shrewd move, with a large growth in our website traffic and a raised profile on social media. We also invite conference attendees to post on Twitter and/or Facebook about the papers, demos and talks that they think are most thought-provoking and forward-looking, and the most active of these are rewarded with a free registration at a future SIGMM-sponsored conference.

6. Issues for SIGMM in the next 2-3 years

  • Like other SIGs, we realize that improving the diversity of the community we serve is essential to continuing our growth and maintaining our importance and relevance. This includes diversity in gender, in geographical location, and in many other facets.  We have started to address these through some of the initiatives mentioned earlier, and at our flagship conference in 2017 we ran a Workshop emphasizing contributions focusing on research from South Africa and the African continent in general.
  • Leadership and supporting young researchers in the early stages of their careers is also important and we highlight this through 2 of our regular awards (Rising Stars and Best Thesis). The “Emerging Leaders” event (formerly known as Rising Stars) skipped a year in 2017 but will be happening again in 2018 at the Multimedia Conference.
  • We wish to reach out to other SIGs with whom we could have productive engagement, because we see multimedia as a technology enabler as well as an application unto itself. To this end we will continue to try to hold joint panels or workshops at our conferences.
  • Our research area is marked by the growth and availability of open datasets and grand challenge competitions held at our conferences and workshops. These datasets are often provided from the corporate sector and this is both an opportunity for us to do research on datasets otherwise unavailable to us, as well as being a threat to the balance between corporate influence and independence.
  • In a previous annual report we highlighted the difficulties caused by a significant portion of our conference proceedings not being indexed by Thomson Web of Science. In a similar vein, we find our conference proceedings are not used as input into CSRankings, a metrics-based ranking of Computer Science institutions worldwide. Publishing at venues which are considered in CSRankings' operation is important to much of our community, and while we are in the process of trying to redress this, the support of ACM in making this case would be welcome.

Towards Data Driven Conferences Within SIGMM

There is no doubt that research in our field has become more data driven. And while the term data science might suggest there is some science without data, it is data that feeds our systems, trains our networks, and tests our results. Recently we have seen examples of a few conferences experimenting with their review process to gain new insights. For those of us in the Multimedia (MM) community, this is not an entirely new thing. In 2013, I led an effort along with my TPC Co-Chairs to look at the past several years of conferences and examine the reviewing system, the process, the scores, and the reviewer load. Additionally, we ran surveys of the authors (accepted and rejected), the reviewers and the ACs to gauge how we did. This was presented at the business meeting in Barcelona along with the suggestion that this practice continue. While this was met with great praise, it was never repeated.

Fast forward to 2017: I found myself asking the same questions about the MM review process, which went through several changes (such as the late addition of the "Thematic Workshops", as well as an explicit COI track for papers from the Chairs, which we stated in 2013 could have adverse effects). And, just like before, I requested data from the Director of Conferences and the SIGMM Chair so I could run an analysis. There are a few things to note about the 2017 data.

  • Some reviews were contained in attachments which were unavailable.
  • Rebuttals were not present (some chairs allowed them, some did not).
  • The conference was divided into a “Regular” set of tracks and a “COI” track for anyone who was on the PC and submitted a paper.
  • The Call for Papers was a mixture of “Papers and Posters”. 

The conference reports:

Finally, the Program Committee accepted 189 out of 684 submissions, yielding an acceptance rate of 27.63 percent. Among the 189 accepted full papers, 49 are selected to give oral presentations on the conference, while the rest are arranged to present to conference attendees in a poster format. 

Track               Reviewed   Accepted   Rate
Main Paper          684        189        27.63%
Thematic Workshop   495        64         12.93%

In 2017, in a departure from previous years, the chairs decided to invite roughly 9% of the accepted papers for an oral presentation, with the remaining accepts delegated to a larger poster session. During the review process, to be inclusive, a decision was made to invite some of the rejected papers to a non-archival Thematic Workshop where their work could be presented as posters, such that the articles could be published elsewhere at a future date. The published rate for these Thematic Workshops was 64/495, or roughly 13% of the rejected papers. To dive in further, first, we compute the accepted orals and posters against the total submitted. Second, amongst the rejected papers, we compute the percentage of rejects that were invited to a Thematic Workshop (a sketch of this computation follows the table below). However, in the dataset there were 113 papers invited to Thematic Workshops; 49 of these did not make it into the program because the authors declined the automatic enrollment invitation.

Decision   Normal   COI   Normal Rate   COI Rate
Oral       41       8     7.03%         7.92%
Poster     123      17    21.1%         16.83%
Workshop   79       34    18.85%        44.74%
Reject     339      42    58.15%        41.58%

Comparing the Regular and COI tracks, we find the scores to be correlated (p<0.003) if the workshops are treated as rejects. Including the workshops in the calculation shows no correlation (p<0.093). To examine this further, we plotted the decision percentages by area and track.

Decision Percent Rates by Track and Area

While one must remember that the numbers by volume are smaller in the COI track, some inflation can be seen here. Again, by percentage, you can see that Novel Topics – Privacy and Experience – Novel Interactions have a higher oral accept rate, while Understanding Vision & Deep Learning and Experience Perceptual pulled in higher Thematic Workshop rates.

No real change was seen in the score distribution across the tracks and areas (as seen here in the following jitter plots).


For the review lengths, the average size by character was 1452 with an IQR of 1231. Some reviews skewed longer in the Regular track, but they are outliers for the most part. The load averaged around 4 papers per reviewer, with some normal exceptions. The people depicted with more than 10 papers were TPC members or ACs.


Overall, there is some difference but still a correlation between the COI and Regular tracks, and the average number of papers per reviewer was kept to a manageable number. The score distributions seem roughly similar, with the exception of the IQR, but this is likely more a product of the COI track being smaller. For the Thematic Workshops there's an inflation in the accept rate for the COI track: 18.85% for the regular submissions but 44.74% for the COI. This was dampened by authors declining the Thematic Workshop invitation. Of the 79 Regular Workshop invitations and 34 COI invitations, only 50 regular and 14 COI were in the final program. So the final accept rates for what was actually at the conference became 11.93% for Regular Thematic Workshop submissions and 18.42% for COI.

So where do we go from here?

Removal of the COI track. A COI track comes and goes in ACM MM, and it seems its removal is at the top of the list. Modern conference management software (EasyChair, PCS, CMT, etc.) already handles conflicts extremely well.

TPC and ACs must monitor reviews. Next, while quantity is not related to quality, a short review length might be an early indicator of poor quality. TPC Chairs and ACs should monitor these reviews, because a review of 959 characters is not particularly informative whether it is positive or negative (in fact, this paragraph is almost as long as the average review). While some might believe trapping that error is the job of the authors and the Author Advocate (and hence the authors who need to invoke the Advocate), it is the job of the ACs and the TPC to ensure review quality and make sure the Advocate never needs to be invoked (as we presented the role back when we invented it in 2014).

CMS Systems Need To Support Us. There is no shortage of Conference Management Systems (CMS); none of them are data-driven. Why do I have to export a set of CSVs from a conference system and then write R scripts to see that there are short reviews? Yelp and TripAdvisor give me guidance on how long my review should be; how is it that a review for ACM MM can be two short sentences?
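As a sketch of the kind of built-in support I mean, assume a hypothetical reviews.csv export with paper_id, reviewer and review_text columns (the file and column names are made up for illustration); flagging suspiciously short reviews and computing the statistics reported above takes only a few lines:

```python
import pandas as pd

# Hypothetical export from the conference management system.
reviews = pd.read_csv("reviews.csv")  # columns: paper_id, reviewer, review_text

reviews["length"] = reviews["review_text"].str.len()

# Flag reviews shorter than, say, 500 characters for TPC/AC follow-up.
short = reviews[reviews["length"] < 500]
print(short[["paper_id", "reviewer", "length"]].sort_values("length"))

# The two summary statistics reported above: review length and reviewer load.
print("Mean review length (chars):", reviews["length"].mean())
print("Average papers per reviewer:",
      reviews.groupby("reviewer")["paper_id"].nunique().mean())
```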

Provide upfront submission information. The Thematic Workshops were introduced late in the submission process and came as a surprise to many authors. While some innovation in the technical program is a good idea, the decline rate showed it was undesirable. Some a priori communication with the community might give insights into what experiments we should try and what to avoid, which brings me to the final point.

We need a New Role. And while the SIGMM EC has committed to looking back at past conferences, we should continue this practice routinely. Conferences, or ideally the SIGMM EC, should create a "Data Health and Metrics" role (or assign this to the Director of Conferences) to oversee the TPC as well as issue post-review and post-conference surveys, so that we learn how we did at each step and ensure we can move forward and grow our community. However, if done right, this will be considerable work and should likely be its own role.

To get started, the SIGMM Executive Committee is working on obtaining past MM conference datasets to further track the history of the conference in a data-forward method. Hopefully you'll hear more at the ACM MM Business Meeting and Town Hall in Seoul; SIGMM is looking to hear more from the community.

Socially significant music events

Social media sharing platforms (e.g., YouTube, Flickr, Instagram, and SoundCloud) have revolutionized how users access multimedia content online. Most of these platforms provide a variety of ways for the user to interact with the different types of media: images, video, music. In addition to watching or listening to the media content, users can also engage with it in different ways, e.g., like, share, tag, or comment. Social media sharing platforms have become an important resource for scientific researchers, who aim to develop new indexing and retrieval algorithms that can improve users' access to multimedia content and, as a result, enhance the experience these platforms provide.

Historically, the multimedia research community has focused on developing multimedia analysis algorithms that combine visual and text modalities. Less visible is research devoted to algorithms that exploit the audio signal as the main modality. Recently, awareness of the importance of audio has experienced a resurgence. Particularly notable is Google's release of AudioSet, "A large-scale dataset of manually annotated audio events" [7]. In a similar spirit, we have developed the "Socially Significant Music Event" dataset, which supports research on music events [3]. The dataset contains Electronic Dance Music (EDM) tracks with a Creative Commons license that have been collected from SoundCloud. Using this dataset, one can build machine learning algorithms to detect specific events in a given music track.

What are socially significant music events? Within a music track, listeners are able to identify certain acoustic patterns as nameable music events. We call a music event "socially significant" if it is popular in social media circles, implying that it is readily identifiable and an important part of how listeners experience a certain music track or music genre. For example, listeners might talk about these events in their comments, suggesting that these events are important to them (Figure 1).

Traditional music event detection has only tackled low-level events like music onsets [4] or music auto-tagging [8, 10]. In our dataset, we consider events at a higher abstraction level than low-level musical onsets. In auto-tagging, descriptive tags are associated with 10-second music segments. These tags generally fall into three categories: musical instruments (guitar, drums, etc.), musical genres (pop, electronic, etc.) and mood based tags (serene, intense, etc.). These types of tags are different from what we detect as part of this dataset: the events in our dataset have a particular temporal structure, unlike the categories that are the target of auto-tagging. Additionally, we analyze the entire music track and detect the start points of music events, rather than short segments as in auto-tagging.

There are three music events in our Socially Significant Music Event dataset: Drop, Build, and Break. These events can be considered to form the basic set of events used by EDM producers [1, 2]. They have a certain temporal structure internal to themselves, which can be of varying complexity. Their social significance is visible from the presence of a large number of timed comments related to these events on SoundCloud (Figures 1 and 2). The three events are popular in social media circles, with listeners often mentioning them in comments. Here, we define these events [2]:

  1. Drop: A point in the EDM track, where the full bassline is re-introduced and generally follows a recognizable build section
  2. Build: A section in the EDM track, where the intensity continuously increases and generally climaxes towards a drop
  3. Break: A section in an EDM track with a significantly thinner texture, usually marked by the removal of the bass drum


Figure 1. Screenshot from SoundCloud showing a list of timed comments left by listeners on a music track [11].


SoundCloud

SoundCloud is an online music sharing platform that allows users to record, upload, promote and share their self-created music. SoundCloud started out as a platform for amateur musicians, but currently many leading music labels are also represented. One of the interesting features of SoundCloud is that it allows “timed comments” on the music tracks. “Timed comments” are comments, left by listeners, associated with a particular time point in the music track. Our “Socially Significant Music Events” dataset is inspired by the potential usefulness of these timed comments as ground truth for training music event detectors. Figure 2 contains an example of a timed comment: “That intense buildup tho” (timestamp 00:46). We could potentially use this as a training label to detect a build, for example. In a similar way, listeners also mention the other events in their timed comments. So, these timed comments can serve as training labels to build machine learning algorithms to detect events.


Figure 2. Screenshot from SoundCloud indicating the useful information present in the timed comments. [11]

SoundCloud also provides a well-documented API [6] with interfaces to many programming languages: Python, Ruby, JavaScript, etc. Through this API, one can download the music tracks (if allowed by the uploader), the timed comments and other metadata related to the track. We used this API to collect our dataset. Via the search functionality we searched for tracks uploaded during the year 2014 with a Creative Commons license, which resulted in a list of tracks with unique identification numbers. We then looked through the timed comments of these tracks for the keywords drop, break and build, kept the tracks whose timed comments contained a reference to these keywords, and discarded the other tracks.
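As a rough illustration of this collection procedure, the sketch below uses the plain HTTP API via the Python requests library. The endpoints and parameters shown (/tracks, /tracks/{id}/comments, license, created_at filters) reflect our reading of the API documentation at the time [6] and may have changed since, and the client ID is a placeholder, so treat this as an outline rather than a drop-in script:

```python
import requests

API = "https://api.soundcloud.com"
CLIENT_ID = "YOUR_CLIENT_ID"        # placeholder: obtained by registering an app
KEYWORDS = ("drop", "build", "break")

# 1. Search for Creative Commons licensed tracks uploaded during 2014
#    (pagination is omitted for brevity).
tracks = requests.get(f"{API}/tracks", params={
    "client_id": CLIENT_ID,
    "license": "cc-by",
    "created_at[from]": "2014-01-01 00:00:00",
    "created_at[to]": "2014-12-31 23:59:59",
    "limit": 200,
}).json()

# 2. Keep only tracks whose timed comments mention one of the keywords.
selected = []
for track in tracks:
    comments = requests.get(f"{API}/tracks/{track['id']}/comments",
                            params={"client_id": CLIENT_ID}).json()
    # A timed comment carries a 'timestamp': its position within the track (ms).
    timed = [c for c in comments if c.get("timestamp") is not None]
    if any(kw in c["body"].lower() for c in timed for kw in KEYWORDS):
        selected.append((track["id"], track.get("title"), timed))

print(f"Kept {len(selected)} of {len(tracks)} tracks")
```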

Dataset

The dataset contains 402 music tracks with an average duration of 4.9 minutes. Each track is accompanied by timed comments relating to Drop, Build, and Break, as well as by ground truth labels that mark the true locations of the three events within the track. The labels were created by a team of experts. Unlike many other publicly available music datasets that provide only metadata or short previews of music tracks [9], we provide the entire track for research purposes. The download instructions for the dataset can be found in [3]. All the music tracks in the dataset are distributed under the Creative Commons license. Some statistics of the dataset are provided in Table 1.

Table 1. Statistics of the dataset: Number of events, Number of timed comments

Event Name   Total number of events   Number of events per track   Total number of timed comments   Number of timed comments per track
Drop         435                      1.08                         604                              1.50
Build        596                      1.48                         609                              1.51
Break        372                      0.92                         619                              1.54

The main purpose of the dataset is to support training of detectors for the three events of interest (Drop, Build, and Break) in a given music track. These three events can be considered a case study to prove that it is possible to detect socially significant musical events, opening the way for future work on an extended inventory of events. Additionally, the dataset can be used to understand the properties of timed comments related to music events. Specifically, timed comments can be used to reduce the need for manually acquired ground truth, which is expensive and difficult to obtain.

Timed comments present an interesting research challenge: temporal noise. The timed comments and the actual events do not always coincide. A comment can appear at the same position as, before, or after the actual event. For example, in the music track below (Figure 3), there is a timed comment about a drop at 00:40, while the actual drop occurs only at 01:00. Because of this noisy nature, we cannot use the timed comments alone as ground truth; we need strategies to handle temporal noise in order to use timed comments for training [1].


Figure 3. Screenshot from SoundCloud indicating the noisy nature of timed comments [11].
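To make the notion of temporal noise concrete, here is a minimal sketch that measures, for one track, the signed offset between each timed comment and the nearest annotated event (times are assumed to be in seconds; the numbers simply mirror the example from Figure 3):

```python
def comment_offsets(comment_times, event_times):
    """Signed offset (in seconds) from each timed comment to the nearest
    annotated event; negative means the comment precedes the event."""
    return [min((c - e for e in event_times), key=abs) for c in comment_times]

# Example mirroring Figure 3: a 'drop' comment at 00:40, actual drop at 01:00.
drop_comments = [40.0]   # seconds
drop_events = [60.0]     # seconds

print(comment_offsets(drop_comments, drop_events))  # [-20.0]: the comment is 20 s early
```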

In addition to music event detection, our "Socially Significant Music Event" dataset opens up other possibilities for research. Timed comments have the potential to improve users' access to music and to support them in discovering new music. Specifically, timed comments mention aspects of music that are difficult to derive from the signal, and may be useful to calculate the song-to-song similarity needed to improve music recommendation. The fact that the comments are related to a certain time point is important because it allows us to derive continuous information over time from a music track. Timed comments are potentially very helpful for supporting listeners in finding specific points of interest within a track, or in deciding whether they want to listen to a track, since they allow users to jump in and listen to specific moments without listening to the track end-to-end.

State of the art

The detection of music events requires training classifiers that are able to generalize over the variability in the audio signal patterns corresponding to events. In Figure 4, we see that the build-drop combination has a characteristic pattern in the spectral representation of the music signal: the build is a sweep-like structure and is followed by the drop, which we indicate with a red vertical line. More details about the state-of-the-art features useful for music event detection and the strategies to filter the noisy timed comments can be found in our publication [1].


Figure 4. The spectral representation of the musical segment containing a drop. You can observe the sweeping structure indicating the buildup. The red vertical line is the drop.
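A view similar to Figure 4 can be produced with standard audio tooling. Below is a minimal sketch using librosa and matplotlib (both assumed to be installed; "track.mp3" is a placeholder for a local EDM file), plotting a log-frequency spectrogram in which a build shows up as a sweep-like structure:

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

y, sr = librosa.load("track.mp3")                                  # signal and sample rate
S_db = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)

plt.figure(figsize=(12, 4))
librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="log")
plt.colorbar(format="%+2.0f dB")
plt.title("Log-frequency spectrogram: the build appears as a sweep, followed by the drop")
plt.tight_layout()
plt.show()
```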

The evaluation metric used to measure the performance of a music event detector should be chosen according to the user scenario for that detector. For example, if the music event detector is used for non-linear access (i.e., creating jump-in points along the play bar), it is important that the detected time point of the event falls before, rather than after, the actual event. In this case, we recommend using the "event anticipation distance" (ea_dist) as a metric. The ea_dist is the amount of time by which the predicted event time point precedes the actual event time point, and represents the time the user would have to wait to listen to the actual event. More details about ea_dist can be found in our paper [1].
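Based on the description above, the metric can be sketched as follows. This is a simplified illustration of the idea rather than the exact evaluation code from [1]:

```python
def event_anticipation_distance(predicted, actual):
    """Time (in seconds) by which the predicted event time point precedes the
    actual event, i.e., how long a user jumping in at the predicted point
    would wait for the real event. Only defined when predicted <= actual."""
    if predicted > actual:
        raise ValueError("prediction falls after the actual event")
    return actual - predicted

# A prediction 18 s before an actual drop at 01:00 gives ea_dist = 18 s.
print(event_anticipation_distance(predicted=42.0, actual=60.0))  # 18.0
```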

In [1], we report the implementation of a baseline music event detector that uses only timed comments as training labels. This detector attains an ea_dist of 18 seconds for a drop. We point out that, from the user's point of view, this level of performance could already lead to quite useful jump-in points. Note that the typical length of a build-drop combination is between 15 and 20 seconds: if the user is positioned 18 seconds before the drop, the build will already have started and the user knows that a drop is coming. Using an optimized combination of timed comments and manually acquired ground truth labels, we are able to achieve an ea_dist of 6 seconds.

Conclusion

Timed comments, on their own, can be used as training labels to train detectors for socially significant events. A detector trained on timed comments performs reasonably well in applications like non-linear access, where the listener wants to jump through different events in the music track without listening to it in its entirety. We hope that the dataset will encourage researchers to explore the usefulness of timed comments for all media. Additionally, we would like to point out that our work has demonstrated that the impact of temporal noise can be overcome and that the contribution of timed comments to video event detection is worth investigating further.

Contact

Should you have any inquiries or questions about the dataset, do not hesitate to contact us via email at: n.k.yadati@tudelft.nl

References

[1] K. Yadati, M. Larson, C. Liem and A. Hanjalic, “Detecting Socially Significant Music Events using Temporally Noisy Labels,” in IEEE Transactions on Multimedia. 2018. http://ieeexplore.ieee.org/document/8279544/

[2] M. Butler, Unlocking the Groove: Rhythm, Meter, and Musical Design in Electronic Dance Music, ser. Profiles in Popular Music. Indiana University Press, 2006 

[3] http://osf.io/eydxk

[4] http://www.music-ir.org/mirex/wiki/2017:Audio_Onset_Detection

[5] https://developers.soundcloud.com/docs/api/guide

[6] https://developers.soundcloud.com/docs/api/guide

[7] https://research.google.com/audioset/

[8] H. Y. Lo, J. C. Wang, H. M. Wang and S. D. Lin, “Cost-Sensitive Multi-Label Learning for Audio Tag Annotation and Retrieval,” in IEEE Transactions on Multimedia, vol. 13, no. 3, pp. 518-529, June 2011. http://ieeexplore.ieee.org/document/5733421/

[9] http://majorminer.org/info/intro

[10] http://www.music-ir.org/mirex/wiki/2016:Audio_Tag_Classification

[11] https://soundcloud.com/spinninrecords/ummet-ozcan-lose-control-original-mix

JPEG Column: 78th JPEG Meeting in Rio de Janeiro, Brazil

The JPEG Committee had its 78th meeting in Rio de Janeiro, Brazil. Relevant to its ongoing standardization effort on JPEG Privacy and Security, JPEG organized a special session to explore how blockchain and distributed ledger technologies could be supported in past, ongoing and future standards of the JPEG family. This is motivated by the fact that, considering the potential impact of such technologies on the future of multimedia, standardization will be required to enable interoperability between different imaging systems and services relying on blockchain and distributed ledger technologies.

Blockchain and distributed ledger technologies are behind the well-known crypto-currencies. These technologies can provide means for content authorship, intellectual property and rights management control of multimedia information. New possibilities become available, notably support for tracking the online use of copyrighted images and the ownership of digital content.


JPEG meeting session.

The Rio de Janeiro JPEG meeting mainly comprised the following highlights:

  • JPEG explores blockchain and distributed ledger technologies
  • JPEG 360 Metadata
  • JPEG XL
  • JPEG XS
  • JPEG Pleno
  • JPEG Reference Software
  • JPEG 25th anniversary of the first JPEG standard

The following summarizes various activities during JPEG’s Rio de Janeiro meeting.

JPEG explores blockchain and distributed ledger technologies

During the 78th JPEG meeting in Rio de Janeiro, the JPEG committee organized a special session on blockchain and distributed ledger technologies and their impact on JPEG standards. As a result, the committee decided to explore use cases and standardization needs related to blockchain technology in a multimedia context. Use cases will be explored in relation to the recently launched JPEG Privacy and Security, as well as in the broader landscape of imaging and multimedia applications. To that end, the committee created an ad hoc group with the aim to gather input from experts to define these use cases and to explore eventual needs and advantages to support a standardization effort focused on imaging and multimedia applications. To get involved in the discussion, interested parties can register for the ad hoc group's mailing list. Instructions to join the list are available at http://jpeg-blockchain-list.jpeg.org

JPEG 360 Metadata

The JPEG Committee notes the increasing use of multi-sensor images from multi-sensor devices, such as 360-degree capturing cameras or dual-camera smartphones available to consumers. Images from these cameras are shown on computers, smartphones, and Head Mounted Displays (HMDs). JPEG standards are commonly used for image compression and file format. However, because existing JPEG standards do not fully cover these new uses, incompatibilities have reduced the interoperability of their images, thus reducing the widespread ubiquity which consumers have come to expect when using JPEG files. Additionally, new modalities for interacting with images, such as computer-based augmentation, face-tagging, and object classification, require support for metadata that was not part of the original scope of JPEG. A set of such JPEG 360 use cases is described in the JPEG 360 Metadata Use Cases document.

To avoid fragmentation in the market and to ensure wide interoperability, a standard way of interacting with multi-sensor images with richer metadata is desired in JPEG standards. JPEG invites all interested parties, including manufacturers, vendors and users of such devices to submit technology proposals for enabling interactions with multi-sensor images and metadata that fulfill the scope, objectives and requirements that are outlined in the final Call for Proposals, available on the JPEG website.

To stay posted on JPEG 360, please regularly consult our website at jpeg.org and/or subscribe to the JPEG 360 e-mail reflector.

JPEG XL

The Next-Generation Image Compression activity (JPEG XL) has produced a revised draft Call for Proposals and intends to publish a final Call for Proposals (CfP) following its 79th meeting (April 2018), with the objective of seeking technologies that fulfill the objectives and scope of Next-Generation Image Compression. During the 78th meeting, objective and subjective quality assessment methodologies for anchor and proposal evaluations were discussed and analyzed. As an outcome of the meeting, source code for objective quality assessment has been made available.

The draft Call for Proposals, with all related information, can be found on the JPEG website. Comments are welcome and should be submitted as specified in the document. To stay posted on the action plan for JPEG XL, please regularly consult our website at jpeg.org and/or subscribe to our e-mail reflector.

 

JPEG XS

Since the previous 77th meeting, subjective quality evaluations have shown that the initial quality requirement of the JPEG XS Core Coding System has been met, i.e. visually lossless quality at a compression ratio of 6:1 for the large majority of images under test. Several profiles are now under development in JPEG XS, as well as transport and container formats. The JPEG Committee therefore invites interested parties – in particular coding experts, codec providers, system integrators and potential users of the foreseen solutions – to contribute to the furthering of the specifications in the above directions. Publication of the International Standard is expected for Q3 2018.

JPEG Pleno

The JPEG Pleno activity is currently divided into Pleno Light Field, Pleno Point Cloud and Pleno Holography. JPEG Pleno Light Field has been preparing a third round of core experiments for assessing the impact of individual coding modules on the overall rate-distortion performance. Moreover, it was decided to continue collecting additional test data and to progress with the preparation of working documents for JPEG Pleno specifications Part 1 and Part 2.

Furthermore, quality modelling studies are under consideration for both JPEG Pleno Point Cloud and JPEG Pleno Holography. In particular, JPEG Pleno Point Cloud is considering a set of new quality metrics provided as contributions to this work item. It is expected that the new metrics will replace the current state of the art, as they have shown superior correlation with subjective quality as perceived by humans. Moreover, new subjective assessment models have been tested and analysed to better understand the perception of quality for such new types of visual information.

JPEG Reference Software

The JPEG Committee is pleased to announce that its first JPEG image coding specification is now augmented by a new part, ISO/IEC 10918-7, which contains reference software. The proposed candidate software implementations have been checked for compliance with ISO/IEC 10918-2. Considering the positive results, this new part of the JPEG standard is expected to continue evolving quickly.


JPEG meeting room window view during a break.

JPEG 25th anniversary of the first JPEG standard

The third and final celebration of the 25th anniversary of the first JPEG standard is planned for the next, 79th JPEG meeting, taking place in La Jolla, CA, USA. The anniversary will be marked by a two-hour workshop on Friday 13th April on current and emerging JPEG technologies, followed by a social event at which past JPEG Committee members with relevant contributions will receive awards.

Final Quote

“Blockchain and distributed ledger technologies promise a significant impact on the future of many fields. JPEG is committed to provide standard mechanisms to apply blockchain on multimedia applications in general and on imaging in particular,” said Prof. Touradj Ebrahimi, the Convenor of the JPEG Committee.

 

About JPEG

The Joint Photographic Experts Group (JPEG) is a Working Group of ISO/IEC, the International Organisation for Standardization / International Electrotechnical Commission, (ISO/IEC JTC 1/SC 29/WG 1) and of the International Telecommunication Union (ITU-T SG16), responsible for the popular JBIG, JPEG, JPEG 2000, JPEG XR, JPSearch and more recently, the JPEG XT, JPEG XS, JPEG Systems and JPEG Pleno families of imaging standards.

The JPEG Committee meets nominally four times a year, in different world locations. The latest, 78th meeting was held in Rio de Janeiro, Brazil. The next, 79th JPEG meeting will be held on 9-15 April 2018, in La Jolla, California, USA.

More information about JPEG and its work is available at www.jpeg.org or by contacting Antonio Pinheiro or Frederik Temmermans (pr@jpeg.org) of the JPEG Communication Subgroup.

If you would like to stay posted on JPEG activities, please subscribe to the jpeg-news mailing list on http://jpeg-news-list.jpeg.org.  

Future JPEG meetings are planned as follows:

  • No 79, La Jolla (San Diego), CA, USA, April 9 to 15, 2018
  • No 80, Berlin, Germany, July 7 to 13, 2018
  • No 81, Vancouver, Canada, October 13 to 19, 2018

 

 

MPEG Column: 121st MPEG Meeting in Gwangju, Korea

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects.

The MPEG press release comprises the following topics:

  • Compact Descriptors for Video Analysis (CDVA) reaches Committee Draft level
  • MPEG-G standards reach Committee Draft for metadata and APIs
  • MPEG issues Calls for Visual Test Material for Immersive Applications
  • Internet of Media Things (IoMT) reaches Committee Draft level
  • MPEG finalizes its Media Orchestration (MORE) standard

At the end I will also briefly summarize what else happened with respect to DASH, CMAF, OMAF as well as discuss future aspects of MPEG.

Compact Descriptors for Video Analysis (CDVA) reaches Committee Draft level

The Committee Draft (CD) for CDVA has been approved at the 121st MPEG meeting, which is the first formal step of the ISO/IEC approval process for a new standard. This will become a new part of MPEG-7 to support video search and retrieval applications (ISO/IEC 15938-15).

Managing and organizing the quickly increasing volume of video content is a challenge for many industry sectors, such as media and entertainment or surveillance. One example task is scalable instance search, i.e., finding content containing a specific object instance or location in a very large video database. This requires video descriptors which can be efficiently extracted, stored, and matched. Standardization enables extracting interoperable descriptors on different devices and using software from different providers, so that only the compact descriptors instead of the much larger source videos can be exchanged for matching or querying. The CDVA standard specifies descriptors that fulfil these needs and includes (i) the components of the CDVA descriptor, (ii) its bitstream representation and (iii) the extraction process. The final standard is expected to be finished in early 2019.

CDVA introduces a new descriptor based on features which are output from a Deep Neural Network (DNN). CDVA is robust against viewpoint changes and moderate transformations of the video (e.g., re-encoding, overlays), and it supports partial matching and temporal localization of the matching content. The CDVA descriptor has a typical size of 2–4 KBytes per second of video. For typical test cases, it has been demonstrated to reach a correct matching rate of 88% (at a 1% false matching rate).
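To give a flavour of how compact descriptors are used, the sketch below compares two DNN-derived descriptor vectors with cosine similarity and applies a decision threshold. This only illustrates the general principle; the actual CDVA descriptor structure, bitstream and matching procedure are those defined in ISO/IEC 15938-15, and the threshold below is an arbitrary placeholder.

    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two descriptor vectors."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Toy stand-ins for DNN-derived segment descriptors (real CDVA descriptors are
    # compact, structured representations defined by the standard, not raw vectors).
    query_descriptor = np.random.rand(512)
    reference_descriptor = np.random.rand(512)

    MATCH_THRESHOLD = 0.8  # hypothetical operating point trading correct vs. false matches
    score = cosine_similarity(query_descriptor, reference_descriptor)
    print("match" if score >= MATCH_THRESHOLD else "no match", f"(score={score:.3f})")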

Research aspects: There are probably endless research aspects in the visual descriptor space, ranging from validation of the results achieved so far to further improving the informativeness of the descriptors, with the goal of increasing the correct matching rate (and consequently decreasing the false matching rate). In general, however, the question is whether there is still a need for dedicated descriptors in an era of over-provisioned bandwidth, storage and computing, and of the rising usage of artificial intelligence techniques such as machine learning and deep learning.

MPEG-G standards reach Committee Draft for metadata and APIs

In my previous report I introduced the MPEG-G standard for compression and transport technologies of genomic data. At the 121st MPEG meeting, metadata and APIs reached CD level. The former – metadata – provides relevant information associated with genomic data, and the latter – APIs – allows for building interoperable applications capable of manipulating MPEG-G files. Additional standardization plans for MPEG-G include the CDs for reference software (ISO/IEC 23092-4) and conformance (ISO/IEC 23092-5), which are planned to be issued at the next 122nd MPEG meeting with the objective of producing Draft International Standards (DIS) at the end of 2018.

Research aspects: Metadata typically enables certain functionality which can be tested and evaluated against requirements. APIs allow building applications and services on top of the underlying functions, which could be a driver for research projects that make use of such APIs.

MPEG issues Calls for Visual Test Material for Immersive Applications

I have reported about the Omnidirectional Media Format (OMAF) in my previous report. At the 121st MPEG meeting, MPEG was working on extending OMAF functionalities to allow the modification of viewing positions, e.g., in case of head movements when using a head-mounted display, or for use with other forms of interactive navigation. Unlike OMAF which only provides 3 degrees of freedom (3DoF) for the user to view the content from a perspective looking outwards from the original camera position, the anticipated extension will also support motion parallax within some limited range which is referred to as 3DoF+. In the future with further enhanced technologies, a full 6 degrees of freedom (6DoF) will be achieved with changes of viewing position over a much larger range. To develop technology in these domains, MPEG has issued two Calls for Test Material in the areas of 3DoF+ and 6DoF, asking owners of image and video material to provide such content for use in developing and testing candidate technologies for standardization. Details about these calls can be found at https://mpeg.chiariglione.org/.

Research aspects: The good thing about test material is that it allows for reproducibility, which is an important aspect within the research community. Thus, it is more than appreciated that MPEG issues such calls, and let's hope that this material will become publicly available. Typically this kind of visual test material targets coding, but it would also be interesting to have such test content for storage and delivery.

Internet of Media Things (IoMT) reaches Committee Draft level

The goal of IoMT is to facilitate the large-scale deployment of distributed media systems with interoperable audio/visual data and metadata exchange. This standard specifies APIs providing media things (i.e., cameras/displays and microphones/loudspeakers, possibly capable of significant processing power) with the capability of being discovered, setting up ad-hoc communication protocols, exposing usage conditions, and providing media and metadata as well as services processing them. IoMT APIs encompass a large variety of devices, not just connected cameras and displays but also sophisticated devices such as smart glasses, image/speech analyzers and gesture recognizers. IoMT enables the expression of the economic value of resources (media and metadata) and of associated processing in terms of digital tokens leveraged by the use of blockchain technologies.
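As a rough, hypothetical illustration of what such APIs enable (the class and field names below are invented and are not taken from the IoMT specification), a "media thing" might expose discovery and capability descriptions along these lines:

    from dataclasses import dataclass, field

    @dataclass
    class MediaThing:
        """Hypothetical model of a discoverable media device (camera, microphone, display...)."""
        thing_id: str
        kind: str                       # e.g. "camera", "microphone", "display"
        capabilities: dict = field(default_factory=dict)

        def describe(self) -> dict:
            # Expose identity and capabilities to interested clients.
            return {"id": self.thing_id, "kind": self.kind, "capabilities": self.capabilities}

    # A toy registry standing in for a discovery service.
    registry = [
        MediaThing("cam-01", "camera", {"resolution": "1920x1080", "onboard_analysis": ["face_detection"]}),
        MediaThing("mic-07", "microphone", {"channels": 2}),
    ]

    # Discover all cameras that can run on-board face detection.
    cameras = [t.describe() for t in registry
               if t.kind == "camera" and "face_detection" in t.capabilities.get("onboard_analysis", [])]
    print(cameras)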

Research aspects: The main focus of IoMT is APIs, which provide easy and flexible access to the underlying devices' functionality and are thus an important factor in enabling research within this interesting domain. For example, using these APIs to enable communication among various media things could bring up new forms of interaction with these technologies.

MPEG finalizes its Media Orchestration (MORE) standard

The MPEG “Media Orchestration” (MORE) standard reached Final Draft International Standard (FDIS), the final stage of development before being published by ISO/IEC. The scope of the Media Orchestration standard is as follows:

  • It supports the automated combination of multiple media sources (i.e., cameras, microphones) into a coherent multimedia experience.
  • It supports rendering multimedia experiences on multiple devices simultaneously, again giving a consistent and coherent experience.
  • It contains tools for orchestration in time (synchronization) and space.

MPEG expects the Media Orchestration standard to be especially useful in immersive media settings. This applies notably in social virtual reality (VR) applications, where people share a VR experience and are able to communicate about it. Media Orchestration is expected to allow synchronizing the media experience for all users and to give them a spatially consistent experience, as it is important for a social VR user to be able to understand when other users are looking at them.
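A minimal sketch of the underlying idea of temporal orchestration, assuming all devices share a common reference clock (e.g. via NTP): each device derives the media position it should currently render from an agreed start time, rather than from its own playback state. This is only the general principle, not the mechanisms defined in the MORE standard.

    import time

    def target_media_position(shared_start_time, now=None):
        """Media position (seconds) every device should be rendering right now,
        given a start time expressed on a clock all devices share."""
        now = time.time() if now is None else now
        return max(0.0, now - shared_start_time)

    # Hypothetical usage: all devices were told the experience started at this shared timestamp.
    shared_start = time.time() - 12.3
    position = target_media_position(shared_start)
    print(f"Seek/render at {position:.2f} s")  # each device corrects its local playback accordingly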

Research aspects: This standard enables the social multimedia experience proposed in the literature. Interestingly, the W3C is working on something similar, referred to as the timing object, and it would be interesting to see whether these approaches have some commonalities.


What else happened at the MPEG meeting?

DASH is fully in maintenance mode and we are still waiting for the 3rd edition, which is supposed to be a consolidation of existing corrigenda and amendments. Currently only minor extensions are proposed, and conformance/reference software is being updated. Similar things can be said for CMAF, where we have one amendment and one corrigendum under development. Additionally, MPEG is working on CMAF conformance. OMAF reached FDIS at the last meeting, and MPEG is also working on reference software and conformance for it. It is expected that in the future we will see additional standards and/or technical reports defining/describing how to use CMAF and OMAF in DASH.

Regarding the future video codec, the call for proposals has been out since the last meeting, as announced in my previous report, and responses are due for the next meeting. Thus, it is expected that the 122nd MPEG meeting will be the place to be in terms of MPEG's future video codec. Speaking about the future, shortly after the 121st MPEG meeting, Leonardo Chiariglione published a blog post entitled "a crisis, the causes and a solution", which is related to HEVC licensing, the Alliance for Open Media (AOM), and possible future options. The blog post certainly caused some reactions within the video community at large, and I think this was also intended. Let's hope it will galvanize the video industry – not to push the button, but to start addressing and resolving the issues. As pointed out in one of my other blog posts about what to care about in 2018, the upcoming MPEG meeting in April 2018 is certainly a place to be. Additionally, that post highlights some conferences related to various aspects also discussed in MPEG, which I'd like to republish here:

  • QoMEX — Int’l Conf. on Quality of Multimedia Experience — will be hosted in Sardinia, Italy from May 29-31, which is THE conference to be for QoE of multimedia applications and services. Submission deadline is January 15/22, 2018.
  • MMSys — Multimedia Systems Conf. — and specifically Packet Video, which will be on June 12 in Amsterdam, The Netherlands. Packet Video is THE adaptive streaming scientific event 2018. Submission deadline is March 1, 2018.
  • Additionally, you might be interested in ICME (July 23-27, 2018, San Diego, USA), ICIP (October 7-10, 2018, Athens, Greece; specifically in the context of video coding), and PCS (June 24-27, 2018, San Francisco, CA, USA; also in the context of video coding).
  • The DASH-IF academic track hosts special events at MMSys (Excellence in DASH Award) and ICME (DASH Grand Challenge).
  • MIPR — 1st Int’l Conf. on Multimedia Information Processing and Retrieval — will be in Miami, Florida, USA from April 10-12, 2018. It has a broad range of topics including networking for multimedia systems as well as systems and infrastructures.
 

Report from ACM Multimedia 2017 – by Benoit Huet

 

Best #SIGMM Social Media Reporter Award! Me? Really??

This was my reaction after being informed by the SIGMM Social Media Editors that I was one of the two recipients following ACM Multimedia 2017! #ACMMM What a wonderful idea this is to encourage our community to communicate, both internally and to other related communities, about our events, our key research results and all the wonderful things the multimedia community stands for!  I have always been surprised by how limited social media engagement is within the multimedia community. Your initiative has all my support! Let’s disseminate our research interest and activities on social media! @SIGMM #Motivated


The SIGMM flagship conference took place on October 23-27 at the Computer History Museum in Mountain View, California, USA. For its 25th edition, the organizing committee had prepared an attractive program cleverly mixing expected classics (e.g. the Best Paper session, Grand Challenges, the Open Source Software Competition, etc.) and brand new sessions (such as Fast Forward and Thematic Workshops, Business Idea Venture, and the Novel Topics Track). For this edition, the conference adopted a single paper length, removing the boundary between long and short papers. The TPC Co-Chairs and Area Chairs had the responsibility of directing accepted papers to either an oral session or a thematic workshop.

Thematic workshops took the form of poster presentations. Presenters were asked to provide a short video briefly motivating their work, with the intention of making the videos available online for reference after the conference (possibly with a link to the full paper and the poster!). However, this did not come through, as publication permissions were not cleared in time, but the idea is interesting and should be considered for future editions. Fast Forward sessions (or Thematic Workshop pitches) are short, targeted presentations aimed at attracting the audience to the Thematic Workshop where the papers are presented (in the form of posters in this case). While such short presentations allow conference attendees to efficiently identify which posters are relevant to them, it is crucial for presenters to be well prepared and to concentrate on highlighting one key research idea, as time is very limited. It also gives more exposure to posters. I would be in favor of keeping such sessions for future ACM Multimedia editions.

The 25th edition of ACM MM wasn’t short of keynotes: no fewer than six industry keynotes punctuated the conference's half days. The first keynote, by Achin Bhowmik from Starkey, focused on audio as a means of “Enhancing and Augmenting Human Perception with Artificial Intelligence”. Bill Dally from NVidia presented “Efficient Methods and Hardware for Deep Learning”, in short, why we all need GPUs! “Building Multi-Modal Interfaces for Smartphones” was the topic presented by Injong Rhee (Samsung Electronics); Scott Silver (YouTube) discussed the difficulties in “Bringing a Billion Hours to Life” (referring to the vast quantities of videos uploaded and viewed on the sharing platform, and the long tail). Ed Chang from HTC presented “DeepQ: Advancing Healthcare Through AI and VR” and demonstrated how healthcare is benefiting, and will continue to benefit, from AR, VR and AI. Danny Lange from Unity Technologies highlighted how important machine learning and deep learning are in the game industry in “Bringing Gaming, VR, and AR to Life with Deep Learning”. Personally, I would have preferred a mix of industry and academic keynotes, as I found some of the keynotes not targeted at an audience of computer scientists.

Arnold W. M. Smeulders received the SIGMM Technical Achievement Award for his outstanding and pioneering contribution to defining and bridging the semantic gap in content-based image retrieval (his lecture is here: https://youtu.be/n8kLxKNjQ0A). His talk was sharp, enlightening and very well received by the audience.

The @sigmm Rising Star Award went to Dr. Liangliang Cao for his contribution to large-scale multimedia recognition and social media mining.

The conference was noticeably flavored with trendy topics such as AI, human augmentation technologies, virtual and augmented reality, and machine (deep) learning, as can be seen from the various works rewarded.

The Best Paper award was given to Bokun Wang, Yang Yang, Xing Xu, Alan Hanjalic, Heng Tao Shen for their work on “Adversarial Cross-Modal Retrieval“.

Yuan Tian, Suraj Raghuraman, Thiru Annaswamy, Aleksander Borresen, Klara Nahrstedt, Balakrishnan Prabhakaran received the Best Student Paper award for the paper “H-TIME: Haptic-enabled Tele-Immersive Musculoskeletal Examination“.

The Best demo award went to “NexGenTV: Providing Real-Time Insight during Political Debates in a Second Screen Application” by Olfa Ben Ahmed, Gabriel Sargent, Florian Garnier, Benoit Huet, Vincent Claveau, Laurence Couturier, Raphaël Troncy, Guillaume Gravier, Philémon Bouzy and Fabrice Leménorel.

The Best Open source software award was received by Hao Dong, Akara Supratak, Luo Mai, Fangde Liu, Axel Oehmichen, Simiao Yu, Yike Guo for “TensorLayer: A Versatile Library for Efficient Deep Learning Development“.

The Best Grand Challenge Video Captioning Paper award went to “Knowing Yourself: Improving Video Caption via In-depth Recap“, by Qin Jin, Shizhe Chen, Jia Chen, Alexander Hauptmann.

The Best Grand Challenge Social Media Prediction Paper award went to Chih-Chung Hsu, Ying-Chin Lee, Ping-En Lu, Shian-Shin Lu, Hsiao-Ting Lai, Chihg-Chu Huang, Chun Wang, Yang-Jiun Lin, Weng-Tai Su for “Social Media Prediction Based on Residual Learning and Random Forest“.

Finally, the Best Brave New Idea Paper award was conferred to John R Smith, Dhiraj Joshi, Benoit Huet, Winston Hsu and Zef Cota for the paper “Harnessing A.I. for Augmenting Creativity: Application to Movie Trailer Creation“.

A few years back, the multimedia community was concerned with the lack of truly multimedia publications. In my opinion, those days are behind us. The technical program has evolved into a richer and broader one, let’s keep the momentum!

The location was a wonderful opportunity for many of the attendees to take a stroll down memory lane and see computers and devices (VT100, PC, etc.) from the past, thanks to the complimentary entrance to the museum exhibitions. The “isolated” location of the conference venue meant that going out for lunch was out of the question given the duration of the lunch break. As a solution, the organizers catered buffet lunches. This resulted in the majority of the attendees interacting and mixing over the lunch break while eating. This could be an effective way to better integrate new participants and strengthen the community. Both the welcome reception and the banquet were held successfully within the Computer History Museum. Both events offered yet another opportunity for new connections to be made and for further interaction between attendees. Indeed, the atmosphere on both occasions was relaxed, lively and joyful.

All in all, ACM MM 2017 was another successful edition of our flagship conference, many thanks to the entire organizing team and see you all in Seoul for ACM MM 2018 http://www.acmmm.org/2018/ and follow @sigmm on Twitter!

Report from ACM Multimedia 2017 – by Conor Keighrey


My name is Conor Keighrey; I’m a PhD candidate at the Athlone Institute of Technology in Athlone, Co. Westmeath, Ireland. The focus of my research is to understand the key influencing factors that affect Quality of Experience (QoE) in emerging immersive multimedia experiences, with a specific focus on applications in the speech and language therapy domain. This research is funded by the Irish Research Council's Government of Ireland Postgraduate Scholarship Programme. I’m delighted to have been asked to present this report to the SIGMM community as a result of my social media activity at the ACM Multimedia Conference.

Launched in 1993, the ACM Multimedia (ACMMM) Conference held its 25th anniversary event in Mountain View, California. The conference was located in the heart of Silicon Valley, at the inspirational Computer History Museum.

Under five focal themes, the conference called for papers on topics relating to multimedia: Experience, Systems and Applications, Understanding, Novel Topics, and Engagement.

Keynote addresses were delivered by high-profile, industry-leading experts from the field of multimedia. These talks provided insight into active developments at the organizations of the following experts:

  • Achin Bhowmik (CTO & EVP, Starkey, USA)
  • Bill Dally (Senior Vice President and Chief Scientist, NVidia, USA)
  • Injong Rhee (CTO & EVP, Samsung Electronics, Korea)
  • Edward Y. Chang (President, HTC, Taiwan)
  • Scott Silver (Vice President, Google, USA)
  • Danny Lange (Vice President, Unity Technologies, USA)

Some keynote highlights include Bill Dally’s talk on “Efficient Methods and Hardware for Deep Learning”. Bill provided insight into the work NVidia is doing with neural networks, the hardware which drives them, and the techniques the company is using to make them more efficient. He also highlighted that AI should not be thought of as a mechanism which replaces humans, but as one which empowers them, thus allowing us to explore more intellectual activities.

Danny Lange of Unity Technologies discussed the application of the Unity game engine to create scenarios in which machine learning models can be trained. His presentation, entitled “Bringing Gaming, VR, and AR to Life with Deep Learning”, described the capture of data for self-driving cars to prepare for unexpected occurrences in the real world (e.g. pedestrian activity or other cars behaving in unpredicted ways).

A number of the Keynotes were captured by FXPAL (an ACMMM Platinum Sponsor) and are available here.

With an acceptance rate of 27.63% (684 papers reviewed, 189 accepted), the main track at ACMMM showcased a diverse collection of research from academic institutes around the globe. An abundance of work was presented in the ever-expanding areas of deep/machine learning, virtual/augmented/mixed reality, and the traditional multimedia field.


The importance of gender equality and diversity with respect to advancing the careers of women in STEM has never been greater. Sponsored by SIGMM, the Women/Diversity in MM lunch took place on the first day of ACMMM. Speakers such as Prof. Noel O’Connor discussed the significance of initiatives such as Athena SWAN (Scientific Women’s Academic Network) within Dublin City University (DCU). Katherine Breeden (pictured left), an Assistant Professor in the Department of Computer Science at Harvey Mudd College (HMC), presented a fantastic talk on gender balance at HMC. Katherine’s discussion highlighted the key changes which have occurred, resulting in more women than men graduating with a degree in computer science at the college.

Other highlights from day 1 included a paper presented in the Experience 2 (Perceptual, Affect, and Interaction) session, chaired by Susanne Boll (University of Oldenburg). Researchers from the National University of Singapore presented the results of a multisensory virtual cocktail (Vocktail) experience, which was well received.

 

Through the stimulation of three sensory modalities, Vocktails aim to create virtual flavors and augment taste experiences through a customizable, interactive drinking utensil. Controlled by a mobile device, participants of the study experienced augmented taste (electrical stimulation of the tongue), smell (micro air pumps), and visual (RGB light projected onto the liquid) stimuli as they used the system. For more information, check out their paper entitled “Vocktail: A Virtual Cocktail for Pairing Digital Taste, Smell, and Color Sensations” on the ACM Digital Library.

Day 3 of the conference included a session entitled Brave New Ideas. The session presented a fantastic variety of work which focused on the use of multimedia technologies to enhance or create intelligent systems. Demonstrating AI as an assistive tool and winning the Best Brave New Idea Paper award, the paper entitled “Harnessing A.I. for Augmenting Creativity: Application to Movie Trailer Creation” (ACM Digital Library) describes the first-ever human-machine collaboration for creating a real movie trailer. Through multi-modal semantic extraction, including audio-visual and scene analysis combined with a statistical approach, key moments that characterize horror films were identified. As a result, the AI selected 10 scenes from a feature-length film, which were further developed alongside a professional filmmaker to finalize an exciting movie trailer. Officially released by 20th Century Fox, the complete AI trailer for the horror movie “Morgan” can be viewed here.
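As a purely illustrative sketch of the final selection step described above (the scoring model, weights and numbers are invented and are not those of the paper), picking the ten most "trailer-worthy" scenes from per-scene multi-modal scores might look like this:

    # Hypothetical per-scene scores from audio-visual and scene-level analysis (0..1).
    scenes = [
        {"scene_id": i, "audio_tension": a, "visual_suspense": v}
        for i, (a, v) in enumerate([(0.9, 0.7), (0.2, 0.3), (0.8, 0.9), (0.4, 0.6),
                                    (0.95, 0.85), (0.1, 0.2), (0.7, 0.75), (0.5, 0.4),
                                    (0.85, 0.6), (0.3, 0.9), (0.6, 0.65), (0.75, 0.8)])
    ]

    def trailer_score(scene):
        # Invented weighting of the two modality scores.
        return 0.5 * scene["audio_tension"] + 0.5 * scene["visual_suspense"]

    # Select the ten highest-scoring scenes as trailer candidates.
    top_ten = sorted(scenes, key=trailer_score, reverse=True)[:10]
    print([s["scene_id"] for s in top_ten])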

A new addition to this ACMMM edition has been the inclusion of thematic workshops. Four individual workshops (as outlined below) provided an opportunity for papers which could not be accommodated within the main track to be presented to the multimedia research community. A total of 495 papers were reviewed, of which 64 were accepted (12.93%). Authors of accepted papers presented their work via on-stage thematic workshop pitches, which were followed by poster presentations on Monday the 23rd and Friday the 27th. The workshop themes were as follows:

  • Experience (Organised by Wanmin Wu)
  • Systems and Applications (Organised by Roger Zimmermann & He Ma)
  • Engagement (Organised by Jianchao Yang)
  • Understanding (Organised by Qi Tian)

Presented as part of the thematic workshop pitches, one of the most fascinating demos at the conference was a body of work carried out by Audrey Ziwei Hu (University of Toronto). Her paper, entitled “Liquid Jets as Logic-Computing Fluid-User-Interfaces”, describes a fluid (water) user interface which is presented as a logic-computing device. Water jets form a medium for tactile interaction and control, creating a musical instrument known as a hydraulophone.

Steve Mann (pictured left) from Stanford University, who is regarded as “the father of wearable computing”, provided a fantastic live demonstration of the device. The full paper can be found on the ACM Digital Library, and a live demo can be seen here.

At large-scale events such as ACMMM, the importance of social media reporting and interaction has never been greater. More than 250 social media interactions (tweets, retweets, and likes) were monitored using the #SIGMM and #ACMMM hashtags, as outlined by the SIGMM Records prior to the event. Descriptive (and multimedia-enhanced) social media reports give those who encounter an unavoidable schedule overlap an opportunity to gather some insight into alternative works presented at the conference.

From my own perspective (as a PhD student), the most important aspect of social media interaction is that reports often serve as a conversational piece. Developing a social presence throughout the many coffee breaks and social events during the conference is key to building a network of contacts within any community. As a newcomer, this can often be a daunting task; recognizing other social media reporters offers the perfect ice-breaker, providing an opportunity to discuss and inform each other of the ongoing work within the multimedia community. As a result of my own online reporting, I was recognized numerous times throughout the conference. Staying active on social media often leads to the development of a research audience and a social media presence among peers. Engaging such an audience is key to the success of those who wish to follow a path in academia and research.

Building on my own personal experience, continued attendance at SIGMM conferences (irrespective of paper submission) has many advantages. While the predominant role of a conference is to disseminate work, the informative aspect of attending such events is often overlooked. The area of multimedia research is moving at a fast pace, and thus having the opportunity to engage directly with researchers in your field of expertise is of utmost importance. Attendance at ACMMM and other SIGMM conferences, such as ACM Multimedia Systems, has inspired me to explore alternative methodologies within my own research. Without a doubt, continued attendance will inspire my research as I move forward.

ACM Multimedia ‘18 (October 22nd – 26th): Seoul, South Korea, with its diverse landscape of modern skyscrapers mixed with traditional Buddhist temples and palaces, will host the 26th annual ACMMM. The 2018 event will without a doubt present a variety of work from the multimedia research community. Regular paper abstracts are due on the 30th of March (full manuscripts are due on the 8th of April). For more information on next year’s ACM Multimedia conference, check out the following link: http://www.acmmm.org/2018