Gender Diversity in SIGMM: We’ll Just Leave This Here As Well


1. Introduction and Background

SIGMM is the Association for Computing Machinery’s (ACM) Special Interest Group (SIG) in Multimedia, one of 36 SIGs in the ACM family.  ACM itself was founded in 1947 and is the world’s largest educational and scientific society for computing, uniting computing educators, researchers and professionals. With almost 100,000 members worldwide, ACM is a strong force in the computing world and is dedicated to advancing the art, science, engineering, and application of information technology.

SIGMM has been operating for nearly 30 years and sponsors 5, soon to be 6, major international conferences each year as well as dozens of workshops and an ACM Transactions Journal.  SIGMM sponsors several Excellence and Achievement Awards each year, including awards for Technical Achievement, Rising Star, Outstanding PhD Thesis, TOMM Best Paper, and Best TOMM Associate Editor. SIGMM funds student travel scholarships to almost all our conferences, with nearly 50 such student travel grants at the flagship MULTIMEDIA conference in Seoul, Korea, in 2018.  SIGMM has two active chapters, one in the Bay Area of San Francisco and one in China. It has a very active online presence, with social media reporters at our conferences, a regular SIGMM Records newsletter, and a weekly news digest.  At our flagship conference, SIGMM sponsors women and diversity lunches, Doctoral Symposiums, and a newcomers’ welcome breakfast.  SIGMM also funds special initiatives based on suggestions/proposals from the community as well as a newly-launched conference ambassador program to reach out to other ACM SIGs for collaborations across our conferences.

It is generally accepted that SIGMM has a diversity and inclusion problem which exists at all levels, but we have now realized this and have started to take action.  In September 2017 ACM SIGARCH produced the first of a series of articles on gender diversity in the field of Computer Architecture. SIGARCH members looked at the representation of women at SIGARCH conferences over the previous 2 years and presented the numbers in the first of a set of reports, entitled “Gender Diversity in Computer Architecture: We’re Just Going to Leave This Here”.


This report generated much online debate and commentary, including at the ACM SIG Governing Board (SGB) meetings in 2017 and in 2018.

At a SIGMM Executive Committee meeting in Mountain View, California in October 2017, SIGMM agreed to replicate the SIGARCH study to examine and measure the (lack of) gender diversity at SIGMM-sponsored conferences.  We issued a call offering funding support to do this, but there were no takers, so I did it myself, from within my own research lab.

2. Baselines for Performance Comparison

Before jumping into the numbers it is worth establishing a baseline to measure against. As an industry-wide figure, 17-24% of Computer Science undergrads at US R1 institutions are female, as are 17% of those with technical roles at large high-tech companies that report diversity. I also looked at the female representation within some of the other ACM SIGs. While we must accept that inclusiveness and diversity are not just about gender but also about race, ethnicity, nationality, even about institution, we don’t have data on these other aspects so I focus just on gender diversity.

So how does SIGMM compare to other SIGs? Let’s look at SIG memberships using data provided by ACM.

The best (most balanced or least imbalanced) SIGs are CSE (Computer Science Education) with 25% female, Computer Human Interaction (CHI) also with 25% female from among those declaring a gender, though CHI is probably better because it has a greater percentage of undeclared gender, thus a lower proportion of males. The worst SIGs (most imbalanced or least balanced) are PLAN (Programming Languages) with 4% female, and OPS (operating systems) with 5% female.


The figures for SIGMM show 9% female membership with 17% unknown or not declaring which means that among the declared members it is just below 11%. Among the other SIGs this makes us closest to AI (Artificial Intelligence) and to IR (Information Retrieval), though SIGIR has a larger number of members with gender undeclared.
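The adjustment from 9% to just below 11% can be checked with a few lines of arithmetic. The following is only a minimal sketch: the function name share_among_declared is made up for illustration, and the only inputs are the two SIGMM percentages quoted above.

```python
def share_among_declared(female_pct: float, undeclared_pct: float) -> float:
    """Return the female percentage among members who declared a gender.

    Hypothetical helper for illustration: both arguments are percentages
    of the total membership, e.g. 9 and 17 for the SIGMM figures above.
    """
    declared_pct = 100.0 - undeclared_pct
    if declared_pct <= 0:
        raise ValueError("no members declared a gender")
    return 100.0 * female_pct / declared_pct


if __name__ == "__main__":
    # SIGMM: 9% female, 17% unknown or not declaring.
    sigmm = share_among_declared(9, 17)
    print(f"SIGMM female share among declared members: {sigmm:.1f}%")
```

With the SIGMM figures this gives roughly 10.8%, consistent with the "just below 11%" quoted above.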


Measuring this against overall ACM memberships we find that ACM members are 68% male, 12% female and 20% undeclared. This makes SIGMM quite mid-table compared to other SIGs, but we’re all doing badly and we all have an imbalance. Interestingly, the MULTIMEDIA Conference in 2018 in Seoul, Korea had 81% male, 18% female and 1% other/undeclared attendees, slightly better than our membership ratio but still not good.

3. Gender Balance at SIGMM Conferences

We [1] carried out a desk study for the 3 major SIGMM conferences, namely MULTIMEDIA with an average attendance of almost 800, the International Conference on Multimedia Retrieval (ICMR) with 230 attendees at the last conference and Multimedia Systems (MMSys) with about 130 attendees. For each of the last 5 years we trawled through the conference websites, extracting the names/affiliations of the organizing committees, the technical program committees and the invited keynote speakers.  We did likewise for the SIGMM award winners. This required us to determine gender for over 2,700 people, although there were duplicates because the same people can recur on the program committees for multiple years and across multiple conferences. Some of these were easy, like “John” and “Susanne”, but these were few, so for the others we searched for them on the web. If we were still searching after 5 minutes, we gave up. [2]

[1] This work was carried out by Agata Wolski, a Summer intern student, and me, during Summer 2018.

[2] The data gathered from this activity is available on request from

The figures for each of these annual conferences for a 5-year period for MULTIMEDIA, for a 4-year period for ICMR and for a 3-year period for MMSys, are shown in the following sequence of charts, first showing the percentages and then the raw numbers, for each conference.







So what do the figures mean in comparison to each other and to our baseline?

The results tell us the following:

  • Almost all the percentages for female participation in the organisation of SIGMM conferences are above the SIGMM membership figure of 9%, which is really closer to 11% when discounting those SIGMM members with gender unassigned. Yet we know that the proportion of female SIGMM members is already much smaller than the 17% female in technology companies and the almost 18% female ACM members when discounting unassigned genders.
  • Even if we were to use the 17% to 18% figures as our baseline, female participation in SIGMM conference organisation is below that baseline, meaning our female SIGMM members are not appearing in organisational and committee roles at the rate that these pro rata figures would indicate they should.
  • While each of our conferences falls below these pro rata figures, none of the three conferences is particularly worse than the others.

4. Initiatives Elsewhere to Redress Gender Imbalance

I then examined some of the actions that are carried out elsewhere and that SIGMM could implement, and started by looking at other ACM SIGs.  There I found that some of the other SIGs do some of the following:

  • Women and diversity events at conferences (breakfasts or lunches, like SIGMM does)
  • Women-only networking pre-conference meals at conferences
  • Women-only technical programme events like N2Women
  • Formation of mentoring group (using Slack) for informal mentoring
  • Highlighting the roles and achievements of women on social media and in newsletters
  • Childcare and companion travel grants for conference attendance

I then looked more broadly at other initiatives and found the following:

  • gender quotas
  • accelerator programs like Athena Swan
  • female-only events like workshops
  • reports like this which act as spotlights

When we put these all together there are three recurring themes which appear across various initiatives:

  1. Networking .. encouraging us to be part of a smaller group within a larger group. This reflects a natural human trait: we are tribal and like to belong to groups, starting with our family but also including the people we have lunch with, go to yoga classes with, or go on holidays with. We each have multiple, sometimes non-overlapping, groups or tribes that we like to be part of. One such group is the network of minorities/women that gets formed as a result of some of these activities.
  2. Peer-to-peer buddying .. again there is a natural human trait whereby older siblings (sisters) tend to help younger ones, from when we are very young and right throughout life.  The buddying activity reflects this and gives a form of satisfaction to the older or senior buddy, as well as practical benefit to the younger or more junior buddy.
  3. Role models .. there are several initiatives which try to promote role models as the kinds of people that we ourselves can aspire to be.  More often than not, it is the very successful people and the high flyers who are put into these positions of role models, whereas in practice not everyone actually wants to be a high flyer.  For many people success in their lives means something different, something less lofty and aspirational, and when we see high flying successful people promoted as role models our reaction can be the opposite. We can reject them because we don’t want to be in their league and as a result we can feel depressed and regard ourselves as under-achievers, thus defeating the purpose of having role models in the first place.

5. SIGMM Women’s / Diversity Lunch at MULTIMEDIA 2018

At the ACM MULTIMEDIA Conference in Seoul, Korea in October 2018 SIGMM once again organised a women’s / diversity lunch and about 60 people attended, mostly women.


At the event I gave a high level overview of the statistics presented earlier in this report, and then in order to gather feedback from the audience we held a moderated discussion with PadLet used to gather feedback. PadLet is an online bulletin board used to display information (text, images or links) which can be contributed anonymously from an audience. Attendees at the lunch scanned a QR code on their smartphones which opened a browser and allowed them to post comments on the big screen in response to a topic being discussed during the meeting.

The first topic discussed was “What brings you to the MULTIMEDIA Conference?”

  • The answers (anonymous comments) posted included that many are here because they are presenting papers or posters, many want to do networking and to share ideas, to help build the community of like-minded researchers, some are attending in order to meet old friends .. and these are the usual reasons for attending a conference.

For the second topic we asked “What excites you about multimedia as a topic, how did you get into the area?”

  • The answers included the interaction between computer vision and language, the novel applications around multimodality, the multidisciplinary nature and the practical nature of the subject, and the diversity of topics and the people attending.

The third topic was “What is more/less important for you … networking, role models or peer buddies?”

  • From the answers to this, networking was almost universally identified as the most important, and as a follow-on from that, interacting with peers.

Finally we asked “Do you know of an initiative that works, or that you would like to see at SIGMM event(s)?”

  • A variety of suggestions were put forward including holding hackathons, funding undergraduate students from local schools to attend the conference, an ACM award for women only, ring-fenced funding for supporting women only, training for reviewing, and a lot of people wanted mentoring and mentor matching.

6. SIGMM Initiatives

So what will we do in SIGMM?

  • We will continue to encourage networking at SIGMM sponsored conferences. We will fund lunches like the ones at the MULTIMEDIA Conference. We also started a newcomers breakfast at the MULTIMEDIA Conference in 2018 and we will continue with this.
  • We will ensure that all our conference delegates can attend all conference events at all SIGMM conferences without extra fees. This was a SIGMM policy identified in a review of SIGMM conferences some years ago but it has slipped.
  • We will not force but we will facilitate peer-to-peer buddying through the networking events at our conferences and through this we will indirectly help you identify your own role models.
  • We will appoint a diversity coordinator to oversee the women / diversity activities across our SIGMM events and this appointee will be a full member of the SIGMM Executive Committee.
  • We will offer an opportunity for all members of our SIGMM community attending our sponsored conferences, as part of their conference registration, to indicate their availability and interest in taking on an organisational role in SIGMM activities, including conference organisation and/or reviewing. This will give us a reserve of people on whose expertise and services we can draw, and we can do so in a way which promotes diversity.

These may appear to be small-scale and relatively minor because we are not getting to the roots of what causes the bias and we are not inducing change to counter the causes of the bias. However these are positive steps, steps in the right direction, and we will now have the gender and other bias issues permanently on our radars.

Report from the SIGMM Emerging Leaders Symposium 2018

The idea of a symposium to bring together the bright new talent within the SIGMM community and to hear their views on some topics within the area and on the future of Multimedia was first mooted in 2014 by Shih-Fu Chang, then SIGMM Chair. That led to the “Rising Stars Symposium” at the MULTIMEDIA Conference in 2015, where 12 invited speakers made presentations on their work as a satellite event to the main conference. After each presentation a respondent, typically an experienced member of the SIGMM community, gave a response or personal interpretation of the presentation. The format worked well and was very thought-provoking, though some people felt that a shorter event, more integrated into the conference, might work better.

For the next year, 2016, the event was run a second time with 6 invited speakers and was indeed more integrated into the main conference. The event skipped a year in 2017, but was brought back for the MULTIMEDIA Conference in 2018 and this time, rather than invite speakers we decided to have an open call with nominations, to make selection for the symposium a competitive process. We also decided to rename the event from Rising Stars Symposium, and call it the “SIGMM Emerging Leaders Symposium”, to avoid confusion with the “SIGMM Rising Star Award”, which is completely different and is awarded annually.

In July 2018 we issued a call for applications to the “Third SIGMM Emerging Leaders Symposium, 2018” which was to be held at the annual MULTIMEDIA Conference in Seoul, Korea, in October 2018. Applications were received and were evaluated by a panel consisting of the following people, and we thank them for volunteering and for their support in doing this.

Werner Bailer, Joanneum Research
Guillaume Gravier, IRISA
Frank Hopfgartner, Sheffield University
Hayley Hung, Delft University (a previous awardee)
Marta Mrak, BBC

Based on the assessment panel recommendations, 4 speakers were included in the Symposium, namely:

Hanwang Zhang, Nanyang Technological University, Singapore
Michael Riegler, Simula, Norway
Jia Jia, Tsinghua University, China
Liqiang Nie, Shandong University, China

The Symposium took place on the last day of the main conference and was chaired by Gerald Friedland, SIGMM Conference Director.


Towards X Visual Reasoning

By Hanwang Zhang (Nanyang Technological University, Singapore)

For decades, we have been interested in detecting objects and classifying them into a fixed vocabulary. With the maturity of these “low-level” vision solutions, we hunger for a “higher-level” representation of the visual data, so as to extract visual knowledge rather than merely bags of visual entities, allowing machines to reason about human-level decision-making. In particular, we wish for an “X” reasoning, where X means eXplainable and eXplicit. In this talk, I first reviewed a brief history of symbolism and connectionism, which have alternately driven the development of AI over the past decades. In particular, though deep neural networks, the prevailing incarnation of connectionism, have shown impressive super-human performance in various tasks, they still lag behind us in high-level reasoning. Therefore, I propose a marriage between symbolism and connectionism to exploit their complementary advantages, that is, the proposed X visual reasoning. Second, I introduced the two building blocks of X visual reasoning: visual knowledge acquisition by scene graph detection, and X neural modules applied on the knowledge for reasoning. For scene graph detection, I introduced our recent progress on reinforcement learning of the scene dynamics, which can help to generate coherent scene graphs that respect visual context. For X neural modules, I discussed our most recent work on module design, algorithms, and applications in various visual reasoning tasks such as visual Q&A, natural language grounding, and image captioning. Finally, I envisioned some future directions towards X visual reasoning, such as using meta-learning and deep reinforcement learning for more dynamic and efficient X neural module compositions.

Professor Ramesh Jain mentioned that a truly X reasoning should consider the potential human-computer interaction that may change or divert a current reasoning path. This is crucial because human intelligence can reasonably respond to interruptions and incoming evidence.

We can position X visual reasoning in the recent trend of neural-symbolic unification, which is gradually becoming our consensus path towards a general AI. The “neural” is good at representation learning and model training, and the “symbolic” is good at knowledge reasoning and model explanation. One should bear in mind that future multimedia systems should exploit the complementary advantages of the “neural-symbolic”.

BioMedia – The Important Role of Multimedia Research for Healthcare

By Michael Riegler (SimulaMet & University of Oslo, Norway)

With the recent rise of machine learning, analysis of medical data has become a hot topic. Nevertheless, the analysis is still often restricted to a special type of images coming from radiology or CT scans. However, there are continuously vast amounts of multimedia data collected both within the healthcare systems and by the users using devices such as cameras, sensors and mobile phones.

In this talk I focused on the potential of multimedia data and applications to improve healthcare systems. First, I looked at the various types of data. Information about a person’s health is contained in many data sources such as images, videos, text and sensors. Medical data can also be divided into data with hard and soft ground truth. Hard ground truth means that there are procedures that verify certain labels of the given data (for example a biopsy report for a cancerous tissue sample). Soft ground truth is data that was labeled by medical experts without a verification of the outcome. Different data types also come with different levels of security. For example, activity data from sensors have a low chance of helping to identify the patient, whereas speech, social media and GPS data come with a higher chance of identification. Finally, it is important to take context into account, and results should be explainable and reproducible. This was followed by a discussion about the importance of multimodal data fusion and context aware analysis supported by three example use cases: mental health, artificial reproduction and colonoscopy.

I also discussed the importance of involving medical experts and patients as users. Medical experts and patients are two different user groups, with different needs and requirements. One common requirement for both groups is the need for an explanation of how decisions were taken. In addition, medical experts are mainly interested in support during their daily tasks, but are not very interested in, for example, huge amounts of sensor data from patients because of the increased amount of work. They prefer interacting with the patients rather than with the data. Patients on the other hand usually prefer to collect a lot of data and be informed about their current status, but are more concerned about their privacy. They also usually want medical experts to take as much data into account as possible when making their assessments.

Professor Susanne Boll mentioned that it is important to find out what is needed to make automatic analysis accepted by hospitals and who is taking the responsibility for decisions made by automatic systems. Understandability and reproducibility of methods were mentioned as an important first step.

The most relevant messages of the talk are that the multimedia community has the diverse skills needed to address several challenges related to medicine. Furthermore, it is important to focus on explainable and reproducible methods.

Mental Health Computing via Harvesting Social Media Data

By Jia Jia, Tsinghua University, China

Nowadays, with the rapid pace of life, mental health is receiving widespread attention. Common symptoms like stress, or clinical disorders like depression, are quite harmful, and thus it is of vital significance to detect mental health problems before they lead to severe consequences. Professional mental criteria like the International Classification of Diseases (ICD-10 [1]) and the Diagnostic and Statistical Manual of Mental Disorders (DSM [2]) have defined distinguishing behaviors in daily lives that help diagnose disorders. However, traditional interventions based on face-to-face interviews or self-report questionnaires are expensive and delayed. The potential antipathy towards consulting psychiatrists exacerbates these problems.

Social media platforms, like Twitter and Weibo, have become increasingly prevalent for users to express themselves and interact with friends. The user-generated content (UGC) shared on such platforms may help to better understand the real-life state and emotion of users in a timely manner, making the analysis of the users’ mental wellness feasible. Building on these observations, research efforts have also been devoted to the early detection of mental problems.

In this talk, I focused on the timely detection of mental wellness, concentrating on two typical mental problems: stress and depression. Starting with binary user-level detection, I expanded the research by considering the trigger and the severity of the mental problems, involving different social media platforms that are popular in different cultures. I presented my recent progress from three perspectives:

  1. Through self-reported sentence pattern matching, I constructed a series of large-scale well-labeled datasets in the field of online mental health analysis;
  2. Based on previous psychological research, I extracted multiple groups of discriminating features for detection and presented several multi-modal models targeting different contexts. I conducted extensive experiments with my models, demonstrating significantly better performance as compared to the state-of-the-art methods; and
  3. I investigated in detail the contribution per feature, of online behaviors and even cultural differences in different contexts. I managed to reveal behaviors not covered in traditional psychological criteria, and provided new perspectives and insights for current and future research.

At the end, I also demonstrated the mental health care applications I have developed.

Dr. B. Prabhakaran indicated that mental health understanding is a difficult problem, even for trained doctors, and that we will need to work with psychiatrists sooner rather than later. Thanks to his valuable comments regarding possible future directions, I envisage the use of augmented / mixed reality to create different immersive “controlled” scenarios where human behavior can be studied. For example, I am considering creating stressful situations (such as exams, missing a flight, etc.) to better understand depression. For depression in particular, I plan to incorporate EEG sensor data in my studies.



Towards Micro-Video Understanding

By Liqiang Nie, Shandong University, China

We are living in the era of ever-dwindling attention spans. To feed our hunger for quick content, bite-sized videos embracing the philosophy of “shorter-is-better” are becoming popular with the rise of micro-video sharing services. Typical services include Vine, Snapchat, Viddy, and Kwai. Micro-videos are spreading like wildfire and taking over the content and social media marketing space, by virtue of their brevity, authenticity, communicability, and low cost. Micro-videos can benefit lots of commercial applications, such as brand building. Despite their value, the analysis and modeling of micro-videos is non-trivial for the following reasons:

  1. micro-videos are short in length and of low quality;
  2. they can be described by multiple heterogeneous channels, spanning from social, visual, and acoustic to textual modalities;
  3. they are organized into a hierarchical ontology in terms of semantic venues; and
  4. there is no available benchmark dataset on micro-videos.

In my talk, I introduced some shallow and deep learning models for micro-video understanding that are worth studying and have proven effective:

  1. Popularity Prediction. Among the large volume of micro-videos, only a small portion will be widely viewed by users, while most will gain only a little attention. Obviously, if we can identify the hot and popular micro-videos in advance, it will benefit many applications, such as online marketing and network reservation;
  2. Venue Category Estimation. In a random sample of over 2 million Vine videos, I found that only 1.22% of the videos are associated with venue information. Location information about the videos can benefit many applications, such as footprint recording, personalized applications, and other location-based services; it is thus highly desirable to infer the missing geographic cues;
  3. Low quality sound. As the quality of the acoustic signal is usually relatively low, simply integrating acoustic features with visual and textual features often leads to suboptimal results, or even adversely degrades the overall quality.

In the future, I may try some other meaningful tasks such as micro-video captioning or tagging and detection of unsuitable content. Many micro-videos are annotated with erroneous words, that is, the topic tags or descriptions are not well correlated with the content, and this negatively influences other applications, such as textual query search. It is also common for users to upload violent and erotic videos. At present, the detection and alert tasks mainly rely on labor-intensive inspection. I plan to create systems that automatically detect erotic and violent content.

During the presentation, the audience asked about the datasets used in my work. In my previous work, all the videos came from Vine, but this service has been closed. The audience wondered how I will build the dataset in the future. As there are many other micro-video sites, such as Kwai and Instagram, I can obtain sufficient data from them to support my further research.

Opinion Column: Survey on ACM Multimedia

For this edition of the Opinion Column, coinciding with ACM Multimedia 2018, we launched a short community survey on how members of the community perceive the conference. We prepared the survey together with senior members of the community, as well as the organizers of ACM Multimedia 2019. You can find the full survey here.


Overall, we collected 52 responses. The participant sample was slightly skewed towards more senior members of the community: around 70% described themselves as full, associate or assistant professors. Almost 20% were research scientists from industry. Half of the participants were long-term contributors to the conference, having attended more than 6 editions of ACM MM; however, only around a quarter of the participants had attended the last edition of MM in Seoul, Korea.

First, we asked participants to describe what ACM Multimedia means for them, using 3 words. We aggregated the responses in the word cloud below. Bigger words correspond to words with higher frequency. Most participants associated MM with prestigious and high quality content, and with high diversity of topics and modalities. While recognizing its prestige, some respondents showed their interest in a modernization of the MM focus.


Next, we asked respondents “What brings you to ACM Multimedia?”, and provided a set of pre-defined options including “presenting my research”, “networking”, “community building”,  “ACM MM is at the core of my scientific interests” and “other” (free text). 1 in 5 participants selected all options as relevant to their motivation for attending Multimedia. The large majority of participants (65%) said they attend ACM Multimedia to present research and to network. By inspecting the free-text answers in the “other” option, we found that some people were interested in specific tracks, and that others see MM as a good opportunity to showcase research to their graduate students.

The next question was about paper submission. We wanted to characterize what pushes researchers to submit to ACM Multimedia. We prepared 3 different statements capturing different dimensions of analysis, and asked participants to rate them on a 5-point scale, from “Strongly disagree” (1), to “Strongly agree” (5).

The distribution of agreement for each question is shown in the plot below. Participants tend to neither disagree nor agree about Multimedia as the only possible venue for their papers (average agreement score 2.9); they generally disagreed with the statement “I consider ACM Multimedia mostly to resubmit papers rejected from other venues” (average score 2.0), and strongly agreed on the idea of MM as a premier conference (average score 4.2).


One of the goals of this survey was to help the future Program Chairs of MM 2019 understand the extent to which participants agree with the reviewers’ guidelines that will be introduced in the next edition of the conference. To this end, we invited respondents to express their agreement with a fundamental point of these guidelines: “Remember that the problem [..] is expected to involve more than a single modality, or [..] how people interpret and use multimedia. Papers that address a single modality only and also fail to contribute new knowledge on human use of multimedia must be rejected as out of scope for the conference”.  Around 60% agreed or strongly agreed with this statement, while slightly more than 25% disagreed or strongly disagreed. The remaining 15% had no opinion about the statement.

We also asked participants to share with us any further comment regarding this last question or ACM MM in general. People generally approved the introduction of these reviewing guidelines, and the idea of multiple modalities and human perception and applications of multimedia. Some suggested that, given the re-focusing implied by this new reviewing guidelines, the instructions should be made more specific i.e. chairs should clarify the definition of “involve”: how multimodal should the paper be?

Others encouraged to clarify even further the broader scope of ACM Multimedia, defining its position with respect to other multimedia conferences (MMsys, MMM), but also with computer vision conferences such as CVPR/ECCV (and avoid conference dates overlapping).

Some comments proposed rating papers based on their impact on the community and on their level of innovation, even within a single modality, as forcing multiple modalities could “alienate” community members.

Beyond reviewing guidelines, a major theme emerging from the free-text comments was diversity in ACM Multimedia. Several participants called for more geographic diversity among participants and paper authors. Some also noted that more turnover in the organizing committees should be encouraged. Finally, most participants brought up the need for more balance in MM topics: while most accepted papers fall under the general umbrella of “Multimedia Content Understanding”, MM should in the future encourage more papers about systems, arts, and other emerging topics.

With this bottom-up survey analysis, we aimed to give voice to the major themes that the multimedia community cares about, and we hope to continue doing so in future editions of this column. We would like to thank all researchers and community members who contributed by shaping and filling in this survey, and who allowed us to get a broader picture of the community’s perception of ACM MM!

An interview with Géraldine Morin

Visit in Chicago during my Ph.D. in the US.

Please describe your journey into research from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

My journey into research was not such a linear path (or ‘straight path’, as some French institutions put it, a criterion for them to hire)… I started out convinced that I wanted to be a high school math teacher. Since I was accepted into a Math and CS engineering school after a competitive exam, I agreed to study there, working in parallel towards a pure math degree.
The first year, I did manage to follow both curricula (taking two math exams in September), but it was quite a challenge, and in the second year I gave up on the math degree to keep following the engineering curriculum.
I finished with a master’s degree in applied math (back then fully included in the engineering curriculum) and really enjoyed working on the master’s thesis (I did my internship in Kaiserslautern, Germany), so I decided to apply for a Ph.D. grant.
I made it into the Ph.D. program in Grenoble and liked my Ph.D. topic in geometric modelling but had a hard time with my advisor there.
So I decided after two years to give up (and passed a motorcycle driving licence), and went on to teach math in high school for a year (I also passed the teacher examination). Encouraged by my former German master’s thesis advisor, I then applied for a Ph.D. program at Rice University in the US to work with Ron Goldman, a researcher whose work and papers I really liked. I got the position and really enjoyed doing research there.
After a wedding, a kid, and finishing the Ph.D. (in that order), I moved to Germany to live with my husband and found a postdoc position in Berlin for one year. I then applied to Toulouse, where I have stayed since. In Toulouse, I was hired in a Computer Vision research group, where a subgroup of people were tackling problems in multimedia and offered me the chance to be the 3D person of their team :)

I learned that a career, or research path, is really shaped by the people you meet on your way, for good or bad. Perseverance for something you enjoy is certainly necessary, and not staying in a context that does not fit you is also important! I am glad I did start again after giving up at first, but I do not regret my choice to give up either.

Research topic, and research areas, are important and a good match with your close collaborators is also very relevant to me. I really enjoy the multimedia community for that matter. The people are open minded and curious, and very encouraging… At multimedia conferences I always feel that my research is valued and relevant to the field (in the other communities, CG or CV, I sometimes get a remark like, ‘oh well, I guess you are not really doing C{G|V}’ …). Multimedia also has a good balance between theory and practice, and that’s fun !


Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?

I just took on the responsibility of a department, while we are changing the curricula. This is a lot of organisational and administrative work, but it also forces me to have a larger vision of how the field of computer science is evolving and what is important to teach. Interestingly, we prepare our students for jobs that do not exist yet! This new challenge also makes me realise how important it is to keep time for research, and how much I value the open-mindedness I get from my research activity.

Can you profile your current research, its challenges, opportunities, and implications?

As I mentioned before, my current challenge is to be able to stay active in research. I follow two paths: the first is in geometric modeling, trying to bridge the gap between my current interest in skeleton-based models and two hot topics, 3D printing and machine learning.
The second is to continue working in multimedia, on distributing 3D content in a scalable way.
Concerning my involvement in the community, I am also currently co-heading the French geometric modeling group, and I very much enjoy promoting our research community and contributing to keeping it active and recognised.

How would you describe the role of women especially in the field of multimedia?

I participated in my first Women in MM meeting at ACM, and very much appreciated it. I have to admit I was not really interested in women-targeted activities before I participated in my first women’s workshop (WiSH – Women in SHape) in 2013, which brought groups of women together to collaborate for one week… That was a great experience, which made me realise that, despite the fact that I really enjoy working with my (almost all male) colleagues, it was also fun and very inspiring to work with groups of women. Moreover, having been asked by younger colleagues whether a woman can have both a family and a faculty job, I now think that my good experience as a faculty member and mother of 3 should be shared when needed.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

My first contributions were in a quite theoretical field: during my Ph.D. I proposed to use analytic functions in a geometric modeling context. That raised some convergence issues, for which I managed to prove the needed results.
Later, I really enjoyed working with collaborators and proposing a shared topic with my colleague Romulus, who worked on streaming: in 2006 we started to work on 3D streaming. That led us to collaborate with Wei Tsang Ooi from the National University of Singapore, and for more than 12 years now we have been advancing innovative solutions for the distribution of 3D content, working on adapted 3D models for me and system solutions for them… involving new colleagues along the way. Along the way, we won the best paper award for my Ph.D. student’s paper at ACM MM in 2008 (I am very proud of that, despite the fact that I could not attend the conference: I gave birth between submission and conference ;).

Over your distinguished career, what are your top lessons you want to share with the audience?

A very simple one: Enjoy what you do! and work will be fun.
For me, I am amazed that thinking over new ideas always remains so exciting :)

What is the best joke you know? :)

hard one !

Jogging in the morning to N Seoul Tower for sunrise, ACM-MM 2018.


If you were conducting this interview, what questions would you ask, and then what would be your answers?

I have heard there are very detailed studies, especially in the US, about differences between male and female behaviour.
It seems that being aware of these helps. For example, women tend to judge themselves harder than men do…
(that’s not really a question and answer, more a remark :p )

Another try:
Q: What makes you feel confident and helps you get over challenges?
A: I think I lack self-confidence, and I always ask for a lot of feedback from colleagues (for example, in dry runs).
If I get good feedback, it boosts my confidence; if I get worse feedback, it helps me improve… I win both ways :)



Assoc. Prof. Géraldine Morin: 

I am an Associate Professor (Maître de conférences) at ENSEEIHT, one of the schools of the Institut National Polytechnique de Toulouse of the Université de Toulouse, and I conduct my research at IRIT (UMR CNRS 5505). Before settling in Toulouse, I lived in Grenoble, where I graduated from ENSIMAG (engineering degree) and from Université Joseph Fourier (D.E.A. in applied mathematics), and also completed a bachelor’s degree in pure mathematics in parallel with my first year of engineering school. I then completed a Ph.D. in geometric modeling in the United States at Rice University (“Analytic Functions for Computer Aided Geometric Design”) under the supervision of Ron Goldman. After that, I did a one-year postdoc in computational geometry at the Freie Universität Berlin.

Predicting the Emotional Impact of Movies

Affective video content analysis aims at the automatic recognition of emotions elicited by videos. It has a large number of applications, including mood based personalized content recommendation [1], video indexing [2], and efficient movie visualization and browsing [3]. Beyond the analysis of existing video material, affective computing techniques can also be used to generate new content, e.g., movie summarization [4], personalized soundtrack recommendation to make user-generated videos more attractive [5]. Affective techniques can furthermore be used to enhance the user engagement with advertising content by optimizing the way ads are inserted inside videos [6].

While major progress has been achieved in computer vision for visual object detection, high-level concept recognition, and scene understanding, a natural further step is the modeling and recognition of affective concepts. This has recently received increasing interest from research communities, e.g., computer vision and machine learning, with an overall goal of endowing computers with human-like perception capabilities.

Efficient training and benchmarking of computational models, however, require a large and diverse collection of data annotated with ground truth, which is often difficult to collect, particularly in the field of affective computing. To address this issue we created the LIRIS-ACCEDE dataset. In contrast to most existing datasets, which contain few video resources and have limited accessibility due to copyright constraints, LIRIS-ACCEDE consists of videos with a large content diversity annotated along emotional dimensions. The annotations are made according to the expected emotion of a video, which is the emotion that the majority of the audience feels in response to the same content. All videos are shared under Creative Commons licenses and can thus be freely distributed without copyright issues. The dataset (the videos, annotations, features and protocols) is publicly available, and it is currently composed of a total of six collections.

Credits and license information: (a) Cloudland, LateNite Films, shared under CC BY 3.0 Unported license at, (b) Origami, ESMA MOVIES, shared under CC BY 3.0 Unported license at, (c) Payload, Stu Willis, shared under CC BY 3.0 Unported license at, (d) The room of Franz Kafka, Fred. L’Epee, shared under CC BY-NC-SA 3.0 Unported license at, (e) Spaceman, Jono Schaferkotter & Before North, shared under CC BY-NC 3.0 Unported license at

Dataset & Collections

The LIRIS-ACCEDE dataset is composed of movies and excerpts from movies under Creative Commons licenses that enable the dataset to be publicly shared. The set contains 160 professionally made and amateur movies covering different genres such as horror, comedy, drama and action. Languages are mainly English, with a small set of Italian, Spanish, French and other movies subtitled in English. The set has been used to create the six collections that are part of the dataset. The two collections that were originally proposed are the Discrete LIRIS-ACCEDE collection, which contains short excerpts of movies, and the Continuous LIRIS-ACCEDE collection, which comprises long movies. Moreover, since 2015, the set has been used for tasks related to affect/emotion at the MediaEval Benchmarking Initiative for Multimedia Evaluation [7], where each year it was enriched with new data, features and annotations. Thus, the dataset also includes the four additional collections dedicated to these tasks.

The movies are available together with emotional annotations. When dealing with emotional video content analysis, the goal is to automatically recognize emotions elicited by videos. In this context, three types of emotions can be considered: intended, induced and expected emotions[8]. The intended emotion is the emotion that the film maker wants to induce in the viewers. The induced emotion is the emotion that a viewer feels in response to the movie. The expected emotion is the emotion that the majority of the audience feels in response to the same content. While the induced emotion is subjective and context dependent, the expected emotion can be considered objective, as it reflects the more-or-less unanimous response of a general audience to a given stimulus[8]. Thus, the LIRIS-ACCEDE dataset focuses on the expected emotion. The representation of emotions we are considering is the dimensional one, based on valence and arousal. Valence is defined on a continuous scale from most negative to most positive emotions, while arousal is defined continuously from calmest to most active emotions [9]. Moreover, violence annotations were provided in the MediaEval 2015 Affective Impact of Movies collection, while fear annotations were provided in the MediaEval 2016 and 2017 Emotional Impact of Movies collections.

Discrete LIRIS-ACCEDE collection: A total of 160 films from various genres split into 9,800 short clips with valence and arousal annotations. More details below.
Continuous LIRIS-ACCEDE collection: A total of 30 films with valence and arousal annotations per second. More details below.
MediaEval 2015 Affective Impact of Movies collection: A subset of the films with labels for the presence of violence, as well as for the felt valence and arousal. More details below.
MediaEval 2016 Emotional Impact of Movies collection: A subset of the films with score annotations for the expected valence and arousal. More details below.
MediaEval 2017 Emotional Impact of Movies collection: A subset of the films with valence and arousal values and a label for the presence of fear for each 10-second segment, as well as precomputed features. More details below.
MediaEval 2018 Emotional Impact of Movies collection: A subset of the films with valence and arousal values for each second, begin-end times of scenes containing fear, as well as precomputed features. More details below.

Ground Truth

The ground truth for the Discrete LIRIS-ACCEDE collection consists of the ranking of all video clips along both valence and arousal dimensions. These rankings were obtained through a pairwise video clip comparison protocol designed to be run through crowdsourcing (with the CrowdFlower service). For each pair of video clips presented to raters, they had to select the one that most strongly conveyed the given emotion in terms of valence or arousal. The high inter-annotator agreement that was achieved shows that the annotations were consistent, despite the large diversity of our raters’ cultural backgrounds. Affective ratings (scores) were also collected for a subset of the 9,800 video clips in order to cross-validate the crowdsourced annotations. The affective ratings also made it possible to learn Gaussian Processes for Regression, in order to model the noise in the measurements and map the whole ranked LIRIS-ACCEDE dataset into the 2D valence-arousal affective space. More details can be found in [10].
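To illustrate how pairwise judgements can be turned into a ranking, here is a minimal sketch that simply counts how often each clip was selected. This win-count scheme is an assumption chosen for brevity and is not the protocol described in [10]; the function name `rank_by_wins` and the clip identifiers are hypothetical.

```python
from collections import Counter

def rank_by_wins(comparisons):
    """Rank clips by the number of pairwise comparisons they won.

    comparisons: iterable of (winner, loser) clip ids, one per rater judgement.
    Returns clip ids sorted from most to fewest wins (ties broken by id).
    Illustrative only: not the ranking protocol of the dataset authors.
    """
    wins = Counter()
    for winner, loser in comparisons:
        wins[winner] += 1
        wins[loser] += 0  # register clips that never win so they are still ranked
    return sorted(wins, key=lambda clip: (-wins[clip], clip))

# Hypothetical judgements on three clips for one dimension (e.g. arousal)
ranking = rank_by_wins([("clip_a", "clip_b"), ("clip_a", "clip_c"), ("clip_b", "clip_c")])
```

A real crowdsourcing setup would also need to decide which pairs to show and how to handle contradictory judgements, which this sketch ignores.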

To collect the ground truth for the Continuous and MediaEval 2016, 2017 and 2018 collections, which consisted of valence and arousal scores for every second of each movie, French annotators had to continuously indicate their level of valence and arousal while watching the movies, using a modified version of the GTrace annotation tool [16] and a joystick. Each annotator continuously annotated one subset of the movies considering the induced valence, and another subset considering the induced arousal. Thus, each movie was continuously annotated by three to five different annotators. Then, the continuous valence and arousal annotations from the annotators were down-sampled by averaging the annotations over windows of 10 seconds with a shift of 1 second (i.e., yielding 1 value per second) in order to remove any noise due to unintended movements of the joystick. Finally, the post-processed continuous annotations were averaged in order to create a continuous mean signal of the valence and arousal self-assessments, ranging from -1 (most negative for valence, most passive for arousal) to +1 (most positive for valence, most active for arousal). The details of this process are given in [11].
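The post-processing described above (windowed averaging per annotator, then averaging across annotators) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the traces are assumed to be already sampled once per second, incomplete windows at the end of a trace are dropped, and the function names `smooth` and `mean_signal` are hypothetical.

```python
from statistics import mean

def smooth(trace, window=10, shift=1):
    """Average a per-second annotation trace over sliding windows.

    With window=10 and shift=1 this yields one smoothed value per second.
    Assumption: incomplete windows at the end of the trace are dropped.
    """
    return [mean(trace[start:start + window])
            for start in range(0, len(trace) - window + 1, shift)]

def mean_signal(traces):
    """Average several equally long traces sample by sample."""
    return [mean(values) for values in zip(*traces)]

# Two hypothetical annotators rating valence in [-1, 1] for a 20-second movie
annotator_1 = [0.0] * 10 + [1.0] * 10
annotator_2 = [0.0] * 20
ground_truth = mean_signal([smooth(annotator_1), smooth(annotator_2)])
```

Smoothing each trace before averaging mirrors the order given in the text; because both steps are linear, averaging first and smoothing afterwards would give the same signal.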

The ground truth for violence annotation, used in the MediaEval 2015 Affective Impact of Movies collection, was collected as follows. First, all the videos were annotated separately by two groups of annotators from two different countries. For each group, regular annotators labeled all the videos, which were then reviewed by master annotators. Regular annotators were graduate students (typically single with no children) and master annotators were senior researchers.  Within each group, each video received 2 different annotations, which were then merged by the master annotators into the final annotation for the group. Finally, the achieved annotations from the two groups were merged and reviewed once more by the task organizers. The details can be found in [12].

The ground truth for fear annotations, used in the MediaEval 2017 and 2018 Emotional Impact of Movies collections, was generated using a tool specifically designed for the classification of audio-visual media, which allows annotation to be performed while watching the movie. The annotations were made by two experienced team members of NICAM [17], both trained in the classification of media. Each movie was annotated by one annotator, who reported the start and stop times of each sequence in the movie expected to induce fear.


Through its six collections, the LIRIS-ACCEDE dataset constitutes a dataset of choice for affective video content analysis. It is one of the largest datasets for this purpose, and is regularly enriched with new data, features and annotations. In particular, it is used for the Emotional Impact of Movies tasks at the MediaEval Benchmarking Initiative for Multimedia Evaluation. As all the movies are under Creative Commons licenses, the whole dataset can be freely shared and used by the research community, and is available at

Discrete LIRIS-ACCEDE collection [10]
In total 160 films and short films with different genres were used and were segmented into 9,800 video clips. The total time of all 160 films is 73 hours 41 minutes and 7 seconds, and a video clip was extracted on average every 27s. The 9,800 segmented video clips last between 8 and 12 seconds and are representative enough to conduct experiments. Indeed, the length of extracted segments is large enough to get consistent excerpts allowing the viewer to feel emotions, while being small enough to make the viewer feel only one emotion per excerpt.
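The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers stated in the text; the coverage estimate additionally assumes that clips do not overlap, which the text does not state explicitly.

```python
# Total footage: 73 hours 41 minutes 7 seconds
total_seconds = 73 * 3600 + 41 * 60 + 7   # 265,267 seconds
clips = 9800

# Average spacing between extracted clips (about 27 s, as stated in the text)
seconds_per_clip = total_seconds / clips

# Clips last between 8 and 12 seconds; assuming no overlap, they cover
# roughly 30% to 44% of the original footage.
min_coverage = 8 * clips / total_seconds
max_coverage = 12 * clips / total_seconds
```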

The content of the movie was also considered to create homogeneous, consistent and meaningful excerpts that were not meant to disturb the viewers. A robust shot and fade in/out detection was implemented to make sure that each extracted video clip started and ended with a shot or a fade. Furthermore, the order of excerpts within a film was kept, allowing the study of temporal transitions of emotions.

Several movie genres are represented in this collection of movies, such as horror, comedy, drama, action, and so on. Languages are mainly English with a small set of Italian, Spanish, French and others subtitled in English. For this collection the 9,800 video clips are ranked according to valence, from the clip inducing the most negative emotion to the most positive, and to arousal, from the clip inducing the calmest emotion to the most active emotion. Besides the ranks, the emotional scores (valence and arousal) are also provided for each clip.

Continuous LIRIS-ACCEDE collection [11]
The movie clips of the Discrete collection were annotated globally: a single value of arousal and valence was used to represent a whole 8 to 12-second video clip. In order to allow deeper investigations into the temporal dependencies of emotions (since a felt emotion may influence the emotions felt in the future), longer movies were considered in this collection. To this end, a selection of 30 movies from the set of 160 was made such that their genre, content, language and duration were diverse enough to be representative of the original Discrete LIRIS-ACCEDE dataset. The selected videos are between 117 and 4,566 seconds long (mean = 884.2s ± 766.7s SD). The total length of the 30 selected movies is 7 hours, 22 minutes and 5 seconds. The emotional annotations consist of a score of expected valence and arousal for each second of each movie.

MediaEval 2015 Affective Impact of Movies collection [12]
This collection has been used as the development and test sets for the MediaEval 2015 Affective Impact of Movies Task. The overall use case scenario of the task was to design a video search system that used automatic tools to help users find videos that fitted their particular mood, age or preferences. To address this, two subtasks were proposed:

  • Induced affect detection: the emotional impact of a video or movie can be a strong indicator for search or recommendation;
  • Violence detection: detecting violent content is an important aspect of filtering video content based on age.

The 9,800 video clips from the Discrete LIRIS-ACCEDE collection were used as the development set, and an additional 1,100 movie clips were proposed for the test set. For each of the 10,900 video clips, the annotations consist of: a binary value to indicate the presence of violence, the class of the excerpt for felt arousal (calm-neutral-active), and the class for felt valence (negative-neutral-positive).

MediaEval 2016 Emotional Impact of Movies collection [13]
The MediaEval 2016 Emotional Impact of Movies task required participants to deploy multimedia features to automatically predict the emotional impact of movies, in terms of valence and arousal. Two subtasks were proposed:

  • Global emotion prediction: given a short video clip (around 10 seconds), participants’ systems were expected to predict a score of induced valence (negative-positive) and induced arousal (calm-excited) for the whole clip;
  • Continuous emotion prediction: as an emotion felt during a scene may be influenced by the emotions felt during the previous scene(s), the purpose here was to consider longer videos, and to predict the valence and arousal continuously along the video. Thus, a score of induced valence and arousal was to be provided for each 1s-segment of each video.

The development set was composed of the Discrete LIRIS-ACCEDE collection for the first subtask, and the Continuous LIRIS-ACCEDE collection for the second subtask. In addition to the development set, a test set was also provided to assess the performance of participants’ methods. A total of 49 new movies under Creative Commons licenses were added. With the same protocol as the one used for the development set, 1,200 additional short video clips were extracted for the first subtask (between 8 and 12 seconds), while 10 long movies (from 25 minutes to 1 hour and 35 minutes) were selected for the second subtask (for a total duration of 11.48 hours). Thus, the annotations consist of a score of expected valence and arousal for each movie clip used for the first subtask, and a score of expected valence and arousal for each second of the movies for the second subtask.

MediaEval 2017 Emotional Impact of Movies collection [14]
This collection was used for the MediaEval 2017 Emotional Impact of Movies task. Here, only long movies were considered, and the emotion was considered in terms of valence, arousal and fear. The following two subtasks were proposed for which the emotional impact had to be predicted for consecutive 10-second segments, which slid over the whole movie with a shift of 5 seconds:

  • Valence/Arousal prediction: participants’ systems were supposed to predict a score of expected valence and arousal for each consecutive 10-second segment;
  • Fear prediction: the purpose here was to predict whether each consecutive 10-second segment was likely to induce fear or not. The targeted use case was the prediction of frightening scenes to help systems protect children from potentially harmful video content. This subtask is complementary to the valence/arousal prediction task in the sense that discrete emotions often overlap when mapped into the 2D valence/arousal space (for instance, fear, disgust and anger overlap since they are all characterized by very negative valence and high arousal).
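The segmentation used by both subtasks (10-second segments sliding over the movie with a shift of 5 seconds) can be sketched as below. The function name `sliding_segments` is hypothetical, and dropping a trailing remainder shorter than a full segment is an assumption, since the text does not say how the end of a movie is handled.

```python
def sliding_segments(duration, length=10, shift=5):
    """Return (start, end) pairs, in seconds, of consecutive sliding segments.

    Assumption: a trailing remainder shorter than `length` is dropped.
    """
    return [(start, start + length)
            for start in range(0, duration - length + 1, shift)]

# Example: a 210-second movie (the shortest duration quoted for the 2017 test set)
segments = sliding_segments(210)
# segments starts with (0, 10), (5, 15), (10, 20), ...
```

With a 5-second shift, every second of the movie (apart from the first and last five) falls into two segments, so consecutive predictions are not independent.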

The Continuous LIRIS-ACCEDE collection was used as the development set for both subtasks. The test set consisted of a selection of 14 new movies under Creative Commons licenses that were not among the 160 original movies. They are between 210 and 6,260 seconds long. The total length of the 14 selected movies is 7 hours, 57 minutes and 13 seconds. In addition to the video data, general purpose audio and visual content features were also provided, including Deep features, Fuzzy Color and Texture Histogram, and Gabor features. The annotations consist of a valence value, an arousal value and a binary value for each 10-second segment to indicate if the segment was supposed to induce fear or not.

MediaEval 2018 Emotional Impact of Movies collection [15]
The MediaEval 2018 Emotional Impact of Movies task is similar to the one of 2017. However, in this case, more data was provided and a prediction of the emotional impact needed to be made for every second in movies rather than for 10-second segments as before. The two subtasks were:

  • Valence and Arousal prediction: participants’ systems had to predict a score of expected valence and arousal continuously (every second) for each movie;
  • Fear detection: the purpose here was to predict beginning and ending times of sequences inducing fear in movies. The targeted use case was the detection of frightening scenes to help systems protect children from potentially harmful video content.
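A system that produces one fear decision per second would need to convert those decisions into the beginning and ending times required by the fear detection subtask. A minimal sketch of that conversion follows; the per-second 0/1 input format and the function name `fear_intervals` are assumptions made for illustration and are not part of the task definition.

```python
def fear_intervals(flags):
    """Convert per-second 0/1 fear flags into (begin, end) pairs in seconds.

    `end` is exclusive: (1, 3) means seconds 1 and 2 were flagged.
    """
    intervals = []
    start = None
    for second, flag in enumerate(flags):
        if flag and start is None:
            start = second                      # a fear sequence begins
        elif not flag and start is not None:
            intervals.append((start, second))   # the sequence just ended
            start = None
    if start is not None:                       # sequence runs to the end of the movie
        intervals.append((start, len(flags)))
    return intervals

# Hypothetical per-second predictions for a 6-second excerpt
intervals = fear_intervals([0, 1, 1, 0, 0, 1])
```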

The development set for both subtasks consisted of the movies from the Continuous LIRIS-ACCEDE collection, as well as from the test set of the MediaEval 2017 Emotional Impact of Movies collection, i.e. 44 movies for a total duration of 15 hours and 20 minutes. The test set consisted of 12 other movies selected from the set of 160 movies, for a total duration of 8 hours and 56 minutes. As in the 2017 collection, general purpose audio and visual content features were also provided in addition to the video data. The annotations consist of valence and arousal values for each second of the movies (for the first subtask) as well as the beginning and ending times of each sequence in movies inducing fear (for the second subtask).


This work was supported in part by the French research agency ANR through the VideoSense Project under the Grant 2009 CORD 026 02 and through the Visen project within the ERA-NET CHIST-ERA framework under the grant ANR-12-CHRI-0002-04.


Should you have any inquiries or questions about the dataset, do not hesitate to contact us by email at: emmanuel dot dellandrea at ec-lyon dot fr.


[1] L. Canini, S. Benini, and R. Leonardi, “Affective recommendation of movies based on selected connotative features”, in IEEE Transactions on Circuits and Systems for Video Technology, 23(4), 636–647, 2013.
[2] S. Zhang, Q. Huang, S. Jiang, W. Gao, and Q. Tian, “Affective visualization and retrieval for music video”, in IEEE Transactions on Multimedia 12(6), 510–522, 2010.
[3] S. Zhao, H. Yao, X. Sun, X. Jiang, and P. Xu, “Flexible presentation of videos based on affective content analysis”, in Advances in Multimedia Modeling, 2013.
[4] H. Katti, K. Yadati, M. Kankanhalli, and C. Tat-Seng, “Affective video summarization and story board generation using pupillary dilation and eye gaze”, in IEEE International Symposium on Multimedia (ISM), 2011.
[5] R.R. Shah, Y. Yu, and R. Zimmermann, “Advisor: Personalized video soundtrack recommendation by late fusion with heuristic rankings”, in ACM International Conference on Multimedia, 2014.
[6] K. Yadati, H. Katti, and M. Kankanhalli, “Cavva: Computational affective video-in-video advertising”, in IEEE Transactions on Multimedia 16(1), 15–23, 2014.
[8] A. Hanjalic, “Extracting moods from pictures and sounds: Towards truly personalized TV”, in IEEE Signal Processing Magazine, 2006.
[9] J.A. Russell, “Core affect and the psychological construction of emotion”, in Psychological Review, 2003.
[10] Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, “LIRIS-ACCEDE: A Video Database for Affective Content Analysis,” in IEEE Transactions on Affective Computing, 2015.
[11] Y. Baveye, E. Dellandrea, C. Chamaret, and L. Chen, “Deep Learning vs. Kernel Methods: Performance for Emotion Prediction in Videos,” in 2015 Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), 2015.
[12] M. Sjöberg, Y. Baveye, H. Wang, V. L. Quang, B. Ionescu, E. Dellandréa, M. Schedl, C.-H. Demarty, and L. Chen, “The mediaeval 2015 affective impact of movies task,” in MediaEval 2015 Workshop, 2015.
[13] E. Dellandrea, L. Chen, Y. Baveye, M. Sjoberg and C. Chamaret, “The MediaEval 2016 Emotional Impact of Movies Task”, in Working Notes Proceedings of the MediaEval 2016 Workshop, Hilversum, The Netherlands, October 20-21, 2016.
[14] E. Dellandrea, M. Huigsloot, L. Chen, Y. Baveye and M. Sjoberg, “The MediaEval 2017 Emotional Impact of Movies Task”, in Working Notes Proceedings of the MediaEval 2017 Workshop, Dublin, Ireland, September 13-15, 2017.
[15] E. Dellandréa, M. Huigsloot, L. Chen, Y. Baveye, Z. Xiao and M. Sjöberg, “The MediaEval 2018 Emotional Impact of Movies Task”, Working Notes Proceedings of the MediaEval 2018 Workshop, Sophia Antipolis, France, October 29-31, 2018.
[16] R. Cowie, M. Sawey, C. Doherty, J. Jaimovich, C. Fyans, and P. Stapleton, “Gtrace: General trace program compatible with emotionML”, in Humaine Association Conference on Affective Computing and Intelligent Interaction (ACII), 2013.

SIGMM Records: News, Statistics, and Call for Contributions & Suggestions


A new editorial team has committed to lead the ACM SIGMM Records since the issue of January 2017. The goal is to consolidate the Records as a primary source of information and a communication vehicle for the multimedia community. With these objectives in mind, the Records were re-organized around three main categories (Open Science, Information, and Opinion), for which specific sections and columns were created (more details in

statistics october 2018

Since then, all sections and columns have provided relevant and high-quality contributions, with a higher impact than anticipated. Since the new epoch of the Records, apart from new columns, two additional initiatives have been incorporated:

  • Best social media reporter: It was decided to give an award to the SIGMM members with the most active and valuable posts on Social Media during the SIGMM conferences. The selected Best Social Media Reporters are asked to provide a post-conference report to be published in the Records, and receive a free registration to one of the upcoming SIGMM conferences. So far, the awardees have been: Miriam Redi (ICMR 2017), Christian Timmerer (MMSYS 2017), Benoit Huet and Conor Keighrey (MM 2017), Cathal Gurrin (ICMR 2018) and Gwendal Simon (MMSYS 2018). The criteria for the awards are specified here:
  • Section on QoE: Starting with the third issue of 2018 (September 2018), the Records include a new section on QoE, edited by Tobias Hoßfeld and Christian Timmerer. The introduction column can be found here:

Apart from the recurring sections, the community has also contributed relevant feature articles. Some examples include the article about the flow of ideas around SIGMM conferences by Lexing Xie, the article about ACM Fellows in SIGMM by Alan Smeaton, the SIGMM Annual Report (2018) by the Chairs, and an article about data-driven statistics and trends in SIGMM conferences by David Ayman Shamma.

Finally, the editorial team is also working on infrastructural aspects together with ACM. First, an effective communication protocol with the ACM Digital Library has been established, enabling the publication of the issues and individual contributions in HTML format. SIGMM has indeed been a pioneer in adopting the HTML format for the publication of articles. Second, the process of migrating the Records website to an ACM server and domain has started, and should be completed before the end of the year.

Pablo Cesar, the editor-in-chief, presented the new team, structure and impact at ACM MM 2017 and will update the community during ACM MM 2018.


Reach of the SIGMM Records

Since August 2017, we have been collecting statistics about visitors and visits to the Records website, and using Social Media to disseminate contributions and news. Over these 13 months, the daily number of visitors has ranged approximately between 100 and 400, with the variation strongly influenced by the publication of Social Media posts promoting published content. In total, more than 80000 visitors and nearly 500000 visits (i.e. clicks) have been registered.

The top 3 countries with the highest number of visitors are the US (>19000), China (>10000) and Germany (nearly 7000), and the top 10 all surpass 2000 visitors. Likewise, the top 3 posts with the highest impact, in terms of number of visits, are listed in Table 1.

Table 1. Top 3 posts on the Records website with highest impact

Post | Publication Date | Number of Visits
Impact of the New @sigmm Records | September 2017 | 3051
Standards Column: JPEG and MPEG | May 2017 | 1374
Practical Guide to Using the YFCC100M and MMCOMMONS on a Budget | October 2017 | 786

Finally, the top 3 referring sites (i.e., external websites from which visitors have clicked a URL to access the Records website) are Facebook (around 2500 references), Google (around 2500 references) and Twitter (>700 references). This suggests that the social media strategy implemented by the editorial team is having a positive impact on the Records.

Regarding Social Media, two @sigmm channels are being used: a Facebook page and a Twitter account (@sigmm). The number of followers is still low on Facebook (47), but it has increased significantly on Twitter (247) compared to the previous report. However, the impact of the posts on these platforms, in terms of reach, likes and shares, is noteworthy. On Facebook, some posts have reached more than 1000 users, and on Twitter many tweets have received tens of retweets and likes.


Our mission is to keep improving and consolidating the Records, and we are very open to extra help and feedback. So, if you would like to become a member of our team, or simply have suggestions or ideas, please drop us a line!

We hope you are enjoying every new edition of the Records.

The Editorial team

Impact of the New @sigmm Records

The SIGMM Records have been renewed, with the ambition of continuing to be a useful resource for the multimedia community. The intention is to provide a forum for (open) discussion and to become a primary source of information (and of inspiration!).

The new team has committed to lead the Records in the coming years, gathering relevant contributions in the following main clusters:

The team has also revitalized the presence of SIGMM on Social Media. SIGMM accounts on Facebook and Twitter have been created for disseminating relevant news, events and contributions for the SIGMM community. Moreover, a new award has been approved: the Best Social Media Reporters from each SIGMM conference will get a free registration to one of the SIGMM conferences within a period of one year. The award criteria are specified at

The following paragraphs detail the impact of all these new activities in terms of an increased number of visitors and visits to the Records website (Figure 1), and a broader reach. All the statistics presented below have been collected since the publication of the June issue (July 29th 2017).

Figure 1. Number of visitors and visits since the publication of the June issue

Visitors and Visits to the Records website

The daily number of visitors ranges approximately between 100 and 400. This variation is strongly influenced by the publication of Social Media posts promoting content published on the website. In the first month (since July 29th, one day after the publication of the issue), more than 13000 visitors were registered, and more than 20000 visitors have been registered to date (see Table 1 for detailed statistics). The number of visits to the different posts and pages of the website totals more than 100000. The top 5 countries with the highest number of visitors are listed in Table 2. Likewise, the top 3 posts with the highest impact, in terms of number of visits and of Social Media shares (via the Social Media icons recently added to the posts and pages of the website), are listed in Table 3. As an example, the daily number of visits to the main page of the June issue is provided in Figure 2, with a total number of 224 visits since its publication.

Finally, the top 3 referring sites (i.e., external websites from which visitors have clicked a URL to access the Records website) are Facebook (>700 references), Google (>300 references) and Twitter (>100 references). So, it seems that Social Media is helping to increase the impact of the Records. More than 30 users have accessed the Records website through the SIGMM website as well.

Table 1. Number of visitors and visits to the SIGMM Records website

Period | Visitors
Day | ~100-400
Week | ~2000-3000
Month | ~8000-13000
Total (since July 29th) | 20012 (102855 visits)

Table 2. Top 5 countries in terms of number of visitors

Rank | Country | Visitors
1 | China | 3339
2 | United States | 2634
3 | India | 1368
4 | Germany | 972
5 | Brazil | 731

Table 3. Top 3 posts on the Records website with highest impact

Post | Date | Visits | Shares
Interview to Prof. Ramesh Jain | 29/08/2017 | 619 | 103
Interview to Suranga Nanayakkara | 13/09/2017 | 376 | 15
Standards Column: JPEG and MPEG | 28/7/2017 | 273 | 44

Figure 2. Visits to the main page of the June issue since its publication (199 visits)

Impact of the Social Media channels

The use of Social Media includes a Facebook page and a Twitter account (@sigmm). The number of followers is still low (27 followers on Facebook, 88 followers on Twitter), which is natural for recently created channels. However, the impact of the posts on these platforms, in terms of reach, likes and shares, is noteworthy. Tables 4 and 5 list the top 3 Facebook posts and tweets, respectively, with the highest impact so far.

Table 4. Top 3 Facebook posts with highest impact

Post | Date | Reach (users) | Likes | Shares
>10K visitors in 3 weeks | 21/08/2017 | 1347 | 7 | 4
Interview to Suranga Nanayakkara | 13/09/2017 | 1297 | 89 | 3
Interview to Prof. Ramesh Jain | 30/08/2017 | 645 | 28 | 4

Table 5. Top 3 tweets with highest impact

Post | Date | Likes | Retweets
Announcing the publication of the June issue | 28/07/2017 | 7 | 9
Announcing the availability of the official @sigmm account | 8/09/2017 | 8 | 9
Social Media Reporter Award: Report from ICMR 2017 | 11/09/2017 | 5 | 8

Awarded Social Media Reporters

The Social Media co-chairs, with the approval of the SIGMM Executive Committee, have already started the process of selecting the Best Social Media Reporters from the latest SIGMM conferences. In particular, the winners have been Miriam Redi for ICMR 2017 and Christian Timmerer for MMSYS 2017, each of whom has written a post-summary of the conference. Congratulations!

The Editorial Team would like to take this opportunity to thank all the SIGMM members who use Social Media channels to share relevant news and information from the SIGMM community. We are convinced it is a very important service for the community.

We will keep pushing to improve the Records and extend their impact!

The Editorial Team.

@sigmm Records: serving the community

The SIGMM Records are renewing, with the continued ambition of being a useful resource for the multimedia community. We want to provide a forum for (open) discussion, but also to become the primary source of information for our community.

Firstly, I would like to thank Carsten, who has run the whole Records single-handedly for many, many years. We all agree that he has done an amazing job, and that his service deserves our gratitude, and possibly some beers, when you meet him at conferences and meetings.

As you are probably aware, a number of changes in the records are underway. We want your opinions and suggestions to make this resource the best it can be. Hence, we need your help to make this a success, so please drop us a line if you want to join the team.

The two main visible changes are:

We have an amazing new team to lead the Records in the coming years. I am so glad to have their help:

We have reorganized the Records and their structure into three main clusters:

More changes to come. Stay tuned!

Pablo (Editor in Chief) + Carsten and Mario (Information Directors)

Dr. Pablo Cesar leads the Distributed and Interactive Systems group at Centrum Wiskunde & Informatica (CWI) in the Netherlands. Pablo’s research focuses on modeling and controlling complex collections of media objects (including real-time media and sensor data) that are distributed in time and space. His fundamental interest is in understanding how different customizations of such collections affect the user experience. Pablo is the PI of Public Private Partnership projects with Xinhuanet and ByBorre, and very successful EU-funded projects like 2-IMMERSE, REVERIE and Vconect. He has (co-)authored over 100 articles. He is a member of the editorial board of, among others, ACM Transactions on Multimedia (TOMM). Pablo has given tutorials about multimedia systems at prestigious conferences such as ACM Multimedia, CHI, and the WWW conference. He acted as an invited expert at the European Commission’s Future Media Internet Architecture Think Tank and participates in standardisation activities at MPEG (point-cloud compression) and ITU (QoE for multi-party tele-meetings). Webpage:


Dr. Carsten Griwodz is Chief Research Scientist at the Media Department of the Norwegian research company Simula Research Laboratory AS, Norway, and professor at the University of Oslo. He is also co-founder of ForzaSys AS, a social media startup for sports. He is a steering committee member of ACM MMSys and ACM/IEEE NetGames. He is associate editor of the IEEE MMTC R-Letter and was previously editor-in-chief of the ACM SIGMM Records and editor of ACM TOMM.



Dr. Mario Montagud (@mario_montagud) was born in Montitxelvo (Spain). He received a BSc in Telecommunications Engineering in 2011, an MSc degree in “Telecommunication Technologies, Systems and Networks” in 2012 and a PhD degree in Telecommunications (Cum Laude Distinction) in 2015, all of them at the Polytechnic University of Valencia (UPV). During and after his PhD, he completed 3 research stays (totaling 18 months) at CWI (The National Research Institute for Mathematics and Computer Science in the Netherlands). He also has experience as a postdoc researcher at UPV. His topics of interest include Computer Networks, Interactive and Immersive Media, Synchronization, and QoE (Quality of Experience). Mario is (co-)author of over 50 scientific and teaching publications, and has contributed to standardization within the IETF (Internet Engineering Task Force). He is a member of the Technical Committee of several international conferences (e.g., ACM MM, MMSYS and TVX), co-organizer of the international MediaSync Workshop series, and a member of the Editorial Board of international journals. He is also lead editor of “MediaSync: Handbook on Multimedia Synchronization” (Springer, 2017) and Communication Ambassador of ACM SIGCHI (Special Interest Group on Computer-Human Interaction). Webpage:


Dear Member of the SIGMM Community, welcome to the last issue of the SIGMM Records in 2013.

The editors of the Records have taken a classical reporting approach, and you can read here the first of a series of interviews. In this issue, Cynthia Liem is interviewed by Mathias Lux, and talks about the Phenicx project.

We have received a report from the first international competition on game-based learning applications, as well as our regular column reporting from the 106th MPEG meeting, which was held in Geneva. In this issue, our open source column presents libraries and tools for threading and visualizing a large video collection, a set of tools that will be useful for many in the community. Beyond that, you can also read about two PhD theses.

Among the announcements are several open positions and a long list of calls for papers. The long list of calls results from a policy change in SIGMM. After several years in which our two public mailing lists were flooded by calls for papers, the board and online services editors have decided to change the posting policy. Both lists are now closed for public submissions of calls for papers and participation. Instead, calls must be submitted through the SIGMM Records web page, and will be distributed on the mailing list in a weekly digest. We hope that the members of the SIG appreciate this service, and that those of us who have filtered emails for years feel that this is a more appropriate policy.

With this news, we invite you to read on in this issue of the Records.

The Editors
Stephan Kopf, Viktor Wendel, Lei Zhang, Pradeep Atrey, Christian Timmerer, Pablo Cesar, Mathias Lux, Carsten Griwodz


Dear Member of the SIGMM Community, welcome to the third issue of the SIGMM Records in 2013.

On the eve of ACM Multimedia 2013, we can already present the recipients of SIGMM’s yearly awards: the SIGMM Technical Achievement Award, the SIGMM Best Ph.D. Thesis Award, the TOMCCAP Nicolas D. Georganas Best Paper Award, and the TOMCCAP Best Associate Editor Award.

The TOMCCAP Special Issue on the 20th anniversary of ACM Multimedia is out in October, and you can read both the announcement, and find each of the contributions directly through the TOMCCAP Issue 9(1S) table of contents.

That SIGMM has established a strong foothold in the scientific community can also be seen by the Chinese Computing Federation’s rankings of SIGMM’s venues. Read the article to get even more motivation for submitting your papers to SIGMM’s conferences and journal.

We are also reporting from SLAM, the international workshop on Speech, Language and Audio in Multimedia. Not a SIGMM event, but certainly of interest to many SIGMMers who care about audio technology.

You will also find two PhD thesis summaries and, last but most certainly not least, pointers to the latest issues of TOMCCAP and MMSJ, as well as several job announcements.

We hope that you enjoy this issue of the Records.

The Editors
Stephan Kopf, Viktor Wendel, Lei Zhang, Pradeep Atrey, Christian Timmerer, Pablo Cesar, Mathias Lux, Carsten Griwodz