An interview with David Ayman Shamma




Describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

I’ve always been curious about solving problems. Not so much the answer itself; I like to know how a problem can be broken down into parts, abstracted, and reasoned with. That often drives us to think about abstraction (is there a non-specific instance of this problem?), theory (is there known literature from the mathematical or social sciences that will help us frame what’s happening?), and analogy (can we solve this because its structure is like another problem?). My education included classes in psychology, philosophy, math, and engineering; eventually I realized that Computer Science, and specifically Artificial Intelligence, embodied everything I was looking for: understanding people, modeling problems, and building new systems.

Interestingly enough, as an undergrad I took a job as a technician in an art department at the local state college; my job was to keep their Macs running with Adobe products. While I was there, I was allowed to audit studio art classes. I began to see how artistic and creative processes were influenced by the tools we have, be it a 1:50 D-76 bath with fiber-based paper in a darkroom or masking layers in Photoshop. This connection between creative and constructive processes carried into my work at NASA’s Center for Mars Exploration, where I worked on diagrammatic knowledge tools, and then into my Ph.D. on community-driven multimedia systems. It was around this time that I saw ACM Multimedia 2004 had a call for technical papers in the Interactive Arts. Since then I’ve been active in the community, mostly focused on the Arts track, but as my work began to include social computing in 2009, I started to think about hybrid social-visual systems. In 2013, I was the Technical Program Co-chair, and we started to look critically at the broad technical areas and the review process, and started some inclusion and diversity initiatives.

The main foundational lesson for me is to keep asking the right questions, even if you’re branching out of some smaller, under-represented area or track. In many cases, you’ll find new, exciting research questions there. That said, I found I need to couple this with a personal understanding of the outside domain; only then can a truly functional hybrid system work. It’s not enough to treat divergent sources as one big bag of the same data. Pixels, tags, comments, and clicks all carry an explicit or tacit semantic implication; respect that.

Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?

My Ph.D. dealt with social computing and community semantics: the objects in a photo carry a broader semantic context, namely the conversation on the online site sharing that photo. When I graduated, I joined an industry research lab, where I spent 10 years through a few organizational shifts. In my last 4 years there, I founded the HCI Research group with a charter to investigate what our research meant to people. My group’s research spanned several domains: multimedia, computer vision, information visualisation, social computing, ethnography, and physical computing; this gave me a deep perspective across many areas. Personally, understanding how things are connected, and what those connections mean, became a focus of my research. Data is created for a reason, and structured link data can carry a tacit semantic that helps us understand people and tasks in the world. Lately, I’ve been thinking about physical spaces where people interact and create content. What sort of camera do you have on you? How does it change your practice of photography? What sensors might be in your clothes or in the world? These questions are part of my current focus at Centrum Wiskunde & Informatica. We’ve been working with a Dutch fashion designer in Amsterdam, investigating how fashion and technology can be used in various situational tasks and environments by instrumenting clothing and creating structured data to understand people’s activity and flocking. What’s exciting beyond the research is connecting the goals of a fashion designer with computer science research; it’s an exciting bridge to build. Once all the fabric and sensors are accounted for, it becomes a social computing problem again. That’s where I like to live: creating bridges.

Can you profile your current research, its challenges, opportunities, and implications?

Now more than ever, we are a function of our own data. Data drives much of computing today, whether through data science or machine learning. I like to emphasize how we collect and label data, as it has direct consequences on what we can analyze, predict, and create. For many, this means harvesting data for use. For me, it means understanding how people act, behave, and communicate through those signals. For example, at CSCW 2016 I published some work where we looked at the browsing behavior of millions of people on Flickr, which we matched against a relatively small set of editorial judgements to surface high-quality geo-tagged weather photos. The alternate approach, which the team did attempt at first, was to simply train a neural net to find photos of storms or lightning or sunny days. While that approach is recall-optimistic, the editors were quick to point out that everyone takes crummy photos of lightning, so conventional approaches didn’t work. My research took a different approach: instead of training generic aesthetics into the system, we modeled a community-centric approach. Using the tacit aesthetic judgements of the Flickr community, we coupled the structured link data with a CNN to surface high-quality photos. It’s not a case of active learning; in fact, it’s a supervised model where the supervision comes from implicit community actions and explicit editorial judgements. We have some similar work to be published at CHI 2017 later this year, where we surfaced deviant/abusive images on Tumblr; that task was even harder, as the image itself may not be representative of such behavior, so the social-visual system was a necessity.

Taking your interest in AI and fashion into account, I am wondering what you generally think about the current hype around deep learning, particularly in the context of fashion research. Do you think AI-based systems will ever be able to understand context, which is an important factor in fashion?

You know, I remember when Deep Blue beat Kasparov back in the 90s, and while it was great, I didn’t think much of it as an AI victory (nor did IBM, if I recall). The recent win by AlphaGo is different and something amazing. I don’t think it’s hype, as things work and work well; however, we still face many of the same limitations. With regard to fashion, it’s a great time to be excited about AI. We see solutions to many of the older research and fashion issues (like pointing your camera at someone and finding the clothes they are wearing to buy online), but I think smart electronics, AI, and fashion together are the new sweet spot. Advancements in textiles, like pixel-to-stitch knitting, together with small electronics, make for a fun new playground for AI, sensors, and IoT. We’re just now starting to explore how clothes and fashion can sense, detect, and respond to people and to the environment. I get what you mean by AI hype, and that’s another discussion, but right now I’m excited to build the next generation of wearable tech.

How generalizable is data from sources like Flickr? For example, are your insights on Flickr also valid in non-western countries?

I certainly have had reviewers ask me how generalizable research is because it used Flickr data or Yelp data or Twitter data or whatever; I see it as the hallmark of a bad review. On one hand, there is no reason to believe that any slice of a specific social media dataset should be generalizable. People act differently on Flickr than they do on Instagram or on Snapchat. The application or website dictates an interaction, and really that’s what we are studying. As a research community, we need to move beyond just studying naive pixels and examine what the platform is doing. OK, if you’re just looking for indoor vs. outdoor shots in Yelp photos, then maybe. But have you ever tried to find a restaurant in Japan versus Italy versus America? Storefronts look completely different. Internationalization is rarely studied by multimedia researchers, and I think multimedia-mediated cultural communication is more important than website generalization.

I think it would be very interesting if you could also tell us what you think the role or responsibility of multimedia researchers is in the context of the whole fake news/alternative news debate. Do you think we should focus on it?

In 2009, I began publishing work on multimedia summarization using aggregated Twitter feeds from the Obama-McCain debate. Back then, people really, really wanted to tweet, and it was a narrow interest community. A few years later, during the Egyptian revolution of 2011, I ran my methods against the Twitter firehose and saw some misinformation (like a reported bus fire that was actually from another country years earlier). Delayed information is a systemic problem: something happened hours or days ago, and it gets propagated as fresh information. I don’t believe we had widespread, purposeful propagation of misinformation then (at least not like what we see in today’s world). So today, we have misplaced information, delayed information, and fake/alt information, and the field of multimedia is ripe to handle this problem. For example, take a fake news story with a photo. Has the photo been altered to retell a story? Is the photo from a different news story? Are there clusters of other news sources that contradict it? There’s a whole world of multimedia problems in finding fake news, many of which large companies are struggling to get a grip on, but the hard problem will be the explanation. Identifying fake is half of the problem; explaining to people why it’s fake is the other half. News, now more than ever, is highly visual (photos/video) and social; dealing with a plurality of signals is the core of multimedia research.
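
One of the questions above, whether a photo is from a different news story, is often approached with near-duplicate image detection. The sketch below is a minimal illustration of one common technique, average hashing (aHash); it is not the method used in the work described here. It assumes images have already been decoded to grayscale as lists of pixel rows, and the function names and the Hamming-distance threshold of 10 (out of 64 bits) are hypothetical choices for illustration.

```python
def average_hash(gray, size=8):
    """Average hash of a grayscale image given as a list of pixel rows.

    The image is block-averaged down to size x size cells; each cell yields
    one bit: 1 if it is brighter than the mean of all cells, else 0.
    """
    height, width = len(gray), len(gray[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    cells = []
    for i in range(size):
        for j in range(size):
            r0, r1 = i * height // size, (i + 1) * height // size
            c0, c1 = j * width // size, (j + 1) * width // size
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def likely_reused(img_a, img_b, threshold=10):
    """Hypothetical check: flag two images as near-duplicates if their
    64-bit average hashes differ in at most `threshold` bits."""
    return hamming(average_hash(img_a), average_hash(img_b)) <= threshold


# Usage: a synthetic gradient, a brightened copy, and an inverted copy.
gradient = [[4 * c for c in range(64)] for _ in range(64)]
brighter = [[v + 20 for v in row] for row in gradient]
inverted = [[255 - v for v in row] for row in gradient]
print(likely_reused(gradient, brighter))  # True
print(likely_reused(gradient, inverted))  # False
```

Because every cell is compared with the image’s own mean, a uniform brightness change leaves the hash unchanged, while changes to the content flip bits. A real pipeline would decode and resize images with an imaging library and index the hashes for fast lookup; flagging a reused photo still leaves the harder half of the problem, explaining to people why it matters.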

In this context, do you think that fake news is a problem of social network platforms, or should newspapers also be investigated?

Can you name a news source that does not rely on social network platforms? Conversely, have you seen Twitter deliver news? Their streaming video with tweet interfaces speaks to research we did 10 years back. I don’t think we can decouple the two, but we’ve seen how social media sites tend to amplify things by propagating clickable content. So for a news agency, it starts with the title and snippet of a story and its related photo. But then there are also the fake news agencies gaming the social sites. There has been some great work from UW on cracking the problem, but I think it’s time for multimedia research to step up here, as visual content always carries more engagement.

How would you describe the role of women especially in the field of multimedia?

Diversity of all types (gender, nationality, race) is critically important to the future of multimedia research. When I was on the TPC for Multimedia in 2013, I did some data analytics on the past several years of the conference series; the gender stats were abysmal. We worked hard to increase the gender diversity among the area chairs and in the conference. For the former, following some advice from Maria Klawe that I heard in a lecture maybe 10 years prior, we pushed for topic diversity in the conference. The idea here is that legacy areas can carry legacy diversity problems, so newer areas (social computing, affect, crowdsourcing, music, etc.) are more likely to have better gender leadership ratios. It was the correct approach, and we doubled the number of women in leadership roles among the ACs, but there was still much room to grow. We coupled this with finding corporate support for a women’s and diversity lunch, a practice that I’m happy the conference has continued. Diversity brings an expanded set of ideas, methods, and approaches in research. We’ve come some way since 2013, and I’m very happy to see the 2017 program similarly expand its diversity, but we have a very long way to go to catch up to some other SIGs.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

Impact happens where research connects to people. For me, it usually revolves around creative practice in multimedia. How do online broadcasters DJing house and hip-hop connect with their audience, and how does that differ from when they are in a club? If you have an iPad and an iPhone and want to take a picture, when do you reach for the iPad? If you’re posting a photo to Instagram, what filter will you use to enhance it? The most valuable research includes method, system, and people. Let’s take that last one as an example. One could build a prediction model to automatically apply filters based on a training set of what got likes and the types of transformation, but would that change people’s creative practice? We found people enjoyed the process of selection (despite usually picking the same filter over and over again). So the question becomes how we optimize the experience without hindering it.

In my time as Director of Research at Flickr, we enjoyed looking at the full stack: data, machine learning, engineering, visualization, and all the components that affect people and the media experience. We knew we had the advantage of being able to easily dive into 13 billion photos and 100 million people, but we felt that, even inside a corporation, there should be more open data for all researchers. This led to the creation of the YFCC100M (Yahoo Flickr Creative Commons 100 Million), 100 million Creative Commons images in a single dataset for open research. Beyond the data itself, we found ourselves reviewing small technical Creative Commons details to ensure legal and privacy concerns were met while still opening the data for wide academic and corporate use. The impact has been incredible. Outside of the multimedia and computer vision communities, in the first year since release we’ve seen published work using our dataset from the HCI, Data Science, and Visualization communities, and we were even featured by the Library of Congress. All of this was driven by the idea of sharing data we felt was too locked up; fortunately Flickr, Creative Commons, and Yahoo Legal shared our vision, and we expect to see more impact to come.

Over your distinguished career, what are your top lessons you want to share with the audience?

Really, nothing happens in a vacuum. Partnerships and collaborations make things interesting, as they make one malleable and push one to think full stack. This view is shaped by my 10 years in an industry lab, where connecting with academia through hosting interns, collaborative work, and sponsorships really fueled my work. I’d say a good 70% of our work was still internally driven, but that 30% outreach was really valuable. Now at an academic lab, I’m doing the reverse. We partnered with a fashion designer to stay connected to their goals and their problems while we think about the wearable and social Internet of Things. It’s great to think without constraints, but adapting to the real world and thinking end-to-end is a critical driver for me. At the end of the day, I want to use it. Build what you love and make it real. This was easier when I was at a corporation, but there are still plenty of ways to collaborate depending on scope. And really think full stack, in both system and evaluation. You’ll find yourself evaluating your work on multiple levels, from F1 scores to Likert-scale surveys. What we do is develop new systems and methods, but work with real impact will also affect applications and design. My favorite research (mine or others’) always critically engages with the bigger picture.

Since you are an active researcher in both the US and Europe, what do you think are the main differences? What is positive and what is negative? And what could we learn from each other?

I did a semester sabbatical at the Keio-NUS CUTE Center in Singapore a few years back, so this is not my first dive outside of industry. I’m reminded that in La Nausée, Sartre wrote that anyplace you live feels the same after two weeks; the idea being that once you get back to job and life, it becomes the same again. I can’t say I quite agree in this case. The move from an industry lab in California to an academic one in the Netherlands was a bit of a culture and cadence shift. After almost a year, it’s clear to me that the difference is the pace, as we share a research culture. We tend to sprint constantly in industry, while in academia the sprinting seems to come and go. Each style has its pros and cons; there have been times I wanted everyone to be running, and times I was happy I could dive into something because we weren’t running. I don’t think it’s something to enumerate as positive and negative points, just a different state of being. I’m not sure why I gave you an existential response either.

An interview with Judith Redi

Describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

Dr. Judith Redi

My path to multimedia was, let’s say, non-linear. I grew up in the Italian educational system, which, up until university, is somewhat biased towards social sciences and humanities. My family was not one of engineers or scientists either, and never really encouraged me to look at the technical side of things. Basically, I was on a science-free educational diet until university. On the other hand, my hometown used to host the headquarters of Olivetti (you may remember their fancy typewriters and early personal computers). This meant that at a very young age I had a PC at home and at school, and could use it (as a “user” on the other side of the systems we develop; I had no clue about programming).

When the time came to choose a major at university, I decided to turn the tables, partly as a provocative act against my previous education and mind-set, and partly because I was fascinated by the prospect of being able to design and build future technologies. So I picked computer engineering, perhaps inspired by my hometown’s technological legacy. I was immediately fascinated by artificial intelligence and its potential to make machines more human-like (I still tell all my bachelor students that they should have a picture of Turing on their desk or above their bed). I specialized in machine learning and applied it to cryptanalysis in my master’s thesis. I won a scholarship to continue that line of research in a PhD project at the University of Genoa. And then Philips came along, and multimedia with it.

At the time (2007), Philips was still manufacturing displays, and to stay ahead of the competition, they had to make sure their products delivered the highest possible visual quality to users. They had algorithms to enhance image quality, but needed a system able to understand how much enhancement was needed, and of which type (sharpening? de-noising?), based on the analysis of the incoming video signal. They wanted to try a machine-learning approach to this issue, and turned to my group for a collaboration. I picked up the project immediately: the goal was to model human vision (or at least the processes underlying visual quality perception), which implied not only developing new intelligent systems at the intersection of Signal Processing and Machine Learning, but also learning more about the users of these systems, their perception and cognition. It was the fact that it would allow me to adopt a user-centred approach, closing the loop back to my social science-oriented education, that made multimedia so attractive to me. So I left cyber-security, embraced Multimedia, and have never left since.

One Philips internship, a best PhD thesis award and a postdoc later, I am still fascinated by this duality. Much has changed in multimedia delivery, with the shift from linear TV to on-demand content consumption, video streaming accounting for 70% of internet traffic nowadays, and the advent of Ultra High Definition solutions. User expectations in terms of Quality of Experience (QoE) increase by the day, and they are affected not only by the amount of disruptions in the delivered video (due to encoding, unreliable transmission, or rendering inaccuracies), but also by content semantics and popularity, the user’s affective state, the environment and the social context. The role of these factors in QoE is yet to be understood, let alone modelled. This is what I am working on at TU Delft, and it is a long-term plan, so I guess I won’t be leaving multimedia any time soon.

I’d say it’s too early for me to draw “foundational lessons” worth sharing from my journey. I guess there are a few things, though, that I figured out along the years, and that may be worthwhile mentioning:

  1. Seemingly reckless choices may be the best decisions you have ever made. Change is scary, but can pay off big time.

  2. Luck exists, but hard work is a much safer bet.

  3. Keep having fun doing your research. If you’re not having fun anymore, see point (1).

Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?

As a researcher, I have devoted most of my efforts to understanding multimedia experiences and steering their optimization (or improvement) towards higher user satisfaction (with the delivery system). In the longer term, I want to broaden this scope to make an even bigger impact on people’s lives: I want to go beyond quality of experience and multimedia enjoyment, and target the optimization (or at least improvement) of users’ well-being.

For the past four years, I have been working with Philips Research on an Ambient Assisted Living system able to (1) sense the mood of a user in a room and (2) adapt the lighting in the room to alleviate negative moods (e.g., sadness or anxiety) when sensed. We were able to show that the system can successfully counter negative moods in elderly users (see our recent PLoS One publication if you are interested), without the need for human intervention. The thing is, negative affective states are experienced quite often by elderly people (but by younger people too, according to recent findings), and most of the time a fellow human (relative, friend, caretaker) is not available to comfort the person. My vision is to build systems that, based on the unobtrusive sensing of users’ affective states, can act upon the detection of negative states and relieve the user just as a human would.

I want to design “empathic technology”, able to provide empathic care whenever human care is not within reach. The challenges here are multiple. First, (long-term) affective states (such as mood, which is more constant and subtle than emotion) need to be sensed. (Wearable) sensors, cameras, and also interaction with mobile devices and social media can provide relevant information here. Empathic care can then be conveyed through ambient intelligence solutions, but also through creative industry products, ranging from gaming to intelligent clothing to, of course, multimedia technology (think of empathic recommender systems, or videotelephony systems optimized to maximize the affective charge of the communication). This type of work is highly multidisciplinary (involving multimedia systems, affective computing, embedded systems and sensors, HCI and certainly psychology), and there is little low-hanging fruit. But I’d like this to be my contribution to making the world a better place, and I am ready to take up the challenge.

Can you profile your current research, its challenges, opportunities, and implications?

Internet-based video consumption has been a reality for a while, yet it is still growing. Cisco forecasts that video delivery will account for 79% of overall internet consumer traffic by 2018 (equivalent to one million minutes of video crossing IP networks every second). As media consumption grows, so do user expectations in terms of Quality of Experience (see the recent Conviva reports!). And future multimedia will have to be optimized for multiple, more immersive (plenoptic, HDRi, ultra-high definition) devices, both fixed and mobile. Moore’s law and broadband speed alone won’t do the job. Resources and delivery mechanisms have to be optimized on a more application- and user-specific basis. To do so, it will be essential to measure (unobtrusively) the extent to which the user deems the video experience to be of high quality.

In this context, my work aims to (1) understand the perceptual, cognitive and affective processes underlying user appreciation of multimedia experiences, and (2) model these processes in order to automatically assess the delivered QoE and, when applicable, enhance it. It is important to bear in mind that multimedia quality of experience cannot be considered to depend solely on the presence (or absence) of visual/auditory impairments introduced by technology limitations (e.g., packet loss errors or blocking artifacts from compression). Although that has been the most common approach to QoE assessment and optimization, it is no longer sufficient. The appearance of social media and internet-based delivery has challenged the way media are consumed: we no longer deal with passive observers, but with users who select specific videos, to be delivered on specific devices, in any type of context. Elements such as semantics, user personality, preferences and intent, and the socio-cultural context of consumption come into play, and these have never been investigated (let alone modelled) for delivery optimization. My research focuses on integrating these elements into QoE estimation, to enable effective, personalized optimization.

The challenges are countless: user and context characteristics have to be quantified and modelled, and then integrated with the video content analysis to deliver a final quality assessment that represents the experience as it would be perceived by that user, in that context, given that specific video. Before that, it must be determined which user and context factors impact QoE (to date, there is not even agreement on a taxonomy of these factors). There are opportunities too: adaptive streaming protocols make it possible to implement user- and context-aware delivery strategies, the willingness of users to share personal data publicly can lead to more accurate user models, and crowdsourcing and crowdsensing in particular can support the systematic study of the influence that context and user factors have on the overall QoE.

How would you describe the role of women especially in the field of multimedia?

Just like for their male colleagues (would you ask them to describe the role of men in multimedia?), the role of women in multimedia is:

  1. to push the boundaries of science, knowledge and practice in the field, doing amazing research that will make the world a better place;
  2. to train new generations of brilliant engineers and scientists who will keep doing amazing research to make the world an even better place; and
  3. to serve the community as professionals and leaders, steering the future amazing research that will go on making the world better and better.

I’d say the first two points are covered. The third, instead, could be implemented a bit better in practice, as there is a general lack of representation of women at the leadership level. The reasons for this are countless. They range from the lack of incoming talent (traditionally girls are not attracted to STEM subjects, perhaps for socio-cultural reasons), to the so-called leaky pipeline, which sees talented women leaving demanding yet rewarding careers too early, to an underlying presence of impostor syndrome, which sometimes prevents women from putting their name forward for given roles. The solution is not necessarily quotas (although I understand the reasoning behind the need for quotas, I think they actually make women’s lives more difficult: there is an underlying feeling that “women have it all easy these days” that makes work relationships more suspicious and ends up forcing women to work three times as hard to show that they actually deserve what they accomplished), but rather coaching and dedicated sponsorship of talent from the early stages.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

The methods that I developed for subjective image quality assessment have been adopted within Philips Research, and their evolution to video quality assessment is now being evaluated by the Video Quality Experts Group (VQEG) as a recommended alternative to the standard Absolute Category Rating (ACR) and paired comparison methodologies. The research that I carried out on the suitability of crowdsourcing for subjective QoE testing, and on the adaptation of traditional lab-based experimental designs to crowdtesting, is now included in the Qualinet white paper on best practices for crowdsourced QoE, and has helped in better understanding the potential of this tool for QoE research (and the risks involved in its use). This research is also currently feeding into new ITU-T recommendations on the subject. The models that I developed for objective QoE estimation have been published in top journals and lay the groundwork for more encompassing and personalized QoE optimization.
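
For readers less familiar with the two standard methodologies mentioned above, the sketch below shows the textbook computations behind them: a Mean Opinion Score (MOS) with an approximate 95% confidence interval from 5-point ACR ratings, and a win-count matrix from paired-comparison judgements. It is a generic illustration, not the alternative methodology described in this answer. The function names are hypothetical, the ratings are made up, and the normal approximation (z = 1.96) is an assumption; small rating panels are often analysed with a Student’s t value instead.

```python
import math
import statistics

# 5-point ACR scale labels (5 = best).
ACR_SCALE = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mos_with_ci(ratings, z=1.96):
    """Mean Opinion Score and a normal-approximation confidence interval."""
    if any(r not in ACR_SCALE for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    if len(ratings) < 2:
        raise ValueError("need at least two ratings for an interval")
    mos = statistics.mean(ratings)
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, (mos - half_width, mos + half_width)


def preference_counts(judgements, stimuli):
    """wins[i][j] = how often stimulus i was preferred over stimulus j.

    judgements is a list of (preferred, other) pairs from paired comparison.
    """
    index = {name: k for k, name in enumerate(stimuli)}
    wins = [[0] * len(stimuli) for _ in stimuli]
    for preferred, other in judgements:
        wins[index[preferred]][index[other]] += 1
    return wins


# Usage with made-up ratings for one clip and made-up pairwise judgements.
mos, (low, high) = mos_with_ci([4, 4, 5, 3, 4])
print(mos, round(low, 2), round(high, 2))
print(preference_counts([("A", "B"), ("A", "B"), ("B", "A")], ["A", "B"]))  # [[0, 2], [1, 0]]
```

ACR asks each viewer to rate one stimulus at a time, whereas paired comparison asks which of two stimuli is better, so the number of pairs to test grows as n(n-1)/2 with the number of stimuli n.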

Over your distinguished career, what are your top lessons you want to share with the audience?

Again, I am not sure whether I am yet in the position of giving advice and/or sharing lessons, but here are a couple of things:

  1. Be patient and long-sighted. Going for research that pays off in the short term is very appealing, especially when you are challenged with job insecurity (been there, done that). But it is not a sustainable strategy: you can’t make the world a better place with your research if you don’t have a long-term vision, where all the pieces fit together towards a final goal. And in the long term, it’s not fun either.

  2. Be generous. Science is supposed to move forward as a collaborative effort. That’s why we talk about a “scientific community”. Be generous in sharing your knowledge and work (open access, datasets, code). Be generous in providing feedback to your peers (be constructive in your reviews!) and to students. Be generous in helping out fellow scientists and early-stage researchers. True, it is horribly time-consuming. But it is rewarding, and it makes our community tighter and stronger.

For girls: watch Sheryl Sandberg’s TED talk, do participate in the Grace Hopper Celebration of Women in Computing, and don’t be afraid to come to the ACMMM women’s lunches; they are a lot of fun. Actually, these are good tips for boys too.

For the rest, just watch Randy Pausch’s The Last Lecture, because he said it all already, and much better than I ever could.

If you were conducting this interview, what questions would you ask, and then what would be your answers?

Q: Why should one attend the ACMMM women’s lunch?

A: If you are a female junior member of the community, do attend, because it will give you the opportunity to chat with senior women who have been around for a while and can tell you all about how they got where they are (most precious advice, trust me). If you are a female senior member of the community, do attend, because you could meet a young, talented researcher who needs some good tips from you, and you should not keep all your valuable advice to yourself :). If you are a male member of the community, you should attend because we really need to initiate some constructive dialogue on how to deal with the low representation of women in the community (because it is a problem, see next question). Since this is a community problem (and not a problem for women only), we need all members of the community to discuss it.

Q: Why do we need more women in Multimedia?

A: Read this or this, or just check the Wikipedia page on women in STEM.

An interview with Klara Nahrstedt

Michael Riegler (MR): Describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

Prof. Klara Nahrstedt

Klara Nahrstedt (KN): From my youth I have been attracted to and interested in mathematics, physics and other sciences. However, since most of my family were electrical and computer engineers, I was surrounded by engineering gadgets and devices, and one of them was a very early computer, able to answer various quiz questions about the world. I liked this new device with its many possibilities. Therefore, my interests and my family’s influence guided me towards an educational journey between science and engineering. I did my undergraduate studies in Mathematics and my Diploma work in Numerical Analysis at the Humboldt University of Berlin in East Germany. After the Berlin Wall came down in 1989, my educational journey led me to the Computer and Information Science Department at the University of Pennsylvania in Philadelphia, where I earned my PhD degree and studied multimedia systems and networking.

My interest in multimedia came during my time at the Institute for Informatics, where I worked as a research programmer. This was after my Diploma degree and after my job as a System Administrator at the Computer Center of the Ministry of Agriculture in East Berlin. It was a time when Europe, in contrast to the USA, invested heavily in the new ISO-defined X.25-based digital networking technology, and with it in the new X.400 email system and its applications. One of the very interesting discussions at the time was about transporting via email not only text messages, but also digital audio and images. I wanted to be part of the discussion, since I believed that a picture (image) is worth 1000 words and that auditory interfaces would make it easier for users to enter messages than typing text. I wanted to help develop solutions that would enable the transport of these multi-modal media, and so my long journey into multimedia systems and networks started. After I joined the University of Pennsylvania, as part of my PhD work, I was exposed to the research in the GRASP laboratory, where researchers studied computer vision algorithms and cameras mounted on robots. As a researcher interested in networking and multimedia, it was very natural for me to explore the integrated multimedia networking problems of tele-robotic applications: enabling video and control information to be transported from remote robots to operators, and visualizing what the remote robot was doing. Since my PhD, the journey into a deep understanding of multimedia systems and networks has continued as new knowledge, technologies, applications, and users emerge.

The foundational lessons that I learned from this journey are: (1) acquire very strong fundamental knowledge in the sciences and humanities very early, independent of what future opportunities, jobs, interests, and circumstances guide you towards; (2) work hard and believe in yourself; and (3) keep learning continuously.

MR: Tell us more about your vision and objectives behind your current roles. What do you hope to accomplish and how will you bring this about?

KN: During my professional life, I have had three different roles: researcher, educator, and provider of professional services in different functions.

  • As a researcher, my vision and objective are to provide theoretical and practical cyber-solutions that enable people to communicate seamlessly and in a trustworthy manner with each other and with their physical environments.
  • As an educator, my vision and objective were and are to educate, as best I can, the next generation of undergraduate and graduate students so that they are very well prepared to tackle the numerous new challenges in fast-changing human-cyber-physical environments.
  • In the space of professional services, I have served in various roles: as a member of numerous program committees; as an organizing member, chair, co-chair, and/or editor of IEEE and ACM professional venues; as the chair of the ACM Special Interest Group on Multimedia (SIGMM); as a member of various departmental and college committees; and now as the Director of the Coordinated Science Laboratory (CSL), an interdisciplinary research unit in the College of Engineering at the University of Illinois at Urbana-Champaign. In each of these administrative and service roles, my vision and objective are to provide high-quality service to the community, whether it is a high-quality technical program at a conference or journal, a fair and balanced allocation of resources that advances the mission of SIGMM, or broad support of interdisciplinary work in CSL.

I hope to achieve the vision and objectives of my research, educational and professional service activities via hard work, continuous learning, willingness to listen to others, and a very strong collaboration with others, especially my students, colleagues and staff members that I interact with.

MR: Can you profile your current research, its challenges, opportunities, and implications?

KN: My current research moves in three different directions, which have some commonalities but also differences. The major commonality of my research is the aim to solve the underlying joint performance and trust issues in resource management of multi-modal systems and networks that we find in today’s human-cyber-physical systems. The three different directions of my research are: (a) 3D teleimmersive systems for tele-health, (b) trustworthy cyber-physical systems such as the power grid and oil and gas infrastructures, and (c) trustworthy and timely cloud-based cyber-infrastructures for scientific instruments such as distributed microscopes.

In all of these directions, the challenges lie in providing real-time acquisition, distribution, analysis and retrieval of multi-modal data while also providing security, reliability and safety.

The opportunities in the areas of human-cyber-physical systems in health and critical infrastructures are enormous, as people are aging, physical infrastructures are being fully stressed, and multimedia devices are challenging every societal cyber-infrastructure by generating Big Data in terms of volume, velocity and variety.

We are living in truly exciting times as digital systems get more and more complex. The implication is that, as a multimedia systems and networking community, we have a lot of work to do and many challenges to solve in collaboration with many other communities. It is very clear that a single computing community is not able to solve the many problems that are coming upon us in the space of multi-modal human-cyber-physical systems. Inter- and cross-disciplinary research is the call of the day.

MR: How would you describe the role of women especially in the field of multimedia?

KN: “Difficult” comes to my mind. The number of women in multimedia computing is small, and in multimedia systems and networks it is even smaller. I wish that the role and visibility of women in the multimedia technology field were greater when it comes to IEEE and ACM awards, conference leadership roles, editorial board memberships, participation in SIGMM technical challenges, and other visible events and roles. Multimedia technology has become a ubiquitous base for numerous application fields, including education, training, entertainment, health care, and social work, which have very strong representation of women in general. Hence, I believe that women in multimedia should play an even more crucial role in the future than today, especially in innovation, leadership, and the interconnection of multimedia computing technologies with the above-mentioned application fields.

MR: How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

KN: My top innovative achievements range from bringing a much better understanding to the field of Quality of Service (QoS) Management and Quality of Service Routing for multimedia systems and networks, to developing novel real-time and trusted resource management architectures and protocols for complex multi-modal applications, systems and networks such as 3D teleimmersion, energy-efficient mobile multimedia, and the trustworthy smart grid, to name a few. The impact of my QoS research can be seen in current wide-area wired and wireless networks and systems. The impact of the resource management algorithms, architectures and systems that my research group and I have developed can be seen throughout Microsoft, Google, HP, and IBM solutions, where my graduate and undergraduate students took up employment and brought with them research results and knowledge that then made their way into multimedia applications, systems and network products.

MR: Over your distinguished career, what are your top lessons you want to share with the audience?

KN: The top lessons that I would like to share are: be patient, honest, open-minded, and fair; don’t give up; be humble, but don’t be shy to “toot your own horn” when appropriate; listen to what others have to say; and be respectful to others, since everybody has something to contribute to the community and society in his/her own way.


An interview with Wallapak Tavanapong

MR: Describe your journey into computing from your youth up to the present.

Wallapak Tavanapong

Pak: I started learning about computing quite late. I did not know what a computer was until I enrolled in the B.S. degree program in Computer Science at Thammasat University, Thailand, where I learned the foundations. After finishing the degree, I joined the M.S. program in Computer Science at the University of Central Florida (UCF), Orlando, Florida, USA. UCF was a great place for me to learn. I had a wonderful advisor, Prof. Kien A. Hua, good classes, and great friends. My research at the time was on video-on-demand, which was a hot topic then. After my Ph.D., I joined the Department of Computer Science at Iowa State University in 1999 as an Assistant Professor, and I was recently promoted to Full Professor.

Iowa State University has been a great place for my career. In the beginning, I continued with research in video-on-demand and multimedia caching. In 2003, my colleagues, Profs. JungHwan Oh, Piet C. de Groen, Johnny Wong, and I began investigating automated content analysis of endoscopic video for improving the quality of the procedure. At the time, few works existed, and most of them were on automated detection of polyps in images. Our approach is to automatically analyze an entire procedure, calculate detailed objective metrics that reflect the quality of inspection for the entire procedure, and provide real-time feedback to help the endoscopist improve the quality. We co-founded EndoMetric Corporation to transfer the technology into practice. I am glad that this research area now receives much more attention in both academia and industry, and that our work has had some influence on later work. In 2013, I began new interdisciplinary research and education initiatives in political informatics and in computational communication and advertising.

MR: What foundational lessons did you learn from this journey?


Pak: First, never give up when facing difficulty. Second, there are several paths toward good research. I am more attracted to research problems from other disciplines. I like to create a new computing research problem out of vague problem descriptions in other disciplines. I love interdisciplinary research.

MR: Why were you initially attracted to multimedia?


Pak: My initial interest was in database research. As data began to come in different media types, extending that research to multimedia was natural.

MR: Tell us more about your vision and objectives behind your current roles. What do you hope to accomplish and how will you bring this about?


Pak: First, I’d like to see my research help prevent or reduce suffering from cancer for many people. To achieve this goal, I need to do more to push my technology into practice. Second, I’d like to see computational thinking integrated into the science and math curriculum in elementary schools in the US and other countries soon. Over the past five years, I have been engaged in our departmental K-12 outreach activities, coaching K-12 kids and interested K-12 teachers in computational thinking. Third, I’d like to see more women in computer science and computing fields. In our K-12 outreach program, we found that young girls started losing their interest in science as early as the fifth grade, so I hope to get them interested in computing early, in the third grade. Last, I’d like to see my interdisciplinary work with political scientists and communication scholars lead to a national social multimedia repository that is useful for social scientists and the public to learn about decision making in public policies that affect many lives.

MR: Can you profile your current research, its challenges, opportunities, and implications?

Pak: My top two projects are:

  • Reconstruction of a virtual colon from 2D colonoscopic images:

    The human colon is a complex tubular structure with multiple twists and turns. A good colon exam increases early detection of colorectal cancer. I’d like to provide a 3D colon inspection map during the procedure so that the endoscopist knows which areas inside the colon they might have missed. There are many challenges. The most critical one is that commonly used endoscopes are not equipped with 3D camera positioning technology. I am working on adding low-cost hardware that provides some position information. I will use this position information together with content analysis of endoscopic images to reconstruct the virtual colon. The work has the potential to increase the polyp detection rate during colonoscopy, preventing deaths and reducing pain and suffering.

  • Multimedia information system for political science and communication:

    This system would help answer research questions in political science and communication that could not have been answered before because of the sheer volume, variety, and velocity of data. Specifically, my team is working on understanding how states learn about policies from one another, how news reporters carry information from state legislatures to the public, how a public policy is influenced, etc. This application domain lends itself to multimedia research, ranging from the underlying data management technology, through automated content analysis of multiple media types and sources (online web and video ads, TV ads, state bills and laws, and tweets by political figures), to visualization of the knowledge resulting from the analysis.

MR: How would you describe the role of women especially in the field of multimedia?

Pak: I think the role of women in multimedia is the same as that of men. But our number is much lower. We need to increase the number of women in the field. I believe that we need to get young girls interested in computing as early as elementary school.

MR: How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

Pak: I would say that my top achievement so far is the idea and realization of real-time computer-aided analysis and feedback to improve the quality of colonoscopy. We were the first to investigate this problem. There were several challenges, for instance: defining what to analyze so that it reflects quality as seen by domain experts; coming up with effective algorithms to compute the quality measurements; showing that the automated measurement indeed improves quality; making the automated analysis real-time, effective, and low-cost enough to be used in practice; and deploying the technology for daily use in hospitals and clinics.

My technology has already saved a couple of lives, and I would like it to do more in the future. I have seen more researchers in academia and industry get into this research area, which is great. We need more researchers and developers in multimedia and healthcare to help medical professionals improve the quality of care via automation.

MR: Over your distinguished career, what are your top lessons you want to share with the audience?

Pak: Never give up. Find good mentors who care about you, believe in you, and give you different perspectives. A peer mentor is great. I learn a lot from my colleagues. Find a research problem you are passionate about. Last, when realizing that there is a problem, do not complain, look for a good solution, and fix it.


An Interview with Cynthia Liem: The PHENICX Project


The PHENICX project is supported by the European Commission, FP7 (Seventh Framework Programme, STREP project, ICT-2011.8.2 ICT for access to cultural resources, grant agreement No 601166). The project has been running for a year now, and Cynthia Liem has been involved since the initial planning and proposal writing. Currently, she is a work package leader in the project, and part of the overall project coordination team in the role of dissemination coordinator.

Partners in the project are Universitat Pompeu Fabra, Barcelona, ES; Delft University of Technology, NL; Johannes Kepler University Linz, AT; Austrian Research Institute for Artificial Intelligence, Vienna, AT; Video Dock BV, Amsterdam, NL; Royal Concertgebouw Orchestra, Amsterdam, NL; and Escola Superior de Música de Catalunya, Barcelona, ES. More information on the project can be found at

Q: What is the goal and scope of the PHENICX project?

PHENICX is about music and concert experiences. We want to use multimedia technologies to enhance the experience of a concert and make it more interesting and accessible for broad audiences. In this, we mainly focus on classical music.

Basically, the project has two sides. First of all, there is a content analysis side, in which we analyze concert performance data in a broad sense. We do not only look at an audio stream, but also at, e.g., videos, gesture information, and social commenting information from people who attended concerts. Besides multiple modalities, we also try to take into account multiple perspectives: think of multiple cameras and microphones registering an orchestra, but also of multiple types of people (a conductor, orchestra musicians, or just your personal friends) speaking about a concert. Finally, a concert really is a multilayered phenomenon, with lots of things going on at the same time that one could potentially be interested in. The particular notes being played from a score are part of a larger structural whole; and while 130 individuals may be playing at the same time in a symphony orchestra, they form sub-groups, each of which has a particular role in the musical narrative and instrumental mix.

On the other side, it’s about the experience: about getting and keeping users from different consumer groups engaged. This is not just targeted at live attendance scenarios in the concert hall, but also at scenarios in which people attend concerts off-site through a live stream, or want to relive a concert on demand after its performance. While for the content analysis part we mostly focus on signal-oriented research topics, for the experience part we strongly look into topics such as recommendation, visualization and interaction. For example, how can you make the whole multilayered aspect of music more tangible? This can be done with automated score-following, through more simplified visualizations, but also by contrasting a particular performance against other existing performances of the same piece.

Our mission to broaden audiences for the classical music genre can be seen as a way of preserving cultural heritage using ICT. In the end, we really hope to see digital technology affecting culture consumption in a positive way. As a concrete example, our partners Video Dock and the Royal Concertgebouw Orchestra are already working on a commercial tablet app called RCO Editions. The technologies we work on in PHENICX can really help make the production of the app more scalable, expand its feature set, and optimize its user experience.

Q: Are there special organizational challenges?

In the project there are seven partners, four of which are academic partners. The three non-academic partners are major players in different parts of the music stakeholder spectrum, but have less experience with academic projects, especially the Royal Concertgebouw Orchestra, which is involved in a large academic technology project for the first time. So in communicating and working with each other, some translation is always needed between partners with different backgrounds and levels of project experience. This is a very interesting organizational challenge, in which we always try to find an optimal balance between different stakeholders.

Another potential challenge is language. Especially in the first year, we have been running a lot of focus groups to validate use cases. But while we have grown completely accustomed to using English in our daily academic work, as soon as you wish to interact with realistic local potential users of your technology in all project partner countries, you can’t take for granted that these users have full expressive command of English (the younger generation typically does, but you don’t want to reach only them). And music is a very attractive topic for general public dissemination, since it’s a concrete part of many people’s lives; but once again, to make full use of this opportunity, you may have to look beyond English. So we have some dedicated organizational activities on that, working to also hold some studies and make some publicity material available in local languages.

Q: What is your personal relation to the project?

Well, I wrote a significant part of the proposal, so in that sense I have a considerable relation to the project … but, at least as importantly, my musician background creates a strong personal link to this project. Having degrees in computer science and classical piano performance, I’m really interested in the interface between these two: working with music and digital data, and using data technologies to improve on what you can learn and do with music. PHENICX definitely is about this. So I’m very actively trying to use this double background for the project. It is especially useful for communication and dissemination: I can talk to people on the more musical side, many of whom do not have extensive technical backgrounds, but also to those on the more technical side, who do not always have an extensive music background.

Funnily enough, the project also affected views I had from my own musicianship. The Royal Concertgebouw Orchestra is one of the most famous orchestras in the world. If you’re a music student in Holland, you can be backstage and engage with people from many national orchestras, but only the lucky few will manage to even get into the neighborhood of this particular orchestra. Now that I have this connecting role in the project between academics and music stakeholders, and the orchestra has become a project partner, I suddenly find myself in their office quite often. I would never have expected that!

Besides that, with our work on user requirements and focus groups, I really managed to be in contact with actual audiences. In our focus groups, we asked people why they liked going to concert performances, and we frequently heard people respond that they valued feeling isolated from external influences in the concert hall, letting themselves be swept away by the music. Probably because a concert hall is a bit of a workplace for me, I had totally forgotten this escapist aspect of concert attendance. So here, the project really made me aware of my own professional biases and ‘put me back on the ground’.

Q: Would you ever write an EU project proposal again?

Well, yes, I would, definitely with a consortium and project as inspiring as PHENICX. But I hope that next time I’ll have a bit more time than the three weeks in which we raced to complete the PHENICX proposal. 😉

Curriculum Vitae:

Cynthia Liem obtained her BSc and MSc degrees in Media and Knowledge Engineering (Computer Science) with honors at Delft University of Technology, The Netherlands, and is currently a PhD student at the Multimedia Information Retrieval Lab of the same university, working under the supervision of Prof. Alan Hanjalic. In addition, she holds Bachelor and Master of Music degrees in classical piano performance from the Royal Conservatoire in The Hague. Her research interests are strongly motivated by her background in both engineering and music, and concentrate on multimedia content analysis for the music information retrieval domain.

From this background, she has been very active in getting music on the multimedia research agenda, particularly at the ACM Multimedia Conference, where she initiated and served as the main organizer of the ACM MIRUM workshop (2011, 2012). This led to her becoming a co-chair of a dedicated ‘Music & Audio’ area at ACM MM 2013, and currently of the broader ‘Music, Speech, and Audio Processing in Multimedia’ area for ACM MM 2014. She was also a main initiator of the EU FP7 PHENICX project (2013 – 2016), in which she now serves as work package leader and dissemination coordinator.

She is the recipient of several international scholarships and awards, including the Lucent Global Science Scholarship in 2005, the Google Anita Borg Scholarship in 2008, the Google European Doctoral Fellowship in Multimedia in 2010 (which partially supports her PhD research work), and the UfD Best PhD Candidate Award at Delft University of Technology in 2012. Besides her ongoing academic and musical activities, Cynthia has interned at Bell Labs Europe Netherlands, Philips Research, Google UK and Google Research, Mountain View, USA.

The interviewer, Mathias Lux, is an Associate Professor at the Institute for Information Technology (ITEC) at Klagenfurt University, where he has been since 2006. He received his M.S. in Mathematics in 2004 and his Ph.D. in Telematics in 2006 from Graz University of Technology. Before joining Klagenfurt University, he worked in industry on web-based applications, as a junior researcher at a research center for knowledge-based applications, and as a research and teaching assistant at the Knowledge Management Institute (KMI) of Graz University of Technology. In research, he is working on user intentions in multimedia retrieval and production, visual information retrieval, and serious games. In his scientific career he has (co-)authored more than 60 scientific publications, has served on multiple program committees and as a reviewer for international conferences, journals, and magazines, and has organized several scientific events. He is also well known for managing the development of the award-winning and popular open source tools Caliph & Emir and LIRE for visual information retrieval.

An interview with Aljosa Smolic


Dr. Aljosa Smolic joined Disney Research Zürich, Switzerland in 2009, as Senior Research Scientist and Head of the “Advanced Video Technology” group. Before that, he was Scientific Project Manager at the Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut (HHI), Berlin, also heading a research group. He has been involved in several national and international research projects, where he conducted research in various fields of video processing, video coding, computer vision and computer graphics and published more than 100 refereed papers in these fields. In current projects he is responsible for research in 2D video, 3D video and free viewpoint video processing and coding.

Q: What is your main area of research?

A: I’m working on video processing in a general sense and visual computing. I’m interested in everything related to pixel processing like camera systems, processing visual information, perception and computational systems that are creating high quality output for the user.

Q: What got you interested in this area in the first place?

A: In my studies in electrical engineering I was focusing on audio processing, in the sense that if I didn’t become a rock star, I could still be an audio engineer. Then I got the opportunity to work at Fraunhofer HHI on image processing, where I turned my signal processing interests from audio to images, and that’s how I ended up here.

Q: Does your research & work influence your private life a lot, like owning a stereoscopic TV, taking a lot of videos and photos, etc.?

A: Yes, in the sense that I’m very critical of any type of visual information. I’m also very picky when watching television, and I notice all the small imperfections. I have an expert view on cinema, any type of multimedia presentation, and audio. On the other hand, I don’t create much content myself. I don’t have a special camera, I don’t do much filming, and I don’t have much fancy 3D equipment at home.

Q: Speaking of 3D equipment at home … Obviously 3D TV home equipment didn’t start off too well. Do you think 3D TV will rise again in say 10-15 years, or will we skip towards the “holodeck”?

A: The holodeck … I formulated that as my long-term research question, so I’m still working on it, and it’s still a long way off. We are not there yet, and stereo or 3D TV at home didn’t reach the broad adoption that many people expected two or three years ago. I believe TV is a more difficult case than, for instance, home cinema on Blu-ray. I think the business and technology around 3D Blu-ray discs work well: you can buy content that is very well produced to be consumed in a situation very similar to watching a movie in a cinema. But I think it’s more difficult to adopt stereoscopic technology for the classic TV watching experience, which should be more social. The quality of the content should be better, and the need to wear glasses is not that accepted for watching TV.

Q: What are possible technological advances between now and the holodeck? Will something like IllumiRoom (a project from Microsoft Research that projects peripheral content around a screen) or higher resolutions like 4K have an impact?

A: Things like IllumiRoom and Philips Ambilight are all steps towards the holodeck, as much as stereoscopic TV was. I believe many more steps in different directions are necessary to get a 3D immersive experience. Regarding higher resolutions, I’m not so enthusiastic about 4K. From what I have seen so far, the difference between HD and 4K is very subtle. Only under very specific conditions and at very specific viewing distances are you able to perceive any difference. So I don’t think it matters that much, and I don’t see 4K having much of an impact over HD.

I am rather looking forward to HDR (high dynamic range). I’ve seen a few demos that offered an impressive level of experience, and those displays are starting to become available in professional and consumer markets.

Q: If you would re-start your PhD right now, would you end up in the same field or do you think there is another research direction that is more interesting to you right now?

A: I don’t know … I could always do theoretical physics and go to CERN to try to create black holes, which is always an option. The other option would be to work more on the rock star career. Well, but I’m pretty happy where I ended up right now.

Curriculum Vitae:

Dr. Aljoša Smolić joined Disney Research Zurich, Switzerland in 2009, as Senior Research Scientist and Head of the “Advanced Video Technology” group. Before that, he was Scientific Project Manager at the Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut (HHI), Berlin, also heading a research group. He has been involved in several national and international research projects, where he conducted research in various fields of video processing, video coding, computer vision and computer graphics and published more than 100 refereed papers in these fields. In current projects he is responsible for research in 2D video, 3D video and free viewpoint video processing and coding. He received the Dipl.-Ing. degree in Electrical Engineering from the Technical University of Berlin, Germany in 1996, and the Dr.-Ing. degree in Electrical Engineering and Information Technology from Aachen University of Technology (RWTH), Germany, in 2001. Dr. Smolic received the “Rudolf-Urtel-Award” of the German Society for Technology in TV and Cinema (FKTG) for his dissertation in 2002. He is Area Editor for Signal Processing: Image Communication and has served as Guest Editor for the Proceedings of the IEEE, IEEE Transactions on CSVT, IEEE Signal Processing Magazine, and other scientific journals. He chaired the MPEG ad hoc group on 3DAV, pioneering standards for 3D video. In this context he also served as one of the editors of the Multi-view Video Coding (MVC) standard. For many years he has been teaching full lecture courses on Multimedia Communications and other topics, now at ETH Zurich.

Dr. Mathias Lux is a Senior Assistant Professor at the Institute for Information Technology (ITEC) at Klagenfurt University, where he has been since 2006. He received his M.S. in Mathematics in 2004 and his Ph.D. in Telematics in 2006 from Graz University of Technology. Before joining Klagenfurt University, he worked in industry on web-based applications, as a junior researcher at a research center for knowledge-based applications, and as a research and teaching assistant at the Knowledge Management Institute (KMI) of Graz University of Technology. In research, he is working on user intentions in multimedia retrieval and production, visual information retrieval, and serious games. In his scientific career he has (co-)authored more than 60 scientific publications, has served on multiple program committees and as a reviewer for international conferences, journals, and magazines, and has organized several scientific events. He is also well known for managing the development of the award-winning and popular open source tools Caliph & Emir and LIRE for visual information retrieval.