An interview with Prof. Alan Smeaton

Prof. Alan Smeaton in 2017.

A young Alan Smeaton before the start of his career.

Please describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

I started a University course in Physics and Mathematics and, in order to make up my credits, I needed to add another subject, so I chose Computer Science, which was then a brand new topic in the Science Faculty. Maybe it was because the class sizes were small so the attention we got was great, or maybe I was drawn to the topic in some other way, but I dropped Physics, took the Computer Science modules instead, and never looked back. I was fortunate in that my PhD supervisor was Keith van Rijsbergen, one of the “fathers” of information retrieval, who had developed the probabilistic model of IR. Having him as my supervisor was the first lucky thing to happen to me in my research. His approach was to let me make mistakes, to go down cul-de-sacs and discover them for myself, and as a result I emerged as a more rounded, streetwise researcher. I’ve tried to use the same philosophy with my own students.

For many years after completing my PhD I was firmly in the information retrieval area. I hosted the ACM SIGIR Conference in Dublin in the mid-1990s, was Program Co-Chair in 2003, and served as workshops or tutorials chair in other years. My second lucky break in my research career came in 1991, when Donna Harman of NIST asked me if I’d like to join the program committee of a new initiative she was forming called TREC, which was going to look at information retrieval on test collections of documents and queries, but in a collaborative, shared framework. I jumped at the opportunity and got really involved in TREC through those early years in the 1990s. In 2001 Donna asked me if I’d chair a new TREC track that she wanted to see happen, doing content analysis and search on digital video, which was then emerging and in which our lab was establishing a reputation for novel research. Two years later that TREC activity had grown so big that it was spun off as a separate activity, and TRECVid was born, starting formally in 2003 and continuing each year since then. That’s my third lucky break.

Sometime in the early 2000s I went to my first ACM MULTIMEDIA conference because of my role leading TRECVid, and I loved it. The topics, the openness, the collaborations, the workshops and the intersection of disciplines all appealed to me, and I don’t think I’ve missed an ACM MULTIMEDIA Conference since.

Speaking of ACM MULTIMEDIA, this year there was some criticism that there was no female keynote speaker. What do you think about this, and how do you see the role of women in research, especially in the field of multimedia?

The first I heard of this was when I saw it on the conference website, and I don’t agree with it. I will be proposing several initiatives to the Executive Committee of SIGMM to improve the gender balance and diversity in our sponsored conferences, covering invited panel speakers and invited keynote speakers, and raising the importance of the women’s lunch event at the ACM MULTIMEDIA conference, starting with this year. I will also propose including a role for a Diversity Chair in some of the SIGMM sponsored events. I’ve learned a lot in a short period of time from colleagues in ACM SIGCHI whom I reached out to for advice, and I’ve looked at the practices and experiences of conferences like ACM CHI, ACM UIST, and others. However, these are just suggestions at the moment; they need to be proposed to and approved by the SIGMM Executive, so I can’t say much more about them yet, but watch this space.

Tell us more about the vision and objectives behind your current roles. What do you hope to accomplish, and how will you bring this about?

I hold a variety of roles in my professional work. As a Professor and teacher I am responsible for delivering courses to first-year, first-semester undergraduates, which I love doing because these are the fresh-faced students just arriving at University. I also teach at advanced Masters level, and that’s something else I love, albeit with different challenges. As a Board member of the Irish Research Council I help oversee the policies and procedures for the Council’s funding of about 1,400 researchers from all disciplines in Ireland. I’m also on the inaugural Scientific Committee of COST, the EU funding agency which funds networking of researchers across more than 30 European countries and further afield. Each year COST funds networking activities for over 40,000 researchers across all disciplines, which is a phenomenal number, and my role on the Scientific Committee is to oversee the policies and procedures and to help select those areas (called Actions) that get funded.

Apart from my own research team, working with them as part of the Insight Centre for Data Analytics, and the work I do each year in TRECVid, the other major responsibility I have is as Chair of ACM SIGMM, a role I took up in July 2017, just two months ago. While I had a vision of what I believed should happen in SIGMM, and I wrote some of this in my candidature statement (reproduced at the end of this interview), since assuming the role and seeing what SIGMM is like “from the inside” I am finding that vision and those objectives evolving as I learn more. Certainly there are some fundamentals that will remain as important as ever: welcoming and supporting early career researchers, broadening our reach to new communities, both geographically and in terms of research topics, ensuring our conferences maintain their very high standards, and being open to new initiatives and ideas. We expect to announce a new annual conference in multimedia for Asia shortly, and that will be added to the four existing annual events we run. In addition, I am realising that we need to increase our diversity, gender being one obvious instance of that, but there are others. Finally, I think we need to constantly monitor our identity as a community of researchers linked by the bond of working in Multimedia. As the area of Multimedia itself evolves, we have to lead and drive that evolution, and change with it.

I know that may seem like a lot of aspiration without much detail, but as I said earlier, that’s because I’ve only been in the role a couple of months, and the details need to be worked out and agreed with the SIGMM Executive Committee, not by me alone, and that will happen over the next few months.

Prof. Alan Smeaton in 2017.

That multimedia evolves is an interesting statement. I often hear people discussing the definition of multimedia research, and their definitions are quite diverse. What is your “current” definition of multimedia research?

The development of every technology follows a similar pathway. Multimedia is not a single technology but a constellation of technologies, yet it follows the same kind of pathway. It starts from a blue skies idea that somebody has, like let’s put images and sound on computers, and then it becomes theoretical research, perhaps involving modelling in some way. That then turns into basic research about the feasibility of the idea, and gradually the research gets more and more practical. Somewhere along the way, not necessarily from the outset, applications of the technology are taken into consideration, and that is important to sustain research interest and funding. As applications for the technology start to roll out, this triggers a feedback loop with more and more interest directed back towards the theory and the modelling, improving the initial ideas and taking them further, pushing the boundaries of the implementations and making the final applications more compelling, cheaper and faster, with greater reach and more impact. Eventually, the technology may get overtaken by some new blue skies idea, leading to new theories, new feasibility studies and new practical applications. Technology for personal transport is one such example, with horse-drawn carriages leading to petrol-driven cars and now, as we are witnessing, to autonomous, electric-powered vehicles.

Research into multimedia is in the mid-life stage of the cycle. We’re in that spiral where new foundational ideas, new theories, new models for those theories, new feasibility studies, new applications, and new impacts, are all valid areas to be working in, and so the long answer to your question about my definition of multimedia research is that it is all of the above.

At the conference, people often talk about their research being criticized for being too applied; hearing this from so many, it seems to be a general problem in multimedia. Based on your experience on national and international funding panels, it would be interesting to hear your opinion about this issue and how researchers in the multimedia community could tackle it.

I’ve been there too, so I understand what they are talking about. Within our field of multimedia we cover a broad church of research topics, application areas, theories and techniques, and saying that a piece of work is too applied is not an appropriate reason for failing to appreciate it.

“Too applied” should not be confused with research impact, which is something completely different. Research impact refers to when our research contributes to or generates some benefit outside of academic or research circles and starts to influence the economy, society or culture. That’s something we should all aspire to as members of our society, and when it happens it is great. Yet not all research ideas will develop into technologies or implementations that have impact. Funding agencies right across the world now like to include impact as part of their evaluation and assessment, and researchers are now expected to include an impact assessment as part of funding proposals.

I do have concerns that for really blue skies research the eventual impact cannot really be estimated. This is what we call high risk / high return, and while some funding agencies like the European Research Council actively promote such high-risk exploratory work, other agencies tend to go for the safer bet. Happily, we’re seeing more and more blue skies funding, like the Australian Research Council’s and the Irish Research Council’s Laureate schemes.

Can you profile your current research, its challenges, opportunities, and implications?

This is a difficult question for me to answer, since the two most dominant characteristics of my research are that it is hugely varied and that it is based on a large number of collaborations with researchers in diverse areas. I am not a solo researcher, and while I respect and admire those who are, I am at the opposite end of that spectrum. I work with people.

For example, today, as I write this interview, has been a busy day for me in terms of research. I’ve done a bit of writing on a grant proposal which proposes using data from a wearable electromyography (EMG) sensor, coupled with other sensors, to determine the quality of a surgical procedure. I’ve reviewed a report from a project I’m part of which uses low-grade virtual reality in a care home for people with dementia. I’ve looked at some of the sample data we’ve just got, where we’re applying our people-counting work to drone footage of crowds. I wrote a section of a paper describing our work on human-in-the-loop evaluation of video captioning, I met a Masters student who is doing work on propensity modelling for a large bank, and now at the end of the day I’m finishing this interview. That’s an atypical day for me, but the range of topics is not unusual.

What are the challenges and opportunities in this … well, it is never difficult to get motivated because the variety of work makes it so interesting, so the challenge is in managing it all so that each piece gets a decent slice of time and effort. Prioritisation of work tasks is a life skill which is best learned the hard way. It is something we can’t teach, and while it comes naturally to some people, for most of us it is something we need to be aware of. So if I have a takeaway message for the young researcher it is this … always try to make your work interesting and to explore interesting things, because then it is not a chore, it becomes a joy.

This was a very inspiring answer, and I think it describes perfectly how diverse and interesting multimedia research is. Thinking about the list of projects you describe, it seems that all of them address societally important challenges (health care, security, etc.). How important do you think it is to address problems that are helpful for society, and do you think that more researchers in the field of multimedia should follow this path?

I didn’t deliberately set out to address societal challenges in my work, and I don’t advocate that everyone should do so in all their work. The samples of my work I mentioned earlier just happen to be like that, but sometimes it is worth doing something just because it is interesting, even though it may end up as a cul-de-sac. We can learn so much from going down such cul-de-sacs, both for ourselves as researchers and for our own development, as well as contributing the knowledge that something does not work.

So far in this interview you have not mentioned A.I. or deep learning. Could you please share your view on this hot topic and its influence on the multimedia community (if possible, both positive and negative aspects)?

Imagine, a whole conversation on multimedia without mentioning deep learning, so far! Yes, indeed it is a hot topic, and there’s a mad scramble to use and try it for all kinds of applications because it is showing such improvement in many tasks, and yes, indeed it has raised the bar in terms of the quality of some tasks in multimedia, like concept indexing from visual media. However, those of us around long enough will remember the “AI Winter” from a few decades ago, and we can’t let this great breakthrough inflate the expectations that we and others may have about what we can do with multi-modal and multimedia information.

So that’s the word of caution about expectations, but when this all settles down a bit and we analyse the “why” behind the success of deep learning, we will realise that the breakthrough is a result of closer modelling of our own neural processes. Early implementations of our own neural processing were in the form of multi-connected networks, and things like the Connection Machine were effectively unstructured networks. What deep learning is doing is applying structure to the network by adding layers. Going forward, I believe we will turn more and more to neuroscience to inform us about other, more sophisticated network structures besides layers, which reflect how the brain works, and just as today’s layered neural networks replicate one element of that, we will use other neural structures for even more sophisticated (and deeper) learning.

ACM candidature statement:

I am honored to run for the position of Chair of SIGMM. I have been an active member of ACM since I hosted the SIGIR conference in Dublin in 1994 and have served in various roles for SIGMM events since the early 2000s.

I see two ways in which we can maintain and grow SIGMM’s relevance and importance. The first is to grow collaborations we have with other areas. Multimedia technologies are now a foundation stone in many application areas, from digital humanities to educational technologies, from gaming to healthcare. If elected chair I will seek to reach out to other areas collaboratively, whereby their multimedia problems become our challenges, and developments in our area become their solutions.

My second priority will be to support a deepening of collaborations within our field. Already we have shown leadership in collaborative research with our Grand Challenges, Videolympics, and the huge leverage we get from shared datasets, but I believe this could be even better.

Reaching out to others and deepening collaborations will improve SIGMM’s ability to attract and support new members while keeping existing members energised and rejuvenated, ensuring SIGMM is the leading special interest group on multimedia.

An interview with Prof. Ramesh Jain

Prof. Ramesh Jain in 2016.

Please describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

I am luckier than most people in that I have been able to experience really diverse situations in my life. Computing was just being introduced at Indian universities when I was a student, so I never had a chance to learn computing in a classroom setting. I took a few electronics courses as part of my undergraduate education, but nothing even close to computing. I first used computers during my doctoral studies at the Indian Institute of Technology, Kharagpur, in 1970. I was instantly fascinated and decided to use this emerging technology in the design of sophisticated control systems. What I learned along the way was driven by my interests and passion.

I grew up in a traditional Indian Ashram, with no facilities for childhood education, so this was not the first time I faced a lack of formal instruction.  My father taught me basic reading, writing, and math skills and then I took a school placement exam.  I started school at the age of nine in fifth grade.

During my doctoral days, two areas fascinated me: computing and cybernetics. I decided to do my research in digital control systems because it gave me a chance to combine computing and control. At the time, the use of computing was very basic: digitizing control signals and understanding the effect of digitization. After my PhD, I became interested in artificial intelligence and entered AI through pattern recognition.

In my current research, I am applying cybernetics to health.  Computing has finally matured enough that it can be applied in real control systems that play a critical role in our lives.  And what is more important to our well-being than our health?

The main driver of my career has been realizing that ultimately I am responsible for my own learning. Teachers are important, but ultimately I learn what I find interesting.  The most important attribute in learning is a person’s curiosity and desire to solve problems.  

Something else significantly impacted my thinking in my early research days.  I found that it is fundamental to accept ignorance about a problem and then examine concepts and techniques from multiple perspectives.  One person’s or one research paper’s perspective is just that—an opinion.  By examining multiple perspectives and relating those to your experiences, you can better understand a problem and its solutions.

Another important lesson is that problems and concepts are often independent of the academic and other organizational walls that exist. Interesting problems always require perspectives, concepts, and technologies from different academic disciplines. Over time, it then becomes necessary to create new disciplines, or what Thomas Kuhn called new paradigms [Kuhn 62].

In the late 1980s, much of my research was addressing different aspects of computer vision.  I was frustrated by the slow progress in computer vision.  In fact, I coauthored a paper on this topic that became quite controversial [Jain 91].  It was clear that computer vision could be central to computing in the real world, such as in industry, medical imaging, and robotics, but it was unable to solve any real problems.  Progress was slow.  

While working on object recognition, it became increasingly obvious to me that images alone do not contain enough information to solve the vision problem. Projecting the real world onto a photograph results in a loss of information that can only be recovered by combining information from many other sources, including knowledge in many different forms, metadata, and other signals. I started thinking that our goal should be to understand the real world using sensors and other sources of knowledge, not just images. I felt that we were addressing the wrong problem: understanding the physical world using only images. The real problem is to understand the physical world. The physical world can only be understood by capturing correlated information. To me, this is multimedia: understanding the physical world using multiple disparate sensors and other sources of information.

This is a very good definition of multimedia. In this context, what do you think is the future of multimedia research in general?

Different aspects of the physical world must be captured using different types of sensors. In the early days, multimedia concerned itself with the two most dominant human senses: vision and hearing. As the field advances, we must deal with every type of sensor that is developed to capture information in different applications. Multimedia must become the area that processes disparate data in context to convert it into information.

Given that you have been working in AI for such a long time, what do you think about the current trend of deep learning, and how will it develop?

Every field has its trends. Learning is definitely a very important step in AI and has attracted attention from the early days. However, it was known that reasoning and search play an equally important role in AI. Ultimately, problem solving depends on recognizing real-world objects and patterns, and here learning plays a key role. To design successful deep systems, learning needs to be combined with search and reasoning.

Prof. Ramesh Jain at an early stage of his career (1975).

Please tell us more about your vision and objectives behind your current roles. What do you hope to accomplish, and how will you bring this about?

One thing that is of great interest to every human is their health.  Ironically, technology utilization in healthcare is not as pervasive as in many other fields.  Another intriguing fact about technology and health is that almost all progress in health is due to advances in technology, but barriers to using technology are also the most overwhelming in health.  I experienced the terrifying state of healthcare first hand while going through treatment for gastro-esophageal cancer in 2004.  It became clear to me during my fight with cancer that technology could revolutionize most aspects of treatment—from diagnosis to guidance and operationalization of patient care and engagement—but it was not being used.  During that period, it became clear to me that multimodal data leading to information and knowledge is the key to success in this and many other fields.  That experience changed my thinking and research.

Ancient civilizations observed that health is not the absence of disease; disease is a perturbation of a healthy state. This wisdom was based on empirical observations and resulted in guidelines for healthy living that include diet, sleep, and whole-body exercise, such as yoga or tai chi. Now is the time to develop scientific guidelines based on the latest evolving knowledge and technology to maximize periods of overall health and minimize suffering during diseases in human lives. It seems possible to raise life expectancy to 100+ years for most people. I want to cross the 100-year threshold myself and live an active life until my last day. I am working toward making that happen.

Technology for healthcare is a topic of growing interest. Data is at the center of healthcare, and new areas like precision health and wellness are becoming increasingly popular. At the University of California, Irvine (UCI), we’ve created a major effort to bring together researchers from Information and Computer Sciences, Health Sciences, Engineering, Public Health, Nursing, Biology, and other fields who are adopting a novel perspective in an effort to build technology that empowers people. From this perspective, we adopt a cybernetics approach to health. This work is being done at UCI’s Institute for Future Health, of which I am the founding director.

At the Institute for Future Health, currently we are building a community that will do academic research as well as work closely with industry, local communities, hospitals, and start-up companies. We will also collaborate with global researchers and practitioners interested in this approach.  There is significant interest from several institutions in several countries to collaborate and pursue this approach.

This is very interesting and relevant! Do you think that the multimedia community will be open to such a direction, or, since it is so important and societally relevant, would it be better to build a new research community around this idea?

As you said, this is the most important research direction I have been involved in, and the most challenging. It is also an important direction in its own right, and it needs to draw on all technological and other resources.

Since I cannot wait for any community to be ready to address this, I started building a community to address Future Health. But I believe that this could be the most relevant application for multimedia technology, and that techniques from multimedia are very relevant to this area.

It is an exciting problem, because the time is right to address this area.

Do you think that the multimedia community has the right skills to address medical multimedia problems and how could the community be encouraged into that direction?

The multimedia community is better equipped than any other community to deal with diverse types of data. New tools will be required for new challenges, but we already have enough tools and techniques to address many current challenges. To do this, however, the community has to become an open, forward-looking community, going beyond visual information to consider all the other modes that are currently ignored as ‘metadata’. All data is data and contributes to information.

Can you profile your current research and its challenges, opportunities, and implications?

I am involved in a research area that is one of the most challenging and that has implications for every human.

The most exciting aspect of health is that it is truly a multimodal, data-intensive operation. As discussed by Norbert Wiener in his book Cybernetics [Wiener 48] about 70 years ago, control and communication processes in machines and animals are similar and are based on information. Until recently, these principles formed the basis for understanding health, but they can now be used to control health as well. This is exciting for everybody, and it motivates me to work hard and make something happen, for others but also for me.

We can discuss some fundamental components of this area from a cybernetics/information perspective:

Creating individual health models: Each person is unique. Our bodies and lives are determined by two major factors: genetics and lifestyle. Until recently, personal genome information was difficult to obtain, and personal lifestyle information was only collected anecdotally. This century is different. Personal genomic data, in fact all Omics data, is becoming easier to get and more precise and informative. And mobile phones, wearables, the Internet of Things (IoT) around us, and social media are all coming together to quantitatively determine different aspects of our lifestyles as well as many bio-markers.

This requires combining multimodal data from different sources, which is a challenge. By collecting all such lifestyle data, we can start assembling a log of information, a kind of turbocharged multimodal lifelog, that could be used to build a model of a person using event mining tools. By combining genomic and lifestyle data, we can form a complete model of a person that contains all detailed health-related information.

Aggregating individual health models to population disease models:  Current disease models rely on limited data from real people.  Until recently, it was not possible to gather all such data. As discussed earlier, the situation is rapidly changing.  Once data is available for individual health models, it could be sliced and diced to formulate disease models for different populations and demographics.  This will be revolutionary.

Correlating health and related knowledge to actions for each individual and for society: Cybernetics underlies most complex real-time engineering systems. The concept of feedback, used to generate the correct signal to apply to a system to take it from its current state to a desired state, is essential in all real-time control systems. Even in the human body, homeostasis uses similar principles. Can we use this to guide people in their lifestyle choices and medical compliance?
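The feedback principle can be illustrated with a minimal proportional controller. This is a textbook simplification, not the control law used at the Institute for Future Health: the names `control_signal` and `simulate`, the gain value, and the assumption that the system simply adds each correction to its state are all hypothetical.

```python
from __future__ import annotations


def control_signal(current: float, desired: float, gain: float) -> float:
    """Proportional feedback: the correction grows with the distance to the goal."""
    error = desired - current  # how far the system is from the desired state
    return gain * error


def simulate(start: float, desired: float, gain: float = 0.5, steps: int = 20) -> list[float]:
    """Apply the feedback signal repeatedly to a toy system.

    Hypothetical plant model: the next state is the current state plus the
    control signal, so each step multiplies the remaining error by (1 - gain).
    """
    state = start
    trajectory = [state]
    for _ in range(steps):
        state += control_signal(state, desired, gain)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    path = simulate(start=0.0, desired=10.0)
    print([round(x, 3) for x in path[:5]])  # [0.0, 5.0, 7.5, 8.75, 9.375]
```

With this toy plant, a gain between 0 and 1 moves the state steadily toward the goal, a gain between 1 and 2 overshoots and oscillates before settling, and a gain above 2 diverges; choosing that gain well is the essence of designing a feedback loop.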

Navigation systems are a good example of how an old, tedious problem can become extremely easy to use.  Only 15 years ago, we needed maps and a lot of planning to visit new places.  Now, mobile navigation systems can anticipate upcoming actions and even help you correct your mistakes gracefully, in real time.  They can also identify traffic conditions and suggest the best routes.

If technology can do this for navigation in the physical world, can we develop technology to help us select appropriate lifestyle decisions and do so perpetually?  The answer is obviously yes.  By compiling all health and related knowledge, determining your current personal health situation and surrounding environmental situations, and using your past chronicle to log your preferences, it can provide you with suggestions that will make your life not only more healthy but also more enjoyable.

This is our dream at the Institute for Future Health.

Future Health: Perpetual enhancement of health by managing lifestyle and environment.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact they have today and into the future?

I am lucky to have been active for more than four decades and to have had the opportunity to participate in research and entrepreneurial activities in multiple countries at the best organizations. This gave me a chance to interact with the brightest young people as well as seasoned creative visionaries and researchers.  Thus, it is difficult for me to decide what to list.  I will adopt a chronological approach to answer your question.

Working in H.H. Nagel’s research group in Hamburg, Germany, I got involved in developing an approach to motion detection and analysis in 1976. We wrote the first papers on video analysis that worked with traffic video sequences and detected and analyzed the motion of cars, pedestrians, and other objects. Our paper at IJCAI 1977 [Jain 77] was remarkable in showing these results at a time when digitizing a picture was a chore lasting minutes and the most powerful computer could not store a full video frame in its memory. Even today, the first step in many video analysis systems is differencing, as proposed in that work.
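The differencing step can be sketched as follows, assuming grayscale frames stored as nested lists of integer intensities. The threshold of 25 and the names `difference_mask` and `bounding_box` are illustrative choices, not taken from the 1977 paper; practical systems add noise filtering and grouping of changed pixels into separate objects, which this sketch omits.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]  # grayscale intensities, one inner list per image row


def difference_mask(prev: Frame, curr: Frame, threshold: int = 25) -> List[List[bool]]:
    """Flag pixels whose intensity changed by more than `threshold` between two frames."""
    if len(prev) != len(curr) or any(len(a) != len(b) for a, b in zip(prev, curr)):
        raise ValueError("frames must have the same dimensions")
    return [
        [abs(p - c) > threshold for p, c in zip(row_prev, row_curr)]
        for row_prev, row_curr in zip(prev, curr)
    ]


def bounding_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) of all changed pixels, or None if nothing moved."""
    cells = [(r, c) for r, row in enumerate(mask) for c, moved in enumerate(row) if moved]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


if __name__ == "__main__":
    background = [[10] * 6 for _ in range(5)]
    current = [row[:] for row in background]
    for r in (1, 2):
        for c in (3, 4):
            current[r][c] = 200  # a bright "object" appears in the second frame
    print(bounding_box(difference_mask(background, current)))  # (1, 3, 2, 4)
```

Differencing is attractive precisely because it is cheap: it needs only one subtraction and comparison per pixel, with no model of what the moving objects look like.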

Many bright people contributed powerful ideas in computer vision from my groups. E. North Coleman was possibly the first person to propose Photometric Stereo in 1981 [Coleman]. Paul Besl’s work on segmentation using surface characteristics and 3D object recognition made a significant impact [Besl]. Tom Knoll did some exciting research on feature-indexed hypotheses for object recognition. But Tom’s major contribution to current computer technology was his development of Photoshop when he was doing his PhD in my research group. As we all know, Photoshop revolutionized how we view photos. Working with Kurt Skifstad at my first company, Imageware, we demonstrated the first version of a system that captured the 3D shape of a person’s face and reproduced it using a machine in the next room, at the Autofact Conference in 1994. I guess that was a primitive version of 3D printing. At the time, we called it 3D fax.

The idea of designing a content-based organization to build a large database of images was considered crazy in 1990, but it bugged me so much that I started a project, and later a company, Virage, with several people. In fact, Bradley Horowitz left his research at MIT to join me in building Virage, and later he managed the project that brought Google Photos to its current form. The process of building video databases made me realize that photos and videos are a lot more than just intensity values. That realization led me to champion the idea that information about the physical world can be recovered more effectively and efficiently by combining correlated, but incomplete, information from several sources, including metadata. This was the thinking that encouraged me to start building the multimedia community.

Since computing and camera technology had advanced enough by 1994, my research group at the University of California, San Diego (UCSD), particularly Koji Wakimoto [Jain 95] and then Arun Katkere and Saeed Moezzi [Moezzi 96], helped develop first Multiple Perspective Interactive Video and later Immersive Video to realize compelling telepresence. That research area, in various forms, attracted people from the movie industry as well as people interested in different art forms and collaborative spaces. By licensing our patents from UCSD, we started a company, Praja, to bring immersive video technology to sports. I left academia to be the CEO of Praja.

While developing technology for indexing sporting events, we realized that events are as important as objects, if not more so, when indexing multimedia data. Information about events comes from separate sources, and events combine different dimensions that play a key role in our understanding of the world. This realization led Westermann and me to work on a general computational model for events. Later we realized that by aggregating events over space and time, we could detect situations. Vivek Singh and Mingyan Gao helped prototype the EventShop platform [Singh 2010], which was later converted into an open source platform under the leadership of Siripen Pongpaichet.
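Aggregating events over space and time to detect situations can be illustrated with a minimal sketch. It assumes that events are simple (latitude, longitude, timestamp) records, that space is divided into a fixed one-degree grid, and that a count threshold stands in for the "situation" rule. The names `Event`, `aggregate`, and `detect_situation` are invented for illustration. They are not EventShop's actual data model or API, which the interview does not describe.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A hypothetical minimal event record: where and when something was observed."""
    lat: float
    lon: float
    t: float  # timestamp in seconds (assumed unit)


def aggregate(events, t_start, t_end, cell_deg=1.0):
    """Count events per spatial grid cell inside the time window [t_start, t_end)."""
    counts = Counter()
    for e in events:
        if t_start <= e.t < t_end:
            # Floor division maps each coordinate to a grid-cell index.
            cell = (int(e.lat // cell_deg), int(e.lon // cell_deg))
            counts[cell] += 1
    return counts


def detect_situation(counts, threshold):
    """Hypothetical rule: a cell is in an 'alert' situation if its count reaches threshold."""
    return {cell for cell, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    events = [
        Event(32.7, -117.1, 10),
        Event(32.9, -117.4, 20),
        Event(32.1, -117.9, 30),
        Event(40.7, -74.0, 15),
        Event(32.5, -117.2, 500),  # outside the 60-second window below
    ]
    counts = aggregate(events, t_start=0, t_end=60)
    print(dict(counts))                           # {(32, -118): 3, (40, -74): 1}
    print(detect_situation(counts, threshold=3))  # {(32, -118)}
```

A fixed grid and a single threshold keep the example small. A real situation model would combine several heterogeneous event streams with richer operators.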

One of the most fundamental problems in society is connecting people’s needs to appropriate resources effectively, efficiently, and promptly in a given situation. To understand people’s needs, it is essential to build objective models that can be used to recommend the correct resources in a given situation. Laleh Jalali started building an event-mining framework that could be used to build an objective self model from the many types of personal data streams that have now become easily available [Jalali 2015].

All this work is leading to a framework that is behind my current thinking related to health intelligence. In health intelligence, our goal is to perpetually measure a person’s activities, lifestyle, environment, and bio-markers to understand his/her current state as well as continuously build his/her model. Using that model, current state, and medical knowledge, it is possible to provide perpetual guidance to help people take the right action in a given situation.

Over your distinguished career, what are the top lessons you want to share with the audience?

I have been lucky to get a chance to work on several fun projects. More importantly, I have worked closely on an equal number of successful and not so successful projects. I consider a project successful if it accomplishes its goal and the people working on it enjoy it. Although each project is unique, I’ve noticed some common themes that make a project successful.

Passion for the Project: Time and again, I’ve seen that passion for the project makes a huge difference. When people are passionate, they don’t consider it work and will do whatever is required to make it successful. In my own case, the ideas that I find compelling, both in terms of their goals and implications, are the ones that motivate me to do my best; on those, I am focused, driven, and willing to work hard. I learned long ago to work only on problems that I find important and compelling. Some ideas are just not for me, and when I find myself on such a project, it is better for the project and for me if I dissociate myself from it at the first opportunity.

Open Mind: Departmental or similar boundaries in both academia and industry severely restrict how a problem is addressed. Solving a problem should be the goal, not using the resources or technology of a specific department. In academia, I often hear things like “this is not a multimedia problem” or “this is a database problem.” Usually, the goal of a project is to solve a problem, so we should use the best technique or resource available to solve it.

Most of the boundaries for academic disciplines are artificial, and because they keep changing, the departments based on any specific factor will likely also change over time.  By addressing challenging problems using appropriate technology and resources, we push boundaries and either expand older boundaries or create new disciplines.

Another manifestation of an open mind is the ability to see the same problem from multiple perspectives.  This is not easy—we all have our biases.  The best thing to do is to form a group of researchers from diverse cultural and disciplinary backgrounds.  Diversity naturally results in diverse perspectives.

Persistence: Good research is usually the result of sustained efforts to understand and solve a challenge. Many intrinsic and extrinsic issues must be handled during a successful research journey. By definition, an important research challenge requires navigating uncharted territory. Many people get frustrated in an unmapped area where there is no easy way to evaluate progress. In my experience, even some of my brightest students are comfortable only when they can say “my approach is better than approach X by N%.” In most novel problems, there is no X and no metric to judge performance. Only a few people are comfortable in such situations, where incremental progress may not be measurable. We require both kinds of people: those who can improve given approaches and those who can pioneer new areas. The second group must be confident about their research directions without concrete external evaluation measures. The ability to work confidently without external affirmation is essential for important, deep challenges.

In the current culture, a researcher’s persistence is also tested by “publish or perish” oriented colleagues who judge the quality of research by whether it is accepted at the so-called top conferences. When your papers are rejected, you are dejected and sometimes feel that you are doing the wrong research. That is not always true. The best thing about these conferences is that they test your self-confidence.

We have all read the stories about the research that ultimately resulted in the WWW and the paper on PageRank that later became the foundation of Google search. Both were initially rejected. Yet the authors were confident in their work, so they persevered. When one of my papers gets rejected (which happens more often to my good papers than to my much inferior ones), much of the time the reviewers are looking for incremental work on trendy topics and don’t have the time, openness, and energy to think beyond what they and their friends have been doing. I read and analyze reviewers’ comments to see whether they understood my work and then decide whether to take them seriously or ignore them. In other words, you have to be confident of your own ideas and review the reviews to decide your next steps.

I noticed that one of your favourite quotes is “Imagination is more important than knowledge.” In this regard, do you think there is enough “imagination” in today’s research, or are researchers mainly driven/constrained by grants, metrics, and trends? 

The complete quote by Albert Einstein is “Imagination is more important than knowledge. For knowledge is limited, whereas imagination embraces the entire world, stimulating progress, giving birth to evolution.”  So knowledge begins with imagination. Imagination is the beginning of a hypothesis. When the hypothesis is validated, that results in knowledge.

People often seek short-term rewards.  It is easier to follow trends and established paradigms than to go against them or create new paradigms.  This is nothing new; it has always happened. At one time scientists, like Galileo Galilei, were persecuted for opposing the established beliefs. Today, I only have to worry about my papers and grant proposals getting rejected.  The most engaged researchers are driven by their passion and the long-term rewards that may (or may not) come with it.

Albert Einstein (Source: Planet Science)



  1. T. S. Kuhn, The Structure of Scientific Revolutions. Chicago: University of Chicago Press, 1962. ISBN 0-226-45808-3.
  2. R. Jain and T. O. Binford, “Ignorance, Myopia, and Naiveté in Computer Vision Systems,” CVGIP: Image Understanding, 53(1), 112-117, 1991.
  3. N. Wiener, Cybernetics: Or Control and Communication in the Animal and the Machine. Paris: Hermann & Cie; Cambridge, Mass.: MIT Press, 2nd revised ed., 1961. ISBN 978-0-262-73009-9.
  4. R. Jain, D. Militzer, and H. Nagel, “Separating a Stationary Form from Nonstationary Scene Components in a Sequence of Real World TV Frames,” Proceedings of IJCAI 77, Cambridge, Massachusetts, 612-618, 1977.
  5. E. N. Coleman and R. Jain, “Shape from Shading for Surfaces with Texture and Specularity,” Proceedings of IJCAI, 1981.
  6. P. Besl and R. Jain, “Invariant Surface Characteristics for 3-D Object Recognition in Depth Maps,” Computer Vision, Graphics and Image Processing, 33, 33-80, 1986.
  7. R. Jain and K. Wakimoto, “Multiple Perspective Interactive Video,” Proceedings of the IEEE Conference on Multimedia Systems, May 1995.
  8. S. Moezzi, A. Katkere, D. Kuramura, and R. Jain, “Reality Modeling and Visualization from Multiple Video Sequences,” IEEE Computer Graphics and Applications, 58-63, November 1996.
  9. V. Singh, M. Gao, and R. Jain, “Social Pixels: Genesis and Evaluation,” Proceedings of ACM Multimedia, 2010.
  10. L. Jalali and R. Jain, “Bringing Deep Causality to Multimedia Data Streams,” Proceedings of ACM Multimedia, 221-230, 2015.



An interview with David Ayman Shamma




Describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

I’ve always been curious about solving problems. Not so much the answer; I like to know how a problem can be broken down into parts, abstracted, and reasoned with. That often drives us to think about abstraction (is there a non-specific instance of this problem?), theory (is there some known literature from the mathematical or social sciences that will help us frame what’s happening?), and analogy (can we solve this because its structure is like another problem’s?). My education included classes in psychology, philosophy, math, and engineering; eventually I realized Computer Science, and specifically Artificial Intelligence, embodied everything I was looking for: understanding people, modeling problems, and building new systems.

Interestingly enough, as an undergrad I took a job as a technician in an art department at the local state college; my job was to keep their Macs running with Adobe products. While I was there, I was allowed to audit studio art classes. I began to see how artistic and creative processes were influenced by the tools we have, be it a 1:50 D-76 bath with fiber-based paper in a darkroom or masking layers in Photoshop. This connection between creative and constructive processes carried into my work at NASA’s Center for Mars Exploration, where I worked on diagrammatic knowledge tools, and then into my Ph.D. on community-driven multimedia systems. It was around this time that I saw ACM Multimedia 2004 had a call for technical papers in the Interactive Arts. Since then I’ve been active in the community, mostly focused on the Arts track, but as my work began to include social computing in 2009, I started to think about hybrid social-visual systems. In 2013, I was the Technical Program Co-chair, and we started to look critically at the broad technical areas and the review process, and started some inclusion and diversity initiatives.

The main foundational lesson for me is to continue asking the right questions, even if you’re branching out from some smaller, under-represented area or track. In many cases, you’ll find new, exciting research questions. That said, I found I need to couple this with a personal understanding of the outside domain; only then can a truly functional hybrid system work. It’s not enough to look at divergent sources as just a big bag of the same data: pixels, tags, comments, and clicks all carry an explicit or tacit semantic implication. Respect that.

Tell us more about the vision and objectives behind your current roles. What do you hope to accomplish and how will you bring this about?

My Ph.D. dealt with social computing and community semantics: the objects in a photo carry the broader semantic context of the conversation on the online site where that photo is shared. When I graduated, I joined an industry research lab. I spent 10 years there through a few organizational shifts. In my last 4 years there, I founded the HCI Research group with a charter to investigate what our research meant to people. My group’s research spanned several domains: multimedia, computer vision, information visualisation, social computing, ethnography, and physical computing; this gave me a deep perspective across many areas. Personally, understanding how things are connected and what those connections mean became a focus of my research. Data is created for a reason, and structured link data can carry a tacit semantic that helps us understand people and tasks in the world. Lately, I’ve been thinking about physical spaces where people interact and create content. What sort of camera do you have on you? How does it change your practice of photography? What sensors might be in your clothes or in the world? These questions have been part of my current focus at Centrum Wiskunde & Informatica. We’ve been working with a Dutch fashion designer in Amsterdam, investigating how fashion and technology can be used in various situational tasks and environments by instrumenting clothing and creating structured data to understand people’s activity and flocking. What’s exciting beyond the research is connecting the goals of a fashion designer with computer science research; it’s an exciting bridge to create. Once all the fabric and sensors are accounted for, it becomes a social computing problem again, and that’s where I like to live: creating bridges.

Can you profile your current research, its challenges, opportunities, and implications?

Now more than ever, we are a function of our own data. Data drives much of computing today, be it data science or machine learning. I like to emphasize how we collect and label data, as it has direct consequences for what we can analyze, predict, and create. For many, this means harvesting data for use. For me, it means understanding how people act, behave, and communicate through those signals. For example, at CSCW 2016 I published work in which we looked at the browsing behavior of millions of people on Flickr and matched it against a relatively small set of editorial judgements to surface high-quality geo-tagged weather photos. The alternative approach, which had been tried at first, was to just train a neural net to find photos of storms or lightning or sunny days. While that approach favors recall, the editors were quick to point out that everyone takes crummy photos of lightning, so conventional approaches didn’t work. My research took a different approach: instead of training generic aesthetics into the system, we modeled a community-centric approach. Using the tacit aesthetic judgements of the Flickr community, we coupled the structured link data with a CNN to surface high-quality photos. It’s not a case of active learning; in fact, it’s a supervised model where that supervision comes from implicit community actions and explicit editorial judgements. We have some similar work to be published at CHI 2017 later this year, where we surface deviant/abusive images on Tumblr, a task that was even harder because the image itself may not be representative of such behavior, so the social-visual system was a necessity.

Taking your interest in AI and fashion into account, I am wondering what you think about the current hype around deep learning in general, and in the context of fashion research in particular. Do you think AI-based systems will ever be able to understand context, which is an important factor in fashion?

You know, I remember when Deep Blue beat Kasparov back in the 90s, and while it was great, I didn’t think much of it as an AI victory (nor did IBM, if I recall). The recent win by AlphaGo is different and something amazing. I don’t think it’s hype, as things work and work well; however, we still face many of the same limitations. With regard to fashion, it’s a great time to be excited about AI. We see solutions to many of the older research and fashion issues (like pointing your camera at someone and finding the clothes they are wearing to buy online), but I think the combination of smart electronics, AI, and fashion is the new sweet spot. Advances in textiles, like pixel-to-stitch knitting, and in small electronics make for a fun new playground for AI, sensors, and IoT. We’re just now starting to explore how clothes and fashion can sense, detect, and respond to people and to the environment. I get what you’re saying about AI hype, and that’s another discussion, but right now I’m excited to build the next generation of wearable tech.

How generalizable is data from sources like Flickr? For example, are your insights on Flickr also valid in non-western countries?

I certainly have had reviewers ask me how generalizable research is because it used Flickr data or Yelp data or Twitter data or whatever; I see that as the hallmark of a bad review. For one thing, there is no reason to believe that any slice of a specific social media dataset should be generalizable. People act differently on Flickr than they do on Instagram or on Snapchat. The application or website dictates an interaction, and really that’s what we are studying; as a research community we need to move beyond just studying naive pixels and examine what the application is doing. OK, if you’re just looking for indoor vs. outdoor shots in Yelp photos, then maybe. But have you ever tried to find a restaurant in Japan versus Italy versus America? Storefronts look completely different. Internationalization is rarely studied by multimedia researchers, and I think multimedia-mediated cultural communication is more important than website generalization.

It would also be very interesting to hear what you think the role or responsibility of multimedia researchers is in the context of the whole fake news/alternative news debate. Do you think we should focus on it?

In 2009, I began publishing work on multimedia summarization using aggregated Twitter feeds from the Obama-McCain debate. Back then, people really, really wanted to tweet, and it was a narrow interest community. A few years later, during the Egyptian revolution of 2011, I ran my methods against the Twitter firehose and saw some misinformation (like a reported bus fire that was actually from another country, years earlier). Delayed information is a systemic problem, where something happened hours or days ago and gets propagated as fresh information. I don’t believe we had widespread, purposeful propagation of misinformation then (at least not like what we see in today’s world). So today, we have misplaced information, delayed information, and fake/alt information, and the field of multimedia is ripe to handle this problem. For example, take a fake news story with a photo. Has the photo been altered to retell a story? Is the photo from a different news story? Are there clusters of other news sources that contradict it? There’s a whole world of multimedia problems in finding fake news, many of which large companies are struggling to get a grip on, but the hard problem will be the explanation. Identifying fake news is half of the problem; explaining to people why it’s fake is the other half. News, now more than ever, is highly visual (photos/video) and social; dealing with a plurality of signals is the core of multimedia research.

In this context, do you think that fake news is a problem of social network platforms, or should newspapers also be investigated?

Can you name a news source that does not rely on social network platforms? Conversely, have you seen Twitter deliver news? Their streaming video with tweet interfaces speaks to research we did 10 years back. I don’t think we can decouple the two, but we’ve seen how social media sites tend to amplify things by propagating clickable content. So for a news agency, it starts with the title and snippet of a story and its related photo. But then there are also the fake news agencies gaming the social sites. There’s been some great work from UW cracking the problem, but I think it’s time for multimedia research to step up here, as visual content always carries more engagement.

How would you describe the role of women especially in the field of multimedia?

Diversity of all types (gender, nationality, race) is critically important to the future of multimedia research. When I was on the TPC for Multimedia in 2013, I did some data analytics on the past several years of the conference series; the gender stats were abysmal. We worked hard to increase gender diversity among the area chairs and in the conference. For the former, following some advice from Maria Klawe that I heard in a lecture maybe 10 years prior, we pushed for topic diversity in the conference. The idea here is that legacy areas can carry legacy diversity problems, so newer areas (social computing, affect, crowdsourcing, music, etc.) are more likely to have better gender leadership ratios. It was the correct approach, and we doubled the number of women in leadership roles among the ACs, but there was still much room to grow. We coupled this with finding corporate support for a women’s and diversity lunch, a practice that I’m happy the conference has continued. Diversity brings an expanded set of ideas, methods, and approaches to research. We’ve come a ways since 2013, and I’m very happy to see the 2017 program similarly expand its diversity, but we have a very long way to go to catch up to some other SIGs.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact they have today and into the future?

Impact happens where research connects to people. For me, it usually revolves around creative practice in multimedia. How do online broadcasters DJing house and hip-hop connect with their audience, and how does that differ from when they are in a club? If you have an iPad and an iPhone and want to take a picture, when do you reach for the iPad? If you’re posting a photo to Instagram, what filter will you use to enhance it? The most valuable research includes method, system, and people. Let’s take that last one as an example. One could build a prediction model to automatically apply filters based on a training set of what got likes and the types of transformation, but would that change people’s creative practice? We found people enjoyed the process of selection (despite usually picking the same filter over and over again). So the question becomes how to optimize the experience without hindering it.

In my time as Director of Research at Flickr, we enjoyed looking at the full stack: data, machine learning, engineering, visualization, and all the components that affect people and the media experience. We knew there was an advantage to being able to easily dive into 13 billion photos and 100 million people, but we felt that, even inside a corporation, there should be more open data for all researchers. This led to the creation of the YFCC100M, 100 million Creative Commons images in a single dataset for open research. Beyond the data itself, we found ourselves reviewing small technical Creative Commons details to ensure legal and privacy concerns were met while still opening the data for wide academic and corporate use. The impact has been incredible. Outside of the multimedia and computer vision communities, in the first year since release we’ve seen published work using our dataset from the HCI, Data Science, and Visualization communities, and it was even featured by the Library of Congress. All of this was driven by the idea of sharing data we felt was too locked up; fortunately Flickr, Creative Commons, and Yahoo Legal shared our vision, and we look forward to seeing more impact to come.

Over your distinguished career, what are your top lessons you want to share with the audience?

Really, nothing happens in a vacuum. Partnerships and collaborations make things interesting, as they make one malleable and push one to think full stack. This view was shaped by my 10 years in an industry lab, where connecting with academia through hosting interns, collaborative work, and sponsorships really fueled my work. I’d say a good 70% of our work was still internally driven, but that 30% outreach was really valuable. Now at an academic lab, I’m doing the reverse. We partnered with a fashion designer to stay connected to their goals and their problems while we think about the wearable and social Internet of Things. It’s great to think without constraints, but adapting to the real world and thinking end-to-end is a critical driver for me. At the end of the day, I want to use it. Build what you love and make it real. This was easier when I was at a corporation, but there are still plenty of ways to collaborate depending on scope. And really think full stack in system and evaluation. You’ll find yourself evaluating your work on multiple levels, from F1 scores to Likert-scale surveys. What we do is develop new systems and methods, but work with real impact will also affect applications and design. My favorite research (mine or others’) always engages critically with the bigger picture.

Since you are an active researcher in both the US and Europe, what do you think are the main differences? What is positive and what is negative? And what could we learn from each other?

I did a semester sabbatical at the Keio-NUS CUTE center in Singapore a few years back, so it’s not my first dive outside of industry. I’m reminded that in La Nausée Sartre wrote that any place you live feels the same after two weeks; the idea being that once you get back to job and life, it all becomes the same again. I can’t say I quite agree in this case. The move from an industry lab in California to an academic one in the Netherlands was a bit of a culture and cadence shift. After almost a year, it’s clear to me that the difference is the pace, as we share a research culture. We tend to sprint constantly in industry, while in academia the sprinting seems to come and go. Each style has its pros and cons; there have been times I wanted everyone to be running and times I was happy I could dive into something because we weren’t running. I don’t think it’s something to enumerate positive and negative points about, just a different state of being. I’m not sure why I gave you an existential response either.

An interview with Judith Redi

Describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

Dr. Judith Redi


My path to multimedia was, let’s say, non-linear. I grew up in the Italian educational system, which, up until university, is somewhat biased towards social sciences and humanities. My family was not one of engineers or scientists either, and never really encouraged me to look at the technical side of things. Basically, I was on a science-free educational diet until university. On the other hand, my hometown used to host the headquarters of Olivetti (you may remember their fancy typewriters and early personal computers). This meant that at a very young age I had a PC at home and at school and could use it (as a “user” on the other side of the systems we develop; I had no clue about programming).

When the time came to choose a major at university, I decided to turn the tables, partly as a provocative action towards my previous education and mind-set, and partly because I was fascinated by the prospect of being able to design and build future technologies. So I picked computer engineering, perhaps inspired by my hometown’s technological legacy. I immediately became fascinated by artificial intelligence and its potential to make machines more human-like (I still tell all my bachelor students that they should have a picture of Turing on their desk or above their bed). I specialized in machine learning and applied it to cryptanalysis in my master’s thesis. I won a scholarship to continue that research line in a PhD project at the University of Genoa. And then Philips came along, and multimedia with it.

At the time (2007), Philips was still manufacturing displays, and to stay ahead of the competition, they had to make sure their products would deliver the highest possible visual quality to users. They had algorithms to enhance image quality, but needed a system able to understand how much enhancement was needed, and of which type (sharpening? de-noising?), based on the analysis of the incoming video signal. They wanted to try a machine-learning approach to this issue and turned to my group for collaboration. I picked up the project immediately: the goal was to model human vision (or at least the processes underlying visual quality perception), which implied not only developing new intelligent systems at the intersection of Signal Processing and Machine Learning, but also learning more about the users of these systems, their perception and cognition. It was the fact that it would allow me to adopt a user-centred approach, closing the loop back to my social science-oriented education, that made multimedia so attractive to me. So I left cyber-security, embraced multimedia, and have never left since.

One Philips internship, a best PhD thesis award, and a postdoc later, I am still fascinated by this duality. Much has changed in multimedia delivery, with the shift from linear TV to on-demand content consumption, video streaming accounting for 70% of internet traffic nowadays, and the advent of Ultra High Definition solutions. User expectations in terms of Quality of Experience (QoE) increase by the day, and QoE is not only affected by the amount of disruption (due to encoding, unreliable transmission, or rendering inaccuracies) in the delivered video, but also relates to content semantics and popularity, the user’s affective state, and the environment and social context. The role of these factors in QoE is yet to be understood, let alone modelled. This is what I am working on at TU Delft; it is a long-term plan, so I guess I won’t be leaving multimedia any time soon.

I’d say it’s too early for me to draw “foundational lessons” worth sharing from my journey. I guess there are a few things, though, that I figured out over the years and that may be worth mentioning:

  1. Seemingly reckless choices may be the best decisions you have ever made. Change is scary, but can pay off big time.

  2. Luck exists, but hard work is a much safer bet.

  3. Keep having fun doing your research. If you’re not having fun anymore, see point (1).

Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?

As a researcher, I have been devoting most of my efforts to understanding multimedia experiences and steering their optimization (or improvement) towards higher user satisfaction (with the delivery system). In the longer term, I want to broaden this scope, to make an even bigger impact on people’s lives: I want to go beyond quality of experience and multimedia enjoyment, and target the optimization (or at least improvement) of users’ well-being.

For the past four years, I have been working with Philips Research on an Ambient Assisted Living system able to (1) sense the mood of a user in a room and (2) adapt the lighting in the room to alleviate negative moods (e.g., sadness or anxiety) when sensed. We were able to show that the system can successfully counter negative moods in elderly users (see our recent PLoS One publication if you are interested), without the need for human intervention. The thing is, elderly people (and, according to recent findings, younger people too) experience negative affective states quite often, and most of the time a fellow human (relative, friend, caretaker) is not available to comfort them. My vision is to build systems that, based on the unobtrusive sensing of users’ affective states, can act upon the detection of negative states and relieve the user just as a human would.

I want to design “empathic technology”, able to provide empathic care whenever human care is not within reach. The challenges here are multiple. First, (long-term) affective states (such as mood, which is more constant and subtle than emotion) must be sensed. (Wearable) sensors, cameras, as well as interaction with mobile devices and social media can provide relevant information here. Empathic care can then be conveyed through ambient intelligence solutions, but also through creative industries products, ranging from gaming to intelligent clothing, to, of course, Multimedia technology (think of empathic recommender systems, or videotelephony systems optimized to maximize the affective charge of the communication). This type of work is highly multidisciplinary (involving multimedia systems, affective computing, embedded systems and sensors, HCI and certainly psychology), and there is little low-hanging fruit. But I’d like this to be my contribution to making the world a better place, and I am ready to take up the challenge.

Can you profile your current research, its challenges, opportunities, and implications?

Internet-based video consumption has been a reality for a while, and it is still growing. Cisco forecasts that video will account for 79% of all consumer internet traffic by 2018 (equivalent to one million minutes of video crossing IP networks every second). As media consumption grows, so do user expectations in terms of Quality of Experience (see the recent Conviva reports!). Moreover, future multimedia will have to be optimized for multiple, more immersive (plenoptic, HDRi, ultra-high definition) devices, both fixed and mobile. Moore’s law and broadband speed alone won’t do the job. Resources and delivery mechanisms have to be optimized on a more application- and user-specific basis. To do so, it will be essential to be able to measure (unobtrusively) the extent to which the user deems the video experience to be of high quality.

In this context, my work aims to (1) understand the perceptual, cognitive and affective processes underlying user appreciation of multimedia experiences, and (2) model these processes in order to automatically assess the delivered QoE and, when applicable, enhance it. It is important here to bear in mind that multimedia quality of experience cannot be considered to depend solely on the presence (or absence) of visual/auditory impairments introduced by technology limitations (e.g., packet loss errors or blocking artifacts from compression). Although that has been the most common approach to QoE assessment and optimization, it is no longer sufficient. The appearance of social media and internet-based delivery has changed the way media are consumed: we no longer deal with passive observers, but with users who select specific videos, to be delivered on specific devices, in any type of context. Elements such as semantics, user personality, preferences and intent, and the socio-cultural context of consumption come into play, and these have never been investigated (let alone modelled) for delivery optimization. My research focuses on integrating these elements into QoE estimation, to enable effective, personalized optimization.

The challenges are countless: user and context characteristics have to be quantified and modelled, and then integrated with video content analysis to deliver a final quality assessment that represents the experience as it would be perceived by that user, in that context, given that specific video. Before that, it must be determined which user and context factors impact QoE (to date, there is not even agreement on a taxonomy of these factors). The opportunities are there too: adaptive streaming protocols make it possible to implement user- and context-aware delivery strategies, the willingness of users to share personal data publicly can lead to more accurate user models, and crowdsourcing and crowdsensing in particular can support the systematic study of the influence that context and user factors have on the overall QoE.

How would you describe the role of women especially in the field of multimedia?

Just like for their male colleagues (would you ask them to describe the role of men in multimedia?), the role of women in multimedia is:

  1. to push the boundaries of science, knowledge and practice in the field, doing amazing research that will make the world a better place
  2. to train new generations of brilliant engineers and scientists that will keep doing amazing research to make the world an even better place and
  3. to serve the community as professionals and leaders, steering the future amazing research that will go on making the world better and better.

I’d say the first two points are covered. The third, instead, could be implemented a bit better in practice, as there is a general lack of representation of women at the leadership level. The reasons for this are countless. They range from the lack of incoming talent (traditionally girls are not attracted to STEM subjects, perhaps for socio-cultural reasons), to the so-called leaky pipeline, which sees talented women leaving demanding yet rewarding careers too early, to an underlying impostor syndrome that sometimes prevents women from putting their names forward for given roles. The solution is not necessarily quotas (although I understand the reasoning behind them, I think they are actually making women’s lives more difficult – there is an underlying feeling that “women have it all easy these days” that makes work relationships more suspicious and ends up forcing women to work three times as hard to show that they actually deserve what they have accomplished), but rather coaching and dedicated sponsorship of talent from the early stages.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

The methods that I developed for subjective image quality assessment have been adopted within Philips Research, and their evolution to video quality assessment is now under evaluation by the Video Quality Experts Group (VQEG) for possible recommendation as an alternative methodology to the standard Absolute Category Rating (ACR) and paired comparison methods. The research that I carried out on the suitability of crowdsourcing for subjective QoE testing, and on adapting traditional lab-based experimental designs to crowdtesting, is now included in the Qualinet white paper on best practices for crowdsourced QoE, and has helped in better understanding the potential of this tool for QoE research (and the risks involved in its use). This research is also currently feeding into new ITU-T recommendations on the subject. The models that I developed for objective QoE estimation have been published in top journals and lay the groundwork for more encompassing and personalized QoE optimization.

Over your distinguished career, what are your top lessons you want to share with the audience?

Again, I am not sure whether I am yet in the position of giving advice and/or sharing lessons, but here are a couple of things:

  1. Be patient and long-sighted. Going for research that pays off in the short term is very appealing, especially when you are challenged with job insecurity (been there, done that). But it is not a sustainable strategy: you can’t make the world a better place with your research if you don’t have a long-term vision, where all the pieces fit together towards a final goal. And in the long term, it’s not fun either.

  2. Be generous. Science is supposed to move forward as a collaborative effort. That’s why we talk about a “scientific community”. Be generous in sharing your knowledge and work (open access, datasets, code). Be generous in providing feedback to your peers (be constructive in your reviews!) and to students. Be generous in helping out fellow scientists and early stage researchers. True, it is horribly time-consuming. But it is rewarding, and makes our community tighter and stronger.

For girls: watch Sheryl Sandberg’s TED talk, participate in the Grace Hopper Celebration of Women in Computing, and don’t be afraid to come to the ACMMM women’s lunches; they are a lot of fun. Actually, these are good tips for boys too.

For the rest, just watch Randy Pausch’s The Last Lecture, because he already said it all, much better than I ever could.

If you were conducting this interview, what questions would you ask, and then what would be your answers?

Q: Why should one attend the ACMMM women’s lunch?

A: If you are a female junior member of the community, do attend, because it will give you the opportunity to chat with senior women who have been around for a while and can tell you all about how they got where they are (most precious advice, trust me). If you are a female senior member of the community, do attend, because you could meet a young, talented researcher who needs some good tips from you, and you should not keep all your valuable advice to yourself :). If you are a male member of the community, you should attend because we really need to initiate some constructive dialogue on how to deal with the problem of low female representation in the community (because it is a problem, see next question). Since this is a community problem (and not a problem of females only), we need all members of the community to discuss it.

Q: Why do we need more women in Multimedia?

A: Read this or this, or just check the Wikipedia page on women in STEM.

An interview with Klara Nahrstedt

Michael Riegler (MR): Describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

Prof. Klara Nahrstedt


Klara Nahrstedt (KN): From my youth I have been attracted to and interested in mathematics, physics and other sciences. However, since most of my family were electrical and computer engineers, I was surrounded by engineering gadgets and devices, and one of them was a very early computer, able to answer various quiz questions about the world. I liked this new device with its many possibilities. Therefore, my interests and my family’s influence guided me towards an educational journey between science and engineering. I did my undergraduate studies in Mathematics and my Diploma work in Numerical Analysis at the Humboldt-Universität zu Berlin in East Germany. After the Berlin Wall came down in 1989, my educational journey led me to the Computer and Information Science Department at the University of Pennsylvania in Philadelphia, where I earned my PhD degree and studied multimedia systems and networking.

My interest in multimedia came during my time at the Institute for Informatik, where I worked as a research programmer. This was after my Diploma Degree and after my System Administrator job at the Computer Center of the Ministry of Agriculture in East Berlin. It was the time when Europe, in contrast to the USA, invested heavily in the new ISO-defined X.25-based digital networking technology, and with it in the new X.400 email system and its applications. One of the very interesting discussions at the time was about transporting via email not only text messages, but also digital audio and images. I wanted to be part of the discussion, since I believed that a picture (image) is worth a thousand words and that audio input would be easier for users than typing text messages. I wanted to help develop solutions that would enable the transport of these multi-modal media, and so my long journey into multimedia systems and networks started. After I joined the University of Pennsylvania, as part of my PhD work, I was exposed to the research in the GRASP laboratory, where researchers studied computer vision algorithms and cameras mounted on robots. As a researcher interested in networking and multimedia, it was very natural for me to explore integrated multimedia networking problems for tele-robotic applications, enabling video and control information to be transported from remote robots to operators and visualizing what the remote robot was doing. Since my PhD, the journey into a deep understanding of multimedia systems and networks has continued as new knowledge, technologies, applications, and users emerge.

The foundational lessons that I learned from this journey are: (1) acquire very strong fundamental knowledge in science and the humanities very early, independent of whatever future opportunities, jobs, interests, and circumstances guide you towards; (2) work hard and believe in yourself; and (3) keep learning continuously.

MR: Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?

KN: During my professional life, I have had three different roles: researcher, educator and provider of professional services in different functions.

  • As a researcher, my vision and objective are to provide theoretical and practical cyber-solutions that enable people to communicate seamlessly and in a trustworthy manner with each other and with their physical environments.
  • As an educator, my vision and objective were and are to educate, as best I can, the next generation of undergraduate and graduate students so that they are very well prepared to tackle the numerous new challenges in fast-changing human-cyber-physical environments.
  • In the space of professional services, I have served in various roles: as a member of numerous program committees; as an organizing member and/or chair, co-chair, or editor of IEEE and ACM professional venues; as the chair of the ACM Special Interest Group on Multimedia (SIGMM); as a member of various departmental and college committees; and now as the Director of the Interdisciplinary Research Unit, the Coordinated Science Laboratory (CSL), in the College of Engineering at the University of Illinois at Urbana-Champaign. In each of these administrative and service roles, my vision and objective are to provide high-quality service to the community, whether it is a high-quality technical program at a conference or journal, a fair and balanced allocation of resources that advances the mission of SIGMM, or broad support of interdisciplinary work in CSL.

I hope to achieve the vision and objectives of my research, educational and professional service activities via hard work, continuous learning, willingness to listen to others, and a very strong collaboration with others, especially my students, colleagues and staff members that I interact with.

MR: Can you profile your current research, its challenges, opportunities, and implications?

KN: My current research moves in three different directions, which have some commonalities but also differences. The major commonality is the aim to solve the underlying joint performance and trust issues in resource management of multi-modal systems and networks that we find in current human-cyber-physical systems. The three directions of my research are: (a) 3D teleimmersive systems for tele-health, (b) trustworthy cyber-physical systems such as the power grid and oil and gas systems, and (c) trustworthy and timely cloud-based cyber-infrastructures for scientific instruments such as distributed microscopes.

In all of these directions, the challenges lie in providing real-time acquisition, distribution, analysis and retrieval of multi-modal data in conjunction with providing security, reliability and safety.

The opportunities in the areas of human-cyber-physical systems in health, and critical infrastructures are enormous as people are aging, physical infrastructures are being fully stressed, and multimedia devices are challenging every societal cyber-infrastructure by generating Big Data in terms of their volume, velocity and variety.

We are living in truly exciting times as digital systems are getting more and more complex. The implications are that, as a multimedia systems and networking community, we have a lot of work to do and many challenges to solve in collaboration with many other communities. It is very clear that a single computing community is not able to solve the many problems that are coming upon us in the space of multi-modal human-cyber-physical systems. Inter- and cross-disciplinary research is the call of the day.

MR: How would you describe the role of women especially in the field of multimedia?

KN: “Difficult” comes to mind. The number of women in multimedia computing is small, and in multimedia systems and networks even smaller. I wish that the role and visibility of women in the multimedia technology field were greater when it comes to IEEE and ACM awards, conference leadership roles, editorial board memberships, participation in SIGMM technical challenges, and other visible events and roles. Multimedia technology has become such a ubiquitous base for numerous application fields, including education, training, entertainment, health care, and social work, which have very strong representation of women in general. Hence, I believe that women in multimedia should play an even more crucial role in the future than today, especially in innovation, leadership, and the interconnection of multimedia computing technologies with the above-mentioned application fields.

MR: How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

KN: My top innovative achievements range from bringing a much better understanding to the field of Quality of Service (QoS) Management and Quality of Service Routing for multimedia systems and networks, to developing novel real-time and trusted resource management architectures and protocols for complex multi-modal applications, systems and networks such as 3D teleimmersion, energy-efficient mobile multimedia, and the trustworthy smart grid, to name a few. The impact of my QoS research can be seen in current wide-area wired and wireless networks and systems. The impact of the resource management algorithms, architectures and systems that my research group and I have developed can be seen throughout solutions from Microsoft, Google, HP, and IBM, where my graduate and undergraduate students took up employment and brought with them research results and knowledge that then made their way into multimedia applications, systems and network products.

MR: Over your distinguished career, what are your top lessons you want to share with the audience?

KN: The top lessons that I would like to share are: be patient, honest, open-minded, and fair; don’t give up; be humble, but don’t be shy to “toot your own horn” when appropriate; listen to what others have to say; and be respectful to others, since everybody has something to contribute to the community and society in his/her own way.


An interview with Wallapak Tavanapong

MR: Describe your journey into computing from your youth up to the present.

Wallapak Tavanapong


Pak: I started learning about computing quite late. I did not know what a computer was until I enrolled in a B.S. degree program in Computer Science at Thammasat University, Thailand, where I learned the foundations. After finishing the degree, I joined the M.S. program in Computer Science at the University of Central Florida (UCF), Orlando, Florida, USA. UCF was a great learning place for me. I had a wonderful advisor, Prof. Kien A. Hua, good classes, and great friends. My research at the time was on video-on-demand, which was a hot topic then. After my Ph.D., I joined the Department of Computer Science at Iowa State University in 1999 as an Assistant Professor, and I was recently promoted to Full Professor.

Iowa State University is a great place for my career. In the beginning, I continued my research in video-on-demand and multimedia caching. In 2003, my colleagues, Profs. JungHwan Oh, Piet C. de Groen, Johnny Wong, and I began investigating automated content analysis of endoscopic video to improve the quality of the procedure. At the time, few works existed, and most focused on automated detection of polyps in images. Our approach is to automatically analyze an entire procedure, calculate detailed objective metrics that reflect the quality of inspection for the entire procedure, and provide real-time feedback to help the endoscopist improve the quality. We co-founded EndoMetric Corporation to transfer the technology into practice. I am glad that this research area now receives much more attention in both academia and industry, and that our work has had some influence on later work. In 2013, I began new interdisciplinary research and education initiatives in political informatics and computational communication and advertising.

MR: What foundational lessons did you learn from this journey?


Pak: First, never give up when facing difficulty. Second, there are several paths toward good research. I am more attracted to research problems in other disciplines. I like to formulate new computing research problems out of vague problem descriptions in other disciplines. I love interdisciplinary research.

MR: Why were you initially attracted to multimedia?


Pak: My initial interest was in database research. As data began to come in different media types, extending it to multimedia was natural.

MR: Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?


Pak: First, I’d like to see my research help to prevent or reduce suffering from cancer for many people. To achieve this goal, I need to do more to push my technology into practice. Second, I’d like to see computational thinking integrated into the science and math curriculum in elementary schools in the US and other countries soon. Over the past five years, I have been engaged in our departmental K-12 outreach activities, coaching K-12 kids and interested K-12 teachers in computational thinking. Third, I’d like to see more women in computer science and computing fields. In our K-12 outreach program, we found that young girls started losing their interest in science as early as the fifth grade. So, I hope to get them interested in computing early, in the third grade. Last, I’d like to see my interdisciplinary work with political scientists and communication scholars lead to a national social multimedia repository that is useful for social scientists and the public to learn about decision making in public policies that affect many lives.

MR: Can you profile your current research, its challenges, opportunities, and implications?

Pak: My top two projects are:

  • Reconstruction of a virtual colon from 2D colonoscopic images:

    The human colon is a complex tubular structure with multiple twists and turns. A good colon exam increases early detection of colorectal cancer. I’d like to provide a 3D colon inspection map during the procedure so that the endoscopist knows which areas inside the colon they might have missed. There are many challenges. The most critical one is that commonly used endoscopes are not equipped with 3D camera positioning technology. I am working on adding low-cost hardware that provides some position information. I will use this position information together with content analysis of endoscopic images to reconstruct the virtual colon. The work has the potential to increase the polyp detection rate during colonoscopy, preventing deaths and reducing pain and suffering.

  • Multimedia information system for political science and communication:

    This system would help answer research questions in political science and communication that could not be answered before because of the sheer volume, variety, and velocity of data. Specifically, my team is working on understanding how states learn about policies from one another, how news reporters carry information from state legislatures to the public, how a public policy is influenced, etc. This application domain lends itself to multimedia research, ranging from the underlying data management technology, to automated content analysis of multiple media types and sources (web and video online ads, TV ads, state bills and laws, and tweets by political figures), to visualization of the knowledge resulting from the analysis.

MR: How would you describe the role of women especially in the field of multimedia?

Pak: I think the role of women in multimedia is the same as that of men. But our numbers are much lower. We need to increase the number of women in the field. I believe that we need to get young girls interested in computing as early as elementary school.

MR: How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact it has today and into the future?

Pak: I would say that my top achievement so far is the idea and realization of real-time computer-aided analysis and feedback to improve the quality of colonoscopy. We were the first to investigate this problem. There were several challenges, for instance: defining what to analyze so that it reflects quality as seen by domain experts; coming up with effective algorithms to compute the quality measurements; showing that the automated measurement indeed improves quality; making the automated analysis real-time, effective, and low-cost enough to be used in practice; and deploying the technology for daily use in hospitals and clinics.

My technology has already saved a couple of lives, and I would like it to do more in the future. I have seen more researchers in academia and industry get into this research area, which is great. We need more researchers and developers in multimedia and healthcare to help medical professionals improve the quality of care via automation.

MR: Over your distinguished career, what are your top lessons you want to share with the audience?

Pak: Never give up. Find good mentors who care about you, believe in you, and give you different perspectives. A peer mentor is great; I learn a lot from my colleagues. Find a research problem you are passionate about. Last, when you realize that there is a problem, do not complain; look for a good solution, and fix it.


An Interview with Cynthia Liem: The PHENICX Project


The PHENICX project is supported by the European Commission, FP7 (Seventh Framework Programme, STREP project, ICT-2011.8.2 ICT for access to cultural resources, grant agreement No 601166). The project has been running for a year now, and Cynthia Liem has been involved since the initial planning and proposal writing. Currently, she is a work package leader in the project, and part of the overall project coordination team in the role of dissemination coordinator.

Partners in the project are Universitat Pompeu Fabra, Barcelona, ES; Delft University of Technology, NL; Johannes Kepler University Linz, AT; Austrian Research Institute for Artificial Intelligence, Vienna, AT; Video Dock BV, Amsterdam, NL; Royal Concertgebouw Orchestra, Amsterdam, NL; and Escola Superior de Música de Catalunya, Barcelona, ES. More information on the project can be found at

Q: What is the goal and scope of the PHENICX project?

PHENICX is about music and concert experiences. We want to use multimedia technologies to enhance the experience of a concert and make it more interesting and accessible for broad audiences. In this, we mainly focus on classical music.

Basically, the project has two sides. First of all, there is a content analysis side, in which we analyze concert performance data in a broad sense. We do not only look at an audio stream, but also at, e.g., videos, gesture information, and social commenting information from people who attended concerts. Besides multiple modalities, we also try to take into account multiple perspectives: think of multiple cameras and microphones registering an orchestra, but also of multiple types of people (a conductor, orchestra musicians, or just your personal friends) speaking about a concert. Finally, a concert really is a multilayered phenomenon, with lots of things going on at the same time in which one could potentially be interested. The particular notes being played from a score are part of a larger structural whole; and while 130 individuals may be playing at the same time in a symphony orchestra, they form sub-groups which all have a particular role in the musical narrative and instrumental mix.

On the other side, it’s about the experience: getting and keeping users from different consumer groups engaged. This is not just targeted at live attendance scenarios in the concert hall, but also at scenarios in which people attend concerts off-site through a live stream, or want to relive a concert on-demand after its performance. While for the content analysis part we mostly focus on signal-oriented research topics, for this experience part we strongly look into topics such as recommendation, visualization and interaction. For example, how can you make the whole multilayered aspect of music more tangible? This can be done with automated score-following and through simplified visualizations, but also by contrasting a particular performance against other existing performances of the same piece.

Our mission to broaden audiences for the classical music genre can be seen as a way of cultural heritage preservation using ICT. In the end, we really hope to see digital technology affecting culture consumption in a positive way. As a concrete example, our partners Video Dock and the Royal Concertgebouw Orchestra are already working on a commercial tablet app called RCO Editions. The technologies we work on in PHENICX can really help in making the production of the app more scalable, expanding its feature set, and optimizing its user experience.

Q: Are there special organizational challenges?

In the project there are seven partners, four of them being academic partners. The three non-academic partners are major players in different parts of the music stakeholder spectrum, but have less experience with academic projects – especially the Royal Concertgebouw Orchestra, which really is involved for the first time in a large academic technology project. So in communicating and working with each other, there is always some translation needed between partners with different background and project experience levels. This is a very interesting organizational challenge in which we always try to find an optimal balance between different stakeholders.

Another potential challenge is language. Especially in the first year, we ran a lot of focus groups to validate use cases. While we have grown completely accustomed to using English in our daily academic work, as soon as you wish to interact with realistic local potential users of your technology in all project partner countries, you can’t take for granted that these users have full expressive command of English (the younger generation typically does, but you don’t want to reach only them). Music is also a very attractive topic for general public dissemination, since it’s a concrete part of many people’s lives; but once again, to make full use of this opportunity, you may have to look beyond English. So we have some dedicated organizational activities on this, working to hold some studies and make some publicity material available in local languages.

Q: What is your personal relation to the project?

Well, I wrote a significant part of the proposal, so in that sense I have a considerable relation to the project … but, at least as importantly, my musician background creates a strong personal link to it. Having degrees in computer science and classical piano performance, I’m really interested in the interface between the two: working with music and digital data, and using data technologies to improve what you can learn and do with music. PHENICX is definitely about this. So I’m very actively trying to use this double background in the project. It is especially useful for communication and dissemination: I can talk to people on the more musical side, many of whom do not have extensive technical backgrounds, but also to those on the more technical side, who do not always have an extensive music background.

Funnily enough, the project also affected views I had formed through my own musicianship. The Royal Concertgebouw Orchestra is one of the most famous orchestras in the world. If you’re a music student in Holland, you can be backstage and engage with people from many national orchestras, but only a lucky few will ever get anywhere near this particular orchestra. Now that I have this connecting role in the project between academics and music stakeholders, and the orchestra has become a project partner, I suddenly find myself in their office quite often. I would never have expected that!

Besides that, through our work on user requirements and focus groups, I really got in contact with actual audiences. In our focus groups, we asked people why they liked going to concert performances, and we frequently heard that they valued feeling isolated from external influences in the concert hall and letting themselves be swept away by the music. Probably because a concert hall is a bit of a workplace for me, I had totally forgotten this escapist aspect of concert attendance. So here, the project really made me aware of my own professional biases and ‘put me back on the ground’.

Q: Would you ever write an EU project proposal again?

Well, yes, I would, definitely with a consortium and project as inspiring as PHENICX. But I hope that next time I’ll have a bit more time than the three weeks in which we raced to complete the PHENICX proposal. 😉

Curriculum Vitae:

Cynthia Liem obtained her BSc and MSc degrees in Media and Knowledge Engineering (Computer Science) with honors at Delft University of Technology, The Netherlands, and is currently a PhD student at the Multimedia Information Retrieval Lab of the same university, working under the supervision of Prof. Alan Hanjalic. In addition, she holds Bachelor and Master of Music degrees in classical piano performance from the Royal Conservatoire in The Hague. Her research interests are strongly motivated by her background in both engineering and music, and concentrate on multimedia content analysis for the music information retrieval domain.

From this background, she has been very active in getting music onto the multimedia research agenda, particularly at the ACM Multimedia Conference, where she initiated and served as the main organizer of the ACM MIRUM workshop (2011, 2012). This led to her becoming a co-chair of a dedicated ‘Music & Audio’ area at ACM MM 2013, and currently of the broader ‘Music, Speech, and Audio Processing in Multimedia’ area for ACM MM 2014. She was also a main initiator of the EU FP7 PHENICX project (2013 – 2016), in which she now serves as work package leader and dissemination coordinator.

She is the recipient of several international scholarships and awards, including the Lucent Global Science Scholarship in 2005, the Google Anita Borg Scholarship in 2008, the Google European Doctoral Fellowship in Multimedia in 2010 (which partially supports her PhD research work), and the UfD Best PhD Candidate Award at Delft University of Technology in 2012. Besides her ongoing academic and musical activities, Cynthia has interned at Bell Labs Europe Netherlands, Philips Research, Google UK and Google Research, Mountain View, USA.

The interviewer, Mathias Lux, is an Associate Professor at the Institute for Information Technology (ITEC) at Klagenfurt University, where he has been since 2006. He received his M.S. in Mathematics in 2004 and his Ph.D. in Telematics in 2006 from Graz University of Technology. Before joining Klagenfurt University, he worked in industry on web-based applications, as a junior researcher at a research center for knowledge-based applications, and as research and teaching assistant at the Knowledge Management Institute (KMI) of Graz University of Technology. In research, he is working on user intentions in multimedia retrieval and production, visual information retrieval, and serious games. In his scientific career he has (co-)authored more than 60 scientific publications, has served in multiple program committees and as reviewer of international conferences, journals, and magazines, and has organized several scientific events. He is also well known for managing the development of the award-winning and popular open source tools Caliph & Emir and LIRE for visual information retrieval.

An interview with Aljosa Smolic


Dr. Aljosa Smolic joined Disney Research Zürich, Switzerland in 2009 as Senior Research Scientist and Head of the “Advanced Video Technology” group. Before that, he was Scientific Project Manager at the Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut (HHI), Berlin, where he also headed a research group. He has been involved in several national and international research projects, in which he conducted research in various fields of video processing, video coding, computer vision and computer graphics, and he has published more than 100 refereed papers in these fields. In current projects he is responsible for research in 2D video, 3D video and free viewpoint video processing and coding.

Q: What is your main area of research?

A: I’m working on video processing in a general sense, and on visual computing. I’m interested in everything related to pixel processing, such as camera systems, visual information processing, perception, and computational systems that create high-quality output for the user.

Q: What got you interested in this area in the first place?

A: In my electrical engineering studies I focused on audio processing, in the sense that if I didn’t become a rock star, I could still be an audio engineer. Then I got the opportunity to work on image processing at Fraunhofer HHI, where I turned my signal processing interests from audio to images, and that’s how I ended up here.

Q: Does your research & work influence your private life a lot, like owning a stereoscopic TV, taking a lot of videos and photos, etc.?

A: Yes, in the sense that I’m very critical of any type of visual information. I’m also very picky when watching television, and I notice all the small imperfections. I have an expert view on cinema, any type of multimedia presentation, and audio. On the other hand, I don’t create much content myself. I don’t have a special camera, I don’t do much filming, and I don’t have much fancy 3D equipment at home.

Q: Speaking of 3D equipment at home … Obviously 3D TV home equipment didn’t start off too well. Do you think 3D TV will rise again in say 10-15 years, or will we skip towards the “holodeck”?

A: The holodeck … I formulated that as my long-term research question, so I’m still working on it, and it’s still a long way off. We are not there yet, and stereo or 3D TV at home didn’t reach the broad adoption that many people expected two or three years ago. I believe TV is a more difficult case than, for instance, home cinema on Blu-ray. I think the business and technology based on 3D Blu-ray discs work well. You can buy content that is very well produced, to be consumed in a situation very similar to watching a movie in a cinema. But I think it’s more difficult to adopt stereoscopic technology for the classic TV watching experience, which is meant to be more social. The quality of the content needs to be better, and the need to wear glasses is not widely accepted for watching TV.

Q: What are possible technological advances between now and the holodeck? Will something like IllumiRoom (a project from Microsoft Research that projects peripheral content around a screen) or higher resolutions like 4K have an impact?

A: Things like IllumiRoom and Philips Ambilight are all steps towards the holodeck, as much as stereoscopic TV was. I believe many more steps in different directions are necessary to get a 3D immersive experience. Regarding higher resolutions, I’m not so enthusiastic about 4K. From what I have seen so far, the difference between HD and 4K is very subtle. Only under very specific conditions and at very specific viewing distances are you able to perceive any difference. So I don’t think it matters that much, and I don’t expect 4K to have much of an impact compared to HD.

I’m looking forward more to high dynamic range (HDR). I’ve seen a few demos that offered an impressive experience. Those displays are starting to become available in professional and consumer markets.

Q: If you were to restart your PhD right now, would you end up in the same field, or is there another research direction that is more interesting to you now?

A: I don’t know … I could always do theoretical physics and go to CERN to try to create black holes, which is always an option. The other option would be to work more on the rock star career. But I’m pretty happy where I’ve ended up.

Curriculum Vitae:

Dr. Aljoša Smolić joined Disney Research Zurich, Switzerland in 2009 as Senior Research Scientist and Head of the “Advanced Video Technology” group. Before that, he was Scientific Project Manager at the Fraunhofer Institute for Telecommunications, Heinrich-Hertz-Institut (HHI), Berlin, where he also headed a research group. He has been involved in several national and international research projects, in which he conducted research in various fields of video processing, video coding, computer vision and computer graphics, and he has published more than 100 refereed papers in these fields. In current projects he is responsible for research in 2D video, 3D video and free viewpoint video processing and coding. He received the Dipl.-Ing. degree in Electrical Engineering from the Technical University of Berlin, Germany, in 1996, and the Dr.-Ing. degree in Electrical Engineering and Information Technology from Aachen University of Technology (RWTH), Germany, in 2001. Dr. Smolić received the “Rudolf-Urtel-Award” of the German Society for Technology in TV and Cinema (FKTG) for his dissertation in 2002. He is Area Editor for Signal Processing: Image Communication and has served as Guest Editor for the Proceedings of the IEEE, IEEE Transactions on CSVT, IEEE Signal Processing Magazine, and other scientific journals. He chaired the MPEG ad hoc group on 3DAV, which pioneered standards for 3D video, and in this context also served as one of the editors of the Multi-view Video Coding (MVC) standard. For many years he has taught full lecture courses on Multimedia Communications and other topics, now at ETH Zurich.

Dr. Mathias Lux is a Senior Assistant Professor at the Institute for Information Technology (ITEC) at Klagenfurt University, where he has been since 2006. He received his M.S. in Mathematics in 2004 and his Ph.D. in Telematics in 2006 from Graz University of Technology. Before joining Klagenfurt University, he worked in industry on web-based applications, as a junior researcher at a research center for knowledge-based applications, and as research and teaching assistant at the Knowledge Management Institute (KMI) of Graz University of Technology. In research, he is working on user intentions in multimedia retrieval and production, visual information retrieval, and serious games. In his scientific career he has (co-)authored more than 60 scientific publications, has served in multiple program committees and as reviewer of international conferences, journals, and magazines, and has organized several scientific events. He is also well known for managing the development of the award-winning and popular open source tools Caliph & Emir and LIRE for visual information retrieval.