Impact of the New @sigmm Records

The SIGMM Records have been renewed with the ambition of continuing to be a useful resource for the multimedia community. The intention is to provide a forum for (open) discussion and to become a primary source of information (and of inspiration!).

The new team (http://records.mlab.no/impressum/) has committed to leading the Records in the coming years, gathering relevant contributions around a set of main clusters.

The team has also revitalized the presence of SIGMM on Social Media. SIGMM accounts on Facebook and Twitter have been created for disseminating relevant news, events and contributions for the SIGMM community. Moreover, a new award has been approved: the Best Social Media Reporters from each SIGMM conference will get a free registration to one of the SIGMM conferences within a period of one year. The award criteria are specified at http://records.mlab.no/2017/05/20/awarding-the-best-social-media-reporters/

The following paragraphs detail the impact of all these new activities in terms of the increased number of visitors and visits to the Records website (Figure 1) and of the broadened reach. All the statistics presented below have been collected since the publication of the June issue (July 29th, 2017).


Figure 1. Number of visitors and visits since the publication of the June issue

Visitors and Visits to the Records website

The daily number of visitors ranges approximately between 100 and 400. This variation is strongly influenced by the publication of Social Media posts promoting content published on the website. In the first month (since July 29th, one day after the publication of the issue), more than 13,000 visitors were registered, and more than 17,000 visitors have been registered up to now (see Table 1 for detailed statistics). The number of visits to the different posts and pages of the website adds up to more than 90,000. The top 5 countries by number of visitors are listed in Table 2. Likewise, the top 3 posts with the highest impact, in terms of number of visits and of Social Media shares (via the Social Media icons recently added to the posts and pages of the website), are listed in Table 3. As an example, the daily number of visits to the main page of the June issue is provided in Figure 2, with a total of 199 visits since its publication.

Finally, the top 3 referring sites (i.e., external websites from which visitors have clicked a URL to access the Records website) are Facebook (>600 references), Google (>200 references) and Twitter (>100 references). It therefore seems that Social Media is helping to increase the impact of the Records. More than 30 users have also accessed the Records website through the SIGMM website (sigmm.org).

Table 1. Number of visitors and visits to the SIGMM Records website

Period | Visitors
Day | ~100-400
Week | ~2,000-3,000
Month | ~8,000-13,000
Total (since July 29th) | 17,491 (90,027 visits)

Table 2. Top 5 countries in terms of number of visitors

Rank | Country | Visitors
1 | China | 3,144
2 | United States | 1,899
3 | India | 1,297
4 | Germany | 750
5 | Brazil | 687

Table 3. Top 3 posts on the Records website with highest impact

Post | Date | Visits | Shares
Interview with Prof. Ramesh Jain | 29/08/2017 | 497 | 103
Interview with Suranga Nanayakkara | 13/09/2017 | 337 | 15
Introduction to the Opinion Column | 28/07/2017 | 129 | 13


Figure 2. Visits to the main page of the June issue since its publication (199 visits)

Impact of the Social Media channels

The use of Social Media includes a Facebook page and a Twitter account (@sigmm). The number of followers is still not high (25 followers on Facebook, 80 followers on Twitter), which is natural for recently created channels. However, the impact of the posts on these platforms, in terms of reach, likes and shares, is noteworthy. Tables 4 and 5 list the top 3 Facebook posts and tweets, respectively, with the highest impact up to now.

Table 4. Top 3 Facebook posts with highest impact

Post | Date | Reach (users) | Likes | Shares
>10K visitors in 3 weeks | 21/08/2017 | 1,347 | 6 | 4
Interview with Suranga Nanayakkara | 13/09/2017 | 1,221 | 81 | 3
Interview with Prof. Ramesh Jain | 30/08/2017 | 642 | 28 | 4

Table 5. Top 3 tweets with highest impact

Post | Date | Likes | Retweets
Announcing the publication of the June issue | 28/07/2017 | 7 | 9
Announcing the availability of the official @sigmm account | 08/09/2017 | 8 | 9
Social Media Reporter Award: Report from ICMR 2017 | 11/09/2017 | 5 | 8

Awarded Social Media Reporters

The Social Media co-chairs, with the approval of the SIGMM Executive Committee, have already started the process of selecting the Best Social Media Reporters from the latest SIGMM conferences. In particular, Miriam Redi is the winner for ICMR 2017, and her post-summary of the conference has been included in the September issue (available at: http://records.mlab.no/2017/09/02/report-from-icmr-2017/). Congratulations!

The Editorial Team would like to take this opportunity to thank all the SIGMM members who use Social Media channels to share relevant news and information from the SIGMM community. We are convinced it is a very important service for the community.

We will keep pushing to improve the Records and extend their impact!

The Editorial Team.

JPEG Column: 76th JPEG Meeting in Turin, Italy

The 76th JPEG meeting was held at Politecnico di Torino, Turin, Italy, from 15 to 21 July. The current standardisation activities were complemented by the celebration of the 25th anniversary of the first JPEG standard. At the same time, JPEG pursues the development of different standardised solutions to meet the current challenges in imaging technology, namely emerging new applications and low-complexity image coding. The 76th JPEG meeting featured mainly the following highlights:

  • JPEG 25th anniversary of the first JPEG standard
  • High Throughput JPEG 2000
  • JPEG Pleno
  • JPEG XL
  • JPEG XS
  • JPEG Reference Software

In the following, an overview of the main JPEG activities at the 76th meeting is given.

JPEG 25th anniversary of the first JPEG standard – JPEG is proud to celebrate the 25th anniversary of its first standard. This very successful standard won an Emmy award in 1995-96, and its usage is still rising, reaching in 2015 the impressive daily rate of over 3 billion images exchanged in just a few social networks. During the celebration, a number of early members of the committee were awarded for their contributions to this standard, namely Alain Léger, Birger Niss, Jorgen Vaaben and István Sebestyén. Richard Clark was also rewarded during the same ceremony for his long-lasting contribution as JPEG Webmaster and his contributions to many JPEG standards. The celebration will continue at the 77th JPEG meeting, which will be held in Macau, China, from 21 to 27 October 2017.


High Throughput JPEG 2000 – The JPEG committee is continuing its work towards the creation of a new Part 15 to the JPEG 2000 suite of standards, known as High Throughput JPEG 2000 (HTJ2K). In a significant milestone, the JPEG Committee has released a Call for Proposals that invites technical contributions to the HTJ2K activity. The deadline for an expression of interest is 1 October 2017, as detailed in the Call for Proposals, which is publicly available on the JPEG website at https://jpeg.org/jpeg2000/htj2k.html.

The objective of the HTJ2K activity is to identify and standardize an alternate block coding algorithm that can be used as a drop-in replacement for the block coding defined in JPEG 2000 Part-1. Based on existing evidence, it is believed that significant increases in encoding and decoding throughput are possible on modern software platforms, subject to small sacrifices in coding efficiency. An important focus of this activity is interoperability with existing systems and content libraries. To ensure this, the alternate block coding algorithm supports mathematically lossless transcoding between HTJ2K and JPEG 2000 Part-1 codestreams at the code-block level.

JPEG Pleno – The JPEG committee intends to provide a standard framework to facilitate the capture, representation and exchange of omnidirectional, depth-enhanced, point cloud, light field, and holographic imaging modalities. JPEG Pleno aims at defining new tools for improved compression while providing advanced functionalities at the system level. Moreover, it aims to support data and metadata manipulation, editing, random access and interaction, protection of privacy and ownership rights, as well as other security mechanisms. At the 76th JPEG meeting in Turin, Italy, responses to the call for proposals for JPEG Pleno light field image coding were evaluated using subjective and objective evaluation metrics, and a Generic JPEG Pleno Light Field Architecture was created. The JPEG committee defined three initial core experiments to be performed before the 77th JPEG meeting in Macau, China. Interested parties are invited to join these core experiments and the JPEG Pleno standardization.

JPEG XL – The JPEG Committee is working on a new activity, known as Next Generation Image Format, which aims to develop an image compression format that demonstrates higher compression efficiency, at equivalent subjective quality, than currently available formats, and that supports features for both low-end and high-end use cases. On the low end, the new format addresses image-rich user interfaces and web pages over bandwidth-constrained connections. On the high end, it targets efficient compression for high-quality images, including high bit depth, wide color gamut and high dynamic range imagery. A draft Call for Proposals (CfP) on JPEG XL has been issued for public comment and is available on the JPEG website.

JPEG XS – This project aims at the standardization of a visually lossless, low-latency, lightweight compression scheme that can be used as a mezzanine codec for the broadcast industry and Pro-AV markets. Targeted use cases are professional video links, IP transport, Ethernet transport, real-time video storage, video memory buffers, and omnidirectional video capture and rendering. After a Call for Proposals and the assessment of the submitted technologies, a test model for the upcoming JPEG XS standard was created. Several rounds of Core Experiments have allowed further improvement of the Core Coding System, the last round being reviewed during this 76th JPEG meeting in Torino. More core experiments are on their way, including subjective assessments. The JPEG committee therefore invites interested parties – in particular coding experts, codec providers, system integrators and potential users of the foreseen solutions – to contribute to the further specification process. Publication of the International Standard is expected for Q3 2018.

JPEG Reference Software – Together with the celebration of the 25th anniversary of the first JPEG standard, the committee continued its important activities around the omnipresent JPEG image format; while all newer JPEG standards define a reference software guiding users in interpreting and implementing a given standard, no such reference exists for the most popular image format of the Internet age. The JPEG committee therefore issued a call for proposals (https://jpeg.org/items/20170728_cfp_jpeg_reference_software.html) asking interested parties to participate in the submission and selection of valuable and stable implementations of JPEG (formally, Rec. ITU-T T.81 | ISO/IEC 10918-1).


Final Quote

“The experience shared by developers of the first JPEG standard during the celebration was an inspiring moment that will guide us to further the ongoing developments of standards responding to new challenges in imaging applications,” said Prof. Touradj Ebrahimi, the Convener of the JPEG committee.

About JPEG

The Joint Photographic Experts Group (JPEG) is a Working Group of ISO/IEC, the International Organisation for Standardization / International Electrotechnical Commission (ISO/IEC JTC 1/SC 29/WG 1), and of the International Telecommunication Union (ITU-T SG16), responsible for the popular JBIG, JPEG, JPEG 2000, JPEG XR, JPSearch and, more recently, the JPEG XT, JPEG XS, JPEG Systems and JPEG Pleno families of imaging standards.

The JPEG group meets nominally three times a year, in Europe, North America and Asia. The latest, 76th meeting was held on July 15-21, 2017, in Torino, Italy. The next, 77th JPEG meeting will be held on October 23-27, 2017, in Macau, China.

More information about JPEG and its work is available at www.jpeg.org or by contacting Antonio Pinheiro and Frederik Temmermans of the JPEG Communication Subgroup at pr@jpeg.org.

If you would like to stay posted on JPEG activities, please subscribe to the jpeg-news mailing list at https://listserv.uni-stuttgart.de/mailman/listinfo/jpeg-news. Moreover, you can follow the JPEG Twitter account at http://twitter.com/WG1JPEG.

Future JPEG meetings are planned as follows:

  • No. 77, Macau, CN, 23 – 27 October 2017



MPEG Column: 119th MPEG Meeting in Turin, Italy

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects.

The MPEG press release comprises the following topics:

  • Evidence of New Developments in Video Compression Coding
  • Call for Evidence on Transcoding for Network Distributed Video Coding
  • 2nd Edition of Storage of Sample Variants reaches Committee Draft
  • New Technical Report on Signalling, Backward Compatibility and Display Adaptation for HDR/WCG Video Coding
  • Draft Requirements for Hybrid Natural/Synthetic Scene Data Container

Evidence of New Developments in Video Compression Coding

At the 119th MPEG meeting, responses to the previously issued call for evidence were evaluated, and all of them successfully demonstrated evidence. The call requested responses for use cases of video coding technology in three categories:

  • standard dynamic range (SDR) — two responses;
  • high dynamic range (HDR) — two responses; and
  • 360° omnidirectional video — four responses.

The evaluation of the responses included subjective testing and an assessment of the performance of the “Joint Exploration Model” (JEM). The results indicate significant gains over HEVC for a considerable number of test cases, with comparable subjective quality at 40-50% less bit rate compared to HEVC for the SDR and HDR test cases, with some positive outliers (i.e., even higher bit rate savings). Thus, the MPEG-VCEG Joint Video Exploration Team (JVET) concluded that evidence exists of compression technology that may significantly outperform HEVC after further development to establish a new standard. As a next step, the plan is to issue a call for proposals at the 120th MPEG meeting (October 2017), with responses expected to be evaluated at the 122nd MPEG meeting (April 2018).

We already witness an increase in research articles addressing video coding technologies with capabilities beyond HEVC, which will further increase in the future. The main driving force is over-the-top (OTT) delivery, which calls for more efficient bandwidth utilization. However, competition is also increasing with the emergence of AOMedia's AV1, and we may observe an increasing number of articles in that direction, including evaluations thereof. Another interesting aspect is that the number of use cases is increasing as well (e.g., see the different categories above), which adds further challenges to the “complex video problem”.

Call for Evidence on Transcoding for Network Distributed Video Coding

The call for evidence on transcoding for network distributed video coding targets interested parties possessing technology that provides transcoding of video at lower computational complexity than a full re-encode. The primary application is adaptive bitrate streaming, where the highest-bitrate stream is transcoded into lower-bitrate streams. It is expected that responses may use “side streams” (or side information; some may call it metadata) accompanying the highest-bitrate stream to assist in the transcoding process. MPEG expects submissions for the 120th MPEG meeting, where compression efficiency and computational complexity will be assessed.

Transcoding has been discussed already for a long time and I can certainly recommend this article from 2005 published in the Proceedings of the IEEE. The question is, what is different now, 12 years later, and what metadata (or side streams/information) is required for interoperability among different vendors (if any)?
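To make the intended data flow concrete, here is a deliberately toy sketch; every function and structure below is a hypothetical placeholder invented for illustration (no real codec API is used). The point is only the shape of the pipeline: the expensive analysis happens once for the mezzanine encode, and its output travels as side information to cheap rendition encodes.

```python
# Toy data-flow sketch of guided transcoding for adaptive bitrate streaming.
# All names are hypothetical placeholders; no real codec API is involved.

def encode_mezzanine(frames):
    # Full encode: runs the expensive analysis (partitioning, mode
    # decisions, motion estimation) and keeps the results as side info.
    side_info = [{"frame": i, "decisions": "placeholder"} for i in range(len(frames))]
    return b"mezzanine-bitstream", side_info

def guided_transcode(frames, side_info, target_kbps):
    # Cheap re-encode: reuses the mezzanine decisions and mostly
    # requantises, trading some coding efficiency for far less compute.
    assert len(side_info) == len(frames)
    return f"rendition-{target_kbps}kbps".encode()

frames = list(range(30))  # stand-in for decoded video frames
mezzanine, side = encode_mezzanine(frames)
ladder = [guided_transcode(frames, side, kbps) for kbps in (3000, 1500, 800)]
print([r.decode() for r in ladder])
```

The interoperability question raised above then becomes: what exactly goes into the side information, and can different vendors agree on it?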

A Brief Overview of Remaining Topics…

  • The 2nd edition of storage of sample variants reaches Committee Draft and expands its usage to the MPEG-2 transport stream, whereas the first edition primarily focused on the ISO base media file format.
  • The new technical report for high dynamic range (HDR) and wide colour gamut (WCG) video coding comprises a survey of various signaling mechanisms including backward compatibility and display adaptation.
  • MPEG issues draft requirements for a scene representation media container enabling the interchange of content for authoring and rendering rich immersive experiences, which is currently referred to as the hybrid natural/synthetic scene (HNSS) data container.

Other MPEG (Systems) Activities at the 119th Meeting

DASH is fully in maintenance mode, as only minor enhancements/corrections have been discussed, including contributions to conformance and reference software. The omnidirectional media format (OMAF) is certainly the hottest topic within MPEG Systems; it is currently between two stages (i.e., between DIS and FDIS) and, thus, a study of the DIS has been approved, and national bodies are kindly requested to take this into account when casting their votes (incl. comments). The study of the DIS comprises format definitions with respect to the coding and storage of omnidirectional media, including audio and video (aka 360°). The common media application format (CMAF) was ratified at the last meeting and awaits publication by ISO. In the meantime, CMAF is focusing on conformance and reference software, as well as on amendments regarding various media profiles. Finally, requirements for a multi-image application format (MiAF) have been available since the last meeting, and at the 119th MPEG meeting a working draft was approved. MiAF will be based on HEIF, and the goal is to define additional constraints to simplify its file format options.

We have successfully demonstrated live 360° adaptive streaming, as described here, but we expect various improvements from standards that are available and under development within MPEG. Research aspects in these areas are certainly interesting with respect to performance gains and evaluations of bandwidth efficiency in open networks, as well as to how these standardization efforts could be used to enable new use cases.

Publicly available documents from the 119th MPEG meeting can be found here (scroll down to the end of the page). The next MPEG meeting will be held in Macau, China, October 23-27, 2017. Feel free to contact me for any questions or comments.

Report from ICMR 2017

ACM International Conference on Multimedia Retrieval (ICMR) 2017

ACM ICMR 2017 in “Little Paris”

ACM ICMR is the premier International Conference on Multimedia Retrieval, and since 2011 it has “illuminate[d] the state of the arts in multimedia retrieval”. This year, ICMR was held in a wonderful location: Bucharest, Romania, also known as “Little Paris”. Every year at ICMR I learn something new. And here is what I learnt this year.


Final Conference Shot at UP Bucharest

UNDERSTANDING THE TANGIBLE: objects, scenes, semantic categories – everything we can see.

1) Objects (and YODA) can be easily tracked in videos.

Arnold Smeulders delivered a brilliant keynote on “things” retrieval: given an object in an image, can we find (and retrieve) it in other images, videos, and beyond? He presented a very interesting technique for tracking objects (e.g., Yoda) in videos, based on similarity learned through Siamese networks.

Tracking Yoda with Siamese Networks

2) Wearables + computer vision help explore cultural heritage sites.

As shown in his keynote, Alberto del Bimbo and his amazing team at MICC, University of Florence, have designed smart audio guides for indoor and outdoor spaces. The system detects, recognises, and describes landmarks and artworks from wearable camera inputs (and GPS coordinates, in the case of outdoor spaces).

3) We can finally quantify how much images provide complementary semantics compared to text [BEST MULTIMODAL PAPER AWARD].

For ages, the community has asked how relevant different modalities are for multimedia analysis: this paper (http://dl.acm.org/citation.cfm?id=3078991) finally proposes a solution to quantify the information gaps between different modalities.

4) Exploring news corpora is now very easy: news graphs are easy to navigate and aware of the types of relations between articles.

Remi Bois and his colleagues presented this framework (http://dl.acm.org/citation.cfm?id=3079023), made for professional journalists and the general public, for seamlessly browsing through a large-scale news corpus. They built a graph whose nodes are the articles in the corpus. The most relevant items for each article are chosen (and linked) based on an adaptive nearest-neighbour technique, and each link is then characterised according to the type of relation between the two linked nodes.
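As a rough illustration of the linking step, the sketch below builds such a graph with off-the-shelf tools: articles become nodes, and each article is linked to its most similar neighbours in TF-IDF space. This is a minimal approximation with invented toy data, not the authors' code; the paper uses an adaptive nearest-neighbour criterion and a richer typing of the relations.

```python
# Minimal news-graph sketch: nodes are articles, edges link each article
# to its nearest neighbours by cosine similarity of TF-IDF vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

articles = [
    "Parliament votes on the new budget proposal",
    "Opposition reacts to the budget vote in parliament",
    "Floods hit the southern region after record rainfall",
    "Rainfall records broken as floods displace thousands",
]

vectors = TfidfVectorizer().fit_transform(articles)

k = 2  # fixed here; the paper adapts the neighbourhood size per article
nn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(vectors)
distances, indices = nn.kneighbors(vectors)  # k+1: each article finds itself too

graph = {}  # article index -> [(neighbour index, similarity), ...]
for i, (dists, idxs) in enumerate(zip(distances, indices)):
    graph[i] = [(int(j), 1 - d) for j, d in zip(idxs, dists) if j != i]

for i, links in graph.items():
    print(articles[i], "->", links)
```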

5) Panorama outdoor images are much easier to localise.

In his beautiful work (https://t.co/3PHCZIrA4N), Ahmet Iscen from Inria developed an algorithm for location prediction from StreetView images, outperforming the state of the art thanks to an intelligent stitching pre-processing step: predicting locations from panoramas (stitched individual views) instead of individual street images improves performance dramatically!

UNDERSTANDING THE INTANGIBLE: artistic aspects, beauty, intent – everything we can perceive.

1) Image search intent can be predicted by the way we look.

In his best paper candidate work (http://dl.acm.org/citation.cfm?id=3078995), Mohammad Soleymani showed that image search intent (seeking information, finding content, or re-finding content) can be predicted from physiological responses (eye gaze) and implicit user interaction (mouse movements).

2) Real-time detection of fake tweets is now possible using user and textual cues.

Another best paper candidate (http://dl.acm.org/citation.cfm?id=3078979), this time from CERTH. The team collected a large dataset of fake/real sample tweets spanning 17 events and built an effective model for misleading content detection from tweet content and user characteristics. A live demo is available here: http://reveal-mklab.iti.gr/reveal/fake/
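As a rough sketch of what such a pipeline can look like (the features and data below are invented for illustration; the actual CERTH model uses a much richer set of content and user cues), one could feed simple textual and account-level features into a standard classifier:

```python
# Toy sketch of misleading-tweet detection from textual and user cues.
from sklearn.ensemble import RandomForestClassifier

def features(tweet):
    # A few illustrative cues; real systems use many more.
    text, user = tweet["text"], tweet["user"]
    return [
        len(text),                 # message length
        text.count("!"),           # sensationalism proxy
        int("http" in text),       # contains a link
        user["followers"],         # account reach
        user["account_age_days"],  # account maturity
        int(user["verified"]),     # verified account
    ]

tweets = [
    {"text": "BREAKING!!! aliens spotted over the city http://t.co/x",
     "user": {"followers": 12, "account_age_days": 3, "verified": False},
     "fake": 1},
    {"text": "Mayor confirms road closures for Sunday's marathon",
     "user": {"followers": 54000, "account_age_days": 2100, "verified": True},
     "fake": 0},
]

X = [features(t) for t in tweets]
y = [t["fake"] for t in tweets]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X))  # toy check on the training tweets themselves
```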

3) Music tracks have different functions in our daily lives.

Researchers from TU Delft have developed an algorithm (http://dl.acm.org/citation.cfm?id=3078997) which classifies music tracks according to their purpose in our daily activities: relax, study and workout.

4) By transferring image style we can make images more memorable!

The team at the University of Trento built an automatic framework (https://arxiv.org/abs/1704.01745) to improve image memorability. A selector finds the style seeds (i.e., abstract paintings) that are likely to increase the memorability of a given image, and after style transfer the image becomes more memorable!

5) Neural networks can help retrieve and discover children's book illustrations.

In this amazing work (https://arxiv.org/pdf/1704.03057.pdf), motivated by real children's experiences, Pinar and her team from Hacettepe University collected a large dataset of children's book illustrations and found that neural networks can predict and transfer style, making it possible to give many other illustrations a “Winnie the Witch” look.

Winnie the Witch

6) Locals perceive their neighborhood as less interesting, more dangerous and dirtier compared to non-locals.

In this wonderful work (http://www.idiap.ch/~gatica/publications/SantaniRuizGatica-icmr17.pdf), presented by Darshan Santani from IDIAP, researchers asked locals and crowd-workers to look at pictures from various neighborhoods in Guanajuato and rate them according to interestingness, cleanliness, and safety.

THE FUTURE: What’s Next?

1) We will be able to anonymize images of outdoor spaces thanks to Instagram filters, as proposed by this work (http://dl.acm.org/citation.cfm?id=3080543) in the Brave New Idea session.  When an image of an outdoor space is manipulated with appropriate Instagram filters, the location of the image can be masked from vision-based geolocation classifiers.

2) Soon we will be able to embed watermarks in our Deep Neural Network models in order to protect our intellectual property [BEST PAPER AWARD]. This is a disruptive, novel idea, and that is why this work from KDDI Research and Japan National Institute of Informatics won the best paper award. Congratulations!

3) Given an image view of an object, we will predict the other side of things (from Smeulders’ keynote). In the pic: predicting the other side of chairs. Beautiful.

Predicting the other side of things

THANKS: To the organisers, to the volunteers, and to all the authors for their beautiful work :)

EDITORIAL NOTE: A more extensive report from ICMR 2017 by Miriam is available on Medium.

An interview with Prof. Ramesh Jain

Prof. Ramesh Jain in 2016.

Please describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

I am luckier than most people in that I have been able to experience really diverse situations in my life. Computing was just being introduced at Indian Universities when I was a student, so I never had a chance to learn computing in a classroom setting.  I took a few electronics courses as part of my undergraduate education, but nothing even close to computing.  I first used computers during my doctoral studies at the Indian Institute of Technology, Kharagpur, in 1970.  I was instantly fascinated and decided to use this emerging technology in the design of sophisticated control systems.  The information I picked up along the way was driven by my interests and passion.

I grew up in a traditional Indian Ashram, with no facilities for childhood education, so this was not the first time I faced a lack of formal instruction.  My father taught me basic reading, writing, and math skills and then I took a school placement exam.  I started school at the age of nine in fifth grade.

During my doctoral days, two areas fascinated me: computing and cybernetics.  I decided to do my research in digital control systems because it gave me a chance to combine computing and control.  At the time, the use of computing was very basic—digitizing control signals and understanding the effect of digitalization.  After my PhD, I became interested in artificial intelligence and entered AI through pattern recognition.  

In my current research, I am applying cybernetics to health.  Computing has finally matured enough that it can be applied in real control systems that play a critical role in our lives.  And what is more important to our well-being than our health?

The main driver of my career has been realizing that ultimately I am responsible for my own learning. Teachers are important, but ultimately I learn what I find interesting.  The most important attribute in learning is a person’s curiosity and desire to solve problems.  

Something else significantly impacted my thinking in my early research days.  I found that it is fundamental to accept ignorance about a problem and then examine concepts and techniques from multiple perspectives.  One person’s or one research paper’s perspective is just that—an opinion.  By examining multiple perspectives and relating those to your experiences, you can better understand a problem and its solutions.

Another important lesson is that problems or concepts are often independent of the academic and other organisational walls that exist. Interesting problems always require perspectives, concepts, and technologies from different academic disciplines. Over time, it then becomes necessary to create new disciplines or, as Thomas Kuhn called them, new paradigms [Kuhn 62].

In the late 1980s, much of my research was addressing different aspects of computer vision. I was frustrated by the slow progress in the field. In fact, I coauthored a paper on this topic that became quite controversial [Jain 91]. It was clear that computer vision could be central to computing in the real world, such as in industry, medical imaging, and robotics, but it was unable to solve any real problems.

While working on object recognition, it became increasingly obvious to me that images alone do not contain enough information to solve the vision problem. Projection of the real world onto a photograph results in a loss of information that can only be recovered by combining information from many other sources, including knowledge in many different forms, metadata, and other signals. I started thinking that our goal should be to understand the real world using sensors and other sources of knowledge, not just images. I felt that we were addressing the wrong problem: understanding the physical world using only images. The real problem is to understand the physical world, and the physical world can only be understood by capturing correlated information. To me, this is multimedia: understanding the physical world using multiple disparate sensors and other sources of information.

This is a very good definition of multimedia. In this context, what do you think is the future of multimedia research in general?

Different aspects of the physical world must be captured using different types of sensors. In the early days, multimedia concerned itself with the two most dominant human senses: vision and hearing. As the field advances, we must deal with every type of sensor that is developed to capture information in different applications. Multimedia must become the area that processes disparate data in context to convert it into information.

Given that you have been working in AI for such a long time, what do you think about the current trend of deep learning and how it will develop?

Every field has its trends. Learning is definitely a very important step in AI and has attracted attention from the early days. However, it has long been known that reasoning and search play equally important roles in AI. Ultimately, problem solving depends on recognizing real-world objects and patterns, and here learning plays a key role. To design successful deep systems, learning needs to be combined with search and reasoning.

Prof. Ramesh Jain at an early stage of his career (1975).

Please tell us more about your vision and objectives behind your current roles. What do you hope to accomplish, and how will you bring this about?

One thing that is of great interest to every human is their health.  Ironically, technology utilization in healthcare is not as pervasive as in many other fields.  Another intriguing fact about technology and health is that almost all progress in health is due to advances in technology, but barriers to using technology are also the most overwhelming in health.  I experienced the terrifying state of healthcare first hand while going through treatment for gastro-esophageal cancer in 2004.  It became clear to me during my fight with cancer that technology could revolutionize most aspects of treatment—from diagnosis to guidance and operationalization of patient care and engagement—but it was not being used.  During that period, it became clear to me that multimodal data leading to information and knowledge is the key to success in this and many other fields.  That experience changed my thinking and research.

Ancient civilizations observed that health is not the absence of disease; disease is a perturbation of a healthy state.  This wisdom was based on empirical observations and resulted in guidelines for healthy living that includes diet, sleep, and whole-body exercise, such as yoga or tai chi.  Now is the time to develop scientific guidelines based on the latest evolving knowledge and technology to maximize periods of overall health and minimize suffering during diseases in human lives.  It seems possible to raise life expectancy to 100+ years for most people.  I want to cross the 100-year threshold myself and live an active life until my last day.  I am working toward making that happen.

Technology for healthcare is an increasingly popular topic. Data is at the center of healthcare, and new areas like precision health and wellness are becoming increasingly popular. At the University of California, Irvine (UCI), we have created a major effort to bring together researchers from Information and Computer Sciences, Health Sciences, Engineering, Public Health, Nursing, Biology, and other fields who are adopting a novel perspective in an effort to build technology that empowers people. From this perspective, we adopt a cybernetics approach to health. This work is being done at UCI's Institute for Future Health, of which I am the founding director.

At the Institute for Future Health, currently we are building a community that will do academic research as well as work closely with industry, local communities, hospitals, and start-up companies. We will also collaborate with global researchers and practitioners interested in this approach.  There is significant interest from several institutions in several countries to collaborate and pursue this approach.

This is very interesting and relevant! Do you think that the multimedia community will be open to such a direction, or, since it is so important and societally relevant, would it be good to build a new research community around this idea?

As you said, this is the most important research direction I have been involved in, and the most challenging. And it is an important direction in itself; it needs to happen using all technological and other resources.

Since I cannot wait for any community to be ready to address this, I started building a community to address Future Health. But I believe that this could be the most relevant application for multimedia technology, and the techniques from multimedia are very relevant to this area.

It is an exciting problem, because the time is right to address this area.

Do you think that the multimedia community has the right skills to address medical multimedia problems, and how could the community be encouraged in that direction?

The multimedia community is better equipped than any other community to deal with diverse types of data. New tools will be required for new challenges, but we already have enough tools and techniques to address many current challenges. To do this, however, the community has to become an open, forward-looking community, going beyond visual information to consider all the other modalities that are currently ignored under ‘metadata’. All data is data and contributes to information.

Can you profile your current research and its challenges, opportunities, and implications?

I am involved in a research area that is one of the most challenging and that has implications for every human.

The most exciting aspect of health is that it is truly a multimodal, data-intensive operation. As discussed by Norbert Wiener in his book Cybernetics [Wiener 48] almost 70 years ago, control and communication processes in machines and animals are similar and are based on information. Until recently, these principles formed the basis for understanding health, but they can now be used to control health as well. This is exciting for everybody, and it motivates me to work hard and make something happen, for others but also for me.

We can discuss some fundamental components of this area from a cybernetics/information perspective:

Creating individual health models: Each person is unique. Our bodies and lives are determined by two major factors: genetics and lifestyle. Until recently, personal genome information was difficult to obtain, and personal lifestyle information was only anecdotally collected. This century is different. Personal genomic, in fact all omics, data is becoming easier to get and more precise and informative. And mobile phones, wearables, the Internet of Things (IoT) around us, and social media are all coming together to quantitatively determine different aspects of our lifestyles as well as many biomarkers.

This requires combining multimodal data from different sources, which is a challenge. By collecting all such lifestyle data, we can start assembling a log of information—a kind of multimodal lifelog on turbo charge—that could be used to build a model of a person using event mining tools.  By combining genomic and lifestyle data, we can form a complete model of a person that contains all detailed health-related information.

Aggregating individual health models to population disease models:  Current disease models rely on limited data from real people.  Until recently, it was not possible to gather all such data. As discussed earlier, the situation is rapidly changing.  Once data is available for individual health models, it could be sliced and diced to formulate disease models for different populations and demographics.  This will be revolutionary.

Correlating health and related knowledge to actions for each individual and for society: Cybernetics underlies most complex real-time engineering systems. The concept of feedback, in which a corrective signal is generated and applied to a system to take it from its current state to a desired state, is essential in all real-time control systems. Even in the human body, homeostasis uses similar principles. Can we use this to guide people in their lifestyle choices and medical compliance?

Navigation systems are a good example of how an old, tedious problem can become extremely easy to use.  Only 15 years ago, we needed maps and a lot of planning to visit new places.  Now, mobile navigation systems can anticipate upcoming actions and even help you correct your mistakes gracefully, in real time.  They can also identify traffic conditions and suggest the best routes.

If technology can do this for navigation in the physical world, can we develop technology to help us select appropriate lifestyle decisions and do so perpetually?  The answer is obviously yes.  By compiling all health and related knowledge, determining your current personal health situation and surrounding environmental situations, and using your past chronicle to log your preferences, it can provide you with suggestions that will make your life not only more healthy but also more enjoyable.

This is our dream at the Institute for Future Health.

Future Health: Perpetual enhancement of health by managing lifestyle and environment.

How would you describe your top innovative achievements in terms of the problems you were trying to solve, your solutions, and the impact they have today and into the future?

I am lucky to have been active for more than four decades and to have had the opportunity to participate in research and entrepreneurial activities in multiple countries at the best organizations. This gave me a chance to interact with the brightest young people as well as seasoned creative visionaries and researchers.  Thus, it is difficult for me to decide what to list.  I will adopt a chronological approach to answer your question.

Working in H.H. Nagel's research group in Hamburg, Germany, I got involved in developing an approach to motion detection and analysis in 1976. We wrote the first papers on video analysis that worked with traffic video sequences and detected and analyzed the motion of cars, pedestrians, and other objects. Our paper at IJCAI 1977 [Jain 77] was remarkable in showing these results at a time when digitizing a picture was a chore lasting minutes and the most powerful computer could not store a full video frame in its memory. Even today, the first step in many video analysis systems is differencing, as proposed in that work.

Many bright people contributed powerful ideas in computer vision from my groups. E. North Coleman was possibly the first person to propose photometric stereo in 1981 [Coleman]. Paul Besl's work on segmentation using surface characteristics and 3D object recognition made a significant impact [Besl]. Tom Knoll did some exciting research on feature-indexed hypotheses for object recognition. But Tom's major contribution to current computer technology was his development of Photoshop while he was doing his PhD in my research group. As we all know, Photoshop revolutionized how we view photos. Working with Kurt Skifstad at my first company, Imageware, we demonstrated the first version of capturing the 3D shape of a person's face and reproducing it using a machine in the next room at the Autofact Conference in 1994. I guess that was a primitive version of 3D printing. At the time, we called it 3D fax.

The idea of designing a content-based organization to build a large database of images was considered crazy in 1990, but it bugged me so much that I started first a project and later a company, Virage, working with several people. In fact, Bradley Horowitz left his research at MIT to join me in building Virage, and later he managed the project that brought Google Photos to its current form. The process of building video databases made me realize that photos and videos are a lot more than just intensity values. And that realization led me to champion the idea that information about the physical world can be recovered more effectively and efficiently by combining correlated, but incomplete, information from several sources, including metadata. This was the thinking that encouraged me to start building the multimedia community.

Since computing and camera technology had advanced enough by 1994, my research group at the University of California, San Diego (UCSD), particularly Koji Wakimoto [Jain 95] and then Arun Katkere and Saeed Moezzi [Moezzi 96], helped develop first Multiple Perspective Interactive Video and later Immersive Video to realize compelling telepresence. That research area, in various forms, attracted people from the movie industry as well as people interested in different art forms and collaborative spaces. By licensing our patents from UCSD, we started a company, Praja, to bring immersive video technology to sports. I left academia to be the CEO of Praja.

While developing technology for indexing sporting events, it became obvious that events are as important as objects, if not more so, when indexing multimedia data. Information about events comes from separate sources, and events combine different dimensions that play a key role in our understanding of the world. This realization led Westermann and me to work on a general computational model for events. Later we realized that by aggregating events over space and time, we could detect situations. Vivek Singh and Mingyan Gao helped prototype the EventShop platform [Singh 2010], which was later converted to an open-source platform under the leadership of Siripen Pongpaichet.

One of the most fundamental problems in society is connecting people’s needs to appropriate resources effectively, efficiently, and promptly in a given situation.  To understand people’s needs, it is essential to build objective models that could be used to recommend correct resources in given situations.  Laleh Jalali started building an event-mining framework that could be used to build an objective self model using the different types of data streams related to people that have now become easily available [Jalali 2015].  

All this work is leading to a framework that is behind my current thinking related to health intelligence. In health intelligence, our goal is to perpetually measure a person’s activities, lifestyle, environment, and bio-markers to understand his/her current state as well as continuously build his/her model. Using that model, current state, and medical knowledge, it is possible to provide perpetual guidance to help people take the right action in a given situation.

Over your distinguished career, what are the top lessons you want to share with the audience?

I have been lucky to get a chance to work on several fun projects. More importantly, I have worked closely on an equal number of successful and not-so-successful projects. I consider a project successful if it accomplishes its goal and the people working on it enjoy it. Although each project is unique, I have noticed some common themes that make a project successful.

Passion for the Project: Time and again, I have seen that passion for the project makes a huge difference. When people are passionate, they don't consider it work and will literally do whatever is required to make it successful. In my own case, I find that the ideas I find compelling, both in terms of their goals and their implications, are the ones that motivate me to do my best. I am focused, driven, and willing to work hard. I learned long ago to work only on problems that I find important and compelling. Some ideas are just not for me; in those cases, it is better for the project and for me if I dissociate from it at the first opportunity.

Open Mind: Departmental or similar boundaries in both academia and industry severely restrict how a problem is addressed. Solving the problem should be the goal, not using the resources or technology of a specific department. In academia, I often hear things like “this is not a multimedia problem” or “this is a database problem.” Usually, the goal of a project is to solve a problem, so we should use the best technique or resource available to solve it.

Most of the boundaries for academic disciplines are artificial, and because they keep changing, the departments based on any specific factor will likely also change over time.  By addressing challenging problems using appropriate technology and resources, we push boundaries and either expand older boundaries or create new disciplines.

Another manifestation of an open mind is the ability to see the same problem from multiple perspectives.  This is not easy—we all have our biases.  The best thing to do is to form a group of researchers from diverse cultural and disciplinary backgrounds.  Diversity naturally results in diverse perspectives.

Persistence: Good research is usually the result of sustained efforts to understand and solve a challenge. Many intrinsic and extrinsic issues must be handled during a successful research journey. By definition, an important research challenge requires navigating uncharted territory. Many people get frustrated in an unmapped area where there is no easy way to evaluate progress. In my experience, even some of my brightest students are comfortable only when they can say “I am better than approach X by N%.” In most novel problems, there is no X and there are no metrics to judge performance. Only a few people are comfortable in such situations, where incremental progress may not be computable. We require both kinds of people: those who can improve given approaches and those who can pioneer new areas. The second group requires people who can be confident about their research directions without concrete external evaluation measures. The ability to work confidently without external affirmation is essential for important, deep challenges.

In the current culture, a researcher's persistence is also tested by “publish or perish” oriented colleagues who determine the quality of research by acceptance rates at the so-called top conferences. When your papers are rejected, you are dejected and sometimes feel that you are doing the wrong research. That is not always true. The best thing about these conferences is that they test your self-confidence.

We have all read the stories about the research that ultimately resulted in the WWW and about the paper on PageRank that later became the foundation of Google search. Both were initially rejected. Yet the authors were confident in their work, so they persevered. When one of my papers gets rejected (which happens more often than with my much inferior papers), much of the time the reviewers are looking for incremental work on trendy topics and don't have the time, openness, or energy to think beyond what they and their friends have been doing. I read and analyze reviewers' comments to see whether they understood my work and then decide whether to take them seriously or ignore them. In other words, you have to be confident in your own ideas and review the reviews to decide your next steps.

I noticed that one of your favourite quotes is “Imagination is more important than knowledge.” In this regard, do you think there is enough “imagination” in today’s research, or are researchers mainly driven/constrained by grants, metrics, and trends? 

The complete quote by Albert Einstein is “Imagination is more important than knowledge. For knowledge is limited, whereas imagination embraces the entire world, stimulating progress, giving birth to evolution.”  So knowledge begins with imagination. Imagination is the beginning of a hypothesis. When the hypothesis is validated, that results in knowledge.

People often seek short-term rewards.  It is easier to follow trends and established paradigms than to go against them or create new paradigms.  This is nothing new; it has always happened. At one time scientists, like Galileo Galilei, were persecuted for opposing the established beliefs. Today, I only have to worry about my papers and grant proposals getting rejected.  The most engaged researchers are driven by their passion and the long-term rewards that may (or may not) come with it.

Albert Einstein (Source: Planet Science)

References:

  1. T. S. Kuhn, The Structure of Scientific Revolutions. Chicago: University of Chicago Press, 1962. ISBN 0-226-45808-3.
  2. R. Jain and T. O. Binford, “Ignorance, Myopia, and Naiveté in Computer Vision Systems,” CVGIP: Image Understanding, 53(1), 112-117, 1991.
  3. N. Wiener, Cybernetics: Or Control and Communication in the Animal and the Machine. Paris: Hermann & Cie; Cambridge, MA: MIT Press, 1948; 2nd revised ed., 1961. ISBN 978-0-262-73009-9.
  4. R. Jain, D. Militzer and H. Nagel, “Separating a Stationary Form from Nonstationary Scene Components in a Sequence of Real World TV Frames,” Proceedings of IJCAI 77, Cambridge, Massachusetts, 612-618, 1977.
  5. E. N. Coleman and R. Jain, “Shape from Shading for Surfaces with Texture and Specularity,” Proceedings of IJCAI, 1981.
  6. P. Besl and R. Jain, “Invariant Surface Characteristics for 3-D Object Recognition in Depth Maps,” Computer Vision, Graphics and Image Processing, 33, 33-80, 1986.
  7. R. Jain and K. Wakimoto, “Multiple Perspective Interactive Video,” Proceedings of the IEEE Conference on Multimedia Systems, May 1995.
  8. S. Moezzi, A. Katkere, D. Kuramura and R. Jain, “Reality Modeling and Visualization from Multiple Video Sequences,” IEEE Computer Graphics and Applications, 58-63, November 1996.
  9. V. Singh, M. Gao and R. Jain, “Social Pixels: Genesis and Evaluation,” Proceedings of ACM Multimedia, 2010.
  10. L. Jalali and R. Jain, “Bringing Deep Causality to Multimedia Data Streams,” Proceedings of ACM Multimedia, 221-230, 2015.

Awarding the Best Social Media Reporters

The SIGMM Records team has adopted a new strategy to encourage the publication of information, and thus increase the chances of reaching the community, spreading knowledge and fostering interaction. It consists of awarding the best Social Media reporters for each SIGMM conference, the award being a free registration to one of the SIGMM conferences within a period of one year. All SIGMM members are welcome to participate and contribute, and all are candidates for the award.

The Social Media Editors will issue a new open Call for Reports (CfR) via the Social Media channels every time a new SIGMM conference takes place, so that the community is reminded of (or becomes aware of) this initiative and can review its requirements and criteria.

The CfR will encourage activity on Social Media channels, posting information and content related to the SIGMM conferences with the proper hashtags (see our Recommendations). Reporters will be encouraged mainly to use Twitter, but other channels and innovative forms of dissemination will be very welcome!

The Social Media Editors will act as the jury deciding the best reports (i.e., collections of posts) on Social Media channels, and thus will not qualify for this award. The awarded reporters will additionally be asked to provide a post-summary of the conference. The number of awards for each SIGMM conference is indicated in the table below. Each awarded reporter will get a free registration to one of the SIGMM conferences (of his/her choice) within a period of one year.

Read more

Posting about SIGMM on Social Media

On Social Media, a common and effective mechanism for associating publications about a specific thread, topic or event is to use hashtags. Therefore, the Social Media Editors believe it is useful to recommend standards or basic rules for the creation and usage of the hashtags included in publications related to the SIGMM conferences.

In this context, a common doubt is whether to include the word ACM and the year in the hashtags for conferences. Regarding the year, our recommendation is not to include it: the date is available from the publications themselves and, this way, a single hashtag gathers the publications for all editions of a specific SIGMM conference. Regarding the word ACM, our recommendation is to include it in the hashtag only if the conference acronym contains fewer than four letters (i.e., #acmmm, #acmtvx) and otherwise not (i.e., #mmsys, #icmr). Although consistency is important, #mm (without ACM) would clearly not be a good identifier for MM (nor #tvx for TVX), while including ACM for MMSYS and ICMR would result in too long a hashtag. Indeed, the #acmmmsys and #acmicmr hashtags have not been used before, in contrast to the wide use of #acmmm (and also of #acmtvx). Therefore, our recommendations for the usage and inclusion of hashtags can be summarized in the table below (a small code sketch after the table illustrates the naming rule):

Conference Hashtag Include #sigmm?
MM #acmmm Yes
MMSYS #mmsys Yes
ICMR #icmr Yes
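
For illustration, the naming rule can be written down in a few lines of Python. This is a minimal sketch; the recommended_hashtags helper and the conference list are ours, not part of any SIGMM tooling:

    # Sketch of the hashtag recommendation: prefix "acm" only when the
    # conference acronym has fewer than four letters (e.g. MM -> #acmmm),
    # and always pair the conference hashtag with #sigmm.
    def recommended_hashtags(acronym):
        tag = acronym.lower()
        if len(tag) < 4:  # short acronyms get the "acm" prefix
            tag = "acm" + tag
        return ["#" + tag, "#sigmm"]

    for conf in ("MM", "TVX", "MMSYS", "ICMR"):
        print(conf, "->", " ".join(recommended_hashtags(conf)))
    # MM -> #acmmm #sigmm
    # TVX -> #acmtvx #sigmm
    # MMSYS -> #mmsys #sigmm
    # ICMR -> #icmr #sigmm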

Report from MMM 2017


MMM 2017 — 23rd International Conference on MultiMedia Modeling

MMM is a leading international conference for researchers and industry practitioners to share new ideas, original research results and practical development experiences from all MMM-related areas. The 23rd edition of MMM took place on January 4-6, 2017, on the modern campus of Reykjavik University. In this short report, we outline the major aspects of the conference, including: technical program; best paper session; Video Browser Showdown; demonstrations; keynotes; special sessions; and social events. We end by acknowledging the contributions of the many excellent colleagues who helped us organize the conference. For more details, please refer to the MMM 2017 web site.

Technical Program

The MMM conference calls for research papers reporting original investigation results and demonstrations in all areas related to multimedia modeling technologies and applications. Special sessions were also held that focused on addressing new challenges for the multimedia community.

This year, 149 regular full paper submissions were received, of which 36 were accepted for oral presentation and 33 for poster presentation, for a 46% acceptance rate. Overall, MMM received 198 submissions across all tracks and accepted 107 for oral and poster presentation, for an overall acceptance rate of 54%. For more details, please refer to the table below.
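
As a quick sanity check, the reported rates follow directly from these counts; the snippet below is a back-of-the-envelope recomputation, not additional data from the conference system:

    # Recomputing the MMM 2017 acceptance rates from the reported counts.
    full_accepted = 36 + 33                  # oral + poster, regular full papers
    full_submitted = 149
    all_accepted, all_submitted = 107, 198   # all tracks combined

    print("Regular track: {:.0%}".format(full_accepted / full_submitted))  # 46%
    print("All tracks:    {:.0%}".format(all_accepted / all_submitted))    # 54%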

MMM2017 Submissions and Acceptance Rates


Best Paper Session

Four best paper candidates were selected for the best paper session, which was a plenary session at the start of the conference.

The best paper, by unanimous decision, was “On the Exploration of Convolutional Fusion Networks for Visual Recognition” by Yu Liu, Yanming Guo, and Michael S. Lew. In this paper, the authors propose an efficient multi-scale fusion architecture, called convolutional fusion networks (CFN), which generates side branches from multi-scale intermediate layers while consuming few parameters.
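
For readers who want a concrete picture, the following PyTorch sketch illustrates the general idea of cheap side branches fused across scales. The layer sizes and the fusion-by-averaging choice are our own simplifications for illustration, not the exact CFN architecture from the paper:

    import torch
    import torch.nn as nn

    class TinyFusionNet(nn.Module):
        """Toy illustration of convolutional fusion: 1x1-conv side branches
        tap intermediate feature maps and are fused by averaging.
        Layer sizes are invented; this is not the authors' exact CFN."""
        def __init__(self, num_classes=10):
            super().__init__()
            self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
            self.stage2 = nn.Sequential(nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
            self.stage3 = nn.Sequential(nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
            # Side branches: a 1x1 conv per scale keeps the parameter count low.
            self.side = nn.ModuleList(
                nn.Conv2d(c, num_classes, kernel_size=1) for c in (16, 32, 64)
            )
            self.pool = nn.AdaptiveAvgPool2d(1)

        def forward(self, x):
            feats = []
            for stage in (self.stage1, self.stage2, self.stage3):
                x = stage(x)
                feats.append(x)
            # Each branch yields per-class scores from its own scale;
            # the scores are then fused by simple averaging.
            logits = [self.pool(branch(f)).flatten(1)
                      for branch, f in zip(self.side, feats)]
            return torch.stack(logits).mean(dim=0)

    scores = TinyFusionNet()(torch.randn(2, 3, 32, 32))
    print(scores.shape)  # torch.Size([2, 10])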

Phoebe Chen, Laurent Amsaleg and Shin’ichi Satoh (left) present the Best Paper Award to Yu Liu and Yanming Guo (right).

The best student paper, partially chosen due to the excellent presentation of the work, was “Cross-modal Recipe Retrieval: How to Cook This Dish?” by Jingjing Chen, Lei Pang, and Chong-Wah Ngo. In this work, the problem of sharing food pictures from the viewpoint of cross-modality analysis was explored. Given a large number of image and recipe pairs acquired from the Internet, a joint space is learnt to locally capture the ingredient correspondence from images and recipes.

Phoebe Chen, Shin’ichi Satoh and Laurent Amsaleg (left) present the Best Student Paper Award to Jingjing Chen and Chong-Wah Ngo (right).

The two runners-up were “Spatio-temporal VLAD Encoding for Human Action Recognition in Videos” by Ionut Cosmin Duta, Bogdan Ionescu, Kiyoharu Aizawa, and Nicu Sebe, and “A Framework of Privacy-Preserving Image Recognition for Image-Based Information Services” by Kojiro Fujii, Kazuaki Nakamura, Naoko Nitta, and Noboru Babaguchi.

Video Browser Showdown

The Video Browser Showdown (VBS) is an annual live video search competition, which has been organized as a special session at MMM conferences since 2012. In VBS, researchers evaluate and demonstrate the efficiency of their exploratory video retrieval tools on a shared data set in front of the audience. The participating teams start with a short presentation of their system and then perform several video retrieval tasks with a moderately large video collection (about 600 hours of video content). This year, seven teams registered for VBS, although one team could not compete for personal and technical reasons. For the first time in 2017, live judging was included: a panel of expert judges made real-time decisions about the accuracy of the submissions for one third of the tasks.

Teams and spectators in the Video Browser Showdown.

On the social side, three changes were made from previous conferences. First, VBS was held in a plenary session, to avoid conflicts with other schedule items. Second, the conference reception was held at VBS, which gave attendees extra incentives to attend, namely food and drink. And third, Alan Smeaton served as “color commentator” during the competition, interviewing the organizers and participants and helping explain to the audience what was going on. All of these changes worked well and contributed to a very well attended VBS session.

The winners of VBS 2017, after a very even and exciting competition, were Luca Rossetto, Ivan Giangreco, Claudiu Tanase, Heiko Schuldt, Stephane Dupont and Omar Seddati, with their IMOTION system.


Demonstrations

Five demonstrations were presented at MMM. As in previous years, the best demonstration was selected using both a popular vote and a selection committee; and, as in previous years, both methods produced the same winner: “DeepStyleCam: A Real-time Style Transfer App on iOS” by Ryosuke Tanno, Shin Matsuo, Wataru Shimoda, and Keiji Yanai.

The winners of the Best Demonstration competition hard at work presenting their system.


Keynotes

The first keynote, held in the first session of the conference, was “Multimedia Analytics: From Data to Insight” by Marcel Worring, University of Amsterdam, Netherlands. He reported on a novel multimedia analytics model based on an extensive survey of over eight hundred papers. In the analytics model, the need for semantic navigation of the collection is emphasized and multimedia analytics tasks are placed on an exploration-search axis. Categorization is then proposed as a suitable umbrella task for realizing the exploration-search axis in the model. In the end, he considered the scalability of the model to collections of 100 million images, moving towards methods which truly support interactive insight gain in huge collections.

Björn Þór Jónsson introduces the first keynote speaker, Marcel Worring (right).

The second keynote, held in the last session of the conference, was “Creating Future Values in Information Access Research through NTCIR” by Noriko Kando, National Institute of Informatics, Japan. She reported on NTCIR (NII Testbeds and Community for Information access Research), a series of evaluation workshops designed to enhance research in information access technologies, such as information retrieval, question answering, and summarization using East-Asian languages, by providing infrastructures for research and evaluation. Prof. Kando motivated participation in such benchmarking activities and highlighted the range of scientific tasks and challenges that have been explored at NTCIR over the past twenty years. She ended with ideas for the future direction of NTCIR.

Noriko Kando presents the second MMM keynote.


Special Sessions

During the conference, four special sessions were held. Special sessions are mini-venues, each focusing on one state-of-the-art research direction within the multimedia field. The sessions are proposed and chaired by international researchers, who also manage the review process, in coordination with the Program Committee Chairs. This year’s sessions were:
– “Social Media Retrieval and Recommendation” organized by Liqiang Nie, Yan Yan, and Benoit Huet;
– “Modeling Multimedia Behaviors” organized by Peng Wang, Frank Hopfgartner, and Liang Bai;
– “Multimedia Computing for Intelligent Life” organized by Zhineng Chen, Wei Zhang, Ting Yao, Kai-Lung Hua, and Wen-Huang Cheng; and
– “Multimedia and Multimodal Interaction for Health and Basic Care Applications” organized by Stefanos Vrochidis, Leo Wanner, Elisabeth André, and Klaus Schoeffmann.

Social Events

This year, there were two main social events at MMM 2017: a welcome reception at the Video Browser Showdown, as discussed above, and the conference banquet. Optional tours then allowed participants to further enjoy their stay on the unique and beautiful island.

The conference banquet was held in two parts. First, we visited the exotic Blue Lagoon, which is widely recognised as one of the modern wonders of the world and one of the most popular tourist destinations in Iceland. MMM participants had the option of bathing for two hours in this extraordinary spa, and applying the healing silica mud to their skin, before heading back for the banquet in Reykjavík.

The banquet itself was then held at the Harpa Reykjavik Concert Hall and Conference Centre in downtown Reykjavík. Harpa is one of Reykjavik’s newest and most distinguished landmarks, a cultural and social centre in the heart of the city, featuring stunning views of the surrounding mountains and the North Atlantic Ocean.

Harpa, the venue of the conference banquet.

During the banquet, Steering Committee Chair Phoebe Chen gave a historical overview of the MMM conferences and announced the venues for MMM 2018 (Bangkok, Thailand) and MMM 2019 (Thessaloniki, Greece), before awards for the best contributions were presented. Finally, participants were entertained by a small choir, and were even asked to participate in singing a traditional Icelandic folk song.

MMM 2018 will be held at Chulalongkorn University in Bangkok, Thailand. See http://mmm2018.chula.ac.th/.


Acknowledgements

There are many people who deserve appreciation for their invaluable contributions to MMM 2017. First and foremost, we would like to thank our Program Committee Chairs, Laurent Amsaleg and Shin’ichi Satoh, who did excellent work in organizing the review process and helping us with the organization of the conference; indeed they are still hard at work with an MTAP special issue for selected papers from the conference. The Proceedings Chair, Gylfi Þór Guðmundsson, and Local Organization Chair, Marta Kristín Lárusdóttir, were also tirelessly involved in the conference organization and deserve much gratitude.

Other conference officers also contributed to the organization and deserve thanks: Frank Hopfgartner and Esra Acar (Demonstration Chairs); Klaus Schöffmann, Werner Bailer and Jakub Lokoč (VBS Chairs); Yantao Zhang and Tao Mei (Sponsorship Chairs); all the Special Session Chairs listed above; the 150-strong Program Committee, who did an excellent job with the reviews; and the MMM Steering Committee, for entrusting us with the organization of MMM 2017.

Finally, we would like to thank our student volunteers (Atli Freyr Einarsson, Bjarni Kristján Leifsson, Björgvin Birkir Björgvinsson, Caroline Butschek, Freysteinn Alfreðsson, Hanna Ragnarsdóttir, Harpa Guðjónsdóttir), our hosts at Reykjavík University (in particular Arnar Egilsson, Aðalsteinn Hjálmarsson, Jón Ingi Hjálmarsson and Þórunn Hilda Jónasdóttir), the CP Reykjavik conference service, and all others who helped make the conference a success.

JPEG Column: 75th JPEG Meeting in Sydney, Australia


The 75th JPEG meeting was held at National Standards Australia in Sydney, Australia, from 26 to 31 March. Multiple activities ensued, pursuing the development of new standards that meet current requirements and challenges in imaging technology, as JPEG continuously strives to provide new, reliable solutions for different image applications. The 75th JPEG meeting featured mainly the following highlights:

  • JPEG issued a Call for Proposals on Privacy & Security;
  • A new draft Call for Proposals for Part 15 of the JPEG 2000 standard, on High Throughput coding, was released;
  • JPEG Pleno defined methodologies for the evaluation of proposals;
  • A test model for the upcoming JPEG XS standard was created;
  • A new standardisation effort on Next generation Image Formats was initiated.

In the following, an overview of the main JPEG activities at the 75th meeting is given.

JPEG Privacy & Security – JPEG Privacy & Security is a work item (ISO/IEC 19566-4) aiming at developing a standard that provides technical solutions to ensure privacy, maintain data integrity, and protect intellectual property rights (IPR). JPEG Privacy & Security is exploring how to design and implement the necessary features without significantly impacting coding performance, while ensuring scalability, interoperability, and forward and backward compatibility with current JPEG standard frameworks.
Since the JPEG committee intends to interact closely with actors in this domain, public workshops on JPEG Privacy & Security were organised at previous JPEG meetings: the first on October 13, 2015, during the JPEG meeting in Brussels, Belgium; the second on February 23, 2016, during the JPEG meeting in La Jolla, CA, USA; and, following the great success of these workshops, a third and final workshop on October 18, 2016, during the JPEG meeting in Chengdu, China. These workshops aimed at understanding industry, user, and policy needs in terms of technology and supported functionalities. The proceedings of these workshops are published on the Privacy and Security page of the JPEG website at www.jpeg.org, under the Systems section.
The JPEG Committee released a Call for Proposals inviting contributions on adding new protection and authenticity capabilities to the JPEG family of standards. Interested parties and content providers are encouraged to participate in this standardization activity and submit proposals. The deadline for expressions of interest and submission of proposals has been set to October 6th, 2017, as detailed in the Call for Proposals. The Call for Proposals on JPEG Privacy & Security is publicly available on the JPEG website: https://jpeg.org/jpegsystems/privacy_security.html.

High Throughput JPEG 2000 – The JPEG committee is working towards the creation of a new Part 15 of the JPEG 2000 suite of standards, known as High Throughput JPEG 2000 (HTJ2K). The goal of this project is to identify and standardize an alternate block coding algorithm that can be used as a drop-in replacement for the algorithm defined in JPEG 2000 Part 1. Based on existing evidence, it is believed that large increases in encoding and decoding throughput (e.g., 10X or beyond) should be possible on modern software platforms, subject to small sacrifices in coding efficiency. An important focus of this activity is interoperability with existing systems and content repositories. To ensure this, the alternate block coding algorithm that will be the subject of this new Part of the standard should support mathematically lossless transcoding between HTJ2K and JPEG 2000 Part 1 codestreams at the code-block level. A draft Call for Proposals (CfP) on HTJ2K has been issued for public comment and is available on the JPEG website.
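
No public HTJ2K implementation existed at the time of writing, but the kind of throughput baseline against which a “10X or beyond” claim is measured can be sketched with an existing JPEG 2000 Part 1 codec, for example a Pillow build with OpenJPEG support. The image size and rate below are arbitrary choices for the example:

    import io
    import time

    import numpy as np
    from PIL import Image  # requires a Pillow build with OpenJPEG (JPEG 2000) support

    # Arbitrary 1080p test image; a real benchmark would use standard test content.
    frame = Image.fromarray(
        np.random.randint(0, 256, size=(1080, 1920, 3), dtype=np.uint8)
    )

    t0 = time.perf_counter()
    buf = io.BytesIO()
    frame.save(buf, format="JPEG2000", quality_mode="rates", quality_layers=[20])
    elapsed = time.perf_counter() - t0

    megapixels = frame.width * frame.height / 1e6
    print("Part-1 encode: {:.1f} MP/s, {:.0f} KiB codestream".format(
        megapixels / elapsed, buf.getbuffer().nbytes / 1024))

A real evaluation would of course use the committee’s standard test content and compare the proposed block coder against Part 1 at matched rates.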

JPEG Pleno – The responses to the JPEG Pleno Call for Proposals on Light Field Coding will be evaluated at the July JPEG meeting in Torino. During the 75th JPEG meeting, the quality assessment procedure for this highly challenging type of large-volume data was defined. In addition to light fields, JPEG Pleno is also addressing point cloud and holographic data. Currently, the committee is undertaking in-depth studies to prepare standardization efforts on coding technologies for these image data types, encompassing the collection of use cases and requirements, as well as investigations towards accurate and appropriate quality assessment procedures for the associated representation and coding technologies. The JPEG committee is seeking input from the involved industrial and academic communities.

JPEG XS – This project aims at the standardization of a visually lossless, low-latency, lightweight compression scheme that can be used as a mezzanine codec for the broadcast industry and Pro-AV markets. Targeted use cases are professional video links, IP transport, Ethernet transport, real-time video storage, video memory buffers, and omnidirectional video capture and rendering. After a Call for Proposals and the assessment of the submitted technologies, a test model for the upcoming JPEG XS standard was created, and results of core experiments were reviewed during the 75th JPEG meeting in Sydney. More core experiments are under way to further improve the final standard; the JPEG committee therefore invites interested parties – in particular coding experts, codec providers, system integrators and potential users of the foreseen solutions – to contribute to the further specification process.

Next generation Image Formats – The JPEG Committee is exploring a new activity, which aims to develop an image compression format that demonstrates higher compression efficiency at equivalent subjective quality than currently available formats, and that supports features for both low-end and high-end use cases. On the low end, the new format addresses image-rich user interfaces and web pages delivered over bandwidth-constrained connections. On the high end, it targets efficient compression for high-quality images, including high bit depth, wide color gamut and high dynamic range imagery.

Final Quote

“JPEG is committed to accommodate reliable and flexible security tools for JPEG file formats without compromising legacy usage of our standards,” said Prof. Touradj Ebrahimi, the Convener of the JPEG committee.

About JPEG

The Joint Photographic Experts Group (JPEG) is a Working Group of ISO/IEC, the International Organisation for Standardization / International Electrotechnical Commission (ISO/IEC JTC 1/SC 29/WG 1), and of the International Telecommunication Union (ITU-T SG16), responsible for the popular JBIG, JPEG, JPEG 2000, JPEG XR, JPSearch and, more recently, the JPEG XT, JPEG XS, JPEG Systems and JPEG Pleno families of imaging standards.

The JPEG group meets nominally three times a year, in Europe, North America and Asia. The latest (75th) meeting was held on March 26-31, 2017, in Sydney, Australia. The next (76th) JPEG meeting will be held on July 15-21, 2017, in Torino, Italy.

More information about JPEG and its work is available at www.jpeg.org or by contacting Antonio Pinheiro (pinheiro@ubi.pt) or Frederik Temmermans (ftemmerm@etrovub.be) of the JPEG Communication Subgroup.

If you would like to stay posted on JPEG activities, please subscribe to the jpeg-news mailing list at https://listserv.uni-stuttgart.de/mailman/listinfo/jpeg-news. Moreover, you can follow the JPEG Twitter account at http://twitter.com/WG1JPEG.

Future JPEG meetings are planned as follows:

  • No. 76, Torino, IT, 17-21 July 2017
  • No. 77, Macau, CN, 23-27 October 2017