JPEG Column: 77th JPEG Meeting in Macau, China

 

JPEG XS is now entering the final phases of standard definition and will soon be available. It is important to clarify the change from the typical JPEG approach: this is the first JPEG image compression standard that is not developed solely to achieve the best compression performance for the best perceptual quality. Instead, JPEG XS establishes a compromise between compression efficiency and low complexity. This new approach is complemented by the development of a new part for the well-established JPEG 2000, named High Throughput JPEG 2000.

With these initiatives, the JPEG committee is standardizing low complexity and low latency codecs, with a slight sacrifice of the compression performance usually sought in previous standards. This change of paradigm is justified by current trends in multimedia technology, with the continuous growth of devices that are highly dependent on battery life, namely mobile phones, tablets, and also augmented reality devices or autonomous robots. Furthermore, these standards provide support for applications like omnidirectional video capture or real-time video storage and streaming. Nowadays, networks tend to grow in available bandwidth, and the memory available in most devices has also been reaching impressive numbers. Although compression is still required to simplify the manipulation of large amounts of data, its performance might become secondary if kept within acceptable levels. Obviously, considering the advances in coding technology over the last 25 years, these new approaches define codecs with compression performance largely above that of the JPEG standard used in most devices today. Moreover, they provide enhanced capabilities like HDR support, lossless or near-lossless modes, and alpha plane coding.

At the 77th JPEG meeting, held in Macau, China, from 21 to 27 October, several activities were considered, as briefly described in the following.


  1. A call for proposals on JPEG 360 Metadata for the current JPEG family of standards has been issued.
  2. New advances in low complexity / low latency compression standards, namely JPEG XS and High Throughput JPEG 2000.
  3. Continuation of the JPEG Pleno project that will lead to a family of standards on different 3D technologies, like light fields, digital holography and also point cloud data.
  4. New CfP for the Next-Generation Image Compression Standard.
  5. Definition of a JPEG reference software.

Moreover, a celebration of the 25th JPEG anniversary took place, at which early JPEG committee members from Asia were awarded.

The different activities are described in the following paragraphs.

 

JPEG Privacy and Security

JPEG Privacy & Security is a work item (ISO/IEC 19566-4) aiming at developing a standard that provides technical solutions which can ensure privacy, maintain data integrity and protect intellectual property rights (IPR). A Call for Proposals was published in April 2017 and, based on a descriptive analysis of submitted solutions for supporting protection and authenticity features in JPEG files, a working draft of JPEG Privacy & Security in the context of JPEG Systems standardization was produced during the 77th JPEG meeting in Macau, China. To collect further comments from the stakeholders in this field, an open online meeting for JPEG Privacy & Security will be conducted before the 78th JPEG meeting in Rio de Janeiro, Brazil, on Jan. 27-Feb 2, 2018. The JPEG Committee invites interested parties to the meeting. Details will be announced in the JPEG Privacy & Security AhG email reflector.

 

JPEG 360 Metadata

The JPEG Committee has issued a “Draft Call for Proposals (CfP) on JPEG 360 Metadata” at the 77th JPEG meeting in Macau, China. The JPEG Committee notes the increasing use of multi-sensor images from multiple image sensor devices, such as 360 degree capturing cameras or dual-camera smartphones available to consumers. Images from these cameras are shown on computers, smartphones and Head Mounted Displays (HMDs). JPEG standards are commonly used for image compression and as a file format to store and share such content. However, because existing JPEG standards do not fully cover all new uses, incompatibilities have reduced the interoperability of 360 images, and thus reduced the widespread ubiquity that consumers have come to expect when using JPEG-based images. Additionally, new modalities for interaction with images, such as computer-based augmentation, face-tagging, and object classification, require support for metadata that was not part of the scope of the original JPEG. To avoid fragmentation in the market and to ensure interoperability, a standard way of interacting with multi-sensor images with richer metadata is desired in JPEG standards. This CfP invites all interested parties, including manufacturers, vendors and users of such devices, to submit technology proposals for enabling interactions with multi-sensor images and metadata that fulfill the scope, objectives and requirements.

 

High Throughput JPEG 2000

The JPEG Committee is continuing its work towards the creation of a new Part 15 to the JPEG 2000 suite of standards, known as High Throughput JPEG 2000 (HTJ2K).

Since the release of an initial Call for Proposals (CfP) at the outcome of its 76th meeting, the JPEG Committee has completed the software test bench that will be used to evaluate technology submissions, and has reviewed initial registrations of intent. Final technology submissions are due on 1 March 2018.

The HTJ2K activity aims to develop an alternate block-coding algorithm that can be used in place of the existing block coding algorithm specified in ISO/IEC 15444-1 (JPEG 2000 Part 1). The objective is to significantly increase the throughput of JPEG 2000, at the expense of a small reduction in coding efficiency, while allowing mathematically lossless transcoding to and from codestreams using the existing block coding algorithm.

 

JPEG XS

This project aims at the standardization of a visually lossless low-latency lightweight compression scheme that can be used as a mezzanine codec for the broadcast industry, Pro-AV and other markets. Targeted use cases are professional video links, IP transport, Ethernet transport, real-time video storage, video memory buffers, and omnidirectional video capture and rendering. After four rounds of Core Experiments, the Core Coding System has now been finalized and the ballot process has been initiated.

Additional parts of the Standard are still being specified, in particular future profiles, as well as transport and container formats. The JPEG Committee therefore invites interested parties – in particular coding experts, codec providers, system integrators and potential users of the foreseen solutions – to contribute to the further specification process. Publication of the International Standard is expected for Q3 2018.

 

JPEG Pleno

This standardization effort is targeting the generation of a multimodal framework for the exchange of light field, point cloud, depth+texture and holographic data in end-to-end application chains. Currently, the JPEG Committee is defining the coding framework of the light field modality, for which the signalling syntax will be specified in Part 2 of the JPEG Pleno standard. In parallel, JPEG is reaching out to companies and research institutes that are active in the point cloud and holography arena and invites them to contribute to the standardization effort. JPEG is seeking additional input, both at the level of test data and quality assessment methodologies for these specific image modalities, and in terms of technology that supports their generation, reconstruction and/or rendering.

 

JPEG XL

The JPEG Committee has launched a Next-Generation Image Compression Standardization activity, also referred to as JPEG XL. This activity aims to develop a standard for image compression that offers substantially better compression efficiency than existing image formats (e.g. >60% over JPEG-1), along with features desirable for web distribution and efficient compression of high-quality images.

The JPEG Committee intends to issue a final Call for Proposals (CfP) following its 78th meeting (January 2018), with the objective of seeking technologies that fulfill the objectives and scope of the Next-Generation Image Compression Standardization activity.

A draft Call for Proposals, with all related information, has been issued and can be found on the JPEG website. Comments are welcome and should be submitted as specified in the document.

To stay posted on the action plan for JPEG XL, please regularly consult our website at jpeg.org and/or subscribe to our e-mail reflector. You will receive information to confirm your subscription and, upon acceptance by the moderator, you will be included in the mailing list.

 

JPEG Reference Software

Along with its celebration of the 25th anniversary of the commonly known JPEG still image compression specification, the JPEG Committee has launched an activity to fill a long-known gap in this important image coding standard, namely the definition of a JPEG reference software. At its 77th meeting, the JPEG Committee collected submissions for a reference software, evaluated them for suitability, and has now started the standardization process of such software on the basis of the submissions received.



JPEG 25th anniversary of the first JPEG standard

The JPEG Committee held a 25th anniversary celebration of its first standard in Macau, specifically organized to honour past committee members from Asia, and was proud to award Takao Omachi for his contributions to the first JPEG standard, Fumitaka Ono for his long lasting contributions to the JBIG and JPEG standards, and Daniel Lee for his contributions to the JPEG family of standards and long lasting service as Convenor of the JPEG Committee. The celebrations of the anniversary of this successful standard, which is still growing in use after 25 years, will have a third and final event during the 79th JPEG meeting planned in La Jolla, CA, USA.


 

Final Quote

“JPEG is committed to the design of specifications that ensure privacy and other security and protection solutions across the entire JPEG family of standards,” said Prof. Touradj Ebrahimi, the Convener of the JPEG committee.

 

About JPEG

The Joint Photographic Experts Group (JPEG) is a Working Group of ISO/IEC, the International Organisation for Standardization / International Electrotechnical Commission, (ISO/IEC JTC 1/SC 29/WG 1) and of the International Telecommunication Union (ITU-T SG16), responsible for the popular JBIG, JPEG, JPEG 2000, JPEG XR, JPSearch and more recently, the JPEG XT, JPEG XS, JPEG Systems and JPEG Pleno families of imaging standards.

The JPEG group meets nominally three times a year, in Europe, North America and Asia. The latest, 77th meeting was held on October 21-27, 2017, in Macau, China. The next, 78th JPEG meeting will be held from January 27 to February 2, 2018, in Rio de Janeiro, Brazil.

More information about JPEG and its work is available at www.jpeg.org or by contacting Antonio Pinheiro and Frederik Temmermans of the JPEG Communication Subgroup at pr@jpeg.org.

If you would like to stay posted on JPEG activities, please subscribe to the jpeg-news mailing list on https://listserv.uni-stuttgart.de/mailman/listinfo/jpeg-news. Moreover, you can follow the JPEG Twitter account at http://twitter.com/WG1JPEG.

Future JPEG meetings are planned as follows:

  • No 78, Rio de Janeiro, Brazil, January 27 to February 2, 2018
  • No 79, La Jolla (San Diego), CA, USA, April 9 to 15, 2018
  • No 80, Berlin, Germany, July 7 to 13, 2018

 

 

How Do Ideas Flow around SIGMM Conferences?

 

The ACM Multimedia conference just celebrated its quarter century in October 2017. This is a great opportunity to reflect on the intellectual influence of the conference, and the SIGMM community in general.

Progress on big scholarly data allows us to approach this task analytically. I downloaded a data dump from Microsoft Academic Graph (MAG) in February 2016 and found all papers from ACM Multimedia (MM), the SIGMM flagship conference — there are 4,346 publication entries from 1993 to 2015. I then searched the entire MAG for: (1) any paper that appears in the reference list of these MM papers — 35,829 entries across 1,560 publication venues (including both journals and conferences), an average of 8.24 references per paper; and (2) any paper that cites any of these MM papers — 46,826 citations from 1,694 publication venues, an average of 10.77 citations per paper.
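To make the counting step concrete, here is a minimal Python sketch. The file names and column names are hypothetical stand-ins, not the actual MAG schema, so treat it only as an illustration of how the per-paper reference and citation averages can be derived.

```python
import pandas as pd

# Hypothetical flat files extracted from the MAG dump; the real schema differs.
papers = pd.read_csv("mm_papers.csv")        # one row per ACM MM paper, column: paper_id
refs = pd.read_csv("mag_references.csv")     # columns: citing_id, cited_id
mm_ids = set(papers["paper_id"])

outgoing = refs[refs["citing_id"].isin(mm_ids)]   # MM papers citing other work (references)
incoming = refs[refs["cited_id"].isin(mm_ids)]    # other work citing MM papers (citations)

print("references per MM paper:", len(outgoing) / len(mm_ids))   # ~8.24 in this dump
print("citations per MM paper:", len(incoming) / len(mm_ids))    # ~10.77 in this dump
```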

This data allows us to profile the incoming (references) and outgoing (citations) influence in the community in detail. In this article, we highlight two questions below.

Where are the intellectual influences of the SIGMM community coming from, and going to?

If you have been publishing in, and going to, SIGMM conference(s) for a while, you may wonder where the ideas presented today will have their influence after 5, 10, or 20 years. You may also wonder which ideas cross over to other fields and disciplines, and which stay and flourish within the SIGMM community. You may also wonder whether the influence flow has changed since you entered the community, 3, 5, 10, or 20+ years ago.

If you are new to SIGMM, you may wonder what this community’s intellectual heritage is. For new students or researchers who recently entered this area, you may wonder in which other publication venues you are likely to find work relevant to multimedia.

Figure 1. The citation flow for ACM Multimedia (1993-2015). Summary of incoming vs outgoing citations to the top 25 venues in either direction. Node colors: ratio of citations (outgoing ideas, red) vs references (incoming ideas, blue). Node sizes: amount of total citation+references in either direction. Thickness of blue edges are scaled by the number of references going to a given venue; thickness of red edges are scaled by the number of citations coming from a given venue. Nodes are sorted left-to-right by the ratio of incoming vs outgoing citations to this conference.

A summary of this information is found in the “citation flower” graph above, summarising the incoming and outgoing influence since the inception of ACM MM (1993-2015).

On the right of the “citation flower” we can see venues that have had more influence on MM than the other way around; these include computer vision and pattern recognition (CVPR, ICCV, ECCV, T-PAMI, IJCV), machine learning (NIPS, JMLR, ICML), networking and systems (INFOCOM), information retrieval (SIGIR), human-computer interaction (CHI), as well as related journals (IEEE Multimedia). The diversity of incoming influence is part of the SIGMM identity, as the community has always been a place where ideas from disparate areas meet and generate interesting solutions to problems as well as new challenges. As indicated by the breakdown over time (on a separate page), the incoming influence of CVPR is increasing, and that of IEEE Trans. Circuits and Systems for Video Technology is decreasing — this is consistent with video encoding technology maturing over the last two decades, and with computer vision currently evolving fast.

On the left of the “citation flower”, we can see that ACM MM has been a major influencer for a variety of multimedia venues — from conferences (ICME, MIR, ICMR, CIVR) to journals (Multimedia Tools and Applications, IEEE Trans. Multimedia), to journals in related areas (IEEE Trans. on Knowledge and Data Engineering).

How many papers are remembered in the collective memory of the academic community and for how long?

Or, as a heated post-conference beer conversation may put it: are 80% of the papers forgotten within 2 years? Spoiler alert: no, for most conferences we looked at; but about 20% tend not to be cited at all.

Figure 2. Fraction of ACM MM papers that are cited at least once more than X years after they are published, with a linear regression overlay.

In Figure 2, we see a typical linear decline of the fraction of papers being cited. For example, 53% of papers have at least one citation after being published for 10 years. There are multiple factors that affect the shape of this citation survival graph, such as the size of this research community, the turnover rate of ideas (fast-moving or slow-moving), the perceived quality of publications, and others. See here for a number of different survival curves in different research communities.
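A rough sketch of how such a survival curve and its linear overlay can be computed is given below, using made-up toy data in place of the MAG-derived citation records.

```python
import numpy as np

# Toy stand-ins for data derived from the MAG dump: each paper's publication
# year and the years in which it was cited (values are made up).
pub_years = [1995, 1998, 2003, 2007, 2010]
cite_years = [[1996, 2006], [1999], [], [2008, 2015], [2011, 2012, 2016]]

def survival_fraction(horizon):
    """Fraction of papers cited at least once more than `horizon` years after publication."""
    alive = sum(1 for pub, cites in zip(pub_years, cite_years)
                if any(c - pub > horizon for c in cites))
    return alive / len(pub_years)

horizons = np.arange(0, 11)
fractions = np.array([survival_fraction(h) for h in horizons])

# Linear regression overlay, as in Figure 2.
slope, intercept = np.polyfit(horizons, fractions, 1)
print(slope, intercept)
```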

What about the newer, and more specialised SIGMM conferences?

Figure 3 and Figure 4 show the citation flowers for ICMR and MMSys; both conferences have five years of publication data in MAG. We can see that both conferences are well embedded among the SIGMM and related venues (ACM Multimedia, IEEE Trans. Multimedia), and both have strong influence from the computer vision community, including T-PAMI, CVPR and ICCV. The sub-community-specific influences come from WWW, ICML and NIPS for ICMR, and from INFOCOM, SIGMETRICS and SIGMAR for MMSys. In terms of outgoing influence, MMSys influences venues in networking (ICC, CoNEXT), and ICMR influences Information Science and MMSys.

Figure 3. The citation flow for ICMR (2011-2015). See Figure 1 caption for the meaning of node/edge colors and sizes.

Figure 4. The citation flow for MMSys (2011-2015). See Figure 1 caption for the meaning of node/edge colors and sizes.

Overall, this case study shows the truly multi-disciplinary nature of SIGMM; the community should continue the tradition of fusing ideas and strive to increase its influence in other communities.

I hope you find these analyses and observations somewhat useful, and I would love to hear comments and suggestions from the community. Of course, the data is not perfect, and there is a lot more to do. The project overview page [1] contains details about data processing and several known issues; the software for this analysis and visualisation is also released publicly [2].

Acknowledgements

I thank Alan Smeaton and Pablo Cesar for encouraging this post and many helpful editing suggestions. I also thank Microsoft Academic for making data available.

References

[1] Visualizing Citation Patterns of Computer Science Conferences, Lexing Xie, Aug 2016,  http://cm.cecs.anu.edu.au/post/citation_vis/

[2] Repository for analyzing citation flow https://github.com/lexingxie/academic-graph

 

Practical Guide to Using the YFCC100M and MMCOMMONS on a Budget

 

The Yahoo-Flickr Creative Commons 100 Million (YFCC100M), the largest freely usable multimedia dataset to have been released so far, is widely used by students, researchers and engineers on topics in multimedia that range from computer vision to machine learning. However, its sheer volume, one of the traits that make the dataset unique and valuable, can pose a barrier to those who do not have access to powerful computing resources. In this article, we introduce useful information and tools to boost the usability and accessibility of the YFCC100M, including the supplemental material provided by the Multimedia Commons (MMCOMMONS) community. In particular, we provide a practical guide on how to set up a feasible and cost effective research and development environment locally or in the cloud that can access the data without having to download it first.

YFCC100M: The Largest Multimodal Public Multimedia Dataset

Datasets are unarguably one of the most important components of multimedia research. In recent years there was a growing demand for a dataset that was sufficiently large, truly multimodal, freely usable without licensing issues, and not specifically biased towards or targeted at certain topics.

The YFCC100M dataset was created to meet these needs and overcome many of the issues affecting existing multimedia datasets. It is, so far, the largest publicly and freely available multimedia collection of metadata representing about 99.2 million photos and 0.8 million videos, all of which were uploaded to Flickr between 2004 and 2014. Metadata included in the dataset are, for example, title, description, tags, geo-tag, uploader information, capture device information, URL to the original item. Additional information was later released in the form of expansion packs to supplement the dataset, namely autotags (presence of visual concepts, such as people, animals, objects, events, architecture, and scenery), Exif metadata, and human-readable place labels. All items in the dataset were published under one of the Creative Commons commercial or noncommercial licenses, whereby approximately 31.8% of the dataset is marked for commercial use and 17.3% has the most liberal license that only requires attribution to the photographer. For academic purposes, the entire dataset can be used freely, which enables fair comparisons and reproducibility of published research works.

Two articles from the people who created the dataset, YFCC100M: The New Data in Multimedia Research and Ins and Outs of the YFCC100M, give more detail about the motivation, collection process, and interesting characteristics and statistics of the dataset. Since its initial release in 2014, the YFCC100M quickly gained popularity and is widely used in the research community. As of September 2017, the dataset had been requested over 1400 times and cited over 300 times in research publications, with topics in multimedia ranging from computer vision to machine learning. Specific topics include, but are not limited to, image and video search, tag prediction, captioning, learning word embeddings, travel routing, event detection, and geolocation prediction. Demos that use the YFCC100M can be found here.

Figure 1. Overview diagram of YFCC100M and Multimedia Commons.


MMCOMMONS: Making YFCC100M More Useful and Accessible

Out of the many things that the YFCC100M offers, its sheer volume is what makes it especially valuable, but it is also what makes the dataset not so trivial to work with. The metadata alone spans 100 million lines of text and is 45GB in size, not including the expansion packs. To work with the images and/or videos of YFCC100M, they need to be downloaded first using the individual URLs contained in the metadata. Aside from the time required to download all 100 million items, which would further occupy 18TB of disk space, the main problem is that a growing number of images and videos is becoming unavailable due to the natural lifecycle of digital items, where people occasionally delete what they have shared online. In addition, processing and analyzing the images and videos is generally infeasible for students and scientists in small research groups who do not have access to high performance computing resources.

These issues were noted upon the creation of the dataset and the MMCOMMONS community was formed to coordinate efforts for making the YFCC100M more useful and accessible to all, and to persist the contents of the dataset over time. To that end, MMCOMMONS provides an online repository that holds supplemental material to the dataset, which can be mounted and used to directly process the dataset in the cloud. The images and videos included in the YFCC100M can be accessed and even downloaded freely from an AWS S3 bucket, which was made possible courtesy of the Amazon Public Dataset program. Note that a tiny percentage of images and videos are missing from the bucket, as they already had disappeared when organizers started the download process right after the YFCC100M was published. This notwithstanding, the images and videos hosted in the bucket still serve as a useful snapshot that researchers can use to ensure proper reproduction of and comparison with their work. Also included in the Multimedia Commons repository are visual and aural features extracted from the image and video content. The MMCOMMONS website provides a detailed description of conventional features and deep features, which include HybridNet, VGG and VLAD. These CNN features can be a good starting point for those who would like to jump right into using the dataset for their research or application.

The Multimedia Commons has been supporting multimedia researchers by generating annotations (see the YLI Media Event Detection and MediaEval Placing tasks), developing tools, as well as organizing competitions and workshops for ideas exchange and collaboration.

Setting up a Research Environment for YFCC100M and MMCOMMONS

Even with pre-extracted features available, to do meaningful research one still needs a lot of computing power to process the large amount of YFCC100M and MMCOMMONS data. We would like to lower the barrier of entry for students and scientists who don’t have access to dedicated high-performance resources. In the following we describe how one can easily set up a research environment for handling the large collection. We introduce how Apache MXNet, Amazon EC2 Spot Instance and AWS S3 can be used to create a research development environment that can handle the data in a cost-efficient way, as well as other ways to use it more efficiently.

1) Use a subset of dataset

It is not necessary to work with the entire dataset just because you can. Depending on the use case, it may make more sense to use a well-chosen subset. For instance, the YLI-GEO and YLI-MED subsets released by the MMCOMMONS can be useful for geolocation and multimedia event detection tasks, respectively. For other needs, the data can be filtered to generate a customized subset.
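As an illustration of building such a customized subset, the sketch below filters the metadata by a keyword. It assumes the metadata is distributed as compressed tab-separated text and that the user tags sit in one particular column; the file name and column index are placeholders, so check the dataset documentation for the actual layout before relying on them.

```python
import bz2
import csv

KEYWORD = "beach"
TAGS_COLUMN = 10   # hypothetical index of the user-tags field

# Stream the (assumed) tab-separated metadata and keep only matching rows.
with bz2.open("yfcc100m_dataset.bz2", mode="rt", encoding="utf-8") as src, \
     open("my_subset.tsv", "w", encoding="utf-8") as dst:
    for row in csv.reader(src, delimiter="\t"):
        if len(row) > TAGS_COLUMN and KEYWORD in row[TAGS_COLUMN]:
            dst.write("\t".join(row) + "\n")
```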

The YFCC100M Dataset Browser is a web-based tool you can use to search the dataset by keyword. It provides an interactive visualization with statistics that helps to better understand the search results. You can generate a list file (.csv) of the items that match the search query, which you can then use to fetch the images and/or videos afterwards. The limitations of this browser are that it only supports keyword search on the tags and that it only accepts ASCII text as valid input, rather than Unicode, which would be needed for queries using non-Roman characters. Also, queries can take up to a few seconds to return results.
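Once a list file has been exported, fetching the corresponding images is straightforward. The sketch below assumes a hypothetical column name for the item URL and simply skips items that have since disappeared from Flickr.

```python
import csv
import os

import requests

os.makedirs("images", exist_ok=True)

with open("browser_export.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        url = row["photo_url"]                      # assumed column name
        resp = requests.get(url, timeout=30)
        if resp.ok:                                 # some items may no longer exist
            name = os.path.basename(url.split("?")[0]) or "item.jpg"
            with open(os.path.join("images", name), "wb") as out:
                out.write(resp.content)
```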

A more flexible way to search the collection with lower latency is to set up your own Apache Solr server and index (a subset of) the metadata. For instance, the autotags metadata can be indexed to search for images that have visual concepts of interest. A step-by-step guide to setting up a Solr server environment with the dataset can be found here. You can write Solr queries in most programming languages by using one of the Solr wrappers.
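As a sketch of what querying such a server can look like from Python, the snippet below uses the pysolr wrapper; the core name and field names are illustrative and depend entirely on how you indexed the metadata.

```python
import pysolr

# Assumes a local Solr core already indexed with (part of) the YFCC100M metadata.
solr = pysolr.Solr("http://localhost:8983/solr/yfcc100m", timeout=10)

# Photos whose autotags contain the visual concept "beach" (field name assumed).
results = solr.search("autotags:beach", rows=20)
for doc in results:
    print(doc.get("photo_id"), doc.get("download_url"))
```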

2) Work directly with data from AWS S3

Apache MXNet, a deep learning framework you can run locally on your workstation, allows training with S3 data. Most training and inference modules in MXNet accept data iterators that can read data from and write data to a local drive as well as AWS S3.

The MMCOMMONS provides a data iterator for YFCC100M images, stored as a RecordIO file, so you can process the images in the cloud without ever having to download them to your computer. If you are working with a subset that is sufficiently large, you can further filter it to generate a custom RecordIO file that suits your needs. Since the images stored in the RecordIO file are already resized and saved compactly, generating a RecordIO from an existing RecordIO file by filtering on-the-fly is more time and space efficient than downloading all images first and creating a RecordIO file from scratch. However, if you are using a subset that is relatively small, it is recommended to download just those images you need from S3 and then create a RecordIO file locally, as that will considerably speed up processing the data.
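The following sketch shows what reading a RecordIO file directly from S3 can look like with MXNet's generic image iterator (not necessarily the exact iterator MMCOMMONS provides). The bucket and key are placeholders, and it assumes MXNet was built with S3 support and that AWS credentials are available in the environment.

```python
import mxnet as mx

data_iter = mx.io.ImageRecordIter(
    path_imgrec="s3://example-bucket/yfcc100m-subset.rec",  # hypothetical path
    data_shape=(3, 224, 224),   # channels, height, width expected by the model
    batch_size=64,
    shuffle=True,
)

for batch in data_iter:
    images = batch.data[0]      # NDArray of shape (batch_size, 3, 224, 224)
    # ... run training or feature extraction on the batch ...
    break
```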

While one would generally set up Apache MXNet to run locally, you should note that the I/O latency of using S3 data can be greatly reduced if you set it up to run on an Amazon EC2 instance in the same region as where the S3 data is stored (namely, us-west-2, Oregon); see Figure 2. Instructions for setting up a deep learning environment on Amazon EC2 can be found here.

Figure 2. The diagram shows a cost-efficient setup with a Spot Instance in the same region (us-west-2) as the S3 buckets that house the YFCC100M and MMCOMMONS images/videos and RecordIO files. Data in the S3 buckets can be accessed in the same way from a researcher’s computer; the only downside is the longer latency for retrieving data from S3. Note that there are several Yahoo! Webscope buckets (I3set1-I3setN) that hold a copy of the YFCC100M, but you can only access it using the path you were assigned after requesting the dataset.


3) Save cost by using Amazon EC2 Spot Instances

Cloud computing has become considerably cheaper in recent years. However, the price for using a GPU instance to process the YFCC100M and MMCOMMONS can still be quite expensive. For instance, Amazon EC2’s on-demand p2.xlarge instance (with a NVIDIA TESLA K80 GPU and 12GB RAM) costs 0.9 USD per hour in the us-west-2 region. This would cost approximately $650 (€540) a month if used full-time.

One way to reduce the cost is to set up a persistent Spot Instance environment. If you request an EC2 Spot Instance, you can use the instance as long as its market price is below your maximum bidding price. If the market price goes beyond your maximum bid, the instance gets terminated after a two-minute warning. To deal with such frequent interruptions, it is important to often store your intermediate results in persistent storage, such as AWS S3 or AWS EFS. The market price of the EC2 instance fluctuates (see Figure 3), so there is no guarantee as to how much you can save or how long you have to wait for your final results to be ready. But if you are willing to experiment with pricing, in our case we were able to reduce the costs by 75% during the period January-April 2017.
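One simple way to survive such interruptions is to push intermediate results to S3 at regular intervals, for example with boto3 as sketched below; the bucket and file names are placeholders.

```python
import boto3

s3 = boto3.client("s3")

def checkpoint(local_path, bucket, key):
    """Copy an intermediate result (e.g. model parameters) to persistent storage."""
    s3.upload_file(local_path, bucket, key)

# Call this periodically (e.g. once per epoch) so that a spot termination,
# announced only two minutes ahead, loses little work.
checkpoint("model-epoch-12.params", "my-results-bucket", "checkpoints/model-epoch-12.params")
```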

Figure 3. You can check the current and past market price of different EC2 instance types from the Spot Instance Pricing History panel.


4) Apply for academic AWS credits

Consider applying for the AWS Cloud Credits for Research Program to receive AWS credits to run your research in the cloud. In fact, thanks to the grant we were able to release LocationNet, a pre-trained geolocation model that used all geotagged YFCC100M images.

Conclusion

YFCC100M is at the moment the largest multimedia dataset released to the public, but its sheer volume poses a high barrier to actually using it. To boost the usability and accessibility of the dataset, the MMCOMMONS community provides an additional AWS S3 repository with tools, features, and annotations to facilitate creating a feasible research and development environment for those with fewer resources at their disposal. In this column, we provided a guide on how a subset of the dataset can be created for specific scenarios, how the YFCC100M and MMCOMMONS data hosted on S3 can be used directly for training a model with Apache MXNet, and finally how Spot Instances and academic AWS credits can make running experiments cheaper or even free.

Join the Multimedia Commons Community

Please let us know if you’re interested in contributing to the MMCOMMONS. This is a collaborative effort among research groups at several institutions (see below). We welcome contributions of annotations, features, and tools around the YFCC100M dataset, and may potentially be able to host them on AWS. What are you working on?

See this page for information about how to help out.

Acknowledgements:

This dataset would not have been possible without the effort of many people, especially those at Yahoo, Lawrence Livermore National Laboratory, International Computer Science Institute, Amazon, ISTI-CNR, and ITI-CERTH.

Opinion Column: Tracks, Reviews and Preliminary Works

 

Welcome to the first edition of the  SIGMM Community Discussion Column!

As promised in our introductory edition, this column will report highlights and lowlights of online discussion threads among the members of the Multimedia community (see our Facebook MM Community Discussion group).

After an initial poll, this quarter the community chose to discuss the reviewing process and structure of the SIGMM-sponsored conferences. We organized the discussion around 3 main sub-topics: importance of tracks, structure of the reviewing process, and value of preliminary works. We collected more than 50 contributions from the members of the Facebook MM Community Discussion group. Therefore, the following synthesis represents only these contributions. We encourage everyone to participate in the upcoming discussions, so that this column becomes more and more representative of the entire community.

In a nutshell, the community agreed that: we need more transparent communication and homogeneous rules across thematic areas; we need more useful rebuttals; there is no need for conflict of interest tracks; large conferences must protect preliminary and emergent research works. Solutions were suggested to improve these points.

Communication, Coordination and Transparency. All participants agreed that more vertical (from chairs to authors) and horizontal (between area chairs or technical program chairs) communication could improve the quality of both papers and reviews in SIGMM-sponsored conferences. For example, lack of transparency and communication regarding procedures might lead to uneven rules and deadlines across tracks.

Tracks. How should conference thematic areas be coordinated? The community’s view can be summarized into 3 main perspectives:

  1. Rule Homogeneity. The majority of participants agreed that big conferences should have thematic areas, and that tracks should be jointly coordinated by a technical program committee. Tracks are extremely important, but in order for the conference to convey an individual, unified message, as opposed to being a “multi-conference”, the same review and selection process should apply to all tracks. Moreover, hosting a face-to-face global TPC meeting is key for a solid, homogeneous conference program.
  2. Non-uniform Selection Process to Help Emerging Areas. A substantial number of participants pointed out that one role of the track system is to help emerging subcommunities: thematic areas ensure a balanced programme with representation from less explored topics (for example, music retrieval or arts and multimedia). Under this perspective, while the reviewing process should be the same for all tracks, the selection phase could be non-uniform. “Mathematically applying a percentage rate per area” does not help select the actually high-quality papers across tracks: with a uniformly applied low acceptance rate rule, minor tracks might have only one or two papers accepted, despite the high quality of the submissions.
  3. Abolish Tracks. A minority of participants agreed that, similar to big conferences such as CVPR, tracks should be completely abolished. A rigid track-based structure makes it somewhat difficult for authors to choose the right track to submit to; moreover, reviewers and area chairs are often experts in more than one area. These issues could be addressed by a flexible structure where papers are assigned to area chairs and reviewers based on the topic.

Reviewing process. How do we want the reviewing process to be? Here is the view of the community on five main points: rebuttal, reviewing guidelines, the Brave New Idea track, conflict of interest, and reviewer assignment.

  1. Rebuttal: important, but we need to increase impact. The majority of participants agreed that rebuttal is helpful to increase review quality and to grant authors more room for discussion. However, it was pointed out that sometimes the rebuttal process is slightly overlooked by both reviewers and area chairs, thus decreasing the potential impact of the rebuttal phase. It was suggested that, in order to raise awareness on rebuttal’s value, SIGMM could publish statistics on the number of reviewers who changed their opinion after rebuttal. Moreover, proposed improvements on the rebuttal process included: (1) more time allocated for reviewers to have a discussion regarding the quality of the papers; (2) a post-rebuttal feedback where reviewers respond to authors’ rebuttal (to promote reviewers-authors discussion and increase awareness on both sides) and (3) a closer supervision of the area chairs.
  2. Reviewing Guidelines: complex, but they might help preliminary works. Do reviewing guidelines help reviewers write better reviews? For most participants, giving instructions to reviewers appears to be somewhat impractical, as reviewers do not necessarily read or follow the guidelines. A more feasible solution is to insert weak instructions through specific questions in the reviewing form (e.g. “could you rate the novelty of the paper?”). However, it was also pointed out that written rules could help area chairs justify the rejection of a bad review. Also, although reviewing instructions might change from track to track, general written rules regarding “what is a good paper” could help the reviewers understand what to accept. For example, clarification is needed on the depth of acceptable research works, and on how preliminary works should be evaluated, given the absence of a short paper track.
  3. Brave New Idea Track: ensuring scientific advancement. A few participants expressed their opinions regarding this track, which hosts novel, controversial research ideas. They remarked on the importance of such a track to ensure scientific advancement, and it was suggested that, in the future, this track could host exploratory works (former short papers), as preliminary research works are crucial to make a conference exciting.
  4. Conflict of Interest (COI) Track: perhaps we should abolish it. Participants almost unanimously agreed that a COI track is needed only when the conference management system is not able to handle conflicts on its own. It was suggested that, if that is not the case, a COI track might actually have an antithetical effect (is the COI track acceptance rate for ACM MM higher this year?).
  5. Choosing Reviewers: A Semi-Automated Process. The aim of the reviewers assignment procedure is to give the right papers to the right reviewers. How to make this procedure successful? Some participants supported the “fully manual assignment” option, where area chairs directly nominate reviewers for their own track. Others proposed to have a “fully automatic assignment”, based on an automated matching system such as the Toronto Paper Matching System (TPMS). A discussion followed, and eventually most participants agreed on a semi-automated process, having first the TPMS surfacing a relevant pool of reviewers (independent of tracks) and then area chairs manually intervening. Manual inspection of area chairs is crucial for inter-disciplinary papers needing reviews from experts from different areas.

Finally, during the discussion, a few observations and questions regarding the future of the community arose. For example: how to steer the direction of the conference, given the increase in the number of AI-related papers? How to support diversity of topics, and encourage papers in novel fields (e.g. arts and music) beyond the legacy of traditional multimedia topics? Given the wide interest in such issues, we will include these discussion topics in our next pre-discussion poll. To participate in the next discussion, please visit and subscribe to the Facebook MM Community Discussion group, and raise your voice!

Xavier Alameda-Pineda and Miriam Redi.

Report from ACM MMSys 2017

–A report from Christian Timmerer, AAU/Bitmovin Austria

The ACM Multimedia Systems Conference (MMSys) provides a forum for researchers to present and share their latest research findings in multimedia systems. It is a unique event targeting “multimedia systems” from various angles and views across all domains, instead of focusing on a specific aspect or data type. ACM MMSys’17 was held in Taipei, Taiwan, on June 20-23, 2017.

MMSys is a single-track conference which also hosts a series of workshops, namely NOSSDAV, MMVE, and NetGames. Since 2016, it kicks off with overview talks, and in 2017 we saw the following: “Geometric representations of 3D scenes” by Geraldine Morin; “Towards Understanding Truly Immersive Multimedia Experiences” by Niall Murray; “Rate Control In The Age Of Vision” by Ketan Mayer-Patel; “Humans, computers, delays and the joys of interaction” by Ragnhild Eg; “Context-aware, perception-guided workload characterization and resource scheduling on mobile phones for interactive applications” by Chung-Ta King and Chun-Han Lin.

Additionally, industry talks have been introduced: “Virtual Reality – The New Era of Future World” by WeiGing Ngang; “The innovation and challenge of Interactive streaming technology” by Wesley Kuo; “What challenges are we facing after Netflix revolutionized TV watching?” by Shuen-Huei Guan; “The overview of app streaming technology” by Sam Ding; “Semantic Awareness in 360 Streaming” by Shannon Chen; “On the frontiers of Video SaaS” by Sega Cheng.

An interesting set of keynotes presented different aspects related to multimedia systems and its co-located workshops:

  • Henry Fuchs, The AR/VR Renaissance: opportunities, pitfalls, and remaining problems
  • Julien Lai, Towards Large-scale Deployment of Intelligent Video Analytics Systems
  • Dah Ming Chiu, Smart Streaming of Panoramic Video
  • Bo Li, When Computation Meets Communication: The Case for Scheduling Resources in the Cloud
  • Polly Huang, Measuring Subjective QoE for Interactive System Design in the Mobile Era – Lessons Learned Studying Skype Calls

The program included a diverse set of topics such as immersive experiences in AR and VR, network optimization and delivery, multisensory experiences, processing, rendering, interaction, cloud-based multimedia, IoT connectivity, infrastructure, media streaming, and security. A vital aspect of MMSys is its dedicated sessions for showcasing the latest developments in the area of multimedia systems and presenting datasets, which is important for enabling reproducibility and sustainability in multimedia systems research.

The social events were a perfect venue for networking and in-depth discussions on how to advance the state of the art. A welcome reception was held at “LE BLE D’OR (Miramar)”, the conference banquet at the Taipei World Trade Center Club, and finally a tour to the Shilin Night Market was organized.

ACM MMSys 2017 presented the following awards:

  • The Best Paper Award  goes to “A Scalable and Privacy-Aware IoT Service for Live Video Analytics” by Junjue Wang (Carnegie Mellon University), Brandon Amos (Carnegie Mellon University), Anupam Das (Carnegie Mellon University), Padmanabhan Pillai (Intel Labs), Norman Sadeh (Carnegie Mellon University), and Mahadev Satyanarayanan (Carnegie Mellon University).
  • The Best Student Paper Award goes to “A Measurement Study of Oculus 360 Degree Video Streaming” by Chao Zhou (SUNY Binghamton), Zhenhua Li (Tsinghua University), and Yao Liu (SUNY Binghamton).
  • The NOSSDAV’17 Best Paper Award goes to “A Comparative Case Study of HTTP Adaptive Streaming Algorithms in Mobile Networks” by Theodoros Karagkioules (Huawei Technologies France/Telecom ParisTech), Cyril Concolato (Telecom ParisTech), Dimitrios Tsilimantos (Huawei Technologies France), Stefan Valentin (Huawei Technologies France).

Excellence in DASH award sponsored by the DASH-IF 

  • 1st place: “SAP: Stall-Aware Pacing for Improved DASH Video Experience in Cellular Networks” by Ahmed Zahran (University College Cork), Jason J. Quinlan (University College Cork), K. K. Ramakrishnan (University of California, Riverside), and Cormac J. Sreenan (University College Cork)
  • 2nd place: “Improving Video Quality in Crowded Networks Using a DANE” by Jan Willem Kleinrouweler, Britta Meixner and Pablo Cesar (Centrum Wiskunde & Informatica)
  • 3rd place: “Towards Bandwidth Efficient Adaptive Streaming of Omnidirectional Video over HTTP” by Mario Graf (Bitmovin Inc.), Christian Timmerer (Alpen-Adria-Universität Klagenfurt / Bitmovin Inc.), and Christopher Mueller (Bitmovin Inc.)

Finally, student travel grant awards were sponsored by SIGMM. All details, including pictures, can be found here.


ACM MMSys 2018 will be held in Amsterdam, The Netherlands, June 12 – 15, 2018 and includes the following tracks:

  • Research track: Submission deadline on November 30, 2017
  • Demo track: Submission deadline on February 25, 2018
  • Open Dataset & Software Track: Submission deadline on February 25, 2018

MMSys’18 co-locates the following workshops (with submission deadline on March 1, 2018):

  • MMVE2018: 10th International Workshop on Immersive Mixed and Virtual Environment Systems,
  • NetGames2018: 16th Annual Workshop on Network and Systems Support for Games,
  • NOSSDAV2018: 28th ACM SIGMM Workshop on Network and Operating Systems Support for Digital Audio and Video,
  • PV2018: 23rd Packet Video Workshop

MMSys’18 includes the following special sessions (submission deadline on December 15, 2017):

JPEG Column: 76th JPEG Meeting in Turin, Italy

The 76th JPEG meeting was held at Politecnico di Torino, Turin, Italy, from 15 to 21 July. The current standardisation activities have been complemented by the 25th anniversary of the first JPEG standard. Simultaneously, JPEG pursues the development of different standardised solutions to meet the current challenges in imaging technology, namely emerging new applications and low complexity image coding. The 76th JPEG meeting featured mainly the following highlights:

  • JPEG 25th anniversary of the first JPEG standard
  • High Throughput JPEG 2000
  • JPEG Pleno
  • JPEG XL
  • JPEG XS
  • JPEG Reference Software

In the following an overview of the main JPEG activities at the 76th meeting is given.

JPEG 25th anniversary of the first JPEG standard – JPEG is proud to celebrate the 25th anniversary of its first standard. This very successful standard won an Emmy award in 1995-96 and its usage is still rising, reaching in 2015 the impressive daily rate of over 3 billion images exchanged in just a few social networks. During the celebration, a number of early members of the committee were awarded for their contributions to this standard, namely Alain Léger, Birger Niss, Jorgen Vaaben and István Sebestyén. Richard Clark was also awarded during the same ceremony for his long lasting contribution as JPEG Webmaster and his contributions to many JPEG standards. The celebration will continue at the next 77th JPEG meeting, which will be held in Macau, China from 21 to 27 October 2017.


High Throughput JPEG 2000 – The JPEG committee is continuing its work towards the creation of a new Part 15 to the JPEG 2000 suite of standards, known as High Throughput JPEG 2000 (HTJ2K). In a significant milestone, the JPEG Committee has released a Call for Proposals that invites technical contributions to the HTJ2K activity. The deadline for an expression of interest is 1 October 2017, as detailed in the Call for Proposals, which is publicly available on the JPEG website at https://jpeg.org/jpeg2000/htj2k.html.

The objective of the HTJ2K activity is to identify and standardize an alternate block coding algorithm that can be used as a drop-in replacement for the block coding defined in JPEG 2000 Part-1. Based on existing evidence, it is believed that significant increases in encoding and decoding throughput are possible on modern software platforms, subject to small sacrifices in coding efficiency. An important focus of this activity is interoperability with existing systems and content libraries. To ensure this, the alternate block coding algorithm supports mathematically lossless transcoding between HTJ2K and JPEG 2000 Part-1 codestreams at the code-block level.

JPEG Pleno – The JPEG committee intends to provide a standard framework to facilitate capture, representation and exchange of omnidirectional, depth-enhanced, point cloud, light field, and holographic imaging modalities. JPEG Pleno aims at defining new tools for improved compression while providing advanced functionalities at the system level. Moreover, it targets to support data and metadata manipulation, editing, random access and interaction, protection of privacy and ownership rights as well as other security mechanisms. At the 76th JPEG meeting in Turin, Italy, responses to the call for proposals for JPEG Pleno light field image coding were evaluated using subjective and objective evaluation metrics, and a Generic JPEG Pleno Light Field Architecture was created. The JPEG committee defined three initial core experiments to be performed before the 77th JPEG meeting in Macau, China. Interested parties are invited to join these core experiments and JPEG Pleno standardization.

JPEG XL – The JPEG Committee is working on a new activity, known as Next generation Image Format, which aims to develop an image compression format that demonstrates higher compression efficiency at equivalent subjective quality of currently available formats and that supports features for both low-end and high-end use cases.  On the low end, the new format addresses image-rich user interfaces and web pages over bandwidth-constrained connections. On the high end, it targets efficient compression for high-quality images, including high bit depth, wide color gamut and high dynamic range imagery. A draft Call for Proposals (CfP) on JPEG XL has been issued for public comment, and is available on the JPEG website.

JPEG XS – This project aims at the standardization of a visually lossless low-latency lightweight compression scheme that can be used as a mezzanine codec for the broadcast industry and Pro-AV markets. Targeted use cases are professional video links, IP transport, Ethernet transport, real-time video storage, video memory buffers, and omnidirectional video capture and rendering. After a Call for Proposals and the assessment of the submitted technologies, a test model for the upcoming JPEG XS standard was created. Several rounds of Core Experiments have allowed further improvement of the Core Coding System, the last one being reviewed during this 76th JPEG meeting in Torino. More core experiments are on their way, including subjective assessments. The JPEG committee therefore invites interested parties – in particular coding experts, codec providers, system integrators and potential users of the foreseen solutions – to contribute to the further specification process. Publication of the International Standard is expected for Q3 2018.

JPEG Reference Software – Together with the celebration of the 25th anniversary of the first JPEG Standard, the committee continued its important activities around the omnipresent JPEG image format; while all newer JPEG standards define reference software guiding users in interpreting and helping them implement a given standard, no such reference exists for the most popular image format of the Internet age. The JPEG committee therefore issued a call for proposals https://jpeg.org/items/20170728_cfp_jpeg_reference_software.html asking interested parties to participate in the submission and selection of valuable and stable implementations of JPEG (formally, Rec. ITU-T T.81 | ISO/IEC 10918-1).

 

Final Quote

“The experience shared by developers of the first JPEG standard during the celebration was an inspiring moment that will guide us in furthering the ongoing development of standards responding to new challenges in imaging applications,” said Prof. Touradj Ebrahimi, the Convener of the JPEG committee.

About JPEG

The Joint Photographic Experts Group (JPEG) is a Working Group of ISO/IEC, the International Organisation for Standardization / International Electrotechnical Commission, (ISO/IEC JTC 1/SC 29/WG 1) and of the International Telecommunication Union (ITU-T SG16), responsible for the popular JBIG, JPEG, JPEG 2000, JPEG XR, JPSearch and more recently, the JPEG XT, JPEG XS, JPEG Systems and JPEG Pleno families of imaging standards.

The JPEG group meets nominally three times a year, in Europe, North America and Asia. The latest, 76th meeting was held on July 15-21, 2017, in Torino, Italy. The next, 77th JPEG meeting will be held on October 23-27, 2017, in Macau, China.

More information about JPEG and its work is available at www.jpeg.org or by contacting Antonio Pinheiro and Frederik Temmermans of the JPEG Communication Subgroup at pr@jpeg.org.

If you would like to stay posted on JPEG activities, please subscribe to the jpeg-news mailing list on https://listserv.uni-stuttgart.de/mailman/listinfo/jpeg-news. Moreover, you can follow the JPEG Twitter account at http://twitter.com/WG1JPEG.

Future JPEG meetings are planned as follows:

  • No. 77, Macau, CN, 23 – 27 October 2017

 

An interview with Prof. Alan Smeaton

Prof. Alan Smeaton in 2017.

A young Alan Smeaton before the start of his career.


Please describe your journey into computing from your youth up to the present. What foundational lessons did you learn from this journey? Why were you initially attracted to multimedia?

I started a University course in Physics and Mathematics and in order to make up my credits I needed to add another subject so I chose Computer Science, which was then a brand new topic in the Science Faculty.  Maybe it was because the class sizes were small so the attention we got was great, or maybe I was drawn to the topic in some other way but I dropped the Physics and took the Computer Science modules instead and I never looked back.  I was fortunate in that my PhD supervisor was Keith van Rijsbergen who is one of the “fathers” of information retrieval and who had developed the probabilistic model of IR. Having him as my supervisor was the first lucky thing to have happened to me in my research. His approach was to let me make mistakes in my research, to go down cul-de-sacs and discover them myself, and as a result I emerged as a more rounded, streetwise researcher and I’ve tried to use the same philosophy with my own students.  

For many years after completing my PhD I was firmly in the information retrieval area. I hosted the ACM SIGIR Conference in Dublin in the mid 1990s and was Program Co-Chair in 2003, and workshops, tutorials, etc. chair in other years. My second lucky break in my research career happened in 1991 when Donna Harman of NIST asked me if I’d like to join the program committee of a new initiative she was forming called TREC, which was going to look at information retrieval on test collections of documents and queries but in a collaborative, shared framework.  I jumped at the opportunity and got really involved in TREC in those early years through the 1990s. In 2001 Donna asked me if I’d chair a new TREC track that she wanted to see happen, doing content analysis and search on digital video which was then emerging and in which our lab was establishing a reputation for novel research.  Two years later that TREC activity had grown so big it was spawned off as a separate activity and TRECVid was born, starting formally in 2003 and continuing each year since then. That’s my third lucky break.

Sometime in the early 2000s I went to my first ACM MULTIMEDIA conference because of my leading of TRECVid, and I loved it. The topics, the openness, the collaborations, the workshops, the intersection of disciplines all appealed to me and I don’t think I’ve missed an ACM MULTIMEDIA Conference since then.

Talking about ACM MULTIMEDIA, this year some criticism emerged because there was no female keynote speaker. What do you think about this, and how do you see the role of women in research and especially in the field of multimedia?

The first I heard of this was when I saw it on the conference website, and I don’t agree with it. I will be proposing several initiatives to the Executive Committee of SIGMM to improve the gender balance and diversity in our sponsored conferences, covering invited panel speakers, invited keynote speakers, and raising the importance of the women’s lunch event at the ACM MULTIMEDIA conference, starting with this year. I will also propose including a role for a Diversity Chair in some of the SIGMM sponsored events. I’ve learned a lot in a short period of time from colleagues in ACM SIGCHI whom I reached out to for advice, and I’ve looked at the practices and experiences of conferences like ACM CHI, ACM UIST, and others. However, these are just suggestions at the moment and need to be proposed and approved by the SIGMM Executive, so I can’t say much more about them yet, but watch this space.

Tell us more about your vision and objectives behind your current roles? What do you hope to accomplish and how will you bring this about?

I hold a variety of roles in my professional work. As a Professor and teacher I am responsible for delivering courses to first-year, first-semester undergraduates, which I love doing because these are the fresh-faced students just arriving at University. I also teach at advanced Masters level and that’s something else I love, albeit with different challenges. As a Board member of the Irish Research Council I help oversee the policies and procedures for the Council’s funding of about 1,400 researchers from all disciplines in Ireland. I’m also on the inaugural Scientific Committee of COST, the EU funding agency which funds networking of researchers across more than 30 EU countries and further afield. Each year COST funds networking activities for over 40,000 researchers across all disciplines, which is a phenomenal number, and my role on the Scientific Committee is to oversee the policies and procedures and help select those areas (called Actions) that get funded.

Apart from my own research team and working with them as part of the Insight Centre for Data Analytics, and the work I do each year in TRECVid, the other major responsibility I have is as Chair of ACM SIGMM, a role I took up in July 2017, just 2 months ago. While I had a vision of what I believed should happen in SIGMM, and I wrote some of this in my candidature statement (which can be found at the bottom of this interview), since assuming the role and realising what SIGMM is like “from the inside” I am seeing that vision and those objectives evolve as I learn more. Certainly there are some fundamentals, like welcoming and supporting early career researchers, broadening our reach to new communities both geographically and in terms of research topics, ensuring our conferences maintain their very high standards, and being open to new initiatives and ideas; these fundamentals will remain important. We expect to announce a new annual conference in multimedia for Asia shortly and that will be added to the 4 existing annual events we run. In addition I am realising that we need to increase our diversity, gender being one obvious instance of that, but there are others. Finally, I think we need to constantly monitor what our identity is as a community of researchers linked by the bond of working in Multimedia. As the area of Multimedia itself evolves, we have to lead and drive that evolution, and change with it.

I know that may seem like a lot of aspiration without much detail but, as I said earlier, that’s because I’m only in the role a couple of months and the details need to be worked out and agreed with the SIGMM Executive Committee, not just by me alone, and that will happen over the next few months.

Prof. Alan Smeaton in 2017.

That multimedia evolves is an interesting statement. I have often heard people discussing the definition of multimedia research, and the definitions given are quite diverse. What is your “current” definition of multimedia research?

The development of every technology has a similar pathway. Multimedia is not a single technology but a constellation of technologies, yet it follows the same kind of pathway. It starts from a blue skies idea that somebody has, like let’s put images and sound on computers, and then it becomes theoretical research, perhaps involving modelling in some way. That then turns into basic research about the feasibility of the idea and gradually the research gets more and more practical. Somewhere along the way, not necessarily from the outset, applications of the technology are taken into consideration and that is important to sustain the research interest and funding. As applications for the technology start to roll out, this triggers a feedback loop with more and more interest directed back towards the theory and the modelling, improving the initial ideas and taking them further, pushing the boundaries of the implementations and making the final applications more compelling, cheaper, faster, with greater reach, more impact, etc. Eventually, the technology may get overtaken by some new blue skies idea leading to some new theories and some new feasibilities and practical applications. Technology for personal transport is one such example, with horse-drawn carriages leading to petrol-driven cars and, as we are witnessing, into other forms of autonomous, electric-powered vehicles.

Research into multimedia is in the mid-life stage of the cycle. We’re in that spiral where new foundational ideas, new theories, new models for those theories, new feasibility studies, new applications, and new impacts are all valid areas to be working in, and so the long answer to your question about my definition of multimedia research is that it is all of the above.

At the conference people often talk about their experience of having their research criticised for being too applied, which seems to be a general problem in multimedia given how many mention it. Based on your experience in national and international funding panels, it would be interesting to hear your opinion about this issue and how researchers in the multimedia community could tackle it.

I’ve been there too, so I understand what they are talking about. Within our field of multimedia we cover a broad church of research topics, application areas, theories and techniques, and dismissing a piece of work as too applied is not an appropriate reason for it to go unappreciated.

“Too applied” should not be confused with research impact as research impact is something completely different.  Research impact refers to when our research contributes or generates some benefit outside of academic or research circles and starts to influence the economy or society or culture. That’s something we should all aspire to as members of our society and when it happens it is great. Yet not all research ideas will develop into technologies or implementations that have impact.  Funding agencies right across the world now like to include impact as part of their evaluation and assessment and researchers are now expected to include impact assessment as part of funding proposals.

I do have concerns that for really blue skies research the eventual impact cannot really be estimated. This is what we call high risk / high return, and while some funding agencies like the European Research Council actively promote such high-risk exploratory work, other agencies tend to go for the safer bet. Happily, we’re seeing more and more of the blue skies funding, like the Australian Research Council’s and the Irish Research Council’s Laureate schemes.

Can you profile your current research, its challenges, opportunities, and implications?

This is a difficult question for me to answer since the single most dominant characteristic of my research is that it is hugely varied and based on a large number of collaborations with researchers in diverse areas. I am not a solo researcher and while I respect and admire those who are, I am at the opposite end of that spectrum. I work with people.

Today, for example, as I write this interview, has been a busy day for me in terms of research. I’ve done a bit of writing on a grant proposal I’m working on which proposes using data from wearable electromyography, coupled with other sensors, to determine the quality of a surgical procedure. I’ve reviewed a report from a project I’m part of which uses low-grade virtual reality in a care home for people with dementia. I’ve looked at some of the sample data we’ve just got where we’re applying our people-counting work to drone footage of crowds. I wrote a section of a paper describing our work on human-in-the-loop evaluation of video captioning, I met a Masters student who is doing work on propensity modelling for a large bank, and now at the end of the day I’m finishing this interview. That’s an atypical day for me, but the range of topics is not unusual.

What are the challenges and opportunities in this … well, it is never difficult to get motivated because the variety of work makes it so interesting, so the challenge is in managing them so that they each get a decent slice of time and effort. Prioritisation of work tasks is a life skill best learned the hard way; it is something we can’t teach, and while it comes naturally to some people, for most of us it is something we need to be conscious of. So if I have a takeaway message for the young researcher it is this … always try to make your work interesting and to explore interesting things, because then it is not a chore, it becomes a joy.

This was a very inspiring answer, and I think it described perfectly how diverse and interesting multimedia research is. Thinking about the projects you describe, it seems that all of them address societally important challenges (health care, security, etc.). How important do you think it is to address problems that are helpful for society, and do you think more researchers in the field of multimedia should follow this path?

I didn’t deliberately set out to address societal challenges in my work and I don’t advocate that everyone should do so in all their work. The samples of my work I mentioned earlier just happen to be like that but sometimes it is worth doing something just because it is interesting even though it may end up as a cul-de-sac. We can learn so much from going down such cul-de-sacs both for ourselves as researchers, for our own development, as well as contributing to knowledge that something does not work.

In your whole interview so far you did not mention A.I. or deep learning. Could you please share your view on this hot topic and its influence on the multimedia community (if possible positive and negative aspects)?

Imagine, a whole conversation on multimedia without mentioning deep learning, so far! Yes indeed it is a hot topic and there’s a mad scramble to use and try it for all kinds of applications because it is showing such improvement in many tasks, and yes indeed it has raised the bar in terms of the quality of some tasks in multimedia, like concept indexing from visual media. However, those of us around long enough will remember the “AI Winter” from a few decades ago, and we can’t let this great breakthrough raise the expectations that we and others may have about what we can do with multi-modal and multimedia information.

So that’s the word of caution about expectations, but when this all settles down a bit and we analyse the “why” behind the success of deep learning, we will realise that the breakthrough is a result of closer modelling of our own neural processes. Early implementations of our own neural processing were in the form of multi-connected networks, and things like the Connection Machine were effectively unstructured networks. What deep learning is doing is applying structure to the network by adding layers. Going forward, I believe we will turn more and more to neuroscience to inform us about other, more sophisticated network structures besides layers, which reflect how the brain works, and, just as today’s layered neural networks replicate one element, we will use other neural structures for even more sophisticated (and deeper) learning.

ACM candidature statement:

I am honored to run for the position of Chair of SIGMM. I have been an active member of ACM since I hosted the SIGIR conference in Dublin in 1994 and have served in various roles for SIGMM events since the early 2000s.

I see two ways in which we can maintain and grow SIGMM’s relevance and importance. The first is to grow collaborations we have with other areas. Multimedia technologies are now a foundation stone in many application areas, from digital humanities to educational technologies, from gaming to healthcare. If elected chair I will seek to reach out to other areas collaboratively, whereby their multimedia problems become our challenges, and developments in our area become their solutions.

My second priority will be to support a deepening of collaborations within our field. Already we have shown leadership in collaborative research with our Grand Challenges, Videolympics, and the huge leverage we get from shared datasets, but I believe this could be even better.
By reaching out to others and deepening collaborations, we will improve SIGMM’s ability to attract and support new members while keeping existing members energised and rejuvenated, ensuring SIGMM is the leading special interest group on multimedia.

Multidisciplinary Column: An Interview with Suranga Nanayakkara


Could you tell us a bit about your background, and what the road to your current position was?

I was born and raised in Sri Lanka, and with my mother being an electrical engineer by profession, it always fascinated me to watch her tinkering with the TV, the radio and other such things. At the age of 19, I moved to Singapore to pursue my Bachelor’s degree in electronics and computer engineering at the National University of Singapore (NUS). I then wanted to go into a field of research that would help me apply my skills to creating meaningful solutions. As such, for my PhD I started exploring ways of providing the most satisfying musical experience to profoundly deaf children.

That gave me the inspiration to design something that provides a full-body haptic sense. We researched various structures and materials, and did lots of user studies. The final design, which we call the Haptic Chair, was a wooden chair with contact speakers embedded in it. When you play music through this chair, the whole chair vibrates and a person sitting on it gets a full-body vibration in tune with the music being played.

I was lucky to form a collaboration with one of the deaf schools in Sri Lanka, Savan Sahana Sewa, a college in Rawatawatte, Moratuwa. They gave me the opportunity to install the Haptic Chair on site, where there were about 90 hearing-impaired kids. I conducted user studies over a year and a half with these hearing-impaired kids, trying to figure out if this was really providing a satisfying musical experience. The Haptic Chair has been in use for more than 8 years now and has provided a platform for deaf students and their hearing teachers to connect and communicate via vibrations generated from sound.

After my PhD, I met Professor Pattie Maes, who directs the Fluid Interfaces Group at the MIT Media Lab. After talking to her about my research and future plans, she offered me a postdoctoral position in her group. The 1.5 years at the MIT Media Lab were a game changer in my research career, during which I came to place the emphasis on “enabling” rather than “fixing”. The technologies that I developed there, for example FingerReader, demonstrated this idea and have a potentially much broader range of applications.

At this time, the Singapore government was setting up a new public university, the Singapore University of Technology and Design (SUTD), in collaboration with MIT. I then moved to SUTD, where I work as an Assistant Professor and direct the Augmented Human Lab (www.ahlab.org).

Your general agenda is towards humanizing technology. Can you tell us a bit about this mission and how it impacts your research?

When I started my bachelor’s degree at the National University of Singapore in 2001, I spoke no English and had not used a computer. My own “disability” in interacting with computers gave me a chance to realize that there’s a lot of opportunity to create an impact with assistive human-computer interfaces.

This inspired me to establish the Augmented Human Lab with a broader vision of creating interfaces that enable people, connecting different user communities through technology and empowering them to go beyond what they think they could do. Our work has use cases for everyone, regardless of where you stand on the continuum of sensorial ability and disability.

In a short period of 6 years, our work has resulted in over SGD 11 million in research funding, more than 60 publications, 12 patents, more than 20 live demonstrations and, most importantly, real-world deployments of my work that have created a social impact.

How does multidisciplinary work play a role in your research?

My research focuses on the design and development of new sensory-substitution systems, user interfaces and interactions to enhance the sensorial and cognitive capabilities of humans. This really is multidisciplinary in nature, including the development of new hardware technologies and software algorithms, understanding users and practical behavioral issues, and understanding the real-life contexts in which technologies function.

Can you tell us about your work on interactive installations, e.g. for Singapore’s 50th birthday? What are lessons learnt from working across disciplines?

I’ve always enjoyed working with people from different domains. Together with an interdisciplinary team, we designed an interactive light installation, iSwarm (http://ahlab.org/project/iswarm), for iLight Marina Bay, a light festival in Singapore. iSwarm consisted of 1600 addressable LEDs submerged in a bay area near the Singapore city center. iSwarm reacted to the presence of visitors with a modulation of its pattern and color. This made a significant impact, as more than 685,000 visitors came to see it (http://www.ura.gov.sg/uol/media-room/news/2014/apr/pr14-27.aspx). Subsequently, the curators of the Wellington LUX festival invited us to feature a version of iSwarm (nZwarm) for their 2014 festival. We were also invited to create an interactive installation, SonicSG (http://ahlab.org/project/sonicsg), for Singapore’s 50th anniversary. SonicSG aimed at fostering a holistic understanding of the ways in which technology is changing our thinking about design in high-density contexts such as Singapore, and how its creative use can reflect a sense of place. The project was a large-scale interactive light installation of 1,800 floating LED lights in the Singapore River, arranged in the shape of the island nation.

Could you name a grand research challenge in your current field of work?

One issue is the idea of ‘universal design’, which sometimes amounts to creating mainstream technology and adding a little ‘patch’ so it can be labelled universal. Take the voiceover feature, for example – it is better than nothing, but not really the ideal solution. This is why, despite these efforts and the great variety of wearable assistive devices available, user acceptance is still quite low. For example, the blind community is still very much dependent on the low-tech white cane.

The grand challenge really is to develop assistive interfaces that feel like a natural extension of the body (i.e. are seamless to use), are socially acceptable, work reliably in the complex, messy world of real situations, and support independent and portable interaction.

When would you consider yourself successful in reaching your overall mission of humanizing technology?

We want to be able to create assistive devices that set the de facto standard for the people we work with – especially the blind and deaf communities. We would like to be known as a team who “provide a ray of light to the blind and a rhythm to the lives of the deaf”.

How and in what form do you feel we as academics can be most impactful?

For me it is very important to be able to understand where our academic work can be not just exciting or novel, but have a meaningful impact on the way people live.  The connection we have with the communities in which we live and with whom we work is a quality that will ensure our research will always have real relevance.

 

Editor Biographies

Dr. Cynthia C. S. Liem is an Assistant Professor in the Multimedia Computing Group of Delft University of Technology, The Netherlands, and pianist of the Magma Duo. She initiated and co-coordinated the European research project PHENICX (2013-2016), focusing on technological enrichment of symphonic concert recordings with partners such as the Royal Concertgebouw Orchestra. Her research interests consider music and multimedia search and recommendation, and increasingly shift towards making people discover new interests and content which would not trivially be retrieved. Beyond her academic activities, Cynthia gained industrial experience at Bell Labs Netherlands, Philips Research and Google. She was a recipient of the Lucent Global Science and Google Anita Borg Europe Memorial scholarships, the Google European Doctoral Fellowship 2010 in Multimedia, and a finalist of the New Scientist Science Talent Award 2016 for young scientists committed to public outreach.

 

 

 

Dr. Jochen Huber is a Senior User Experience Researcher at Synaptics. Previously, he was an SUTD-MIT postdoctoral fellow in the Fluid Interfaces Group at MIT Media Lab and the Augmented Human Lab at Singapore University of Technology and Design. He holds a Ph.D. in Computer Science and degrees in both Mathematics (Dipl.-Math.) and Computer Science (Dipl.-Inform.), all from Technische Universität Darmstadt, Germany. Jochen’s work is situated at the intersection of Human-Computer Interaction and Human Augmentation. He designs, implements and studies novel input technology in the areas of mobile, tangible & non-visual interaction, automotive UX and assistive augmentation. He has co-authored over 60 academic publications and regularly serves as program committee member in premier HCI and multimedia conferences. He was program co-chair of ACM TVX 2016 and Augmented Human 2015 and chaired tracks of ACM Multimedia, ACM Creativity and Cognition and the ACM International Conference on Interactive Surfaces and Spaces, as well as numerous workshops at ACM CHI and IUI. Further information can be found on his personal homepage: http://jochenhuber.com

Report from ACM ICMR 2017

ACM ICMR 2017 in “Little Paris”

ACM ICMR is the premier International Conference on Multimedia Retrieval, and since 2011 it “illuminates the state of the arts in multimedia retrieval”. This year, ICMR was held in a wonderful location: Bucharest, Romania, also known as “Little Paris”. Every year at ICMR I learn something new. And here is what I learnt this year.

Final Conference Shot at UP Bucharest


UNDERSTANDING THE TANGIBLE: objects, scenes, semantic categories – everything we can see.

1) Objects (and YODA) can be easily tracked in videos.

Arnold Smeulders delivered a brilliant keynote on “things” retrieval: given an object in an image, can we find (and retrieve) it in other images, videos, and beyond? He presented a very interesting technique for tracking objects (e.g. Yoda) in videos based on similarity learnt through siamese networks; a minimal sketch of the idea is given after the figure below.

Tracking Yoda with Siamese Networks
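This is not the implementation shown in the keynote, but a minimal, illustrative PyTorch sketch of the underlying idea: one shared (“siamese”) embedding network encodes both a template crop of the target and candidate crops from the next frame, and the candidate with the highest cosine similarity gives the new object location. Layer sizes and crop dimensions are arbitrary placeholders.

```python
# Minimal sketch of similarity-based tracking with a siamese embedding network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Small conv net mapping an image crop to a unit-length embedding."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, dim)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return F.normalize(self.fc(h), dim=1)

def track_step(net, template, candidates):
    """Return the index of the candidate crop most similar to the template.

    template:   (1, 3, H, W) crop of the tracked object (e.g. Yoda)
    candidates: (N, 3, H, W) crops sampled from the next frame
    """
    with torch.no_grad():
        t = net(template)                # (1, dim)
        c = net(candidates)              # (N, dim)
        scores = (c @ t.t()).squeeze(1)  # cosine similarity per candidate
    return scores.argmax().item(), scores

# The same network (shared weights) embeds both branches -- that is what makes
# it "siamese". The weights are untrained here; training would use pairs of
# same/different objects.
net = EmbeddingNet()
best, scores = track_step(net, torch.rand(1, 3, 64, 64), torch.rand(16, 3, 64, 64))
```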

2) Wearables + computer vision help explore cultural heritage sites.

As shown in his keynote, at MICC, University of Florence, Alberto del Bimbo and his amazing team have designed smart audio guides for indoor and outdoor spaces. The system detects, recognises, and describes landmarks and artworks from wearable camera inputs (and GPS coordinates, in the case of outdoor spaces).

3) We can finally quantify how much images provide complementary semantics compared to text [BEST MULTIMODAL PAPER AWARD].

For ages, the community has asked how relevant different modalities are for multimedia analysis: this paper (http://dl.acm.org/citation.cfm?id=3078991) finally proposes a solution to quantify information gaps between different modalities.

4) Exploring news corpora is now very easy: news graphs are easy to navigate and aware of the types of relations between articles.

Remi Bois and his colleagues presented this framework (http://dl.acm.org/citation.cfm?id=3079023), made for professional journalists and the general public, for seamlessly browsing through a large-scale news corpus. They built a graph whose nodes are the articles in the corpus. The items most relevant to each article are chosen (and linked) based on an adaptive nearest-neighbour technique. Each link is then characterised according to the type of relation between the two linked nodes.
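The paper’s actual pipeline is richer (each link is also typed), but a rough sketch of the graph-building step, under simplifying assumptions (TF-IDF article vectors, a fixed-k neighbour search, and a plain distance cutoff standing in for the adaptive nearest-neighbour criterion), could look like this:

```python
# Sketch of a navigable news graph: nodes are articles, edges connect each
# article to its closest neighbours. TF-IDF + fixed k + distance cutoff are
# simplifications, not the paper's adaptive method; link typing is omitted.
import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import NearestNeighbors

def build_news_graph(articles, k=5, max_dist=0.8):
    vectors = TfidfVectorizer(stop_words="english").fit_transform(articles)
    knn = NearestNeighbors(n_neighbors=k + 1, metric="cosine").fit(vectors)
    dist, idx = knn.kneighbors(vectors)

    graph = nx.Graph()
    graph.add_nodes_from(range(len(articles)))
    for i, (drow, irow) in enumerate(zip(dist, idx)):
        for d, j in zip(drow[1:], irow[1:]):   # skip the self-match
            if d <= max_dist:                  # keep only sufficiently close articles
                graph.add_edge(i, int(j), distance=float(d))
    return graph

corpus = ["Election results announced in the capital",
          "Markets react to the election results",
          "New stadium opens downtown"]
g = build_news_graph(corpus, k=2)
```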

5) Panorama outdoor images are much easier to localise.

In his beautiful work (https://t.co/3PHCZIrA4N), Ahmet Iscen from Inria developed an algorithm for location prediction from StreetView images, outperforming the state of the art thanks to an intelligent stitching pre-processing step: predicting locations from panoramas (stitched individual views) instead of individual street images improves performance dramatically!
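The retrieval model itself is beyond a short snippet, but the pre-processing idea can be sketched as follows, assuming OpenCV’s high-level stitcher and a hypothetical predict_location() standing in for the actual geolocation step:

```python
# Sketch of the pre-processing idea only: stitch individual street-level views
# into one panorama before running location prediction on the panorama.
import cv2

def stitch_views(image_paths):
    images = [cv2.imread(p) for p in image_paths]
    stitcher = cv2.Stitcher_create()
    status, panorama = stitcher.stitch(images)
    if status != cv2.Stitcher_OK:
        raise RuntimeError(f"stitching failed with status {status}")
    return panorama

# panorama = stitch_views(["view_0.jpg", "view_1.jpg", "view_2.jpg"])
# location = predict_location(panorama)   # hypothetical geolocation model
```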

UNDERSTANDING THE INTANGIBLE: artistic aspects, beauty, intent: everything we can perceive

1) Image search intent can be predicted by the way we look.

In his best paper candidate research work (http://dl.acm.org/citation.cfm?id=3078995), Mohammad Soleymani showed that image search intent (seeking information, finding content, or re-finding content) can be predicted from physiological responses (eye gaze) and implicit user interaction (mouse movements).

2) Real-time detection of fake tweets is now possible using user and textual cues.

Another best paper candidate (http://dl.acm.org/citation.cfm?id=3078979), this time from CERTH. The team collected a large dataset of fake/real sample tweets spanning 17 events and built an effective model for misleading content detection from tweet content and user characteristics. A live demo is available here: http://reveal-mklab.iti.gr/reveal/fake/.
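This is not the CERTH system, but a minimal scikit-learn sketch of the general recipe the paper follows: textual cues and user-account cues feed a single classifier. The feature choices and the tiny toy dataset are invented purely for illustration.

```python
# Toy sketch: combine tweet text (TF-IDF) with simple user features in one
# pipeline for fake/real classification. Not the features of the actual paper.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

data = pd.DataFrame({
    "text": ["BREAKING: shark swimming on the flooded highway!!",
             "Traffic diverted on 5th Ave due to flooding, city council says"],
    "followers": [12, 15300],
    "account_age_days": [3, 2100],
    "label": [1, 0],   # 1 = fake, 0 = real
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "text"),                          # textual cues
    ("user", "passthrough", ["followers", "account_age_days"]),   # user cues
])
model = Pipeline([("features", features),
                  ("clf", LogisticRegression(max_iter=1000))])

X = data[["text", "followers", "account_age_days"]]
model.fit(X, data["label"])
print(model.predict(X))
```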

3) Music tracks have different functions in our daily lives.

Researchers from TU Delft have developed an algorithm (http://dl.acm.org/citation.cfm?id=3078997) which classifies music tracks according to their purpose in our daily activities: relax, study and workout.

4) By transferring image style we can make images more memorable!

The team at the University of Trento built an automatic framework (https://arxiv.org/abs/1704.01745) to improve image memorability. A selector finds the style seeds (i.e. abstract paintings) that are likely to increase the memorability of a given image, and after style transfer, the image becomes more memorable!
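As a toy sketch of the selector idea only: rank candidate style seeds by the predicted memorability gain of the stylised result and keep the best one. Both predict_memorability() and style_transfer() are hypothetical placeholders, not functions from the paper.

```python
# Hypothetical selector: try each style seed, keep the stylisation that is
# predicted to gain the most memorability over the original image.
def pick_most_memorable(image, style_seeds, predict_memorability, style_transfer):
    baseline = predict_memorability(image)
    scored = []
    for seed in style_seeds:
        stylised = style_transfer(content=image, style=seed)
        scored.append((predict_memorability(stylised) - baseline, stylised))
    gain, best = max(scored, key=lambda item: item[0])
    return best if gain > 0 else image   # keep the original if nothing helps
```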

5) Neural networks can help retrieve and discover child book illustrations.

In this amazing work (https://arxiv.org/pdf/1704.03057.pdf), motivated by real children’s experiences, Pinar and her team from Hacettepe University collected a large dataset of children’s book illustrations and found that neural networks can predict and transfer style, making it possible to give many other illustrations a “Winnie the Witch”-like look.

Winnie the Witch

6) Locals perceive their neighborhood as less interesting, more dangerous and dirtier compared to non-locals. 

In this wonderful work (http://www.idiap.ch/~gatica/publications/SantaniRuizGatica-icmr17.pdf), presented by Darshan Santani from IDIAP, researchers asked locals and crowd-workers to look at pictures from various neighborhoods in Guanajuato.

THE FUTURE: What’s Next?

1) We will be able to anonymize images of outdoor spaces thanks to Instagram filters, as proposed by this (http://dl.acm.org/citation.cfm?id=3080543) work in the Brave New Idea session.

When an image of an outdoor space is manipulated with appropriate Instagram filters, the location of the image can be masked from vision-based geolocation classifiers.

2) Soon we will be able to embed watermarks in our Deep Neural Network models in order to protect our intellectual property [BEST PAPER AWARD].

This is a disruptive, novel idea, and that is why this work from KDDI Research and Japan National Institute of Informatics won the best paper award. Congratulations!

3) Given an image view of an object, we will predict the other side of things (from Smeulders’ keynote). In the pic: predicting the other side of chairs. Beautiful.

Predicting the other side of things

THANKS: To the organisers, to the volunteers, and to all the authors for their beautiful work :)

MPEG Column: 119th MPEG Meeting in Turin, Italy

The original blog post can be found at the Bitmovin Techblog and has been updated here to focus on and highlight research aspects.

The MPEG press release comprises the following topics:

  • Evidence of New Developments in Video Compression Coding
  • Call for Evidence on Transcoding for Network Distributed Video Coding
  • 2nd Edition of Storage of Sample Variants reaches Committee Draft
  • New Technical Report on Signalling, Backward Compatibility and Display Adaptation for HDR/WCG Video Coding
  • Draft Requirements for Hybrid Natural/Synthetic Scene Data Container

Evidence of New Developments in Video Compression Coding

At the 119th MPEG meeting, responses to the previously issued call for evidence have been evaluated and they have all successfully demonstrated evidence. The call requested responses for use cases of video coding technology in three categories:

  • standard dynamic range (SDR) — two responses;
  • high dynamic range (HDR) — two responses; and
  • 360° omnidirectional video — four responses.

The evaluation of the responses included subjective testing and an assessment of the performance of the “Joint Exploration Model” (JEM). The results indicate significant gains over HEVC for a considerable number of test cases, with comparable subjective quality at 40-50% less bit rate than HEVC for the SDR and HDR test cases, and with some positive outliers (i.e., even higher bit-rate savings). Thus, the MPEG-VCEG Joint Video Exploration Team (JVET) concluded that evidence exists of compression technology that may significantly outperform HEVC after further development to establish a new standard. As a next step, the plan is to issue a call for proposals at the 120th MPEG meeting (October 2017), with responses expected to be evaluated at the 122nd MPEG meeting (April 2018).
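To put the reported range in perspective: a sequence that HEVC encodes at, say, 10 Mbit/s for a given subjective quality would, according to these results, need only roughly 5 to 6 Mbit/s with the exploration model.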

We already see an increase in research articles addressing video coding technologies with capabilities beyond HEVC, and this will grow further in the future. The main driving force is over-the-top (OTT) delivery, which calls for more efficient bandwidth utilization. However, competition is also increasing with the emergence of AOMedia’s AV1, and we may also observe an increasing number of articles in that direction, including evaluations thereof. Interestingly, the number of use cases is also increasing (e.g., see the different categories above), which adds further challenges to the “complex video problem”.

Call for Evidence on Transcoding for Network Distributed Video Coding

The call for evidence on transcoding for network distributed video coding targets interested parties possessing technology that transcodes video at lower computational complexity than a full re-encode. The primary application is adaptive bitrate streaming, where the highest-bitrate stream is transcoded into lower-bitrate streams. It is expected that responses may use “side streams” (or side information, some may call it metadata) accompanying the highest-bitrate stream to assist in the transcoding process. MPEG expects submissions for the 120th MPEG meeting, where compression efficiency and computational complexity will be assessed.
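For reference, the conventional full re-encode that such proposals would need to beat can be sketched as follows; this is a minimal illustration driving ffmpeg from Python, and the bitrate ladder is an arbitrary example, not taken from the call itself.

```python
# Baseline ABR workflow the call aims to improve upon: fully re-encode the
# highest-bitrate stream into each lower-bitrate rendition. A smarter
# transcoder would instead reuse side information from the source encode.
import subprocess

LADDER = [           # (height, video bitrate) -- illustrative values only
    (1080, "6M"),
    (720, "3M"),
    (480, "1.5M"),
]

def full_reencode(src, out_prefix):
    for height, bitrate in LADDER:
        out = f"{out_prefix}_{height}p.mp4"
        subprocess.run([
            "ffmpeg", "-y", "-i", src,
            "-vf", f"scale=-2:{height}",     # keep aspect ratio, set height
            "-c:v", "libx264", "-b:v", bitrate,
            "-c:a", "copy",
            out,
        ], check=True)

# full_reencode("master_1080p.mp4", "rendition")
```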

Transcoding has been discussed for a long time already, and I can certainly recommend this article from 2005 published in the Proceedings of the IEEE. The question is: what is different now, 12 years later, and what metadata (or side streams/information) is required for interoperability among different vendors (if any)?

A Brief Overview of Remaining Topics…

  • The 2nd edition of storage of sample variants reaches Committee Draft and expands its usage to the MPEG-2 transport stream, whereas the first edition primarily focused on the ISO base media file format.
  • The new technical report for high dynamic range (HDR) and wide colour gamut (WCG) video coding comprises a survey of various signaling mechanisms including backward compatibility and display adaptation.
  • MPEG issues draft requirements for a scene representation media container enabling the interchange of content for authoring and rendering rich immersive experiences which is currently referred to as hybrid natural/synthetic scene (HNSS) data container.

Other MPEG (Systems) Activities at the 119th Meeting

DASH is fully in maintenance mode as only minor enhancements/corrections have been discussed, including contributions to conformance and reference software. The omnidirectional media format (OMAF) is certainly the hottest topic within MPEG Systems; it is currently between two stages (i.e., between DIS and FDIS) and, thus, a study of the DIS has been approved and national bodies are kindly requested to take this into account when casting their votes (incl. comments). The study of the DIS comprises format definitions with respect to the coding and storage of omnidirectional media, including audio and video (aka 360°). The common media application format (CMAF) was ratified at the last meeting and awaits publication by ISO. In the meantime, CMAF work is focusing on conformance and reference software as well as amendments regarding various media profiles. Finally, requirements for a multi-image application format (MiAF) have been available since the last meeting, and at the 119th MPEG meeting a working draft was approved. MiAF will be based on HEIF and the goal is to define additional constraints to simplify its file format options.

We have successfully demonstrated live 360° adaptive streaming as described here, but we expect various improvements from standards that are available and under development within MPEG. Interesting research aspects in this area include performance gains and evaluations with respect to bandwidth efficiency in open networks, as well as how these standardization efforts could be used to enable new use cases.

Publicly available documents from the 119th MPEG meeting can be found here (scroll down to the end of the page). The next MPEG meeting will be held in Macau, China, October 23-27, 2017. Feel free to contact me for any questions or comments.