Blog

5/1/23  MPEG 142 - Call for proposals for FCVCM and other developments

The 142nd MPEG meeting took place in Antalya, Turkey, from April 24-28. This was the first time MPEG held its quarterly meeting in Turkey. The meeting was well organized, and impressions from experts and visitors were positive.

Feature Coding for Video Coding for Machines

MPEG issued the call for proposals (CfP) for Feature Coding for Video Coding for Machines (FCVCM). As we already mentioned in our blog, Video Coding for Machines (VCM) has been under development for a couple of meeting cycles. The main purpose of VCM is to standardize the encoding of images and videos that are processed (i.e., consumed) by machines. The input to VCM is an image or video that comes straight from a camera or a storage device. In the case of FCVCM, the input to the standard will be a feature stream taken from the output of intermediate layers of a neural network. Instead of encoding visual information such as videos and images, FCVCM will encode features from the “middle” of a neural network, while still producing a bitstream that is ultimately consumed by machines.
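To make the distinction concrete, here is a minimal sketch of where an FCVCM input comes from. It assumes PyTorch and a recent torchvision, and the split point ("layer2" of a ResNet-50) is an arbitrary choice for illustration:

```python
# A minimal sketch of where an FCVCM input comes from: instead of the
# pixel-domain image that VCM encodes, we grab the activation tensor
# produced by an intermediate layer of a vision network.
import torch
import torchvision.models as models

# Untrained weights are fine for illustrating the plumbing.
model = models.resnet50(weights=None)
model.eval()

features = {}

def grab(module, inputs, output):
    # In an FCVCM pipeline this tensor, not the input image,
    # would be quantized and entropy-coded into the bitstream.
    features["split"] = output.detach()

# "layer2" is an arbitrary split point chosen for illustration.
model.layer2.register_forward_hook(grab)

frame = torch.randn(1, 3, 224, 224)  # stand-in for a camera frame
with torch.no_grad():
    model(frame)

print(features["split"].shape)  # torch.Size([1, 512, 28, 28])
```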

From the CfP’s introduction:

In 2019 MPEG started an investigation into the area of video coding for machines. The focus of this exploration was to study the case where images and videos are compressed not to be looked at and evaluated by humans, but rather by machine vision algorithms. These algorithms can serve different purposes such as object detection, instance segmentation, or object tracking. As video compression standards such as HEVC or VVC are developed and optimized towards the human visual system, the existing standards may not be optimal for applications where the video is analyzed by machines. One aspect is the compression of intermediate features seen in a neural network.

Regarding feature compression, a formal call for evidence was issued in July 2022 and provided evidence that this can be achieved in different ways. This call for proposals is the start of a process which has the creation of a new international standard as its goal.

This work on “Feature Compression for Video Coding for Machines” (FCVCM) aims at compressing features for machine tasks. As networks increase in complexity, architectures such as ‘Collaborative Intelligence’ (whereby a network is distributed across an edge device and the cloud) become advantageous. With the rise of newer network architectures being deployed amongst a heterogeneous population of edge devices, such architectures bring flexibility to systems implementers. As a consequence of such architectures, there is a need to efficiently compress intermediate feature information for transport over wide area networks (WANs). As feature information differs substantially from conventional image or video data, coding technologies and solutions could be different from conventional ones in order to achieve optimized performance for machine usage. With the rise of machine learning technologies and machine vision applications, the amount of video and images consumed by machines has been rapidly growing. Typical use cases include intelligent transportation, smart city, intelligent content management, etc., which incorporate machine vision tasks such as object detection, instance segmentation, and object tracking. Due to the large volume of video data, it is essential to extract and compress the features from video for efficient transmission and storage. Feature compression technology solicited in this CfP can also be helpful in some other regards, such as computational offloading and privacy protection. This call focuses on the compression of features and thus responses are expected to produce decoded features that will be used to complete execution of a pre-defined set of machine vision algorithms to generate the performance results.
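As a rough illustration of why compressing features before WAN transport matters, the sketch below quantizes a float32 feature tensor to 8 bits and entropy-codes it with a general-purpose coder (NumPy and zlib). This is a naive baseline for intuition only, not a technology solicited by the CfP:

```python
# Naive feature compression for a collaborative-intelligence split:
# uniform 8-bit quantization followed by general-purpose deflate coding.
import zlib
import numpy as np

def compress_features(feat: np.ndarray) -> tuple[bytes, float, float]:
    """Quantize a float32 feature tensor to 8 bits and deflate it.

    Returns the payload plus the (min, max) range needed to dequantize.
    """
    lo, hi = float(feat.min()), float(feat.max())
    q = np.round((feat - lo) / (hi - lo + 1e-9) * 255).astype(np.uint8)
    return zlib.compress(q.tobytes()), lo, hi

def decompress_features(payload: bytes, lo: float, hi: float,
                        shape: tuple) -> np.ndarray:
    q = np.frombuffer(zlib.decompress(payload), dtype=np.uint8)
    return (q.reshape(shape).astype(np.float32) / 255) * (hi - lo) + lo

feat = np.random.randn(512, 28, 28).astype(np.float32)  # stand-in features
payload, lo, hi = compress_features(feat)
print(f"{feat.nbytes} bytes -> {len(payload)} bytes")
recon = decompress_features(payload, lo, hi, feat.shape)
print("max abs error:", float(np.abs(recon - feat).max()))
```

Even this crude scheme cuts the payload to roughly a quarter of the raw float32 size; the CfP seeks technologies that do far better while preserving machine-task performance.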

This CfP welcomes submissions of proposals from companies and other organizations. Registration is required by the 3rd of July 2023; the submission of bitstream files, results, and decoder packages is required by the 13th of September 2023; and the submission of proponent documentation is due by the 9th of October 2023. Evaluation of the submissions in response to the CfP will be performed at the 144th MPEG meeting in October 2023.

OP Solutions plans to respond to the call with a joint proposal prepared with Florida Atlantic University.


Video Coding for Machines

Video Coding for Machines (VCM) work continued, with a main focus on refining core experiments, defining appropriate anchors (references), and producing the first output documents. Upon review of the proposals in all Core Experiments (CEs), it was decided to merge some of the core experiments, resulting in 3 CEs, compared to 5 CEs before the meeting. The current CEs are:

CE 1 - Region-of-interest based coding methods,

CE 2 - Neural network based inner coding,

CE 3 - Spatial resampling.

Work on the preliminary draft of the Technology-under-Consideration (TuC) document was initiated. This document will describe core technologies that are being tested and are candidates for adoption in the final standard.

Ad-hoc group (AhG) mandates were specified to guide the work before the next meeting and beyond, including:

-       Complete output documents, including the TuC.

-       Release VCM reference software v0.5.

-       Continue developing VCM technologies.

-       Continue collecting test and training materials.

-       Continue refining the cross-check procedure.

The Working Group for Video, which oversees the development of the VCM standard, targets January 2024 as the date for the release of the first Working Draft of the standard.

OP Solutions will continue to participate in VCM jointly with Florida Atlantic University.


Other developments

In the context of the WG2/Market Needs activity, in which the group is tasked with identifying the MPEG standards and technologies applicable to the Metaverse, potential use cases were collected and matching MPEG technologies were identified. Use cases are collected following a template that includes self-assessment of seven characteristics tied to what makes the Metaverse specific: real-time aspects, 3D experiences, interactivity of user senses, user navigation, social aspects, persistence of events, and representation of users and objects. Documented use cases include: virtual dressing room, online game enjoyed simultaneously on different immersive displays, digital asset bank for online communities, virtual museums, AR two-party call, immersive live performances, and B2B digital twin systems in critical environments. Work on identifying additional use cases and developing the MPEG architectures to support the current use cases will continue.


The MPEG immersive video (MIV) conformance and reference software standard (ISO/IEC 23090-23) has been promoted to the Final Draft International Standard (FDIS) stage, the last formal milestone of its approval process. The document specifies how to conduct conformance tests and provides reference encoder and decoder software for ISO/IEC 23090-12 MPEG immersive video. This draft includes 23 verified and validated conformance bitstreams, plus encoding and decoding reference software based on version 15.1.1 of the Test Model for MPEG Immersive Video (TMIV). The test model, objective metrics, and some other tools are publicly available at https://gitlab.com/mpeg-i-visual.


At this meeting, MPEG Systems (WG 3) reached the first milestone for ISO/IEC 23090-32, entitled “Carriage of haptics data”, by promoting the text to Committee Draft (CD) status. This specification enables the storage and delivery of haptics data (defined by ISO/IEC 23090-31) in the ISO Base Media File Format (ISOBMFF; ISO/IEC 14496-12). Because haptics data is composed of spatial and temporal components, a data unit carrying various spatial or temporal data packets is used as a basic entity, analogous to an access unit of audio-visual media. Additionally, an explicit indication of a silent period, reflecting the sparse nature of haptics data, has been introduced in this draft. The standard is planned to be completed, i.e., to reach the status of Final Draft International Standard (FDIS), by the end of 2024.
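To picture the structure described above, here is a purely illustrative Python model. The type and field names are hypothetical and do not reflect the normative ISO/IEC 23090-32 syntax:

```python
# Hypothetical model (illustration only, not the normative syntax) of a
# haptics "data unit": it groups spatial/temporal packets and plays the
# role an access unit plays for audio-visual media, with an explicit
# silent-period signal for the sparse stretches of a haptics stream.
from dataclasses import dataclass, field

@dataclass
class HapticPacket:
    channel_id: int        # spatial component (e.g., which actuator)
    timestamp_us: int      # temporal position within the data unit
    samples: list[float]   # haptic signal samples

@dataclass
class HapticDataUnit:
    presentation_time_us: int
    silent: bool = False              # explicit silence: no packets carried
    packets: list[HapticPacket] = field(default_factory=list)

# A sparse stream: one active unit, then an explicitly silent stretch.
stream = [
    HapticDataUnit(0, packets=[HapticPacket(0, 0, [0.1, 0.4, 0.2])]),
    HapticDataUnit(40_000, silent=True),  # nothing to render after 40 ms
]
```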


MPEG released a white paper titled “White paper on Geometry based Point Cloud Compression (G-PCC)”. This white paper is interesting because it describes technology that is potentially very useful in the automotive industry and other industries that work with 3D sensing modalities such as LiDAR. Geometry-based Point Cloud Compression (G-PCC) provides a standard for the coded representation of point cloud media. Point clouds may be created in various manners. Recently, 3D sensors such as Light Detection And Ranging (LiDAR) or Time of Flight (ToF) devices have been widely used to scan dynamic 3D scenes. To precisely describe 3D objects or real-world scenes, point clouds comprise a large set of points in 3D space, each carrying geometry information and attribute information. The geometry information represents the 3D coordinates of each point in the point cloud; the attribute information describes the characteristics (e.g., colour and reflectance) of each point. Point clouds require a large amount of data, bringing huge challenges to data storage and transmission. The white paper is available at https://www.mpeg.org/whitepapers/.
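To give a feel for the data involved, here is a minimal sketch of geometry quantization (voxelization), the kind of first step a geometry-based coder performs before building an octree over the occupied voxels. It is an illustration only; G-PCC's normative coding tools are far richer:

```python
# Minimal sketch: quantize (voxelize) float 3D coordinates onto an integer
# grid and de-duplicate points. In a real geometry-based coder, an octree
# would then be built over the occupied voxels.
import numpy as np

def voxelize(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Map Nx3 float coordinates to unique integer voxel coordinates."""
    origin = points.min(axis=0)
    grid = np.floor((points - origin) / voxel_size).astype(np.int64)
    return np.unique(grid, axis=0)  # duplicates collapse into one voxel

# Stand-in for a LiDAR sweep: 100k points in a 100 m x 100 m x 10 m volume.
pts = np.random.rand(100_000, 3) * np.array([100.0, 100.0, 10.0])
voxels = voxelize(pts, voxel_size=0.05)  # 5 cm resolution
print(pts.shape[0], "points ->", voxels.shape[0], "occupied voxels")
```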


Finally, MPEG issued an updated roadmap for all of its standards, reflecting changes from the 142nd meeting, presented below.


The 143rd MPEG meeting will take place July 17-21 at CICG, in Geneva, Switzerland.


2/17/23 MPEG 141 - MPEG-AI is born, VCM further evolves

The 141st MPEG meeting, held in person and online in the third week of January, covered many topics, including the establishment of an over-arching project that will encompass all machine learning-related initiatives within MPEG, as well as Video Coding for Machines (VCM).

1. MPEG-AI


At the meeting, experts initiated MPEG-AI, an umbrella initiative for all the AI-related activities within MPEG (VCM, FCVCM, NNC, etc.). The project will officially be launched at the April 2023 MPEG meeting. More information can be found on the project’s website, at https://www.mpeg.org/standards/MPEG-AI/.

2. VCM


Activities on the Video Coding for Machines (VCM) standard continued during the ad-hoc group and break-out group meetings.

As a reminder, the VCM call for proposals was issued in July 2022. An excerpt from MPEG’s press release (https://www.mpeg.org/meetings/mpeg-140/) stated:

At the 140th MPEG meeting, MPEG Technical Requirements (WG 2) evaluated the responses to the Call for Proposals (CfP) for technologies and solutions enabling efficient video coding for machine vision tasks. A total of 17 responses to this CfP were received, with responses providing various technologies such as (i) learning-based video codecs, (ii) block-based video codecs, (iii) hybrid solutions combining (i) and (ii), and (iv) novel video coding architectures. Several proposals use a region of interest-based approach, where different areas of the frames are coded in varying qualities.

The responses to the CfP reported an improvement in compression efficiency of up to 57% on object tracking, up to 45% on instance segmentation, and up to 39% on object detection, respectively, in terms of bit rate reduction for equivalent task performance. Notably, all requirements defined by WG 2 were addressed by a variety of proposals.

Given the success of this call, MPEG will continue working on video compression methods for machine vision tasks. The work will continue in MPEG Video Coding (WG 4) within a new standardization project. A test model will be developed based on technologies from the responses to the CfP and results from the first round of core experiments in one or two meeting cycles. At the same time, the Joint Video Team with ITU-T SG 16 (WG 5) will study encoder optimization methods for machine vision tasks on top of existing MPEG video compression standards.
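To give a flavor of the region-of-interest approach mentioned in the press release, here is a toy sketch using OpenCV and NumPy. It is not any proponent's actual method; a real codec would typically steer per-block quantization rather than re-encode JPEG patches:

```python
# Toy region-of-interest coding: spend bits where the machine task needs
# them by keeping the ROI at high quality and the rest of the frame at
# low quality, approximated here with two JPEG quality settings.
import cv2
import numpy as np

def roi_coded_frame(frame: np.ndarray, roi: tuple) -> np.ndarray:
    """roi = (x, y, w, h); returns a frame where only the ROI keeps detail."""
    x, y, w, h = roi
    # Low-quality base layer: heavy JPEG compression of the whole frame.
    ok, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 10])
    base = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    # High-quality patch for the region the detector/tracker cares about.
    ok, buf = cv2.imencode(".jpg", frame[y:y+h, x:x+w],
                           [cv2.IMWRITE_JPEG_QUALITY, 90])
    base[y:y+h, x:x+w] = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return base

frame = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
coded = roi_coded_frame(frame, roi=(200, 120, 160, 160))
```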

Furthermore, in the public document “CfP response report for Video Coding for Machines” (https://www.mpeg.org/wp-content/uploads/mpeg_meetings/140_Mainz/w22071.zip), MPEG expressed acknowledgment of the participating organizations:

The following organizations are thanked for responding to this CfP:

·         Alibaba

·         Institute of Computing Technology, Chinese Academy of Sciences (CAS-ICT)

·         China Telecom

·         City University of Hong Kong

·         Ericsson

·         Electronics and Telecommunications Research Institute (ETRI)

·         Florida Atlantic University (FAU)

·         Konkuk University

·         Myongji University

·         Nokia

·         OP Solutions

·         Poznan University of Technology (PUT)

·         Tencent

·         V-Nova

·         Wuhan University

·         Zhejiang University


During the 141st meeting, updated results from the proponents that responded to the Call for Proposals were reviewed, and the decision was made to continue the work on the reference software as well as on five core experiments (CEs):

CE 1 – Region-of-interest based coding methods,

CE 2 – Neural network based inner coding,

CE 3 – Frame level spatial resampling,

CE 4 – Temporal resampling,

CE 5 – Post filtering.

OP Solutions, together with its partner institution, Florida Atlantic University, continues to participate in the development of the VCM standard as a proponent of proposals directed to several core experiments. New and updated results of our proposed technology will be presented at the 142nd MPEG meeting in April.

In addition, a draft CfP was issued for Feature Compression for Video Coding for Machines (FCVCM). In contrast to VCM, which takes as input a pixel-domain picture or a frame of video, FCVCM takes as input the features from an arbitrary layer of the neural network processing the input picture. (We are planning to write additional blog posts explaining the details of those technologies in the near future – stay tuned!)

The final CfP for FCVCM will be issued in April. OP Solutions plans to respond to this CfP as well.

3. MPEG roadmap

MPEG’s roadmap emphasizes the importance of VCM, FCVCM, and related efforts. The roadmap is a short-term plan that is the result of MPEG experts’ assessment of the current status and near-term viability of the ongoing standardization efforts.

In the accompanying presentation, MPEG gives the following rationale for producing and publicizing the roadmap:

MPEG has created, and still produces, media standards that enable huge markets to flourish

•   MPEG works on requirements from industry.

•   Many industries are represented in MPEG, but not all of MPEG’s customers can or need to participate in the process.

•   MPEG wants to inform its customers about its long-term plans (~ 5 years out).

•   MPEG collects feedback and requirements from these customers.

The roadmap is shaped by significant developments

•   The relentless increase of IP-distributed and mobile media

•   Higher quality media

•   More immersion (UHD, VR, AR, Light Fields, Holography)

•   The Internet of Media Things & Wearables

•   Cloud-based media processing, storage and delivery

•   New high-speed networks including fiber, 5G mobile, and cable 10G

•   New emerging technologies (machine vision, AI)

The short-term plan in MPEG’s roadmap, as updated at the 141st meeting, is depicted in the picture accompanying this blog post.

We are glad to announce that OP Solutions will continue participating in MPEG’s work on these exciting and promising new technologies.