Common Media Application Format (CMAF) - What is it and how does it work?

When building a livestreaming or video-on-demand (VOD) app, you need it to work on as many devices as possible while minimizing latency and other issues that could send users to a competitor. Not long ago, building an app like this meant doing double the work: one protocol to reach Apple devices and another for Windows and Android devices. The Common Media Application Format (CMAF) is one of the reasons this has become less of a headache.

What is the Common Media Application Format (CMAF)?

The Common Media Application Format (CMAF) is a standard for encoding and delivering media via different HTTP-based streaming protocols. It was released in 2018 to enable greater compatibility and reduce costs.

As the Real-Time Messaging Protocol (RTMP) was gradually being replaced due to the approaching end of life for Flash Player, HTTP-based protocols became more prevalent for streaming.

In the early 2010s, Apple's proprietary HTTP Live Streaming (HLS) and the Moving Picture Experts Group's (MPEG) open standard, MPEG-Dynamic Adaptive Streaming over HTTP (MPEG-DASH or simply DASH), became common. They offered improved performance compared to the protocols used in Flash Player and Apple's QuickTime. HLS was the favored protocol for Apple devices, and Microsoft and Android platforms mostly used DASH.

At this time, HLS used the .ts container format, and DASH used standard .MP4s and fragmented .MP4s (fMP4). This required encoding and storing two files for each media piece, doubling costs.

Microsoft, Apple, and MPEG teamed up to create CMAF, which enabled DASH and HLS to share a common container format: a standardized form of the fMP4. HLS's .ts and DASH's fMP4 formats were already fragmented, but the standardization of the fMP4 made these fragments uniform, which enabled greater interoperability.

How Does CMAF Work?

Instead of having one large file that combines your video, audio, and captions, there are tracks made up of many different files. Tracks correspond to media type, codec, and quality. There are individual tracks for audio, video, subtitles, closed captions, metadata, and more that come in different resolutions and bitrates and can use different codecs or formats.

For example, one short section of a movie might have H.264/AVC and H.265/HEVC video tracks with multiple resolution options, AAC audio tracks in different languages with varying bitrate options, and subtitle and closed caption tracks for different languages.

Tracks comprise initialization files and segments. The initialization file, a .MP4, has the information required to start the track. The segments, .m4s files, are the broken-up pieces of the media file, usually 2-10 seconds long.

Segments are further broken into shorter fragments that might be less than a second. They make the encoding and delivery process faster, which is why the format is called fragmented MP4. Instead of waiting for a full media file, smaller fragments for audio, video, and anything else used in playback play as soon as they're ready.
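To make the segment and fragment layout more concrete, here is a minimal Python sketch that walks the top-level boxes of an MP4-style byte string. It relies only on the general ISO base media file format box header (a 4-byte big-endian size followed by a 4-byte type). The `iter_boxes` and `make_box` helpers are hypothetical names, and the segment is synthetic filler data built for illustration, not real media.

```python
import struct
from typing import Iterator, Tuple


def iter_boxes(data: bytes) -> Iterator[Tuple[str, int]]:
    """Yield (box_type, box_size) for each top-level box in data."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the data.
            size = len(data) - offset
        if size < header:
            raise ValueError(f"invalid box size {size} at offset {offset}")
        yield box_type.decode("ascii"), size
        offset += size


def make_box(box_type: str, payload: bytes = b"") -> bytes:
    """Build a minimal box: 4-byte size, 4-byte type, then the payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type.encode("ascii")) + payload


# Synthetic stand-in for one segment that holds two fragments.
segment = (
    make_box("styp", b"\x00" * 8)
    + make_box("moof", b"\x00" * 16) + make_box("mdat", b"\x01" * 32)
    + make_box("moof", b"\x00" * 16) + make_box("mdat", b"\x02" * 32)
)

box_types = [box_type for box_type, _ in iter_boxes(segment)]
print(box_types)  # ['styp', 'moof', 'mdat', 'moof', 'mdat']
```

Each 'moof' (metadata) and 'mdat' (media data) pair in the output corresponds to one fragment, which is why a player can start working on the first pair before the rest of the segment arrives.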

Along with tracks, you have manifests. These tell the player how to start the media (initialization files) and in what order the segments should be downloaded and played. HLS manifests use the .m3u8 file format, and DASH manifests use .mpd.

When following this standard, segments and fragments will be the same length and timing for both protocols. The manifests are formatted differently, but they play the same broken-up media pieces in the same order.
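As a rough illustration of two manifests pointing at the same media, the Python sketch below builds a minimal HLS media playlist and a DASH SegmentTemplate element from one set of file names. The file names, the 4-second duration, and the helper function names are assumptions made for this example, and the DASH output is only a fragment of a manifest, not a complete .mpd file.

```python
from typing import List


def hls_media_playlist(init_uri: str, segment_uris: List[str], seg_seconds: int) -> str:
    """Return a minimal HLS VOD media playlist that references fMP4 segments."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",
        f"#EXT-X-TARGETDURATION:{seg_seconds}",
        "#EXT-X-PLAYLIST-TYPE:VOD",
        f'#EXT-X-MAP:URI="{init_uri}"',  # points at the initialization file
    ]
    for uri in segment_uris:
        lines.append(f"#EXTINF:{seg_seconds:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


def dash_segment_template(init_uri: str, media_pattern: str,
                          seg_seconds: int, timescale: int = 1000) -> str:
    """Return a DASH SegmentTemplate element (a manifest fragment only)."""
    return (
        f'<SegmentTemplate timescale="{timescale}" '
        f'duration="{seg_seconds * timescale}" '
        f'initialization="{init_uri}" media="{media_pattern}" startNumber="1"/>'
    )


# Hypothetical file names for one video track.
init_file = "video_init.mp4"
segments = [f"video_{n}.m4s" for n in range(1, 4)]

playlist = hls_media_playlist(init_file, segments, seg_seconds=4)
template = dash_segment_template(init_file, "video_$Number$.m4s", seg_seconds=4)

print(playlist)
print(template)
```

The two outputs look nothing alike, but both resolve to the same initialization file and the same numbered .m4s segments, which is the point of sharing one container format.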

The track, manifest, initialization file, and segment structure apply to DASH and HLS with or without CMAF. Following the standard, a track is a CMAF track, an initialization file is a CMAF header, and a segment is a CMAF segment.

If you build a video-on-demand (VOD) service, users open the page with your video or select it in your app, and the player requests the manifest for their platform's preferred protocol. The manifest gives the player the information it needs to keep the audio, video, subtitle, and other tracks in sync and playing in the correct order, regardless of whether you use HLS or DASH. The player fetches the header and then starts loading the corresponding segments. If a user's network connection slows down, Adaptive Bitrate Streaming (ABR) loads the matching segments at a lower bitrate and resolution so playback continues seamlessly. Switching subtitles or audio languages midstream also stays in sync.
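The ABR decision can be sketched in a few lines of Python. This is a simplified throughput-based rule, not the algorithm of any specific player: it picks the highest-bitrate rendition that fits within a safety margin of the measured throughput. The `Rendition` class, the bitrate ladder, and the 0.8 safety factor are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rendition:
    name: str
    bitrate_kbps: int


def pick_rendition(renditions: Sequence[Rendition], throughput_kbps: float,
                   safety: float = 0.8) -> Rendition:
    """Pick the highest bitrate that fits within a share of the throughput."""
    budget = throughput_kbps * safety
    ordered = sorted(renditions, key=lambda r: r.bitrate_kbps)
    choice = ordered[0]  # fall back to the lowest rendition if nothing fits
    for rendition in ordered:
        if rendition.bitrate_kbps <= budget:
            choice = rendition
    return choice


# Hypothetical bitrate ladder.
ladder = [
    Rendition("360p", 800),
    Rendition("720p", 2500),
    Rendition("1080p", 5000),
]

print(pick_rendition(ladder, throughput_kbps=10000).name)  # 1080p
print(pick_rendition(ladder, throughput_kbps=4000).name)   # 720p
print(pick_rendition(ladder, throughput_kbps=500).name)    # 360p
```

Because the segments of every rendition are aligned in time, the player can apply a rule like this at each segment boundary and switch quality without a visible gap.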

With livestream apps, the process is similar, but the live video and audio are encoded as they're captured and turned into segments and fragments when delivered to the CDN for the viewer.

Low Latency CMAF

CMAF enables low latency performance via chunked encoding and chunked transfer encoding. This reduces the delay from 20-30 seconds to roughly 3-5 seconds.

Chunked encoding is the process of splitting fragments into smaller pieces called chunks. Chunks are even shorter than fragments. For video, they can be several frames or as small as a single one. Chunked transfer encoding uses HTTP/1.1 to deliver these chunks without needing an entire fragment or segment to load.
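To show what chunked transfer encoding looks like on the wire, the following Python sketch frames a list of byte strings as an HTTP/1.1 chunked body (hexadecimal length, CRLF, data, CRLF, then a zero-length terminator) and parses it back. The helper names and the placeholder payloads are assumptions for illustration; real servers and clients handle this framing for you, and the sketch ignores optional trailers.

```python
from typing import Iterable, List


def encode_chunked(chunks: Iterable[bytes]) -> bytes:
    """Frame byte strings as an HTTP/1.1 chunked message body."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would be read as the terminator
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk marker followed by an empty trailer
    return bytes(out)


def decode_chunked(body: bytes) -> List[bytes]:
    """Split a chunked body (without trailers) back into its chunks."""
    chunks = []
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";")[0]  # drop chunk extensions
        size = int(size_field.decode("ascii"), 16)
        pos = line_end + 2
        if size == 0:
            break
        chunks.append(body[pos:pos + size])
        pos += size + 2  # skip the data and its trailing CRLF
    return chunks


# Placeholder payloads standing in for two CMAF chunks.
media_chunks = [b"moof+mdat #1", b"moof+mdat #2"]
wire = encode_chunked(media_chunks)
print(wire)
print(decode_chunked(wire) == media_chunks)  # True
```

The sender never has to declare a total length up front, so each chunk can go out as soon as the encoder produces it.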

What Are the Advantages and Disadvantages of CMAF?

Each streaming method comes with its distinct benefits and drawbacks. This standard is cost-effective and helps developers deliver content at scale to many platforms, but it can present issues around complexity, legacy devices, and real-time streaming requirements.

Advantages

One standardized format for HLS and DASH means users can watch videos and streams on your app or site without you having to encode and store duplicate files. Standardization simplifies cross-platform compatibility and cuts down on costs.

With chunked encoding and chunked transfer encoding, CMAF is capable of low latency performance. Your users won't need to wait as long for the player to load chunks, reducing the likelihood of buffering.

Its use of ABR supports a wide range of user connections. Your users can stream from the comfort of home or on the go. The quality will adjust to match their bandwidth availability.

CMAF receives regular updates and is currently in its third edition. Updates have added support for features like 8K UHD, higher frame rates, and low latency. There have also been new brand profiles ('cmf1' and 'cmf2') with additional restrictions for when tighter performance control is required.

Disadvantages

Low latency isn't ultra low latency (ULL) like WebRTC offers. It isn't capable of sub-second latency, rendering it a poor choice in certain use cases where real-time interactions are required, like when building video chat apps.

With CMAF, you're working with two protocols. This can add complexity regarding codecs, subtitle and closed caption formats, encryption, and digital rights management (DRM). Looking at encryption specifically, HLS supports only one Common Encryption (CENC) scheme, 'cbcs', and DASH initially only supported the 'cenc' scheme. This meant encryption on all platforms still required two versions of each segment before support for 'cbcs' with DASH was added.

It's compatible with most modern platforms, but it isn't supported by devices that predate it, and some might not be capable of receiving the updates needed for new capabilities. For example, Apple devices running versions earlier than iOS 10, tvOS 10, or macOS 10.12 are unlikely to be compatible at all. The need for duplicates to reach a user base with older devices might lessen or outright negate the savings from lower encoding and storage costs.

The CMAF Standard vs Streaming Protocols

Although CMAF is not a protocol itself, it's worth comparing it to some popular protocols, including HLS and DASH:

HLS: While compatible with other containers without CMAF, HLS typically uses the MPEG-2 Transport Stream (.ts) container. This is the streaming protocol mainly used on Apple devices.
DASH: DASH already primarily used the fMP4 format, but the fragmentation is different without adhering to this standard. It's the streaming protocol most commonly used on Windows and Android devices.
RTMP: Created by Macromedia (later acquired by Adobe), RTMP is based on the Transmission Control Protocol (TCP). Its deprecation set the stage for the creation of CMAF. You can still use RTMP for ingest alongside WebRTC, HLS, or DASH, with or without the standard.
WebRTC: WebRTC is a P2P protocol, whereas the DASH and HLS protocols use HTTP. Using a peer connection makes it capable of ULL (sub-second delay), but it is unable to handle large audiences without using a Multipoint Control Unit (MCU) or Selective Forwarding Unit (SFU), which comes with additional costs.

When Should You Use CMAF?

When it comes to streaming, CMAF has two strong use cases: streaming with minimal delay on multiple modern platforms and scaling to accommodate a large audience.

You want low latency performance on a wide variety of platforms.

It's compatible with nearly all modern devices. Most users can log in to your site or app and access your videos without downloading special players or additional codecs.

ABR and chunked encoding will prevent long buffer times for your users, allowing them to watch without interruption. Your content creators and streamers won't need to worry about losing viewers because they get frustrated waiting for their videos to load.

You need to reach a large audience.

If your app has a large audience, your CDN's edge servers will allow quicker and more efficient delivery of videos and livestreams to your users. Chunked encoding also helps here by reaching these larger audiences with minimal delay. WebRTC offers ULL, which can make it seem like the default pick for livestreaming, but scaling becomes an issue due to its P2P nature, making it less effective without significant costs.

When Should You Not Use CMAF?

There are also cases where this standard isn't fast enough for your needs or isn't even worth implementing to begin with.

You need ultra-low latency for real-time interactions.

If you're building a conferencing app or a live video calling app, its lack of ULL performance makes it a poor choice. The 2+ second delay would only send your potential users into the arms of a competitor. In this situation, WebRTC is a better choice as P2P architecture delivers much lower latency.

Similarly, if you have a real-time livestreaming use case where there needs to be near-instantaneous interaction between the audience and streaming personality, CMAF just won't cut it on its own. For example, if you're developing an app to facilitate live auctions, delays could cause missed bids or similar issues. Depending on factors like the size of the audience and how much latency they can tolerate, you might use WebRTC with SFU cascading, Secure Reliable Transport (SRT), or RTMP for ingest with CMAF for delivery.

You're building for only one platform and plan to stay there.

This standard was designed to reach Apple, Microsoft, Android, and other platforms. If your audience is only going to be on one of these, and you have no desire to expand beyond it, it's better to just stick with that platform's preferred protocol. There are LL-HLS and LL-DASH implementations that will be simpler to set up than having to accommodate both.

Best Practices for CMAF Implementation

Even with the right use case, poor implementation of this standard can mean issues like high latency, bad video quality, or server overload. Follow these best practices to capitalize on its strengths.

Low latency

The default implementation of this standard isn't low latency. You must modify the manifest files for both protocols and configure the settings in your encoding, transcoding, packaging, and CDN service(s) to enable it yourself. Some settings that may need to be changed include the segment length, fragment length, and CDN cache policy parameters.

Check the documentation of your media delivery platform for their specific recommendations.
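To see why segment and fragment length matter so much among these settings, a back-of-the-envelope model can help. The Python sketch below assumes the player buffers a fixed number of media units before it starts playback and adds fixed encode and delivery delays. The function name, the three-unit buffer, and the delay values are illustrative assumptions, not figures from the CMAF specification or any particular player.

```python
def estimated_latency_s(unit_s: float, buffered_units: int,
                        encode_s: float = 1.0, delivery_s: float = 0.5) -> float:
    """Rough latency: fixed delays plus the media buffered before playback."""
    return encode_s + delivery_s + unit_s * buffered_units


# Player waits for three whole 6-second segments.
segment_based = estimated_latency_s(unit_s=6.0, buffered_units=3)

# Player can start after three 0.5-second chunks instead.
chunk_based = estimated_latency_s(unit_s=0.5, buffered_units=3)

print(segment_based)  # 19.5
print(chunk_based)    # 3.0
```

Under these assumptions, shrinking the unit the player must wait for accounts for nearly all of the improvement, which is why the segment, fragment, and chunk durations are usually the first settings to review.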

Alignment vs compatibility

There is a tradeoff between deeper alignment and reaching users with older devices that should be discussed during the planning stages of the development cycle. You will have to choose one or the other because some cell phones, smart TVs, etc. cannot play DASH streams with 'cbcs' mode, meaning duplicates will still be required.
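One way to frame this decision during planning is to list which encryption schemes each target device class can play and check whether a single scheme covers them all. The Python sketch below does that. The device names and their capabilities are hypothetical placeholders, so replace them with data verified for your own audience.

```python
from typing import Dict, Set


def schemes_to_package(device_support: Dict[str, Set[str]]) -> Set[str]:
    """Return the smallest set of encryption schemes that reaches every device."""
    for device, schemes in device_support.items():
        if not schemes & {"cbcs", "cenc"}:
            raise ValueError(f"{device} supports neither 'cbcs' nor 'cenc'")
    for scheme in ("cbcs", "cenc"):
        if all(scheme in schemes for schemes in device_support.values()):
            return {scheme}  # one encrypted copy of each segment is enough
    return {"cbcs", "cenc"}  # duplicates are still required


# Hypothetical device classes and capabilities.
fleet = {
    "recent_phone": {"cbcs", "cenc"},
    "apple_device": {"cbcs"},
    "older_smart_tv": {"cenc"},
}
modern_only = {name: s for name, s in fleet.items() if name != "older_smart_tv"}

print(sorted(schemes_to_package(fleet)))        # ['cbcs', 'cenc']
print(sorted(schemes_to_package(modern_only)))  # ['cbcs']
```

In this made-up fleet, dropping the one device class limited to 'cenc' is what allows a single encrypted copy, which is the tradeoff to weigh against how many users that class represents.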

Performance testing

Test your streams on different devices and connection qualities, in various geographical regions, with both HLS and DASH, and under high and low load conditions. This will help you catch problems before your users run into them at initial release. After release, continuously test and monitor for issues that may come up, especially if you anticipate a higher number of active users than normal for special event streams.

You can utilize the tools available on your media delivery platform, like CloudWatch RUM on CloudFront or CloudTest on Akamai.
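Before reaching for a specific tool, it can help to enumerate the combinations you intend to cover. The Python sketch below builds such a test matrix with `itertools.product`. Every value in the lists is a placeholder assumption, and the sketch only generates the cases; running them is left to whatever tooling you use.

```python
from itertools import product
from typing import Dict, List

# Placeholder dimensions; substitute your own targets.
DEVICES = ["iphone", "android_phone", "smart_tv", "desktop_browser"]
NETWORKS = ["wifi", "4g", "throttled_3g"]
REGIONS = ["us-east", "eu-west", "ap-south"]
PROTOCOLS = ["hls", "dash"]
LOADS = ["low", "high"]


def build_test_matrix() -> List[Dict[str, str]]:
    """Return one dict per combination of device, network, region, protocol, and load."""
    keys = ("device", "network", "region", "protocol", "load")
    combos = product(DEVICES, NETWORKS, REGIONS, PROTOCOLS, LOADS)
    return [dict(zip(keys, combo)) for combo in combos]


matrix = build_test_matrix()
print(len(matrix))  # 144
print(matrix[0])
```

Even this small set of dimensions yields 144 cases, so it is worth deciding early which combinations get automated and which are spot-checked manually.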

Frequently Asked Questions

What is the difference between MP4 and CMAF?

CMAF is a standard for encoding and streaming. MP4 is a container format most often used for video and audio. It can also store other content like chapter information, interactive menus, subtitles, and captions. Protocols following this standard use a variation of the MP4 called the fragmented MP4.

In the streaming process, CMAF is the standard, HLS or DASH is the protocol, and fragmented MP4 is the format.

What is the difference between CMAF and LL-HLS?

CMAF is a standard that applies to protocols. Low-Latency HTTP Live Streaming (LL-HLS) is a lower-latency variation of Apple’s HLS protocol.

You can configure LL-HLS in a way that is compatible with this specification, but it doesn’t need to be. It can also use the .ts file container like regular HLS.

Who uses CMAF?

Various companies that offer VOD/OTT streaming and livestreaming use it. Some famous companies that use it include Netflix, Hulu, Disney+, and Twitch.

Is CMAF compatible with all streaming devices?

It’s compatible with most technology from recent years via either HLS or DASH. Platforms that support CMAF include:

Google Chrome
Firefox
Safari
Microsoft Edge
Apple devices
Windows devices
Android devices
Xbox One and Xbox Series S/X
PlayStation 5

Other browsers based on Chromium should be able to run it, but they might require some tweaking. Some Brave users have had issues with CMAF-based streaming services due to settings that block third-party cookies or Digital Rights Management (DRM) systems like Widevine.

How does CMAF improve content delivery efficiency?

Instead of caching two files, CDNs only need to cache one fragmented MP4 that can be further broken down into chunks. This saves money on storage and encoding and leads to a faster, seamless delivery process for the viewer of the content. CMAF Switching Sets and ABR make switching between different bitrates to accommodate the viewer’s bandwidth availability smoother.

What video and audio codecs are supported?

There is support for H.264/AVC, H.265/HEVC, VP9, and AV1 video codecs. You can use AAC, HE-AAC, AC3, and EAC3 for audio codecs.

What encryption and DRM options are supported?

Encryption is possible with CENC via the Advanced Encryption Standard Counter (AES-CTR) ‘cenc’ mode or AES Cipher Block Chaining (AES-CBC) ‘cbcs’ mode. For DRM, it supports Widevine, PlayReady, FairPlay, and ClearKey. Both encryption and DRM happen at the segment level.

How does CMAF handle subtitles and captions?

CMAF supports subtitles using IMSC and WebVTT formats and closed captions in CEA-608 and CEA-708 formats. Remember that the protocol used may not support all of these formats. For example, HLS only supports the IMSC1 text profile, not its image profile.

Are there any limitations to using CMAF?

Even in a low latency configuration, it typically delivers a delay of a few seconds (roughly 3-5), and it can’t go under a second. This means that while it is suited for VOD services and livestreams that won’t suffer from a few seconds of delay, it is not suited for live calling or more latency-sensitive livestreaming needs.

It can also present compatibility issues if you have a significant number of users on legacy devices.

What are the potential cost savings?

On paper, it eliminates the need for duplicates. If a company had previously been using HLS and DASH without CMAF, they would cut their encoding and storage costs in half after implementing CMAF. However, this isn’t always the case.

In reality, if the company supports legacy devices and can’t always use CMAF, they may need to store duplicates. There would still be cost savings, just not as much as a pure CMAF implementation.
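The effect of legacy support on savings can be approximated with simple arithmetic. The Python sketch below assumes a baseline of two stored copies of every title (one per protocol) and a CMAF setup of one shared copy plus a legacy copy for some share of the catalog. The function name and the model itself are simplifying assumptions that ignore CDN, egress, and licensing costs.

```python
def storage_savings(legacy_share: float) -> float:
    """Fraction of storage saved versus keeping two copies of everything.

    legacy_share is the fraction of the catalog (0.0 to 1.0) that still
    needs a second, non-CMAF copy for legacy devices.
    """
    if not 0.0 <= legacy_share <= 1.0:
        raise ValueError("legacy_share must be between 0 and 1")
    baseline_copies = 2.0             # one copy per protocol for every title
    cmaf_copies = 1.0 + legacy_share  # shared copy plus legacy duplicates
    return 1.0 - cmaf_copies / baseline_copies


print(storage_savings(0.0))  # 0.5
print(storage_savings(0.5))  # 0.25
print(storage_savings(1.0))  # 0.0
```

In this model, savings fall linearly from 50% to nothing as the share of the catalog that needs a legacy copy grows.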