Video Encoding - What is it and how does it work?

Since the earliest days of converting analog to digital video with the onset of DVDs, the amount of video content stored and streamed online has grown exponentially, with approximately 5.35 billion internet video consumers worldwide in 2024.

In response to this trend of growing viewership, video encoding addresses the crucial tasks of reducing video file sizes while optimizing playback and ensuring compatibility with the expanding variety of devices and platforms on the market.

What is Video Encoding?

Video encoding converts raw video files into a smaller, more manageable digital format. It applies complex compression algorithms that eliminate redundant data to reduce file size while preserving as much video quality as possible.

Video encoding makes videos easier to store, transmit, and play back across devices. It's used in live streaming apps, on-demand platforms like YouTube and Netflix, and for downloading and sharing video content online. Video encoding can also convert analog recordings into digital formats. The result is a digital file with video and audio data optimized for efficient storage and portability.

The encoding process uses video compression to create smaller files that are easier to upload, download, and transfer. Since it conserves server space, it's essential for web storage and video sharing.

After analyzing the content of frames to determine their significance, video compression uses encoding algorithms to reduce the size of digital video files, potentially by up to 50%. It eliminates non-essential frames and references identical frames instead of storing them again.
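The idea of referencing identical frames instead of storing them can be illustrated with a toy example. This is only a sketch: the `encode_frames` and `decode_frames` helpers are hypothetical names, frames are plain byte strings, and real codecs use far more sophisticated prediction between frames than exact-match lookup.

```python
from typing import List, Tuple, Union

# An encoded entry is either ("full", frame_bytes) or ("ref", index_of_identical_frame).
Encoded = Tuple[str, Union[bytes, int]]


def encode_frames(frames: List[bytes]) -> List[Encoded]:
    """Store each distinct frame once; later identical frames become references."""
    first_seen = {}  # frame content -> index of the entry that stores it in full
    encoded: List[Encoded] = []
    for index, frame in enumerate(frames):
        if frame in first_seen:
            encoded.append(("ref", first_seen[frame]))
        else:
            first_seen[frame] = index
            encoded.append(("full", frame))
    return encoded


def decode_frames(encoded: List[Encoded]) -> List[bytes]:
    """Rebuild the original frame sequence by resolving references."""
    frames: List[bytes] = []
    for kind, payload in encoded:
        if kind == "full":
            frames.append(payload)
        else:
            # A reference always points at an earlier, already decoded frame.
            frames.append(frames[payload])
    return frames


frames = [b"AAAA", b"AAAA", b"BBBB", b"AAAA"]
encoded = encode_frames(frames)
print(encoded)
print(decode_frames(encoded) == frames)
```

Only two of the four frames are stored in full here; the other two are small references that the decoder resolves on playback.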

Decoding

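Decoding refers to the process of interpreting the encoded bitstream to reconstruct the video for display. With lossless encoding the result matches the original exactly; with lossy encoding it is a close approximation. This occurs on the viewer's end and is necessary to display the video properly after transmission.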

Decoding is crucial because it prevents interruptions, provides smooth playback, and maintains the quality of the video. In essence, encoding prepares the video for transmission, while decoding allows the recipient to view it seamlessly.

Transcoding

Transcoding converts encoded video into different formats, resolutions, or bitrates to suit the needs of various devices and network conditions. This allows the video to adapt to different screen resolutions or bandwidths, providing flexibility for users with diverse viewing preferences.

Transcoding ensures that video content is accessible across multiple platforms and devices, optimizing the viewing experience for each specific environment.
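As a rough illustration, transcoding a source file into a lower resolution and bitrate is often done with a tool such as FFmpeg. The sketch below only builds the command line rather than running it. The `build_transcode_command` helper, the file names, and the bitrate values are assumptions for illustration, and the `libx264` encoder is only available in FFmpeg builds that include it.

```python
from typing import List


def build_transcode_command(source: str, output: str, height: int,
                            video_kbps: int, audio_kbps: int = 128) -> List[str]:
    """Build an FFmpeg command that re-encodes `source` to H.264 video and AAC audio."""
    return [
        "ffmpeg",
        "-i", source,                   # input file
        "-vf", f"scale=-2:{height}",    # resize to the target height, keep aspect ratio
        "-c:v", "libx264",              # H.264 video encoder
        "-b:v", f"{video_kbps}k",       # target video bitrate
        "-c:a", "aac",                  # AAC audio encoder
        "-b:a", f"{audio_kbps}k",       # target audio bitrate
        output,
    ]


command = build_transcode_command("input.mov", "output_720p.mp4", 720, 2500)
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` on a machine with FFmpeg installed. Calling the helper several times with different heights and bitrates would produce one rendition per target device or network condition.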

Why is Video Encoding Needed?

Video encoding is needed to reduce the size of raw video files. Uncompressed files can be huge, with one minute of footage potentially needing several gigabytes of storage space. This isn't practical for storing, sharing, or streaming, particularly for long-form content like movies or TV shows.
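A quick back-of-the-envelope calculation shows where those gigabytes come from. The sketch below assumes 1080p footage at 30 frames per second with 3 bytes per pixel (8 bits each for red, green, and blue); these figures are illustrative assumptions, not values from a specific camera.

```python
def raw_video_bytes(width: int, height: int, fps: int, seconds: int,
                    bytes_per_pixel: int = 3) -> int:
    """Size of uncompressed video: every pixel of every frame is stored in full."""
    return width * height * bytes_per_pixel * fps * seconds


one_minute = raw_video_bytes(1920, 1080, fps=30, seconds=60)
print(f"{one_minute / 1e9:.1f} GB")  # prints "11.2 GB"
```

At that rate, a two-hour film would need more than a terabyte before any compression is applied.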

Through lossy and lossless compression techniques, video encoding eliminates redundant data and reduces file sizes while retaining pertinent details and preserving quality. This size reduction also optimizes the video for efficient transmission by decreasing the required bandwidth, which prevents issues like buffering during streaming.

Video encoding also has niche applications in professional film production, broadcasting, and medical imaging, where maintaining full video quality is critical. In these cases, the encoding focuses on transmission and storage instead of compressing files to reduce their size. It often encrypts data during transfer to preserve video integrity without reducing quality. Hardware-based video encoding systems usually play a crucial role in these contexts.

For example, live TV production uses broadcast hardware encoders to encode video streams before transmission via satellite, cable, or over-the-air broadcasting. Other hardware encoders are integrated into cameras, drones, and other devices to encode video data as it's recorded for more efficient storage and better compatibility without relying on the internet for transmission. These applications emphasize video quality and real-time processing, prioritizing precision over file size reduction.

Finally, video encoding helps guarantee compatibility across various devices and platforms. Different devices and media players don't support all video formats and codecs, so encoding video files into specific formats is often necessary to ensure playback across devices. For example, the Common Media Application Format (CMAF) can guarantee playback in live-streaming apps on both Apple and Windows devices.

Key Concepts in Video Encoding

Now that we've looked at why video encoding is needed, let's explore two of the most important terms related to encoding: codecs and containers. These are two essential pieces in the puzzle of video encoding. They're related but not the same. To understand the difference, let's look at their definitions and applications.

Video Codecs

A codec, short for coder/decoder, is a tool used to compress and decompress digital data, enabling it to be efficiently transmitted and received by various applications. It converts raw video and audio data into a compressed form and back again, reducing the file size to optimize storage and playback.

Codecs can be hardware devices or software programs. They encode and decode digital signals. Although separate codecs exist for video and audio, they compress and decompress data similarly, facilitating more efficient bandwidth and storage use. MP3 is one of the most common audio codecs available, and you've almost certainly encountered it.

Though device compatibility is one of the primary purposes of video encoding, not all devices support every codec. Everyone knows the frustration of trying to download a video only to encounter playback issues. This can happen when streaming software isn't compatible with the codec used to encode that video.

Selecting an appropriate codec is largely a matter of the application at hand. Certain lossy codecs are better for reducing file size and optimizing streaming bandwidth. Lossless codecs prioritize fidelity, emphasizing video quality above all else at the cost of larger files.

Video Containers

A video container is an essential encoding format component that combines encoded audio and video streams and relevant metadata into a single file. Container formats dictate how audio, video, and sometimes text tracks, such as captions, are organized within a file, allowing them to function cohesively despite being encoded with different codecs.

Different containers work with different codecs and video players, so broadcasters must select containers compatible with their entire streaming setup.

The names of file formats often reflect the containers they utilize; for example, a file with a .mov extension indicates it uses a MOV container. A standard container type is the MP4, typically containing the H.264 codec packaged within, resulting in files with the .mp4 extension. It is important to note that both codecs and containers can have similar names, such as "MPEG," so don't be confused by this overlapping naming convention.
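The relationship between a container and the codecs inside it can be modeled with a small data structure. This is a conceptual sketch only: the `Track` and `ContainerFile` classes are hypothetical and do not parse real files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Track:
    kind: str   # "video", "audio", or "text"
    codec: str  # the codec used to encode this track


@dataclass
class ContainerFile:
    container: str  # the container format, e.g. "MP4"
    extension: str  # the file extension the container usually implies
    tracks: List[Track] = field(default_factory=list)

    def codecs(self, kind: str) -> List[str]:
        """List the codecs of all tracks of the given kind."""
        return [track.codec for track in self.tracks if track.kind == kind]


movie = ContainerFile("MP4", ".mp4", [Track("video", "H.264"), Track("audio", "AAC")])
print(movie.codecs("video"), movie.codecs("audio"))
```

The container says how the tracks are packaged; each track still names its own codec, which is why a player can open a file's container yet fail to decode one of its tracks.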

Compression Methods

Compression is necessary for playback optimization across devices. By balancing file size and quality, compression allows for viewing on devices with varying processing power and network capabilities. This is particularly important for smooth live streaming, where encoded video can adjust to different bandwidth conditions to maintain playback quality.

Lossy Compression

The most common form of compression in video encoding, lossy compression, results in a loss of image quality. It uses methods that simplify the data in a file by retaining only essential parts, leading to smaller file sizes but reduced video quality.

In optimal cases, the loss of image quality may be unnoticeable to viewers, depending on the level of compression used. Still, videos using lossy compression may often appear pixelated or fuzzy.
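One simple way to see why lossy compression cannot be undone is quantization, where nearby values are collapsed into one. The sketch below applies it directly to a handful of made-up 8-bit pixel values. That is a simplification: real video codecs typically quantize transformed data rather than raw pixels, and `quantize` is a hypothetical helper.

```python
from typing import List


def quantize(samples: List[int], step: int) -> List[int]:
    """Round each sample down to a multiple of `step`, discarding fine detail."""
    return [(sample // step) * step for sample in samples]


original = [12, 13, 14, 200, 201, 203]
coarse = quantize(original, step=8)
print(coarse)  # prints [8, 8, 8, 200, 200, 200]
```

The quantized list has only two distinct values, so it compresses far better, but the small differences between the original values are gone for good. A larger step saves more space and loses more detail.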

Lossless Compression

Lossless compression maintains the full quality of the original video file by replicating every single piece of data without any alterations or omissions. While a less common approach, lossless compression is essential in its niche applications.

A related case is uncompressed video. For instance, Clean HDMI output sends an uncompressed signal to a display or recorder via an HDMI connection. Additionally, several high-end cameras can provide video directly in this form.
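The defining property of lossless compression is that decompressing returns exactly the bytes that went in. The sketch below demonstrates this with Python's `zlib` module on a made-up frame. Note that zlib is a general-purpose compressor used here as a stand-in, not a video codec.

```python
import zlib

# A made-up "frame": 500 dark pixels followed by 500 bright pixels.
frame = bytes([10] * 500 + [200] * 500)

compressed = zlib.compress(frame)
restored = zlib.decompress(compressed)

print(len(frame), len(compressed))
print(restored == frame)  # prints True: nothing was lost
```

The compressed data is much smaller because the frame is highly repetitive, yet every byte is recovered on decompression.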

Common Video Codecs

H.264 (AVC)

H.264 is one of the most common codecs. Also called MPEG-4 AVC or AVC (Advanced Video Coding), it provides a cost-efficient method for compressing video and has since become the basis for many later codecs.

Its popularity is mainly due to its compatibility with most platforms and devices. The x264 encoder, specifically, is known for effectively balancing video quality, fast encoding speed, and small file size. Adopted by both Netflix and YouTube, H.264 remains an extremely popular choice in video compression technologies.

H.265 (HEVC)

H.265, also known as High-Efficiency Video Coding (HEVC), is the successor to H.264. It significantly improves video compression over its predecessor, yielding reduced file sizes that allow smoother video streaming at lower bandwidths.

H.265 is, therefore, ideal for high-definition formats like 4K and 8K video and streaming platforms where maintaining high video quality is necessary.

Choosing the Right Codec

Choosing the right codec depends on the specific requirements of the use case and the devices involved. Some codecs are selected for their high compression efficiency, reducing file sizes while maintaining acceptable quality, which is important for live streaming or limited bandwidth environments.

Other codecs prioritize maintaining superior video quality with less compression, which may be necessary for professional editing or broadcasting. In certain cases, such as archival storage or specific technical applications, codecs without any compression are required to preserve the original content without loss.

Best Practices for Video Encoding

Typically, the average person employing video encoding relies on default or pre-selected settings provided by the software. Most consumer-level video encoding tools simplify the process, making it easier for non-experts to choose preset options like "High Quality" or "Small File Size" without diving into technical details like bitrate or frame rate.

However, customizable encoding options are often available in more advanced software (like Adobe Media Encoder, HandBrake, and FFmpeg) and professional situations, such as video editing, streaming platforms, or media production. These may include selecting codecs, resolution, frame rate, and bitrate to optimize video quality, file size, or compatibility with specific devices.

The best encoders generally offer adaptable settings, allowing users to fine-tune encoding parameters to meet specific requirements, whether maximizing quality for high-resolution videos or reducing file size for efficient streaming and storage.

Bitrate and Bit Depth Selection

Bitrate refers to the amount of data stored per second of video and is typically measured in Kbps, Mbps, or Gbps, depending on the video's size and quality.

Bit depth, on the other hand, indicates the number of bits used to represent each color channel of a pixel. 8-bit is the standard, and 10-bit is used for high-quality video formats like UHD 4K Blu-rays.
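Both settings translate into simple arithmetic. The sketch below estimates file size from bitrate and duration, and counts the brightness levels available at a given bit depth, assuming bit depth is counted per color channel (which is how 8-bit and 10-bit video are usually described). The function names and the 5 Mbps example are illustrative assumptions.

```python
def file_size_megabytes(bitrate_mbps: float, seconds: float) -> float:
    """Approximate size of a stream: megabits per second times duration, 8 bits per byte."""
    return bitrate_mbps * seconds / 8


def levels_per_channel(bit_depth: int) -> int:
    """Number of distinct values each color channel can take."""
    return 2 ** bit_depth


print(file_size_megabytes(5, 600))  # ten minutes at 5 Mbps: prints 375.0
print(levels_per_channel(8))        # prints 256
print(levels_per_channel(10))       # prints 1024
```

Doubling the bitrate doubles the file size for the same duration, while moving from 8-bit to 10-bit quadruples the number of shades per channel.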

Adjusting these settings often requires experimentation to find the right balance for one's needs. Luckily, many technologies are available that help streamline the process.

Adaptive bitrate streaming automatically adjusts the bitrate based on the strength of the internet connection.
HTTP Live Streaming (HLS) is a universal protocol developed by Apple that's used with HTML5 players.
MPEG-DASH is an adaptive streaming solution that breaks up videos into chunks, each with its own bitrate.
Constant Bitrate (CBR) keeps the bitrate stable throughout the stream, which suits live audio rooms and video conferencing applications.
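The core decision in adaptive bitrate streaming can be sketched as picking the best rendition that fits the measured connection speed. The ladder values, the `pick_rendition` helper, and the 0.8 safety margin below are assumptions for illustration; real players use more elaborate heuristics, such as also tracking how much video is already buffered.

```python
from typing import List, Tuple

# Hypothetical bitrate ladder: (label, video bitrate in kbps), sorted low to high.
LADDER: List[Tuple[str, int]] = [("360p", 800), ("720p", 2500), ("1080p", 5000)]


def pick_rendition(measured_kbps: float,
                   ladder: List[Tuple[str, int]] = LADDER,
                   safety: float = 0.8) -> Tuple[str, int]:
    """Pick the highest rendition whose bitrate fits within a share of the bandwidth."""
    budget = measured_kbps * safety  # leave headroom for fluctuations
    best = ladder[0]                 # fall back to the lowest rendition
    for rendition in ladder:
        if rendition[1] <= budget:
            best = rendition
    return best


print(pick_rendition(4000))   # prints ('720p', 2500)
print(pick_rendition(500))    # prints ('360p', 800)
print(pick_rendition(10000))  # prints ('1080p', 5000)
```

A player would repeat this choice for each chunk, stepping down when the connection weakens and back up when it recovers.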

Resolution Guidelines

Resolution measures the number of pixels within each frame. Higher-resolution videos have more pixels and, therefore, greater detail. As a result, higher resolutions tend to improve the perceived quality of the video. However, higher resolutions also require larger files, more streaming bandwidth, and more storage space.
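To get a feel for how quickly pixel counts grow, the sketch below compares a few common resolutions against 1080p. The dictionary of resolutions is an illustrative selection, not an exhaustive list.

```python
RESOLUTIONS = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
    "4K UHD": (3840, 2160),
}


def pixel_count(name: str) -> int:
    """Total pixels per frame for a named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


baseline = pixel_count("1080p")
for name in RESOLUTIONS:
    pixels = pixel_count(name)
    print(f"{name}: {pixels} pixels, {pixels / baseline:.2f}x 1080p")
```

A 4K UHD frame holds four times as many pixels as a 1080p frame, which is why it demands noticeably more bandwidth and storage at comparable quality.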

Frame Rate Considerations

From the earliest days of film, aka "moving pictures," videos were a series of still images creating the illusion of motion. Frame rate measures how quickly those images, or frames, are displayed, usually in frames per second (FPS).

There is no ideal frame rate; instead, the appropriate frame rate depends on the subject and the intended quality or realism of the video. Different content types may require varying frame rates to achieve the desired visual effect.

Frequently Asked Questions

What are the most common uses for video encoding?

The most common use for video encoding is compressing data and reducing file sizes to make videos easier to store, share, and stream.

It improves streaming performance by lowering bandwidth requirements, leading to smoother playback with less buffering, even on limited connections. Encoding can adapt videos to different screen sizes by adjusting the resolution and aspect ratio, ensuring optimal display quality.

It can adjust audio formats and quality to meet regional or device requirements, enhancing overall compatibility.

What are the most popular video encoding formats?

The most popular video encoding formats include H.264, H.265, and VP9.

H.264 is popular because of its wide compatibility and predictable royalty costs.
H.265, an improved version of H.264, offers better compression while maintaining video quality and reduced bandwidth for streaming.
VP9 is a royalty-free codec developed by Google, which follows VP8 and competes with the HEVC standard.

Which is the best format?

There's no single "best" video format. The choice depends on the field of use, purpose, and the devices involved. For general use, H.264/AVC is often considered the best overall for balancing quality and compatibility. It works across most platforms and devices, making it a popular choice.

How long does encoding take?

Encoding is often the most time-consuming aspect of video processing. The duration depends on the video's length, file size, and the codec's complexity. A short video may take only a few seconds to encode, whereas long videos with large file sizes and advanced codecs could take several hours.

What is still to come in the field of video encoding?

The Multi-Codec Approach

The current landscape primarily relies on a few established codecs, such as AAC for audio and H.264 for video, which ensures broad compatibility. H.264 serves as a fallback for older devices, while VP9 lacks support on iOS, and AV1 is not yet ready for widespread production. This creates a scenario where relying on a single codec is insufficient, underscoring the need for a multi-codec strategy to accommodate various use cases and device requirements.
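A multi-codec strategy often comes down to a preference list with a fallback. In the sketch below, the preference order, the `choose_codec` helper, and the example device capabilities are all assumptions for illustration; a real service would detect device support at runtime.

```python
from typing import Iterable, List, Optional

# Assumed preference order: more efficient codecs first, H.264 as the fallback.
PREFERENCE: List[str] = ["AV1", "VP9", "H.265", "H.264"]


def choose_codec(device_supports: Iterable[str],
                 available: List[str] = PREFERENCE) -> Optional[str]:
    """Return the most preferred codec the device can play, or None if there is none."""
    supported = set(device_supports)
    for codec in available:
        if codec in supported:
            return codec
    return None


print(choose_codec({"H.264", "H.265"}))  # prints H.265
print(choose_codec({"H.264"}))           # prints H.264 (the fallback)
print(choose_codec({"VP9", "H.264"}))    # prints VP9
```

Serving several encodings of the same video costs more storage, but it lets each device receive the most efficient stream it can actually decode.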

Versatile Video Coding (VVC)

One of the most promising developments is Versatile Video Coding, a new standard established by MPEG. VVC aims to enhance encoding efficiency by up to 50% compared to High-Efficiency Video Coding (HEVC). This new standard supports a range of applications, including gaming, screen sharing, video chat, 360-degree video, and dynamic resolution switching.

Essential Video Coding (EVC)

Another noteworthy advancement is Essential Video Coding. EVC seeks to match the efficiency of HEVC while facilitating real-time encoding and decoding. The standard also targets offline encoding, making it suitable for both Video on Demand (VOD) and live Over-The-Top (OTT) streaming.

Additionally, EVC aims to be a "licensing-friendly" codec, further enhancing its appeal in an industry that increasingly values accessibility and flexibility.