Media Resilience in WebRTC
Media resilience in WebRTC refers to the system's ability to maintain communication quality and reliability even under challenging network conditions. This is usually achieved through specific algorithms that run before, during, and after the transmission of data packets. Ideally, no packets would be lost in the first place. Because that cannot be guaranteed, these algorithms aim both to reduce loss and to keep the user experience smooth when loss does occur.
Why Add Correction Algorithms?
Every WebRTC project has to expect and deal with packet loss. Since packet loss is inevitable, redundant information can be sent proactively via RED and FEC (covered later in this lesson). If packets are lost during transmission, they can be requested and sent again via RTX (also covered later). If neither method recovers the data, the loss can be made less obvious by concealing it: the missing frames are filled with interpolated data so that playback feels more natural. The rest of this lesson looks at each of these algorithms in turn.
Before Transmission: RED and FEC
Anticipating packet loss ahead of time, the RED and FEC methods focus on adding redundant information to the data stream to enable the receiver to reconstruct some part of the frame or the whole frame in lower quality.
A Refresher on RED
The idea behind the Redundant Audio Data (RED) approach is to carry more than one encoding of the same audio in the RTP stream: each packet holds the current (primary) frame plus redundant copies of earlier frames, which may be encoded at lower quality. If the packet carrying a frame's primary encoding is lost or corrupted during transmission, the receiver can use the redundant copy from a later packet to replace the missing or damaged portion, improving the overall quality and completeness of the audio stream.
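As a rough illustration, the sketch below packs and unpacks a RED payload using the block-header layout from RFC 2198 (a 1-bit F flag, 7-bit block payload type, 14-bit timestamp offset, and 10-bit block length, with a one-octet final header for the primary block). The function names and the payload type 111 are assumptions made for this example, not part of any WebRTC API.

```python
import struct


def build_red_payload(redundant_blocks, primary_pt, primary_data):
    """Pack redundant blocks followed by the primary block.

    redundant_blocks: list of (payload_type, timestamp_offset, data) tuples,
    oldest first. timestamp_offset says how far behind the primary a block is.
    """
    headers = b""
    bodies = b""
    for pt, ts_offset, data in redundant_blocks:
        if not (0 <= pt < 128 and 0 <= ts_offset < (1 << 14) and len(data) < (1 << 10)):
            raise ValueError("field out of range for a RED block header")
        # F=1 (1 bit) | block PT (7 bits) | timestamp offset (14 bits) | length (10 bits)
        word = (1 << 31) | (pt << 24) | (ts_offset << 10) | len(data)
        headers += struct.pack("!I", word)
        bodies += data
    # Final header: F=0 plus the primary payload type, in a single octet.
    headers += struct.pack("!B", primary_pt & 0x7F)
    return headers + bodies + primary_data


def parse_red_payload(payload):
    """Return a list of (payload_type, timestamp_offset, data); the primary is last."""
    specs = []
    pos = 0
    while payload[pos] & 0x80:  # F bit set: another 4-octet header follows
        (word,) = struct.unpack_from("!I", payload, pos)
        specs.append(((word >> 24) & 0x7F, (word >> 10) & 0x3FFF, word & 0x3FF))
        pos += 4
    primary_pt = payload[pos] & 0x7F
    pos += 1
    blocks = []
    for pt, ts_offset, length in specs:
        blocks.append((pt, ts_offset, payload[pos:pos + length]))
        pos += length
    blocks.append((primary_pt, 0, payload[pos:]))
    return blocks
```

A receiver that lost the packet carrying the older frame could call `parse_red_payload` on the next packet and play the redundant block in its place.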
Understanding FEC
Forward Error Correction (FEC) is a technique used in WebRTC to enhance the reliability of data transmission by adding redundant information. The purpose of FEC is to allow the receiver to correct errors in the received data without the need for retransmission. This is particularly useful in real-time communication scenarios where retransmitting data could introduce unacceptable delays. In this lesson, we will focus on Opus FEC, which is built into the Opus audio codec. For more information on the Opus codec, check out our previous lesson on codecs. More information about FEC in WebRTC can be found in RFC 8854.
Opus FEC helps mitigate the effects of packet loss during audio transmission. It works by including a low-bitrate encoding of the previous audio frame within the next audio packet. If a packet containing audio data is lost, the receiver can use the redundant information in the following packet to partially reconstruct the missing audio, reducing audio glitches and dropouts.
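The in-band idea can be sketched with a toy model. The code below does not call a real Opus encoder or decoder; frames are plain strings, and `make_packets` and `play_out` are hypothetical helpers that only show how a receiver might pick between the primary frame, the low-bitrate copy carried by the next packet, and concealment.

```python
def make_packets(frames):
    """Each packet carries its own frame plus a coarse copy of the previous frame."""
    packets = {}
    for seq, frame in enumerate(frames):
        previous_lowres = f"lowres({frames[seq - 1]})" if seq > 0 else None
        packets[seq] = (frame, previous_lowres)
    return packets


def play_out(received, total):
    """Rebuild the frame sequence, using the next packet's copy to fill single losses."""
    output = []
    for seq in range(total):
        if seq in received:
            output.append(received[seq][0])
        elif seq + 1 in received and received[seq + 1][1] is not None:
            output.append(received[seq + 1][1])  # degraded, but better than a gap
        else:
            output.append("concealed")  # nothing to recover from: fall back to PLC
    return output


packets = make_packets(["f0", "f1", "f2", "f3"])
del packets[1]  # simulate one lost packet
print(play_out(packets, 4))  # ['f0', 'lowres(f1)', 'f2', 'f3']
```

Note that this only helps with isolated losses: if two consecutive packets are lost, the first missing frame has no following packet to recover from. A real receiver also has to wait for the next packet to arrive before it can use the copy, which is one reason the jitter buffer matters.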
More generally, FEC schemes that send separate repair packets follow three steps:
- Redundancy Generation: The sender applies an FEC algorithm to add redundant information before sending media data packets. This might involve creating additional packets based on XOR calculations with the original data, employing Reed-Solomon codes, or other mathematical approaches.
- Packet Transmission: Both the original data packets and the generated FEC packets are sent across the network to the receiver.
- Recovery at the Receiver: Upon receiving the data stream, the receiver checks for missing or corrupted packets. If some packets are missing, the receiver can utilize the redundant information in the FEC packets to reconstruct the missing data and complete the media stream.
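The three steps above can be sketched with the simplest possible scheme: a single XOR parity packet protecting a small group of media packets. This is a minimal sketch under simplifying assumptions (equal-length payloads, at most one loss per group); the helper names are made up for the example, and real schemes also protect header fields such as the payload length.

```python
def xor_bytes(a, b):
    """XOR two byte strings, zero-padding the shorter one."""
    size = max(len(a), len(b))
    a = a.ljust(size, b"\x00")
    b = b.ljust(size, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets):
    """Redundancy generation: XOR all payloads in the group into one FEC packet."""
    parity = b""
    for payload in packets:
        parity = xor_bytes(parity, payload)
    return parity


def recover_missing(received, parity):
    """Recovery: rebuild the single missing payload (marked as None) from the rest."""
    if received.count(None) != 1:
        raise ValueError("one XOR parity packet can repair exactly one loss")
    recovered = parity
    for payload in received:
        if payload is not None:
            recovered = xor_bytes(recovered, payload)
    return recovered


group = [b"AAAA", b"BBBB", b"CCCC"]
fec_packet = make_parity(group)                       # sent alongside the group
print(recover_missing([b"AAAA", None, b"CCCC"], fec_packet))  # b'BBBB'
```

The trade-off is bandwidth versus protection: one parity packet per three media packets adds roughly a third more traffic and still cannot repair two losses in the same group.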
During Transmission: Retransmission via RTP (RTX)
Retransmission (RTX) is a mechanism to recover lost or damaged packets by sending them again. The RTX payload format is defined in RFC 4588. RTX works alongside FEC and other mechanisms to enhance the reliability of real-time communication. It is especially useful in scenarios where FEC does not adequately address packet loss, and retransmitting the missing packets is deemed more efficient than relying solely on error correction.
An RTX packet consists of the RTP header, followed by a two-octet Original Sequence Number (OSN) field, followed by the original RTP packet payload.
The OSN identifies the lost packet that the retransmitted packet replaces. Retransmission packets are carried in a separate RTX stream, which uses its own dynamic payload type. The RTX stream is linked to the original stream through an "apt" parameter in the SDP description, which specifies the associated payload type:
a=fmtp:<number> apt=<apt-value>;rtx-time=<rtx-time-val>
The apt (associated payload type) parameter maps the retransmission payload type to the original stream payload type. If multiple original payload types are used, multiple apt parameters must be included to map each one to a different retransmission payload type.
An optional payload-format-specific parameter, rtx-time, indicates the maximum time a sender will keep the original RTP packet in its buffers available for retransmission. This time starts with the initial transmission of the packet.
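To make the mapping concrete, here is a small sketch that reads the apt and rtx-time parameters out of an SDP description. The function name `parse_rtx_mappings`, the result layout, and the payload type numbers 96 and 97 in the sample are assumptions chosen for illustration.

```python
def parse_rtx_mappings(sdp):
    """Return {rtx_payload_type: {"apt": int, "rtx_time_ms": int or None}}."""
    rtx_pts = set()
    fmtp = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            pt, _, encoding = line[len("a=rtpmap:"):].partition(" ")
            if encoding.lower().startswith("rtx/"):
                rtx_pts.add(int(pt))  # this payload type carries retransmissions
        elif line.startswith("a=fmtp:"):
            pt, _, params = line[len("a=fmtp:"):].partition(" ")
            fmtp[int(pt)] = dict(
                item.strip().split("=", 1) for item in params.split(";") if "=" in item
            )
    mappings = {}
    for pt in sorted(rtx_pts):
        params = fmtp.get(pt, {})
        if "apt" in params:
            rtx_time = params.get("rtx-time")
            mappings[pt] = {
                "apt": int(params["apt"]),
                "rtx_time_ms": int(rtx_time) if rtx_time is not None else None,
            }
    return mappings


sample_sdp = """m=video 9 UDP/TLS/RTP/SAVPF 96 97
a=rtpmap:96 VP8/90000
a=rtpmap:97 rtx/90000
a=fmtp:97 apt=96;rtx-time=3000
"""
print(parse_rtx_mappings(sample_sdp))  # {97: {'apt': 96, 'rtx_time_ms': 3000}}
```

In this sample, payload type 97 is the retransmission stream for payload type 96, and the sender promises to keep packets available for 3000 ms.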
How RTX Works
- RTP Payload Format: As described in the previous section, RTX uses a special payload format for carrying retransmitted packets. This format includes the original packet's sequence number (the OSN), allowing the receiver to identify the packet and put it back in the right place in the media stream.
- Negative Acknowledgments (NACKs): When the receiver detects a missing packet, it sends a NACK to the sender. This NACK informs the sender about the specific packet that needs to be retransmitted.
- Retransmission: Upon receiving a NACK, the sender transmits the requested packet again using the RTX payload format, provided the packet is still in its retransmission buffer. Senders may give retransmitted packets higher priority than regular media packets to help them reach the receiver as quickly as possible.
- Recovery at the Receiver: When the receiver receives the retransmitted packet, it can fill the gap in the media stream. If the packet arrives before its playout deadline, playback continues without any noticeable interruption.
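The exchange above can be sketched as a small simulation. This is not a real RTP stack: the `RtxSender` class, `find_gaps`, and `unwrap_rtx` are hypothetical names, packets are reduced to a sequence number and payload, and sequence-number wraparound is ignored. The RTX payload is modelled as the two-octet OSN followed by the original payload.

```python
import struct


class RtxSender:
    def __init__(self, rtx_time_ms):
        self.rtx_time_ms = rtx_time_ms
        self.history = {}  # original seq -> (send time in ms, payload)
        self.rtx_seq = 0   # the RTX stream has its own sequence numbers

    def send(self, seq, payload, now_ms):
        """Remember every media packet so it can be retransmitted later."""
        self.history[seq] = (now_ms, payload)
        return seq, payload

    def on_nack(self, lost_seqs, now_ms):
        """Return (rtx_seq, rtx_payload) for each requested packet still buffered."""
        out = []
        for seq in lost_seqs:
            entry = self.history.get(seq)
            if entry is None or now_ms - entry[0] > self.rtx_time_ms:
                continue  # too old: no longer available for retransmission
            out.append((self.rtx_seq, struct.pack("!H", seq) + entry[1]))
            self.rtx_seq = (self.rtx_seq + 1) & 0xFFFF
        return out


def find_gaps(received_seqs):
    """Receiver side: sequence numbers missing between the lowest and highest seen."""
    seen = set(received_seqs)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


def unwrap_rtx(rtx_payload):
    """Receiver side: split an RTX payload into (original seq, original payload)."""
    (osn,) = struct.unpack_from("!H", rtx_payload, 0)
    return osn, rtx_payload[2:]


sender = RtxSender(rtx_time_ms=1000)
for seq in range(100, 105):
    sender.send(seq, f"frame-{seq}".encode(), now_ms=seq)

missing = find_gaps([100, 101, 103, 104])       # the receiver never saw 102
rtx_packets = sender.on_nack(missing, now_ms=200)
print(unwrap_rtx(rtx_packets[0][1]))            # (102, b'frame-102')
```

If the NACK arrives after the rtx-time window has passed, `on_nack` returns nothing and the receiver has to fall back to concealment.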
After Transmission: PLC
Packet Loss Concealment (PLC) is a technique used in WebRTC to mitigate the impact of packet loss on audio and video quality by concealing the effects of lost packets.
When data packets are transmitted over a network, some packets may be lost or arrive out of order due to network congestion, latency, or other issues. Losing packets can lead to glitches, artifacts, or disruptions in the media stream.
When a packet is not received, PLC attempts to reconstruct the missing or lost audio or video information. While it cannot replace the exact content of a lost packet, it can generate synthetic data to smooth out the playback and make the impact less noticeable to the end user.
Video PLC
Video PLC (VPLC) aims to mitigate the effects of dropped video packets during a call.
In general, Video PLC works as follows:
- Missing Packets Detected: The decoder continuously monitors the incoming video data stream. If it identifies missing packets due to network hiccups, VPLC kicks in.
- Filling the Gaps: Instead of leaving blank spaces or glitches where the missing packets should be, VPLC attempts to reconstruct the missing information. This can involve various strategies:
  - Temporal Concealment: This uses information from the previous and/or next video frames to estimate the missing parts. It's like predicting what someone is saying based on the context of the conversation.
  - Spatial Concealment: This analyses nearby pixels within the same frame to infer the missing values. It's like filling in the blanks in a picture based on the surrounding colors and patterns.
  - Motion Compensated Interpolation: This technique uses knowledge of video motion to predict how existing pixels should move and fill in the gaps accordingly. It is similar to animating the missing pieces based on how the scene moved before the packet loss.
- Smoother Viewing: By replacing missing information with estimated data, VPLC aims to maintain a smooth and continuous video playback experience for the viewer. While the reconstructed portions might not be perfect replicas of the original data, they prevent noticeable disruptions and improve the overall quality of the video call.
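Two of the strategies above, temporal and spatial concealment, can be sketched on tiny grayscale frames. This is a toy model, not how any particular decoder implements concealment: frames are lists of rows, a lost pixel is marked as None, and the function names are made up for the example. Real decoders work on blocks and motion vectors rather than single pixels.

```python
def conceal_temporal(current, previous):
    """Replace each missing pixel with the co-located pixel of the previous frame."""
    return [
        [prev if cur is None else cur for cur, prev in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]


def conceal_spatial(frame):
    """Replace each missing pixel with the average of its available neighbours."""
    height, width = len(frame), len(frame[0])
    out = [row[:] for row in frame]
    for y in range(height):
        for x in range(width):
            if frame[y][x] is None:
                neighbours = [
                    frame[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < height and 0 <= nx < width and frame[ny][nx] is not None
                ]
                if neighbours:
                    out[y][x] = sum(neighbours) // len(neighbours)
    return out


previous = [[10, 10, 10], [10, 10, 10]]
current = [[20, None, 20], [20, 20, 20]]   # one pixel lost
print(conceal_temporal(current, previous))  # [[20, 10, 20], [20, 20, 20]]
print(conceal_spatial(current))             # [[20, 20, 20], [20, 20, 20]]
```

In this example the scene changed between frames, so the spatial estimate blends in better than the stale pixel from the previous frame. For static scenes the temporal copy is usually the safer choice.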
Audio PLC
Audio packet loss concealment (PLC) is a technique used to minimize the impact of lost packets on the perceived audio quality. It attempts to mask the gaps left by missing data and create a smoother listening experience.
There are various ways to mask packet loss in an audio stream. Some of these are:
- Silence insertion: This is the simplest approach, where the gap is filled with silence. It is computationally cheap, but it can create a noticeable audio glitch.
- Waveform substitution: Replaces the missing audio with a copy of the previously received packet or a nearby segment. This works better for short losses but can sound bad for longer gaps.
- Statistical methods: Analyse surrounding audio to estimate the missing content based on statistical properties of speech or music signals. This offers better quality but requires more complex processing.
- Machine learning-based methods: Utilise AI algorithms trained on real-world audio to generate more realistic concealment, potentially even synthesizing speech based on surrounding audio. This is a relatively new approach with promising results but is still under development. An example of this is Google’s generative model named WaveNetEQ.
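The first two approaches in the list can be sketched in a few lines. This is a simplified model assuming integer PCM samples and fixed-size frames; `conceal_audio` and its `fade` parameter are hypothetical names, and the fade factor is an arbitrary choice made so that repeated frames decay instead of buzzing during long gaps.

```python
def conceal_audio(frames, frame_size, fade=0.5):
    """Flatten frames into samples, concealing lost frames (marked as None).

    A lost frame is replaced by an attenuated copy of the last good frame
    (waveform substitution). If no good frame has been seen yet, silence is
    inserted instead.
    """
    output = []
    last_good = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            last_good = frame
            gain = 1.0
            output.extend(frame)
        elif last_good is not None:
            gain *= fade  # each consecutive loss gets quieter
            output.extend(int(sample * gain) for sample in last_good)
        else:
            output.extend([0] * frame_size)  # silence insertion
    return output


frames = [[100, -100], None, None, [50, 50]]
print(conceal_audio(frames, frame_size=2))
# [100, -100, 50, -50, 25, -25, 50, 50]
```

Repeating a frame unchanged tends to sound robotic after a few losses, which is why the copy is faded out here. The statistical and machine learning methods in the list aim to do better than this kind of simple repetition.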
As discussed in a previous lesson on Opus DTX, the SILK and CELT layers of Opus deal with silent audio by generating comfort noise and by handling the transitions between silent periods and frames with audio. The same techniques can help handle packet loss in an audio stream.
Conclusion
In this lesson, you explored various ways to manage packet loss in a WebRTC call. While there is no silver bullet for packet loss, combining these techniques can lead to the best possible experience for the users of your product.