Redundant Audio Data (RED)

Before looking at RED (REDundant Audio Data), we need to understand the protocol it builds on: the Real-time Transport Protocol, or RTP.

What is RTP?

The Real-time Transport Protocol, or RTP, is a network protocol designed for the real-time transmission of audio and video over IP networks. It typically runs on top of UDP and is commonly used in applications such as voice over IP (VoIP), video conferencing, and streaming media. RTP provides a framework for delivering real-time data, with mechanisms for timestamping, sequence numbering, and payload identification. It works in conjunction with the RTP Control Protocol (RTCP), which monitors the quality of data delivery and provides feedback on network conditions. RTP does not guarantee timely delivery or quality of service, but its timestamps and sequence numbers let the receiver synchronize, reorder, and reconstruct the media.

Understanding RED

The idea behind RED is to transmit the same audio information more than once within the RTP stream: each packet carries its primary audio payload plus a redundant, often lower-quality, encoding of earlier audio. If a packet is lost during transmission, the receiver can use the redundant information from a later packet to fill in the missing portion, improving the overall quality and completeness of the audio stream.

RED is not a core feature of RTP but an optional payload format that improves audio quality, particularly in unreliable network conditions. It was specified in RFC 2198, published in 1997. RED is not enabled by default, as it consumes extra bandwidth. It is typically used when high audio quality is crucial or when network conditions are unpredictable.

Since RED is not part of core RTP, there needs to be a way to communicate that a dynamic payload type in the RTP packet contains redundant data. This is done in the SDP (Session Description Protocol), where RED is added as an additional payload type. Within each packet, a RED header placed before the payloads describes the redundant blocks and the primary payload. In the format defined by RFC 2198, this header takes 4 bytes per redundant block plus 1 byte for the primary, so 5 bytes when there is a single redundant block.

The rtpmap attribute in the SDP maps a payload type number to a codec name, a clock rate, and optionally the number of channels. For example, the following maps dynamic payload type 121 to RED:

m=audio 12345 RTP/AVP 121 0 5
a=rtpmap:121 red/8000/1
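
To make the mapping concrete, here is a minimal Python sketch that splits an rtpmap line into its parts. The function name parse_rtpmap and the returned dictionary keys are hypothetical choices for this illustration and do not come from any particular SDP library.

```python
import re

def parse_rtpmap(line):
    """Parse an SDP 'a=rtpmap:' line into payload type, encoding, rate, channels."""
    # Shape: a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
    match = re.fullmatch(r"a=rtpmap:(\d+) ([^/\s]+)/(\d+)(?:/(\d+))?", line.strip())
    if match is None:
        raise ValueError(f"not an rtpmap line: {line!r}")
    payload_type, encoding, clock_rate, channels = match.groups()
    return {
        "payload_type": int(payload_type),
        "encoding": encoding,
        "clock_rate": int(clock_rate),
        # The channel count is optional in SDP; assume 1 when it is omitted.
        "channels": int(channels) if channels else 1,
    }

info = parse_rtpmap("a=rtpmap:121 red/8000/1")
print(info)
```

A receiver that sees payload type 121 in an RTP packet would then know to interpret the payload as RED rather than as a plain codec frame.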

For more information on the Session Description Protocol, check out our previous lesson on SDP.

Packet Structure in RTP + RED

The following is a diagram of an RTP packet containing redundant audio data:

Key Features of the RTP Packet Structure:

  • Sequencing: Sequence numbers enable reordering and error detection, ensuring correct playback.
  • Timestamps: Facilitate synchronization of different media streams for a seamless experience.
  • Payload Types: Offer flexibility by supporting multiple media formats within a single session.
  • Header Extensions: Allow adapting to specific applications and network requirements.

The RTP (Real-time Transport Protocol) packet structure plays a crucial role in delivering real-time media data like audio and video efficiently and reliably. It has a well-defined format consisting of three main parts:

1. Fixed Header (12 bytes minimum):

  • Version (2 bits): Identifies the RTP version (currently 2).
  • Padding (1 bit): Indicates whether additional padding bytes are appended at the end of the packet, for example to reach a fixed block size required by an encryption algorithm.
  • Extension (1 bit): Signals if optional header extensions are present, providing additional information.
  • CSRC count (4 bits): Gives the number of contributing source (CSRC) identifiers that follow the fixed header (0 for a stream from a single source, non-zero when a mixer has combined several sources).
  • Marker (1 bit): Flags significant events in the stream, such as the boundary of a media unit. Its exact meaning is defined by the RTP profile in use.
  • Payload Type (7 bits): Identifies the format of the media data in the payload (e.g., which audio or video codec). Formats have either statically assigned or dynamically negotiated payload types for standardized interpretation.
  • Sequence Number (16 bits): Identifies a packet's position within a media stream, enabling packet loss detection and reordering. It is incremented by one for each packet sent.
  • Timestamp (32 bits): Provides the temporal location of the media data within the stream, crucial for synchronized audio and video playback.
  • SSRC (32 bits): The Synchronization Source identifier uniquely identifies the source of the media stream within an RTP session, allowing the receiver to tell apart streams from different sources.
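
The field layout above can be illustrated with a short Python sketch that unpacks the 12-byte fixed header. The function name parse_rtp_fixed_header and the sample packet values are made up for this example; the bit positions follow the field sizes listed above.

```python
import struct

def parse_rtp_fixed_header(packet):
    """Unpack the 12-byte RTP fixed header into a dictionary of fields."""
    if len(packet) < 12:
        raise ValueError("an RTP packet needs at least 12 header bytes")
    # Network byte order: two single bytes, a 16-bit and two 32-bit integers.
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,              # top 2 bits
        "padding": bool(byte0 & 0x20),      # 1 bit
        "extension": bool(byte0 & 0x10),    # 1 bit
        "csrc_count": byte0 & 0x0F,         # low 4 bits
        "marker": bool(byte1 & 0x80),       # top bit of the second byte
        "payload_type": byte1 & 0x7F,       # low 7 bits
        "sequence_number": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# Illustrative header: version 2, payload type 121, arbitrary sequence/timestamp/SSRC.
sample = struct.pack("!BBHII", 0x80, 121, 1, 160, 0x1234ABCD)
header = parse_rtp_fixed_header(sample)
print(header)
```

Any CSRC identifiers, header extensions, and the payload itself follow these 12 bytes and are not handled by this sketch.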

2. Optional Header Extensions (variable size):

  • RTP allows the header to be extended with additional per-packet metadata, such as the audio level of the packet. These extensions are negotiated during session setup and depend on specific application needs.

3. Payload (variable size):

  • This section carries the actual media data encoded in the format indicated by the payload type. For instance, an audio payload might contain raw audio samples, while a video payload could hold compressed video frames.
  • In this case, we use RED to transmit two encodings of audio data: a DVI4 primary and a single block of redundancy encoded using 8 kHz LPC (both 20 ms packets), signaled through the m=audio and a=rtpmap lines of the SDP.
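
As a rough illustration of how such a payload could be laid out, the Python sketch below packs one redundant block and one primary block using the RFC 2198 header layout: each redundant block gets a 4-byte header (a flag bit, a 7-bit payload type, a 14-bit timestamp offset, and a 10-bit length), and the primary gets a 1-byte header. The function name build_red_payload is hypothetical, the byte strings are placeholders rather than real encoded audio, and payload types 5 (DVI4) and 7 (LPC) are the static assignments from the RTP audio/video profile.

```python
def build_red_payload(redundant, primary):
    """Pack redundant blocks and a primary block into one RED payload.

    redundant: list of (payload_type, timestamp_offset, data), oldest first.
    primary:   (payload_type, data).
    """
    header = bytearray()
    body = bytearray()
    for payload_type, ts_offset, data in redundant:
        if not (0 <= payload_type < 128 and 0 <= ts_offset < 2**14 and len(data) < 2**10):
            raise ValueError("field out of range for a RED block header")
        # F=1 | 7-bit payload type | 14-bit timestamp offset | 10-bit block length
        word = (1 << 31) | (payload_type << 24) | (ts_offset << 10) | len(data)
        header += word.to_bytes(4, "big")
        body += data
    payload_type, data = primary
    if not 0 <= payload_type < 128:
        raise ValueError("payload type must fit in 7 bits")
    header.append(payload_type)  # F=0 marks the last (primary) header
    body += data
    return bytes(header + body)

# 20 ms at 8 kHz is 160 timestamp units, so the redundant block is 160 units older.
payload = build_red_payload([(7, 160, b"LPC")], (5, b"DVI4!"))
print(payload.hex())
```

With a single redundant block, the RED header occupies the first 5 bytes of the payload, followed by the redundant data and then the primary data.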

How RED Works

  1. Encoding:
    • Each audio frame is encoded using a primary codec such as Opus.
    • One or more redundant encodings of the frame are also produced, often at a lower quality or bitrate.
    • These redundant encodings are usually smaller than the primary encoding.
  2. Packaging:
    • Each RTP packet bundles the primary encoding of the current frame with redundant encodings of one or more earlier frames, using the RED format.
    • The RED header within the packet gives, for each redundant block, its payload type, its timestamp offset from the primary, and its length.
  3. Transmission:
    • The RTP packet with the RED payload is sent over the network.
  4. Reception:
    • The receiver decodes the primary block of each packet that arrives.
    • If a packet is lost, the receiver recovers the missing frame from a redundant block carried in a later packet.
    • Redundant blocks for frames that have already been received are discarded.
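
The following Python sketch simulates these steps at the frame level, assuming the RFC 2198 arrangement in which the redundant block in a packet describes an earlier frame. The names make_red_stream and receive, the dictionary-based packet representation, and the frame labels are all hypothetical. For simplicity, the redundant copy here is identical to the primary rather than a lower-quality encoding.

```python
def make_red_stream(frames, distance=1):
    """Pair each primary frame with copies of the `distance` frames before it."""
    packets = []
    for index, frame in enumerate(frames):
        redundant = {j: frames[j] for j in range(max(0, index - distance), index)}
        packets.append({"seq": index, "primary": frame, "redundant": redundant})
    return packets

def receive(packets, lost):
    """Rebuild frames from the packets that arrive, filling gaps from redundancy."""
    recovered = {}
    for packet in packets:
        if packet["seq"] in lost:
            continue  # this packet never reached the receiver
        recovered[packet["seq"]] = packet["primary"]
        for seq, frame in packet["redundant"].items():
            # Only use a redundant block if that frame is still missing.
            recovered.setdefault(seq, frame)
    return recovered

frames = ["f0", "f1", "f2", "f3", "f4"]
packets = make_red_stream(frames)

single_loss = receive(packets, lost={2})     # frame 2 comes back via packet 3
burst_loss = receive(packets, lost={2, 3})   # frame 2 is gone, frame 3 comes back via packet 4
print(sorted(single_loss), sorted(burst_loss))
```

With one level of redundancy, an isolated loss is fully repaired, but a burst of two consecutive losses still leaves a gap. Raising distance trades more bandwidth for protection against longer bursts.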

Implementing RED with SFUs

In general, if all clients connected to an SFU (Selective Forwarding Unit), with or without cascading, support RED, there is no need to do anything special: streams can be forwarded from one client to another without changing any characteristics. However, if an SFU can be connected to both RED and non-RED clients, several cases must be handled.

  • RED sender to RED receiver: Since both the sender and receiver can encode and decode RED streams, this is an easy process, and streams can be forwarded as they are.

  • Non-RED sender to Non-RED receiver: Neither the sender nor the receiver expects to transmit a RED stream. Hence, streams can again be forwarded as they are.

  • Non-RED sender to RED receiver: There are several ways to handle this scenario. One common approach is to buffer audio packets from the sender on the SFU side and combine them to create RED packets for the receiver.

  • RED sender to non-RED receiver: The SFU between the sender and receiver must produce RTP packets that a non-RED receiver can understand. The main way to do this is to strip the redundant blocks and forward only the primary payload. This is straightforward if every RTP packet carries the same number of redundant blocks. However, the sender might add redundant blocks only when voice is detected, so the SFU has to parse the RED header of each packet rather than assume a fixed layout.
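
As a sketch of the RED-to-non-RED case, the Python function below walks the RED header of a payload and returns only the primary block, however many redundant blocks precede it. The name extract_primary is hypothetical, the header layout assumed is the one from RFC 2198, and the sample bytes are placeholders rather than real audio. A real SFU would also need to rewrite the payload type in the RTP header, which is not shown here.

```python
def extract_primary(red_payload):
    """Return (payload_type, data) of the primary block in a RED payload."""
    pos = 0
    redundant_bytes = 0
    while True:
        if pos >= len(red_payload):
            raise ValueError("truncated RED header")
        first = red_payload[pos]
        if first & 0x80 == 0:
            # F=0: this 1-byte header describes the primary block and ends the header.
            primary_type = first & 0x7F
            pos += 1
            break
        if pos + 4 > len(red_payload):
            raise ValueError("truncated RED block header")
        # The block length is the low 10 bits of the 4-byte block header.
        redundant_bytes += ((red_payload[pos + 2] & 0x03) << 8) | red_payload[pos + 3]
        pos += 4
    start = pos + redundant_bytes
    if start > len(red_payload):
        raise ValueError("redundant blocks exceed payload size")
    return primary_type, red_payload[start:]

# One redundant block (type 7, 3 bytes) followed by the primary (type 5).
with_redundancy = bytes([0x87, 0x02, 0x80, 0x03, 0x05]) + b"LPC" + b"DVI4!"
# No redundant block at all, e.g. a packet sent during silence.
without_redundancy = bytes([0x05]) + b"DVI4!"

print(extract_primary(with_redundancy), extract_primary(without_redundancy))
```

Because the loop reads the header block by block, packets with zero, one, or several redundant blocks are all handled by the same code path.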

Advantages of RED

  • Improved audio quality: Redundancy helps mitigate the effects of packet loss, leading to clearer and more consistent audio.
  • Reduced artifacts: Audio artifacts like clicks, pops, and distortions are less noticeable when RED is used.
  • Enhanced user experience: Listeners experience smoother and more reliable audio, even in challenging network conditions.

Disadvantages of RED

  • Increased Bandwidth Consumption: Sending redundant copies of audio data consumes more bandwidth, which can be a limitation in scenarios with limited network capacity.
  • Potential Latency Impact: Processing and transmitting additional audio data adds to the overall processing load, and recovering a lost frame means waiting for a later packet to arrive, both of which can increase latency.

Conclusion

In this lesson, you learned about RTP, the Real-time Transport Protocol for audio and video, and RED, an RTP payload format that adds redundancy so that audio remains reliable in bad network conditions.