WebRTC For The Brave

A deep dive into SDP structure and munging

As discussed in module 1, Working With SDP Messages, SDP (Session Description Protocol) is a fundamental part of real-time communication systems. SDP is key in negotiating sessions between peers, outlining essential details such as codecs, media types, and network configurations. In this lesson, we will delve deeply into the structure of SDP and explore the fascinating process known as SDP munging.

Understanding SDP Structure

SDP is a text-based protocol defined by the IETF (Internet Engineering Task Force) in RFC 4566 that offers a standardized format for endpoints to describe multimedia sessions. It is pivotal for session announcement, session invitation, and other forms of multimedia session initiation.

Let's break down the essential parts of an SDP:

1. Session Description

If you've attempted to debug your WebRTC application in Chrome, you may have encountered the session description, resembling the text below, through the chrome://webrtc-internals/ tool.
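For reference, a heavily trimmed session description might look like the following (the values are illustrative; real descriptions captured from webrtc-internals are much longer):

```
v=0
o=- 4611731400430051336 2 IN IP4 127.0.0.1
s=-
t=0 0
m=audio 9 UDP/TLS/RTP/SAVPF 111
c=IN IP4 0.0.0.0
a=rtpmap:111 opus/48000/2
```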

The session description comprehensively encapsulates key details of the multimedia session. It encompasses information such as the session name, timing information, and essential connection details. Each character encapsulated within the session description holds specific information, as illustrated below:

  • v: Protocol version.
  • o: Originator of the session (username, session ID, session version, network type, address type, and address).
  • s: Session name.
  • i: Session information.
  • u: URI of description.
  • c: Connection information (network type, address type, and connection address).
  • t: Timing information (start time and stop time).
  • m: Media name and transport address (media description).
  • a: Zero or more media attribute lines.

This might resemble something out of cryptography, right? Now, you can better appreciate the aptly named Session Description Protocol, which provides a standardized format for endpoints in WebRTC.

2. Understanding Media Descriptions in Session Descriptions

A session description can include multiple media descriptions. Each media description begins with an m (media) field of the form m=<media> <port> <proto> <fmt> ..., followed by attribute lines that refine it.

Let's dissect each sub-field of the media description in a session description:

  • media: This field specifies the type of media and can be designated as “audio”, “video”, “text”, “application”, or “message”.
  • port: Indicates the transport port to which the media stream will be sent.
  • proto: Represents the transport protocol. Its interpretation depends on the address type given in the corresponding c (connection information) field. Common values include “UDP”, “RTP/AVP”, and “RTP/SAVP”.
  • fmt: Details the media format. When the proto sub-field is “RTP/AVP” or “RTP/SAVP”, the fmt sub-fields contain RTP (Real-time Transport Protocol) payload type numbers.

Therefore, in a real-world example, it would appear as the text outlined below:
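For example, RFC 4566 itself illustrates media descriptions along these lines, where payload type 0 is the static PCMU audio format and 99 is dynamically mapped to H.263 via an a=rtpmap attribute:

```
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 99
a=rtpmap:99 h263-1998/90000
```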

For a deeper understanding of each format type, you can refer to the RFC 4566 documentation, which offers detailed insights.

Munging in SDP

At times, it becomes necessary to intentionally alter SDP messages to accomplish specific results that aren't directly facilitated by standard WebRTC APIs. This is where manually tweaking SDP messages, a practice commonly known as SDP munging, comes in.

Munging involves manually manipulating or tweaking SDP parameters to suit various network conditions, codec preferences, or other changing factors. This adaptation process is crucial for ensuring optimal communication between peers in WebRTC.
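In practice, munging happens between creating a description and applying it: the SDP string returned by createOffer() is edited before being passed to setLocalDescription(). Here is a minimal sketch of that flow; mungeSdp is a hypothetical placeholder for any transformation, and the PeerConnectionLike type is our own structural stand-in for the browser's RTCPeerConnection:

```typescript
// Structural type so the sketch is not tied to the browser at runtime;
// a real RTCPeerConnection is compatible with it in practice.
interface PeerConnectionLike {
  createOffer(): Promise<{ type: string; sdp?: string }>;
  setLocalDescription(desc: { type: string; sdp: string }): Promise<void>;
}

// Identity placeholder: substitute codec reordering, DTX toggling, etc.
const mungeSdp = (sdp: string): string => sdp;

async function negotiate(pc: PeerConnectionLike): Promise<void> {
  const offer = await pc.createOffer();
  // Munge the offer's SDP before it is applied or sent to the remote peer.
  await pc.setLocalDescription({
    type: offer.type,
    sdp: mungeSdp(offer.sdp ?? ''),
  });
}
```

The same pattern applies to answers via createAnswer(): the munging must happen before setLocalDescription() is called, because the applied description is what gets negotiated.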

However, when munging SDP messages, there are several important points to keep in mind:

  • Risks and Unexpected Behaviors: While SDP munging provides flexibility, it also carries certain risks. Incorrect munging can result in issues such as incompatibilities, session failures, and unexpected behaviors across different browsers or environments. These issues can lead to deviations from the standard WebRTC behavior, potentially causing interoperability problems.
  • Future Compatibility: Munging is effective only if it aligns with the specific version and protocol you are targeting. However, as WebRTC standards continue to evolve, munged SDP may not stay compatible with newer versions or updates. This incompatibility can lead to a higher cost and effort for migrating to newer versions.
  • Complexity and Maintenance: Munging can increase the complexity of WebRTC applications, making it challenging to trace modifications in the SDP and complicating troubleshooting efforts. Consequently, this can lead to difficulty maintaining your WebRTC application as identifying and resolving issues becomes more intricate.

SDP munging provides WebRTC communication flexibility but adds complexity to your application. It can lead to unexpected behaviors and make maintenance challenging. Therefore, it's crucial to have well-documented munging processes with clear reasons to ensure alignment with your team or for future reference. This approach is essential for minimizing confusion and streamlining future updates or troubleshooting.

SDP Munging in Real-World Scenarios

To see practical use cases, let's turn to the Stream Video SDK, which provides global edge network solutions for implementing video/audio calls and live streams seamlessly. Specifically, the stream-video-js SDK leverages SDP munging. Let's delve into how this technique is applied in real-world scenarios.

In the sdp-munging.ts file, SDP munging is employed for various configurations such as setting up preferred codecs, removing unwanted codecs, toggling DTX, and enhancing audio quality. Let's examine these use cases one by one.

1. Setting Up Preferred Codec

When both peers in a communication session support multiple codecs, the arrangement of these codecs in the SDP message plays a crucial role in determining the codec that will be ultimately used. Prioritizing a preferred codec by placing it at the top of the list in the SDP increases the chances of it being selected. To achieve this, Stream SDK employs SDP munging, a technique to reorder codecs, thereby giving higher priority to preferred ones, as demonstrated in the following code example:
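The SDK's actual helper lives in sdp-munging.ts; the sketch below is our own simplified reconstruction of the idea (the preferCodec name and its internals are illustrative, not the SDK's exact code):

```typescript
// Illustrative sketch of preferred-codec reordering via SDP munging;
// preferCodec and its internals are our own, not the SDK's source.
const preferCodec = (sdp: string, codec: string): string => {
  // Find the payload type advertised for the codec via its a=rtpmap line.
  const rtpmap = sdp.match(new RegExp(`a=rtpmap:(\\d+) ${codec}/`, 'i'));
  if (!rtpmap) return sdp; // codec not offered: leave the SDP untouched
  const payload = rtpmap[1];
  // Move that payload type to the front of the m=video line.
  return sdp
    .split('\r\n') // SDP lines are CRLF-delimited (RFC 4566)
    .map((line) => {
      if (!line.startsWith('m=video')) return line;
      const parts = line.split(' ');
      const header = parts.slice(0, 3); // "m=video", port, proto
      const fmts = parts.slice(3).filter((pt) => pt !== payload);
      return [...header, payload, ...fmts].join(' ');
    })
    .join('\r\n');
};
```

Calling preferCodec(sdp, 'VP8') moves VP8's payload type to the front of the m=video line while leaving every other line untouched.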

For instance, if we aim to prioritize the VP8 codec, we can modify the media description line in the Session Description Protocol (SDP). Consider the original line:
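Using the payload ordering from the step-by-step walkthrough later in this section, such an original line might read:

```
m=video 9 UDP/TLS/RTP/SAVPF 100 101 96 97 35 36 102 125 127
```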

To prioritize the VP8 codec, represented by payload type 96, we rearrange the order of the payload types so that 96 comes first:
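After the swap, the same line reads:

```
m=video 9 UDP/TLS/RTP/SAVPF 96 100 101 97 35 36 102 125 127
```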

This alteration in the SDP line effectively prioritizes the VP8 codec for video transmission.

Now, let's dissect the process step by step, particularly if you wish to prioritize the VP8 codec in your WebRTC session:

  1. Identify the Video Media Specification: First, locate the video media specification in the SDP, which in this example is m=video 9 UDP/TLS/RTP/SAVPF 100 101 96 97 35 36 102 125 127. As discussed previously, the m field follows the structure m=<media> <port> <proto> <fmt> .... The numbers listed are payload type identifiers and correspond to specific codecs or formats used for the video stream. For instance, the identifier 96 might represent the VP8 codec, while 97 could be for VP9.
  2. Find the Specified Codec (VP8): Search within the m field for the entry related to VP8, which could look like a=rtpmap:96 VP8/90000.
  3. Extract the Identifier for VP8: From the above step, identify “96” as the payload type number representing VP8.
  4. Reorder the m Field: Move the identifier 96 to the beginning of the list in the m field. This prioritizes the VP8 codec over others.
  5. Updated Media Specification: After reordering, the media specification line would now appear as m=video 9 UDP/TLS/RTP/SAVPF 96 100 101 97 35 36 102 125 127. This new order signals that VP8 is the preferred codec for the video stream.

By following these steps, you effectively prioritize the VP8 codec in your WebRTC session, which can influence the codec selection during the negotiation process between peers, potentially leading to improved compatibility, performance, or other desired outcomes.

2. Toggling DTX

In WebRTC, DTX (Discontinuous Transmission) refers to a technique employed by audio codecs to lower bitrates during periods of silence or inactivity. Rather than sending a continuous audio stream when no one is speaking, DTX lets the system either stop sending data or transmit packets at significantly reduced rates, conserving bandwidth and minimizing network load.

Toggling DTX is useful in a variety of scenarios, particularly for optimizing network resource utilization and improving overall communication quality. To facilitate this, Stream SDK uses SDP munging to enable or disable DTX as the requirements of the session dictate, as demonstrated in the following code example:
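Since the SDK's exact source isn't reproduced in this lesson, here is a simplified reconstruction consistent with the step-by-step description that follows; the helper names (toggleDtx, getOpusFmtp) mirror those discussed below, but the bodies, particularly getOpusFmtp, are our own approximation:

```typescript
// Simplified reconstruction of Opus DTX toggling via SDP munging.
// Illustrative only, not the SDK's exact implementation.

// Find the a=fmtp: line carrying the Opus codec's format parameters.
const getOpusFmtp = (sdp: string): string | undefined => {
  const rtpmap = sdp.match(/a=rtpmap:(\d+) opus\/48000/i);
  if (!rtpmap) return undefined;
  const fmtp = sdp.match(new RegExp(`a=fmtp:${rtpmap[1]} .*`));
  return fmtp ? fmtp[0] : undefined;
};

const toggleDtx = (sdp: string, enable: boolean): string => {
  const opusFmtp = getOpusFmtp(sdp);
  // No Opus configuration found: return the SDP unchanged.
  if (!opusFmtp) return sdp;
  const requiredDtxConfig = `usedtx=${enable ? 1 : 0}`;
  const matchDtx = opusFmtp.match(/usedtx=(\d)/);
  // Rewrite the existing usedtx parameter, or append it to the fmtp line.
  const newFmtp = matchDtx
    ? opusFmtp.replace(/usedtx=\d/, requiredDtxConfig)
    : `${opusFmtp};${requiredDtxConfig}`;
  return sdp.replace(opusFmtp, newFmtp);
};
```

For instance, given an fmtp line of a=fmtp:111 minptime=10;useinbandfec=1, calling toggleDtx(sdp, true) appends ;usedtx=1 to it.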

The provided code snippet demonstrates how to toggle Discontinuous Transmission (DTX) in WebRTC by modifying the Session Description Protocol (SDP). Here's a breakdown of the process in five steps:

  1. Function Definition: The function toggleDtx is defined, which takes two parameters: sdp, a string representing the SDP, and enable, a boolean indicating whether DTX should be enabled or disabled.
  2. Retrieving Opus Codec Configuration: The function calls getOpusFmtp(sdp) to retrieve the FMTP (Format Parameters) line for the Opus codec from the SDP. This line contains specific configurations for the codec, including whether DTX is used.
  3. Checking and Modifying DTX Setting:
    • The code checks if the Opus codec configuration (opusFmtp) exists.
    • It then searches for the usedtx parameter within this configuration using a regular expression (/usedtx=(\d)/). This parameter indicates the current state of DTX (enabled or disabled).
    • Depending on whether DTX is already configured (matchDtx), the code either modifies the existing usedtx setting or appends it to the FMTP line.
  4. Updating the SDP:
    • The function constructs a new FMTP line (newFmtp) with the updated DTX setting (requiredDtxConfig).
    • The SDP is then updated by replacing the original FMTP line with the new one.
  5. Returning the Modified SDP: The function returns the modified SDP string with the updated DTX setting. If the Opus codec configuration was not found in the original SDP, it returns the SDP unchanged.

In summary, this function enables or disables DTX for the Opus codec in a WebRTC session by modifying the relevant configuration in the SDP and then returns the updated SDP.

So, you've explored how SDP munging is applied in real-world applications through the Stream Video SDK. For additional advanced use cases, especially if you're interested in delving deeper, consider reviewing the sdp-munging.ts file in Stream Video SDK for JavaScript. This file offers more insights and examples of SDP munging in action, providing a practical perspective on its application in various scenarios.

Conclusion

In this lesson, you've gained a deep understanding of SDP, a fundamental concept in WebRTC implementation. With a better grasp of the session description messages exchanged between peers, you'll find it easier to trace and manage these crucial components.