As discussed in module 1, Working With SDP Messages, SDP (Session Description Protocol) is a fundamental part of real-time communication systems. SDP is key in negotiating sessions between peers, outlining essential details such as codecs, media types, and network configurations. In this lesson, we will delve deeply into the structure of SDP and explore the fascinating process known as SDP munging.
Understanding SDP Structure
SDP is a text-based protocol defined by the IETF (Internet Engineering Task Force) in RFC 4566 that offers a standardized format for endpoints to describe multimedia sessions. The protocol is pivotal for functions such as session announcements, session invitations, and other forms of multimedia session initiation.
Let's break down the essential parts of an SDP:
1. Session Description
If you've attempted to debug your WebRTC application in Chrome, you may have encountered a session description, resembling the text below, in the `chrome://webrtc-internals/` tool.
```
v=0
o=jdoe 2890844526 2890842807 IN IP4 10.47.16.5
s=SDP Seminar
i=A Seminar on the session description protocol
u=http://www.example.com/seminars/sdp.pdf
e=j.doe@example.com (Jane Doe)
c=IN IP4 224.2.17.12/127
t=2873397496 2873404696
a=recvonly
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 99
a=rtpmap:99 h263-1998/90000
```
The session description comprehensively encapsulates the key details of the multimedia session, such as the session name, timing information, and essential connection details. Each field type within the session description carries specific information, as illustrated below:

- `v=`: Protocol version.
- `o=`: Originator of the session (username, session ID, session version, network type, address type, and address).
- `s=`: Session name.
- `i=`: Session information.
- `u=`: URI of description.
- `e=`: Email address of the session contact.
- `c=`: Connection information (network type, address type, and connection address).
- `t=`: Timing information (start time and stop time).
- `m=`: Media name and transport address (media description).
- `a=`: Zero or more session or media attribute lines.
At first glance, this might look like something out of cryptography, right? Now you can better appreciate the aptly named Session Description Protocol, which gives WebRTC endpoints a standardized format for describing their sessions.
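As a quick illustration of how these single-letter fields can be handled in code, here is a minimal sketch (not part of any SDK; the name `parseSessionFields` is our own) that collects the `<type>=<value>` lines of a session description into a lookup table:

```typescript
// A minimal illustrative sketch (our own, not SDK code) that collects the
// `<type>=<value>` lines of a session description into a lookup table.
type SdpFields = Record<string, string[]>;

export const parseSessionFields = (sdp: string): SdpFields => {
  const fields: SdpFields = {};
  for (const line of sdp.split(/\r?\n/)) {
    // Every SDP line has the shape `<type>=<value>`, e.g. `v=0`.
    const match = /^([a-z])=(.*)$/.exec(line.trim());
    if (!match) continue;
    const [, type, value] = match;
    if (!fields[type]) fields[type] = [];
    fields[type].push(value);
  }
  return fields;
};
```

Running it on the seminar example above yields one entry under `v` and, because the session contains both audio and video, two entries under `m`.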
2. Understanding Media Descriptions in Session Descriptions
A session description can include multiple media descriptions. Each media description begins with an `m=` (media) field and follows the structure below:

```
m=<media> <port> <proto> <fmt> ...
```
Let's dissect each sub-field of the media description in a session description:

- `<media>`: Specifies the type of media and can be "audio", "video", "text", "application", or "message".
- `<port>`: Indicates the transport port to which the media stream will be sent.
- `<proto>`: Represents the transport protocol, which depends on the address type in the corresponding `c=` (connection information) field. Common protocol types include "UDP", "RTP/AVP", and "RTP/SAVP".
- `<fmt>`: Details the media format. When `<proto>` is "RTP/AVP" or "RTP/SAVP", the `<fmt>` sub-fields contain RTP (Real-time Transport Protocol) payload type numbers.
Therefore, a real-world media description would appear as the line below:

```
m=video 49170/2 RTP/AVP 31
```
For a deeper understanding of each format type, you can refer to the RFC 4566 documentation, which offers detailed insights.
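To make these sub-fields concrete, here is a small illustrative parser (again our own sketch, not SDK code) that splits an `m=` line into the parts described above:

```typescript
// An illustrative sketch (our own, not SDK code): split an `m=` line into
// the <media> <port> <proto> <fmt>... sub-fields described above.
export interface MediaDescription {
  media: string; // "audio", "video", "text", "application", or "message"
  port: string; // may carry a port count, e.g. "49170/2"
  proto: string; // e.g. "RTP/AVP" or "RTP/SAVP"
  fmts: string[]; // payload type numbers when <proto> is RTP-based
}

export const parseMediaLine = (line: string): MediaDescription | null => {
  const match = /^m=(\S+) (\S+) (\S+)((?: \S+)*)$/.exec(line.trim());
  if (!match) return null;
  const [, media, port, proto, fmt] = match;
  return {
    media,
    port,
    proto,
    fmts: fmt.trim().split(/\s+/).filter(Boolean),
  };
};
```

For example, `parseMediaLine('m=video 49170/2 RTP/AVP 31')` returns `{ media: 'video', port: '49170/2', proto: 'RTP/AVP', fmts: ['31'] }`.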
Munging in SDP
At times, it becomes necessary to intentionally alter SDP messages to accomplish specific results that aren't directly facilitated by standard WebRTC APIs. This is where manually tweaking SDP messages, commonly known as SDP munging, comes into play.
Munging involves manually manipulating or tweaking SDP parameters to suit various network conditions, codec preferences, or other changing factors. This adaptation process is crucial for ensuring optimal communication between peers in WebRTC.
However, when munging SDP messages, there are several important points to keep in mind:
- Risks and Unexpected Behaviors: While SDP munging provides flexibility, it also carries certain risks. Incorrect munging can result in issues such as incompatibilities, session failures, and unexpected behaviors across different browsers or environments. These issues can lead to deviations from the standard WebRTC behavior, potentially causing interoperability problems.
- Future Compatibility: Munging is effective only if it aligns with the specific version and protocol you are targeting. However, as WebRTC standards continue to evolve, munged SDP may not stay compatible with newer versions or updates. This incompatibility can lead to a higher cost and effort for migrating to newer versions.
- Complexity and Maintenance: Munging can increase the complexity of WebRTC applications, making it challenging to trace modifications in the SDP and complicating troubleshooting efforts. Consequently, this can lead to difficulty maintaining your WebRTC application as identifying and resolving issues becomes more intricate.
SDP munging provides WebRTC communication flexibility but adds complexity to your application. It can lead to unexpected behaviors and make maintenance challenging. Therefore, it's crucial to have well-documented munging processes with clear reasons to ensure alignment with your team or for future reference. This approach is essential for minimizing confusion and streamlining future updates or troubleshooting.
SDP Munging in Real-World Scenarios
To explore practical use cases, let's look at the Stream Video SDK, which provides a global edge network for implementing video/audio calls and live streams seamlessly. Specifically, the stream-video-js SDK leverages SDP munging. Let's delve into how this technique is applied in real-world scenarios.
In the sdp-munging.ts file, SDP munging is employed for various configurations such as setting up preferred codecs, removing unwanted codecs, toggling DTX, and enhancing audio quality. Let's examine these use cases one by one.
1. Setting Up Preferred Codec
When both peers in a communication session support multiple codecs, the order of those codecs in the SDP message plays a crucial role in determining which codec is ultimately used. Placing a preferred codec at the top of the list increases the chance of it being selected. To achieve this, Stream SDK uses SDP munging to reorder the codec list, giving higher priority to preferred codecs, as demonstrated in the following code example:
```typescript
export const setPreferredCodec = (
  sdp: string,
  mediaType: 'video' | 'audio',
  preferredCodec: string,
) => {
  // Locate the m= section for the requested media type.
  const section = getMediaSection(sdp, mediaType);
  if (!section) return sdp;
  // Find the rtpmap entry whose codec name matches the preferred codec.
  const rtpMap = section.rtpMap.find(
    (r) => r.codec.toLowerCase() === preferredCodec.toLowerCase(),
  );
  const codecId = rtpMap?.payload;
  if (!codecId) return sdp;
  // Move the matching payload type to the front of the codec list
  // and rewrite the m= line accordingly.
  const newCodecOrder = moveCodecToFront(section.media.codecOrder, codecId);
  return sdp.replace(
    section.media.original,
    `${section.media.mediaWithPorts} ${newCodecOrder}`,
  );
};
```
For instance, if we aim to prioritize the VP8 codec, we can modify the media description line in the Session Description Protocol (SDP). Consider the original line:
m=video 9 UDP/TLS/RTP/SAVPF 100 101 **96** 97 35 36 102 125 127
To prioritize the VP8 codec, represented by payload type 96, we rearrange the order of the payload types so that 96 comes first:
m=video 9 UDP/TLS/RTP/SAVPF **96** 100 101 97 35 36 102 125 127
This alteration in the SDP line effectively prioritizes the VP8 codec for video transmission.
Now, let's dissect the process step by step, particularly if you wish to prioritize the VP8 codec in your WebRTC session:

- Identify the Video Media Specification: First, locate the video media line in the SDP, which in this example is `m=video 9 UDP/TLS/RTP/SAVPF 100 101 96 97 35 36 102 125 127`. As discussed previously, the `m=` field follows the structure `m=<media> <port> <proto> <fmt>`. The numbers listed are payload type identifiers that correspond to specific codecs or formats used for the video stream. For instance, the identifier 96 might represent the VP8 codec, while 97 could be VP9.
- Find the Specified Codec (VP8): Search the SDP for the rtpmap attribute related to VP8, which could look like `a=rtpmap:96 VP8/90000`.
- Extract the Identifier for VP8: From the step above, identify "96" as the payload type number representing VP8.
- Reorder the `m=` Field: Move the identifier 96 to the beginning of the `<fmt>` list in the `m=` field. This prioritizes the VP8 codec over the others.
- Updated Media Specification: After reordering, the media line reads `m=video 9 UDP/TLS/RTP/SAVPF 96 100 101 97 35 36 102 125 127`. This new order signals that VP8 is the preferred codec for the video stream.
By following these steps, you effectively prioritize the VP8 codec in your WebRTC session, which can influence the codec selection during the negotiation process between peers, potentially leading to improved compatibility, performance, or other desired outcomes.
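The reordering step above can be sketched in a few lines. The SDK snippet earlier references a `moveCodecToFront` helper; the implementation below is our own illustrative guess at its behavior, not the SDK's actual code. Here, `codecOrder` is the space-separated `<fmt>` list from the `m=` line:

```typescript
// Illustrative sketch of a `moveCodecToFront`-style helper (our own
// assumption; the real stream-video-js implementation may differ).
// `codecOrder` is the space-separated <fmt> list, e.g. "100 101 96 97".
export const moveCodecToFront = (
  codecOrder: string,
  codecId: string,
): string => {
  const ids = codecOrder.trim().split(/\s+/);
  // Leave the order untouched if the requested payload type is absent.
  if (!ids.includes(codecId)) return codecOrder;
  return [codecId, ...ids.filter((id) => id !== codecId)].join(' ');
};
```

Calling it with `('100 101 96 97 35 36 102 125 127', '96')` produces the reordered list `96 100 101 97 35 36 102 125 127` from the example above.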
2. Toggling DTX
In WebRTC, DTX (Discontinuous Transmission) refers to a technique used by audio codecs to lower the bitrate during periods of silence or inactivity. Rather than sending a continuous audio stream when no one is speaking, DTX lets the system either stop sending data or transmit packets at significantly reduced rates. This approach is highly effective at conserving bandwidth and minimizing network load, since less data is sent when no significant audio is being transmitted.
Toggling DTX becomes necessary or advantageous in various scenarios, particularly for optimizing network resource utilization and improving overall communication quality. To facilitate this, Stream SDK uses SDP munging to enable or disable DTX as required by the communication session, as demonstrated in the following code example:
```typescript
export const toggleDtx = (sdp: string, enable: boolean): string => {
  const opusFmtp = getOpusFmtp(sdp);
  if (opusFmtp) {
    const matchDtx = /usedtx=(\d)/.exec(opusFmtp.config);
    const requiredDtxConfig = `usedtx=${enable ? '1' : '0'}`;
    if (matchDtx) {
      const newFmtp = opusFmtp.original.replace(
        /usedtx=(\d)/,
        requiredDtxConfig,
      );
      return sdp.replace(opusFmtp.original, newFmtp);
    } else {
      const newFmtp = `${opusFmtp.original};${requiredDtxConfig}`;
      return sdp.replace(opusFmtp.original, newFmtp);
    }
  }
  return sdp;
};
```
The provided code snippet demonstrates how to toggle Discontinuous Transmission (DTX) in WebRTC by modifying the Session Description Protocol (SDP). Here's a breakdown of the process in five steps:
- Function Definition: The function `toggleDtx` takes two parameters: `sdp`, a string containing the SDP, and `enable`, a boolean indicating whether DTX should be enabled or disabled.
- Retrieving the Opus Codec Configuration: The function calls `getOpusFmtp(sdp)` to retrieve the FMTP (format parameters) line for the Opus codec from the SDP. This line contains codec-specific configuration, including whether DTX is used.
- Checking and Modifying the DTX Setting:
  - The code checks whether the Opus codec configuration (`opusFmtp`) exists.
  - It then searches for the `usedtx` parameter within this configuration using a regular expression (`/usedtx=(\d)/`). This parameter indicates the current state of DTX (enabled or disabled).
  - Depending on whether DTX is already configured (`matchDtx`), the code either modifies the existing `usedtx` setting or appends it to the FMTP line.
- Updating the SDP:
  - The function constructs a new FMTP line (`newFmtp`) with the updated DTX setting (`requiredDtxConfig`).
  - The SDP is then updated by replacing the original FMTP line with the new one.
- Returning the Modified SDP: The function returns the modified SDP string with the updated DTX setting. If the Opus codec configuration was not found in the original SDP, the SDP is returned unchanged.
In summary, this function enables or disables DTX for the Opus codec in a WebRTC session by modifying the relevant configuration in the SDP and then returns the updated SDP.
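If you'd like to experiment with this behavior outside the SDK, the self-contained sketch below pairs the same toggling logic with a minimal stand-in for `getOpusFmtp`. The stand-in is our own assumption for illustration; it is not the SDK's actual helper:

```typescript
// A self-contained, simplified sketch for experimentation.
// `getOpusFmtp` here is our own minimal stand-in, not the SDK helper.
const getOpusFmtp = (
  sdp: string,
): { original: string; config: string } | null => {
  // Find the Opus payload type via its rtpmap entry...
  const rtpmap = /a=rtpmap:(\d+) opus\/48000(?:\/2)?/i.exec(sdp);
  if (!rtpmap) return null;
  // ...then locate the matching fmtp line for that payload type.
  const fmtp = new RegExp(`a=fmtp:${rtpmap[1]} ([^\\r\\n]+)`).exec(sdp);
  return fmtp ? { original: fmtp[0], config: fmtp[1] } : null;
};

export const toggleDtx = (sdp: string, enable: boolean): string => {
  const opusFmtp = getOpusFmtp(sdp);
  if (!opusFmtp) return sdp;
  const requiredDtxConfig = `usedtx=${enable ? '1' : '0'}`;
  // Rewrite an existing usedtx parameter, or append one if absent.
  const newFmtp = /usedtx=\d/.test(opusFmtp.config)
    ? opusFmtp.original.replace(/usedtx=\d/, requiredDtxConfig)
    : `${opusFmtp.original};${requiredDtxConfig}`;
  return sdp.replace(opusFmtp.original, newFmtp);
};
```

Feeding it an SDP fragment whose Opus `a=fmtp:` line lacks `usedtx` appends `usedtx=1` (or `usedtx=0`), while an existing `usedtx` parameter is rewritten in place.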
So, you've explored how SDP munging is applied in real-world applications through the Stream Video SDK. For additional advanced use cases, especially if you're interested in delving deeper, consider reviewing the sdp-munging.ts file in Stream Video SDK for JavaScript. This file offers more insights and examples of SDP munging in action, providing a practical perspective on its application in various scenarios.
Conclusion
In this lesson, you've taken a deep look at SDP, a fundamental concept in WebRTC implementation. With a better grasp of the session description messages exchanged between peers, you will find it easier to trace and manage these crucial components.