In our Introduction to WebRTC Module two, we covered the basics of STUN and TURN servers. This lesson will delve deeper into their functionalities, implementation strategies, and best practices for effectively balancing STUN and TURN usage in production WebRTC applications.
The Challenge: NAT and Firewalls in WebRTC
Before exploring STUN and TURN in depth, it's essential to understand the core networking challenge they solve. Most devices on the internet today operate behind Network Address Translation (NAT) systems and firewalls. These technologies provide security and address conservation but create significant obstacles for peer-to-peer communication - the foundation of WebRTC.
Network Address Translation (NAT)
NAT is a technique used by routers to allow multiple devices on a private network to share a single public IP address. While efficient for outbound connections, NAT creates a fundamental challenge for WebRTC:
- Devices on private networks are assigned private IP addresses (e.g., 192.168.1.x) that aren't directly accessible from the internet
- The router performs address translation, masking the internal IP with its public IP when traffic goes out
- Return traffic must be properly mapped back to the correct internal device
- Incoming connection requests have no established mapping and are typically blocked
This creates the "NAT traversal problem" - how can two peers behind different NATs establish a direct connection when neither can initiate a connection to the other's private IP?
Understanding STUN in Depth
STUN (Session Traversal Utilities for NAT) provides a solution to the NAT traversal problem by helping devices discover their public-facing network information.
How STUN Works
- Public IP Discovery: A WebRTC client sends a request to a STUN server on the public internet
- NAT Mapping Analysis: The STUN server examines the request and determines the public IP address and port from which the request originated
- Response Generation: The STUN server sends back this information to the client
- Exchange via Signaling: WebRTC peers exchange their STUN-discovered addresses through the signaling channel
- Connection Attempt: Peers attempt to establish direct connections using the discovered public endpoints
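To make the request/response exchange above more concrete, the following is a minimal sketch of the two byte-level pieces involved: building a bare STUN Binding Request header, and decoding the value of an IPv4 XOR-MAPPED-ADDRESS attribute, which is where a STUN server typically reports the client's public address and port. The function names are illustrative and not part of any WebRTC API; browsers do this internally, and the sketch leaves out networking, attribute framing, and IPv6.

```javascript
// Fixed value present in every STUN message header
const MAGIC_COOKIE = 0x2112a442;

// Build a 20-byte STUN Binding Request header with no attributes.
// transactionId is assumed to be a 12-byte Uint8Array chosen by the caller.
function buildBindingRequest(transactionId) {
  const buf = new Uint8Array(20);
  const view = new DataView(buf.buffer);
  view.setUint16(0, 0x0001);       // message type: Binding Request
  view.setUint16(2, 0);            // message length: no attributes follow
  view.setUint32(4, MAGIC_COOKIE); // magic cookie
  buf.set(transactionId, 8);       // 96-bit transaction ID
  return buf;
}

// Decode the 8-byte value of an IPv4 XOR-MAPPED-ADDRESS attribute.
// The port is XORed with the top 16 bits of the magic cookie,
// the address with the full 32-bit cookie.
function decodeXorMappedAddress(value) {
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  const family = view.getUint8(1);
  if (family !== 0x01) throw new Error('Only IPv4 is handled in this sketch');
  const port = view.getUint16(2) ^ (MAGIC_COOKIE >>> 16);
  const addr = (view.getUint32(4) ^ MAGIC_COOKIE) >>> 0;
  const ip = [addr >>> 24, (addr >>> 16) & 0xff, (addr >>> 8) & 0xff, addr & 0xff].join('.');
  return { ip, port };
}
```

The decoded address and port are what end up in a server-reflexive (srflx) candidate.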
STUN Server Implementation
From the application's point of view, using STUN only requires listing one or more STUN server URLs in the peer connection configuration:
// Example configuration of STUN servers in a WebRTC application
const configuration = {
iceServers: [
{
urls: [
'stun:stun1.l.google.com:19302',
'stun:stun2.l.google.com:19302',
]
}
],
iceCandidatePoolSize: 10,
};
// Create peer connection with this configuration
const peerConnection = new RTCPeerConnection(configuration);
STUN Success Rates
STUN is often effective, but real-world data gives a more nuanced picture of connectivity patterns. In open-internet consumer traffic, direct connections (host + server-reflexive) succeed in roughly 75-80% of sessions; the remaining 20-25% require a relay. These figures are attributed to Chrome User Metrics Analysis (UMA) data, so they reflect observed browser traffic rather than lab conditions.
NAT Types and STUN Compatibility
NAT Type | Description | STUN Success Rate |
---|---|---|
Full Cone | Most permissive; maps an internal IP:port to a public IP:port for all external hosts | High (90%+) |
Restricted Cone (Address & Port) | Only allows incoming packets from IPs/ports that internal clients have previously sent outbound packets to | Moderate (70-80%) |
Symmetric NAT | Creates a unique mapping for each internal IP:port to destination IP:port combination | Low (20-40%) |
Note: Most consumer routers today behave as port-restricted cone NATs. Carrier-grade NATs (CGNs) used by mobile providers and some ISPs frequently behave as symmetric NATs, which helps explain why mobile networks often require TURN relays.
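The difference between cone and symmetric behavior in the table can be illustrated with a toy model. This is a sketch under simplifying assumptions, not a real NAT: the `SimulatedNat` class, its port numbering, and the addresses are all hypothetical. It shows only the mapping rule, which is what decides whether the address learned from a STUN server is still valid when talking to a peer.

```javascript
// Toy NAT: a cone NAT keeps one public port per internal socket,
// a symmetric NAT allocates a new public port per destination.
class SimulatedNat {
  constructor(kind) {
    this.kind = kind;        // 'cone' or 'symmetric'
    this.nextPort = 40000;   // arbitrary starting port for the sketch
    this.mappings = new Map();
  }

  // Returns the public port used for a flow from an internal socket to a destination
  mapOutbound(internal, destination) {
    const key = this.kind === 'symmetric' ? `${internal}->${destination}` : internal;
    if (!this.mappings.has(key)) this.mappings.set(key, this.nextPort++);
    return this.mappings.get(key);
  }
}

// True when the port seen by the STUN server is the same port the peer will see
function stunAddressIsReusable(nat) {
  const viaStun = nat.mapOutbound('192.168.1.5:5000', 'stun.example.org:3478');
  const viaPeer = nat.mapOutbound('192.168.1.5:5000', '198.51.100.20:6000');
  return viaStun === viaPeer;
}

console.log(stunAddressIsReusable(new SimulatedNat('cone')));      // true
console.log(stunAddressIsReusable(new SimulatedNat('symmetric'))); // false
```

With a symmetric NAT the server-reflexive candidate advertises a port the peer can never reach, which is why those sessions tend to fall back to a relay.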
When STUN Isn't Enough: TURN as a Fallback
TURN (Traversal Using Relays around NAT) servers provide a fallback mechanism when direct peer-to-peer connections cannot be established with STUN. Unlike STUN, which merely facilitates the discovery of network information, TURN actually relays media data between peers.
How TURN Works
- Client Authentication: A WebRTC client authenticates with a TURN server (typically using username/password credentials)
- Allocation Request: The client requests the allocation of resources on the TURN server
- Relay Address Assignment: The TURN server assigns a public relay address that can be shared with peers
- Media Relay: All media traffic is sent to the TURN server, which then forwards it to the destination peer
TURN Server Implementation
// Example configuration including both STUN and TURN servers
const configuration = {
iceServers: [
{
urls: ['stun:stun1.l.google.com:19302', 'stun:stun2.l.google.com:19302']
},
{
urls: ['turn:your-turn-server.com:443'],
username: 'username',
credential: 'credential'
// Uses TURN's long-term credential mechanism (username + shared password)
}
],
iceTransportPolicy: 'all', // default: gather host, srflx, and relay candidates ('relay' would force TURN only)
iceCandidatePoolSize: 10
};
// Create peer connection with this configuration
const peerConnection = new RTCPeerConnection(configuration);
Understanding the Cost of TURN Relaying
While TURN servers ensure near-universal connectivity, they come with significant tradeoffs:
- Increased Latency: Adding a relay server into the communication path introduces additional network hops and processing delays
- Bandwidth Costs: TURN servers must handle all media traffic between peers, leading to significant bandwidth consumption
- Infrastructure Requirements: Running reliable TURN servers requires robust infrastructure and careful capacity planning
- Quality of Service Impact: TURN over TCP/TLS degrades media quality because TCP retransmissions and head-of-line blocking add delay and hide packet loss from WebRTC's congestion controller; video throughput can also stall behind BBR/QUIC flows, especially in congested networks
A properly designed WebRTC application should use TURN as a last resort, only falling back to it when direct connections cannot be established.
The ICE Framework: Coordinating STUN and TURN
Interactive Connectivity Establishment (ICE) is the framework that orchestrates the use of STUN and TURN in WebRTC applications. ICE systematically attempts different connection methods, prioritizing direct connections when possible and falling back to relayed connections when necessary.
ICE Candidate Generation and Processing
- Gathering Candidates: The WebRTC client collects various address candidates:
  - Local addresses (from network interfaces)
  - Server-reflexive addresses (obtained via STUN)
  - Relay addresses (obtained via TURN)
- Candidate Exchange: Candidates are exchanged between peers via the signaling channel
- Connectivity Checks: The ICE algorithm performs connectivity checks between candidate pairs
- Connection Selection: ICE selects the best working connection, prioritizing direct paths over relayed ones
// Listening for ICE candidates in a WebRTC implementation
peerConnection.addEventListener('icecandidate', event => {
if (event.candidate) {
// Send this candidate to the remote peer via signaling channel
signalingChannel.send({ 'candidate': event.candidate });
// Log candidate type for monitoring (using regex for consistent extraction)
const typeMatch = event.candidate.candidate.match(/ typ ([a-z]+)/);
const candidateType = typeMatch && typeMatch[1] ? typeMatch[1] : 'unknown';
console.log(`Generated candidate of type: ${candidateType}`);
}
});
// Monitoring ICE connection state
peerConnection.addEventListener('iceconnectionstatechange', event => {
console.log(`ICE connection state: ${peerConnection.iceConnectionState}`);
// Detailed logging for production monitoring
if (peerConnection.iceConnectionState === 'connected' ||
peerConnection.iceConnectionState === 'completed') {
// Get active connection information
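peerConnection.getStats().then(stats => {
stats.forEach(report => {
// Candidate-pair reports reference their candidates by ID, so look each one up to read its type
if (report.type === 'candidate-pair' && report.state === 'succeeded' && report.nominated) {
const local = stats.get(report.localCandidateId);
const remote = stats.get(report.remoteCandidateId);
console.log(`Connection established using candidate types:
Local: ${local ? local.candidateType : 'unknown'},
Remote: ${remote ? remote.candidateType : 'unknown'}`);
}
});
});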
}
});
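The preference for direct paths over relayed ones comes from candidate priorities. As a rough sketch, the ICE specification (RFC 8445) defines a priority formula and recommends type preferences of 126 for host, 110 for peer-reflexive, 100 for server-reflexive, and 0 for relayed candidates. The function name and the default local preference below are assumptions for illustration; real browsers choose their own local preference values.

```javascript
// Recommended type preferences from the ICE specification (RFC 8445)
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(type, localPreference = 65535, componentId = 1) {
  const typePref = TYPE_PREFERENCE[type];
  if (typePref === undefined) throw new Error(`Unknown candidate type: ${type}`);
  return typePref * 2 ** 24 + localPreference * 2 ** 8 + (256 - componentId);
}

// Sort candidate types from most to least preferred
const ordered = ['relay', 'host', 'srflx']
  .sort((a, b) => candidatePriority(b) - candidatePriority(a));
console.log(ordered); // [ 'host', 'srflx', 'relay' ]
```

Because the type preference occupies the most significant bits, a relay candidate ranks below any host or server-reflexive candidate regardless of local preference.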
IPv6 and Happy-Eyeballs in ICE
The transition to IPv6 has significant implications for WebRTC NAT traversal. While IPv4 addresses are typically behind NAT due to address scarcity, IPv6 addresses are often globally routable without NAT.
IPv6 Host Candidates
Modern WebRTC implementations gather both IPv4 and IPv6 candidates:
- Direct Connectivity: IPv6 host candidates can often connect directly without NAT traversal
- Firewall Challenges: Despite no NAT, IPv6 traffic is frequently blocked by corporate firewalls
- Uneven Support: TURN-IPv6 support varies across implementations and service providers
- Connection Management: IPv6 host candidates bypass NAT but still need ICE consent-freshening; some middleboxes drop long-idle v6 flows
ICE Happy-Eyeballs
Similar to the Happy Eyeballs approach that browsers use for dual-stack connections, WebRTC employs a strategy to efficiently select between IPv4 and IPv6 paths:
- Parallel Connectivity Checks: ICE attempts both IPv4 and IPv6 paths simultaneously
- Connection Racing: The fastest working connection typically wins
- Address Family Prioritization: Some browsers assign IPv6 candidates a higher priority, so a working IPv6 path is preferred when both families connect
// Example: Log address families during ICE gathering
peerConnection.addEventListener('icecandidate', event => {
if (event.candidate) {
// Identify address family (IPv4 vs IPv6)
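const candidateStr = event.candidate.candidate;
// The connection address is the fifth space-separated field. Checking the whole
// string for ':' would always match, because it starts with "candidate:".
const address = candidateStr.split(' ')[4] || '';
const isIPv6 = address.includes(':');
const addrFamily = isIPv6 ? 'IPv6' : 'IPv4';
const typeMatch = candidateStr.match(/ typ ([a-z]+)/);
const candidateType = typeMatch ? typeMatch[1] : 'unknown';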
console.log(`Generated ${addrFamily} ${candidateType} candidate`);
}
});
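For logging beyond the address family, it can help to split a candidate line into all of its fields. The sketch below assumes the standard `candidate:` attribute layout (foundation, component, transport, priority, address, port, `typ`, type, then optional name/value pairs); `parseCandidate` is a hypothetical helper, not a browser API, and it ignores extensions other than `raddr` and `rport`.

```javascript
// Parse an ICE candidate line (with or without the "a=" SDP prefix) into its fields.
// Returns null when the line does not look like a candidate.
function parseCandidate(line) {
  const parts = line.replace(/^a=/, '').trim().split(/\s+/);
  if (parts.length < 8 || !parts[0].startsWith('candidate:') || parts[6] !== 'typ') {
    return null;
  }
  const parsed = {
    foundation: parts[0].slice('candidate:'.length),
    component: Number(parts[1]),
    protocol: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7],
  };
  // Optional extension attributes come as name/value pairs (raddr, rport, tcptype, ...)
  for (let i = 8; i + 1 < parts.length; i += 2) {
    if (parts[i] === 'raddr') parsed.relatedAddress = parts[i + 1];
    if (parts[i] === 'rport') parsed.relatedPort = Number(parts[i + 1]);
  }
  return parsed;
}

const example = parseCandidate(
  'candidate:842163049 1 udp 1677729535 203.0.113.7 54321 typ srflx raddr 192.168.1.5 rport 50000'
);
console.log(example.type, example.address, example.port); // srflx 203.0.113.7 54321
```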
Practical Implications
- Dual-Stack Deployment: TURN servers should support both IPv4 and IPv6 for maximum compatibility
- Mobile Networks: Many mobile carriers deploy IPv6-only networks internally with NAT64 for IPv4 communication
- Fallback Patterns: Even when IPv6 host candidates are available, TURN-UDP/IPv6 remains useful when direct IPv6 paths are blocked
TCP/TLS vs. UDP Relays
TURN supports multiple transport protocols, each with distinct trade-offs:
Protocol | Advantages | Disadvantages | Best Use Cases |
---|---|---|---|
TURN-UDP | Lowest latency; best media quality; congestion control works properly | Blocked in some corporate networks; may require fallbacks | Default choice when available |
TURN-TCP | Higher connection success rate; works in most networks | Head-of-line blocking; higher latency; less efficient for media | Fallback when UDP is blocked |
TURN-TLS (over TCP, port 443) | Highest success rate; firewall-friendly; looks like HTTPS traffic | Highest latency; TLS handshake overhead; CPU cost on server | Last resort for strict networks |
Protocol Selection Strategy
For optimal performance and connectivity:
- Prioritize UDP: Configure ICE to try UDP first for best media quality
- Fallback Chain: If UDP fails, try TCP, then TLS as a last resort
- Multiple Server Configuration: Provide multiple TURN server options with different protocols
// Example: Configuring multiple TURN transport options
const configuration = {
iceServers: [
{ urls: 'stun:stun.example.org' },
{
urls: 'turn:turn.example.org:3478?transport=udp',
username: 'user',
credential: 'cred'
},
{
urls: 'turn:turn.example.org:3478?transport=tcp',
username: 'user',
credential: 'cred'
},
{
urls: 'turns:turn.example.org:443', // TLS over TCP
username: 'user',
credential: 'cred'
}
]
};
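When the same server is offered over several transports, the configuration above can be generated rather than written by hand. This is one possible sketch: `buildIceServers` is a hypothetical helper, and the ports (3478 for STUN/TURN, 443 for TLS) are assumptions that mirror the example, not requirements.

```javascript
// Build an iceServers array offering one TURN host over UDP, TCP, and TLS.
// Ports are assumptions matching the example configuration above.
function buildIceServers({ host, username, credential }) {
  return [
    { urls: `stun:${host}:3478` },
    { urls: `turn:${host}:3478?transport=udp`, username, credential },
    { urls: `turn:${host}:3478?transport=tcp`, username, credential },
    { urls: `turns:${host}:443?transport=tcp`, username, credential },
  ];
}

const iceServers = buildIceServers({
  host: 'turn.example.org',
  username: 'user',
  credential: 'cred',
});
console.log(iceServers.map(server => server.urls));
```

The resulting array can be passed as the `iceServers` field of the `RTCPeerConnection` configuration; ICE then gathers candidates from every entry, and candidate priorities decide which working path is used.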
Best Practices for STUN and TURN Deployment
Deploying STUN and TURN servers for production WebRTC applications requires careful planning and implementation. Here are best practices to ensure reliable, secure, and efficient operation.
Server Reliability and Scalability
1. Geographic Distribution
Deploy STUN and TURN servers across multiple geographic regions to:
- Reduce latency by connecting users to the nearest server
- Improve resilience through redundancy
- Balance load across infrastructure
This geographic distribution is a key component of a global edge network strategy, allowing services to be delivered as close to users as possible regardless of their location.
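One simple way to connect users to the nearest server is to measure round-trip time to each region and pick the lowest. The sketch below assumes the application has already collected those measurements (for example by timing a small request to each region); the function name and region labels are hypothetical.

```javascript
// Pick the region with the lowest measured round-trip time (in milliseconds).
// Regions with missing or non-finite measurements are skipped.
function pickTurnRegion(rttByRegion) {
  let best = null;
  for (const [region, rtt] of Object.entries(rttByRegion)) {
    if (Number.isFinite(rtt) && (best === null || rtt < best.rtt)) {
      best = { region, rtt };
    }
  }
  return best ? best.region : null;
}

console.log(pickTurnRegion({ 'eu-west': 38, 'us-east': 95, 'ap-south': NaN })); // eu-west
```

DNS-based or anycast routing can achieve the same goal without client-side measurement; this approach is only one option.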
2. Scalable Resource Management
TURN servers in particular face variable resource demands:
- Implement dynamic scaling based on metrics like connected users and bandwidth utilization
- Consider containerization for flexible deployment and scaling
- Establish monitoring to track resource usage patterns and anticipate scaling needs
3. Bandwidth Management
TURN servers relay media traffic, making bandwidth a critical resource:
- Implement bandwidth limiting policies per user/session
- Optimize relay protocols for efficient bandwidth usage
- Consider network provider costs in different regions
4. High Availability Design
To ensure reliable service:
- Implement server redundancy with failover mechanisms
- Use load balancers to distribute traffic across server instances
- Design for graceful degradation during partial outages
Security Implementation
1. Authentication and Authorization
TURN servers must implement robust access controls:
- Use time-limited credentials following the TURN protocol specifications
- Implement username and password authentication
- Consider integrating with existing authentication systems
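// Example: Generating time-limited TURN credentials (server side, Node.js).
// Assumes the TURN server is configured with a shared secret in the style of the
// "TURN REST API" convention (e.g. coturn's use-auth-secret):
// username = "<expiry timestamp>:<user id>", credential = base64(HMAC-SHA1(secret, username))
const crypto = require('crypto');
function generateTurnCredentials(userId, secretKey, ttlSeconds = 86400) {
const expiresAt = Math.floor(Date.now() / 1000) + ttlSeconds;
const username = `${expiresAt}:${userId}`;
const credential = crypto.createHmac('sha1', secretKey).update(username).digest('base64');
return { username, credential, ttl: ttlSeconds };
}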
// On client side:
const turnConfig = {
urls: ['turn:your-turn-server.com:443'],
username: turnCredentials.username,
credential: turnCredentials.credential
};
2. Transport Security
Protect STUN and TURN traffic:
- Use TURN-TLS on port 443 for firewall traversal (WebRTC traffic itself is already DTLS-SRTP)
- Configure proper certificate validation
- Remember that DTLS is mandatory for all WebRTC media transport, regardless of TURN protocol
3. Rate Limiting and Abuse Prevention
Prevent service abuse:
- Implement rate limiting for STUN/TURN requests
- Monitor for unusual traffic patterns
- Set reasonable allocation quotas per user
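A token bucket is one common way to implement the rate limiting and quotas listed above. The following is a minimal in-memory sketch, assuming one bucket per user or source address; the class name and parameters are illustrative, and a production deployment would more likely rely on the TURN server's own quota settings or a shared store.

```javascript
// Token bucket: allows bursts up to `capacity`, refilled at `refillPerSecond`.
// The clock is injectable so the behavior can be tested deterministically.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity;
    this.lastRefill = now();
  }

  // Returns true if the request is allowed, false if it should be rejected
  tryConsume(cost = 1) {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = current;
    if (this.tokens >= cost) {
      this.tokens -= cost;
      return true;
    }
    return false;
  }
}

// Usage: allow a burst of 5 allocation requests, then 1 per second
const allocationLimiter = new TokenBucket(5, 1);
console.log(allocationLimiter.tryConsume()); // true
```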
4. Regular Security Updates
Maintain server security:
- Keep server software updated to address vulnerabilities
- Regularly audit server configurations for security issues
- Follow security best practices for the hosting environment
TURN Cost Model Worksheet
Understanding the cost implications of TURN usage is critical for planning WebRTC deployments at scale.
Bandwidth Consumption Calculation
For a typical high-quality video call:
- Video Bitrate: ~1 Mbps (720p30) = 450 MB/hour, or about 324 GB/month if running 24/7
- Audio Bitrate: ~50 kbps = 22.5 MB/hour, or about 16.2 GB/month if running 24/7
- TURN Usage Factor: ~20-25% of all calls require TURN
- Average Call Duration: Varies by application (e.g., 30 minutes for meetings)
Cost Estimation Example
For a service with 10,000 monthly active users:
Assumptions:
- Average call duration: 60 minutes/month/user
- Video bitrate: 1 Mbps
- TURN usage rate: 25%
- Cloud egress cost: $0.08/GB (varies by region)
Calculation:
- Data per user-hour: 1 Mbps × 60 minutes × 60 seconds ÷ 8 bits/byte = 450 MB (per user-hour)
- Monthly TURN data: 450 MB × 10,000 users × 25% TURN rate = 1,125 GB
- Monthly cost: 1,125 GB × $0.08/GB = $90
Per-user cost: $90 ÷ 10,000 = $0.009 per user per month
For applications with higher usage patterns (e.g., contact centers, all-day meetings), these costs scale accordingly. At high volumes, self-hosted TURN infrastructure often becomes more economical than cloud services.
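The worksheet arithmetic can be wrapped in a small function so the assumptions are easy to vary. This is a sketch that reproduces the example above, using decimal units (1 GB = 1,000 MB) and counting video only; the function and parameter names are illustrative.

```javascript
// Estimate monthly TURN relay traffic and egress cost.
// bitrateMbps: media bitrate per user; turnRate: fraction of usage that is relayed.
function estimateTurnCost({ users, minutesPerUser, bitrateMbps, turnRate, egressPerGb }) {
  const mbPerUser = (bitrateMbps * minutesPerUser * 60) / 8; // megabits -> megabytes
  const relayedGb = (mbPerUser * users * turnRate) / 1000;   // decimal gigabytes
  const monthlyCost = relayedGb * egressPerGb;
  return { mbPerUser, relayedGb, monthlyCost, costPerUser: monthlyCost / users };
}

const estimate = estimateTurnCost({
  users: 10000,
  minutesPerUser: 60,
  bitrateMbps: 1,
  turnRate: 0.25,
  egressPerGb: 0.08,
});
console.log(estimate.relayedGb);              // 1125
console.log(estimate.monthlyCost.toFixed(2)); // 90.00
```

Raising `minutesPerUser` to contact-center levels, or `turnRate` for mobile-heavy audiences, shows quickly how the bill scales.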
Monitoring and Troubleshooting
Key Metrics to Monitor
For effective operation of STUN and TURN services:
- Connection Success Rates
  - Track the percentage of sessions using STUN vs. TURN
  - Monitor ICE connection establishment times
  - Measure connection failure rates
- Resource Utilization
  - Bandwidth consumption per server
  - CPU and memory usage
  - Active allocations and connections
- Geographic Distribution
  - Connection quality by region
  - Server utilization across locations
  - Regional failure patterns
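As a sketch of how the connection success metrics might be aggregated, the function below summarizes a batch of session records into direct, relayed, and failed counts. The record shape (`connected`, `localType`, `remoteType`) is an assumption about what the application reports to its analytics backend, not a standard format.

```javascript
// Summarize session records into direct / relayed / failed counts and rates.
// Each record is assumed to look like:
//   { connected: boolean, localType: 'host'|'srflx'|'prflx'|'relay', remoteType: ... }
function summarizePaths(sessions) {
  const counts = { direct: 0, relay: 0, failed: 0 };
  for (const session of sessions) {
    if (!session.connected) counts.failed++;
    else if (session.localType === 'relay' || session.remoteType === 'relay') counts.relay++;
    else counts.direct++;
  }
  const total = sessions.length || 1; // avoid division by zero for empty batches
  return {
    ...counts,
    relayShare: counts.relay / total,
    failureRate: counts.failed / total,
  };
}

const summary = summarizePaths([
  { connected: true, localType: 'host', remoteType: 'srflx' },
  { connected: true, localType: 'srflx', remoteType: 'srflx' },
  { connected: true, localType: 'relay', remoteType: 'host' },
  { connected: false },
]);
console.log(summary.relayShare, summary.failureRate); // 0.25 0.25
```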
Troubleshooting Common Issues
- High TURN Usage Rates
  - Investigate network configurations causing STUN failures
  - Check for restrictive firewalls blocking UDP
  - Consider if corporate networks are blocking traffic
- Excessive Latency
  - Analyze network paths between clients and servers
  - Check for geographic mismatches in server selection
  - Monitor for server overload conditions
- Connection Failures
  - Examine ICE candidate generation and exchange
  - Verify server reachability from client networks
  - Check authentication and credential issues
// Example: Monitoring code for TURN server usage
peerConnection.addEventListener('icegatheringstatechange', () => {
if (peerConnection.iceGatheringState === 'complete') {
let candidateTypes = {
host: 0,
srflx: 0, // STUN reflexive
relay: 0 // TURN relay
};
// Count candidate types
const candidates = peerConnection.localDescription.sdp.split('\r\n')
.filter(line => line.indexOf('a=candidate:') === 0);
candidates.forEach(candidate => {
// Safer extraction of candidate type using regex
const typeMatch = candidate.match(/ typ ([a-z]+)/);
if (typeMatch && typeMatch[1]) {
const type = typeMatch[1];
if (type === 'host') candidateTypes.host++;
else if (type === 'srflx') candidateTypes.srflx++;
else if (type === 'relay') candidateTypes.relay++;
}
});
// Log or send to analytics
console.log('ICE Candidate Types:', candidateTypes);
}
});
// Monitor the selected candidate pair once the connection is established.
// Registered separately so the 'connected' state is not missed if it fires before gathering completes.
peerConnection.addEventListener('iceconnectionstatechange', () => {
if (peerConnection.iceConnectionState === 'connected') {
peerConnection.getStats().then(stats => {
stats.forEach(report => {
// The transport report points at the active candidate pair
if (report.type === 'transport' && report.selectedCandidatePairId) {
const pair = stats.get(report.selectedCandidatePairId);
console.log('Selected candidate pair:', pair);
// Log or send to analytics
}
});
});
}
});
QoS & Congestion Control Interplay
The interaction between TURN relays and WebRTC's congestion control mechanisms has important implications for media quality.
Impact on Congestion Control
- RTT Measurement: Relayed paths hide the true round-trip time between peers
- Bandwidth Estimation: Google Congestion Control (GCC) depends on accurate RTT and packet arrival patterns
- Adaptation Lag: TURN relays can delay or distort congestion signals
Optimization Strategies
For better media quality over TURN:
- Regional Proximity: Deploy TURN servers in the same region as at least one peer to keep RTT < 150ms
- Protocol Selection: Prefer UDP over TCP/TLS when possible for more accurate congestion feedback
- Bandwidth Caps: Consider explicit bandwidth limits for TURN sessions to prevent quality oscillation
// Example: Setting explicit bitrate limits for relayed connections
peerConnection.addEventListener('iceconnectionstatechange', async () => {
if (peerConnection.iceConnectionState === 'connected') {
const stats = await peerConnection.getStats();
let usingRelay = false;
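stats.forEach(report => {
// Find the nominated, succeeded pair and resolve its candidates by ID
if (report.type === 'candidate-pair' && report.nominated && report.state === 'succeeded') {
const local = stats.get(report.localCandidateId);
const remote = stats.get(report.remoteCandidateId);
if ((local && local.candidateType === 'relay') ||
(remote && remote.candidateType === 'relay')) {
usingRelay = true;
}
}
});
// If using a relay, apply more conservative bandwidth limits
if (usingRelay) {
// Senders without a track (e.g. after removeTrack) are skipped
const sender = peerConnection.getSenders().find(s => s.track && s.track.kind === 'video');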
if (sender) {
const params = sender.getParameters();
params.encodings.forEach(encoding => {
// Apply more conservative limit for relayed connections
encoding.maxBitrate = 800000; // 800 kbps
});
await sender.setParameters(params);
}
}
}
});
IPv4 Exhaustion & Aggressive Peer-Reflexive Pruning
Recent WebRTC implementations have adopted strategies to reduce the pressure on NAT binding tables:
Chrome's Candidate Pruning
Starting with Chrome 123:
- Idle Timeout: Server-reflexive (srflx) candidates are dropped after 1 minute of inactivity
- Impact: Long-lived data channels may experience connectivity issues after periods of inactivity
- Mitigation: Implement application-level keepalive for persistent connections
Configuration Adjustments
For applications requiring long-lived connections:
// Example: Configure ICE candidate policy
const configuration = {
iceServers: [ /* STUN and TURN entries, as in the earlier examples */ ],
iceCandidatePoolSize: 10,
iceTransportPolicy: 'all', // or 'relay' for TURN-only
// Some browsers may support additional ICE configuration options
// for keepalive behavior or timeout values
};
Practical Recommendations
- Connection Monitoring: Implement connection state monitoring to detect dropped connections
- Periodic Activity: Send small keepalive packets over data channels during idle periods
- Reconnection Logic: Develop robust reconnection mechanisms for interrupted sessions
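The periodic-activity recommendation could be sketched as a small helper that sends a message only when the channel has been idle. Everything here is an assumption for illustration: the `createKeepalive` name, the 15-second default, and the `{ type: 'keepalive' }` message format, which the remote side would need to recognize and ignore. The `channel` argument is expected to behave like an `RTCDataChannel` (a `readyState` property and a `send` method).

```javascript
// Create an idle-triggered keepalive for a data-channel-like object.
// The clock is injectable so the logic can be exercised without real timers.
function createKeepalive(channel, { idleMs = 15000, now = () => Date.now() } = {}) {
  let lastActivity = now();
  return {
    // Call whenever application data is sent or received
    noteActivity() {
      lastActivity = now();
    },
    // Call periodically; sends a keepalive only if the channel has been idle long enough
    tick() {
      if (channel.readyState === 'open' && now() - lastActivity >= idleMs) {
        channel.send(JSON.stringify({ type: 'keepalive' }));
        lastActivity = now();
        return true;
      }
      return false;
    },
  };
}

// Usage with a real data channel (browser only):
//   const keepalive = createKeepalive(dataChannel);
//   dataChannel.addEventListener('message', () => keepalive.noteActivity());
//   setInterval(() => keepalive.tick(), 5000);
```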
mDNS ICE Candidates
Modern browsers implement privacy-enhancing techniques for local IP address discovery:
mDNS Obfuscation
- Local IP Privacy: Instead of exposing local IPs like 192.168.1.5, browsers generate random mDNS names (e.g., abc123.local)
- Impact on Debugging: Makes troubleshooting more challenging as raw IPs aren't visible
- Local Network Only: This technique only affects host candidates, not STUN/TURN
- Debugging Access: Chromium passes the real private IP in relatedAddress and relatedPort properties in the candidate JSON API (not in SDP), which can be useful for debugging
Identifying mDNS Candidates
// Example: Detecting mDNS candidates
peerConnection.addEventListener('icecandidate', event => {
if (event.candidate) {
const candidateStr = event.candidate.candidate;
const isMDNS = candidateStr.includes('.local');
if (isMDNS) {
console.log('mDNS candidate detected:', candidateStr);
}
}
});
Implementation Considerations
- Local Testing: mDNS resolution only works on the local network, affecting local development
- Logging Patterns: Update logging and monitoring to handle mDNS format
- Compatibility: Some older clients might not understand mDNS candidates
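For the logging update mentioned above, one option is to classify the address field of each candidate rather than searching the whole candidate string. This is a minimal sketch with a hypothetical helper name; it assumes the address has already been extracted from the candidate (the fifth space-separated field) and it does not validate the address itself.

```javascript
// Classify a candidate's connection address for logging purposes
function classifyAddress(address) {
  if (address.toLowerCase().endsWith('.local')) return 'mdns';
  if (address.includes(':')) return 'ipv6';
  return 'ipv4';
}

console.log(classifyAddress('abc123.local')); // mdns
console.log(classifyAddress('192.168.1.5'));  // ipv4
console.log(classifyAddress('2001:db8::1'));  // ipv6
```

Checking the suffix of the address field avoids false positives from other parts of the candidate line that might contain the text `.local`.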
Self-Hosting vs. Third-Party Services
When implementing STUN and TURN for WebRTC applications, organizations face a critical decision: build and maintain their own infrastructure or leverage third-party services.
Self-Hosting Considerations
Advantages:
- Complete control over infrastructure and configurations
- Potentially lower long-term costs for high-volume applications
- Ability to customize for specific requirements
- No dependency on external service providers
Challenges:
- Requires significant expertise to deploy and maintain
- Necessitates global infrastructure for optimal performance
- Demands ongoing monitoring and maintenance
- Requires scaling infrastructure with usage growth
Third-Party STUN/TURN Services
Advantages:
- Reduced implementation complexity
- Global infrastructure already in place
- Managed scaling and maintenance
- Often includes additional features like analytics
Challenges:
- Recurring costs that scale with usage
- Less control over specific configurations
- Potential privacy and compliance considerations
- Dependency on service provider reliability
Hybrid Approaches
Many organizations adopt a hybrid approach:
- Use public STUN servers (like Google's) which are free and reliable
- Deploy private TURN servers for sensitive internal applications
- Leverage third-party TURN services for global coverage
Conclusion
STUN and TURN servers are essential components in the WebRTC infrastructure, working together to overcome the challenges posed by NATs and firewalls. While STUN facilitates direct peer-to-peer connections in many scenarios, TURN provides critical fallback capabilities when direct connections aren't possible.
Effective implementation requires:
- Understanding the underlying network challenges
- Properly configuring ICE to coordinate STUN and TURN usage
- Deploying reliable, scalable, and secure server infrastructure
- Monitoring usage patterns and connection success rates
- Balancing performance, cost, and reliability requirements
By following these best practices, developers can build WebRTC applications that deliver reliable real-time communication experiences across diverse network environments.