WebRTC STUN vs TURN Servers

In our Introduction to WebRTC Module two, we covered the basics of STUN and TURN servers. This lesson will delve deeper into their functionalities, implementation strategies, and best practices for effectively balancing STUN and TURN usage in production WebRTC applications.

The Challenge: NAT and Firewalls in WebRTC

Before exploring STUN and TURN in depth, it's essential to understand the core networking challenge they solve. Most devices on the internet today operate behind Network Address Translation (NAT) systems and firewalls. These technologies provide security and address conservation but create significant obstacles for peer-to-peer communication - the foundation of WebRTC.

Network Address Translation (NAT)

NAT is a technique used by routers to allow multiple devices on a private network to share a single public IP address. While efficient for outbound connections, NAT creates a fundamental challenge for WebRTC:

Devices on private networks are assigned private IP addresses (e.g., 192.168.1.x) that aren't directly accessible from the internet
The router performs address translation, masking the internal IP with its public IP when traffic goes out
Return traffic must be properly mapped back to the correct internal device
Incoming connection requests have no established mapping and are typically blocked

This creates the "NAT traversal problem" - how can two peers behind different NATs establish a direct connection when neither can initiate a connection to the other's private IP?
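The mapping behavior described above can be sketched with a toy model. This is a simplified illustration, not how any particular router is implemented: the `ToyNat` class, its starting port, and the example addresses are all hypothetical, and real NATs also track protocols, timeouts, and filtering rules.

```javascript
// Toy model of a NAT mapping table (hypothetical; for illustration only).
class ToyNat {
  constructor(publicIp) {
    this.publicIp = publicIp;
    this.nextPort = 40000;      // arbitrary starting port for new mappings
    this.mappings = new Map();  // public port -> "privateIp:privatePort"
  }

  // Outbound packet: create (or reuse) a mapping and return the public endpoint.
  sendOutbound(privateIp, privatePort) {
    const key = `${privateIp}:${privatePort}`;
    for (const [port, internal] of this.mappings) {
      if (internal === key) return { ip: this.publicIp, port };
    }
    const port = this.nextPort++;
    this.mappings.set(port, key);
    return { ip: this.publicIp, port };
  }

  // Inbound packet: deliver only if a mapping exists for the public port.
  receiveInbound(publicPort) {
    return this.mappings.get(publicPort) ?? null; // null means the packet is dropped
  }
}

const nat = new ToyNat('203.0.113.7');
const mapped = nat.sendOutbound('192.168.1.5', 5000);
console.log(mapped);                          // { ip: '203.0.113.7', port: 40000 }
console.log(nat.receiveInbound(mapped.port)); // 192.168.1.5:5000
console.log(nat.receiveInbound(50000));       // null
```

The last call models an unsolicited incoming connection: with no prior outbound traffic there is no mapping, so the packet never reaches an internal device.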

Understanding STUN in Depth

STUN (Session Traversal Utilities for NAT) provides a solution to the NAT traversal problem by helping devices discover their public-facing network information.

How STUN Works

Public IP Discovery: A WebRTC client sends a request to a STUN server on the public internet
NAT Mapping Analysis: The STUN server examines the request and determines the public IP address and port from which the request originated
Response Generation: The STUN server sends back this information to the client
Exchange via Signaling: WebRTC peers exchange their STUN-discovered addresses through the signaling channel
Connection Attempt: Peers attempt to establish direct connections using the discovered public endpoints
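To make the first three steps concrete, here is a minimal sketch of the wire format involved, assuming the STUN message layout from RFC 8489. Browsers do all of this internally; the function names below are illustrative, and only the IPv4 form of the XOR-MAPPED-ADDRESS attribute is handled.

```javascript
const MAGIC_COOKIE = 0x2112a442; // fixed value defined by the STUN specification

// Build a 20-byte STUN Binding Request header: type, length, cookie, transaction ID.
function buildBindingRequest(transactionId) {
  const buf = new Uint8Array(20);
  const view = new DataView(buf.buffer);
  view.setUint16(0, 0x0001);       // message type: Binding Request
  view.setUint16(2, 0);            // message length: no attributes
  view.setUint32(4, MAGIC_COOKIE);
  buf.set(transactionId, 8);       // 12-byte transaction ID
  return buf;
}

// Decode the 8-byte value of an IPv4 XOR-MAPPED-ADDRESS attribute.
function decodeXorMappedAddressV4(value) {
  const view = new DataView(value.buffer, value.byteOffset, value.byteLength);
  if (view.getUint8(1) !== 0x01) throw new Error('only IPv4 is handled in this sketch');
  // The port is XORed with the top 16 bits of the cookie, the address with the whole cookie.
  const port = view.getUint16(2) ^ (MAGIC_COOKIE >>> 16);
  const addr = (view.getUint32(4) ^ MAGIC_COOKIE) >>> 0;
  const ip = [addr >>> 24, (addr >>> 16) & 0xff, (addr >>> 8) & 0xff, addr & 0xff].join('.');
  return { ip, port };
}

// Math.random keeps the sketch dependency-free; real code should use a CSPRNG.
const transactionId = Uint8Array.from({ length: 12 }, () => Math.floor(Math.random() * 256));
const request = buildBindingRequest(transactionId);

// Attribute value as a server would encode 203.0.113.7:40000.
const attributeValue = new Uint8Array([0x00, 0x01, 0xbd, 0x52, 0xea, 0x12, 0xd5, 0x45]);
console.log(decodeXorMappedAddressV4(attributeValue)); // { ip: '203.0.113.7', port: 40000 }
```

The decoded endpoint is what the client would then share with its peer over the signaling channel as a server-reflexive candidate.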

STUN Server Implementation

A WebRTC client does not implement a STUN server itself; it only needs to point the peer connection at one or more STUN servers:

javascript

// Example configuration of STUN servers in a WebRTC application
const configuration = {
  iceServers: [
    {
      urls: [
        'stun:stun1.l.google.com:19302',
        'stun:stun2.l.google.com:19302',
      ]
    }
  ],
  iceCandidatePoolSize: 10,
};

// Create peer connection with this configuration
const peerConnection = new RTCPeerConnection(configuration);

STUN Success Rates

STUN is often effective, but recent real-world data provides a more nuanced picture of connectivity patterns. In open-internet consumer traffic, direct connections (host + server-reflexive) succeed in roughly 75-80% of sessions; the remaining 20-25% require a relay. These figures come from Chrome's User Metrics Analysis (UMA) data, so they reflect real-world conditions rather than lab tests.

NAT Types and STUN Compatibility

NAT Type	Description	STUN Success Rate
Full Cone	Most permissive; maps an internal IP:port to a public IP:port for all external hosts	High (90%+)
Restricted Cone (Address & Port)	Only allows incoming packets from IPs/ports that internal clients have previously sent outbound packets to	Moderate (70-80%)
Symmetric NAT	Creates a unique mapping for each internal IP:port to destination IP:port combination	Low (20-40%)

Note: Most consumer routers today behave as port-restricted cone NATs. Carrier-grade NATs (CGNs) used by mobile providers and some ISPs are almost always symmetric NATs, which explains why mobile networks often require TURN relays.
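The difference between cone and symmetric behavior is easiest to see in how the mapping table is keyed. The sketch below is a hypothetical model (the `makeNat` helper, port numbers, and addresses are made up for illustration) showing why a STUN-discovered port is reusable behind a cone NAT but not behind a symmetric one.

```javascript
// Hypothetical model: returns a function that yields the public port a NAT assigns to a flow.
function makeNat(kind) {
  let nextPort = 50000;
  const table = new Map();
  return (internal, destination) => {
    // Cone NATs key the mapping on the internal endpoint only;
    // symmetric NATs key it on the internal endpoint plus the destination.
    const key = kind === 'symmetric' ? `${internal}->${destination}` : internal;
    if (!table.has(key)) table.set(key, nextPort++);
    return table.get(key);
  };
}

const cone = makeNat('cone');
const symmetric = makeNat('symmetric');

const conePortSeenByStun = cone('192.168.1.5:5000', 'stun.example.org:3478');
const conePortSeenByPeer = cone('192.168.1.5:5000', '198.51.100.9:6000');
const symPortSeenByStun = symmetric('192.168.1.5:5000', 'stun.example.org:3478');
const symPortSeenByPeer = symmetric('192.168.1.5:5000', '198.51.100.9:6000');

console.log(conePortSeenByStun === conePortSeenByPeer); // true
console.log(symPortSeenByStun === symPortSeenByPeer);   // false
```

Behind the symmetric NAT, the port the STUN server reports is not the port the remote peer would need to reach, which is why such sessions tend to fall back to a relay.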

When STUN Isn't Enough: TURN as a Fallback

TURN (Traversal Using Relays around NAT) servers provide a fallback mechanism when direct peer-to-peer connections cannot be established with STUN. Unlike STUN, which merely facilitates the discovery of network information, TURN actually relays media data between peers.

How TURN Works

Client Authentication: A WebRTC client authenticates with a TURN server (typically using username/password credentials)
Allocation Request: The client requests the allocation of resources on the TURN server
Relay Address Assignment: The TURN server assigns a public relay address that can be shared with peers
Media Relay: All media traffic is sent to the TURN server, which then forwards it to the destination peer

TURN Server Implementation

javascript

// Example configuration including both STUN and TURN servers
const configuration = {
  iceServers: [
    {
      urls: ['stun:stun1.l.google.com:19302', 'stun:stun2.l.google.com:19302']
    },
    {
      urls: ['turn:your-turn-server.com:443'],
      username: 'username',
      credential: 'credential'
      // This is using a long-term HMAC credential
    }
  ],
  iceTransportPolicy: 'all', // default: consider host, srflx, and relay candidates
  iceCandidatePoolSize: 10
};

// Create peer connection with this configuration
const peerConnection = new RTCPeerConnection(configuration);

Understanding the Cost of TURN Relaying

While TURN servers ensure near-universal connectivity, they come with significant tradeoffs:

Increased Latency: Adding a relay server into the communication path introduces additional network hops and processing delays
Bandwidth Costs: TURN servers must handle all media traffic between peers, leading to significant bandwidth consumption
Infrastructure Requirements: Running reliable TURN servers requires robust infrastructure and careful capacity planning
Quality of Service Impact: TURN over TCP/TLS reduces QoS because TCP's reliable, in-order delivery causes head-of-line blocking and hides packet loss from WebRTC's congestion controller; video throughput can stall, especially in congested networks

A properly designed WebRTC application should use TURN as a last resort, only falling back to it when direct connections cannot be established.

The ICE Framework: Coordinating STUN and TURN

Interactive Connectivity Establishment (ICE) is the framework that orchestrates the use of STUN and TURN in WebRTC applications. ICE systematically attempts different connection methods, prioritizing direct connections when possible and falling back to relayed connections when necessary.

ICE Candidate Generation and Processing

Gathering Candidates: The WebRTC client collects various address candidates:
- Local addresses (from network interfaces)
- Server-reflexive addresses (obtained via STUN)
- Relay addresses (obtained via TURN)
Candidate Exchange: Candidates are exchanged between peers via the signaling channel
Connectivity Checks: The ICE algorithm performs connectivity checks between candidate pairs
Connection Selection: ICE selects the best working connection, prioritizing direct paths over relayed ones
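The preference for direct paths in the last step comes from candidate priorities. A minimal sketch of the priority formula, assuming the recommended type preferences from RFC 8445 (browsers may use different local preference values than the default shown here):

```javascript
// Recommended type preferences from RFC 8445; higher means more preferred.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(type, localPreference = 65535, componentId = 1) {
  return TYPE_PREFERENCE[type] * 2 ** 24 + localPreference * 2 ** 8 + (256 - componentId);
}

console.log(candidatePriority('host'));  // 2130706431
console.log(candidatePriority('srflx')); // 1694498815
console.log(candidatePriority('relay')); // 16777215

// Sorting by priority puts direct candidates ahead of relayed ones.
const ordered = ['relay', 'host', 'srflx']
  .sort((a, b) => candidatePriority(b) - candidatePriority(a));
console.log(ordered); // [ 'host', 'srflx', 'relay' ]
```

Because the type preference occupies the most significant bits, a relay candidate always ranks below host and server-reflexive candidates regardless of the other terms.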

javascript

// Listening for ICE candidates in a WebRTC implementation
peerConnection.addEventListener('icecandidate', event => {
  if (event.candidate) {
    // Send this candidate to the remote peer via signaling channel
    signalingChannel.send({ 'candidate': event.candidate });

    // Log candidate type for monitoring (using regex for consistent extraction)
    const typeMatch = event.candidate.candidate.match(/ typ ([a-z]+)/);
    const candidateType = typeMatch && typeMatch[1] ? typeMatch[1] : 'unknown';
    console.log(`Generated candidate of type: ${candidateType}`);
  }
});

// Monitoring ICE connection state
peerConnection.addEventListener('iceconnectionstatechange', event => {
  console.log(`ICE connection state: ${peerConnection.iceConnectionState}`);

  // Detailed logging for production monitoring
  if (peerConnection.iceConnectionState === 'connected' || 
      peerConnection.iceConnectionState === 'completed') {

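    // Get active connection information
    peerConnection.getStats().then(stats => {
      stats.forEach(report => {
        // Several pairs can succeed; the nominated one carries the media
        if (report.type === 'candidate-pair' && report.state === 'succeeded' && report.nominated) {
          // Candidate types live on the linked local/remote candidate reports
          const local = stats.get(report.localCandidateId);
          const remote = stats.get(report.remoteCandidateId);
          console.log('Connection established using candidate types: ' +
            `Local: ${local ? local.candidateType : 'unknown'}, ` +
            `Remote: ${remote ? remote.candidateType : 'unknown'}`);
        }
      });
    });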
  }
});

IPv6 and Happy-Eyeballs in ICE

The transition to IPv6 has significant implications for WebRTC NAT traversal. While IPv4 addresses are typically behind NAT due to address scarcity, IPv6 addresses are often globally routable without NAT.

IPv6 Host Candidates

Modern WebRTC implementations gather both IPv4 and IPv6 candidates:

Direct Connectivity: IPv6 host candidates can often connect directly without NAT traversal
Firewall Challenges: Despite no NAT, IPv6 traffic is frequently blocked by corporate firewalls
Uneven Support: TURN-IPv6 support varies across implementations and service providers
Connection Management: IPv6 host candidates bypass NAT but still need ICE consent-freshening; some middleboxes drop long-idle v6 flows

ICE Happy-Eyeballs

Similar to the Happy Eyeballs algorithm used for dual-stack TCP connections (RFC 8305), WebRTC employs a strategy to efficiently select between IPv4 and IPv6 paths:

Parallel Connectivity Checks: ICE attempts both IPv4 and IPv6 paths simultaneously
Race Conditions: The fastest working connection typically wins
Address Family Prioritization: Some browsers assign IPv6 candidates a higher priority, depending on the network environment

javascript

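// Example: Log address families during ICE gathering
peerConnection.addEventListener('icecandidate', event => {
  if (event.candidate) {
    // Identify the address family from the connection-address field (5th field).
    // The string always starts with "candidate:", so testing the whole line for ':' is always true.
    const candidateStr = event.candidate.candidate;
    const address = candidateStr.split(' ')[4] || '';
    const addrFamily = address.endsWith('.local') ? 'mDNS'
      : address.includes(':') ? 'IPv6' : 'IPv4';
    // Guard against a missing match instead of indexing into null
    const typeMatch = candidateStr.match(/ typ ([a-z]+)/);
    const candidateType = typeMatch ? typeMatch[1] : 'unknown';

    console.log(`Generated ${addrFamily} ${candidateType} candidate`);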
  }
});

Practical Implications

Dual-Stack Deployment: TURN servers should support both IPv4 and IPv6 for maximum compatibility
Mobile Networks: Many mobile carriers deploy IPv6-only networks internally with NAT64 for IPv4 communication
Fallback Patterns: Even when IPv6 host candidates are available, TURN-UDP/IPv6 remains useful when direct IPv6 paths are blocked

TCP/TLS vs. UDP Relays

TURN supports multiple transport protocols, each with distinct trade-offs:

Protocol	Advantages	Disadvantages	Best Use Cases
TURN-UDP	Lowest latency; best media quality; congestion control works properly	Blocked in some corporate networks; may require fallbacks	Default choice when available
TURN-TCP	Higher connection success rate; works in most networks	Head-of-line blocking; higher latency; less efficient for media	Fallback when UDP is blocked
TURN-TLS (over TCP, port 443)	Highest success rate; firewall-friendly; looks like HTTPS traffic	Highest latency; TLS handshake overhead; CPU cost on server	Last resort for strict networks

Protocol Selection Strategy

For optimal performance and connectivity:

Prioritize UDP: Configure ICE to try UDP first for best media quality
Fallback Chain: If UDP fails, try TCP, then TLS as a last resort
Multiple Server Configuration: Provide multiple TURN server options with different protocols

javascript

// Example: Configuring multiple TURN transport options
const configuration = {
  iceServers: [
    { urls: 'stun:stun.example.org' },
    { 
      urls: 'turn:turn.example.org:3478?transport=udp',
      username: 'user',
      credential: 'cred'
    },
    { 
      urls: 'turn:turn.example.org:3478?transport=tcp',
      username: 'user',
      credential: 'cred'
    },
    { 
      urls: 'turns:turn.example.org:443',  // TLS over TCP
      username: 'user',
      credential: 'cred'
    }
  ]
};

Best Practices for STUN and TURN Deployment

Deploying STUN and TURN servers for production WebRTC applications requires careful planning and implementation. Here are best practices to ensure reliable, secure, and efficient operation.

Server Reliability and Scalability

1. Geographic Distribution

Deploy STUN and TURN servers across multiple geographic regions to:

Reduce latency by connecting users to the nearest server
Improve resilience through redundancy
Balance load across infrastructure

This geographic distribution is a key component of a global edge network strategy, allowing services to be delivered as close to users as possible regardless of their location.

2. Scalable Resource Management

TURN servers in particular face variable resource demands:

Implement dynamic scaling based on metrics like connected users and bandwidth utilization
Consider containerization for flexible deployment and scaling
Establish monitoring to track resource usage patterns and anticipate scaling needs

3. Bandwidth Management

TURN servers relay media traffic, making bandwidth a critical resource:

Implement bandwidth limiting policies per user/session
Optimize relay protocols for efficient bandwidth usage
Consider network provider costs in different regions

4. High Availability Design

To ensure reliable service:

Implement server redundancy with failover mechanisms
Use load balancers to distribute traffic across server instances
Design for graceful degradation during partial outages

Security Implementation

1. Authentication and Authorization

TURN servers must implement robust access controls:

Use time-limited credentials following the TURN protocol specifications
Implement username and password authentication
Consider integrating with existing authentication systems

javascript

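// Example: Generating time-limited TURN credentials (server side, Node.js).
// Uses the shared-secret scheme supported by coturn's use-auth-secret mode:
// username = "<expiry>:<userId>", credential = base64(HMAC-SHA1(secret, username))
const crypto = require('crypto');

function generateTurnCredentials(userId, secretKey, ttlSeconds = 86400) {
  const expiresAt = Math.floor(Date.now() / 1000) + ttlSeconds;
  const username = `${expiresAt}:${userId}`;
  const credential = crypto.createHmac('sha1', secretKey).update(username).digest('base64');
  return { username, credential, ttl: ttlSeconds };
}

// On client side (turnCredentials is the object returned by your backend):
const turnConfig = {
  urls: ['turn:your-turn-server.com:443'],
  username: turnCredentials.username,
  credential: turnCredentials.credential
};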

2. Transport Security

Protect STUN and TURN traffic:

Use TURN-TLS on port 443 for firewall traversal (WebRTC traffic itself is already DTLS-SRTP)
Configure proper certificate validation
Remember that DTLS is mandatory for all WebRTC media transport, regardless of TURN protocol

3. Rate Limiting and Abuse Prevention

Prevent service abuse:

Implement rate limiting for STUN/TURN requests
Monitor for unusual traffic patterns
Set reasonable allocation quotas per user
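One common way to implement per-client rate limiting is a token bucket. The sketch below is an assumption about how such a limiter could look in front of a TURN allocation endpoint, not part of any TURN server's API; the class, the `allowAllocation` helper, and the capacity and refill values are all illustrative.

```javascript
// Hypothetical per-client token bucket; capacity and refill rate are illustrative values.
class TokenBucket {
  constructor(capacity, refillPerSecond, nowMs = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // Returns true if the request is allowed, false if it should be rejected.
  tryConsume(nowMs = Date.now()) {
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

const buckets = new Map(); // client IP -> TokenBucket

// Allow bursts of 5 allocation requests per client, refilling at 1 request per second.
function allowAllocation(clientIp, nowMs = Date.now()) {
  if (!buckets.has(clientIp)) buckets.set(clientIp, new TokenBucket(5, 1, nowMs));
  return buckets.get(clientIp).tryConsume(nowMs);
}
```

Passing the clock in as a parameter keeps the limiter deterministic under test; a production version would also need to evict idle buckets so the map does not grow without bound.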

4. Regular Security Updates

Maintain server security:

Keep server software updated to address vulnerabilities
Regularly audit server configurations for security issues
Follow security best practices for the hosting environment

TURN Cost Model Worksheet

Understanding the cost implications of TURN usage is critical for planning WebRTC deployments at scale.

Bandwidth Consumption Calculation

For a typical high-quality video call:

Video Bitrate: ~1 Mbps (720p30) = 450 MB/hour, or about 324 GB/month if running 24/7
Audio Bitrate: ~50 Kbps = 16.2 GB/month if running 24/7
TURN Usage Factor: ~20-25% of all calls require TURN
Average Call Duration: Varies by application (e.g., 30 minutes for meetings)

Cost Estimation Example

For a service with 10,000 monthly active users:

Assumptions:
- Average call duration: 60 minutes/month/user
- Video bitrate: 1 Mbps
- TURN usage rate: 25%
- Cloud egress cost: $0.08/GB (varies by region)

Calculation:
- Data per user-hour: 1 Mbps × 60 min × 60 s/min ÷ 8 bits/byte = 450 MB (per user-hour)
- Monthly TURN data: 450 MB × 10,000 users × 25% TURN rate = 1,125 GB
- Monthly cost: 1,125 GB × $0.08/GB = $90

Per-user cost: $90 ÷ 10,000 = $0.009 per user per month
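The worksheet above translates directly into a small calculator. This is a sketch under the same assumptions as the example (decimal gigabytes, a single bitrate, and a flat egress price); the function and parameter names are made up for illustration.

```javascript
// Estimate monthly TURN egress cost from the worksheet inputs.
function estimateTurnCost({ users, minutesPerUser, bitrateMbps, turnRate, costPerGb }) {
  const mbPerUser = (bitrateMbps * minutesPerUser * 60) / 8; // megabits -> megabytes
  const relayedGb = (mbPerUser * users * turnRate) / 1000;   // decimal GB, as in the worksheet
  const monthlyCost = relayedGb * costPerGb;
  return { mbPerUser, relayedGb, monthlyCost, costPerUser: monthlyCost / users };
}

const estimate = estimateTurnCost({
  users: 10000,
  minutesPerUser: 60,
  bitrateMbps: 1,
  turnRate: 0.25,
  costPerGb: 0.08,
});

console.log(`${estimate.relayedGb} GB relayed, $${estimate.monthlyCost.toFixed(2)} per month`);
```

Changing `minutesPerUser` or `turnRate` shows how quickly the total moves for heavier usage patterns such as contact centers.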

For applications with higher usage patterns (e.g., contact centers, all-day meetings), these costs scale accordingly. At high volumes, self-hosted TURN infrastructure often becomes more economical than cloud services.

Monitoring and Troubleshooting

Key Metrics to Monitor

For effective operation of STUN and TURN services:

Connection Success Rates
- Track the percentage of sessions using STUN vs. TURN
- Monitor ICE connection establishment times
- Measure connection failure rates
Resource Utilization
- Bandwidth consumption per server
- CPU and memory usage
- Active allocations and connections
Geographic Distribution
- Connection quality by region
- Server utilization across locations
- Regional failure patterns
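The connection metrics listed above can be derived from per-session records collected by the client. A minimal sketch, assuming each record has the hypothetical shape `{ connected, selectedType, connectMs }` as reported by your own analytics pipeline:

```javascript
// Summarize session records into failure rate, relay (TURN) rate, and median connect time.
function summarizeSessions(sessions) {
  const connected = sessions.filter(s => s.connected);
  const relayed = connected.filter(s => s.selectedType === 'relay');
  const times = connected.map(s => s.connectMs).sort((a, b) => a - b);
  const mid = Math.floor(times.length / 2);
  const medianConnectMs = times.length === 0 ? null
    : times.length % 2 ? times[mid] : (times[mid - 1] + times[mid]) / 2;
  return {
    failureRate: sessions.length ? (sessions.length - connected.length) / sessions.length : 0,
    relayRate: connected.length ? relayed.length / connected.length : 0,
    medianConnectMs,
  };
}

const summary = summarizeSessions([
  { connected: true, selectedType: 'host', connectMs: 200 },
  { connected: true, selectedType: 'srflx', connectMs: 400 },
  { connected: true, selectedType: 'relay', connectMs: 900 },
  { connected: true, selectedType: 'srflx', connectMs: 300 },
  { connected: false },
]);
console.log(summary); // { failureRate: 0.2, relayRate: 0.25, medianConnectMs: 350 }
```

Computing the relay rate over connected sessions only keeps it comparable with the 20-25% figure quoted earlier, which describes sessions that did connect.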

Troubleshooting Common Issues

High TURN Usage Rates
- Investigate network configurations causing STUN failures
- Check for restrictive firewalls blocking UDP
- Consider if corporate networks are blocking traffic
Excessive Latency
- Analyze network paths between clients and servers
- Check for geographic mismatches in server selection
- Monitor for server overload conditions
Connection Failures
- Examine ICE candidate generation and exchange
- Verify server reachability from client networks
- Check authentication and credential issues

javascript

// Example: Monitoring code for TURN server usage
peerConnection.addEventListener('icegatheringstatechange', () => {
  if (peerConnection.iceGatheringState === 'complete') {
    let candidateTypes = {
      host: 0,
      srflx: 0, // STUN reflexive
      relay: 0  // TURN relay
    };

    // Count candidate types
    const candidates = peerConnection.localDescription.sdp.split('\r\n')
      .filter(line => line.indexOf('a=candidate:') === 0);

    candidates.forEach(candidate => {
      // Safer extraction of candidate type using regex
      const typeMatch = candidate.match(/ typ ([a-z]+)/);
      if (typeMatch && typeMatch[1]) {
        const type = typeMatch[1];
        if (type === 'host') candidateTypes.host++;
        else if (type === 'srflx') candidateTypes.srflx++;
        else if (type === 'relay') candidateTypes.relay++;
      }
    });

    // Log or send to analytics
    console.log('ICE Candidate Types:', candidateTypes);

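  }
});

// Monitor the selected candidate pair once the connection is established.
// Registered at top level so it is attached once and cannot miss an early 'connected'.
peerConnection.addEventListener('iceconnectionstatechange', () => {
  if (peerConnection.iceConnectionState === 'connected') {
    peerConnection.getStats().then(stats => {
      // Find the active candidate pair
      stats.forEach(report => {
        // `selected` is Firefox-only; `nominated` + `succeeded` works across browsers
        if (report.type === 'candidate-pair' &&
            (report.selected || (report.nominated && report.state === 'succeeded'))) {
          console.log('Selected candidate pair:', report);
          // Log or send to analytics
        }
      });
    });
  }
});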

QoS & Congestion Control Interplay

The interaction between TURN relays and WebRTC's congestion control mechanisms has important implications for media quality.

Impact on Congestion Control

RTT Measurement: Relayed paths hide the true round-trip time between peers
Bandwidth Estimation: Google Congestion Control (GCC) depends on accurate RTT and packet arrival patterns
Adaptation Lag: TURN relays can delay or distort congestion signals

Optimization Strategies

For better media quality over TURN:

Regional Proximity: Deploy TURN servers in the same region as at least one peer to keep RTT < 150ms
Protocol Selection: Prefer UDP over TCP/TLS when possible for more accurate congestion feedback
Bandwidth Caps: Consider explicit bandwidth limits for TURN sessions to prevent quality oscillation

javascript

// Example: Setting explicit bitrate limits for relayed connections
peerConnection.addEventListener('iceconnectionstatechange', async () => {
  if (peerConnection.iceConnectionState === 'connected') {
    const stats = await peerConnection.getStats();
    let usingRelay = false;

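    stats.forEach(report => {
      // `selected` is Firefox-only; elsewhere use the nominated, succeeded pair
      if (report.type === 'candidate-pair' &&
          (report.selected || (report.nominated && report.state === 'succeeded'))) {
        // Candidate types are on the linked candidate reports, not on the pair itself
        const local = stats.get(report.localCandidateId);
        const remote = stats.get(report.remoteCandidateId);
        if ((local && local.candidateType === 'relay') ||
            (remote && remote.candidateType === 'relay')) {
          usingRelay = true;
        }
      }
    });

    // If using a relay, apply more conservative bandwidth limits
    if (usingRelay) {
      // A sender's track can be null, so guard before reading kind
      const sender = peerConnection.getSenders().find(s => s.track && s.track.kind === 'video');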
      if (sender) {
        const params = sender.getParameters();
        params.encodings.forEach(encoding => {
          // Apply more conservative limit for relayed connections
          encoding.maxBitrate = 800000; // 800 kbps
        });
        await sender.setParameters(params);
      }
    }
  }
});

IPv4 Exhaustion & Aggressive Peer-Reflexive Pruning

Recent WebRTC implementations have adopted strategies to reduce the pressure on NAT binding tables:

Chrome's Candidate Pruning

Starting with Chrome 123:

Idle Timeout: Server-reflexive (srflx) candidates are dropped after 1 minute of inactivity
Impact: Long-lived data channels may experience connectivity issues after periods of inactivity
Mitigation: Implement application-level keepalive for persistent connections

Configuration Adjustments

For applications requiring long-lived connections:

javascript

// Example: Configure ICE candidate policy
const configuration = {
  iceServers: [ /* ... */ ],
  iceCandidatePoolSize: 10,
  iceTransportPolicy: 'all', // or 'relay' for TURN-only
  // Some browsers may support additional ICE configuration options
  // for keepalive behavior or timeout values
};

Practical Recommendations

Connection Monitoring: Implement connection state monitoring to detect dropped connections
Periodic Activity: Send small keepalive packets over data channels during idle periods
Reconnection Logic: Develop robust reconnection mechanisms for interrupted sessions
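The keepalive recommendation can be sketched as follows. This is one possible approach rather than a browser-provided feature: the helper names, the message shape, and the 15-second idle threshold are all assumptions, and `channel` is expected to behave like an `RTCDataChannel`.

```javascript
// Sends a small ping if the channel is open and has been idle for too long.
// Returns true when a ping was sent.
function sendKeepaliveIfIdle(channel, lastActivityMs, nowMs, idleThresholdMs = 15000) {
  if (channel.readyState !== 'open') return false;
  if (nowMs - lastActivityMs < idleThresholdMs) return false;
  channel.send(JSON.stringify({ type: 'keepalive', ts: nowMs }));
  return true;
}

// Wires the check to a timer and tracks inbound activity; returns a function that stops it.
function startKeepalive(channel, intervalMs = 5000) {
  let lastActivityMs = Date.now();
  channel.addEventListener('message', () => { lastActivityMs = Date.now(); });
  const timer = setInterval(() => {
    if (sendKeepaliveIfIdle(channel, lastActivityMs, Date.now())) {
      lastActivityMs = Date.now();
    }
  }, intervalMs);
  channel.addEventListener('close', () => clearInterval(timer));
  return () => clearInterval(timer);
}
```

In an application you would call `startKeepalive(dataChannel)` once the channel opens, and have the receiving side ignore messages whose `type` is `keepalive`.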

mDNS ICE Candidates

Modern browsers implement privacy-enhancing techniques for local IP address discovery:

mDNS Obfuscation

Local IP Privacy: Instead of exposing local IPs like 192.168.1.5, browsers generate random mDNS names (e.g., abc123.local)
Impact on Debugging: Makes troubleshooting more challenging as raw IPs aren't visible
Local Network Only: This technique only affects host candidates, not STUN/TURN
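Debugging Access: The real private IP is intentionally hidden from JavaScript, so it does not appear in the candidate's relatedAddress or relatedPort either. In Chromium, host candidates typically carry raw IPs again once the page has been granted camera or microphone permission, which can be useful for debugging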

Identifying mDNS Candidates

javascript

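// Example: Detecting mDNS candidates
peerConnection.addEventListener('icecandidate', event => {
  if (event.candidate) {
    const candidateStr = event.candidate.candidate;
    // Check the connection-address field (5th field) rather than the whole line
    const address = candidateStr.split(' ')[4] || '';
    const isMDNS = address.endsWith('.local');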

    if (isMDNS) {
      console.log('mDNS candidate detected:', candidateStr);
    }
  }
});

Implementation Considerations

Local Testing: mDNS resolution only works on the local network, affecting local development
Logging Patterns: Update logging and monitoring to handle mDNS format
Compatibility: Some older clients might not understand mDNS candidates
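For the logging point above, one option is to classify the address field of each candidate line before it reaches your logs. The helper below is a hypothetical sketch that only inspects the fifth field of the candidate string; the sample lines are made up.

```javascript
// Classify the connection-address field of a candidate line for logging.
function classifyCandidateAddress(candidateLine) {
  const fields = candidateLine.replace(/^a=/, '').split(' ');
  const address = fields[4] || '';
  if (address.endsWith('.local')) return 'mdns';
  if (address.includes(':')) return 'ipv6';
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(address)) return 'ipv4';
  return 'unknown';
}

const samples = [
  'candidate:1 1 udp 2113937151 3f2a9c1e-5b7d-4e8a-9c0f-1a2b3c4d5e6f.local 54321 typ host',
  'candidate:2 1 udp 1694498815 203.0.113.7 40000 typ srflx raddr 0.0.0.0 rport 0',
  'a=candidate:3 1 udp 2130706431 2001:db8::1 50000 typ host',
];
console.log(samples.map(classifyCandidateAddress)); // [ 'mdns', 'ipv4', 'ipv6' ]
```

Recording the class instead of the raw address keeps dashboards consistent whether or not the browser obfuscated the host candidate.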

Self-Hosting vs. Third-Party Services

When implementing STUN and TURN for WebRTC applications, organizations face a critical decision: build and maintain their own infrastructure or leverage third-party services.

Self-Hosting Considerations

Advantages:

Complete control over infrastructure and configurations
Potentially lower long-term costs for high-volume applications
Ability to customize for specific requirements
No dependency on external service providers

Challenges:

Requires significant expertise to deploy and maintain
Necessitates global infrastructure for optimal performance
Demands ongoing monitoring and maintenance
Requires scaling infrastructure with usage growth

Third-Party STUN/TURN Services

Advantages:

Reduced implementation complexity
Global infrastructure already in place
Managed scaling and maintenance
Often includes additional features like analytics

Challenges:

Recurring costs that scale with usage
Less control over specific configurations
Potential privacy and compliance considerations
Dependency on service provider reliability

Hybrid Approaches

Many organizations adopt a hybrid approach:

Use public STUN servers (like Google's) which are free and reliable
Deploy private TURN servers for sensitive internal applications
Leverage third-party TURN services for global coverage

Conclusion

STUN and TURN servers are essential components in the WebRTC infrastructure, working together to overcome the challenges posed by NATs and firewalls. While STUN facilitates direct peer-to-peer connections in many scenarios, TURN provides critical fallback capabilities when direct connections aren't possible.

Effective implementation requires:

Understanding the underlying network challenges
Properly configuring ICE to coordinate STUN and TURN usage
Deploying reliable, scalable, and secure server infrastructure
Monitoring usage patterns and connection success rates
Balancing performance, cost, and reliability requirements

By following these best practices, developers can build WebRTC applications that deliver reliable real-time communication experiences across diverse network environments.
