
WebRTC For The Brave

Learn STUN & TURN Servers on WebRTC

In this lesson, you'll learn how STUN and TURN work under the hood in WebRTC.

In Module 2 of our Introduction to WebRTC, we covered the basics of STUN and TURN servers. This lesson goes deeper into how they work, how to deploy them, and how to balance STUN and TURN usage in production WebRTC applications.

The Challenge: NAT and Firewalls in WebRTC

Before exploring STUN and TURN in depth, it's essential to understand the core networking challenge they solve. Most devices on the internet today operate behind Network Address Translation (NAT) systems and firewalls. These technologies provide security and address conservation but create significant obstacles for peer-to-peer communication - the foundation of WebRTC.

Network Address Translation (NAT)

NAT is a technique used by routers to allow multiple devices on a private network to share a single public IP address. While efficient for outbound connections, NAT creates a fundamental challenge for WebRTC:

  • Devices on private networks are assigned private IP addresses (e.g., 192.168.1.x) that aren't directly accessible from the internet
  • The router performs address translation, masking the internal IP with its public IP when traffic goes out
  • Return traffic must be properly mapped back to the correct internal device
  • Incoming connection requests have no established mapping and are typically blocked

This creates the "NAT traversal problem" - how can two peers behind different NATs establish a direct connection when neither can initiate a connection to the other's private IP?
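
To make the problem concrete, the JavaScript sketch below models one common NAT behavior: outbound packets create a mapping, and only replies from the exact remote IP and port the inside host contacted are let back in. It is a toy under stated assumptions (one public IP, sequential port allocation, no binding timeouts), and the ToyNat class is purely illustrative, not how any particular router is implemented.

```javascript
// Toy model of a NAT with endpoint-independent mapping and
// address-and-port-dependent filtering (port-restricted cone behavior).
// Addresses and the port-allocation scheme are illustrative assumptions.
class ToyNat {
  constructor(publicIp) {
    this.publicIp = publicIp;
    this.nextPort = 40000;       // first public port handed out
    this.mappings = new Map();   // "privIp:privPort" -> public port
    this.permitted = new Set();  // "publicPort|remoteIp:remotePort" seen outbound
  }

  // Outbound packet: create (or reuse) a mapping and remember the destination
  outbound(privIp, privPort, remoteIp, remotePort) {
    const key = `${privIp}:${privPort}`;
    if (!this.mappings.has(key)) this.mappings.set(key, this.nextPort++);
    const publicPort = this.mappings.get(key);
    this.permitted.add(`${publicPort}|${remoteIp}:${remotePort}`);
    return { ip: this.publicIp, port: publicPort }; // what the remote side sees
  }

  // Inbound packet: deliver only if an earlier outbound packet permitted this sender
  inbound(publicPort, remoteIp, remotePort) {
    if (!this.permitted.has(`${publicPort}|${remoteIp}:${remotePort}`)) return null; // dropped
    for (const [priv, port] of this.mappings) {
      if (port === publicPort) return priv; // forwarded to this private endpoint
    }
    return null;
  }
}

// Usage:
// const nat = new ToyNat('198.51.100.10');
// nat.outbound('192.168.1.5', 5000, '203.0.113.9', 3478); // -> { ip: '198.51.100.10', port: 40000 }
// nat.inbound(40000, '203.0.113.9', 3478);                 // -> '192.168.1.5:5000' (reply allowed)
// nat.inbound(40000, '203.0.113.77', 6000);                // -> null (unsolicited, dropped)
```

A peer that the inside host has never contacted, such as 203.0.113.77 in the usage comments, gets nothing through; that is exactly the situation two NATed WebRTC peers start in.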

Understanding STUN in Depth

STUN (Session Traversal Utilities for NAT) provides a solution to the NAT traversal problem by helping devices discover their public-facing network information.

How STUN Works

  1. Public IP Discovery: A WebRTC client sends a Binding request to a STUN server on the public internet
  2. NAT Mapping Analysis: The STUN server reads the source IP address and port the request arrived from, which is the public mapping the NAT created
  3. Response Generation: The STUN server returns that address to the client (in an XOR-MAPPED-ADDRESS attribute), and the client turns it into a server-reflexive candidate
  4. Exchange via Signaling: WebRTC peers exchange their STUN-discovered addresses through the signaling channel
  5. Connection Attempt: Peers attempt to establish direct connections using the discovered public endpoints
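
Steps 1 to 3 above are a single request/response exchange. As an illustration, here is a minimal Node.js sketch of the RFC 5389 wire format: a 20-byte Binding request (message type, length, the fixed magic cookie 0x2112A442, and a 96-bit transaction ID) and a parser for the XOR-MAPPED-ADDRESS attribute in the reply. It handles IPv4 only and ignores MESSAGE-INTEGRITY and FINGERPRINT; the function names are our own.

```javascript
const crypto = require('crypto');

const MAGIC_COOKIE = 0x2112a442;

// Build a 20-byte STUN Binding Request with no attributes (RFC 5389 header only)
function buildBindingRequest(transactionId = crypto.randomBytes(12)) {
  const msg = Buffer.alloc(20);
  msg.writeUInt16BE(0x0001, 0);       // message type: Binding Request
  msg.writeUInt16BE(0, 2);            // message length: no attributes follow
  msg.writeUInt32BE(MAGIC_COOKIE, 4); // fixed magic cookie
  transactionId.copy(msg, 8);         // 96-bit transaction ID
  return msg;
}

// Extract an IPv4 XOR-MAPPED-ADDRESS from a Binding Success Response.
// Returns null if the message is not a success response or lacks the attribute.
function parseXorMappedAddress(msg) {
  if (msg.length < 20 || msg.readUInt16BE(0) !== 0x0101) return null;
  if (msg.readUInt32BE(4) !== MAGIC_COOKIE) return null;
  const end = 20 + msg.readUInt16BE(2);
  let offset = 20;
  while (offset + 4 <= end) {
    const type = msg.readUInt16BE(offset);
    const length = msg.readUInt16BE(offset + 2);
    const value = offset + 4;
    if (type === 0x0020 && msg.readUInt8(value + 1) === 0x01) { // IPv4 family
      const port = msg.readUInt16BE(value + 2) ^ (MAGIC_COOKIE >>> 16);
      const addr = (msg.readUInt32BE(value + 4) ^ MAGIC_COOKIE) >>> 0;
      const ip = [24, 16, 8, 0].map(shift => (addr >>> shift) & 0xff).join('.');
      return { ip, port };
    }
    offset = value + length + ((4 - (length % 4)) % 4); // attributes are 32-bit aligned
  }
  return null;
}
```

To try it, you could send the output of buildBindingRequest() from a dgram UDP socket to a STUN server (port 19302 for the Google servers used below, 3478 by convention elsewhere) and pass the reply buffer to parseXorMappedAddress(). The address is XORed with the magic cookie so that NATs which rewrite IP addresses found in packet payloads leave it alone.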

STUN Server Configuration

On the client, using STUN only requires listing one or more STUN servers in the RTCPeerConnection configuration:

```javascript
// Example configuration of STUN servers in a WebRTC application
const configuration = {
  iceServers: [
    {
      urls: [
        'stun:stun1.l.google.com:19302',
        'stun:stun2.l.google.com:19302',
      ]
    }
  ],
  iceCandidatePoolSize: 10,
};

// Create peer connection with this configuration
const peerConnection = new RTCPeerConnection(configuration);
```

STUN Success Rates

STUN is often effective, but real-world data paints a more nuanced picture. In open-internet consumer traffic, direct connections (host or server-reflexive candidates) succeed in roughly 75-80% of sessions; the remaining 20-25% require a relay. These figures come from Chrome's User Metrics Analysis (UMA) data, which aggregates connection outcomes across a very large population of browsers; rates for a specific audience, such as corporate or mobile-heavy users, can differ noticeably.

NAT Types and STUN Compatibility

| NAT Type | Description | STUN Success Rate |
|---|---|---|
| Full Cone | Most permissive; maps an internal IP:port to a public IP:port for all external hosts | High (90%+) |
| Restricted Cone (Address & Port) | Only allows incoming packets from IPs/ports that internal clients have previously sent outbound packets to | Moderate (70-80%) |
| Symmetric NAT | Creates a unique mapping for each internal IP:port to destination IP:port combination | Low (20-40%) |

Note: Most consumer routers today behave as port-restricted cone NATs. Carrier-grade NATs (CGNATs) used by mobile providers and some ISPs frequently use symmetric (endpoint-dependent) mapping, which helps explain why mobile networks often require TURN relays.
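
One practical way to tell these behaviors apart, in the spirit of the behavior-discovery tests in RFC 5780, is to send Binding requests from the same local socket to two STUN servers at different IP addresses and compare the public mappings they report. If the mapping is identical, the NAT uses endpoint-independent (cone-like) mapping and a server-reflexive candidate is likely usable by a peer; if it changes per destination, the NAT is symmetric-like. The helper below is a hypothetical sketch of just that comparison; it does not test filtering behavior, which needs the CHANGE-REQUEST attribute and a server with two addresses.

```javascript
// Classify NAT mapping behavior from the public {ip, port} mappings that ONE
// local socket received from two or more STUN servers at different IP addresses.
// Hypothetical helper; it says nothing about filtering behavior.
function classifyMapping(mappings) {
  if (mappings.length < 2) {
    throw new Error('Need mappings from at least two different STUN servers');
  }
  const [first] = mappings;
  const allSame = mappings.every(m => m.ip === first.ip && m.port === first.port);
  return allSame
    ? { mapping: 'endpoint-independent', srflxLikelyUsable: true }  // cone-like
    : { mapping: 'endpoint-dependent', srflxLikelyUsable: false };  // symmetric-like
}

// Usage:
// classifyMapping([{ ip: '198.51.100.10', port: 40000 }, { ip: '198.51.100.10', port: 40000 }])
//   -> { mapping: 'endpoint-independent', srflxLikelyUsable: true }
// classifyMapping([{ ip: '198.51.100.10', port: 40000 }, { ip: '198.51.100.10', port: 40001 }])
//   -> { mapping: 'endpoint-dependent', srflxLikelyUsable: false }
```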

When STUN Isn't Enough: TURN as a Fallback

TURN (Traversal Using Relays around NAT) servers provide a fallback mechanism when direct peer-to-peer connections cannot be established with STUN. Unlike STUN, which merely facilitates the discovery of network information, TURN actually relays media data between peers.

How TURN Works

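  1. Allocation Request: The client sends an Allocate request asking the TURN server to reserve relay resources for it
  2. Client Authentication: The server challenges the request (a 401 response carrying a realm and nonce), and the client resends it signed with its credentials, typically a username and password
  3. Relay Address Assignment: The TURN server assigns a public relay address that can be shared with peers
  4. Permissions: The client installs permissions (and optionally channel bindings) for the peer addresses it expects to exchange media with
  5. Media Relay: All media traffic is sent to the TURN server, which then forwards it to the destination peer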
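
The authentication exchange relies on STUN's long-term credential mechanism, which TURN inherits: the client never sends the password itself. Instead it derives a key from the username, the realm announced by the server, and the password, and uses that key to compute a MESSAGE-INTEGRITY HMAC over each request. Here is a minimal Node.js sketch of those two computations following RFC 5389; it leaves out SASLprep and the surrounding message encoding, and the function names are ours.

```javascript
const crypto = require('crypto');

// Long-term credential key (RFC 5389): key = MD5(username ":" realm ":" password)
// SASLprep normalization of the password is omitted for brevity.
function longTermKey(username, realm, password) {
  return crypto.createHash('md5').update(`${username}:${realm}:${password}`).digest();
}

// MESSAGE-INTEGRITY is HMAC-SHA1 keyed with that value. A real client computes it over
// the STUN message up to (not including) the MESSAGE-INTEGRITY attribute, with the
// header length already adjusted to cover it; `messageBytes` is assumed prepared so.
function messageIntegrity(key, messageBytes) {
  return crypto.createHmac('sha1', key).update(messageBytes).digest(); // 20 bytes
}
```

Because the key depends on the realm, the client can only compute it after the server's 401 challenge tells it which realm (and nonce) to use.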

TURN Server Configuration

```javascript
// Example configuration including both STUN and TURN servers
const configuration = {
  iceServers: [
    {
      urls: ['stun:stun1.l.google.com:19302', 'stun:stun2.l.google.com:19302']
    },
    {
      urls: ['turn:your-turn-server.com:443'],
      // Static long-term credentials; in production, prefer short-lived
      // credentials issued by your own backend
      username: 'username',
      credential: 'credential'
    }
  ],
  // 'all' (the default) uses host, srflx, and relay candidates; 'relay' forces TURN only
  iceTransportPolicy: 'all',
  iceCandidatePoolSize: 10
};

// Create peer connection with this configuration
const peerConnection = new RTCPeerConnection(configuration);
```

Understanding the Cost of TURN Relaying

While TURN servers ensure near-universal connectivity, they come with significant tradeoffs:

  1. Increased Latency: Adding a relay server into the communication path introduces additional network hops and processing delays
  2. Bandwidth Costs: TURN servers must handle all media traffic between peers, leading to significant bandwidth consumption
  3. Infrastructure Requirements: Running reliable TURN servers requires robust infrastructure and careful capacity planning
  4. Quality of Service Impact: TURN over TCP/TLS layers TCP's retransmissions and head-of-line blocking beneath WebRTC's own congestion control, so one lost packet delays everything queued behind it and loss shows up as delay; on congested networks, video can stall

A properly designed WebRTC application should use TURN as a last resort, only falling back to it when direct connections cannot be established.

The ICE Framework: Coordinating STUN and TURN

Interactive Connectivity Establishment (ICE) is the framework that orchestrates the use of STUN and TURN in WebRTC applications. ICE systematically attempts different connection methods, prioritizing direct connections when possible and falling back to relayed connections when necessary.

ICE Candidate Generation and Processing

  1. Gathering Candidates: The WebRTC client collects various address candidates:

    • Local addresses (from network interfaces)
    • Server-reflexive addresses (obtained via STUN)
    • Relay addresses (obtained via TURN)
  2. Candidate Exchange: Candidates are exchanged between peers via the signaling channel

  3. Connectivity Checks: The ICE algorithm performs connectivity checks between candidate pairs

  4. Connection Selection: ICE selects the best working connection, prioritizing direct paths over relayed ones

```javascript
// Listening for ICE candidates in a WebRTC implementation
peerConnection.addEventListener('icecandidate', event => {
  if (event.candidate) {
    // Send this candidate to the remote peer via your signaling channel
    // (signalingChannel stands in for your app's own transport)
    signalingChannel.send(JSON.stringify({ candidate: event.candidate }));

    // Log candidate type for monitoring; fall back to parsing where .type is missing
    const typeMatch = event.candidate.candidate.match(/ typ ([a-z]+)/);
    const candidateType = event.candidate.type || (typeMatch ? typeMatch[1] : 'unknown');
    console.log(`Generated candidate of type: ${candidateType}`);
  }
});

// Monitoring ICE connection state
peerConnection.addEventListener('iceconnectionstatechange', async () => {
  const state = peerConnection.iceConnectionState;
  console.log(`ICE connection state: ${state}`);

  if (state === 'connected' || state === 'completed') {
    const stats = await peerConnection.getStats();
    let pair = null;
    stats.forEach(report => {
      // Standard: the transport report names the selected candidate pair
      if (report.type === 'transport' && report.selectedCandidatePairId) {
        pair = stats.get(report.selectedCandidatePairId);
      }
    });
    if (!pair) {
      // Fallback: Firefox marks the active pair with a non-standard `selected` flag
      stats.forEach(report => {
        if (report.type === 'candidate-pair' && report.selected) pair = report;
      });
    }
    if (pair) {
      // Candidate types live on the referenced candidate reports, not on the pair
      const local = stats.get(pair.localCandidateId);
      const remote = stats.get(pair.remoteCandidateId);
      console.log('Connection established using candidate types: ' +
        `local=${local && local.candidateType}, remote=${remote && remote.candidateType}`);
    }
  }
});
```
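
The preference for direct paths in step 4 is not a special rule; it falls out of the candidate priority formula in RFC 8445, where the type preference is multiplied by 2^24 and therefore outweighs everything else (the local-preference term can never reach 2^24). The sketch below computes candidate and pair priorities using the RFC's recommended type preferences (host 126, peer-reflexive 110, server-reflexive 100, relay 0); real browsers may tune the local-preference part, so treat the numbers as illustrative.

```javascript
// RFC 8445 recommended type preferences; browsers may use different values.
const TYPE_PREFERENCE = { host: 126, prflx: 110, srflx: 100, relay: 0 };

// Candidate priority:
//   2^24 * typePreference + 2^8 * localPreference + (256 - componentId)
// localPreference (0..65535) ranks interfaces of the same type; componentId is 1 for RTP.
function candidatePriority(type, localPreference = 65535, componentId = 1) {
  if (!Object.prototype.hasOwnProperty.call(TYPE_PREFERENCE, type)) {
    throw new Error(`Unknown candidate type: ${type}`);
  }
  return TYPE_PREFERENCE[type] * 2 ** 24 + localPreference * 2 ** 8 + (256 - componentId);
}

// Pair priority, with G = controlling agent's candidate priority and
// D = controlled agent's: 2^32 * min(G, D) + 2 * max(G, D) + (G > D ? 1 : 0).
// BigInt avoids precision loss, since the result can exceed 2^53.
function pairPriority(g, d) {
  const G = BigInt(g);
  const D = BigInt(d);
  const min = G < D ? G : D;
  const max = G > D ? G : D;
  return (1n << 32n) * min + 2n * max + (G > D ? 1n : 0n);
}

// Usage:
// candidatePriority('host')  -> 2130706431
// candidatePriority('srflx') -> 1694498815
// candidatePriority('relay') -> 16777215
```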

IPv6 and Happy Eyeballs in ICE

The transition to IPv6 has significant implications for WebRTC NAT traversal. While IPv4 addresses are typically behind NAT due to address scarcity, IPv6 addresses are often globally routable without NAT.

IPv6 Host Candidates

Modern WebRTC implementations gather both IPv4 and IPv6 candidates:

  1. Direct Connectivity: IPv6 host candidates can often connect directly without NAT traversal
  2. Firewall Challenges: Despite no NAT, IPv6 traffic is frequently blocked by corporate firewalls
  3. Uneven Support: TURN-IPv6 support varies across implementations and service providers
  4. Connection Management: IPv6 host candidates bypass NAT but still need ICE consent freshness checks (RFC 7675); some middleboxes drop long-idle IPv6 flows

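ICE and Happy Eyeballs

Similar to Happy Eyeballs for dual-stack TCP connections (RFC 8305), ICE avoids stalling on a broken address family:

  1. Parallel Connectivity Checks: ICE runs checks for IPv4 and IPv6 candidate pairs in the same paced sequence rather than waiting for one family to fail first
  2. Interleaving: RFC 8421 recommends interleaving IPv4 and IPv6 pairs in the checklist, so a broken IPv6 path cannot hold up a working IPv4 one
  3. Address Family Prioritization: IPv6 candidates typically receive a higher local preference, so when both families work, the IPv6 path tends to be selected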
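
```javascript
// Example: Log address families during ICE gathering
peerConnection.addEventListener('icecandidate', event => {
  if (event.candidate && event.candidate.candidate) {
    const candidateStr = event.candidate.candidate;
    // Fields: foundation component protocol priority address port "typ" type ...
    const address = event.candidate.address || candidateStr.split(' ')[4] || '';
    // Check the address field only: the full string always contains ':' ("candidate:...")
    const addrFamily = address.endsWith('.local') ? 'mDNS'
      : address.includes(':') ? 'IPv6' : 'IPv4';
    const typeMatch = candidateStr.match(/ typ ([a-z]+)/);
    const candidateType = typeMatch ? typeMatch[1] : 'unknown';

    console.log(`Generated ${addrFamily} ${candidateType} candidate`);
  }
});
```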
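
Substring checks on the raw candidate string are fragile. A small parser that splits the candidate attribute into named fields, following the grammar in RFC 8839, makes address-family and type checks explicit. This is a sketch; the function name and the shape of the returned object are our own.

```javascript
// Parse an ICE candidate attribute into named fields.
// Accepts RTCIceCandidate.candidate strings or SDP lines starting with "a=candidate:".
function parseCandidate(line) {
  const parts = line.replace(/^a=/, '').replace(/^candidate:/, '').trim().split(/\s+/);
  if (parts.length < 8 || parts[6] !== 'typ') return null;
  const [foundation, component, protocol, priority, address, port, , type] = parts;
  const result = {
    foundation,
    component: Number(component),
    protocol: protocol.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type,
  };
  // Optional attributes follow the type as name/value pairs
  for (let i = 8; i + 1 < parts.length; i += 2) {
    if (parts[i] === 'raddr') result.relatedAddress = parts[i + 1];
    if (parts[i] === 'rport') result.relatedPort = Number(parts[i + 1]);
    if (parts[i] === 'tcptype') result.tcpType = parts[i + 1];
  }
  // Address family: mDNS hostnames end in ".local"; IPv6 literals contain ':'
  if (address.endsWith('.local')) result.family = 'mdns';
  else if (address.includes(':')) result.family = 'IPv6';
  else result.family = 'IPv4';
  return result;
}
```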

Practical Implications

  1. Dual-Stack Deployment: TURN servers should support both IPv4 and IPv6 for maximum compatibility
  2. Mobile Networks: Many mobile carriers deploy IPv6-only networks internally with NAT64 for IPv4 communication
  3. Fallback Patterns: Even when IPv6 host candidates are available, TURN-UDP/IPv6 remains useful when direct IPv6 paths are blocked

TCP/TLS vs. UDP Relays

TURN supports multiple transport protocols, each with distinct trade-offs:

| Protocol | Advantages | Disadvantages | Best Use Cases |
|---|---|---|---|
| TURN-UDP | Lowest latency; best media quality; congestion control works properly | Blocked in some corporate networks; may require fallbacks | Default choice when available |
| TURN-TCP | Higher connection success rate; works in most networks | Head-of-line blocking; higher latency; less efficient for media | Fallback when UDP is blocked |
| TURN-TLS (over TCP, port 443) | Highest success rate; firewall-friendly; looks like HTTPS traffic | Highest latency; TLS handshake overhead; CPU cost on server | Last resort for strict networks |

Protocol Selection Strategy

For optimal performance and connectivity:

  1. Prioritize UDP: Always offer a UDP TURN URL; ICE already ranks relay candidates by transport (in Chromium, UDP above TCP above TLS), so UDP wins whenever it works
  2. Fallback Chain: If UDP fails, TCP is used next, with TLS as a last resort
  3. Multiple Server Configuration: Provide multiple TURN server options with different protocols

```javascript
// Example: Configuring multiple TURN transport options
const configuration = {
  iceServers: [
    { urls: 'stun:stun.example.org' },
    {
      urls: 'turn:turn.example.org:3478?transport=udp',
      username: 'user',
      credential: 'cred'
    },
    {
      urls: 'turn:turn.example.org:3478?transport=tcp',
      username: 'user',
      credential: 'cred'
    },
    {
      urls: 'turns:turn.example.org:443',  // TLS over TCP
      username: 'user',
      credential: 'cred'
    }
  ]
};
```

Best Practices for STUN and TURN Deployment

Deploying STUN and TURN servers for production WebRTC applications requires careful planning and implementation. Here are best practices to ensure reliable, secure, and efficient operation.

Server Reliability and Scalability

1. Geographic Distribution

Deploy STUN and TURN servers across multiple geographic regions to:

  • Reduce latency by connecting users to the nearest server
  • Improve resilience through redundancy
  • Balance load across infrastructure

This geographic distribution is a key component of a global edge network strategy, allowing services to be delivered as close to users as possible regardless of their location.
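
One simple way to route users to a nearby server is to measure round-trip time from the client to each region and hand ICE only the closest few TURN servers. The sketch below covers just the selection step; the probe data, region names, and how you measure RTT (for example, timing a small HTTPS request to each region) are all assumptions.

```javascript
// Pick the closest TURN regions from client-side RTT probes.
// `probes` is a hypothetical list of { region, rttMs } produced by your own measurement;
// failed probes can report rttMs as Infinity or NaN and are skipped.
function pickTurnRegions(probes, maxRegions = 2) {
  return probes
    .filter(p => Number.isFinite(p.rttMs)) // drop regions that did not answer
    .sort((a, b) => a.rttMs - b.rttMs)     // lowest RTT first
    .slice(0, maxRegions)
    .map(p => p.region);
}

// Usage:
// pickTurnRegions([{ region: 'eu-west', rttMs: 40 }, { region: 'us-east', rttMs: 110 },
//                  { region: 'eu-central', rttMs: 35 }])
//   -> ['eu-central', 'eu-west']
```

Keeping the list short also keeps candidate gathering fast, since the browser gathers candidates from every server it is given.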

2. Scalable Resource Management

TURN servers in particular face variable resource demands:

  • Implement dynamic scaling based on metrics like connected users and bandwidth utilization
  • Consider containerization for flexible deployment and scaling
  • Establish monitoring to track resource usage patterns and anticipate scaling needs

3. Bandwidth Management

TURN servers relay media traffic, making bandwidth a critical resource:

  • Implement bandwidth limiting policies per user/session
  • Optimize relay protocols for efficient bandwidth usage
  • Consider network provider costs in different regions

4. High Availability Design

To ensure reliable service:

  • Implement server redundancy with failover mechanisms
  • Use load balancers to distribute traffic across server instances
  • Design for graceful degradation during partial outages

Security Implementation

1. Authentication and Authorization

TURN servers must implement robust access controls:

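  • Use short-lived credentials instead of static passwords, for example the widely used "TURN REST API" convention (an expiry timestamp in the username and an HMAC-derived password), which is an IETF draft rather than part of the TURN RFCs
  • Issue those credentials from your own authenticated backend rather than embedding them in client code
  • Consider integrating with existing authentication systems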
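
```javascript
// Example: Generating time-limited TURN credentials (server side, Node.js)
// Uses the common "TURN REST API" format accepted by coturn's use-auth-secret mode:
//   username   = "<expiry unix timestamp>:<user id>"
//   credential = base64(HMAC-SHA1(sharedSecret, username))
const crypto = require('crypto');

function generateTurnCredentials(userId, sharedSecret, ttlSeconds = 86400) {
  const expiresAt = Math.floor(Date.now() / 1000) + ttlSeconds;
  const username = `${expiresAt}:${userId}`;
  const credential = crypto.createHmac('sha1', sharedSecret).update(username).digest('base64');
  return { username, credential, ttl: ttlSeconds };
}

// On client side: turnCredentials is the object your backend returned
// (for example from an authenticated HTTPS endpoint of your own)
const turnConfig = {
  urls: ['turn:your-turn-server.com:443'],
  username: turnCredentials.username,
  credential: turnCredentials.credential
};
```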
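
On the TURN server side, credentials in the "TURN REST API" format can be validated without any per-user database: the server recomputes the password from the username and the shared secret, and rejects timestamps in the past. A minimal Node.js sketch of that check, assuming the "<expiry>:<userId>" username layout (the function name is ours; a real TURN server performs the equivalent inside its STUN authentication):

```javascript
const crypto = require('crypto');

// Validate a "TURN REST API" style credential pair against a shared secret.
// username = "<expiryUnixSeconds>:<userId>", credential = base64(HMAC-SHA1(secret, username))
function verifyTurnCredentials(username, credential, sharedSecret, nowSeconds = Date.now() / 1000) {
  const expiresAt = Number(username.split(':')[0]);
  if (!Number.isFinite(expiresAt) || expiresAt < nowSeconds) return false; // malformed or expired

  const expected = crypto.createHmac('sha1', sharedSecret).update(username).digest('base64');
  const a = Buffer.from(expected);
  const b = Buffer.from(credential);
  // Constant-time comparison; timingSafeEqual requires equal lengths
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

Because the expiry is embedded in the username and covered by the HMAC, a client cannot extend a credential's lifetime without knowing the shared secret.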

2. Transport Security

Protect STUN and TURN traffic:

  • Use TURN-TLS on port 443 for firewall traversal (WebRTC traffic itself is already DTLS-SRTP)
  • Configure proper certificate validation
  • Remember that DTLS is mandatory for all WebRTC media transport, regardless of TURN protocol

3. Rate Limiting and Abuse Prevention

Prevent service abuse:

  • Implement rate limiting for STUN/TURN requests
  • Monitor for unusual traffic patterns
  • Set reasonable allocation quotas per user

4. Regular Security Updates

Maintain server security:

  • Keep server software updated to address vulnerabilities
  • Regularly audit server configurations for security issues
  • Follow security best practices for the hosting environment

TURN Cost Model Worksheet

Understanding the cost implications of TURN usage is critical for planning WebRTC deployments at scale.

Bandwidth Consumption Calculation

For a typical high-quality video call:

  1. Video Bitrate: ~1 Mbps (720p30) = 324 GB/month if running 24/7 (125 KB/s × 2,592,000 s in 30 days)
  2. Audio Bitrate: ~50 Kbps = 16.2 GB/month if running 24/7
  3. TURN Usage Factor: ~20-25% of all calls require TURN
  4. Average Call Duration: Varies by application (e.g., 30 minutes for meetings)

Cost Estimation Example

For a service with 10,000 monthly active users:

Assumptions:
- Average call time: 60 minutes per user per month (video only; audio and packet overhead ignored)
- Video bitrate: 1 Mbps
- TURN usage rate: 25%
- Cloud egress cost: $0.08/GB (varies by region)

Calculation:
- Data per user-hour: 1 Mbps × 60 minutes × 60 seconds ÷ 8 bits/byte = 450 MB (per user-hour)
- Monthly TURN data: 450 MB × 10,000 users × 25% TURN rate = 1,125 GB
- Monthly cost: 1,125 GB × $0.08/GB = $90

Per-user cost: $90 ÷ 10,000 = $0.009 per user per month

For applications with higher usage patterns (e.g., contact centers, all-day meetings), these costs scale accordingly. At high volumes, self-hosted TURN infrastructure often becomes more economical than cloud services.
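
The worksheet arithmetic is easy to get wrong by a factor of 8 or 1000, so it helps to keep it in a function. This sketch reproduces the example above under the same assumptions (one relayed video stream per user, decimal gigabytes, a flat egress price); the function and parameter names are ours.

```javascript
// Worksheet arithmetic as functions. Assumptions: decimal units (1 GB = 1e9 bytes),
// one relayed video stream per user, and a flat egress price.

// Data volume of a constant-bitrate stream running around the clock
function monthlyGBAtConstantRate(bitrateMbps, days = 30) {
  const bytesPerSecond = (bitrateMbps * 1e6) / 8;
  return (bytesPerSecond * days * 86400) / 1e9;
}

// Monthly TURN egress and cost for a user base
function turnCostEstimate({ users, minutesPerUser, bitrateMbps, turnRate, pricePerGB }) {
  const bytesPerUser = (bitrateMbps * 1e6 * minutesPerUser * 60) / 8;
  const relayedGB = (bytesPerUser * users * turnRate) / 1e9;
  const monthlyCost = relayedGB * pricePerGB;
  return { relayedGB, monthlyCost, perUserCost: monthlyCost / users };
}

// Usage with the worksheet's assumptions:
// monthlyGBAtConstantRate(1) -> about 324 (GB for 1 Mbps, 24/7, 30 days)
// turnCostEstimate({ users: 10000, minutesPerUser: 60, bitrateMbps: 1,
//                    turnRate: 0.25, pricePerGB: 0.08 })
//   -> relayedGB about 1125, monthlyCost about 90, perUserCost about 0.009
```

Every input scales the result linearly, so doubling minutesPerUser or turnRate doubles the bill, which is why all-day workloads change the picture quickly.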

Monitoring and Troubleshooting

Key Metrics to Monitor

For effective operation of STUN and TURN services:

  1. Connection Success Rates

    • Track the percentage of sessions using STUN vs. TURN
    • Monitor ICE connection establishment times
    • Measure connection failure rates
  2. Resource Utilization

    • Bandwidth consumption per server
    • CPU and memory usage
    • Active allocations and connections
  3. Geographic Distribution

    • Connection quality by region
    • Server utilization across locations
    • Regional failure patterns
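
Turning raw session records into these metrics is mostly bookkeeping. The sketch below assumes a hypothetical per-session record that your own client reporting would build from getStats() (whether ICE connected, whether the selected pair used a relay candidate, and how long setup took) and derives the failure rate, relay share, and a median setup time.

```javascript
// Summarize per-session records into connection-success metrics.
// Hypothetical record schema: { connected: boolean, relayed: boolean, setupMs: number|null }
function summarizeSessions(records) {
  const connected = records.filter(r => r.connected);
  const relayed = connected.filter(r => r.relayed);
  const setupTimes = connected.map(r => r.setupMs).sort((a, b) => a - b);
  // Lower median for even counts; null when nothing connected
  const medianSetupMs = setupTimes.length
    ? setupTimes[Math.floor((setupTimes.length - 1) / 2)]
    : null;
  return {
    failureRate: records.length ? 1 - connected.length / records.length : 0,
    relayShare: connected.length ? relayed.length / connected.length : 0,
    medianSetupMs,
  };
}
```

Tracking relayShare per region alongside TURN bandwidth makes it easier to spot when a network change, rather than user growth, is driving relay costs.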

Troubleshooting Common Issues

  1. High TURN Usage Rates

    • Investigate network configurations causing STUN failures
    • Check for restrictive firewalls blocking UDP
    • Consider if corporate networks are blocking traffic
  2. Excessive Latency

    • Analyze network paths between clients and servers
    • Check for geographic mismatches in server selection
    • Monitor for server overload conditions
  3. Connection Failures

    • Examine ICE candidate generation and exchange
    • Verify server reachability from client networks
    • Check authentication and credential issues

```javascript
// Example: Monitoring code for TURN server usage

// Count gathered candidate types once gathering finishes
peerConnection.addEventListener('icegatheringstatechange', () => {
  if (peerConnection.iceGatheringState !== 'complete') return;

  const candidateTypes = {
    host: 0,
    srflx: 0, // STUN reflexive
    relay: 0  // TURN relay
  };

  const candidateLines = peerConnection.localDescription.sdp
    .split('\r\n')
    .filter(line => line.startsWith('a=candidate:'));

  candidateLines.forEach(line => {
    const typeMatch = line.match(/ typ ([a-z]+)/);
    if (typeMatch && Object.prototype.hasOwnProperty.call(candidateTypes, typeMatch[1])) {
      candidateTypes[typeMatch[1]]++;
    }
  });

  // Log or send to analytics
  console.log('ICE Candidate Types:', candidateTypes);
});

// Registered once (not inside the gathering handler), so ICE restarts
// do not stack duplicate listeners
peerConnection.addEventListener('iceconnectionstatechange', async () => {
  if (peerConnection.iceConnectionState !== 'connected') return;

  const stats = await peerConnection.getStats();
  stats.forEach(report => {
    // The transport report names the selected pair; the pair references its candidates
    if (report.type === 'transport' && report.selectedCandidatePairId) {
      const pair = stats.get(report.selectedCandidatePairId);
      const local = pair && stats.get(pair.localCandidateId);
      const remote = pair && stats.get(pair.remoteCandidateId);
      console.log('Selected candidate pair:', {
        local: local && local.candidateType,
        remote: remote && remote.candidateType,
        relayProtocol: local && local.relayProtocol // 'udp' | 'tcp' | 'tls' when relayed
      });
      // Log or send to analytics
    }
  });
});
```

QoS & Congestion Control Interplay

The interaction between TURN relays and WebRTC's congestion control mechanisms has important implications for media quality.

Impact on Congestion Control

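  1. Longer RTT: Every packet detours through the relay, so round-trip times grow, and with them the time congestion control needs to react
  2. Bandwidth Estimation: Google Congestion Control (GCC) relies on packet arrival timing and loss to estimate available bandwidth
  3. Adaptation Lag: TURN relays, especially over TCP/TLS, can delay or distort these congestion signals (a retransmitted packet shows up as delay rather than loss)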

Optimization Strategies

For better media quality over TURN:

  1. Regional Proximity: Deploy TURN servers in the same region as at least one peer to keep RTT < 150ms
  2. Protocol Selection: Prefer UDP over TCP/TLS when possible for more accurate congestion feedback
  3. Bandwidth Caps: Consider explicit bandwidth limits for TURN sessions to prevent quality oscillation

```javascript
// Example: Setting explicit bitrate limits for relayed connections
peerConnection.addEventListener('iceconnectionstatechange', async () => {
  if (peerConnection.iceConnectionState !== 'connected') return;

  const stats = await peerConnection.getStats();
  let usingRelay = false;

  stats.forEach(report => {
    // Candidate types live on the candidate reports referenced by the selected pair
    if (report.type === 'transport' && report.selectedCandidatePairId) {
      const pair = stats.get(report.selectedCandidatePairId);
      const local = pair && stats.get(pair.localCandidateId);
      const remote = pair && stats.get(pair.remoteCandidateId);
      if ((local && local.candidateType === 'relay') ||
          (remote && remote.candidateType === 'relay')) {
        usingRelay = true;
      }
    }
  });

  // If using a relay, apply more conservative bandwidth limits
  if (usingRelay) {
    // s.track can be null for senders without an attached track
    const sender = peerConnection.getSenders()
      .find(s => s.track && s.track.kind === 'video');
    if (sender) {
      const params = sender.getParameters();
      // encodings can be empty before negotiation completes
      (params.encodings || []).forEach(encoding => {
        encoding.maxBitrate = 800000; // 800 kbps
      });
      await sender.setParameters(params);
    }
  }
});
```

IPv4 Exhaustion & Aggressive Peer-Reflexive Pruning

Recent WebRTC implementations have adopted strategies to reduce the pressure on NAT binding tables:

Chrome's Candidate Pruning

Starting with Chrome 123:

  1. Idle Timeout: Server-reflexive (srflx) candidates are dropped after 1 minute of inactivity
  2. Impact: Long-lived data channels may experience connectivity issues after periods of inactivity
  3. Mitigation: Implement application-level keepalive for persistent connections

Configuration Adjustments

For applications requiring long-lived connections:

```javascript
// Example: Configure ICE candidate policy
const configuration = {
  iceServers: [{ urls: 'stun:stun.example.org' }], // replace with your STUN/TURN servers
  iceCandidatePoolSize: 10,
  iceTransportPolicy: 'all', // or 'relay' for TURN-only
  // RTCConfiguration has no standard keepalive or timeout options,
  // so long-lived sessions need application-level keepalives
};
```

Practical Recommendations

  1. Connection Monitoring: Implement connection state monitoring to detect dropped connections
  2. Periodic Activity: Send small keepalive packets over data channels during idle periods
  3. Reconnection Logic: Develop robust reconnection mechanisms for interrupted sessions
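
The periodic-activity idea can be sketched as a small scheduler that sends a tiny message only when the data channel has been quiet for a while. The class name, the message format, and the 15-second idle threshold are assumptions to adjust for your application; the receiving side should simply ignore keepalive messages.

```javascript
// Application-level keepalive for an idle RTCDataChannel-like object.
// Only `readyState` and `send()` are used, so any object with those members works.
// The 15-second default idle threshold is an assumption; tune it for your network.
class DataChannelKeepalive {
  constructor(channel, idleMs = 15000) {
    this.channel = channel;
    this.idleMs = idleMs;
    this.lastActivity = 0;
  }

  // Call whenever the app sends or receives a real message
  noteActivity(now = Date.now()) {
    this.lastActivity = now;
  }

  // Call periodically; sends a tiny ping only when the channel has been idle long enough
  tick(now = Date.now()) {
    if (this.channel.readyState !== 'open') return false;
    if (now - this.lastActivity < this.idleMs) return false;
    this.channel.send('{"type":"keepalive"}');
    this.lastActivity = now;
    return true;
  }
}

// Usage in the browser (sketch):
// const keepalive = new DataChannelKeepalive(dataChannel);
// dataChannel.addEventListener('message', () => keepalive.noteActivity());
// const timer = setInterval(() => keepalive.tick(), 5000);
// dataChannel.addEventListener('close', () => clearInterval(timer));
```

Passing `now` explicitly keeps the decision logic independent of real timers, which makes the scheduler easy to test.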

mDNS ICE Candidates

Modern browsers implement privacy-enhancing techniques for local IP address discovery:

mDNS Obfuscation

  1. Local IP Privacy: Instead of exposing local IPs like 192.168.1.5, browsers generate random mDNS names (e.g., abc123.local)
  2. Impact on Debugging: Makes troubleshooting more challenging as raw IPs aren't visible
  3. Local Network Only: This technique only affects host candidates, not STUN/TURN
  4. Debugging Access: Chromium only obfuscates host candidates while the page has not been granted camera or microphone access, so raw IPs reappear once permission is granted; chrome://webrtc-internals also helps when inspecting candidates

Identifying mDNS Candidates

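```javascript
// Example: Detecting mDNS candidates
peerConnection.addEventListener('icecandidate', event => {
  if (event.candidate && event.candidate.candidate) {
    const candidateStr = event.candidate.candidate;
    // The connection address is the 5th field; mDNS hostnames end in ".local"
    const address = event.candidate.address || candidateStr.split(' ')[4] || '';
    const isMDNS = address.endsWith('.local');

    if (isMDNS) {
      console.log('mDNS candidate detected:', candidateStr);
    }
  }
});
```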

Implementation Considerations

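  1. Local Testing: mDNS names resolve only on the same local network segment, so host-candidate connections across VLANs, VPNs, or container networks may fail and fall back to server-reflexive or relay candidates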
  2. Logging Patterns: Update logging and monitoring to handle mDNS format
  3. Compatibility: Some older clients might not understand mDNS candidates

Self-Hosting vs. Third-Party Services

When implementing STUN and TURN for WebRTC applications, organizations face a critical decision: build and maintain their own infrastructure or leverage third-party services.

Self-Hosting Considerations

Advantages:

  • Complete control over infrastructure and configurations
  • Potentially lower long-term costs for high-volume applications
  • Ability to customize for specific requirements
  • No dependency on external service providers

Challenges:

  • Requires significant expertise to deploy and maintain
  • Necessitates global infrastructure for optimal performance
  • Demands ongoing monitoring and maintenance
  • Requires scaling infrastructure with usage growth

Third-Party STUN/TURN Services

Advantages:

  • Reduced implementation complexity
  • Global infrastructure already in place
  • Managed scaling and maintenance
  • Often includes additional features like analytics

Challenges:

  • Recurring costs that scale with usage
  • Less control over specific configurations
  • Potential privacy and compliance considerations
  • Dependency on service provider reliability

Hybrid Approaches

Many organizations adopt a hybrid approach:

  • Use public STUN servers (like Google's), which are free and widely used, though they come without a service-level guarantee
  • Deploy private TURN servers for sensitive internal applications
  • Leverage third-party TURN services for global coverage

Conclusion

STUN and TURN servers are essential components in the WebRTC infrastructure, working together to overcome the challenges posed by NATs and firewalls. While STUN facilitates direct peer-to-peer connections in many scenarios, TURN provides critical fallback capabilities when direct connections aren't possible.

Effective implementation requires:

  • Understanding the underlying network challenges
  • Properly configuring ICE to coordinate STUN and TURN usage
  • Deploying reliable, scalable, and secure server infrastructure
  • Monitoring usage patterns and connection success rates
  • Balancing performance, cost, and reliability requirements

By following these best practices, developers can build WebRTC applications that deliver reliable real-time communication experiences across diverse network environments.

Additional Resources