SFU Cascading

Introduction to SFU Cascading

SFU cascading is the evolution of single-SFU (Selective Forwarding Unit) architectures into a distributed, mesh-like network of interconnected SFUs. While the basic SFU architecture provides efficient stream distribution within a single server's reach, cascading extends these capabilities across multiple geographic regions and scales to support massive participant counts. This approach transforms isolated SFU islands into a cohesive, global network capable of handling enterprise-scale video conferencing and streaming applications.

The Core Concept

Imagine a network of regional post offices instead of a single central facility. Each post office serves its local area efficiently, but they're also connected to each other, allowing mail to flow between regions. Similarly, SFU cascading creates a mesh of interconnected servers, each handling local participants while seamlessly exchanging streams with other servers in the network. This distributed approach solves the fundamental limitations of single-SFU deployments while maintaining the architectural advantages that make SFUs preferable to MCUs (Multipoint Control Units) for modern applications.

Understanding the Limitations of Single SFU

To appreciate why SFU cascading is necessary, we must first understand the fundamental limitations of single-SFU deployments. While a single SFU works well for small to medium-sized applications, it encounters significant challenges as applications grow in scale and geographic reach.

Single Point of Failure

The most critical limitation of a single-SFU architecture is its vulnerability to failure. When all participants connect to one server, any technical issue—whether hardware failure, software crash, or network disruption—affects every user in the system. This creates an unacceptable risk for business-critical applications where reliability is paramount.

In production environments, this manifests in several ways:

Server crashes terminate all ongoing calls
Network issues at the data center affect all users globally
Maintenance windows require complete service interruption
No graceful degradation—the system either works perfectly or fails completely

Geographic Latency Challenges

Physical distance introduces unavoidable latency in network communications. When users from different continents connect to a single SFU, some participants inevitably experience high latency due to the physical distance between their location and the server. This latency compounds at each step of the communication path:

Upload Latency: Time for a participant's media to reach the SFU
Processing Latency: SFU's internal routing and forwarding delays
Download Latency: Time for media to reach other participants
Round-Trip Impact: These delays add up, and the total can exceed what real-time conversation tolerates (ITU-T G.114 recommends keeping one-way delay within about 150 ms)

For live streaming applications, this geographic latency can be particularly problematic, creating significant delays between the broadcaster and viewers in distant regions.
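
As a rough illustration of why distance matters, the sketch below estimates propagation delay from fiber distance. It assumes signals travel at about 200,000 km/s in optical fiber (roughly two thirds of the speed of light in vacuum) and ignores routing detours, queuing, and processing time. The function names and the sample distances are illustrative only.

```javascript
// Assumption: ~200,000 km/s signal speed in fiber, i.e. about 200 km per millisecond.
const FIBER_KM_PER_MS = 200;

// One-way propagation delay in milliseconds for a given path length.
function propagationDelayMs(distanceKm) {
  return distanceKm / FIBER_KM_PER_MS;
}

// Media path through a single SFU: publisher -> SFU -> subscriber.
function pathDelayMs(publisherToSfuKm, sfuToSubscriberKm, processingMs = 0) {
  return (
    propagationDelayMs(publisherToSfuKm) +
    processingMs +
    propagationDelayMs(sfuToSubscriberKm)
  );
}

// Two users in the same city, 8,000 km (hypothetical figure) from the only SFU.
console.log(pathDelayMs(8000, 8000)); // 80
// The same two users served by a regional SFU 100 km away.
console.log(pathDelayMs(100, 100)); // 1
```

Real paths are longer than the straight-line distance, so these numbers are a lower bound rather than a prediction.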

Scalability Constraints

A single SFU faces inherent scalability limitations that become apparent as participant counts grow:

CPU Limitations: Even though SFUs don't transcode, they still perform significant processing for packet routing, encryption/decryption, and quality management.
Memory Constraints: Each participant requires memory for connection state, buffering, and routing tables.
Network Bandwidth: The server's network interface becomes a bottleneck as more streams flow through it.
Connection Limits: Operating systems and hardware have practical limits on simultaneous connections.

These constraints create a ceiling on the number of participants a single SFU can effectively handle, typically in the hundreds to low thousands range.
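
To see how quickly the network bottleneck appears, here is a back-of-the-envelope sketch. It assumes the worst case for a meeting: every participant publishes one stream at a uniform bitrate and subscribes to everyone else's stream. Both the function names and the 1 Mbps figure are assumptions for illustration.

```javascript
// Streams the SFU must send when every participant publishes one stream
// and subscribes to all the others.
function forwardedStreamCount(participants) {
  return participants * (participants - 1);
}

// Total egress in Mbps, assuming a uniform per-stream bitrate.
function egressMbps(participants, streamMbps) {
  return forwardedStreamCount(participants) * streamMbps;
}

console.log(egressMbps(10, 1)); // 90
console.log(egressMbps(50, 1)); // 2450
console.log(egressMbps(100, 1)); // 9900
```

Egress grows quadratically with participant count in this model, which is why selective subscription and spreading participants across several SFUs both matter.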

How SFU Cascading Works

SFU cascading addresses these limitations by creating a distributed network of interconnected SFUs that work together as a unified system. This architecture enables global scale while maintaining low latency and high reliability.

Architectural Overview

The cascaded SFU architecture consists of several key components working together to create a unified, distributed system:

Regional SFUs: Servers deployed in different geographic locations, each serving local participants
Inter-SFU Connections: High-bandwidth, low-latency links between SFUs for stream exchange
Global State Management: Distributed systems for maintaining consistent state across all SFUs
Intelligent Routing: Algorithms for optimal path selection and stream forwarding

Each regional SFU acts as both a local media server for nearby participants and a relay node in the global network. These SFUs maintain persistent connections with each other, forming a mesh topology that allows streams to flow efficiently between regions. The global state management system ensures that all SFUs have a consistent view of the call state, participant information, and routing tables.

The architecture must balance several competing concerns: minimizing latency by keeping streams close to participants, reducing bandwidth costs by avoiding unnecessary forwarding, maintaining system reliability through redundancy, and ensuring scalability as the network grows.

javascript

class CascadedSFUNetwork {
  constructor() {
    this.regions = new Map();
    this.interSFUConnections = new Map();
    this.globalState = new DistributedStateManager();
  }

  async initializeRegion(regionId, config) {
    const regionalSFU = new RegionalSFU(regionId, config);
    this.regions.set(regionId, regionalSFU);

    // Establish connections to existing regions
    for (const [existingRegionId, existingSFU] of this.regions) {
      if (existingRegionId !== regionId) {
        await this.createInterSFUConnection(regionalSFU, existingSFU);
      }
    }

    // Register with global state
    await this.globalState.registerSFU(regionId, regionalSFU);
  }

  async createInterSFUConnection(sfuA, sfuB) {
    const connection = new InterSFUConnection(sfuA, sfuB);
    await connection.establish();

    this.interSFUConnections.set(
      `${sfuA.regionId}-${sfuB.regionId}`,
      connection
    );
  }
}
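
One detail worth noting in the class above: the connection key is built from the two region IDs in argument order, so looking up the same link with the regions swapped would miss. A minimal sketch of an order-independent key follows. The helper name `linkKey` and the `|` separator are assumptions; the separator is chosen because region IDs such as `us-east` may themselves contain hyphens.

```javascript
// Build an order-independent key for the link between two regions.
// Assumption: the '|' separator never appears inside a region ID.
function linkKey(regionA, regionB) {
  return [regionA, regionB].sort().join('|');
}

const links = new Map();
links.set(linkKey('us-east', 'eu-west'), { establishedAt: Date.now() });

console.log(linkKey('eu-west', 'us-east')); // eu-west|us-east
console.log(links.has(linkKey('eu-west', 'us-east'))); // true
console.log(links.has(linkKey('us-east', 'eu-west'))); // true
```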

Stream Routing and Forwarding

In a cascaded architecture, stream routing becomes more complex as it must consider both local and remote participants. The system must make intelligent decisions about which streams to forward between SFUs and how to optimize for quality and latency.

Local vs. Remote Routing

When a participant publishes a stream, the local SFU must determine which other SFUs need to receive it:

javascript

class StreamRouter {
  constructor(localSFU) {
    this.localSFU = localSFU;
    this.routingTable = new Map();
  }

  async handleNewStream(publisherId, stream) {
    // Handle local subscribers
    const localSubscribers = this.findLocalSubscribers(publisherId);
    for (const subscriber of localSubscribers) {
      await this.forwardLocally(stream, subscriber);
    }

    // Determine which remote SFUs need this stream
    const remoteRegions = await this.findRemoteSubscribers(publisherId);
    for (const region of remoteRegions) {
      await this.forwardToRemoteSFU(stream, region);
    }
  }

  async forwardToRemoteSFU(stream, remoteRegion) {
    const connection = this.localSFU.getConnectionTo(remoteRegion);
    if (connection) {
      await connection.forwardStream(stream);
      this.updateRoutingTable(stream.id, remoteRegion);
    }
  }
}
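
The router above leaves `findLocalSubscribers` and `findRemoteSubscribers` undefined. One way to think about them, sketched below under the assumption that each subscription records the region its subscriber is connected to, is to split subscribers into a local list and a set of remote regions. Each remote region then receives a single forwarded copy regardless of how many subscribers it hosts. The function name and data shape are hypothetical.

```javascript
// subscriptions: array of { subscriberId, region } for one published stream.
// Returns the local subscribers plus the remote regions that each need
// exactly one forwarded copy.
function planForwarding(localRegion, subscriptions) {
  const localSubscribers = [];
  const remoteRegions = new Set();

  for (const { subscriberId, region } of subscriptions) {
    if (region === localRegion) {
      localSubscribers.push(subscriberId);
    } else {
      remoteRegions.add(region); // one copy per region, however many subscribers
    }
  }
  return { localSubscribers, remoteRegions: [...remoteRegions] };
}

const plan = planForwarding('eu-west', [
  { subscriberId: 'alice', region: 'eu-west' },
  { subscriberId: 'bob', region: 'us-east' },
  { subscriberId: 'carol', region: 'us-east' },
  { subscriberId: 'dave', region: 'ap-south' },
]);
console.log(plan.localSubscribers); // [ 'alice' ]
console.log(plan.remoteRegions); // [ 'us-east', 'ap-south' ]
```

Here the stream crosses the link to `us-east` once even though two subscribers are there; the remote SFU fans it out locally.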

Global State Management

Maintaining consistent state across a distributed system is one of the most challenging aspects of SFU cascading. Each SFU must have an accurate view of the global call state while minimizing synchronization overhead.

The challenge is captured by the CAP theorem: a distributed system cannot guarantee all three of Consistency, Availability, and Partition tolerance at once. Because network partitions cannot be ruled out, the practical choice during a partition is between consistency and availability. For video conferencing, availability is critical (calls must continue working), so the trade-off is made on consistency.

Most cascaded SFU systems use eventual consistency models where state updates propagate asynchronously. This means there may be brief periods where different SFUs have slightly different views of the call state, but they eventually converge to the same state. The key is designing the system so these temporary inconsistencies don't disrupt the user experience.

Common approaches include:

Using shared data stores such as Redis or Cassandra for state storage
Implementing vector clocks or version vectors for conflict detection
Employing consensus algorithms for critical state changes
Using gossip protocols for efficient state propagation

javascript

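class DistributedStateManager {
  constructor() {
    // RedisClient is a placeholder for a Redis client wrapper, assumed here
    // to expose xadd/xread with ioredis-style arguments and replies
    this.redisClient = new RedisClient();
    this.localCache = new Map();
    this.syncInterval = 100; // ms
    this.lastReadId = '0'; // read from the start of the stream
  }

  async publishStateUpdate(update) {
    // Publish to Redis stream as flat field/value pairs
    await this.redisClient.xadd('call-state-updates', '*',
      'type', update.type,
      'key', update.key,
      'data', JSON.stringify(update.data),
      'timestamp', Date.now()
    );

    // Update local cache
    this.localCache.set(update.key, update.data);
  }

  syncState() {
    // Poll for updates every syncInterval
    setInterval(async () => {
      const reply = await this.redisClient.xread(
        'STREAMS', 'call-state-updates', this.lastReadId
      );
      if (!reply) return; // no new entries

      // Assumed reply shape: [[streamName, [[entryId, fields], ...]]]
      for (const [, entries] of reply) {
        for (const [entryId, fields] of entries) {
          await this.applyStateUpdate(fields);
          this.lastReadId = entryId; // resume after the last applied entry
        }
      }
    }, this.syncInterval);
  }
}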
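
The `applyStateUpdate` step is not shown above. A minimal sketch follows, assuming stream entries arrive as flat `[field, value, ...]` arrays (as some Redis clients return them) and that each entry carries `key`, `data`, and `timestamp` fields. It uses last-writer-wins by timestamp, which is a simplification swapped in for illustration: it makes replays and out-of-order delivery harmless but depends on reasonably synchronized clocks.

```javascript
// Convert a flat [field, value, field, value, ...] array into an object.
function fieldsToObject(fields) {
  const obj = {};
  for (let i = 0; i + 1 < fields.length; i += 2) {
    obj[fields[i]] = fields[i + 1];
  }
  return obj;
}

// Apply an update to a local cache, keeping the entry with the newest
// timestamp. Returns false when the update is stale or a duplicate.
function applyStateUpdate(cache, fields) {
  const { key, data, timestamp } = fieldsToObject(fields);
  const ts = Number(timestamp);
  const existing = cache.get(key);
  if (existing && existing.timestamp >= ts) return false;
  cache.set(key, { data: JSON.parse(data), timestamp: ts });
  return true;
}

const cache = new Map();
applyStateUpdate(cache, ['key', 'call-1:alice', 'data', '{"muted":true}', 'timestamp', '2000']);
// An older update that arrives late is ignored.
applyStateUpdate(cache, ['key', 'call-1:alice', 'data', '{"muted":false}', 'timestamp', '1000']);
console.log(cache.get('call-1:alice').data.muted); // true
```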

Advanced Cascading Features

Dynamic Load Balancing

SFU cascading enables sophisticated load balancing strategies that distribute participants across the network based on multiple factors. Unlike traditional load balancers that only consider server load, cascaded SFU load balancing must account for geographic proximity, network conditions, and inter-SFU communication costs.

The load balancing decision process involves several steps:

Candidate Selection: Identify SFUs that can serve the participant
Metric Collection: Gather performance data from each candidate
Score Calculation: Apply weighted scoring based on multiple factors
Assignment Decision: Select the optimal SFU and route the participant

Factors considered in the scoring algorithm typically include:

Geographic proximity (latency to participant)
Current server load (CPU, memory, bandwidth usage)
Network path quality (packet loss, jitter)
Cost considerations (bandwidth pricing in different regions)
Existing participant distribution (keeping related users together)

javascript

class LoadBalancer {
  constructor(networkTopology) {
    this.topology = networkTopology;
    this.loadMetrics = new Map();
  }

  async assignParticipantToSFU(participant) {
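    const candidateSFUs = await this.findCandidateSFUs(participant);
    if (candidateSFUs.length === 0) {
      // reduce() below throws on an empty array, so fail with a clear message
      throw new Error('No candidate SFUs available for participant');
    }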

    // Evaluate each SFU based on multiple criteria
    const scores = candidateSFUs.map(sfu => ({
      sfu: sfu,
      score: this.calculateScore(sfu, participant)
    }));

    // Select the best SFU
    const bestSFU = scores.reduce((best, current) => 
      current.score > best.score ? current : best
    ).sfu;

    return bestSFU;
  }

  calculateScore(sfu, participant) {
    const factors = {
      proximity: this.calculateProximityScore(sfu, participant),
      load: this.calculateLoadScore(sfu),
      reliability: this.calculateReliabilityScore(sfu),
      cost: this.calculateCostScore(sfu)
    };

    // Weighted scoring
    return factors.proximity * 0.4 +
           factors.load * 0.3 +
           factors.reliability * 0.2 +
           factors.cost * 0.1;
  }
}
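
Weighted scoring only works if every factor is on the same scale. The sketch below shows one possible way to normalize two of the factors to the range 0 to 1 before weighting. The 300 ms ceiling, the weights, and the function names are all assumptions chosen for illustration, not values from any particular deployment.

```javascript
// Map a round-trip time to a 0..1 score: 1 at 0 ms, 0 at or beyond maxRttMs.
function proximityScore(rttMs, maxRttMs = 300) {
  return Math.max(0, 1 - rttMs / maxRttMs);
}

// Map utilization (0..1) to a 0..1 score: an idle server scores 1.
function loadScore(utilization) {
  return 1 - Math.min(1, Math.max(0, utilization));
}

function combinedScore({ rttMs, utilization }, weights = { proximity: 0.6, load: 0.4 }) {
  return proximityScore(rttMs) * weights.proximity + loadScore(utilization) * weights.load;
}

const candidates = [
  { id: 'eu-west', rttMs: 30, utilization: 0.9 },
  { id: 'eu-central', rttMs: 60, utilization: 0.2 },
];
const best = candidates.reduce((a, b) => (combinedScore(b) > combinedScore(a) ? b : a));
console.log(best.id); // eu-central
```

The nearer server loses here because it is almost saturated, which is the kind of trade-off a single-factor balancer would miss.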

Optimal Path Selection

When streams traverse multiple SFUs, the system must select optimal paths to minimize latency and maximize quality:

javascript

class PathOptimizer {
  constructor(networkGraph) {
    this.graph = networkGraph;
    // streamId -> current path; read by optimizeExistingPaths()
    this.activePaths = new Map();
  }

  findOptimalPath(source, destination) {
    // Use Dijkstra's algorithm with custom weight function
    return this.dijkstra(source, destination, (edge) => {
      // Weight based on latency, bandwidth, and reliability
      return edge.latency * 0.5 +
             (1 / edge.bandwidth) * 0.3 +
             (1 - edge.reliability) * 0.2;
    });
  }

  async optimizeExistingPaths() {
    for (const [streamId, path] of this.activePaths) {
      const currentCost = this.calculatePathCost(path);
      const optimalPath = this.findOptimalPath(path.source, path.destination);
      const optimalCost = this.calculatePathCost(optimalPath);

      // Switch to better path if significant improvement
      if (optimalCost < currentCost * 0.8) {
        await this.switchPath(streamId, optimalPath);
      }
    }
  }
}
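
The `dijkstra` helper is referenced above but not defined. Below is a small self-contained sketch of Dijkstra's algorithm over a region graph. The graph shape (a `Map` of adjacency lists), the function name, and the latency figures used as edge weights are hypothetical. It uses a linear scan instead of a priority queue, which is adequate for the small number of regions in a typical mesh.

```javascript
// graph: Map<node, Array<{ to, weight }>> with non-negative weights.
function shortestPath(graph, source, destination) {
  const dist = new Map([[source, 0]]);
  const prev = new Map();
  const visited = new Set();

  while (true) {
    // Pick the unvisited node with the smallest known distance.
    let current = null;
    for (const [node, d] of dist) {
      if (!visited.has(node) && (current === null || d < dist.get(current))) {
        current = node;
      }
    }
    if (current === null) return null; // destination unreachable
    if (current === destination) break;
    visited.add(current);

    // Relax every edge leaving the current node.
    for (const { to, weight } of graph.get(current) || []) {
      const candidate = dist.get(current) + weight;
      if (!dist.has(to) || candidate < dist.get(to)) {
        dist.set(to, candidate);
        prev.set(to, current);
      }
    }
  }

  // Walk predecessors back from the destination to rebuild the path.
  const path = [destination];
  while (path[0] !== source) path.unshift(prev.get(path[0]));
  return { path, cost: dist.get(destination) };
}

const regionGraph = new Map([
  ['us-east', [{ to: 'eu-west', weight: 80 }, { to: 'ap-south', weight: 220 }]],
  ['eu-west', [{ to: 'ap-south', weight: 120 }]],
  ['ap-south', []],
]);
console.log(shortestPath(regionGraph, 'us-east', 'ap-south'));
// { path: [ 'us-east', 'eu-west', 'ap-south' ], cost: 200 }
```

In this example the two-hop route through `eu-west` beats the direct link, so a cascade does not always forward along the most direct connection.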

Failover and Redundancy

Cascaded architectures provide natural redundancy that can be leveraged for automatic failover:

javascript

class FailoverManager {
  constructor(sfuNetwork) {
    this.network = sfuNetwork;
    this.healthChecks = new Map();
  }

  async monitorSFUHealth() {
    for (const [regionId, sfu] of this.network.regions) {
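      const health = await this.checkHealth(sfu);
      const previous = this.healthChecks.get(regionId);

      // Trigger failover only on the healthy -> unhealthy transition so an
      // SFU that stays down is not migrated again on every check
      if (!health.isHealthy && (!previous || previous.isHealthy)) {
        await this.handleSFUFailure(sfu);
      }

      this.healthChecks.set(regionId, health);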
    }
  }

  async handleSFUFailure(failedSFU) {
    // Find affected participants
    const affectedParticipants = failedSFU.getConnectedParticipants();

    // Redistribute participants to healthy SFUs
    for (const participant of affectedParticipants) {
      const newSFU = await this.findFailoverSFU(participant, failedSFU);
      await this.migrateParticipant(participant, failedSFU, newSFU);
    }

    // Update routing tables
    await this.updateGlobalRouting(failedSFU);
  }
}
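
A single failed probe is a weak signal, and migrating every participant because of one dropped packet would be worse than the outage it guards against. One common mitigation, sketched here with a hypothetical `HealthTracker` class and an assumed threshold of three, is to require several consecutive failures before declaring an SFU down.

```javascript
// Marks an SFU unhealthy only after `threshold` consecutive failed probes.
class HealthTracker {
  constructor(threshold = 3) {
    this.threshold = threshold;
    this.failures = new Map(); // regionId -> consecutive failure count
  }

  // Record one probe result. Returns true exactly once, at the moment the
  // SFU crosses the threshold and failover should start.
  record(regionId, ok) {
    if (ok) {
      this.failures.set(regionId, 0);
      return false;
    }
    const count = (this.failures.get(regionId) || 0) + 1;
    this.failures.set(regionId, count);
    return count === this.threshold;
  }

  isHealthy(regionId) {
    return (this.failures.get(regionId) || 0) < this.threshold;
  }
}

const tracker = new HealthTracker(3);
console.log(tracker.record('eu-west', false)); // false
console.log(tracker.record('eu-west', false)); // false
console.log(tracker.record('eu-west', false)); // true
console.log(tracker.record('eu-west', false)); // false (failover already triggered)
console.log(tracker.isHealthy('eu-west')); // false
```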

Real-World Implementation Challenges

Implementing SFU cascading presents several technical challenges that require careful consideration and sophisticated solutions.

DTLS Stream Management

Managing DTLS (Datagram Transport Layer Security) streams between SFUs is particularly challenging because DTLS was designed for securing communication between two endpoints, not for relay scenarios. (In WebRTC, the DTLS handshake negotiates the keys that protect the SRTP media.) When forwarding media between SFUs, several complex issues arise:

Security Context: Encryption is hop-by-hop, so each SFU must decrypt incoming streams and re-encrypt them for forwarding, which leaves media briefly in plaintext on every relay and widens the attack surface
Stream Identification: With multiple streams flowing between SFUs, accurately identifying and routing each stream becomes critical
SSRC Mapping: Source identifiers (SSRCs) must be properly mapped when forwarding to prevent conflicts
Performance Impact: The decrypt-encrypt cycle adds latency and CPU overhead

The solution involves creating secure tunnels between SFUs while maintaining proper stream identification and minimizing the performance impact of security operations. This requires careful key management, efficient packet processing, and sophisticated routing logic.

javascript

class DTLSManager {
  constructor() {
    this.dtlsConnections = new Map();
    this.streamIdentifiers = new Map();
  }

  async establishDTLSConnection(remoteSFU) {
    const dtlsParams = await this.generateDTLSParameters();

    // Exchange fingerprints and establish secure connection
    const connection = new DTLSConnection(dtlsParams);
    await connection.handshake(remoteSFU);

    this.dtlsConnections.set(remoteSFU.id, connection);
    return connection;
  }

  async forwardMediaStream(stream, connection) {
    // Tag streams with unique identifiers for remote SFU
    const streamId = this.generateStreamIdentifier(stream);

    // Set up RTP/RTCP forwarding with proper SSRC mapping
    const forwarder = new MediaForwarder(stream, connection);
    await forwarder.setupSSRCMapping();

    this.streamIdentifiers.set(streamId, forwarder);
    await forwarder.start();
  }
}
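
The SSRC mapping step can be illustrated in isolation. SSRCs are 32-bit values chosen independently by each sender, so two streams arriving from different SFUs can carry the same one. The sketch below, built around a hypothetical `SsrcMapper` class, gives each (origin SFU, original SSRC) pair a locally unique outgoing SSRC. The linear probing is a simplification to keep the example deterministic; a real implementation would more likely pick a random free value.

```javascript
// Assigns each (origin SFU, original SSRC) pair a locally unique outgoing SSRC.
class SsrcMapper {
  constructor() {
    this.forward = new Map(); // "origin:ssrc" -> outgoing SSRC
    this.used = new Set();    // outgoing SSRCs already handed out
  }

  map(originSfuId, originalSsrc) {
    const key = `${originSfuId}:${originalSsrc}`;
    if (this.forward.has(key)) return this.forward.get(key);

    // Reuse the original value when it is free; otherwise probe upward for
    // a free value, wrapping within the unsigned 32-bit range.
    let outgoing = originalSsrc >>> 0;
    while (this.used.has(outgoing)) outgoing = (outgoing + 1) >>> 0;

    this.used.add(outgoing);
    this.forward.set(key, outgoing);
    return outgoing;
  }
}

const mapper = new SsrcMapper();
console.log(mapper.map('us-east', 1111)); // 1111
console.log(mapper.map('eu-west', 1111)); // 1112 (collision resolved)
console.log(mapper.map('us-east', 1111)); // 1111 (mapping is stable)
```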

Bandwidth Management

Inter-SFU links require sophisticated bandwidth management to prevent congestion. Unlike client connections where bandwidth constraints are relatively simple, inter-SFU links carry aggregated traffic from multiple participants, making bandwidth management significantly more complex.

The challenges include:

Aggregate Traffic: Each link carries streams from many participants
Priority Management: Different streams have different importance levels
Congestion Detection: Identifying bottlenecks before they impact quality
Fair Allocation: Ensuring equitable distribution of available bandwidth

Effective bandwidth management requires continuous monitoring of link conditions, intelligent prioritization of streams, and rapid response to congestion events. The system must balance competing demands while maintaining quality for all participants.

Key strategies include:

Implementing congestion control algorithms like BBR or GCC (Google Congestion Control)
Using priority queues for different types of traffic
Applying selective forwarding to reduce bandwidth usage
Dynamically adjusting quality based on available capacity

javascript

class BandwidthManager {
  constructor() {
    this.congestionControl = new CongestionController();
    this.bandwidthAllocator = new BandwidthAllocator();
  }

  async manageBandwidth(interSFUConnection) {
    // Monitor connection statistics
    const stats = await interSFUConnection.getStats();

    // Detect congestion
    const congestionLevel = this.congestionControl.analyze(stats);

    if (congestionLevel > 0.7) {
      // Apply congestion control
      await this.applyCongestionControl(interSFUConnection);
    }

    // Reallocate bandwidth among streams
    await this.reallocateBandwidth(interSFUConnection);
  }

  async applyCongestionControl(connection) {
    const streams = connection.getActiveStreams();

    for (const stream of streams) {
      // Reduce bitrate or drop layers based on priority
      if (stream.priority === 'low') {
        await stream.dropTemporalLayer();
      } else {
        await stream.reduceBitrate(0.8);
      }
    }
  }
}
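
The `reallocateBandwidth` step is left open above. One possible policy, sketched here with hypothetical names and bitrates, is a two-pass priority allocation: first grant each stream its minimum bitrate in priority order, then spend whatever capacity remains topping streams up toward their maximum.

```javascript
// streams: [{ id, priority, minKbps, maxKbps }], higher priority served first.
// Returns a Map of id -> allocated kbps (0 means the stream is paused).
function allocateBandwidth(capacityKbps, streams) {
  const allocation = new Map();
  const ordered = [...streams].sort((a, b) => b.priority - a.priority);
  let remaining = capacityKbps;

  // Pass 1: grant each stream its minimum, in priority order, while capacity lasts.
  for (const s of ordered) {
    const grant = s.minKbps <= remaining ? s.minKbps : 0;
    allocation.set(s.id, grant);
    remaining -= grant;
  }
  // Pass 2: top up admitted streams toward their maximum, again by priority.
  for (const s of ordered) {
    const current = allocation.get(s.id);
    if (current === 0) continue;
    const extra = Math.min(s.maxKbps - current, remaining);
    allocation.set(s.id, current + extra);
    remaining -= extra;
  }
  return allocation;
}

const result = allocateBandwidth(2000, [
  { id: 'screen-share', priority: 3, minKbps: 500, maxKbps: 1500 },
  { id: 'speaker-video', priority: 2, minKbps: 300, maxKbps: 1200 },
  { id: 'thumbnail', priority: 1, minKbps: 150, maxKbps: 300 },
]);
console.log(Object.fromEntries(result));
// { 'screen-share': 1500, 'speaker-video': 350, thumbnail: 150 }
```

Granting minimums first keeps low-priority streams alive at reduced quality instead of starving them whenever a high-priority stream could use the whole link.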

State Synchronization

Maintaining consistent state across distributed SFUs requires careful synchronization to handle the inherent challenges of distributed systems. The primary challenges are network partitions, concurrent updates, and the need to converge under eventual consistency.

Key synchronization challenges include:

Conflict Resolution: When multiple SFUs update the same state simultaneously
Ordering Guarantees: Ensuring updates are applied in the correct sequence
Partition Tolerance: Handling network splits gracefully
Recovery Mechanisms: Restoring consistency after failures

The synchronization system must detect conflicts reliably, resolve them deterministically, and propagate resolutions efficiently. Common techniques include:

Version vectors or vector clocks for tracking causality
Conflict-free replicated data types (CRDTs) for automatic resolution
Consensus protocols for critical operations
Event sourcing for reliable state reconstruction

javascript

class StateSynchronizer {
  constructor() {
    this.versionVector = new VersionVector();
    this.conflictResolver = new ConflictResolver();
  }

  async synchronizeState(localState, remoteStates) {
    // Detect conflicts using version vectors
    const conflicts = this.detectConflicts(localState, remoteStates);

    if (conflicts.length > 0) {
      // Resolve conflicts using application-specific logic
      const resolvedState = await this.conflictResolver.resolve(conflicts);

      // Apply resolved state
      await this.applyState(resolvedState);

      // Broadcast resolution to other SFUs
      await this.broadcastResolution(resolvedState);
    }
  }

  detectConflicts(localState, remoteStates) {
    const conflicts = [];

    for (const [key, value] of localState) {
      for (const remoteState of remoteStates) {
        if (remoteState.has(key)) {
          const remoteValue = remoteState.get(key);
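          // A conflict means the updates are concurrent: neither version
          // happens before the other
          if (!this.versionVector.happensBefore(value, remoteValue) &&
              !this.versionVector.happensBefore(remoteValue, value)) {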
            conflicts.push({ key, localValue: value, remoteValue });
          }
        }
      }
    }

    return conflicts;
  }
}
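
The `VersionVector` class is assumed rather than shown above. A minimal sketch of the comparison it would need is given below, with clocks represented as plain objects mapping node ID to counter. The function name and representation are illustrative. Two versions conflict only when each is ahead of the other on at least one node.

```javascript
// Clocks are plain objects mapping node ID -> counter; missing entries count as 0.
function compareClocks(a, b) {
  let aAhead = false;
  let bAhead = false;
  for (const node of new Set([...Object.keys(a), ...Object.keys(b)])) {
    const av = a[node] || 0;
    const bv = b[node] || 0;
    if (av > bv) aAhead = true;
    if (bv > av) bAhead = true;
  }
  if (aAhead && bAhead) return 'concurrent'; // a real conflict
  if (aAhead) return 'after'; // a supersedes b
  if (bAhead) return 'before'; // b supersedes a
  return 'equal';
}

console.log(compareClocks({ eu: 2, us: 1 }, { eu: 1, us: 1 })); // after
console.log(compareClocks({ eu: 1 }, { eu: 1, us: 1 })); // before
console.log(compareClocks({ eu: 2, us: 1 }, { eu: 1, us: 2 })); // concurrent
console.log(compareClocks({ eu: 1 }, { eu: 1 })); // equal
```

Only the `concurrent` case needs a conflict resolver; in the `before` and `after` cases the newer version can simply replace the older one.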

How Stream uses SFU Cascading

A video-calling experience based on WebRTC is notoriously difficult to get right, and building it in-house is a massive task for any development team. There are several companies that have spent a lot of time and resources trying to build a solid calling experience and still didn't get it quite right. There are also other variations of products based on WebRTC, such as live streaming and audio calling, that contain their own nuances.

For us, it was almost obvious that we needed to make creating video experiences easier by creating a robust video backend, SDKs on all popular platforms, and components that cover all aspects of a video calling experience. We also decided to cover all popular use cases, such as meet-style video calling, end-to-end calling experiences integrating with OS call systems, livestreaming (both WebRTC and HLS-based), and audio rooms.

Building all the aforementioned features and ensuring scalability required a lot of thought about the infrastructure we needed to build our systems. We considered several architectures but eventually settled on SFUs with cascading as our primary choice. This was important since users around the world connecting to a single SFU would be problematic and would increase latency for everyone on a single call. The SFUs in our systems work like a mesh, and each individual SFU relays information about the participants connected to it to all other SFUs. This ensures we can do adequate load-balancing and add redundancy while reducing latency for all users.

There were several technical challenges along the way. For one, making the SFUs relay information to all other nodes is not an easy task to achieve. We used Redis streams to publish updates to the call state and made every node check for new updates every 100ms to maintain a valid call state. The DTLS (Datagram Transport Layer Security) streams used to transfer video and audio streams between SFUs are difficult to work with. Distinguishing all tracks and layers sent across SFUs was critical and needed some work to achieve. Debugging SFU issues is difficult on its own, and adding cascading also adds a layer of complexity. Then there was the issue of bandwidth: video is egress-heavy since every incoming video stream needs to be sent to all participants, increasing the outgoing bandwidth requirements. There were also additional challenges, such as adding a congestion control algorithm to the cascading implementation.

There were additional technical challenges that are too detailed to cover here, but the gist of it is that building video is a complex challenge. We pushed hard to make sure any development team can build the video experiences they desire without facing the complex issues associated with WebRTC and perfecting cascading.

Best Practices for SFU Cascading

Network Design

When designing a cascaded SFU network, consider these best practices:

Regional Placement: Deploy SFUs close to user concentrations
Redundancy Planning: Ensure each region has backup capacity
Network Topology: Choose between full mesh, hub-and-spoke, or hybrid
Capacity Planning: Size each SFU based on expected regional load
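
The topology choice has a direct cost in the number of inter-SFU links to provision and monitor. A quick sketch of the arithmetic, with illustrative function names:

```javascript
// Number of inter-SFU links needed to connect n regions.
function fullMeshLinks(n) {
  return (n * (n - 1)) / 2; // every pair of regions is directly connected
}
function hubAndSpokeLinks(n) {
  return Math.max(0, n - 1); // every region connects only to the hub
}

for (const n of [3, 6, 12]) {
  console.log(n, fullMeshLinks(n), hubAndSpokeLinks(n));
}
// 3 3 2
// 6 15 5
// 12 66 11
```

A full mesh keeps every path to a single hop but its link count grows quadratically. Hub-and-spoke grows linearly but adds a hop for traffic between spokes and concentrates load and failure risk on the hub, which is why hybrid layouts are often considered.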

Performance Optimization

Optimize cascaded SFU performance through:

javascript

class PerformanceOptimizer {
  optimizeForLatency(network) {
    // Minimize inter-SFU hops
    this.reducePathLength(network);

    // Prioritize direct connections for active speakers
    this.optimizeActiveSpeakerRouting(network);

    // Use predictive routing for mobile participants
    this.enablePredictiveRouting(network);
  }

  optimizeForBandwidth(network) {
    // Enable selective forwarding between SFUs
    this.enableSelectiveForwarding(network);

    // Implement bandwidth-aware routing
    this.configureBandwidthAwareRouting(network);

    // Use adaptive bitrate for inter-SFU links
    this.enableAdaptiveBitrate(network);
  }
}
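
To make selective forwarding between SFUs more concrete, the sketch below assumes publishers send simulcast with three layers and that each subscriber requests one of them. The sending SFU then forwards to a remote region only the layers that someone there has asked for. The layer names and the function are hypothetical.

```javascript
// Simulcast layers ordered from lowest to highest quality (names are illustrative).
const LAYERS = ['low', 'medium', 'high'];

// requestedLayers: the layer each subscriber in one remote region wants.
// Returns the layers worth sending to that region's SFU: every layer that
// at least one subscriber there requested, and nothing else.
function layersToForward(requestedLayers) {
  const wanted = new Set(requestedLayers);
  return LAYERS.filter((layer) => wanted.has(layer));
}

console.log(layersToForward(['low', 'low', 'low'])); // [ 'low' ]
console.log(layersToForward(['high', 'low'])); // [ 'low', 'high' ]
console.log(layersToForward([])); // []
```

When every remote viewer shows the publisher as a thumbnail, only the low layer crosses the inter-SFU link; when nobody in a region subscribes, nothing is sent at all.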

Monitoring and Debugging

Effective monitoring is crucial for cascaded systems due to their distributed nature and complex interactions. Unlike single-SFU deployments where monitoring is straightforward, cascaded systems require comprehensive observability across multiple dimensions.

The monitoring system must track:

Network Health: Latency, packet loss, and bandwidth between SFUs
Server Metrics: CPU, memory, and I/O usage at each node
Call Quality: End-to-end metrics for participant experience
System Behavior: Routing decisions, failover events, and state consistency

Debugging challenges in cascaded systems include:

Tracing issues across multiple servers
Correlating events in distributed logs
Identifying performance bottlenecks
Understanding cascade effects of failures

A robust monitoring solution provides:

Real-time dashboards for system health
Automated alerting for anomalies
Distributed tracing for request flows
Historical data for trend analysis

javascript

class CascadeMonitor {
  constructor() {
    this.metrics = new MetricsCollector();
    this.alerts = new AlertManager();
  }

  async monitorNetwork() {
    // Collect metrics from all SFUs
    const networkMetrics = await this.collectNetworkMetrics();

    // Analyze for anomalies
    const anomalies = this.detectAnomalies(networkMetrics);

    // Generate alerts for critical issues
    if (anomalies.length > 0) {
      await this.alerts.notify(anomalies);
    }

    // Update dashboards
    await this.updateDashboards(networkMetrics);
  }

  collectNetworkMetrics() {
    return {
      interSFULatency: this.measureInterSFULatency(),
      participantDistribution: this.getParticipantDistribution(),
      bandwidthUtilization: this.getBandwidthUtilization(),
      errorRates: this.getErrorRates()
    };
  }
}
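
The `detectAnomalies` step is not defined above. A simple starting point, sketched here under the assumption that inter-SFU latency is sampled periodically, is to flag any sample that exceeds the rolling mean by more than a few standard deviations. The class name, window size, warm-up count, and the factor of three are illustrative defaults, not tuned values.

```javascript
// Flags a sample as anomalous when it exceeds the mean of the previous
// window by more than `k` standard deviations.
class LatencyAnomalyDetector {
  constructor(windowSize = 20, k = 3) {
    this.windowSize = windowSize;
    this.k = k;
    this.samples = [];
  }

  // Returns true if `valueMs` is anomalous relative to the window so far.
  observe(valueMs) {
    let anomalous = false;
    if (this.samples.length >= 5) { // wait for a few samples before judging
      const mean = this.samples.reduce((s, v) => s + v, 0) / this.samples.length;
      const variance =
        this.samples.reduce((s, v) => s + (v - mean) ** 2, 0) / this.samples.length;
      // A 1 ms floor keeps a perfectly flat history from flagging tiny jitter.
      const stdDev = Math.max(Math.sqrt(variance), 1);
      anomalous = valueMs > mean + this.k * stdDev;
    }
    this.samples.push(valueMs);
    if (this.samples.length > this.windowSize) this.samples.shift();
    return anomalous;
  }
}

const detector = new LatencyAnomalyDetector();
const flags = [80, 82, 79, 81, 80, 83, 250].map((ms) => detector.observe(ms));
console.log(flags); // [ false, false, false, false, false, false, true ]
```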

Conclusion

SFU cascading transforms the limitations of single-SFU architectures into a scalable, globally distributed solution for modern video applications. By creating a mesh of interconnected SFUs, this architecture enables low-latency communication across geographic boundaries while maintaining the efficiency advantages of selective forwarding.

The implementation challenges—from DTLS stream management to state synchronization—are significant but manageable with proper architectural design and robust engineering practices. Stream's implementation demonstrates that these challenges can be overcome to provide a reliable, developer-friendly video infrastructure.

For developers building global-scale video applications, SFU cascading offers the best balance of performance, scalability, and reliability. While the complexity of implementation is higher than single-SFU deployments, the benefits in terms of user experience and system resilience make it the preferred architecture for enterprise-grade video solutions.
