Introduction to SFU Cascading
SFU Cascading represents the evolution of single-SFU architectures into a distributed, mesh-like network of interconnected SFUs. While the basic SFU architecture provides efficient stream distribution within a single server's reach, cascading extends these capabilities across multiple geographic regions and scales to support massive participant counts. This approach transforms isolated SFU islands into a cohesive, global network capable of handling enterprise-scale video conferencing and streaming applications.
The Core Concept
Imagine a network of regional post offices instead of a single central facility. Each post office serves its local area efficiently, but they're also connected to each other, allowing mail to flow between regions. Similarly, SFU cascading creates a mesh of interconnected servers, each handling local participants while seamlessly exchanging streams with other servers in the network. This distributed approach solves the fundamental limitations of single-SFU deployments while maintaining the architectural advantages that make SFUs superior to MCUs for modern applications.
Understanding the Limitations of Single SFU
To appreciate why SFU cascading is necessary, we must first understand the fundamental limitations of single-SFU deployments. While a single SFU works well for small to medium-sized applications, it encounters significant challenges as applications grow in scale and geographic reach.
Single Point of Failure
The most critical limitation of a single-SFU architecture is its vulnerability to failure. When all participants connect to one server, any technical issue—whether hardware failure, software crash, or network disruption—affects every user in the system. This creates an unacceptable risk for business-critical applications where reliability is paramount.
In production environments, this manifests in several ways:
- Server crashes terminate all ongoing calls
- Network issues at the data center affect all users globally
- Maintenance windows require complete service interruption
- No graceful degradation—the system either works perfectly or fails completely
Geographic Latency Challenges
Physical distance introduces unavoidable latency in network communications. When users from different continents connect to a single SFU, some participants inevitably experience high latency due to the physical distance between their location and the server. This latency compounds at each step of the communication path:
- Upload Latency: Time for a participant's media to reach the SFU
- Processing Latency: SFU's internal routing and forwarding delays
- Download Latency: Time for media to reach other participants
- Round-Trip Impact: Total latency can exceed acceptable thresholds for real-time communication
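To see how these legs add up, here is a small illustrative calculation. The city pairs and millisecond figures below are hypothetical examples, not measurements:

```javascript
// Hypothetical one-way latency budget for a participant in Sydney
// talking through a single SFU hosted on the US east coast.
const legs = {
  uploadToSFU: 110,   // Sydney -> Virginia propagation + access network (ms)
  sfuProcessing: 5,   // routing, re-encryption, pacing inside the SFU (ms)
  downloadToPeer: 15, // Virginia -> New York subscriber (ms)
};

const oneWay = Object.values(legs).reduce((sum, ms) => sum + ms, 0);
console.log(`one-way: ${oneWay} ms, round trip: ${oneWay * 2} ms`);
// A commonly cited target for natural conversation is under ~150 ms
// one-way, so the distant participant is already over budget before
// jitter buffers and retransmissions add their share.
```

With a regional SFU in Sydney, the long intercontinental leg moves onto the managed inter-SFU backbone instead of the participant's access network.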
For live streaming applications, this geographic latency can be particularly problematic, creating significant delays between the broadcaster and viewers in distant regions.
Scalability Constraints
A single SFU faces inherent scalability limitations that become apparent as participant counts grow:
- CPU Limitations: Even though SFUs don't transcode, they still perform significant processing for packet routing, encryption/decryption, and quality management.
- Memory Constraints: Each participant requires memory for connection state, buffering, and routing tables.
- Network Bandwidth: The server's network interface becomes a bottleneck as more streams flow through it.
- Connection Limits: Operating systems and hardware have practical limits on simultaneous connections.
These constraints create a ceiling on the number of participants a single SFU can effectively handle, typically in the hundreds to low thousands range.
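The bandwidth ceiling in particular grows quadratically with call size, which is why it is usually hit first. A quick back-of-the-envelope sketch (the bitrate is illustrative):

```javascript
// Rough SFU egress for an N-way call where every participant publishes
// one stream at `bitrateKbps` and subscribes to everyone else.
function egressMbps(participants, bitrateKbps) {
  // each of N incoming streams is forwarded to the other N-1 participants
  const kbps = participants * (participants - 1) * bitrateKbps;
  return kbps / 1000;
}

console.log(egressMbps(10, 1000));  // 90 Mbps: comfortable
console.log(egressMbps(100, 1000)); // 9900 Mbps: ~10 Gbps, past a typical NIC
```

Simulcast and selective forwarding lower the constant factor, but the N×(N-1) shape remains, which is what ultimately forces the move to multiple servers.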
How SFU Cascading Works
SFU cascading addresses these limitations by creating a distributed network of interconnected SFUs that work together as a unified system. This architecture enables global scale while maintaining low latency and high reliability.
Architectural Overview
The cascaded SFU architecture consists of several key components working together to create a unified, distributed system:
- Regional SFUs: Servers deployed in different geographic locations, each serving local participants
- Inter-SFU Connections: High-bandwidth, low-latency links between SFUs for stream exchange
- Global State Management: Distributed systems for maintaining consistent state across all SFUs
- Intelligent Routing: Algorithms for optimal path selection and stream forwarding
Each regional SFU acts as both a local media server for nearby participants and a relay node in the global network. These SFUs maintain persistent connections with each other, forming a mesh topology that allows streams to flow efficiently between regions. The global state management system ensures that all SFUs have a consistent view of the call state, participant information, and routing tables.
The architecture must balance several competing concerns: minimizing latency by keeping streams close to participants, reducing bandwidth costs by avoiding unnecessary forwarding, maintaining system reliability through redundancy, and ensuring scalability as the network grows.
class CascadedSFUNetwork {
  constructor() {
    this.regions = new Map();
    this.interSFUConnections = new Map();
    this.globalState = new DistributedStateManager();
  }

  async initializeRegion(regionId, config) {
    const regionalSFU = new RegionalSFU(regionId, config);
    this.regions.set(regionId, regionalSFU);

    // Establish connections to existing regions
    for (const [existingRegionId, existingSFU] of this.regions) {
      if (existingRegionId !== regionId) {
        await this.createInterSFUConnection(regionalSFU, existingSFU);
      }
    }

    // Register with global state
    await this.globalState.registerSFU(regionId, regionalSFU);
  }

  async createInterSFUConnection(sfuA, sfuB) {
    const connection = new InterSFUConnection(sfuA, sfuB);
    await connection.establish();
    this.interSFUConnections.set(
      `${sfuA.regionId}-${sfuB.regionId}`,
      connection
    );
  }
}
Stream Routing and Forwarding
In a cascaded architecture, stream routing becomes more complex as it must consider both local and remote participants. The system must make intelligent decisions about which streams to forward between SFUs and how to optimize for quality and latency.
Local vs. Remote Routing
When a participant publishes a stream, the local SFU must determine which other SFUs need to receive it:
class StreamRouter {
  constructor(localSFU) {
    this.localSFU = localSFU;
    this.routingTable = new Map();
  }

  async handleNewStream(publisherId, stream) {
    // Handle local subscribers
    const localSubscribers = this.findLocalSubscribers(publisherId);
    for (const subscriber of localSubscribers) {
      await this.forwardLocally(stream, subscriber);
    }

    // Determine which remote SFUs need this stream
    const remoteRegions = await this.findRemoteSubscribers(publisherId);
    for (const region of remoteRegions) {
      await this.forwardToRemoteSFU(stream, region);
    }
  }

  async forwardToRemoteSFU(stream, remoteRegion) {
    const connection = this.localSFU.getConnectionTo(remoteRegion);
    if (connection) {
      await connection.forwardStream(stream);
      this.updateRoutingTable(stream.id, remoteRegion);
    }
  }
}
Global State Management
Maintaining consistent state across a distributed system is one of the most challenging aspects of SFU cascading. Each SFU must have an accurate view of the global call state while minimizing synchronization overhead.
The challenge is framed by the CAP theorem: in a distributed system, you can guarantee at most two of Consistency, Availability, and Partition tolerance. For video conferencing, availability is critical (calls must keep working) and partition tolerance is unavoidable (network splits do happen), so the trade-off falls on consistency.
Most cascaded SFU systems use eventual consistency models where state updates propagate asynchronously. This means there may be brief periods where different SFUs have slightly different views of the call state, but they eventually converge to the same state. The key is designing the system so these temporary inconsistencies don't disrupt the user experience.
Common approaches include:
- Using distributed databases like Redis or Cassandra for state storage
- Implementing vector clocks or version vectors for conflict detection
- Employing consensus algorithms for critical state changes
- Using gossip protocols for efficient state propagation
class DistributedStateManager {
  constructor() {
    this.redisClient = new RedisClient();
    this.localCache = new Map();
    this.syncInterval = 100; // ms
    this.lastReadId = '0';   // start from the beginning of the stream
  }

  async publishStateUpdate(update) {
    // Publish to Redis stream
    await this.redisClient.xadd('call-state-updates', '*', {
      type: update.type,
      data: JSON.stringify(update.data),
      timestamp: Date.now()
    });

    // Update local cache
    this.localCache.set(update.key, update.data);
  }

  startSync() {
    // Poll for updates every syncInterval
    setInterval(async () => {
      const updates = await this.redisClient.xread(
        'STREAMS', 'call-state-updates', this.lastReadId
      );
      for (const update of updates) {
        await this.applyStateUpdate(update);
        this.lastReadId = update.id; // resume from the last-seen entry
      }
    }, this.syncInterval);
  }
}
Advanced Cascading Features
Dynamic Load Balancing
SFU cascading enables sophisticated load balancing strategies that distribute participants across the network based on multiple factors. Unlike traditional load balancers that only consider server load, cascaded SFU load balancing must account for geographic proximity, network conditions, and inter-SFU communication costs.
The load balancing decision process involves several steps:
- Candidate Selection: Identify SFUs that can serve the participant
- Metric Collection: Gather performance data from each candidate
- Score Calculation: Apply weighted scoring based on multiple factors
- Assignment Decision: Select the optimal SFU and route the participant
Factors considered in the scoring algorithm typically include:
- Geographic proximity (latency to participant)
- Current server load (CPU, memory, bandwidth usage)
- Network path quality (packet loss, jitter)
- Cost considerations (bandwidth pricing in different regions)
- Existing participant distribution (keeping related users together)
class LoadBalancer {
  constructor(networkTopology) {
    this.topology = networkTopology;
    this.loadMetrics = new Map();
  }

  async assignParticipantToSFU(participant) {
    const candidateSFUs = await this.findCandidateSFUs(participant);

    // Evaluate each SFU based on multiple criteria
    const scores = candidateSFUs.map(sfu => ({
      sfu: sfu,
      score: this.calculateScore(sfu, participant)
    }));

    // Select the best SFU
    const bestSFU = scores.reduce((best, current) =>
      current.score > best.score ? current : best
    ).sfu;
    return bestSFU;
  }

  calculateScore(sfu, participant) {
    const factors = {
      proximity: this.calculateProximityScore(sfu, participant),
      load: this.calculateLoadScore(sfu),
      reliability: this.calculateReliabilityScore(sfu),
      cost: this.calculateCostScore(sfu)
    };

    // Weighted scoring
    return factors.proximity * 0.4 +
           factors.load * 0.3 +
           factors.reliability * 0.2 +
           factors.cost * 0.1;
  }
}
Optimal Path Selection
When streams traverse multiple SFUs, the system must select optimal paths to minimize latency and maximize quality:
class PathOptimizer {
  constructor(networkGraph) {
    this.graph = networkGraph;
    this.activePaths = new Map(); // streamId -> currently active path
  }

  findOptimalPath(source, destination) {
    // Use Dijkstra's algorithm with a custom weight function
    return this.dijkstra(source, destination, (edge) => {
      // Weight based on latency, bandwidth, and reliability
      return edge.latency * 0.5 +
             (1 / edge.bandwidth) * 0.3 +
             (1 - edge.reliability) * 0.2;
    });
  }

  async optimizeExistingPaths() {
    for (const [streamId, path] of this.activePaths) {
      const currentCost = this.calculatePathCost(path);
      const optimalPath = this.findOptimalPath(path.source, path.destination);
      const optimalCost = this.calculatePathCost(optimalPath);

      // Switch to the better path only on significant improvement
      if (optimalCost < currentCost * 0.8) {
        await this.switchPath(streamId, optimalPath);
      }
    }
  }
}
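The `dijkstra` call above is assumed rather than shown. A minimal standalone sketch of weighted shortest-path search over an adjacency-list graph might look like this (the graph shape, with every node as a key mapping to `{ to, latency, ... }` edges, is an assumption for illustration):

```javascript
// Minimal Dijkstra: `graph` maps a node id to an array of outgoing edges,
// and `weightFn` turns an edge into a non-negative cost, as in
// findOptimalPath. Returns the node list from source to destination,
// or null when the destination is unreachable.
function dijkstra(graph, source, destination, weightFn) {
  const dist = new Map([[source, 0]]);
  const prev = new Map();
  const unvisited = new Set(Object.keys(graph));

  while (unvisited.size > 0) {
    // Pick the unvisited node with the smallest known distance
    let current = null;
    for (const node of unvisited) {
      const d = dist.get(node) ?? Infinity;
      if (current === null || d < (dist.get(current) ?? Infinity)) current = node;
    }
    if (current === destination) break;
    unvisited.delete(current);

    // Relax every edge leaving the current node
    for (const edge of graph[current] ?? []) {
      const candidate = (dist.get(current) ?? Infinity) + weightFn(edge);
      if (candidate < (dist.get(edge.to) ?? Infinity)) {
        dist.set(edge.to, candidate);
        prev.set(edge.to, current);
      }
    }
  }

  // Walk predecessors back from the destination
  const path = [];
  for (let node = destination; node !== undefined; node = prev.get(node)) {
    path.unshift(node);
  }
  return path[0] === source ? path : null;
}
```

For a handful of regions a linear scan for the minimum is fine; a real deployment with many nodes would swap the scan for a priority queue.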
Failover and Redundancy
Cascaded architectures provide natural redundancy that can be leveraged for automatic failover:
class FailoverManager {
  constructor(sfuNetwork) {
    this.network = sfuNetwork;
    this.healthChecks = new Map();
  }

  async monitorSFUHealth() {
    for (const [regionId, sfu] of this.network.regions) {
      const health = await this.checkHealth(sfu);
      if (!health.isHealthy) {
        await this.handleSFUFailure(sfu);
      }
      this.healthChecks.set(regionId, health);
    }
  }

  async handleSFUFailure(failedSFU) {
    // Find affected participants
    const affectedParticipants = failedSFU.getConnectedParticipants();

    // Redistribute participants to healthy SFUs
    for (const participant of affectedParticipants) {
      const newSFU = await this.findFailoverSFU(participant, failedSFU);
      await this.migrateParticipant(participant, failedSFU, newSFU);
    }

    // Update routing tables
    await this.updateGlobalRouting(failedSFU);
  }
}
Real-World Implementation Challenges
Implementing SFU cascading presents several technical challenges that require careful consideration and sophisticated solutions.
DTLS Stream Management
Managing DTLS (Datagram Transport Layer Security) streams between SFUs is particularly challenging because DTLS was designed for securing communication between two endpoints, not for relay scenarios. When forwarding media between SFUs, several complex issues arise:
- Security Context: Each SFU needs to decrypt incoming streams and re-encrypt them for forwarding, creating potential security vulnerabilities
- Stream Identification: With multiple streams flowing between SFUs, accurately identifying and routing each stream becomes critical
- SSRC Mapping: Source identifiers (SSRCs) must be properly mapped when forwarding to prevent conflicts
- Performance Impact: The decrypt-encrypt cycle adds latency and CPU overhead
The solution involves creating secure tunnels between SFUs while maintaining proper stream identification and minimizing the performance impact of security operations. This requires careful key management, efficient packet processing, and sophisticated routing logic.
class DTLSManager {
  constructor() {
    this.dtlsConnections = new Map();
    this.streamIdentifiers = new Map();
  }

  async establishDTLSConnection(remoteSFU) {
    const dtlsParams = await this.generateDTLSParameters();

    // Exchange fingerprints and establish a secure connection
    const connection = new DTLSConnection(dtlsParams);
    await connection.handshake(remoteSFU);
    this.dtlsConnections.set(remoteSFU.id, connection);
    return connection;
  }

  async forwardMediaStream(stream, connection) {
    // Tag streams with unique identifiers for the remote SFU
    const streamId = this.generateStreamIdentifier(stream);

    // Set up RTP/RTCP forwarding with proper SSRC mapping
    const forwarder = new MediaForwarder(stream, connection);
    await forwarder.setupSSRCMapping();
    this.streamIdentifiers.set(streamId, forwarder);
    await forwarder.start();
  }
}
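The `setupSSRCMapping` step above is only named, not shown. The core idea is a per-link translation table, so that publishers arriving from different regions can never collide on the same SSRC on the forwarding link. This sketch shows the mechanics; the random allocation scheme and the raw-buffer packet shape are illustrative assumptions:

```javascript
// Per-link SSRC translation: incoming SSRCs are rewritten to locally
// unique values before forwarding, so streams relayed from different
// regions cannot collide on this link.
class SSRCMapper {
  constructor() {
    this.mapping = new Map(); // remote SSRC -> local SSRC
    this.used = new Set();
  }

  localSSRC(remoteSSRC) {
    if (this.mapping.has(remoteSSRC)) return this.mapping.get(remoteSSRC);
    // Pick a random 32-bit SSRC not already in use on this link
    let candidate;
    do {
      candidate = Math.floor(Math.random() * 0xffffffff) >>> 0;
    } while (this.used.has(candidate));
    this.used.add(candidate);
    this.mapping.set(remoteSSRC, candidate);
    return candidate;
  }

  rewritePacket(rtpPacket) {
    // In the fixed RTP header, bytes 8..11 carry the SSRC (RFC 3550)
    const local = this.localSSRC(rtpPacket.readUInt32BE(8));
    rtpPacket.writeUInt32BE(local, 8);
    return rtpPacket;
  }
}
```

The mapping must stay stable for the lifetime of a stream, since RTCP reports and retransmission requests reference the rewritten SSRC.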
Bandwidth Management
Inter-SFU links require sophisticated bandwidth management to prevent congestion. Unlike client connections where bandwidth constraints are relatively simple, inter-SFU links carry aggregated traffic from multiple participants, making bandwidth management significantly more complex.
The challenges include:
- Aggregate Traffic: Each link carries streams from many participants
- Priority Management: Different streams have different importance levels
- Congestion Detection: Identifying bottlenecks before they impact quality
- Fair Allocation: Ensuring equitable distribution of available bandwidth
Effective bandwidth management requires continuous monitoring of link conditions, intelligent prioritization of streams, and rapid response to congestion events. The system must balance competing demands while maintaining quality for all participants.
Key strategies include:
- Implementing congestion control algorithms like BBR or GCC
- Using priority queues for different types of traffic
- Applying selective forwarding to reduce bandwidth usage
- Dynamically adjusting quality based on available capacity
class BandwidthManager {
  constructor() {
    this.congestionControl = new CongestionController();
    this.bandwidthAllocator = new BandwidthAllocator();
  }

  async manageBandwidth(interSFUConnection) {
    // Monitor connection statistics
    const stats = await interSFUConnection.getStats();

    // Detect congestion
    const congestionLevel = this.congestionControl.analyze(stats);
    if (congestionLevel > 0.7) {
      // Apply congestion control
      await this.applyCongestionControl(interSFUConnection);
    }

    // Reallocate bandwidth among streams
    await this.reallocateBandwidth(interSFUConnection);
  }

  async applyCongestionControl(connection) {
    const streams = connection.getActiveStreams();
    for (const stream of streams) {
      // Reduce bitrate or drop layers based on priority
      if (stream.priority === 'low') {
        await stream.dropTemporalLayer();
      } else {
        await stream.reduceBitrate(0.8);
      }
    }
  }
}
State Synchronization
Maintaining consistent state across distributed SFUs requires careful synchronization to handle the inherent challenges of distributed systems. The primary challenge is dealing with network partitions, concurrent updates, and the need for eventual consistency.
Key synchronization challenges include:
- Conflict Resolution: When multiple SFUs update the same state simultaneously
- Ordering Guarantees: Ensuring updates are applied in the correct sequence
- Partition Tolerance: Handling network splits gracefully
- Recovery Mechanisms: Restoring consistency after failures
The synchronization system must detect conflicts reliably, resolve them deterministically, and propagate resolutions efficiently. Common techniques include:
- Version vectors or vector clocks for tracking causality
- Conflict-free replicated data types (CRDTs) for automatic resolution
- Consensus protocols for critical operations
- Event sourcing for reliable state reconstruction
class StateSynchronizer {
  constructor() {
    this.versionVector = new VersionVector();
    this.conflictResolver = new ConflictResolver();
  }

  async synchronizeState(localState, remoteStates) {
    // Detect conflicts using version vectors
    const conflicts = this.detectConflicts(localState, remoteStates);
    if (conflicts.length > 0) {
      // Resolve conflicts using application-specific logic
      const resolvedState = await this.conflictResolver.resolve(conflicts);

      // Apply the resolved state
      await this.applyState(resolvedState);

      // Broadcast the resolution to other SFUs
      await this.broadcastResolution(resolvedState);
    }
  }

  detectConflicts(localState, remoteStates) {
    const conflicts = [];
    for (const [key, value] of localState) {
      for (const remoteState of remoteStates) {
        if (remoteState.has(key)) {
          const remoteValue = remoteState.get(key);
          if (!this.versionVector.happensBefore(value, remoteValue)) {
            conflicts.push({ key, localValue: value, remoteValue });
          }
        }
      }
    }
    return conflicts;
  }
}
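The `VersionVector` class is assumed above. A minimal sketch of per-SFU counters with a happens-before test could look like the following (note that in this sketch each state value carries its own vector, rather than the synchronizer holding a single one):

```javascript
// Minimal version vector: one monotonic counter per SFU id. Update A
// happens-before update B when every counter in A is <= the matching
// counter in B and at least one is strictly smaller; otherwise the two
// updates are concurrent and need conflict resolution.
class VersionVector {
  constructor(counters = {}) {
    this.counters = { ...counters };
  }

  increment(sfuId) {
    this.counters[sfuId] = (this.counters[sfuId] ?? 0) + 1;
  }

  happensBefore(other) {
    const ids = new Set([
      ...Object.keys(this.counters),
      ...Object.keys(other.counters)
    ]);
    let strictlySmaller = false;
    for (const id of ids) {
      const mine = this.counters[id] ?? 0;
      const theirs = other.counters[id] ?? 0;
      if (mine > theirs) return false; // saw something the other missed
      if (mine < theirs) strictlySmaller = true;
    }
    return strictlySmaller;
  }
}
```

When `a.happensBefore(b)` and `b.happensBefore(a)` are both false, the updates are concurrent, and that is exactly the case `detectConflicts` pushes into the conflict list.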
How Stream uses SFU Cascading
A video-calling experience based on WebRTC is notoriously difficult to get right, and building it in-house is a massive task for any development team. There are several companies that have spent a lot of time and resources trying to build a solid calling experience and still didn't get it quite right. There are also other variations of products based on WebRTC, such as live streaming and audio calling, that contain their own nuances.
For us, it was almost obvious that we needed to make creating video experiences easier by building a robust video backend, SDKs on all popular platforms, and components that cover all aspects of a video calling experience. We also decided to cover all popular use cases, such as meet-style video calling, end-to-end calling experiences integrating with OS call systems, livestreaming (both WebRTC- and HLS-based), and audio rooms.
Building all the aforementioned features and ensuring scalability required a lot of thought about the infrastructure underpinning our systems. We considered several architectures but eventually settled on cascading SFUs as our primary choice. This mattered because users around the world connecting to a single SFU would increase latency for everyone on the call. The SFUs in our system form a mesh, and each SFU relays information to all the others about the participants connected to its node. This lets us load-balance effectively and add redundancy while reducing latency for all users.
There were several technical challenges along the way. For one, making the SFUs relay information to all other nodes is not an easy task to achieve. We used Redis streams to publish updates to the call state and made every node check for new updates every 100ms to maintain a valid call state. The DTLS (Datagram Transport Layer Security) streams used to transfer video and audio streams between SFUs are difficult to work with. Distinguishing all tracks and layers sent across SFUs was critical and needed some work to achieve. Debugging SFU issues is difficult on its own, and adding cascading also adds a layer of complexity. Then there was the issue of bandwidth: video is egress-heavy since every incoming video stream needs to be sent to all participants, increasing the outgoing bandwidth requirements. There were also additional challenges, such as adding a congestion control algorithm to the cascading implementation.
There were more technical challenges than we can cover here, but the gist is that building video is genuinely hard. We pushed hard to ensure any development team can build the video experiences they want without facing the complex issues associated with WebRTC or having to perfect cascading themselves.
Best Practices for SFU Cascading
Network Design
When designing a cascaded SFU network, consider these best practices:
- Regional Placement: Deploy SFUs close to user concentrations
- Redundancy Planning: Ensure each region has backup capacity
- Network Topology: Choose between full mesh, hub-and-spoke, or hybrid
- Capacity Planning: Size each SFU based on expected regional load
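The topology choice has a concrete cost that is easy to quantify: a full mesh needs one persistent link per SFU pair, while hub-and-spoke needs only one link per regional SFU. A quick sketch of the trade-off:

```javascript
// Number of persistent inter-SFU links each topology requires.
const fullMeshLinks = (sfus) => (sfus * (sfus - 1)) / 2; // every pair connected
const hubAndSpokeLinks = (sfus) => sfus - 1;             // every region to one hub

for (const n of [5, 10, 20]) {
  console.log(`${n} regions: mesh=${fullMeshLinks(n)}, hub=${hubAndSpokeLinks(n)}`);
}
// A full mesh keeps every region one hop away (lowest latency) but grows
// quadratically in links to maintain; hub-and-spoke scales linearly but
// adds a hop and makes the hub a critical dependency. Hybrids typically
// mesh nearby regions and spoke the rest.
```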
Performance Optimization
Optimize cascaded SFU performance through:
class PerformanceOptimizer {
  optimizeForLatency(network) {
    // Minimize inter-SFU hops
    this.reducePathLength(network);

    // Prioritize direct connections for active speakers
    this.optimizeActiveSpeakerRouting(network);

    // Use predictive routing for mobile participants
    this.enablePredictiveRouting(network);
  }

  optimizeForBandwidth(network) {
    // Enable selective forwarding between SFUs
    this.enableSelectiveForwarding(network);

    // Implement bandwidth-aware routing
    this.configureBandwidthAwareRouting(network);

    // Use adaptive bitrate for inter-SFU links
    this.enableAdaptiveBitrate(network);
  }
}
Monitoring and Debugging
Effective monitoring is crucial for cascaded systems due to their distributed nature and complex interactions. Unlike single-SFU deployments where monitoring is straightforward, cascaded systems require comprehensive observability across multiple dimensions.
The monitoring system must track:
- Network Health: Latency, packet loss, and bandwidth between SFUs
- Server Metrics: CPU, memory, and I/O usage at each node
- Call Quality: End-to-end metrics for participant experience
- System Behavior: Routing decisions, failover events, and state consistency
Debugging challenges in cascaded systems include:
- Tracing issues across multiple servers
- Correlating events in distributed logs
- Identifying performance bottlenecks
- Understanding cascade effects of failures
A robust monitoring solution provides:
- Real-time dashboards for system health
- Automated alerting for anomalies
- Distributed tracing for request flows
- Historical data for trend analysis
class CascadeMonitor {
  constructor() {
    this.metrics = new MetricsCollector();
    this.alerts = new AlertManager();
  }

  async monitorNetwork() {
    // Collect metrics from all SFUs
    const networkMetrics = await this.collectNetworkMetrics();

    // Analyze for anomalies
    const anomalies = this.detectAnomalies(networkMetrics);

    // Generate alerts for critical issues
    if (anomalies.length > 0) {
      await this.alerts.notify(anomalies);
    }

    // Update dashboards
    await this.updateDashboards(networkMetrics);
  }

  collectNetworkMetrics() {
    return {
      interSFULatency: this.measureInterSFULatency(),
      participantDistribution: this.getParticipantDistribution(),
      bandwidthUtilization: this.getBandwidthUtilization(),
      errorRates: this.getErrorRates()
    };
  }
}
Conclusion
SFU cascading transforms the limitations of single-SFU architectures into a scalable, globally distributed solution for modern video applications. By creating a mesh of interconnected SFUs, this architecture enables low-latency communication across geographic boundaries while maintaining the efficiency advantages of selective forwarding.
The implementation challenges—from DTLS stream management to state synchronization—are significant but manageable with proper architectural design and robust engineering practices. Stream's implementation demonstrates that these challenges can be overcome to provide a reliable, developer-friendly video infrastructure.
For developers building global-scale video applications, SFU cascading offers the best balance of performance, scalability, and reliability. While the complexity of implementation is higher than single-SFU deployments, the benefits in terms of user experience and system resilience make it the preferred architecture for enterprise-grade video solutions.
Further Reading
- Stream's Global Edge Network: https://getstream.io/blog/global-edge-network/