Introduction to SFU Architecture
The Selective Forwarding Unit (SFU) is currently the most popular way to connect devices on a conference call. In some ways, it is a middle ground between the MCU and a P2P network. In the SFU architecture, every connected device sends a single set of outgoing media streams to the backend, and the SFU forwards this set to every other device. Each device therefore receives the streams of every other participant through the SFU. The backend leaves these forwarded streams largely untouched, so every connected device is responsible for decoding and processing all incoming media streams and assembling a coherent video call view for the user.
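To make the forwarding model concrete, here is a minimal Python sketch of an SFU's core loop. It is an illustration under simplifying assumptions, not how a production SFU is built: the `MediaPacket` and `MinimalSFU` names are hypothetical, there is no networking, RTP handling, or congestion control, and "delivery" just means appending to an in-memory outbox per participant. What it shows is that the SFU never decodes or mixes anything; each packet from one participant is passed through unchanged to everyone else.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MediaPacket:
    sender_id: str  # participant that published the stream
    kind: str       # "audio" or "video"
    payload: bytes  # encoded media; the SFU never decodes it


class MinimalSFU:
    """Hypothetical SFU: forwards each packet, unmodified, to every other participant."""

    def __init__(self) -> None:
        # participant_id -> packets queued for delivery to that participant
        self.outboxes: dict[str, list[MediaPacket]] = defaultdict(list)
        self.participants: set[str] = set()

    def join(self, participant_id: str) -> None:
        self.participants.add(participant_id)

    def leave(self, participant_id: str) -> None:
        self.participants.discard(participant_id)
        self.outboxes.pop(participant_id, None)

    def on_packet(self, packet: MediaPacket) -> None:
        # No decoding, mixing, or re-encoding: the same packet object is
        # queued for every participant except the one who sent it.
        for pid in self.participants:
            if pid != packet.sender_id:
                self.outboxes[pid].append(packet)
```

For example, after alice, bob, and carol join and alice publishes one video packet, bob's and carol's outboxes each hold that same packet object and alice's outbox stays empty.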
Because multiple streams arrive at each device, each one must be decoded individually. This requires more computing power on the device than the MCU approach, since the server no longer processes the streams. However, it still requires less computing power than P2P, because each connected device only has to encode and upload its set of media once. This method does, however, limit scalability to some extent: the more devices join, the more streams each device has to decode. On the device side, this is computationally far better than a P2P mesh but worse than the MCU architecture. Compared to the MCU, the SFU architecture greatly reduces the central server load, since little processing happens there; the streams are simply forwarded to each endpoint.
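The per-device trade-off can be expressed as simple stream counting for a call with N participants. The sketch below is a simplified model that assumes each participant publishes one set of streams (audio and video counted together) and ignores subscription limits and simulcast; the function and type names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceLoad:
    uploads: int    # copies of its own stream set the device sends
    downloads: int  # incoming streams the device receives and decodes


def per_device_load(architecture: str, participants: int) -> DeviceLoad:
    """Simplified per-device stream counts for one call (illustrative model)."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    others = participants - 1
    if architecture == "p2p":
        # Full mesh: a separate copy is sent to, and received from, each peer.
        return DeviceLoad(uploads=others, downloads=others)
    if architecture == "sfu":
        # One upload to the SFU; the SFU forwards every other participant's stream.
        return DeviceLoad(uploads=1, downloads=others)
    if architecture == "mcu":
        # One upload; the MCU returns a single composited stream.
        return DeviceLoad(uploads=1, downloads=1)
    raise ValueError(f"unknown architecture: {architecture}")
```

With 10 participants, this model gives an SFU client 1 upload and 9 streams to decode, a P2P client 9 uploads and 9 decodes, and an MCU client 1 upload and 1 decode.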
Scalability
The limiting factor for scalability in an SFU-based architecture is often the compute power available on the connected devices. Since work such as decoding and layout happens on the device, compute requirements grow with the number of streams that need to be displayed.
In theory, this would place SFUs somewhere between P2P and MCUs in terms of scalability. In practice, however, most devices do not subscribe to every stream in a video call and display only a limited number of them. Several other optimizations, such as simulcast, also make SFUs more suitable for larger group calls.
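One common way to cap subscriptions is to subscribe only to the most recent active speakers (sometimes called "last N" selection). The text does not prescribe a specific policy, so the following Python sketch is only one possible approach: it assumes a hypothetical `speaking_history` list of participant IDs ordered from oldest to newest speaker, and returns at most `max_streams` distinct participants, excluding the local user.

```python
def select_subscriptions(
    speaking_history: list[str], self_id: str, max_streams: int
) -> list[str]:
    """Pick up to max_streams remote participants, most recent speaker first.

    speaking_history is a hypothetical log of speaker IDs, oldest first.
    The local participant is never selected.
    """
    if max_streams <= 0:
        return []
    selected: list[str] = []
    # Walk from the newest speaker backwards, skipping self and duplicates.
    for pid in reversed(speaking_history):
        if pid == self_id or pid in selected:
            continue
        selected.append(pid)
        if len(selected) == max_streams:
            break
    return selected
```

For instance, with the history `["a", "b", "c", "a", "d", "me", "b"]`, local ID `"me"`, and a cap of 3, the function returns `["b", "d", "a"]`, so the device decodes three streams no matter how many people are on the call.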
Performance
In the SFU-based architecture, the video call load is split between the SFU and the devices connected to the call. The SFU is responsible for forwarding media to all devices, while decoding and rendering that media is left to the individual connected devices.
Connected Devices: Every device connected to the call may have to receive and decode the media streams of every other participant. This task is not trivial and limits how many users can join a call. Client-side optimizations are often needed to help the call scale, such as displaying the media streams of only a limited number of participants. These tasks are computationally heavy but manageable for modern devices; legacy devices may struggle with them.
SFU: Since the SFU’s main job is to forward media, adding more users does not greatly increase the processing load on the backend. However, as on the client side, this is not a trivial task either and requires adequate backend resources to handle many users.
Cost
The costs of the SFU architecture usually lie somewhere between those of the P2P and MCU architectures. This is because an SFU requires backend infrastructure beyond signaling servers. However, this infrastructure does not process or composite the streams, so it needs less compute power than the MCU architecture.
Infrastructure: Since SFUs only forward media streams and do not transcode or process them, server costs are markedly lower than with the MCU architecture. Additional users do not change compute costs much, because forwarding media requires little computing power.
Bandwidth: Since an SFU sends every participant a separate stream from each of the other participants and does not mix or composite any streams, bandwidth costs are generally higher than with the MCU architecture.
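The bandwidth difference comes from how many streams leave the server. Under the simplifying assumption that every forwarded stream and every MCU composite has the same bitrate, and that every participant receives every other participant's stream (no subscription limits or simulcast), server egress grows roughly quadratically for an SFU and linearly for an MCU. The function below is a back-of-the-envelope model with illustrative names, not a capacity planner.

```python
def server_egress_kbps(architecture: str, participants: int, stream_kbps: float) -> float:
    """Approximate total outgoing bitrate from the media server (illustrative model)."""
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    if architecture == "sfu":
        # Each participant receives a forwarded copy of every other participant's stream.
        return participants * (participants - 1) * stream_kbps
    if architecture == "mcu":
        # Each participant receives one composited stream.
        return participants * stream_kbps
    raise ValueError(f"unknown architecture: {architecture}")
```

With 10 participants at an assumed 1,000 kbps per stream, this model gives 90,000 kbps of SFU egress versus 10,000 kbps for the MCU.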
Advantages
Here are the advantages associated with using SFU architecture:
- Better scalability than P2P architecture, due to fewer connections between nodes and the addition of an intermediary SFU that transfers all call streams.
- More cost-efficient than MCU architecture, because compute-heavy tasks such as decoding and compositing are shifted to participant devices.
- Works with asymmetric bandwidth, since each device uploads only a single copy of each outgoing stream, whereas in P2P it sends a copy to every other device.
- Supports sending several resolutions of the same stream (Simulcast, covered later).
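With simulcast, each sender publishes several encodings of the same video at different resolutions and bitrates, and the SFU forwards one of them to each subscriber. A minimal Python sketch of that per-subscriber choice follows. It assumes the SFU already has a bandwidth estimate for each subscriber (how that estimate is produced is outside this sketch), and the `SimulcastLayer` type, layer names, and bitrates are illustrative, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulcastLayer:
    name: str          # e.g. "low", "mid", "high" (illustrative labels)
    bitrate_kbps: int  # approximate bitrate of this encoding


def choose_layer(layers: list[SimulcastLayer], available_kbps: float) -> SimulcastLayer:
    """Pick the highest-bitrate layer that fits the subscriber's estimated bandwidth.

    Falls back to the lowest layer if none fit, so the subscriber still gets video.
    """
    if not layers:
        raise ValueError("no simulcast layers published")
    ordered = sorted(layers, key=lambda layer: layer.bitrate_kbps)
    best = ordered[0]
    for layer in ordered:
        if layer.bitrate_kbps <= available_kbps:
            best = layer
    return best
```

With layers of 150, 500, and 1,500 kbps, a subscriber with an estimated 800 kbps would receive the 500 kbps layer, while one with only 100 kbps would still get the lowest layer rather than nothing. The sender uploads all layers once, and the SFU does the per-subscriber selection without re-encoding.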
Disadvantages
Here are the disadvantages of using SFU architecture:
- Latency can be worse, because all media is routed through a single SFU node that may be far from some participants.
- It cannot scale to extremely high numbers of participants.
- Cost-effectiveness depends on where you run the SFU.
When to use the SFU architecture
In general, you don’t want to run an SFU without cascading. However, there are a few situations where a single SFU may still be appealing:
- You want to run your own SFU and have control over it.
- Users are geographically close to each other.
- The number of users on a single call is relatively small.