Intro to WebRTC Architectures

What Is WebRTC?

WebRTC (Web Real-Time Communication) is a standard for enabling real-time audio, video, and data communication between web browsers, and also between native applications. It was open-sourced by Google, and today its APIs are standardized by the W3C and its protocols by the IETF, with contributions from organisations such as Google, Mozilla, and Opera. WebRTC is best thought of as a collection of standardized APIs and protocols, and it leaves developers free to choose among several implementation architectures.

Before we look at the various WebRTC architectures, it is important to understand the fundamental concepts that power a WebRTC connection. These apply to nearly all architecture types and also come up in other lessons.

NAT (Network Address Translation)

Most devices that connect to the internet through a home or office router have a private IP address, while the router itself has a public IP address. To establish a connection between two devices, each must be able to identify and reach the other. This is where NAT comes in: it is a method of translating private IP addresses into public IP addresses.

Devices behind the router aren’t directly exposed to the internet. Instead, all traffic goes through the router, which communicates with the outside world. When you request resources from a remote server, the router is responsible for “routing” the request from your local machine to that server and routing the server’s response back to your local machine. On the way out, each request’s source address is translated from the device’s private IP address to the router’s public IP address with a unique port, and this mapping is stored in a NAT table. As a result, each device on a local network does not need its own public IP address.

The image above is a simplified example of what a NAT table might look like. Let’s suppose the local device, with a private IP address of 192.168.1.50, requests some data from the public address 82.88.31.26:80.

The local device first sends the request to the router, which forwards it to the remote server. Before forwarding, the router rewrites the request’s source address to its own external IP address and a unique port, which in this example is 86.88.71.25:8830, so the remote server sends its response to that address.

This unique port is important because it allows the router to determine which local device made the request. All of this information is stored in the NAT table, so when the response arrives, the router can perform a lookup and decide which local device to forward it to.
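To make the lookup concrete, here is a minimal Python sketch of the NAT table described above. It is a toy model, not how any real router is implemented: the class and method names (`NatRouter`, `outbound`, `inbound`), the local port 51000, and the sequential port allocation starting at 8830 are all assumptions chosen to match the example addresses.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Endpoint:
    ip: str
    port: int


class NatRouter:
    """Toy NAT: rewrites the source of outgoing requests and remembers each mapping."""

    def __init__(self, public_ip: str, first_port: int = 8830) -> None:
        self.public_ip = public_ip
        self._free_ports = itertools.count(first_port)  # next unused external port
        # NAT table: external port -> (private endpoint, remote endpoint)
        self.table: Dict[int, Tuple[Endpoint, Endpoint]] = {}

    def outbound(self, private: Endpoint, remote: Endpoint) -> Endpoint:
        """Translate an outgoing request; return the source address the remote server sees."""
        port = next(self._free_ports)
        self.table[port] = (private, remote)
        return Endpoint(self.public_ip, port)

    def inbound(self, external_port: int) -> Optional[Endpoint]:
        """Look up which local device a response arriving on external_port belongs to."""
        entry = self.table.get(external_port)
        return entry[0] if entry else None  # None: no mapping, so the packet is dropped


if __name__ == "__main__":
    router = NatRouter("86.88.71.25")
    laptop = Endpoint("192.168.1.50", 51000)  # hypothetical local port
    server = Endpoint("82.88.31.26", 80)
    public_source = router.outbound(laptop, server)
    print(public_source)                       # Endpoint(ip='86.88.71.25', port=8830)
    print(router.inbound(public_source.port))  # Endpoint(ip='192.168.1.50', port=51000)
```

A real NAT also expires idle mappings and may filter inbound packets by sender, which this sketch ignores.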

STUN (Session Traversal Utilities for NAT)

STUN is a protocol that tells a client its public IP address and port as seen from outside its NAT, and helps determine whether the router imposes restrictions that would prevent a direct connection with a peer. A STUN server lets clients discover whether they are behind a NAT, what type of NAT it is, and which external IP address and port the NAT has mapped for them.

The purpose of a STUN request is to determine your public address so that it can be shared with the other peer, who can then use it to connect to you. This exchange of connection information is called signaling, which we discuss later.

STUN servers are lightweight and cheap to maintain, because they only answer short address queries and never carry the media traffic itself. There are public STUN servers that can be queried for free.
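The STUN exchange itself is small enough to sketch directly. The following Python code, written against the message format in RFC 5389, builds a Binding request, sends it over UDP, and decodes the XOR-MAPPED-ADDRESS attribute of the response to recover the public IP address and port. The function names are our own, and the sketch handles only IPv4, sends a single request with no retransmission, and ignores every attribute except XOR-MAPPED-ADDRESS.

```python
import os
import socket
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442  # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (packet, transaction_id) for an attribute-less STUN Binding request."""
    tid = os.urandom(12) if transaction_id is None else transaction_id
    if len(tid) != 12:
        raise ValueError("transaction ID must be 96 bits")
    # Header: message type, message length (0: no attributes), magic cookie, transaction ID.
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid, tid


def parse_xor_mapped_address(packet: bytes, expected_tid: bytes) -> Tuple[str, int]:
    """Extract the public IPv4 address and port from a Binding success response."""
    if len(packet) < 20:
        raise ValueError("packet shorter than a STUN header")
    msg_type, length, cookie = struct.unpack("!HHI", packet[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding success response")
    if packet[8:20] != expected_tid:
        raise ValueError("transaction ID does not match the request")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", packet[offset:offset + 4])
        value = packet[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:  # IPv4
            x_port, x_addr = struct.unpack("!HI", value[2:8])
            port = x_port ^ (MAGIC_COOKIE >> 16)  # port is XORed with the cookie's top 16 bits
            ip = socket.inet_ntoa(struct.pack("!I", x_addr ^ MAGIC_COOKIE))
            return ip, port
        offset += 4 + attr_len + (-attr_len % 4)  # attribute values are padded to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute found")


def stun_query(host: str, port: int, timeout: float = 2.0) -> Tuple[str, int]:
    """Send one Binding request over UDP and return the address the server observed."""
    request, tid = build_binding_request()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (host, port))
        response, _ = sock.recvfrom(2048)
    return parse_xor_mapped_address(response, tid)
```

For instance, `stun_query("stun.l.google.com", 19302)` queries one commonly used public STUN server run by Google; that hostname refers to a third-party service whose availability is not guaranteed. The address it returns is what the client would then share with the other peer through signaling.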

The image below illustrates when STUN works and when a peer-to-peer connection can be established.

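If a peer-to-peer connection cannot be established, for example when a peer is behind a symmetric NAT, then the final connection in step three won't be allowed. A symmetric NAT creates a separate mapping, with a different external port, for each destination, so the address and port the STUN server observed are only valid for traffic exchanged with the STUN server itself, and no other peer can use that connection information.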
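The difference between a symmetric NAT and a more permissive one comes down to how external ports are assigned. This toy Python model is an illustrative assumption rather than a description of any specific router, and it captures only mapping behavior (not inbound filtering): a cone-style NAT reuses one external port for every destination, while a symmetric NAT allocates a new one per destination. All addresses, and the starting port 40000, are hypothetical.

```python
import itertools
from typing import Dict, Tuple

Address = Tuple[str, int]


class ToyNat:
    """Models only how a NAT assigns external ports (its mapping behavior)."""

    def __init__(self, symmetric: bool, first_port: int = 40000) -> None:
        self.symmetric = symmetric
        self._free_ports = itertools.count(first_port)
        self._mappings: Dict[tuple, int] = {}

    def external_port(self, private: Address, destination: Address) -> int:
        # Cone-style NAT: one mapping per private endpoint, reused for every destination.
        # Symmetric NAT: a separate mapping for every (private endpoint, destination) pair.
        key = (private, destination) if self.symmetric else (private,)
        if key not in self._mappings:
            self._mappings[key] = next(self._free_ports)
        return self._mappings[key]


def stun_address_reusable(nat: ToyNat) -> bool:
    """Does a peer reach the same external port that the STUN server observed?"""
    private = ("192.168.1.50", 51000)     # hypothetical local endpoint
    stun_server = ("203.0.113.10", 3478)  # hypothetical STUN server
    peer = ("198.51.100.7", 50000)        # hypothetical remote peer
    seen_by_stun = nat.external_port(private, stun_server)
    used_for_peer = nat.external_port(private, peer)
    return seen_by_stun == used_for_peer


if __name__ == "__main__":
    print(stun_address_reusable(ToyNat(symmetric=False)))  # True
    print(stun_address_reusable(ToyNat(symmetric=True)))   # False
```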

In cases like this, where a direct connection cannot be established, we need to make use of a TURN server.

TURN (Traversal Using Relays around NAT)

TURN is a protocol for relaying network traffic when a direct connection cannot be established between two peers. For example, if one peer is behind a symmetric NAT, a dedicated server is needed to relay traffic between the peers. In that case, you create a connection with a TURN server and tell all peers to send their packets to the server, which then forwards them to you. This adds overhead, and because every relayed packet passes through the server, a TURN server consumes significant bandwidth and can be expensive to run and maintain. The following image illustrates how a TURN server is used to relay messages between two or more peers.
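The relaying role can be illustrated with a heavily simplified model. In TURN, a client asks the server for an allocation, receives a relayed address on the server, and installs permissions for the peers allowed to send to it; the Python sketch below mimics just those three ideas. The class names, the starting relay port of 49152, and the in-memory dictionaries are assumptions, and authentication, allocation lifetimes, and the actual wire protocol are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Address = Tuple[str, int]


@dataclass
class Allocation:
    client: Address   # where the TURN server forwards relayed packets
    relayed: Address  # address on the TURN server that peers send to
    permissions: Set[str] = field(default_factory=set)  # peer IPs allowed to send


class ToyTurnServer:
    """Greatly simplified relay: allocations plus per-peer permissions, no auth or expiry."""

    def __init__(self, public_ip: str, first_relay_port: int = 49152) -> None:
        self.public_ip = public_ip
        self._next_port = first_relay_port
        self._by_relay: Dict[Address, Allocation] = {}

    def allocate(self, client: Address) -> Address:
        """Reserve a relayed address; the client shares it with peers via signaling."""
        relayed = (self.public_ip, self._next_port)
        self._next_port += 1
        self._by_relay[relayed] = Allocation(client, relayed)
        return relayed

    def create_permission(self, relayed: Address, peer_ip: str) -> None:
        """Allow packets from peer_ip to be relayed to the allocation's client."""
        self._by_relay[relayed].permissions.add(peer_ip)

    def relay_to_client(self, relayed: Address, sender: Address,
                        payload: bytes) -> Optional[Tuple[Address, bytes]]:
        """Forward a peer's packet to the client, or return None if it is dropped."""
        allocation = self._by_relay.get(relayed)
        if allocation is None or sender[0] not in allocation.permissions:
            return None
        return allocation.client, payload
```

Every media packet in a relayed call takes this extra hop through the server, which is where TURN's bandwidth cost comes from.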

ICE (Interactive Connectivity Establishment)

ICE uses a combination of the STUN and TURN protocols to provide a mechanism for hosts to discover each other's public IP addresses and establish a direct connection. If a direct connection is impossible, ICE will use TURN to establish a relay connection between the two hosts.
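ICE expresses its preference for direct paths numerically. RFC 8445 recommends computing each candidate's priority from a type preference (126 for host, 110 for peer-reflexive, 100 for server-reflexive, and 0 for relayed candidates), a local preference, and the component ID, and then combining two candidates' priorities into a pair priority. The sketch below applies those recommended formulas in Python; the function names and default arguments (a local preference of 65535 and component 1) are our own assumptions, and real ICE agents also run connectivity checks before selecting a pair.

```python
# Recommended type preferences from RFC 8445.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    if not 0 <= local_pref <= 65535:
        raise ValueError("local preference must fit in 16 bits")
    if not 1 <= component_id <= 256:
        raise ValueError("component ID must be between 1 and 256")
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def pair_priority(controlling: int, controlled: int) -> int:
    """Pair priority: 2^32 * MIN(G, D) + 2 * MAX(G, D) + (1 if G > D else 0)."""
    g, d = controlling, controlled
    return (1 << 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


if __name__ == "__main__":
    for t in ("host", "srflx", "relay"):
        print(t, candidate_priority(t))
    # host 2130706431
    # srflx 1694498815
    # relay 16777215
```

Because relayed candidates get a type preference of 0, a TURN path generally ends up being used only when no higher-priority pair passes its connectivity checks.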

Each possible address through which a peer might be reached (a local host address, a public address discovered via STUN, or a relay address allocated on a TURN server) is called an ICE candidate. The collected candidates are sent to the remote peer via SDP, which we will explore next. WebRTC uses this information on each client to determine the best way to connect to the other peer. Both peers may be behind the same NAT, in which case a local connection can be established, or both may be behind symmetric NATs, in which case a relay through a TURN server is required.
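Candidates are exchanged as text lines, such as the `a=candidate` attributes of SDP. The Python sketch below parses one line into its fields following the candidate grammar in RFC 8839 (foundation, component, transport, priority, address, port, type, and an optional related address and port). The `IceCandidate` class and `parse_candidate` function are hypothetical helpers, the sample lines reuse addresses from the NAT example, and extension attributes such as `generation` are simply skipped.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IceCandidate:
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    cand_type: str                        # host, srflx, prflx, or relay
    related_address: Optional[str] = None
    related_port: Optional[int] = None


def parse_candidate(line: str) -> IceCandidate:
    """Parse 'candidate:...' or 'a=candidate:...' into an IceCandidate."""
    line = line.strip()
    if line.startswith("a="):
        line = line[2:]
    if not line.startswith("candidate:"):
        raise ValueError("not an ICE candidate line")
    fields = line[len("candidate:"):].split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("malformed candidate line")
    candidate = IceCandidate(
        foundation=fields[0],
        component=int(fields[1]),
        transport=fields[2].lower(),
        priority=int(fields[3]),
        address=fields[4],
        port=int(fields[5]),
        cand_type=fields[7],
    )
    # The rest are name/value pairs, e.g. "raddr 192.168.1.50 rport 51000 generation 0".
    extras = fields[8:]
    for name, value in zip(extras[0::2], extras[1::2]):
        if name == "raddr":
            candidate.related_address = value
        elif name == "rport":
            candidate.related_port = int(value)
    return candidate


if __name__ == "__main__":
    lines: List[str] = [
        "candidate:1 1 udp 2130706431 192.168.1.50 51000 typ host",
        "candidate:2 1 udp 1694498815 86.88.71.25 8830 typ srflx raddr 192.168.1.50 rport 51000",
    ]
    candidates = [parse_candidate(text) for text in lines]
    best = max(candidates, key=lambda c: c.priority)
    print(best.cand_type, best.address, best.port)  # host 192.168.1.50 51000
```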

Signaling and SDP (Session Description Protocol)

Signaling is not part of the WebRTC specification, but it is still an essential step in establishing a connection. For two peers to connect using WebRTC, they need each other's SDP data, so it is up to you, as the developer, to provide a way for the two devices to share this information. A popular option is WebSockets, but any channel will do: the signaling information could even be sent back and forth by email, or delivered on foot and entered manually.
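Because the transport is up to you, the signaling server can be very simple: it only has to pass messages between peers without understanding them. The Python sketch below replaces a WebSocket server with in-memory queues so that it runs on its own. The `{"type": ..., "sdp": ...}` message shape mirrors the fields of a browser session description, but the `from` field, the `candidate` message format, and the `SignalingServer` class itself are assumptions made for illustration.

```python
import json
from collections import defaultdict, deque
from typing import Deque, Dict, Optional


class SignalingServer:
    """In-memory stand-in for a WebSocket signaling server that relays JSON between peers."""

    def __init__(self) -> None:
        self._inboxes: Dict[str, Deque[str]] = defaultdict(deque)

    def send(self, sender: str, recipient: str, message: dict) -> None:
        # The server never interprets SDP or candidates; it just serializes and queues them.
        self._inboxes[recipient].append(json.dumps({"from": sender, **message}))

    def receive(self, peer: str) -> Optional[dict]:
        """Pop the next message waiting for this peer, or None if there is none."""
        inbox = self._inboxes[peer]
        return json.loads(inbox.popleft()) if inbox else None


if __name__ == "__main__":
    server = SignalingServer()
    # 1. Alice sends her offer (placeholder SDP text) to Bob.
    server.send("alice", "bob", {"type": "offer", "sdp": "<alice's offer SDP>"})
    offer = server.receive("bob")
    # 2. Bob answers through the same channel.
    server.send("bob", "alice", {"type": "answer", "sdp": "<bob's answer SDP>"})
    answer = server.receive("alice")
    # 3. ICE candidates can be relayed the same way as they are gathered.
    server.send("alice", "bob", {"type": "candidate", "candidate": "<candidate line>"})
    print(offer["type"], answer["type"], server.receive("bob")["type"])  # offer answer candidate
```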

SDP is a data format for describing multimedia sessions for the purposes of session announcement, session invitation, and other forms of session initiation. In WebRTC, it describes the multimedia content of the connection, such as resolution, formats, codecs, and encryption parameters.
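To see what such a description looks like, here is a small Python sketch that extracts each media section (`m=` line) and the codecs named by its `a=rtpmap` attributes from an SDP string. The sample SDP is a trimmed, hypothetical offer written for this example, and the parser ignores every other attribute, including the `a=fingerprint` line that real WebRTC offers use to describe DTLS encryption.

```python
from typing import Dict, List

# Trimmed, illustrative offer; a real browser-generated SDP contains many more lines.
SAMPLE_SDP = (
    "v=0\r\n"
    "o=- 0 2 IN IP4 127.0.0.1\r\n"
    "s=-\r\n"
    "t=0 0\r\n"
    "m=audio 9 UDP/TLS/RTP/SAVPF 111 0\r\n"
    "a=rtpmap:111 opus/48000/2\r\n"
    "a=rtpmap:0 PCMU/8000\r\n"
    "m=video 9 UDP/TLS/RTP/SAVPF 96\r\n"
    "a=rtpmap:96 VP8/90000\r\n"
)


def parse_media_sections(sdp: str) -> List[Dict]:
    """Collect each m= section with its payload types and the codecs named by a=rtpmap."""
    sections: List[Dict] = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            kind, port, proto, *payload_types = line[2:].split()
            sections.append({"kind": kind, "port": int(port), "protocol": proto,
                             "payload_types": payload_types, "codecs": {}})
        elif line.startswith("a=rtpmap:") and sections:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            sections[-1]["codecs"][payload_type] = encoding
    return sections


if __name__ == "__main__":
    for section in parse_media_sections(SAMPLE_SDP):
        print(section["kind"], section["codecs"])
    # audio {'111': 'opus/48000/2', '0': 'PCMU/8000'}
    # video {'96': 'VP8/90000'}
```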