HTTP, WebSocket, gRPC or WebRTC: Which Communication Protocol is Best For Your App?


When choosing a communication protocol for your applications, there are plenty of different options. In this post, we’ll take a look at four popular solutions: HTTP, WebSocket, gRPC, and WebRTC. We will explore each protocol by investigating the technology behind it, what it’s best used for, and its strengths and weaknesses.

Gordon H.
Published September 26, 2022
HTTP vs WebSockets vs gRPC vs WebRTC - Communication protocols

Our forms of communication are constantly improving: getting faster, more convenient, and more reliable. Our communication has evolved from sending messages using carrier pigeons, to postal mail, to the invention of the landline, to email and text messages from a small device that fits in our pockets.

In the future, we may even transition meetings and birthday parties to VR (hopefully, this is just a joke!). But the best form of communication will always be dependent on the situation.

A quick text message is sometimes better than a long email. Other times, a video call with the team is the best way to exchange information. In contrast, important insurance documents must be sent over regular mail and delivered in hard copy.

The same is also true for the web technologies and protocols we use. Different applications have different communication needs.

Overview

In this article, we’ll cover some popular communication protocols we can use as developers and explore the pros and cons of the different options. No solution is better than another – there are only some that are better for a particular application or problem.

Some applications require a peer-to-peer connection, with low latency and high data transfer, and can accept some packet (information) loss. Other applications can poll the server on an as-needed basis and don’t need to receive communication from a different peer. Other applications require real-time communication with data reliability.

An online multiplayer game, messaging app, blog website, media gallery app, and video conferencing software all have different levels of communication and data needs.

If you're building a video streaming solution, there are other considerations as well; see our article on video streaming protocols for more information on choosing the correct one.

What is a Communication Protocol?

In computer networking, a protocol is a set of rules that govern how data is exchanged between devices. The protocol defines the rules, syntax, semantics, and synchronization of communication and possible error recovery methods.

The protocols discussed in this article define how application-layer software will interact with each other. Different protocols adhere to different rules, and it’s essential to understand the strengths and limitations of each. In this article, you’ll learn about the following protocols:

Communication Protocols - Timeline

HTTP (Hypertext Transfer Protocol) is an application protocol for distributed, collaborative, and hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web. Hypertext is structured text that uses logical links (hyperlinks) between nodes containing text. HTTP is the protocol to exchange or transfer hypertext.

HTTP/2 was designed to address the shortcomings of the original HTTP protocol and to improve performance. HTTP/2 is faster and more efficient than HTTP/1.1, and it supports multiplexing, allowing multiple requests and responses to share a single connection. Other notable features include header compression and server push. It's gradually becoming the default protocol for web traffic.

WebSocket is a protocol allowing two-way communication between a client and a server. It's a popular choice for applications that handle real-time data, such as chat applications, online gaming, and live data streaming.

gRPC is a modern open-source RPC framework that uses HTTP/2 for transport. It's a great choice for applications that need to make a lot of small, fast API calls. gRPC generates cross-platform client and server bindings for many languages, making it possible for a client application to directly call a method on a server application on a different machine as if it were a local object.

WebRTC is a technology that allows for real-time communication between clients and makes it possible to establish direct peer-to-peer connections. It's used for video, chat, file sharing, and live video streaming applications.

Understanding TCP and UDP

Before delving into the application layers mentioned above, it’s important to have a basic understanding of TCP and UDP, two underlying transport layers that facilitate data transfer in fundamentally different ways.

TCP (Transmission Control Protocol) is a standard that defines how to establish and maintain a network conversation via the Internet. TCP is the most commonly used protocol on the Internet and any connection-oriented network. When you browse the web, your computer sends TCP packets to a web server. A web server responds by sending TCP packets back to your computer. A connection is first established between two devices before any data is exchanged, and TCP uses error correction to ensure that all packets are delivered successfully. If a packet is lost or corrupted, TCP will try to resend it.

UDP (User Datagram Protocol) is a connectionless, unreliable transport layer protocol. It does not require a connection to be established or maintained and does not guarantee that messages will be delivered in order, meaning that data can be lost if a packet is dropped or corrupted. UDP is often used for streaming media or real-time applications where dropped packets are less problematic than the overhead of ensuring delivery.
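
To make the contrast concrete, here is a minimal Node.js sketch (our own illustration; the port and host are hypothetical) that sends a single UDP datagram. Notice that there is no connection setup and no delivery confirmation:

typescript
import dgram from "node:dgram";

// Fire-and-forget: no handshake, no ordering, no delivery guarantee.
const socket = dgram.createSocket("udp4");
const message = Buffer.from("player position: x=10, y=42");

socket.send(message, 9999, "game.example.com", (err) => {
  // An error here only means the datagram couldn't be handed to the OS;
  // there is no acknowledgment that the receiver ever got it.
  if (err) console.error(err);
  socket.close();
});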

HTTP/1

It’s important to have a basic understanding of the foundation of all internet-based communication and data transfer at the application layer - HTTP (Hypertext Transfer Protocol).

Understanding HTTP/1 and its limitations will also be important before we can explore the other protocols in more detail and fully appreciate what they provide.

With HTTP, clients and servers communicate by exchanging individual messages. Messages that the client sends are called requests, and the messages sent by the server are called responses. These messages are sent as regular text messages over a TCP connection. They can also be encrypted using TLS and sent using the HTTPS protocol.

A client will typically be a web browser or application running on a user’s phone or computer, but it could technically be anything, for example, a script that crawls websites.

HTTP requests can only flow in one direction, from the client to the server. There is no way for the server to initiate communication with the client; it can only respond to requests.

HTTP/1.1 request/response example

HTTP is perfect for traditional web and client applications, where information is fetched on an as-needed basis. For example, you refresh a page, and a request is made to the server to fetch the latest information.
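
For example, fetching data from a hypothetical endpoint in the browser is a single request/response exchange:

typescript
// One request, one response: the client always initiates.
const response = await fetch("https://example.com/api/articles/42");
const article = await response.json();
console.log(article.title);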

However, the protocol has been stretched beyond what it was originally intended for. In the following sections, we’ll explore some of HTTP/1’s limitations.

HTTP/1 Real-Time

HTTP/1 is inefficient when messages need to be sent in real-time from the client to the server and vice-versa. For example, if new information is available on the server that needs to be shared with the client, this transaction can only occur once the client initiates a request.

There are workarounds for this, using techniques called HTTP short and long polling, as well as Server-Sent Events.

Short Polling

HTTP short polling is a technique where the client repeatedly sends requests to the server until the server responds with new data. Once the client receives data, it starts the process again, polling repeatedly until something new is available.

HTTP Short Polling

This is an inefficient real-time communication strategy, as it wastes a lot of resources by continuously transmitting and parsing HTTP requests/responses.
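
A minimal sketch of short polling in the browser could look like this (the endpoint and render function are assumptions for illustration):

typescript
declare function render(updates: unknown[]): void; // assumed app function

// Ask the server on a fixed interval, whether or not anything changed.
// Most responses will be empty - wasted requests, parsing, and bandwidth.
setInterval(async () => {
  const res = await fetch("/api/updates"); // hypothetical endpoint
  const updates: unknown[] = await res.json();
  if (updates.length > 0) render(updates);
}, 3000); // poll every 3 seconds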

Long Polling

With HTTP long polling, a single request is made from the client, and then the server keeps that connection open until new data is available and a response can be sent. After the client receives the response, a new connection is immediately made again.

HTTP Long Polling

Long polling is more efficient than short polling but is not an optimal solution for real-time communication.
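
A long polling loop could be sketched like this (again, the endpoint and render function are assumptions):

typescript
declare function render(data: unknown): void; // assumed app function

// The server holds each request open until it has new data (or times out).
// As soon as a response arrives, we immediately reconnect.
async function longPoll(): Promise<void> {
  while (true) {
    try {
      const res = await fetch("/api/poll"); // hypothetical endpoint
      render(await res.json());
    } catch {
      // Timeout or network error: back off briefly, then retry.
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }
}
longPoll();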

Server-Sent Events (SSE)

Server-Sent Events allow a client to hold a connection open and receive updates (push messages) from a server in real-time, without having to constantly poll the server for new data. This is a one-way connection, so you can't send events from the client to the server.

SSE is a standard describing how servers can initiate data transmission toward clients once an initial client connection has been established.
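
In the browser, SSE is exposed through the built-in EventSource API. A minimal sketch (with a hypothetical endpoint):

typescript
// EventSource keeps the connection open, parses the text/event-stream
// format, and reconnects automatically if the connection drops.
const source = new EventSource("/api/stream"); // hypothetical endpoint

source.onmessage = (event: MessageEvent) => {
  console.log("server says:", event.data);
};

source.onerror = () => {
  // The browser retries on its own; no manual reconnection logic needed.
  console.log("connection lost, retrying...");
};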

The Performance Problem with HTTP/1

Most HTTP data flows consist of small, bursty data transfers, whereas TCP is optimized for long-lived connections and bulk data transfers. Network round trip time is the limiting factor in TCP throughput and performance in most cases. Consequently, latency is the performance bottleneck for HTTP and most web applications delivered over it.

What the above means is that TCP, which HTTP uses, was built to handle long-lived connections and to transfer a lot of data. HTTP/1, on the other hand, would open a bunch of short-lived TCP connections and usually only send small pieces of data.

Head-of-line Blocking

A performance issue with HTTP/1.0 is that you have to open a new TCP connection for each request/response. This was not a problem for what HTTP was initially invented for - to fetch a hypertext document. The document part is important because HTTP was not meant for “hypermedia”.

Opening a new TCP connection for each request became a problem as the web evolved. We started building full applications instead of simple web pages, and the number of files and media assets a browser needed to retrieve grew. Imagine a typical web application that requires an HTML, CSS, and JavaScript file, as well as various images and other assets. For each file, a new connection had to be made.

Along came HTTP/1.1 with persistent connections, which allow more than one request/response exchange on the same TCP connection.

This solution is definitely an improvement, but it does not allow the server to respond with multiple responses simultaneously. It’s a serialized protocol where you must send a request and wait for the response, then send the second request and so forth. This is known as head-of-line blocking.

It is, however, possible to achieve some parallelism, as a browser can open up to six TCP connections to the same origin - where “origin” is defined as the unique combination of host and port number.

For example, if you have a photo gallery app that needs to load 12 images, then six requests will be made to load the first six images, and each request will open up a new TCP connection under the hood. The other six images will be blocked until a response is received and one of the open connections can be used to load the next image. The original six open TCP connections to the same origin will be reused as they become available, but you are limited to six active connections.

Naturally, programmers found an easy workaround - by changing the origin. Instead of hosting all the assets on the same origin, you host six of the images on one origin and the rest on another. Now you can have 12 simultaneous requests (or open TCP connections). This is referred to as “sharding”.

  • Images 1-6 are hosted on 1.images.com
  • Images 7-12 are hosted on 2.images.com

There is a limitation on how many times you can do this though, and it’s hard to determine an optimal number of shards. At some point, adding more shards will increase the complexity, add overhead, and could result in links becoming congested and packets being lost.

There are also other concerns, as each TCP connection adds unnecessary overhead to the server. Connections compete with each other, each TCP and TLS handshake adds unnecessary cost, and other server/proxy resources must be used to maintain the active connections. There is a clear limitation in the way that HTTP/1 makes use of the underlying TCP connections.

Headers and Cookies Bloat

Another problem is that as the HTTP spec evolved, more headers have been added to the specification. Developers also have the option to add cookies to the headers, and these can be arbitrarily large. This adds a lot of bloat, as each request and response needs to transmit all of this text information, and HTTP/1.1 does not include a mechanism for compressing headers and metadata.

If you need a high-performance RPC protocol, this overhead adds up quickly, and HTTP is no longer an optimal solution.

Prioritization

With HTTP/1.1, browsers “prioritize” resources by holding a priority queue on the client and taking educated guesses for how to make the best use of available TCP connections. Browsers have embedded heuristics for determining what resources are more valuable than others.

For example, loading CSS will take a higher priority than loading images.

The problem is that there is no way for you, as the developer, to prioritize one request over another or to change the priority of an ongoing message. What content is loaded first is up to the browser, and you have no say in the prioritization scheme.

HTTP/2

HTTP/2 is an improved version of the HTTP protocol. It addresses the performance issues outlined above with HTTP/1 and adds other enhancements without changing any of the semantics (verbs, headers, etc.).

The most significant change in HTTP/2 is the use of multiplexing to simultaneously send and receive multiple HTTP requests and responses over a single TCP connection. All HTTP/2 connections are persistent, and only one connection per origin is required. This allows for much more efficient use of network resources and can significantly improve the performance of applications.

Some other benefits of HTTP/2:

  • Uses header compression to reduce the size of headers, which avoids sending the same plain text headers over and over. This significantly reduces the overhead of requests/responses and the amount of data sent.
  • Enables prioritization, allowing the client (developer) to specify the priority of the resources it needs. It’s also possible to update the priority of ongoing requests - for example, on scroll, if an image is no longer visible, the priority can change.
  • Uses server push to send data to the client before it requests it. This can be used to improve loading times by eliminating the need for the client to make multiple requests.
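
To illustrate server push, here is a minimal sketch of a Node.js HTTP/2 server using the node:http2 module. It uses an unencrypted connection for brevity, while browsers require TLS, so treat this as a demonstration of the API rather than a production setup:

typescript
import http2 from "node:http2";

// h2c (cleartext) for brevity; browsers need http2.createSecureServer + TLS.
const server = http2.createServer();

server.on("stream", (stream, headers) => {
  if (headers[":path"] === "/") {
    // Proactively push /app.js before the client asks for it.
    stream.pushStream({ ":path": "/app.js" }, (err, pushStream) => {
      if (err) return; // the client may have disabled push
      pushStream.respond({ ":status": 200, "content-type": "application/javascript" });
      pushStream.end("console.log('pushed!');");
    });
    stream.respond({ ":status": 200, "content-type": "text/html" });
    stream.end("<script src='/app.js'></script>");
  }
});

server.listen(8080);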

How Does HTTP/2 Work?

The basic protocol unit in HTTP/2 is a frame. This new binary framing mechanism changes how the data is exchanged between the client and server.

The standard defines ten different frame types, each serving a different purpose. For example, HEADERS and DATA frames form the basis of HTTP requests and responses:

HTTP/1.1 vs HTTP/2 request

A frame is the smallest unit of communication that carries a specific type of data.

Some other frame examples are:

  • SETTINGS: exchange setting information in the beginning or during a connection.
  • PRIORITY: reassign priority for messages.
  • PUSH_PROMISE: allows the server to push data to you - this acts as a promise of what the server will send. For example, if you request the index.html, the server can create a PUSH_PROMISE which promises to push app.js and styles.css, meaning that the client does not need to request those resources.

Frames are combined to form a message, for example, the header and data frame in the image above. This equates to a normal request or response.

Then finally, a series of messages can be part of a stream. This allows for a bidirectional data flow between the client and server and full request and response multiplexing.

HTTP/2 Multiple Streams

The image above is a bit misleading, giving the impression that multiple connections are open between the client and server. But it's a single TCP connection, and data flows freely between the client and server in a non-blocking way.

HTTP/2 Multiplexing Stream

The new binary framing layer allows the client and server to break down the HTTP message into independent frames, interleave them, and then reassemble them on the other end.

This is only a summary of how HTTP/2 works. If you want to learn more and explore prioritization, server push, and header compression, see this in-depth article. For a history of the problems of HTTP/1 and how HTTP/2 solves them, watch this video.

HTTP/2 Bidirectional Data Streaming

From the HTTP/2 spec:

A "stream" is an independent, bidirectional sequence of frames exchanged between the client and server within an HTTP/2 connection. Streams have several important characteristics:

  • A single HTTP/2 connection can contain multiple concurrently open streams, with either endpoint interleaving frames from multiple streams.
  • Streams can be established and used unilaterally or shared by either the client or server.
  • Streams can be closed by either endpoint.

There is a lot of misunderstanding about the Server Push functionality, which allows a server over HTTP/2 to proactively send resources it thinks you may need - for example, the .js and .css files - without the client requesting them. This has nothing to do with bidirectional streaming and is only a web optimization technique for cacheable resources.

What is true is that with HTTP/2 the server can’t initiate a stream. But once the client opens a stream by sending a request, both sides can send DATA frames over a persistent socket at any time. An excellent example of this is gRPC, which we will discuss later.

With HTTP/2, it is possible to achieve bidirectional data streaming, and you could argue that it’s a more optimal solution than something like WebSockets, or you could argue that it is not. We’ll discuss this in more detail in the WebSocket section.

WebSockets

From the WebSocket Protocol Specification:

The goal of this technology is to provide a mechanism for browser-based applications that need two-way communication with servers that do not rely on opening multiple HTTP connections (e.g., using XMLHttpRequest or iframes and long polling).

WebSockets were invented to enable full-duplex communication between a client and server, which allows for data to travel both ways through a single open connection immediately.

Once a WebSocket connection is established, the client does not need to poll the server for updates. Instead, communication happens bi-directionally. This improves speed and real-time capability compared to HTTP/1's long and short polling. WebSocket does not impose a message format: you can send any data, text or bytes - this flexibility is one of the reasons why WebSockets are popular.

Some of this may sound familiar to what we discussed in the HTTP/2 section, but it’s important to note that WebSockets were invented long before HTTP/2. We’ll compare them more later.

How WebSockets Work

WebSockets effectively run as a transport layer over TCP.

To establish a WebSocket connection, the client and server first need to perform a handshake over a normal HTTP/1.1 connection. This handshake is the bridge from HTTP to WebSockets.

WebSocket Upgrade Request

Below is an example client handshake request. The client can use an HTTP/1.1 mechanism called an upgrade header to switch their connection from HTTP over to WebSockets:

GET /chat HTTP/1.1
Host: example.com:8000
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

The server will then conclude the handshake with a special response that indicates the protocol will be changing from HTTP to WebSocket:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

WebSockets require a uniform resource identifier (URI) with a ws:// or wss:// scheme. The ws:// scheme is used for unencrypted connections, and the wss:// scheme is used for encrypted connections, similar to how HTTP URLs use the http:// or https:// schemes.

Once the two-way communication channel is established, the client and server can send messages back and forth. These messages can be anything from binary data to text. The WebSocket connection will remain open until either the client or the server disconnects.
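
On the client side, the browser's built-in WebSocket API performs the upgrade handshake for you. A minimal sketch against a hypothetical chat server:

typescript
const socket = new WebSocket("wss://example.com/chat"); // hypothetical server

socket.addEventListener("open", () => {
  // Send anything: text or binary, with no imposed message format.
  socket.send(JSON.stringify({ type: "join", room: "protocols" }));
});

socket.addEventListener("message", (event: MessageEvent) => {
  console.log("received:", event.data);
});

socket.addEventListener("close", () => {
  // There is no built-in reconnection; handling this is up to you.
  console.log("disconnected");
});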

WebSocket Multiplexing

At the time of writing, the WebSocket protocol does not support built-in multiplexing. We discussed multiplexing in the HTTP/2 section, and we learned that it’s a built-in feature for HTTP/2, and it’s possible to have multiple streams multiplexed over the same connection. Each stream gets a unique identifier, and all the frames sent have an ID associated with the corresponding stream.

Not supporting multiplexing means that the WebSocket protocol requires a new transport connection for every WebSocket connection. For example, multiple clients running in different tabs of the same browser will result in separate connections. Multiplexing over WebSockets is typically something you will need to implement yourself as the developer, or rely on a third-party plugin or library for.

WebSocket vs HTTP/2

So, has HTTP/2 replaced WebSockets? The short answer is no. The longer answer is that HTTP/2 makes bidirectional streaming possible, and as a result, WebSockets are not the only/best option. HTTP/2 as a spec does more work for you compared to WebSockets. It has built-in multiplexing and, in most cases, will result in fewer open TCP connections to the origin. On the other hand, WebSockets provide a lot of freedom and are not restrictive in how data is sent between the client and server once a connection is established. However, you will need to manage reconnection yourself (or rely on a library that does this for you).

Which one is ultimately better, and where one will work and the other won't, is up for debate, and this article does not provide an opinionated answer. WebSockets offer a lot of flexibility, and as an established standard, it is fully supported by all modern browsers, and the ecosystem around client and server libraries is robust.

For more detailed and opinionated discussions, see the various Stack Overflow questions comparing the two.

There is also an RFC (RFC 8441) that defines a mechanism for running the WebSocket Protocol over a single stream of an HTTP/2 connection.

Being able to bootstrap WebSockets from HTTP/2 allows one TCP connection to be shared by both protocols and extends HTTP/2's more efficient use of the network to WebSockets.

This has been implemented in Chrome and Firefox. You can read the Chrome design document and motivation here.

When Should You Use WebSockets?

WebSockets are best suited for applications that need two-way communication in real-time and when small pieces of data need to be transmitted quickly, for example:

  • Chat applications
  • Multiplayer games
  • Collaborative editing applications
  • Live sports tickers
  • Stock trading applications
  • Real-time activity feeds

Coincidentally, this is an area where our team has a lot of experience. We’re extensively using WebSockets to power our chat and activity feeds infrastructure.

Support for WebSockets is good: they have been available in all major browsers and clients for a long time, and they are well documented and easy to use. However, WebSockets should not be overused; depending on what you want, there may be better alternatives.

For example, Server-Sent Events (SSE) are efficient over HTTP/2 and simple to use. SSE is not a bi-directional communication system; the server unilaterally pushes data to the client. But if all you need is a way for the server to send data to a client, this is potentially a better option than adding the overhead of WebSockets. SSE also falls back to HTTP/1.1 when HTTP/2 is not available. Additionally, the client (or browser) manages the connection for you and supports automatic reconnection.

If a connection over WebSockets is lost, there are no built-in mechanisms for load balancing or reconnecting. These have to be implemented manually or with third-party libraries.

gRPC

gRPC is a modern open source high performance Remote Procedure Call (RPC) framework that can run in any environment. It can efficiently connect services in and across data centers with pluggable support for load balancing, tracing, health checking and authentication. It is also applicable in last mile of distributed computing to connect devices, mobile applications and browsers to backend services.

gRPC is an open-source, contract-based RPC system initially developed at Google. gRPC enables applications to communicate transparently and simplifies the building of connected systems.

It generates cross-platform client and server bindings for many languages, making it possible for a client application to directly call a method on a server application on a different machine as if it were a local object.

Built on HTTP/2, gRPC leverages features such as bidirectional streaming and built-in Transport Layer Security (TLS).

gRPC Motivation

It’s important to dive into the motivation behind gRPC and why it was invented to understand its benefits. Why not use the existing technology we already have: HTTP/2 and WebSockets, for example? Why do we need another layer of abstraction on top of what we already have?

There are various ways that data can be structured and sent over the internet. Some popular examples are SOAP, REST and GraphQL. You can even create your own protocol, send data over raw TCP, and handle the implementation yourself if you want.

But no matter what you choose as your communication protocol, the problem is that you need to ensure that the client and server agree on the protocol. For example, if you’re building a REST API, the client needs an HTTP library to send and receive REST data. The HTTP library is built into the browser by default, and the browser handles everything for you:

  • It establishes communication with the server.
  • It handles HTTP/2 and fallback to HTTP/1. And will need to support HTTP/3 in the future.
  • It handles TLS and negotiates the protocol.
  • It handles headers, streams, and everything else.

But what if you’re not on a browser? What if you’re a Python application running on some server, a GoLang CLI, or a Flutter application running on iOS? All of these clients need their own HTTP library that understands the protocol you are communicating with.

Luckily, many dedicated people are working on various HTTP libraries for all these languages and frameworks. Some languages even have multiple HTTP libraries with different features. All of this, however, comes at a cost - and that is maintenance.

This cost becomes apparent when, for example, you want to upgrade your server to HTTP/2 because the GoLang library you use supports it, but the equivalent HTTP library on your front-end Python client has not implemented HTTP/2 or is no longer maintained. Different HTTP libraries in different languages cannot be compared 1:1.

As the HTTP spec evolves, these libraries must keep up with advancements, security issues, new features, and other patches. HTTP is just an example, and the same is true for the WebSocket protocol or any other. Something may be well implemented in the major browsers, but that functionality must be ported to multiple different languages and frameworks.

How is gRPC Different?

gRPC attempts to solve this problem by maintaining the libraries for the popular languages itself, meaning that new features are supported by all of these languages at once.

Under the hood, gRPC uses HTTP/2 as its transport protocol; however, this implementation is hidden from you. In the future, the maintainers of gRPC could easily replace HTTP/2 with HTTP/3, and you would immediately benefit from that change.

gRPC also uses protocol buffers as the Interface Definition Language (IDL) and its underlying message interchange format. This format is language neutral and makes it possible to easily communicate between different programming languages. We’ll explore this concept more in the next section.

gRPC server client communication

What are Protocol Buffers?

Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once. Then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages.

Traditionally with an API, you don’t have a contract defined by the protocol itself. For example, if you’re using REST, you’re typically just sending JSON messages with key/value pairs that aren’t checked until the message is at the receiving end. This message can typically be anything, and it’s up to you to ensure the correct structure is defined.

Take a look at the following JSON payload:

json
{
  "id": 123,
  "name": "Gordon",
  "email": "gordon@somewhere.io"
}

Once this data is received on the client/server it can be deserialized into an object, for example:

dart
class Person {
    int id;
    String name;
    String email;
}

However, it is up to you as the developer to implement the correct serialization and deserialization logic for the above-mentioned payload - this may involve writing toJson and fromJson methods manually, relying on code generation, or it could be a built-in feature of the language you’re using.

Regardless of how you serialize this data, the underlying code will need to be manually updated, potentially in multiple environments, in the event the schema changes.

With protocol buffers, you create a schema that defines the object type for fields and specify which are required and which are optional:

proto
// The request message containing the person’s information
message Person {
  optional int32 id = 1;
  required string name = 2;
  optional string email = 3;
}

// The response message containing the greetings
message HelloReply {
  string message = 1;
}

You can then specify the procedures that you want to expose.

proto
// The greeting service definition.
service Greeter {
  // Sends a greeting
  rpc SayHello (Person) returns (HelloReply) {}
}

Once you’ve specified the data structures and schema, you use the protocol buffer compiler protoc to generate data access classes in your preferred language(s) from your proto definition.

These will be interfaces that describe the objects outlined in the proto file, with accessors for each field, as well as methods to serialize/parse the whole structure to/from raw bytes.
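
As a sketch of what the client side can look like in Node.js, the snippet below loads the schema above at runtime using the @grpc/proto-loader and @grpc/grpc-js packages (a lightweight alternative to protoc code generation; the server address is hypothetical):

typescript
import * as grpc from "@grpc/grpc-js";
import * as protoLoader from "@grpc/proto-loader";

// Load the .proto schema at runtime; protoc codegen is the typed alternative.
const definition = protoLoader.loadSync("greeter.proto");
const proto = grpc.loadPackageDefinition(definition) as any;

const client = new proto.Greeter(
  "localhost:50051", // hypothetical server address
  grpc.credentials.createInsecure() // use TLS credentials in production
);

// Calling a remote procedure looks like calling a local method.
client.sayHello({ id: 123, name: "Gordon" }, (err: Error | null, reply: any) => {
  if (err) throw err;
  console.log(reply.message);
});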

gRPC Modes

There are four modes of transportation over gRPC. These four modes replicate the behavior we discussed previously, for example, a normal request/response, SSE, and WebSockets.

Unary RPC

Unary RPC is a simple request and response, similar to calling a function. The client asks for some data, and the server does some processing and returns that data.

Server Streaming RPC

In server streaming RPCs, the client sends a single request to the server and gets back a stream of responses. The client reads from the returned stream until there are no more messages.

An example would be video streaming, where you request to load a video, and the server responds with the video stream.

Client Streaming RPC

In client streaming RPCs, the client writes a sequence of messages and sends them to the server, again using a provided stream. Once the client has finished writing the messages, it waits for the server to read them and return its response.

An example would be uploading a large file to the server: once all the data is sent, the client sends a final message indicating the upload is complete, and the server can optionally respond.

Bidirectional Stream RPC

A combination of both client and server streaming: both sides send a sequence of messages using a read-write stream. The call is initiated by the client invoking the method, but after that the two streams operate independently, so the client and server can read and write messages in whatever order they like.

A chat application or a multiplayer video game is an example where data needs to flow freely between the client and server.
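
Sketching this with @grpc/grpc-js, and assuming the service defined a hypothetical rpc Chat (stream ChatMessage) returns (stream ChatMessage) method, a bidirectional call could look like this:

typescript
// `client` is the loaded gRPC client from the earlier sketch.
const call = client.chat(); // opens a read-write stream

// Read whenever the server writes...
call.on("data", (message: any) => console.log("server:", message.text));
call.on("end", () => console.log("server closed its stream"));

// ...and write whenever we like; the two directions are independent.
call.write({ text: "hello" });
call.write({ text: "anyone there?" });
call.end(); // done writing - we can still keep receiving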

Microservices

A good example of where gRPC is powerful is within microservices.

gRPC Microservices interaction

In this example, we have microservices written in Python, Java, and GoLang. These need to send data between themselves.

Using HTTP/1.1 and JSON will require you to implement the HTTP connections and the serialization for each language. You will also need to ensure that the schema is implemented correctly for each language, and if the API changes, all the services need to be manually updated.

gRPC, on the other hand, handles the implementation of the HTTP/2.0 protocol for us. A single schema is written, and the corresponding code can be generated for all the languages used. This schema can be seen as a contract that all languages need to adhere to, making communicating between these services much easier and more reliable.

gRPC Performance

gRPC is fast and is generally much more performant than a REST equivalent:

  • Protocol buffers are serialized and sent as binaries over the wire, which are significantly smaller than normal JSON messages.
  • gRPC uses HTTP/2, which delivers further performance improvements.

gRPC’s efficient compression of the data sent is a significant advantage: the smaller the transmitted payload, the fewer TCP round trips are needed. The maximum transmission unit (MTU) is a measurement representing the largest data packet that a network-connected device will accept, typically 1,500 bytes.

The compression is handled for you, and you benefit simply by using gRPC. As an alternative, it is possible to use something like GZIP to compress JSON messages before sending over regular HTTP. However, this can be inconvenient and adds a layer of complexity. Different languages and environments may also have different levels of support for GZIP and other equivalent compression tools. And for each language that you use, you will need to reimplement the correct compression and communication logic yourself. This is a similar problem to what we discussed about the HTTP library.

When Should You Use gRPC?

If you’re using multiple different programming languages that need to integrate tightly with each other, and require fast and frequent communication that sends a lot of data, then gRPC would be perfect.

Pros:

  • With gRPC streaming, it’s possible to determine the upload/download progress easily - without needing to make any unnecessary requests for updates.
  • It’s possible to cancel requests.
  • All the benefits of HTTP/2.
  • If gRPC supports your language, you don’t have to worry about external libraries.

Cons:

  • gRPC does not support all languages.
  • The schema may feel restrictive and cumbersome.
  • It can be complicated to set up compared to WebSockets.
  • It’s still relatively young, and errors can be difficult to debug.
  • Communication with gRPC does not natively work out of the box with web browsers. You need to use the gRPC-Web library.

WebRTC

WebRTC is a free, open-source project that provides real-time communication (RTC) capabilities to your application, built on top of an open standard. It supports video, voice, and generic data sent between peers.

The technology is available as a set of JavaScript APIs for all major browsers and a library for native clients like Android and iOS applications.

WebRTC is different from WebSockets and gRPC in a fundamental way, and that is that once a connection is established, data can (under some circumstances) be transmitted directly between browsers and devices in real time without touching the server.

This reduces latency and makes WebRTC great for audio, video, or screen sharing - where low latency is important and a lot of data needs to be sent.

WebRTC Motivation

WebRTC intends to standardize how media, such as audio and video, is communicated over the wire - and to accomplish this conveniently with a simple-to-use API.

Other solutions, such as WebSockets, do make it possible to transmit any data between two peers; however, this data needs to be transmitted through a proxy or server. Relying on another server adds latency, as everything that is sent through it needs to be looked at, processed, and decrypted. There is a middle-man between the two peers. For video streaming or even real-time chat, this latency is undesirable.

Browsers are also more powerful now than a few years ago. Browsers have access to the webcam and microphone, requiring a built-in API and an easy way to transmit this rich information. WebRTC is intended to simplify this entire process and expose an easy-to-use API natively available on browsers.

The Problem with WebRTC

The motivation is defined, and it seems like WebRTC is a magical solution allowing faster communication between two peers. But there are unfortunately a few problems.

The first problem is that establishing a peer-to-peer connection is not simple - the internet is complicated, and there are a lot of routers, proxies, and firewalls between Alice in California and Ben in South Africa. And in some circumstances, it may not be possible to have a direct line between two peers. A connection between two peers may need to bypass a firewall that prevents open connections, you could potentially not have a public IP address, or the router may not allow direct connection between peers.

The second problem is that there needs to be a way for two peers to discover each other and determine the optimal route in which communication can happen. This requires certain information to be shared between the two clients before they can know how to best communicate with each other - and a common way to share this information is by using WebSockets.

Which is a little funny: an HTTP connection is upgraded to a WebSocket connection just to share the information needed to establish a WebRTC connection.

If you truly want to appreciate what WebRTC does and its complexity, you will need to become familiar with some potentially unfamiliar terms: NAT, STUN, TURN, ICE, SDP, and Signaling.

How Does WebRTC Work?

In the overview above, we described the motivation of WebRTC, which describes the basic idea of how it works. This section will dive into some of the lower-level topics you need to understand to grasp WebRTC fully.

Network Address Translation (NAT)

Understanding what NAT is and how it works is essential to understanding WebRTC.

A NAT is used to give your device (laptop or cell phone) a public IP address; this is important because we want to establish a connection between two peers that are probably both behind a router. A router will have a public IP address, and every device connected to the router will have a private IP address.

These devices aren’t directly exposed to the internet. Instead, all the traffic goes through the router, which communicates with the outside world. When you request resources from a remote server, the router is responsible for “routing” the request from your local machine to that server and routing the response from the server back to your local machine.

These requests are translated from the device’s private IP address to the router’s public IP with a unique port - which is then stored in a NAT table. In this way, having a unique public IP for each device on a local network is not necessary.

NAT Table

The image above is a simplistic example of what a NAT table would look like. Let’s pretend the local device, with a private IP of 192.168.1.50, requests the public address 82.88.31.26:80 for some data.

This is accomplished by the local device first sending the request to the router, which routes the request to the remote device. The router then tells the remote device to send the response to its external IP address, with a unique port, which in this example is 86.88.71.25:8830.

This unique port is important as it will allow the router to determine which local device made the request. All of this information is stored in a NAT table. Once the router gets the response, it can perform a lookup and decide to which local device the response should be forwarded.

This is quite simple to understand when we have a normal request/response pair - one device and one server. But what happens if another external device with a completely different IP address decides to send packets to the router's external IP address on the same port that was previously used? Should the router forward it to the local device that is mapped to that port number?

This decision depends on which NAT translation the router uses and ultimately determines if a peer-to-peer connection can be established. Depending on the router you use, it will implement a different NAT translation. There are four different NAT translation methods:

  • One-to-One NAT
  • Address Restricted NAT
  • Port Restricted NAT
  • Symmetric NAT

One-to-One NAT: Maps one external IP address and port (usually public) to one internal IP address and port (usually private). In the above example, if the router receives a response on port 8830 and external IP 86.88.71.25, it will forward it to the local device 192.168.1.50, as that is the local device that made the request (information retrieved from the NAT table). The router does not care about the destination IP or where the response originated from. If it’s on a particular external port it goes to that local device.

Address restricted NAT: A remote device can send a packet to the local device only if the local device had previously sent a packet to the remote IP address. In summary, we only allow it if we have communicated with this host before. In the above example, only allow packets from 82.88.31.26.

Port restricted NAT: The same as address restricted NAT, but the restriction also includes port numbers. The remote device can only send a packet to the internal device if the internal device had previously sent a packet to IP address X and port P. In the above example, only allow packets from 82.88.31.26 and port 80.

Symmetric NAT: The most restrictive. For this the external IP, external port, destination IP, and destination port all have to match what is present in the NAT table. This means that packets can only be sent to a specific port of a local device if that device was the one that requested the destination IP and port.

WebRTC cannot establish a direct connection over symmetric NAT, and to understand why, we need to understand what a STUN server is.

Session Traversal Utilities for NAT (STUN)

STUN is a protocol to tell you your public IP address/port through NAT and to determine any restrictions in your router that would prevent a direct connection with a peer. A STUN server is a mechanism for clients to discover the presence of a NAT, as well as the type of NAT, and to determine the NAT's external IP address and port mapping.

The purpose of a STUN request is to determine your public presence so that this public presence can then be communicated with someone else so that they can connect with you - this communication is referred to as signaling, which we will discuss more later.

STUN works for one-to-one, address restricted, and port restricted NAT, but it does not work for symmetric NAT. When you ask the STUN server for your public information, the IP/port mapping that is created was made specifically for that request, and another peer cannot use it: with symmetric NAT, communication over the local device's port is restricted to the STUN server alone.

STUN servers are lightweight and cheap to maintain. There are public STUN servers that can be queried for free.

The image below illustrates when STUN works and when a peer-to-peer connection can be established.

WebRTC STUN Server - when it works

On the other hand, if a peer-to-peer connection cannot be established, for example, when a peer is behind a symmetric NAT, then the final connection in step three won't be allowed, as the initial mapping was created with the STUN server and no other peer can use that connection info.

In an event like this where a direct connection cannot be established, we need to make use of a TURN server.

Traversal Using Relays around NAT (TURN)

TURN is a protocol for relaying network traffic when a direct connection cannot be established between two peers. For example, if one peer is behind a symmetric NAT, a dedicated server is needed to relay the traffic between peers. In that event, you would create a connection with a TURN server and tell all peers to send packets to the server, which will then be forwarded to you.

This comes with overhead, and a TURN server can be expensive to maintain and run.

The following image illustrates how a TURN server is used to relay messages between two or more peers.

WebRTC TURN Server

Interactive Connectivity Establishment (ICE)

ICE uses a combination of the STUN and TURN protocols to provide a mechanism for hosts to discover each other's public IP addresses and establish a direct connection. If a direct connection is impossible, ICE will use TURN to establish a relay connection between the two hosts.

All of these possible ways of potentially establishing a connection are called ICE candidates. All the collected addresses are sent to the remote peer via SDP, which we will explore next. WebRTC uses this information on each client to determine the best way to connect to another peer. It may be that both peers are on the same NAT and a local connection can be established, or it might be that both peers are behind symmetric NAT, requiring a relay through a TURN server.

Session Description Protocol (SDP)

SDP is essentially a data format for describing media sessions for session announcement, session invitation, and other forms of session initiation. It is a standard for describing the multimedia content for the connection, such as resolution, formats, codecs, and encryption.

Importantly, it’s also used to describe the ICE candidates and other networking options. When peer A wants to connect to peer B, they need to share SDP information to connect. How this SDP information is shared is entirely up to you - this is referred to as signaling, and we will explore it next.

Signaling - Establishing a Connection

Signaling is the process of sending control information between two devices to determine the communication protocols, channels, media codecs and formats, and method of data transfer, as well as any required routing information. The most important thing to know about the signaling process for WebRTC: it is not defined in the specification.

Peer connections deal with connecting two applications on different computers. A connection is established through a discovery and negotiation process called signaling.

An important caveat is that signaling is not built into the WebRTC specification. As we explored in detail earlier, two devices cannot simply contact each other directly; for two peers to connect using WebRTC, they first require each other's SDP data.

As such, it is up to you, as the developer, to establish a way for two devices to share this information. A popular option is WebSockets, or the signaling information can be sent back and forth over email or delivered on foot and entered manually to establish a connection.

Once this information is shared, you have everything you need for two peers to establish a WebRTC connection; it may be a direct connection, or it may be relayed through a TURN server.
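
To make this concrete, here is a trimmed-down sketch of the caller's side in the browser, assuming sendToPeer and onPeerMessage are signaling helpers you implement yourself (for example, over a WebSocket connection):

typescript
// Assumed signaling helpers, e.g. wrappers around a WebSocket connection:
declare function sendToPeer(msg: object): void;
declare function onPeerMessage(handler: (msg: any) => void): void;

// A public STUN server discovers our public address; a TURN server would
// be added here as a relay fallback for restrictive NATs.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
});

// Forward each discovered ICE candidate to the other peer.
pc.onicecandidate = (event) => {
  if (event.candidate) sendToPeer({ candidate: event.candidate });
};

// Create and share the SDP offer describing our session.
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
sendToPeer({ offer });

// Complete the handshake when the remote peer responds via signaling.
onPeerMessage(async (msg) => {
  if (msg.answer) await pc.setRemoteDescription(msg.answer);
  if (msg.candidate) await pc.addIceCandidate(msg.candidate);
});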

When Should You Use WebRTC?

You may even ask: why should I use WebRTC? It seems complicated to understand and even more complicated to set up.

It is complicated to set up, but there are plenty of benefits:

  • The API is easy to use and is available directly in your browser.
  • It has good performance, making it possible to transmit high bandwidth content, such as video or audio.
  • More advanced features, such as screen sharing and file sharing, can be easily implemented.
  • Supports peer-to-peer connection with reduced latency.
  • Free and open source.

Cons:

  • No built-in signaling.
  • You need to maintain STUN & TURN servers.
  • For group connections (such as a group video call), a Selective Forwarding Unit (SFU) may be needed.
  • Complicated to set up and understand.

Which Should You Choose?

The protocol you choose will depend on your specific needs.

HTTP: With HTTP/2, it is now possible to have bidirectional communication between a client and server. Depending on your application you may not need full duplex communication and something like SSE will be more than enough. We also discovered in this article that WebSockets and gRPC are reliant on HTTP, while WebRTC requires some other channel for signaling. It’s worth first exploring if HTTP solves your application needs before delving into these other protocols.

WebSockets are best suited for real-time applications that need two-way communication, such as chat applications. They are also relatively easy to set up and use. However, WebSockets are not as efficient as gRPC or WebRTC, and they are not well suited for applications that need to send a lot of data.

gRPC is a more efficient protocol than WebSockets, and it is better suited for applications that need to send a lot of data. However, gRPC is more complicated to set up and use than WebSockets. If you need to make many small API calls, gRPC is a good choice. Or, when you implement microservices with various programming languages that need to communicate, then gRPC’s serialized structured data and code generation will save you a lot of time. It’s also worth noting that you can’t easily use gRPC from the browser. For that, you need a dedicated proxy in your backend that translates calls - see grpc-web.

WebRTC is the most efficient protocol for real-time communication with low latency between browsers and devices, and it is well suited for applications that need to send a lot of data. WebRTC also provides an easy-to-use API directly available in the browser, making it easy to share your camera, audio, screen, or other files. However, WebRTC can be complicated to set up and use, as it requires you to handle signaling yourself and to maintain STUN and TURN servers.

Conclusion

The future will see more protocols, changes, and further improvements. HTTP/3 has already been released, and there is also a new communication protocol called WebTransport, a potential replacement for WebSockets.

We hope you found this article useful and that it'll help you make an informed decision. If you want to continue the discussion, reach out to us on Twitter or LinkedIn.