How WhatsApp Works - Architecture Deep Dive on 100 Billion Messages

If you're in the US, chances are WhatsApp isn't even installed on your phone. Although the Meta app surpassed 100 million monthly users in the US in 2024, WhatsApp remains a global phenomenon that has, until recently, largely bypassed the US.

But if you live elsewhere in the world, WhatsApp is a fundamental part of daily life: it's how you contact friends, chat with customer service, and even make payments. It is the everything app of Brazil, India, and Europe, with 3 billion people worldwide using WhatsApp every month.

Given that scale, let's look at how WhatsApp works under the hood. We'll explore how its infrastructure handles over 100 billion messages daily, carries voice and video calls across continents, and maintains end-to-end encryption.

The Two Core Features of WhatsApp

Like most modern apps, WhatsApp offers a plethora of features, including WhatsApp Payments in India, status updates, communities, location sharing, and the obligatory Meta AI integration. But most of the time, people are using WhatsApp for two core features: chat and calling.

Chat is the foundation of WhatsApp and where it started.

Users can send text messages, photos, videos, documents, and voice notes. So far, so "chat app." The real differentiator is how easy group chat is. You can message individuals, but you can also quickly set up groups of up to 1,024 participants, which makes WhatsApp a central hub for chats with friends and coworkers.

The other core selling point of WhatsApp is end-to-end encryption, meaning only the sender and recipient can read the content. WhatsApp cannot access message contents, nor can government agencies or hackers who might intercept the data.

Voice and video calling is the second core feature: users can make free voice and video calls to anyone, worldwide, over their data connection. Again, a key selling point is group calling, with support for up to 32 participants on voice calls and eight on video. Like messages, all calls are end-to-end encrypted.

WhatsApp's infrastructure is built for scale and reliability. Every component must cope with billions of users sending messages simultaneously while keeping latency under a second across continents. The system processes over 1 million new user registrations and 140 billion messages per day, all while maintaining 99.99% uptime.

WhatsApp's Core Architecture

For the most part, Meta keeps the internal workings of WhatsApp close to its chest. But you can piece together the main components of the architecture from blog posts, talks, and whitepapers published over the years.

At its core, WhatsApp runs on two components:

FunXMPP: A compressed messaging protocol that reduces standard XMPP's bandwidth usage by 50-70% through binary encoding, essential for mobile networks.
Erlang/OTP: A platform built for telecom systems that provides massive concurrency and fault tolerance, allowing single servers to handle millions of simultaneous connections.

FunXMPP and Games with Mobile Bandwidth

Standard XMPP sends messages as verbose XML text. A simple "hello" message can take close to 200 bytes of XML tags and attributes:

<message  from='user123@whatsapp.net/resource' 
         to='user456@whatsapp.net' 
         type='chat' 
         id='message-12345'>
    <body>hello</body>
    <request  xmlns='urn:xmpp:receipts'/>
</message>

That's roughly 180 bytes to deliver 5 bytes of actual content. On 5G or Wi-Fi, the difference is negligible. But those aren't the networks WhatsApp was built for. There could be as many as 300 million 2G users in India, and for users on slow networks or expensive data plans, this overhead is unacceptable.

WhatsApp's solution is radical simplification through tokenization. They built a dictionary where common words get single-byte codes. The word "message" becomes byte 0x59. The phrase "@s.whatsapp.net" becomes 0x91. Common attributes like "type" and "from" each get their own byte. That verbose XML message shrinks from 180 bytes to roughly 20 bytes in FunXMPP.
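The dictionary idea can be sketched in a few lines of Python. This is a minimal illustration, not WhatsApp's implementation: only the 0x59 ("message") and 0x91 ("@s.whatsapp.net") codes come from the description above, while the other codes and the 0xFC escape byte for strings missing from the dictionary are hypothetical values chosen for the example.

```python
# Hypothetical token table: only 0x59 and 0x91 come from the article.
TOKENS = {
    "message": 0x59,
    "@s.whatsapp.net": 0x91,
    "to": 0x10,    # hypothetical
    "type": 0x11,  # hypothetical
    "from": 0x12,  # hypothetical
    "chat": 0x13,  # hypothetical
}
DECODE = {code: word for word, code in TOKENS.items()}
RAW = 0xFC  # hypothetical escape: a length byte and UTF-8 bytes follow


def encode_string(s: str) -> bytes:
    """Emit one byte for a dictionary word, else an escaped raw string."""
    if s in TOKENS:
        return bytes([TOKENS[s]])
    data = s.encode("utf-8")
    if len(data) > 255:
        raise ValueError("raw string too long for a one-byte length")
    return bytes([RAW, len(data)]) + data


def decode_string(buf: bytes, pos: int) -> tuple[str, int]:
    """Decode one string starting at buf[pos]; return it and the next offset."""
    code = buf[pos]
    if code == RAW:
        length = buf[pos + 1]
        start = pos + 2
        return buf[start:start + length].decode("utf-8"), start + length
    return DECODE[code], pos + 1


words = ["message", "to", "friend", "@s.whatsapp.net", "type", "chat"]
encoded = b"".join(encode_string(w) for w in words)

decoded, pos = [], 0
while pos < len(encoded):
    word, pos = decode_string(encoded, pos)
    decoded.append(word)

assert decoded == words
print(len(encoded), "bytes vs", sum(len(w) for w in words), "characters")
# 13 bytes vs 38 characters
```

Every dictionary hit costs one byte, so the more of a message's vocabulary the table covers, the closer the encoding gets to carrying only the real payload (here, the raw string "friend").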

The compression goes deeper than word replacement: FunXMPP also eliminates XML's structural overhead. Instead of wrapping content in matching opening and closing tags, it uses a single byte to describe structure. The byte 0xF8 might mean "here come three items," followed by those items. The parser counts items rather than searching for closing tags: no angle brackets, no forward slashes, no nested tag matching.
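A similar sketch shows how a count byte replaces closing tags. It assumes a hypothetical layout in which 0xF8 is followed by one byte giving the item count, and a node is flattened into its tag, its attribute key/value pairs, and optional text content. Strings are written raw (a hypothetical 0xFC marker, a length byte, then UTF-8) so the example stays focused on structure, and nested child nodes are left out for brevity.

```python
from typing import Optional

LIST = 0xF8  # hypothetical: a one-byte item count follows
RAW = 0xFC   # hypothetical: a length byte and UTF-8 bytes follow


def enc_str(s: str) -> bytes:
    data = s.encode("utf-8")
    if len(data) > 255:
        raise ValueError("string too long for a one-byte length")
    return bytes([RAW, len(data)]) + data


def encode_node(tag: str, attrs: dict, content: Optional[str] = None) -> bytes:
    """Flatten a node into LIST, count, tag, k1, v1, ..., [content]."""
    items = [enc_str(tag)]
    for key, value in attrs.items():
        items += [enc_str(key), enc_str(value)]
    if content is not None:
        items.append(enc_str(content))
    return bytes([LIST, len(items)]) + b"".join(items)


def decode_node(buf: bytes):
    """Read the count, then exactly that many items; no closing tag to find."""
    if buf[0] != LIST:
        raise ValueError("expected a list marker")
    count, pos, items = buf[1], 2, []
    for _ in range(count):
        if buf[pos] != RAW:
            raise ValueError("this sketch only handles raw strings")
        length = buf[pos + 1]
        items.append(buf[pos + 2:pos + 2 + length].decode("utf-8"))
        pos += 2 + length
    tag, rest = items[0], items[1:]
    # Attributes come in key/value pairs; an odd item left over is the content.
    content = rest.pop() if len(rest) % 2 == 1 else None
    return tag, dict(zip(rest[0::2], rest[1::2])), content


frame = encode_node("message", {"to": "friend", "type": "chat"}, "hello")
print(frame[:2].hex())     # f806: list marker, then six items
print(decode_node(frame))  # ('message', {'to': 'friend', 'type': 'chat'}, 'hello')
```

Because the parser knows up front how many items to read, it never scans ahead for a terminator, which keeps decoding to a single linear pass.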

This obsessive compression makes WhatsApp viable on the cheapest phones with the worst connections. A user in rural India on a 2G feature phone gets message delivery that feels nearly as quick as it does for someone in San Francisco on 5G, because the protocol overhead is small enough that even slow networks can carry it.

Erlang/OTP: Phoning It In Since 1986

Erlang comes from Ericsson's telecom systems, designed in the 1980s for phone switches that must never go down. The telephone network demands 99.999% uptime, roughly 5 minutes of downtime per year. WhatsApp chose Erlang because messaging has the same requirement: people expect it to always work.
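The downtime figures follow from simple arithmetic over a year of 525,960 minutes (365.25 days, averaging in leap years). A small, purely illustrative Python helper makes the gap between "four nines" and "five nines" concrete.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes, averaging in leap years


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Minutes per year a service may be unavailable at a given uptime."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


for pct in (99.9, 99.99, 99.999):
    print(f"{pct}% uptime -> {downtime_minutes_per_year(pct):.1f} minutes of downtime per year")
```

This prints about 526, 52.6, and 5.3 minutes respectively. At 99.99%, the figure cited earlier for WhatsApp, the yearly budget is under an hour; at the telephone network's 99.999%, it shrinks to a few minutes.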

The platform provides three critical capabilities that make WhatsApp possible:


Lightweight processes. Each WhatsApp user connection runs in its own Erlang process. These aren't operating system processes or threads; they're managed by the Erlang virtual machine, and a freshly spawned one takes only about 300 machine words of memory (roughly 2.5 KB on a 64-bit system). By that measure, a server with 64 GB of RAM could in theory hold roughly 25 million idle processes, though practical limits like network sockets cap it at 2-3 million connections per server.
Crash isolation. When one process fails, it fails alone. This is Erlang's "let it crash" philosophy. If your connection hits a bug and crashes, mine continues normally. A supervisor process immediately restarts yours in a clean state, often so quickly you won't notice beyond a brief reconnection.
Built-in distribution. Erlang servers naturally form clusters where processes communicate regardless of physical location. Process A on Server 1 can send messages to Process B on Server 2 using the same syntax as local communication. No additional message queue or service mesh needed.
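Python has no Erlang processes, but asyncio tasks can approximate the "one lightweight process per connection, plus a supervisor" pattern. The sketch below is an analogy under simplifying assumptions, not WhatsApp code: the "bad-frame" message is a hypothetical stand-in for a parsing bug, and all helper names are invented for the example.

```python
import asyncio


async def connection_process(user: str, mailbox: asyncio.Queue, delivered: list) -> None:
    """One lightweight task per connection; it only ever touches its own mailbox."""
    while True:
        msg = await mailbox.get()
        if msg == "bad-frame":  # hypothetical stand-in for a parsing bug
            raise ValueError(f"{user}: could not parse frame")
        delivered.append((user, msg))


async def supervise(user, mailbox, delivered, restarts, max_restarts=3):
    """Restart the child from a clean state whenever it crashes ("let it crash")."""
    while True:
        try:
            await connection_process(user, mailbox, delivered)
        except ValueError:
            restarts[user] = restarts.get(user, 0) + 1
            if restarts[user] > max_restarts:
                raise  # too many crashes: give up and escalate


async def main():
    delivered, restarts = [], {}
    mailboxes = {user: asyncio.Queue() for user in ("alice", "bob")}
    for msg in ("hi", "bad-frame", "still there?"):
        mailboxes["alice"].put_nowait(msg)
    mailboxes["bob"].put_nowait("hello bob")
    tasks = [asyncio.create_task(supervise(u, q, delivered, restarts))
             for u, q in mailboxes.items()]
    await asyncio.sleep(0.01)  # let both tasks drain their mailboxes
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return delivered, restarts


delivered, restarts = asyncio.run(main())
print(sorted(delivered))  # alice's two good messages and bob's one
print(restarts)           # {'alice': 1}
```

One deliberate simplification: in Erlang a crashed process's mailbox dies with it, whereas here the supervisor keeps the queue alive across restarts. The property that matters still holds: Alice's crash never touches Bob's task.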

The secret weapon is Mnesia, Erlang's built-in distributed database, which can hold its tables entirely in RAM. Mnesia stores the routing table (which user is on which server), offline message queues (undelivered messages waiting in memory), and user data (profiles and group memberships), all replicated across multiple servers for fault tolerance.
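The replication idea can be illustrated with a toy in-memory table copied onto several nodes. This is not Mnesia's API or its transaction model; the class, node names, and routing entry are hypothetical, and a replica that comes back after a failure is never resynchronized.

```python
class ReplicatedTable:
    """Toy stand-in for a RAM-resident table copied onto several nodes.

    Writes go to every live replica; reads are served by any live replica.
    """

    def __init__(self, nodes):
        self.replicas = {node: {} for node in nodes}
        self.down = set()

    def _live(self):
        live = [n for n in self.replicas if n not in self.down]
        if not live:
            raise RuntimeError("no live replicas")
        return live

    def write(self, key, value):
        for node in self._live():
            self.replicas[node][key] = value

    def read(self, key, default=None):
        return self.replicas[self._live()[0]].get(key, default)

    def fail(self, node):
        self.down.add(node)


routes = ReplicatedTable(["db-1", "db-2"])
routes.write("friend@s.whatsapp.net", ("server-47", "process-82719"))
routes.fail("db-1")                          # lose one replica
print(routes.read("friend@s.whatsapp.net"))  # ('server-47', 'process-82719')
```

Because every write lands on all live copies, losing one node leaves the routing entry readable from the survivor.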

Following a Message Through the WhatsApp System

Let's trace what happens when you send "Hello" to a friend:

Step 1: Message Creation and Encoding

Your phone creates the message and encodes it using FunXMPP. The XML structure (a message element addressed to your friend, carrying the text "Hello") becomes a sequence of bytes: the token for "message", the token for "to", the compressed recipient ID, and the actual text "Hello". Total size: maybe 20 bytes instead of 80.

Step 2: Persistent Connection

Your phone maintains a constant TCP connection to a WhatsApp server. This isn't a new connection per message; it's a long-lived connection that remains open as long as WhatsApp is running. Your assigned Erlang process monitors this connection, reading incoming bytes.
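Because TCP delivers a continuous byte stream with no message boundaries, the process reading a long-lived connection has to cut that stream into frames itself. A minimal sketch, assuming a hypothetical 3-byte big-endian length header in front of each frame (the actual WhatsApp framing isn't described here):

```python
class FrameReader:
    """Split a long-lived TCP byte stream into frames.

    Assumes (hypothetically) a 3-byte big-endian length before each frame.
    A single read may hold half a frame or several frames at once.
    """

    HEADER = 3

    def __init__(self):
        self.buffer = bytearray()

    def feed(self, chunk: bytes) -> list:
        self.buffer += chunk
        frames = []
        while len(self.buffer) >= self.HEADER:
            size = int.from_bytes(self.buffer[:self.HEADER], "big")
            end = self.HEADER + size
            if len(self.buffer) < end:
                break  # wait for the rest of this frame
            frames.append(bytes(self.buffer[self.HEADER:end]))
            del self.buffer[:end]
        return frames


def make_frame(payload: bytes) -> bytes:
    return len(payload).to_bytes(3, "big") + payload


stream = make_frame(b"hello") + make_frame(b"ack")
reader = FrameReader()
print(reader.feed(stream[:4]))  # []  (only part of the first frame so far)
print(reader.feed(stream[4:]))  # [b'hello', b'ack']
```

In an asyncio server, each connection's task would call feed() on every chunk returned by StreamReader.read() and handle whatever complete frames come back.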

Step 3: Server Reception

The FunXMPP bytes arrive at your server process. It decodes them (fast operation, just reading bytes) and identifies this as a message for another user. The process now needs to find where your friend is connected.

Step 4: Routing Lookup

Your server process queries Mnesia: "Where is friend@s.whatsapp.net connected?" Mnesia maintains a distributed table mapping every online user to their current server. This lookup occurs in microseconds, as it's all in RAM.

Three possibilities emerge:

Friend is online, same cluster: Mnesia returns something like, "Server-47, Process-82719." Your server process sends the message directly to that process using Erlang's built-in messaging. That process writes the message to your friend's TCP connection. Total time: milliseconds.
Friend is online, different datacenter: Mnesia indicates they're in another cluster. Your server forwards the message to the remote cluster, which delivers it to your friend. One extra hop, still sub-second delivery.
Friend is offline: Mnesia shows no active connection. Your server stores the message in the "offline queue" (also in Mnesia), replicated to a backup server. You get one checkmark (server received). When your friend opens WhatsApp, their new process queries for waiting messages, retrieves this one, and delivers it. You get the second checkmark.
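The three cases, plus the two checkmarks, can be modeled as a small routing class. In this sketch a plain dict stands in for the Mnesia routing table, the cluster, server, and process names are hypothetical, and nothing is actually sent over a network.

```python
from collections import defaultdict


class Router:
    """Toy model of the three routing cases; every name here is illustrative."""

    def __init__(self, local_cluster):
        self.local_cluster = local_cluster
        self.routes = {}                  # online user -> (cluster, server, process)
        self.offline = defaultdict(list)  # offline user -> [(msg_id, text), ...]
        self.ticks = {}                   # msg_id -> 1 (server has it) or 2 (delivered)

    def send(self, msg_id, recipient, text):
        self.ticks[msg_id] = 1  # first checkmark: the server accepted the message
        location = self.routes.get(recipient)
        if location is None:
            self.offline[recipient].append((msg_id, text))
            return "queued offline"
        cluster, server, process = location
        if cluster == self.local_cluster:
            return f"sent to {server}/{process}"
        return f"forwarded to cluster {cluster}"

    def ack(self, msg_id):
        self.ticks[msg_id] = 2  # second checkmark: the recipient's phone confirmed

    def connect(self, user, location):
        """Register a newly connected user and hand over any queued messages."""
        self.routes[user] = location
        return self.offline.pop(user, [])


router = Router("eu-1")
router.routes["ana"] = ("eu-1", "server-47", "process-82719")
router.routes["ben"] = ("us-2", "server-3", "process-11")
print(router.send("m1", "ana", "Hello"))  # sent to server-47/process-82719
print(router.send("m2", "ben", "Hello"))  # forwarded to cluster us-2
print(router.send("m3", "cho", "Hello"))  # queued offline
print(router.ticks["m3"])                 # 1
for msg_id, _text in router.connect("cho", ("eu-1", "server-9", "process-5")):
    router.ack(msg_id)                    # cho's phone confirms delivery
print(router.ticks["m3"])                 # 2
```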

Step 5: Delivery Confirmation

Your friend's phone receives the message and sends back a FunXMPP acknowledgment. This follows the same path in reverse: through their server, across the cluster, back to your server, down to your phone. The double checkmark appears.

Group Messages: Parallel Delivery

Group messages showcase the architecture's efficiency. When you send to a 100-person group:

Your phone sends one message to the server with the group ID
Your server process looks up the group's member list in Mnesia
It spawns temporary processes to handle parallel delivery
Each temporary process handles 10-20 recipients, checking their online status and delivering or queuing
All 100 deliveries happen simultaneously across the cluster
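The fan-out can be sketched with asyncio, where each batch of recipients gets its own short-lived task. The batch size of 15 is an assumption picked from the 10-20 range above, and the online set, user names, and status strings are made up for illustration.

```python
import asyncio

BATCH_SIZE = 15  # assumption within the 10-20 recipients per temporary process


def batches(members, size=BATCH_SIZE):
    return [members[i:i + size] for i in range(0, len(members), size)]


async def deliver_batch(batch, online, sender):
    """One short-lived worker: deliver to online members, queue for the rest."""
    results = {}
    for member in batch:
        if member == sender:
            continue  # the sender already has the message
        results[member] = "delivered" if member in online else "queued"
        await asyncio.sleep(0)  # yield, standing in for real network I/O
    return results


async def fan_out(sender, members, online):
    # One upload from the sender; the server multiplies it across workers.
    parts = await asyncio.gather(*(deliver_batch(b, online, sender) for b in batches(members)))
    merged = {}
    for part in parts:
        merged.update(part)
    return merged


members = [f"user{i}" for i in range(100)]
online = set(members[::3])  # pretend every third member is online
outcome = asyncio.run(fan_out("user0", members, online))
print(len(batches(members)))                            # 7
print(sum(v == "delivered" for v in outcome.values()))  # 33
```

Each worker handles its slice independently, so a slow or offline recipient in one batch does not hold up delivery to the others.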

The server does the multiplication, not your phone. Your battery and bandwidth only handle one upload.

Why WhatsApp's Architecture Works at Scale

The elegance isn't in any single component, but in how perfectly they fit together: a minimal protocol running on a platform built for communication, organized in a hierarchy that scales from single processes to global clusters.

Resource efficiency. A single server handles 2-3 million connections because each connection is just a lightweight Erlang process. Traditional architectures using threads or connection pools would need 100x more servers.
Operational simplicity. Everything runs on Erlang/OTP. No separate message queues, no external databases for routing, no complex service mesh. Engineers can understand and debug the entire system within one framework.
Failure isolation. When something breaks, it breaks alone. A crashed user process doesn't affect others, a failed server doesn't crash the cluster, and a down datacenter doesn't kill the service. Each failure is contained at its level while supervisors automatically restart failed components, usually faster than users notice.
Predictable speed. Every message follows the same simple path: receive, lookup, forward, deliver. Whether WhatsApp has 1 million or 1 billion users online, your message takes the same route through the same in-memory operations. No queues to back up, no databases to slow down, just consistent sub-second delivery.

The Power of Boring Technology

WhatsApp's architecture is a masterclass in doing more with less. WhatsApp handles 3 billion users with an architecture you could sketch on a napkin: binary protocol in, Erlang processes handling connections, Mnesia for routing, direct delivery out. Running a 3-billion-user service of course involves far more complexity than that, but the core idea is to keep it simple.

They proved that with the right architectural choices, a small team can build a chat infrastructure that rivals or exceeds what hundreds of engineers achieve elsewhere. The lesson isn't to copy WhatsApp's exact stack, but to understand that sometimes the old, boring, battle-tested solution is precisely what you need.
