Chat looks simple. A user types a message, and it appears on the other person's screen. Ship it.
At Stream, we know it's anything but. WebSockets. Persistence. Message ordering. Offline sync. Presence at scale. File uploads. Emojis. All of it is table stakes for any chat or messaging app, and each piece is a challenge to build on its own. Combined, the complexity compounds fast.
This guide breaks down the full architecture of a production chat system and shows where a managed infrastructure like Stream saves months of engineering time.
The High-Level Architecture of a Chat App
Every chat system, regardless of scale, shares the same fundamental components.
A typical chat architecture includes six core pieces:
- Clients (web, mobile, desktop) that render the UI and maintain a persistent connection to the server
- API server handling REST endpoints for authentication, channel management, message history, and file uploads
- WebSocket server managing persistent bidirectional connections for real-time event delivery
- Message broker (Redis Pub/Sub, Kafka, NATS) routing messages between WebSocket server instances
- Database storing users, channels, messages, reactions, and read receipts
- File storage (S3, R2, GCS) with a CDN for media attachments
REST handles stateless operations (fetching history, creating channels, uploading files) while WebSockets handle stateful, real-time event streaming (new messages, typing indicators, presence updates). Every major chat provider, including Stream, follows this REST + WebSocket hybrid pattern.
WebSockets: The Transport Layer
Messaging starts with WebSockets, the transport that Discord, Slack, and WhatsApp all build on. There are alternatives (Server-Sent Events, long polling), but WebSockets are the standard for chat and the only one of these options that gives you bidirectional communication over a single connection.
After an HTTP upgrade handshake (the client sends Upgrade: websocket, the server responds with HTTP 101), both sides can send frames independently with minimal overhead: just 2 to 6 bytes of framing per message.
Two practical concerns require attention from day one:
- Always use `wss://` (TLS-encrypted WebSockets) in production: corporate proxies and firewalls routinely block unencrypted `ws://` upgrades.
- Implement application-level heartbeats. The browser WebSocket API does not expose native ping/pong frames, so clients must send their own keepalive messages to detect dead connections before infrastructure proxies close them (typically after 30 to 120 seconds of inactivity).
Here's how a WebSocket connection works on the server:
```javascript
// server/src/ws/index.js
import { WebSocketServer } from 'ws';
import { authenticateWs } from './authenticate.js';
import { handleEvent } from './handlers.js';
import { clientManager } from './clientManager.js';
import { queries } from '../db/queries.js';

export function setupWebSocketServer(server) {
  const wss = new WebSocketServer({ server, path: '/ws' });

  wss.on('connection', (ws, req) => {
    const url = new URL(req.url, 'http://localhost');
    const token = url.searchParams.get('token');
    const user = authenticateWs(token);

    if (!user) {
      ws.close(4001, 'Authentication failed');
      return;
    }

    ws.userId = user.id;
    ws.user = user;

    // Register connection and broadcast presence
    const wasOffline = !clientManager.hasConnections(user.id);
    clientManager.addClient(user.id, ws);

    if (wasOffline) {
      queries.setUserOnline(user.id, true);
      clientManager.broadcastToAll({
        type: 'presence.update',
        userId: user.id,
        isOnline: true,
      });
    }

    ws.send(JSON.stringify({
      type: 'connection.established',
      onlineUsers: clientManager.getOnlineUserIds(),
    }));

    ws.on('message', (raw) => {
      // A malformed frame shouldn't take down the process
      try {
        const event = JSON.parse(raw.toString());
        handleEvent(ws, event);
      } catch {
        // ignore unparseable payloads
      }
    });

    ws.on('close', () => {
      clientManager.removeClient(user.id, ws);
      if (!clientManager.hasConnections(user.id)) {
        queries.setUserOnline(user.id, false);
        clientManager.broadcastToAll({
          type: 'presence.update',
          userId: user.id,
          isOnline: false,
        });
      }
    });
  });
}
```
The server does three things on every new connection:
- Authenticate the token before allocating any resources
- Register the socket in a `ClientManager` that maps user IDs to their active connections
- Broadcast a presence update if this is the user's first active connection
The close handler reverses the process, only marking the user offline when their last connection drops (supporting multiple tabs or devices). Every incoming message is parsed and routed to a handler function based on the event type.
The corresponding client-side hook handles reconnection with exponential backoff:
```javascript
// client/src/hooks/useWebSocket.js
import { useCallback, useRef, useState } from 'react';

export function useWebSocket({ token, onEvent }) {
  const wsRef = useRef(null);
  const backoffRef = useRef(1000);
  const [connectionStatus, setConnectionStatus] = useState('disconnected');

  const connect = useCallback(() => {
    const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:';
    const url = `${protocol}//${window.location.host}/ws?token=${encodeURIComponent(token)}`;
    const ws = new WebSocket(url);
    wsRef.current = ws;

    let heartbeat;
    ws.onopen = () => {
      setConnectionStatus('connected');
      backoffRef.current = 1000;
      // Heartbeat every 30 seconds
      heartbeat = setInterval(() => {
        if (ws.readyState === WebSocket.OPEN) {
          ws.send(JSON.stringify({ type: 'presence.ping' }));
        }
      }, 30000);
    };

    ws.onmessage = (e) => {
      const event = JSON.parse(e.data);
      onEvent(event);
    };

    ws.onclose = () => {
      clearInterval(heartbeat); // don't leak heartbeats across reconnects
      setConnectionStatus('disconnected');
      // Exponential backoff: 1s, 2s, 4s, 8s... capped at 30s
      setTimeout(() => {
        backoffRef.current = Math.min(backoffRef.current * 2, 30000);
        connect();
      }, backoffRef.current);
    };
  }, [token, onEvent]);

  // ...
}
```
The client passes the JWT token as a query parameter (the only reliable way to authenticate WebSocket connections, as we'll see in the next section). On open, it resets the backoff timer and starts a 30-second heartbeat loop to keep the connection alive through proxies. On close, it schedules a reconnection attempt with exponential backoff: 1 second, 2 seconds, 4 seconds, doubling up to a 30-second cap. This prevents a thundering herd of reconnection attempts if the server restarts.
That is a lot of code for connection management. With Stream's SDK, the equivalent is handled automatically:
```javascript
// Stream: the SDK manages WebSocket lifecycle internally
import { StreamChat } from 'stream-chat';

const client = StreamChat.getInstance(apiKey);
await client.connectUser(
  { id: user.id, name: user.username },
  streamToken
);
// WebSocket connected, heartbeats running, reconnection configured
```
StreamChat.getInstance() returns a singleton client. connectUser() opens the WebSocket, authenticates with the provided token, starts heartbeats, and configures automatic reconnection with backoff. All the connection lifecycle code from the full version is handled internally.
Authentication: Securing WebSocket Connections
WebSocket connections are great for bidirectional communication, but they present an interesting authentication challenge: the browser WebSocket API does not support custom HTTP headers during the upgrade handshake. You cannot send an Authorization: Bearer header the way you would with a normal REST request. This means you need an alternative mechanism to prove the client's identity before the server commits resources to the connection.
The standard workaround is to pass the JWT token as a query parameter: wss://api.example.com/ws?token=eyJhbG.... The server validates the token before accepting the connection and allocating resources. TLS protects the token in transit, but query strings can still end up in server access logs, so keep WebSocket tokens short-lived.
You can validate the token before completing the WebSocket handshake:
```javascript
// server/src/ws/authenticate.js
import jwt from 'jsonwebtoken';
import { config } from '../config.js';
import { queries } from '../db/queries.js';

export function authenticateWs(token) {
  if (!token) return null;
  try {
    const payload = jwt.verify(token, config.jwtSecret);
    const user = queries.findUserById(payload.id);
    return user || null;
  } catch {
    return null;
  }
}
```
The authenticateWs function extracts the token from the query string, verifies it against the JWT secret, and looks up the corresponding user in the database. If any step fails (missing token, expired signature, or deleted user), it returns null, and the WebSocket connection is immediately closed.
Rate limiting is essential for chat's bursty traffic patterns. Users type rapidly, then go silent. A token bucket algorithm handles this well, allowing controlled bursts (e.g., 5 messages/second with a bucket capacity of 10) while preventing sustained abuse.
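As a sketch of that token bucket (the class and method names here are illustrative, not from a specific library), using the example numbers above of 5 messages/second with a burst capacity of 10:

```javascript
// Token bucket sketch: allows bursts up to `capacity`, refills at
// `ratePerSec` tokens per second. Names are illustrative.
class TokenBucket {
  constructor(ratePerSec = 5, capacity = 10) {
    this.rate = ratePerSec;
    this.capacity = capacity;
    this.tokens = capacity;       // start full so initial bursts succeed
    this.lastRefill = Date.now();
  }

  tryRemove(now = Date.now()) {
    // Refill proportionally to elapsed time, capped at capacity
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.rate);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;  // message allowed
    }
    return false;   // rate limited
  }
}

const bucket = new TokenBucket(5, 10);
bucket.tryRemove(); // → true (burst capacity available)
```

Wiring one bucket per connection (e.g. assigning it on the socket at connect time and checking it in the message handler) keeps enforcement O(1) on the hot path.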
Stream handles token validation, WebSocket authentication, rate limiting, and token refresh internally. You only manage the initial token generation on your server.
```javascript
// Server-side: generate a Stream token for the authenticated user
import { StreamChat } from 'stream-chat';

const streamClient = StreamChat.getInstance(
  process.env.STREAM_API_KEY,
  process.env.STREAM_API_SECRET
);

// After validating the user's credentials...
const streamToken = streamClient.createToken(userId);
// Return streamToken to the client alongside your app's session token

// --- Client-side: connect with the token ---
const client = StreamChat.getInstance(apiKey);
await client.connectUser(
  { id: user.id, name: user.username },
  streamToken // Generated server-side, never exposes your API secret
);
```
createToken() signs a JWT with your API secret and the user's ID. This token is scoped to that specific user, so the client can authenticate with Stream's servers without your API secret ever leaving the backend.
Data Modeling: Schemas, IDs, and Pagination
Chat data modeling revolves around four core entities:
- Users
- Channels (or conversations)
- Messages
- Metadata, like reactions and read receipts
The schema design has direct implications for query performance, pagination behavior, and how easily you can add features later.
Here's an example schema from a chat app implementation:
```sql
CREATE TABLE IF NOT EXISTS users (
  id TEXT PRIMARY KEY,
  username TEXT UNIQUE NOT NULL,
  email TEXT UNIQUE NOT NULL,
  password TEXT NOT NULL,
  avatar_url TEXT,
  is_online INTEGER DEFAULT 0,
  last_seen TEXT DEFAULT (datetime('now')),
  created_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE IF NOT EXISTS channels (
  id TEXT PRIMARY KEY,
  name TEXT NOT NULL,
  type TEXT NOT NULL CHECK(type IN ('direct', 'group')),
  created_by TEXT NOT NULL REFERENCES users(id),
  created_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE IF NOT EXISTS channel_members (
  channel_id TEXT NOT NULL REFERENCES channels(id) ON DELETE CASCADE,
  user_id TEXT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  role TEXT DEFAULT 'member',
  joined_at TEXT DEFAULT (datetime('now')),
  PRIMARY KEY (channel_id, user_id)
);

CREATE TABLE IF NOT EXISTS messages (
  id TEXT PRIMARY KEY,
  channel_id TEXT NOT NULL REFERENCES channels(id) ON DELETE CASCADE,
  user_id TEXT NOT NULL REFERENCES users(id),
  text TEXT,
  parent_id TEXT REFERENCES messages(id), -- threading support
  attachment_url TEXT,
  attachment_type TEXT,
  attachment_name TEXT,
  created_at TEXT DEFAULT (datetime('now'))
);

CREATE TABLE IF NOT EXISTS reactions (
  id TEXT PRIMARY KEY,
  message_id TEXT NOT NULL REFERENCES messages(id) ON DELETE CASCADE,
  user_id TEXT NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  emoji TEXT NOT NULL,
  UNIQUE(message_id, user_id, emoji)
);

CREATE TABLE IF NOT EXISTS read_receipts (
  channel_id TEXT NOT NULL REFERENCES channels(id),
  user_id TEXT NOT NULL REFERENCES users(id),
  last_read_message_id TEXT REFERENCES messages(id),
  last_read_at TEXT DEFAULT (datetime('now')),
  PRIMARY KEY (channel_id, user_id)
);

CREATE INDEX IF NOT EXISTS idx_messages_channel ON messages(channel_id, created_at);
CREATE INDEX IF NOT EXISTS idx_messages_parent ON messages(parent_id);
```
A few design decisions worth calling out:
- The `channel_members` join table uses a composite primary key `(channel_id, user_id)` and tracks each user's role and join time.
- A unified `channels` table handles both direct messages and group chats through a `type` discriminator.
- The `read_receipts` table stores each user's last-read message per channel, which underpins unread counts.
- `reactions` use a `UNIQUE(message_id, user_id, emoji)` constraint so a user can add each emoji at most once per message.
Why message ID generation matters
Auto-incrementing integers don't work in distributed systems because multiple servers would generate conflicting IDs. Random UUIDs work for uniqueness but are terrible for database performance: random insertion patterns fragment B-tree indexes, significantly degrading write throughput.
The solution used by Discord, Twitter, and Instagram is Snowflake IDs: 64-bit integers composed of 41 bits for the millisecond timestamp, 10 bits for the machine ID, and 12 bits for the sequence number. This yields 4,096 unique IDs per millisecond per machine with no coordination required. Because Snowflake IDs are time-sortable, they enable efficient range queries and cursor-based pagination while producing 5 to 10x faster writes than random UUIDs in B-tree indexes.
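For illustration, a minimal Snowflake-style generator with that 41/10/12 bit layout might look like the sketch below (the custom epoch and function names are our own; Discord and Twitter each use their own epoch values):

```javascript
// Snowflake-style ID sketch: 41 bits timestamp | 10 bits machine | 12 bits sequence.
// EPOCH is an arbitrary custom epoch in milliseconds (illustrative value).
const EPOCH = 1_600_000_000_000n;

function makeSnowflakeGenerator(machineId) {
  let lastTs = -1n;
  let sequence = 0n;

  return function nextId() {
    let ts = BigInt(Date.now());
    if (ts === lastTs) {
      sequence = (sequence + 1n) & 0xfffn; // 12-bit sequence wraps at 4096
      if (sequence === 0n) {
        // Sequence exhausted for this millisecond: spin until the next one
        while (BigInt(Date.now()) <= lastTs) {}
        ts = BigInt(Date.now());
      }
    } else {
      sequence = 0n;
    }
    lastTs = ts;
    return ((ts - EPOCH) << 22n) | (BigInt(machineId) << 12n) | sequence;
  };
}

const nextId = makeSnowflakeGenerator(1);
const a = nextId();
const b = nextId(); // time-sortable: b > a
```

Because the timestamp occupies the high bits, sorting IDs numerically sorts messages chronologically, which is exactly what cursor-based pagination needs.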
For teams that don't want to implement their own ID generation, ULIDs (128-bit, lexicographically sortable) and the newer UUIDv7 standard offer similar time-ordering properties with broader tooling support.
Database evolution at scale
Chat workloads are write-heavy, append-mostly, and time-ordered. These characteristics push systems through a predictable database evolution as they grow.
PostgreSQL is the right starting point. ACID compliance, full-text search via tsvector, rich query capabilities, and mature tooling make it sufficient for most applications up to moderate scale. When write throughput becomes the bottleneck, distributed stores like ScyllaDB, Cassandra, or DynamoDB provide horizontal scaling at the cost of query flexibility.
Most production systems end up with a hybrid storage architecture: a relational database for user accounts and metadata, a distributed store for message persistence, Redis for routing and ephemeral state, and object storage for media.
Cursor-based pagination
Offset-based pagination (LIMIT N OFFSET M) breaks for chat. The database must scan M+N rows and discard M of them, so query time degrades linearly as you page deeper. And when new messages arrive between page requests, offset pagination may return duplicates or skip messages entirely.
Cursor-based pagination solves both problems. The client passes the timestamp (or ID) of the last message it received, and the server returns everything before that point:
```javascript
// server/src/routes/messages.js
router.get('/:channelId', (req, res) => {
  const { channelId } = req.params;
  const { before, limit = '25' } = req.query;

  // Cursor-based: "give me 25 messages older than this timestamp"
  const messages = queries.getMessages(channelId, before || null, parseInt(limit));

  // Attach reactions to each message
  const messageIds = messages.map((m) => m.id);
  const allReactions = queries.getReactionsForMessages(messageIds);
  // ... enrich and return

  res.json({
    messages: enriched,
    hasMore: messages.length === parseInt(limit),
    readReceipts: queries.getReadReceipts(channelId),
  });
});
```
The endpoint accepts an optional before cursor (a timestamp) and a limit. It fetches messages older than the cursor, attaches reactions to each message in a batch query (avoiding N+1 queries), and returns a hasMore flag so the client knows whether to request another page. Read receipts for the channel are included in every response so the UI can display them without a separate API call.
The underlying query:
```sql
SELECT m.*, u.username, u.avatar_url
FROM messages m
INNER JOIN users u ON u.id = m.user_id
WHERE m.channel_id = ? AND m.created_at < ?
ORDER BY m.created_at DESC
LIMIT ?
```
Thanks to the (channel_id, created_at) index, this query costs the same no matter how deep you page: one index seek plus a scan of at most limit rows, rather than the ever-growing scans of offset pagination. The hasMore flag (checking whether the result set equals the limit) tells the client whether more messages exist.
Message Routing: From Sender to Every Recipient
Once a message leaves a sender's device, it must reach every recipient in the channel, potentially across multiple server instances.
Every production chat system solves routing with some form of publish/subscribe. When a message arrives, the server publishes it to a topic (usually keyed by channel ID). Every server instance subscribed to that topic receives the message and pushes it to its locally connected clients.
We can handle this with an in-process ClientManager that maps user IDs to WebSocket connections:
```javascript
// server/src/ws/clientManager.js
import { queries } from '../db/queries.js';

class ClientManager {
  constructor() {
    this.clients = new Map(); // Map<userId, Set<WebSocket>>
  }

  getClients(userId) {
    return this.clients.get(userId) || new Set();
  }

  sendToChannel(channelId, event, excludeUserId = null) {
    const memberIds = queries.getChannelMemberIds(channelId);
    const data = JSON.stringify(event);
    for (const memberId of memberIds) {
      if (memberId === excludeUserId) continue;
      for (const ws of this.getClients(memberId)) {
        if (ws.readyState === 1) { // WebSocket.OPEN
          ws.send(data);
        }
      }
    }
  }
}

export const clientManager = new ClientManager();
```
The ClientManager maintains a Map of user IDs to Sets of WebSocket connections (a user can have multiple open tabs or devices). sendToChannel looks up all member IDs for a channel, iterates over their active connections, and sends the serialized event to each. The optional excludeUserId parameter prevents echoing events back to the sender when unnecessary.
When a message is sent, the handler persists it and broadcasts it to all channel members:
```javascript
// server/src/ws/events/message.js
import { v4 as uuidv4 } from 'uuid';
import { queries } from '../../db/queries.js';
import { clientManager } from '../clientManager.js';

export function handleNewMessage(ws, event) {
  const { channelId, text, attachmentUrl, attachmentType, attachmentName } = event;
  if (!queries.isChannelMember(channelId, ws.userId)) return; // authorize first

  const messageId = uuidv4();
  queries.createMessage(messageId, channelId, ws.userId, text,
    attachmentUrl, attachmentType, attachmentName);

  const message = {
    id: messageId,
    channelId,
    user: { id: ws.user.id, username: ws.user.username },
    text,
    reactions: [],
    createdAt: new Date().toISOString(),
  };

  clientManager.sendToChannel(channelId, { type: 'message.new', message });
}
```
The handler generates a UUID for the message, writes it to the database, constructs a message object with the sender's user info and a timestamp, then fans it out to every member of the channel (including the sender, so their UI can confirm delivery). This write-then-broadcast sequence ensures the message is persisted before any client sees it.
Where this breaks: multiple servers
In-process routing only works when all users connect to the same server. As soon as you need a second WebSocket server, you need an external message broker.
Redis Pub/Sub delivers sub-millisecond latency and works well for ephemeral events (typing indicators, presence updates), but provides no persistence. Apache Kafka provides durable, replayable event streams with guaranteed per-partition ordering and throughput exceeding 100K messages per second per broker, at the cost of operational complexity. Most production systems use both: Slack combines Kafka for durable message queuing with Redis for fast in-flight job data.
Chat systems use fan-out on write: when a message is sent, it is immediately pushed to all recipients' connections. The write amplification is bounded by group size rather than follower count (unlike social feeds), making it practical for all but the largest channels.
Core Features: The Hidden Complexity
Each core chat feature looks simple in isolation. None of them is.
Typing indicators
Typing indicators require coordination between client-side debouncing, server-side relay, and automatic timeout cleanup.
On the client, keystroke events are debounced to avoid flooding the server:
```javascript
// client/src/components/messages/MessageInput.jsx
const handleTyping = useCallback(() => {
  const now = Date.now();
  // Only send a typing event every 2 seconds
  if (now - lastTypingSent.current > 2000) {
    sendTypingStart(activeChannelId);
    lastTypingSent.current = now;
  }
  // Auto-stop after 3 seconds of no keystrokes
  clearTimeout(typingTimer.current);
  typingTimer.current = setTimeout(() => {
    sendTypingStop(activeChannelId);
    lastTypingSent.current = 0;
  }, 3000);
}, [activeChannelId, sendTypingStart, sendTypingStop]);
```
The debounce logic throttles outgoing typing.start events to one every 2 seconds regardless of how fast the user types. A separate 3-second timeout automatically sends a typing.stop event when the user pauses. Without this, every keystroke would generate a WebSocket message, and the "typing" indicator would never clear if the user navigated away mid-sentence.
The server relays typing events to other channel members without persisting them:
```javascript
// server/src/ws/events/typing.js
import { queries } from '../../db/queries.js';
import { clientManager } from '../clientManager.js';

export function handleTypingStart(ws, event) {
  const { channelId } = event;
  if (!queries.isChannelMember(channelId, ws.userId)) return;

  clientManager.sendToChannel(channelId, {
    type: 'typing.start',
    channelId,
    user: { id: ws.user.id, username: ws.user.username },
  }, ws.userId); // Exclude the sender
}
```
The server checks channel membership first (preventing unauthorized users from triggering typing indicators), then broadcasts the event to all other members. Typing events are never persisted to the database since they're purely ephemeral.
On the receiving client, the typing state is tracked per channel with automatic expiration:
```javascript
// In the ChatContext reducer and event handler
case 'typing.start': {
  dispatch({
    type: 'SET_TYPING_USER',
    channelId: event.channelId,
    userId: event.user.id,
    username: event.user.username,
  });
  // Auto-clear after 4 seconds if no stop event arrives
  const key = `${event.channelId}:${event.user.id}`;
  clearTimeout(typingTimeouts.current[key]);
  typingTimeouts.current[key] = setTimeout(() => {
    dispatch({
      type: 'CLEAR_TYPING_USER',
      channelId: event.channelId,
      userId: event.user.id,
    });
  }, 4000);
  break;
}
```
The reducer adds the typing user to a per-channel map, and a 4-second timeout acts as a safety net: if the typing.stop event is lost (network glitch, tab closed), the indicator clears automatically. The timeout key is scoped to channelId:userId, so multiple users typing in the same channel are tracked independently.
The problem here is that in a group chat with 1,000 members, a single typing event fans out to 1,000 WebSocket pushes. For large channels, production systems suppress or sample typing indicators to prevent self-inflicted load spikes.
Presence (online/offline status)
You can track presence via the WebSocket connect/disconnect event:
```javascript
// Simplified from ws/index.js
wss.on('connection', (ws) => {
  const wasOffline = !clientManager.hasConnections(user.id);
  clientManager.addClient(user.id, ws);

  if (wasOffline) {
    queries.setUserOnline(user.id, true);
    clientManager.broadcastToAll({
      type: 'presence.update',
      userId: user.id,
      isOnline: true,
    });
  }
});
```
The wasOffline check is important: it prevents redundant presence broadcasts when a user opens a second tab. Only the transition from zero to one connections (or from one to zero) triggers a presence event. The database is updated alongside the broadcast so that the presence state survives server restarts.
This broadcast-to-all approach breaks at scale. If a user with millions of followers comes online, the presence fan-out becomes a self-inflicted DDoS. Slack solves this with selective subscription: clients receive presence notifications only for users visible on their current screen, dramatically limiting fan-out to the active UI context.
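A sketch of that selective-subscription idea (the class and method names are ours, not Slack's actual implementation): each connection registers the user IDs currently visible on its screen, and presence updates fan out only to those subscribers.

```javascript
// Selective presence subscription sketch (illustrative names).
class PresenceSubscriptions {
  constructor() {
    this.subscribers = new Map(); // Map<watchedUserId, Set<ws>>
  }

  // Client re-declares its visible user IDs whenever its viewport changes
  subscribe(ws, userIds) {
    this.unsubscribe(ws);
    ws.watchedIds = new Set(userIds);
    for (const id of userIds) {
      if (!this.subscribers.has(id)) this.subscribers.set(id, new Set());
      this.subscribers.get(id).add(ws);
    }
  }

  unsubscribe(ws) {
    for (const id of ws.watchedIds || []) {
      this.subscribers.get(id)?.delete(ws);
    }
    ws.watchedIds = new Set();
  }

  // Fan out only to connections actually watching this user
  broadcastPresence(userId, isOnline) {
    const event = JSON.stringify({ type: 'presence.update', userId, isOnline });
    let delivered = 0;
    for (const ws of this.subscribers.get(userId) || []) {
      ws.send(event);
      delivered++;
    }
    return delivered;
  }
}
```

The fan-out cost becomes proportional to how many clients have that user on screen, not to the total user count.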
Read receipts
Read receipts track the last message each user has seen in a channel. This is an upsert pattern:
```javascript
// server/src/ws/events/readReceipt.js
import { queries } from '../../db/queries.js';
import { clientManager } from '../clientManager.js';

export function handleReadMark(ws, event) {
  const { channelId, messageId } = event;
  if (!queries.isChannelMember(channelId, ws.userId)) return;

  queries.upsertReadReceipt(channelId, ws.userId, messageId);

  clientManager.sendToChannel(channelId, {
    type: 'read.update',
    channelId,
    userId: ws.user.id,
    messageId,
    readAt: new Date().toISOString(),
  }, ws.userId);
}
```
The handler validates channel membership, upserts the read receipt (inserting a new row or updating the existing one for that user/channel pair), and broadcasts the update to other channel members. Other clients use this event to update unread counts and display read indicators (the “seen” checkmarks) in real time.
The client triggers this when viewing a channel's messages:
```javascript
// client/src/components/messages/MessageList.jsx
useEffect(() => {
  if (channelMessages.length > 0 && activeChannelId) {
    const lastMsg = channelMessages[channelMessages.length - 1];
    if (lastMsg.user.id !== user.id) {
      markRead(activeChannelId, lastMsg.id);
    }
  }
}, [channelMessages.length, activeChannelId]);
```
The useEffect fires whenever the message list length changes or the active channel switches. It checks whether the most recent message was sent by someone else (you don't need to mark your own messages as read), then sends a read receipt with that message's ID. This means read receipts are updated automatically as new messages arrive, without requiring the user to click anything.
Reactions
Reactions require add/remove logic with deduplication (the UNIQUE(message_id, user_id, emoji) constraint) and real-time broadcast:
```javascript
// server/src/ws/events/reaction.js
import { v4 as uuidv4 } from 'uuid';
import { queries } from '../../db/queries.js';
import { clientManager } from '../clientManager.js';

export function handleReactionNew(ws, event) {
  const { messageId, emoji } = event;
  const msg = getMessageChannel(messageId);
  if (!msg || !queries.isChannelMember(msg.channel_id, ws.userId)) return;

  const reactionId = uuidv4();
  const result = queries.addReaction(reactionId, messageId, ws.userId, emoji);
  if (result.changes === 0) return; // Already exists (UNIQUE constraint)

  clientManager.sendToChannel(msg.channel_id, {
    type: 'reaction.new',
    channelId: msg.channel_id,
    messageId,
    reaction: { userId: ws.user.id, username: ws.user.username, emoji },
  });
}
```
The handler looks up the message's channel (for authorization and broadcast targeting), then attempts the insert. The UNIQUE(message_id, user_id, emoji) constraint at the database level means the insert silently fails if the user has already added that emoji. The result.changes === 0 check detects this and skips the broadcast, preventing duplicate reaction events from reaching other clients.
Message editing and deletion
Editing and deleting messages follow the same persist-then-broadcast pattern as reactions. The main design decision is whether deletion is hard (row removed from the database) or soft (a deleted_at timestamp set, content replaced with a tombstone). Most production chat apps use soft delete: it preserves message threading, keeps read receipt references intact, and lets you display "This message was deleted" in the UI rather than a confusing gap in the conversation.
```javascript
// server/src/ws/events/message.js
export function handleMessageUpdate(ws, event) {
  const { messageId, text } = event;
  const msg = queries.getMessageById(messageId);
  if (!msg || msg.user_id !== ws.userId) return; // only author can edit

  queries.updateMessageText(messageId, text);

  clientManager.sendToChannel(msg.channel_id, {
    type: 'message.updated',
    messageId,
    text,
    editedAt: new Date().toISOString(),
  });
}

export function handleMessageDelete(ws, event) {
  const { messageId } = event;
  const msg = queries.getMessageById(messageId);
  if (!msg || msg.user_id !== ws.userId) return;

  queries.softDeleteMessage(messageId);

  clientManager.sendToChannel(msg.channel_id, {
    type: 'message.deleted',
    messageId,
  });
}
```
Both handlers verify that the requesting user is the message author before making any changes. Admins and moderators typically bypass this check via a role stored on the channel_members row. Stream handles edit and delete permissions, soft deletion, and the message.updated and message.deleted event broadcasts out of the box.
File attachments
File uploads use a two-step pattern. The client uploads the file first, then sends a message referencing the uploaded file's URL.
```javascript
// server/src/routes/upload.js
import multer from 'multer';
import { extname } from 'path';
import { v4 as uuidv4 } from 'uuid';

const storage = multer.diskStorage({
  destination: 'uploads/',
  filename: (_req, file, cb) => {
    const ext = extname(file.originalname);
    cb(null, `${uuidv4()}${ext}`);
  },
});
const upload = multer({ storage });

router.post('/', authMiddleware, upload.single('file'), (req, res) => {
  const isImage = req.file.mimetype.startsWith('image/');
  res.json({
    url: `/uploads/${req.file.filename}`,
    originalName: req.file.originalname,
    size: req.file.size,
    type: isImage ? 'image' : 'file',
  });
});
```
The upload route uses multer for multipart form parsing and renames each file with a UUID to prevent collisions and path traversal attacks. The response includes the file's URL, original name, size, and a type flag distinguishing images from other files. The client uses this metadata to render inline image previews or download links in the message UI.
In production, you'd replace local disk storage with pre-signed URLs for direct-to-S3 uploads. The client requests an upload URL from your server, uploads directly to cloud storage (bypassing the application server entirely), and a post-upload pipeline handles virus scanning, thumbnail generation, and CDN distribution.
How Stream handles all of this
With Stream's React SDK, every feature above is built in. The entire chat UI, including typing indicators, presence, reactions, file uploads, read receipts, and message threading, reduces to a component tree:
```javascript
// client/src/components/chat/ChatContainer.jsx
import {
  Chat,
  Channel,
  ChannelHeader,
  ChannelList,
  MessageInput,
  MessageList,
  Thread,
  Window,
} from 'stream-chat-react';

export default function ChatContainer({ user, streamToken }) {
  const client = useStreamClient(user, streamToken);

  const filters = { type: 'messaging', members: { $in: [user.id] } };
  const sort = { last_message_at: -1 };

  return (
    <Chat client={client} theme="str-chat__theme-light">
      <ChannelList filters={filters} sort={sort} />
      <Channel>
        <Window>
          <ChannelHeader />
          <MessageList />
          <MessageInput />
        </Window>
        <Thread />
      </Channel>
    </Chat>
  );
}
```
The Chat component initializes the client and provides context to its children. ChannelList queries for channels the current user is a member of, sorted by most recent message. Channel manages all state for the active channel. Inside Window, ChannelHeader, MessageList, and MessageInput render the full conversation UI, with built-in typing indicators, read receipts, reactions, and file uploads. Thread renders threaded replies in a side panel. Each of these components is individually replaceable with custom implementations when you need to customize the UI.
This replaces hundreds of lines of code for the WebSocket handler, event routing, typing-debounce logic, presence tracking, read receipt sync, reaction CRUD, and the file upload pipeline. The SDK also handles features that the original version doesn't implement at all: message search, URL link previews, message threading, moderation, and offline support with local caching.
Frontend Architecture: Rendering at Chat Speed
The frontend of a chat application has specific rendering challenges that general-purpose UI patterns don't address well. Chat message lists can contain tens of thousands of items, new content arrives asynchronously from multiple sources, and users expect sub-100ms visual feedback on every action.
Message list virtualization
Rendering thousands of messages creates a massive DOM that destroys scroll performance. Virtualization keeps the full dataset in memory but only renders the 20 to 30 messages currently visible in the viewport, plus a small overscan buffer.
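The windowing math itself is straightforward for fixed-height rows; a minimal sketch (function and parameter names are illustrative, and real chat lists need dynamic height measurement on top of this):

```javascript
// Compute which slice of the list to actually render, given the
// scroll position. Only items[start..end) get DOM nodes.
function visibleRange({ scrollTop, viewportHeight, itemHeight, total, overscan = 5 }) {
  const first = Math.floor(scrollTop / itemHeight);
  const visibleCount = Math.ceil(viewportHeight / itemHeight);
  // Render a few extra rows above and below the viewport as a buffer
  const start = Math.max(0, first - overscan);
  const end = Math.min(total, first + visibleCount + overscan);
  return { start, end };
}

// 10,000 messages, 40px rows, 600px viewport, scrolled to row 250:
visibleRange({ scrollTop: 10000, viewportHeight: 600, itemHeight: 40, total: 10000 });
// → { start: 245, end: 270 } — 25 DOM nodes instead of 10,000
```

Each rendered item is absolutely positioned at `index * itemHeight` inside a spacer element of height `total * itemHeight`, so the scrollbar behaves as if everything were rendered.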
You can use an IntersectionObserver to trigger loading older messages when the user scrolls to the top:
```javascript
// client/src/hooks/useInfiniteScroll.js
import { useCallback, useRef } from 'react';

export function useInfiniteScroll(onLoadMore, hasMore, isLoading) {
  const observerRef = useRef(null);

  const sentinelRef = useCallback((node) => {
    if (observerRef.current) observerRef.current.disconnect();
    if (!node || !hasMore) return;

    observerRef.current = new IntersectionObserver((entries) => {
      if (entries[0].isIntersecting && !isLoading && hasMore) {
        onLoadMore();
      }
    }, { threshold: 0.1 });

    observerRef.current.observe(node);
  }, [onLoadMore, hasMore, isLoading]);

  return sentinelRef;
}
```
The hook returns a sentinelRef callback that attaches an IntersectionObserver to a hidden element at the top of the message list. When that element scrolls into view (the user has scrolled up far enough), the observer fires onLoadMore to fetch the next page of older messages. The hasMore and isLoading guards prevent duplicate requests during an active fetch or when the history is exhausted.
The message list component handles scroll position preservation when prepending older messages:
```jsx
// client/src/components/messages/MessageList.jsx
const loadOlderMessages = useCallback(async () => {
  const oldestMsg = channelMessages[0];
  if (!oldestMsg) return;

  const el = listRef.current;
  const prevScrollHeight = el?.scrollHeight || 0;

  await loadMessages(activeChannelId, oldestMsg.createdAt);

  // Maintain scroll position after prepending content
  requestAnimationFrame(() => {
    if (el) {
      el.scrollTop = el.scrollHeight - prevScrollHeight;
    }
  });
}, [activeChannelId, channelMessages, loadMessages]);
```
Without this scrollHeight preservation, the viewport jumps unpredictably when older messages load. It's a subtle detail that most tutorials miss.
For production use, libraries like react-virtuoso (purpose-built for chat with dynamic heights and reverse infinite scroll) handle the complex measurement and recycling logic.
Optimistic updates
Users expect messages to appear instantly after they are sent. The optimistic update pattern renders the message in the UI before the server confirms receipt. When the message echoes back via the WebSocket broadcast, the reducer deduplicates by ID:
```js
case 'ADD_MESSAGE': {
  const { channelId, message } = action;
  const existing = state.messages[channelId] || [];

  // Deduplicate: the optimistic copy and the WebSocket echo share an ID
  if (existing.find((m) => m.id === message.id)) return state;

  return {
    ...state,
    messages: { ...state.messages, [channelId]: [...existing, message] },
  };
}
```
In a more robust implementation, you'd generate a client_msg_id (UUID) on the client, add the message to local state with a “sending” status, and match it against the server's echo to update the status to “sent” and replace the temporary ID with the server-assigned one.
State management
A reducer pattern for chat state maps well to the event-driven nature of real-time messaging:
```js
// client/src/context/ChatContext.jsx (simplified)
function chatReducer(state, action) {
  switch (action.type) {
    case 'SET_CHANNELS':
      return { ...state, channels: action.channels };

    case 'ADD_MESSAGE': {
      const { channelId, message } = action;
      const existing = state.messages[channelId] || [];
      if (existing.find((m) => m.id === message.id)) return state;
      return {
        ...state,
        messages: { ...state.messages, [channelId]: [...existing, message] },
      };
    }

    case 'SET_TYPING_USER': {
      const { channelId, userId, username } = action;
      const channelTyping = { ...(state.typingUsers[channelId] || {}) };
      channelTyping[userId] = { username, timestamp: Date.now() };
      return {
        ...state,
        typingUsers: { ...state.typingUsers, [channelId]: channelTyping },
      };
    }

    case 'SET_USER_ONLINE': {
      const online = new Set(state.onlineUsers);
      online.add(action.userId);
      return { ...state, onlineUsers: online };
    }

    // ... 15+ more event types
    default:
      return state;
  }
}
```
Each WebSocket event maps to a reducer action. Events come in via the WebSocket, dispatch actions to the reducer, and the UI re-renders based on the updated state.
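The translation layer between wire events and reducer actions can be a single pure function. The event names (`message.new`, `typing.start`, `user.online`) are illustrative, matching whatever your server emits:

```js
// Sketch: translating raw WebSocket events into reducer actions.
// The event type strings here are assumptions, not a fixed protocol.
function eventToAction(event) {
  switch (event.type) {
    case 'message.new':
      return { type: 'ADD_MESSAGE', channelId: event.channelId, message: event.message };
    case 'typing.start':
      return {
        type: 'SET_TYPING_USER',
        channelId: event.channelId,
        userId: event.userId,
        username: event.username,
      };
    case 'user.online':
      return { type: 'SET_USER_ONLINE', userId: event.userId };
    default:
      return null; // ignore events the UI doesn't care about
  }
}

// Wiring: every incoming frame becomes at most one dispatch.
// ws.onmessage = (frame) => {
//   const action = eventToAction(JSON.parse(frame.data));
//   if (action) dispatch(action);
// };
```

Keeping this mapping pure makes it trivial to unit test the event protocol without a live socket.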
Stream's React SDK handles all of this internally. The Channel component manages message, typing, read, and reaction states. You access it via hooks like useChannelStateContext() when building custom components:
```jsx
// Stream version: accessing channel state in a custom header
import { useChannelStateContext } from 'stream-chat-react';

export default function CustomChannelHeader() {
  const { channel, members, watcher_count } = useChannelStateContext();
  const memberCount = Object.keys(members || {}).length;

  return (
    <div className="custom-channel-header">
      <h3>{channel?.data?.name || 'Chat'}</h3>
      <span>{memberCount} members, {watcher_count} online</span>
    </div>
  );
}
```
useChannelStateContext() gives you access to the active channel's full state: metadata, members, online watcher count, messages, and more. You can use this to build completely custom UI components while letting Stream manage all the underlying state synchronization. The channel name, member count, and online count shown here update in real time as users join, leave, or connect.
Scaling: From One Server to Millions of Connections
The architectural patterns described so far work on a single server. Production chat requires solving problems that are fundamentally different in a distributed environment.
The stateful connection problem
WebSocket connections are persistent and pinned to specific server processes. Unlike with stateless HTTP, a WebSocket message must reach the exact server that holds the recipient's connection. This breaks traditional round-robin load balancing entirely.
Connection density varies dramatically by technology stack:
| Stack | Connections per server | Memory per connection |
|---|---|---|
| Node.js | ~50K idle, ~20K active | ~10KB |
| Go / Rust | 50K-500K | ~2-5KB |
| Erlang/BEAM (WhatsApp, Discord) | 2-3 million | ~300 bytes |
The Erlang/BEAM numbers explain why WhatsApp served 900 million users with just 50 engineers, and why Discord scaled individual servers to nearly 2 million concurrent users. Technology choice at the connection layer determines your scaling ceiling by orders of magnitude.
Horizontal scaling strategies
Use least-connections load balancing for WebSocket traffic (round-robin ignores existing connection counts). To manage state across servers, store session data in Redis so any server can handle reconnecting clients. Sticky sessions (IP hash or cookie affinity) are simpler but fragile: server failure disconnects all pinned clients simultaneously.
Slack separates concerns into four cooperating services:
- Gateway Servers (WebSocket termination, multi-region)
- Channel Servers (stateful pub/sub via consistent hashing, handling tens of millions of channels per host)
- Admin Servers (stateless interface between webapp and channel servers)
- Presence Servers (in-memory status tracking)
This separation lets each layer scale independently.
Offline Support: Store, Forward, and Sync
Users switch between WiFi and cellular, enter tunnels, close laptops, and restart apps. A production chat system must handle all of these gracefully.
Store-and-forward
When a message arrives and the recipient is disconnected, the server stores it in a delivery queue. The sender sees a single checkmark (server received). When the recipient reconnects, queued messages are delivered, the recipient's client sends an acknowledgment, and the sender sees a double checkmark. This is the store-and-forward pattern WhatsApp pioneered.
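The control flow reduces to a small state machine. This in-memory sketch is illustrative only; a production system persists the queue durably so messages survive a server restart:

```js
// Minimal in-memory sketch of store-and-forward.
class DeliveryQueue {
  constructor() {
    this.queues = new Map(); // userId -> pending messages
    this.online = new Map(); // userId -> deliver callback
  }

  send(userId, message) {
    const deliver = this.online.get(userId);
    if (deliver) {
      deliver(message);          // recipient connected: deliver immediately
      return 'delivered';
    }
    if (!this.queues.has(userId)) this.queues.set(userId, []);
    this.queues.get(userId).push(message); // store for later
    return 'queued';             // sender sees a single checkmark
  }

  connect(userId, deliver) {
    this.online.set(userId, deliver);
    const pending = this.queues.get(userId) || [];
    this.queues.delete(userId);
    pending.forEach(deliver);    // flush the backlog in order on reconnect
  }

  disconnect(userId) {
    this.online.delete(userId);
  }
}
```

The double checkmark corresponds to a separate acknowledgment the recipient's client sends after `deliver` fires, which isn't modeled here.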
Delta sync on reconnection
When a client reconnects after being offline, it sends its last_known_event_id per conversation. The server computes and returns all new messages, edits, deletes, and reactions since that point. The client merges the delta into its local store, then pushes any locally queued outbound messages.
Client-generated UUIDs per message prevent duplicates on retry. If the client sends a message, loses connectivity before receiving the server's acknowledgment, and retries after reconnecting, the server can detect the duplicate via the client-generated ID and skip re-processing. This idempotency pattern is the practical alternative to exactly-once delivery, which is impractical in distributed systems.
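Both halves of this pattern can be sketched together, assuming each conversation keeps an append-only event log with monotonically increasing IDs. The event shape and function names are illustrative:

```js
// Sketch of the server side of delta sync and idempotent ingestion,
// assuming an append-only per-conversation event log.

// Everything after the client's cursor: new messages, edits, deletes, reactions.
function computeDelta(eventLog, lastKnownEventId) {
  return eventLog.filter((e) => e.id > lastKnownEventId);
}

// Idempotent ingestion: the client-generated UUID makes retries safe.
function appendMessage(eventLog, seenClientIds, incoming) {
  if (seenClientIds.has(incoming.client_msg_id)) {
    return eventLog; // duplicate retry after a lost ack — skip re-processing
  }
  seenClientIds.add(incoming.client_msg_id);
  return [...eventLog, { id: eventLog.length + 1, ...incoming }];
}
```

On reconnect the client calls the delta endpoint with its cursor, merges the result, then replays its outbound queue through `appendMessage`-style ingestion, and any retried duplicates are dropped server-side.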
For mobile apps, the client maintains a local database (SQLite on iOS/Android, IndexedDB on web) as a cache. This enables instant rendering of recent conversations on app launch, without waiting for a network round-trip.
Build vs. Buy: Where To Spend Your Engineering Time
Every section above describes real engineering work. The question now is whether that work should be yours.
What building from scratch actually costs
Building a production-ready chat application requires approximately 4 to 6 months with a team of 6 engineers for a multi-platform MVP. A minimal 1:1 chat takes 1 to 3 months with 2 developers. Industry surveys consistently show that more than half of real-time projects exceed their planned budget and timeline, with year-one maintenance adding another $100K to $300K.
Seven specific problems account for the bulk of engineering time:
- Reliable delivery requires at-least-once semantics with client-side deduplication
- Message ordering is trivial on a single server, but exponentially harder across sharded infrastructure
- Offline sync demands local databases, cursor-based delta synchronization, and conflict reconciliation across devices
- WebSocket scaling breaks traditional load balancing entirely due to stateful connections
- Cross-platform push notifications require ongoing maintenance as Apple and Google change their APIs
- Presence at scale needs a distributed infrastructure with thundering herd mitigation
- Moderation and abuse prevention are often estimated at 20% of total effort, but prove critical in production
When each path makes sense
- Build when chat is your core product, when you have deep real-time infrastructure expertise on staff, when data sovereignty requirements eliminate all vendors, or when vendor fees consistently exceed $20K/month.
- Buy when chat is a supporting feature (marketplaces, healthcare, education platforms), when time-to-market matters (SDK integration takes days, not months), when you lack real-time infrastructure specialists, or when total vendor cost stays below the break-even point, typically 5 to 10 years before build costs converge with buy costs.
Build With Your Stack
The architecture in this guide is language-agnostic. Stream's SDKs implement the same patterns across every major platform, so the concepts translate directly regardless of what you're shipping on.
| Platform | Best for | Tutorial |
|---|---|---|
| React | Web apps, dashboards, customer support tools | Tutorial |
| React Native | Cross-platform iOS + Android from a single codebase | Tutorial |
| iOS (SwiftUI) | Native iOS apps using declarative Swift UI patterns | Tutorial |
| iOS (UIKit) | Native iOS apps targeting iOS 13 and above | Tutorial |
| Android (Jetpack Compose) | Native Android apps using modern declarative UI | Tutorial |
| Android (XML) | Native Android apps using traditional view-based UI | Tutorial |
| Flutter | Cross-platform mobile and desktop from a single Dart codebase | Tutorial |
| Angular | Enterprise web apps | Tutorial |
| Unity | In-game chat for mobile and desktop titles | Tutorial |
The React and React Native tutorials are the most comprehensive starting points if you're evaluating the API before committing to a stack.
The Right Abstraction for the Job
Building a real-time chat application well means solving a specific set of distributed systems problems. The architectural patterns are well-established: WebSocket connections managed behind a pub/sub routing layer, Snowflake IDs for time-ordered message identification, cursor-based pagination, fan-out on write for real-time delivery, and store-and-forward for offline users.
The most underappreciated insight from studying production systems is that small teams with the right technology dramatically outperform large teams with the wrong abstractions.
For teams where chat is not the core product, the build-vs-buy math strongly favors buying. The hard problems (reliable delivery, offline sync, cross-platform push, and presence at scale) consume disproportionate engineering time relative to their apparent simplicity. Stream Chat has already solved these problems. The question is whether your specific constraints justify solving them again.