Architecture & Benchmark

Architecture Overview

APIs should scale with your growth, both technically and in terms of pricing. Here we explain the architecture that allows the Stream Activity Feeds API to scale to apps with hundreds of millions of users.

Activity feeds are notoriously hard to scale. When you follow users or other topics of interest, you create a feed that's unique to you. Because every feed is different, at scale the load is hard to distribute across servers.

Benchmark at a glance

Stream Activity Feeds is regularly benchmarked to ensure its performance stays consistent. As you can see in the chart below, latency remains low and stable even at 100M users (~11ms p50).

Benchmark Results

We'll cover the following topics to show how Stream achieves this performance:

  • Tech deep dive / architecture
  • Benchmark results
  • "For You" style personalization

The tech behind the Stream Activity Feeds API

Stream uses Go as the backend language for activity feeds. Go is an extremely good fit: it handles aggregation, ranking, push, and websockets very well. The websocket implementation uses cuckoo filters, giving it a lightweight way to check whether a user is active on a given server.
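To make the routing idea concrete, here's a hedged sketch (not Stream's actual code) of how a push router could consult a per-server presence filter to decide which websocket servers need a message. The exact-set map below stands in for a cuckoo filter, which answers the same "might this user be connected here?" question in far less memory and, unlike a Bloom filter, supports deletion when a connection closes.

```go
package main

import "fmt"

// presenceFilter plays the role of a per-server cuckoo filter: it
// answers "might this user be connected to this server?" so the
// router can skip servers that definitely don't hold the connection.
// This exact-set stand-in just illustrates the routing logic.
type presenceFilter struct {
	users map[string]struct{}
}

func newPresenceFilter() *presenceFilter {
	return &presenceFilter{users: make(map[string]struct{})}
}

func (p *presenceFilter) Connect(userID string)    { p.users[userID] = struct{}{} }
func (p *presenceFilter) Disconnect(userID string) { delete(p.users, userID) }
func (p *presenceFilter) MaybeActive(userID string) bool {
	_, ok := p.users[userID]
	return ok
}

// routePush returns the servers that might hold a connection for
// userID; only those servers receive the push.
func routePush(servers map[string]*presenceFilter, userID string) []string {
	var targets []string
	for name, f := range servers {
		if f.MaybeActive(userID) {
			targets = append(targets, name)
		}
	}
	return targets
}

func main() {
	servers := map[string]*presenceFilter{
		"ws-1": newPresenceFilter(),
		"ws-2": newPresenceFilter(),
	}
	servers["ws-1"].Connect("alice")
	fmt.Println(routePush(servers, "alice")) // [ws-1]
}
```

With a real cuckoo filter the check can return a rare false positive, which only costs a wasted push attempt, never a missed delivery.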

Push/Pull & Fanout - Materialized Feed

Activity feeds are typically powered using push (fanout on write) or pull (fanout on read). As we started building feeds we worked through many iterations of pull and push based architectures on various databases. We've found the best results come from using a combination of these approaches and building what we call a Materialized Feed.

| Aspect | Push (Fanout on Write) | Pull (Fanout on Read) | Materialized Feed |
|---|---|---|---|
| Write cost | High - copies to all followers | Low - single write | Low - single write with selective updates |
| Read cost | Low - feed is precomputed | High - aggregates at read time | Low - precomputed with delta fetching |
| Celebrity problem | Severe - millions of writes per activity | None | Handled - prioritizes active feeds |
| Inactive users | Wasteful - updates feeds never read | Efficient | Efficient - skips inactive feeds |
| Read latency | ~1-5ms | ~100-500ms+ | ~10ms |
| Storage | High - duplicated across feeds | Low - single copy | Medium - smart deduplication |
| Consistency | Eventual (fanout delay) | Strong | Near-real-time with delta sync |
| Best for | Small follower counts, read-heavy | Write-heavy, few reads | General purpose, large scale |

The materialized feed combines the read performance of push with the write efficiency of pull. It achieves this by selectively updating feeds based on user activity patterns rather than blindly fanning out to all followers.

The concept of the materialized feed is similar to what a database does when creating a materialized view. If a user follows 100 other users, it takes the activities from those users and stores them as a precomputed feed. The next time you open your feed, it fetches only the delta of new activities from those users. There are a few clever tricks that speed it up:

  • In most apps many users/feeds are inactive. The algorithm for updating feeds prioritizes recently active feeds
  • As it builds the top X items in the feed in a heap, it keeps track of which feeds don't have recent enough content / can't contribute to this feed
  • At first we used RocksDB/raft for this. The most recent iteration uses TiKV, which is a slightly higher level of abstraction than RocksDB. TiKV is nice since it's low level enough that you can reach excellent feed performance, but high level enough that you don't have to deal with the complexities of building multi-raft clusters (and rebalancing).
  • Column denormalization. The fields used for ranking or aggregation are denormalized. This allows you to rank feeds before loading the full activity

The end result is a feed that can load in less than 10ms.
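The heap trick from the bullets above can be sketched in Go. This is a simplified model, not Stream's implementation: assume each followee's activities arrive sorted best-first by a denormalized score, and a followee whose newest item can't beat the current top-N is skipped entirely.

```go
package main

import (
	"container/heap"
	"fmt"
)

// activity carries only the fields needed for ranking, mirroring the
// column denormalization described above.
type activity struct {
	ID    string
	Score float64 // e.g. a recency-based rank score
}

// minHeap keeps the current top-N with the worst item on top, so we
// can cheaply test whether a candidate can still contribute.
type minHeap []activity

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i].Score < h[j].Score }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(activity)) }
func (h *minHeap) Pop() any {
	old := *h
	x := old[len(old)-1]
	*h = old[:len(old)-1]
	return x
}

// topN merges per-followee activity lists (each sorted best-first)
// into the N best items. A followee whose best activity already
// scores below the heap's minimum "can't contribute to this feed"
// and is skipped, which is the shortcut described above.
func topN(feeds [][]activity, n int) []activity {
	h := &minHeap{}
	for _, feed := range feeds {
		if len(feed) == 0 {
			continue
		}
		if h.Len() == n && feed[0].Score <= (*h)[0].Score {
			continue // this followee can't beat the current top-N
		}
		for _, a := range feed {
			if h.Len() < n {
				heap.Push(h, a)
			} else if a.Score > (*h)[0].Score {
				(*h)[0] = a
				heap.Fix(h, 0)
			} else {
				break // sorted best-first: the rest score lower too
			}
		}
	}
	out := make([]activity, h.Len())
	for i := len(out) - 1; i >= 0; i-- {
		out[i] = heap.Pop(h).(activity)
	}
	return out
}

func main() {
	feeds := [][]activity{
		{{"a3", 9}, {"a1", 3}},
		{{"b2", 7}, {"b1", 5}},
		{{"c1", 2}}, // low-scoring followee, skipped once the heap is full
	}
	fmt.Println(topN(feeds, 2)) // [{a3 9} {b2 7}]
}
```

The payoff is that inactive or low-scoring followees cost almost nothing at read time, which is what keeps the delta fetch fast.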

Client Side Caching & Cache Stampede Protection

We use Redis client-side caching for optimal performance. For high-traffic feeds, this client-side caching approach saves a roundtrip to Redis. A combination of ristretto and singleflight helps prevent cache stampedes.

Filtering

Many social apps allow some level of filtering on your activity feed: for instance, only showing images, only videos, or only posts in a certain language. To enable filtering while keeping the activity feed fast, we create a temporary materialized feed for your filter pattern. So if you filter on all posts tagged English, we'll build up a materialized feed with just the English posts.

To ensure this feed can be built with the best possible performance, we've put a few constraints in place. At read time you cannot specify more than 10 tags, and you can filter with equals or IN, but not "not in". This keeps filtered feed performance very close to that of a regular feed.
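Those read-time constraints are easy to picture as a validation step. The struct and operator names below are illustrative, not Stream's API:

```go
package main

import (
	"errors"
	"fmt"
)

// feedFilter mirrors the read-time constraints described above.
// Field and operator names here are hypothetical examples.
type feedFilter struct {
	Field string   // e.g. "tags"
	Op    string   // "eq" or "in"; "nin" (not in) is rejected
	Tags  []string
}

const maxFilterTags = 10

// validate enforces the constraints that keep a filtered feed
// buildable as a fast temporary materialized feed.
func validate(f feedFilter) error {
	if f.Op != "eq" && f.Op != "in" {
		return errors.New("only eq and in are supported")
	}
	if len(f.Tags) > maxFilterTags {
		return fmt.Errorf("at most %d tags per query, got %d", maxFilterTags, len(f.Tags))
	}
	return nil
}

func main() {
	ok := feedFilter{Field: "tags", Op: "in", Tags: []string{"english"}}
	bad := feedFilter{Field: "tags", Op: "nin", Tags: []string{"spam"}}
	fmt.Println(validate(ok))  // <nil>
	fmt.Println(validate(bad)) // error: only eq and in are supported
}
```

Rejecting "not in" matters because a negated filter can't be answered from a small precomputed subset; it would force a scan over everything that doesn't match.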

Aggregation & Ranking

Aggregation keys are precomputed on write. This ensures that opening up an aggregated feed is extremely fast.

Ranked feeds are more dynamic in nature and can't be fully precomputed. Feeds includes a fast expression-parsing library that enables us to rerank feeds quickly.

Infra & testing

An extensive test suite, monitoring, and production smoke tests ensure high uptime (100% over the last 12 months at the time of writing). We also offer a 99.999% uptime SLA with acceleration.

Benchmark

We built an extensive benchmark infrastructure to measure the scalability and performance of our API. Performing realistic benchmarks at large scale is a challenge of its own, which we solved by building our own distributed system with a separate control plane, workers, and data plane to run large workloads and capture all relevant telemetry from probes.

Methodology

The benchmark has been set up to simulate an app with 100 million users with the following assumptions:

  • 10-30% daily active users
  • Peak concurrent users (PCU): 5-15% of daily active users

This leads to the following 3 scenarios being simulated:

  • Low estimate: 500k peak concurrent users (10% DAU, 5% PCU)
  • Medium estimate: 2M peak concurrent users (20% DAU, 10% PCU)
  • High estimate: 4.5M peak concurrent users (30% DAU, 15% PCU)
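The three scenario numbers follow directly from the assumptions above; a small calculation with integer percentages reproduces them:

```go
package main

import "fmt"

// peakConcurrent derives peak concurrent users from the total user
// count and the DAU/PCU percentages used in the methodology above.
func peakConcurrent(totalUsers, dauPct, pcuPct int) int {
	dau := totalUsers * dauPct / 100 // daily active users
	return dau * pcuPct / 100        // peak concurrent users
}

func main() {
	total := 100_000_000
	fmt.Println(peakConcurrent(total, 10, 5))  // 500000  (low)
	fmt.Println(peakConcurrent(total, 20, 10)) // 2000000 (medium)
	fmt.Println(peakConcurrent(total, 30, 15)) // 4500000 (high)
}
```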

Results

The benchmark processed over 37 million operations with a 10% write / 90% read workload distribution across a dataset of 100M users, 500M activities, and 200M follow relationships. Each scenario was tested at 500, 1000, and 1500 requests per second to measure performance under increasing load.

500K Peak Concurrent Users (Low)

| RPS | Read Latency (p50) | Write Latency (p50) | Read Latency (p99) | Write Latency (p99) | Success Rate |
|---|---|---|---|---|---|
| 500 | 14.33ms | 29.53ms | 68.31ms | 44.20ms | 100% |
| 1000 | 14.67ms | 29.11ms | 62.35ms | 44.76ms | 100% |
| 1500 | 15.10ms | 30.38ms | 62.13ms | 52.81ms | 100% |

2M Peak Concurrent Users (Medium)

| RPS | Read Latency (p50) | Write Latency (p50) | Read Latency (p99) | Write Latency (p99) | Success Rate |
|---|---|---|---|---|---|
| 500 | 14.94ms | 29.87ms | 73.14ms | 45.73ms | 100% |
| 1000 | 14.96ms | 30.13ms | 71.59ms | 52.17ms | 100% |
| 1500 | 15.11ms | 30.67ms | 83.04ms | 68.04ms | 100% |

4.5M Peak Concurrent Users (High)

| RPS | Read Latency (p50) | Write Latency (p50) | Read Latency (p99) | Write Latency (p99) | Success Rate |
|---|---|---|---|---|---|
| 500 | 14.89ms | 30.13ms | 73.10ms | 45.95ms | 100% |
| 1000 | 14.97ms | 30.53ms | 76.04ms | 52.46ms | 100% |
| 1500 | 15.30ms | 31.40ms | 135.84ms | 134.02ms | 100% |

Key takeaways:

  • Read latency stays consistent at ~15ms p50 regardless of user scale or RPS
  • Write latency remains under 32ms p50 across all scenarios
  • 100% success rate maintained even at 4.5M concurrent users and 1500 RPS
  • p99 latency remains stable up to 1000 RPS; at 1500 RPS with 4.5M users, tail latency increases but still stays under 140ms

For You Feeds & Personalization

A "For You" style feed is key to user engagement. Out of the box we ship with the following activity selectors:

  • Following
  • Popularity
  • Location
  • Interest
  • Activities from people in your follower suggestions
  • Query activities (anything you can query on)

You can add up to 10 different ways to select activities and rank the end result with any of our ranking methods.

You can also enrich activities as they are added. The AI/LLM-based enrichers can add topics to further improve interest-based ranking.

We're always expanding the list of built-in capabilities for ranking / for you style feeds. Be sure to reach out if you have suggestions.

Transparency & Tech

When you select an API, you want to know it can scale and will be reliable. So instead of just claiming that Stream works for large apps, we like to share the details of how we achieve this performance and scalability. If you'd like to learn more about Stream, be sure to check out the docs or contact our team.