5 Reasons to Use Cassandra For Building Your Newsfeed
Users of the open source Stream-Framework often ask us whether they should use Redis or Cassandra to power their newsfeed. This article highlights 5 scenarios in which you are better off going for Cassandra. We’re not comparing Redis and Cassandra in general, only how well each fits the specific domain of building newsfeeds.
Benefits of using Cassandra
Let’s start by explaining some of the high-level advantages of using Cassandra:
- 1. Cassandra stores data on disk, which gives you an important cost advantage over any Redis-based approach.
- 2. Data is automatically sharded across the nodes in your cluster. Furthermore, you can configure the cluster so that data is replicated and your system stays highly available.
- 3. The best part about Cassandra is that you can easily add nodes. As your user base grows, it’s easy to add more capacity.
- 4. The OpsCenter monitoring tool is very powerful and gives you great insight into your cluster’s usage and health.
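To make points 2 and 3 more concrete, here is a minimal sketch of how a key can be mapped to several nodes on a hash ring. It is a simplified illustration and not Cassandra's actual implementation: the `Ring` class, the node names, and the use of MD5 as the hash function are all assumptions made for this example.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an assumption for this sketch)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy hash ring: every node owns one token, keys go to the next nodes clockwise."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self.tokens = sorted((token(node), node) for node in nodes)

    def add_node(self, node):
        # Adding capacity only inserts one more token; existing tokens stay put.
        bisect.insort(self.tokens, (token(node), node))

    def replicas(self, key):
        """Return the nodes that would hold a copy of `key`."""
        positions = [position for position, _ in self.tokens]
        start = bisect.bisect_right(positions, token(key)) % len(self.tokens)
        copies = min(self.replication_factor, len(self.tokens))
        return [self.tokens[(start + i) % len(self.tokens)][1] for i in range(copies)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("feed:user:42")  # three distinct nodes hold this feed
ring.add_node("node-e")                 # more capacity, no manual resharding
```

Because each key lives on several nodes, losing one node does not make a feed unavailable, and adding a node only moves the keys that fall into its part of the ring.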
Cassandra & Newsfeeds
If one or more of the following points are true for your newsfeed, you probably want to look into a Cassandra-based approach.
A. Rapid Growth
If you are growing rapidly and are exceeding the amount of data you can easily store in a single Redis node, you’re definitely better off moving to Cassandra. It takes quite some time to learn how to maintain and operate Cassandra, but it’s definitely easier than building your own resharding mechanism for Redis.
B. Cloud Hosting
Redis can quickly become expensive. The most famous example is Instagram spending tons on AWS Redis hosting. Redis stores all data in memory, and memory is especially expensive when you’re running in the cloud. If you’re on cloud hosting, you want to make sure you don’t end up running a large cluster of Redis machines.
C. Database Fallback
When you’re using Redis, you want to do your best to reduce memory usage. One common way of achieving this is to store only active users, or only the first part of each newsfeed, in Redis. This works because you can fall back to the database in the rare case that data outside of Redis is requested. It only works, however, if you can actually afford that fallback. If your feature set combined with your traffic makes it hard to fall back to your relational database, you should start thinking about Cassandra.
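The fallback pattern described above can be sketched roughly as follows. Plain dictionaries stand in for Redis and for the relational database, and `FeedStore`, `CACHE_LIMIT`, and the method names are hypothetical names chosen for this illustration, not part of Stream-Framework.

```python
from dataclasses import dataclass, field

CACHE_LIMIT = 100  # assumed number of recent activities kept in memory per feed


@dataclass
class FeedStore:
    cache: dict = field(default_factory=dict)     # stand-in for Redis
    database: dict = field(default_factory=dict)  # stand-in for the relational database

    def add_activity(self, user_id, activity):
        # The database keeps the full feed, newest first.
        self.database.setdefault(user_id, []).insert(0, activity)
        # The cache keeps only the first CACHE_LIMIT items of the feed.
        cached = self.cache.setdefault(user_id, [])
        cached.insert(0, activity)
        del cached[CACHE_LIMIT:]

    def evict(self, user_id):
        # Inactive users can be dropped from memory entirely.
        self.cache.pop(user_id, None)

    def read(self, user_id, offset=0, limit=25):
        """Return (activities, source); source is 'cache' or 'database'."""
        cached = self.cache.get(user_id)
        if cached is not None and offset + limit <= len(cached):
            return cached[offset:offset + limit], "cache"
        # Rare path: the requested page is not fully in memory.
        rows = self.database.get(user_id, [])
        return rows[offset:offset + limit], "database"


store = FeedStore()
for activity_id in range(150):
    store.add_activity("user-1", activity_id)

first_page, source = store.read("user-1", offset=0, limit=25)    # served from the cache
deep_page, source = store.read("user-1", offset=100, limit=25)   # falls back to the database
```

The sketch only pays off when the database path in `read` is cheap enough to serve on demand. If rebuilding a page of the feed requires expensive queries, the fallback stops being an option.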
D. Aggregated Feeds
Aggregated feeds take up more storage space than flat feeds. A common example of an aggregated activity is “Thierry, Tommaso and 5 other people like your picture”. Because every aggregated entry bundles several activities, these feeds need more memory than a simple flat feed. They also make it harder to fall back to the database. Cassandra is often the better option if you want to support aggregated feeds and have a large user base.
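As a rough illustration of what an aggregated activity involves, the sketch below groups individual activities by verb and target and renders the summary line from the example. The `aggregate` and `render` functions and the tuple layout are assumptions for this example and do not reflect how Stream-Framework stores aggregated feeds.

```python
def aggregate(activities):
    """Group (actor, verb, target) tuples by (verb, target), keeping first-seen order."""
    groups = {}
    for actor, verb, target in activities:
        actors = groups.setdefault((verb, target), [])
        if actor not in actors:
            actors.append(actor)
    return groups


def render(actors, verb, target, shown=2):
    """Build a summary such as 'A, B and 5 other people like your picture'."""
    names = actors[:shown]
    rest = len(actors) - len(names)
    if rest > 0:
        noun = "person" if rest == 1 else "people"
        who = f"{', '.join(names)} and {rest} other {noun}"
    elif len(names) > 1:
        who = f"{', '.join(names[:-1])} and {names[-1]}"
    else:
        who = names[0]
    # Naive English conjugation: one actor "likes", several actors "like".
    verb_form = verb + "s" if len(actors) == 1 else verb
    return f"{who} {verb_form} {target}"


likers = ["Thierry", "Tommaso", "Ann", "Bob", "Cleo", "Dev", "Eve"]
activities = [(name, "like", "your picture") for name in likers]
groups = aggregate(activities)
summary = render(groups[("like", "your picture")], "like", "your picture")
# summary == "Thierry, Tommaso and 5 other people like your picture"
```

Note that the group still has to keep all seven actors even though only two names are displayed, which is one reason an aggregated feed takes more space than a flat one.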
If your infrastructure bill for Redis is above 5,000 a month, you should definitely start looking at Cassandra-based solutions. Often a Cassandra-based solution will be 4 to 10 times more affordable than the Redis-based alternative.
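To put that rule of thumb into numbers, here is a small worked calculation. The function name is hypothetical, and the factors 4 and 10 are simply the range quoted above, not measured values.

```python
def estimated_cassandra_cost_range(redis_monthly_bill, min_factor=4, max_factor=10):
    """Return (low, high) monthly cost if Cassandra is 4 to 10 times more affordable."""
    return redis_monthly_bill / max_factor, redis_monthly_bill / min_factor


low, high = estimated_cassandra_cost_range(5000)
# low == 500.0 and high == 1250.0, in the same currency as the Redis bill
```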
I hope this helps you decide between Redis and Cassandra for your newsfeed. Note that our open source Stream-Framework supports both Redis and Cassandra.
If you’re looking for a more high-level solution, have a look at getstream.io, which gives you a beautiful REST API for creating your newsfeed (based on a highly optimized Cassandra 2.0 cluster).