Introduction
Stream's hosted newsfeed API gives you several advantages over an in-house solution:
- Zero maintenance
- Weeks or months of development time saved
- Lower monthly costs
This blog post focuses on the monthly costs and compares the GetStream entry plan pricing with in-house solutions in detail. Before going into the details, though, let’s cover some of the reasons why we can offer such attractive pricing:
- GetStream.io runs on a highly optimized Cassandra cluster
- We can invest months in optimizing performance
- Resources are shared across our customers
The In-House Solutions
We will compare Stream against three common in-house solutions:
- Redis as feed cache storage
- Redis as primary feed storage
- Cassandra or MongoDB
The Requirements
To make the comparison fair, we define a few requirements that an in-house solution must meet:
- Feed storage is at least 5 GB
- Feeds are stored on persistent storage
- Infrastructure can handle 5 million fanouts per month
- Data redundancy (data is stored in at least 2 places)
The In-House Infrastructure
- Feed storage is hosted on AWS. The costs are based on EC2 US East pricing (as of September 2014).
- Redis solutions are hosted on Amazon’s ElastiCache service, a managed solution to deploy and operate Redis installations.
1. Redis as feed cache storage
The first commonly used in-house approach is to use Redis as a caching layer in front of your database. With this approach, Redis stores the denormalized feed. Because Redis keeps all data in memory, it can quickly become an expensive way of storing newsfeeds. To reduce memory usage, a common approach is to store only the most often requested data in Redis and fall back to the database for everything else.
This solution is very effective in keeping costs low and does not require Redis to be configured to persist data. Let’s assume the caching approach works well and you only need to store half the data, so 2.5 GB of the required 5 GB. On AWS, the cache.m3.medium instance type comes with 2.78 GB of memory for 66 dollars a month.
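The fallback pattern described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `FeedCache` class and its names are hypothetical, plain Python dictionaries stand in for both Redis and the database, and least-recently-used eviction is used as a simple approximation of "keep the most often requested feeds".

```python
from collections import OrderedDict


class FeedCache:
    """Read-through cache that keeps only the most recently requested feeds."""

    def __init__(self, database, max_feeds):
        self.database = database    # stand-in for the primary database
        self.max_feeds = max_feeds  # how many feeds fit in the cache
        self.cache = OrderedDict()  # stand-in for Redis
        self.misses = 0             # number of database fallbacks

    def get_feed(self, feed_id):
        if feed_id in self.cache:
            # Cache hit: mark the feed as recently used.
            self.cache.move_to_end(feed_id)
            return self.cache[feed_id]
        # Cache miss: fall back to the database and cache the result.
        self.misses += 1
        feed = self.database[feed_id]
        self.cache[feed_id] = feed
        if len(self.cache) > self.max_feeds:
            # Evict the least recently used feed to stay within budget.
            self.cache.popitem(last=False)
        return feed


# Hypothetical data: three feeds in the database, room for two in the cache.
database = {
    "user:1": ["a3", "a2", "a1"],
    "user:2": ["b1"],
    "user:3": ["c2", "c1"],
}
feeds = FeedCache(database, max_feeds=2)
```

In a real deployment the cache budget would be expressed in memory rather than in number of feeds, and the eviction policy would be configured in Redis itself.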
2. Redis as primary feed storage
Another frequently used solution is to use Redis as the primary feed storage. Compared to option 1, this gives you better performance and shorter development times. It is, however, more expensive. As Redis is the main data store, it needs to be configured to persist the data to disk. Furthermore, asynchronous master/slave replication is used to guarantee high availability of the feed service.
In this case cache.m3.large is the instance type of choice, with 6.05 GB of memory. As we need to replicate the data, 2 instances will be used. The monthly cost for running these two instances is 266 dollars.
3. Cassandra or MongoDB as feed storage
Redis can quickly become an expensive way to store your newsfeed. Most companies therefore eventually find an alternative solution. Both Fashiolista and Instagram, for instance, moved from Redis to Cassandra. Note that building and maintaining such solutions is often very time-consuming.
Both MongoDB and Cassandra are easy to scale horizontally. A 3-node setup is the smallest cluster configuration to start with (Cassandra could run on 2 nodes, but that’s definitely not recommended). The EC2 m3.large instance comes with 7.5 GB of memory and 2 vCPUs. It costs 102 dollars per month, so a cluster of 3 nodes will cost you 306 dollars monthly.
Additional Components
All three of these solutions also require some additional components. You need a task broker such as RabbitMQ to distribute the fanout process. Furthermore, you’ll need a task worker to handle the fanouts. If you want to listen to feed changes in realtime, you’ll also need a Faye server and a small Redis server. Lastly, if you want to access your newsfeed via a microservice architecture, you’ll need a server to host your API.
- RabbitMQ server
- Task worker
- Web worker (if you use a microservice architecture)
- Faye (for realtime)
- Redis (for realtime)
If you run these on t2.medium instances, you’ll spend $38 per instance per month. Depending on your needs, you’ll need 2 to 5 additional instances. This will add between $76 and $190 to your monthly costs.
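To make the roles of the task broker and the task worker more concrete, here is a minimal sketch of a fanout. It is an illustration only: Python's in-process `queue.Queue` stands in for a broker such as RabbitMQ, a dictionary stands in for feed storage, and the follow graph, the `publish` function, and the `worker` function are all hypothetical names.

```python
import queue
import threading
from collections import defaultdict

# Hypothetical follow graph: who follows whom.
followers = {"alice": ["bob", "carol"], "bob": ["carol"]}

feeds = defaultdict(list)  # stand-in for feed storage
tasks = queue.Queue()      # stand-in for the task broker


def publish(actor, activity):
    """Enqueue one fanout task per follower instead of writing feeds inline."""
    for follower in followers.get(actor, []):
        tasks.put((follower, activity))


def worker():
    """Task worker: pull fanout tasks off the queue and write them to feeds."""
    while True:
        task = tasks.get()
        if task is None:  # sentinel value: stop the worker
            tasks.task_done()
            break
        follower, activity = task
        feeds[follower].insert(0, activity)  # newest activity first
        tasks.task_done()


thread = threading.Thread(target=worker)
thread.start()

publish("alice", "alice posted a photo")  # 2 fanouts
publish("bob", "bob liked a post")        # 1 fanout

tasks.put(None)  # ask the worker to stop once the queue is drained
tasks.join()
thread.join()
```

Each write to a follower's feed counts as one fanout, so the two activities above produce three fanouts. A production setup would run several workers on separate machines, which is why the broker and the workers each need their own instances.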
Recap
- Redis as feed cache: $142 or more monthly
- Redis as primary storage: $342 or more monthly
- Cassandra/MongoDB: $382 or more monthly (but with more than 5 GB of storage)
- GetStream.io entry level plan: $50 monthly
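The in-house totals above are the storage cost of each option plus the minimum of 2 additional t2.medium instances. The following short sketch reproduces that arithmetic using only the prices quoted in this post; the variable names are our own.

```python
# Monthly prices in US dollars, taken from the figures in this post.
ADDITIONAL_INSTANCE = 38      # one t2.medium
MIN_ADDITIONAL_INSTANCES = 2  # lower bound of the 2 to 5 extra instances
STREAM_ENTRY_PLAN = 50

storage_costs = {
    "Redis as feed cache": 66,        # 1 x cache.m3.medium
    "Redis as primary storage": 266,  # 2 x cache.m3.large
    "Cassandra/MongoDB": 3 * 102,     # 3 x m3.large
}

# Every in-house option needs at least the minimum additional components.
additional = MIN_ADDITIONAL_INSTANCES * ADDITIONAL_INSTANCE

totals = {name: cost + additional for name, cost in storage_costs.items()}

for name, total in totals.items():
    ratio = total / STREAM_ENTRY_PLAN
    print(f"{name}: ${total} per month ({ratio:.1f}x the entry plan)")
```

With 5 additional instances instead of 2, each total grows by another $114, which is why the in-house figures are lower bounds.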
Conclusion
A complete feed solution is more than just its infrastructure. For simplicity, we did not include the cost of writing your own feed implementation or the costs of maintaining the code and servers.
Even the simplest in-house deployment ($142 a month) comes at almost three times the price of the GetStream entry plan (50 dollars a month). More complex setups with Redis as primary storage ($342) or Cassandra/MongoDB ($382) cost roughly seven to eight times as much.