Think Twice: What to Consider Before Building Chat with RTI

Jeroen L.
Emily R.
Published February 20, 2024

Fellow developers, gather around as we delve into the treacherous terrain of chat development using real-time infrastructure, a land filled with landmines, caveats, and countless ways to go over budget without realizing it until it is too late.

Picture this: you're navigating through a labyrinth of code, grappling with timelines tighter than your favorite pair of skinny jeans, when suddenly, real-time infrastructure waltzes in like a charismatic charlatan promising the moon on a silver platter. But hold your applause, for beneath its shimmering facade lies a Pandora's box of woes waiting to unleash chaos upon your unsuspecting project. Let us unveil why embracing real-time infrastructure might be the tech equivalent of volunteering for a root canal without anesthesia.

Ok, that might be a bit extreme. We aren't saying Ably, PubNub, and Firebase are bad in every way. In fact, they are all pretty amazing! But when it comes to adding chat to your product, they are not the best option — as we're sure current customers can attest.

The truth is that specialization matters to software developers; the same rings true for the API companies you integrate with. But, before we deep dive into chat specifically, let's recap what RTI is.

So, What is Real-Time Infrastructure?

Real-time infrastructure (RTI) refers to the underlying technology and architecture that enables instantaneous communication and data processing, allowing information to be transmitted and received with minimal delay or latency. This infrastructure includes servers, networks, and protocols optimized for rapid data transmission, facilitating applications and services that require immediate or near-instantaneous responsiveness. In essence, it is the technological magic that ensures your messages don't suffer from the dreaded "wait-for-it" syndrome, making sure your online interactions are smooth like butter.

Real-time infrastructure is usually built on an event-driven architecture with immediate message delivery. To receive these messages, end-user client devices most often use WebSockets or long polling. The server does not know about an end user's device until the client declares an interest in receiving data or messages, so the connection has to be initiated by the client.
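To make the client-initiated part concrete, here is a minimal in-memory sketch of long polling. It is an illustration under stated assumptions, not how any provider implements it: the `Server` and `LongPollMailbox` names are hypothetical, and a real system would run this over HTTP rather than in one process.

```python
import queue


class LongPollMailbox:
    """Server-side buffer for one connected client (hypothetical name)."""

    def __init__(self):
        self._events = queue.Queue()

    def push(self, event):
        self._events.put(event)

    def poll(self, timeout):
        """Wait up to `timeout` seconds for events; return them as a list.

        An empty list means the poll timed out and the client should
        simply poll again.
        """
        try:
            first = self._events.get(timeout=timeout)
        except queue.Empty:
            return []
        batch = [first]
        while True:
            try:
                batch.append(self._events.get_nowait())
            except queue.Empty:
                return batch


class Server:
    """Knows only about clients that have connected first."""

    def __init__(self):
        self._mailboxes = {}

    def connect(self, client_id):
        # The client declares its interest; only now can it be reached.
        mailbox = LongPollMailbox()
        self._mailboxes[client_id] = mailbox
        return mailbox

    def broadcast(self, event):
        """Queue `event` for every connected client; return how many."""
        for mailbox in self._mailboxes.values():
            mailbox.push(event)
        return len(self._mailboxes)
```

Until a client calls `connect`, `broadcast` has nobody to deliver to, which is the constraint the paragraph above describes.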

A technical note: HTTP/3 supports an expedited handshake procedure for client-server connections that have connected before, allowing for lower-latency delivery over protocols that rely on HTTP connectivity, like WebSockets and long polling.

A prime example of a protocol building upon existing HTTP infrastructure is HTTP Live Streaming (HLS), a protocol developed by Apple. HLS delivers adaptable video and audio streams. One of the reasons HLS scales so well is that it can rely on all the existing HTTP infrastructure. HLS renders the video/audio feed into small chunks, typically a few seconds long, in every bitrate you choose to offer as the server owner. These small chunks of content are then delivered to the end-user device and reassembled into a video or audio feed ready for consumption. Since each chunk of data is fetched over HTTP, any optimizations done in any protocols implementing this HTTP connectivity will improve overall performance.
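As a rough illustration of what "adaptable" means here, a player can decide per chunk which bitrate to fetch. The sketch below is a simplified assumption, not Apple's algorithm: `pick_rendition` is a hypothetical helper, and real players typically add safety margins and buffer-level heuristics.

```python
def pick_rendition(bitrates_bps, measured_bandwidth_bps):
    """Pick the highest offered bitrate that fits the measured bandwidth.

    Falls back to the lowest bitrate when nothing fits, so playback
    can continue at reduced quality instead of stalling.
    """
    affordable = [b for b in bitrates_bps if b <= measured_bandwidth_bps]
    return max(affordable) if affordable else min(bitrates_bps)


# Bitrates offered by the server owner (illustrative values).
offered = [400_000, 1_200_000, 3_000_000]
next_chunk_bitrate = pick_rendition(offered, measured_bandwidth_bps=1_500_000)
```

Because the choice is made again for every chunk, the stream can step up or down in quality as network conditions change.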

Common use cases for RTI range from one-to-one to one-to-many, including multiplayer collaboration, data broadcasting, chat, GPS location tracking, notifications, and real-time state synchronization. These functions obviously touch many industries and their apps, but rolling out a proprietary global infrastructure only makes sense if you have economies of scale on your side, plus the engineering specialization to maintain such an infrastructure.

Building your own real-time infrastructure is expensive, both in time and effort.

By buying into a real-time provider, you get the benefit of a worldwide infrastructure at a fraction of the cost. At some point, a company can get big enough not to have any benefit from buying third-party services, but that is a very distant objective that only a few companies will reach. That being said, let's forge on and explore the benefits and pitfalls of using RTI to build chat.

Building Chat With RTI

Chat is a standard use case for a real-time infrastructure. End users expect messages to arrive the moment they send them, and they expect features like online and typing statuses that are time-bound in their value; there is no use in being informed someone was typing five minutes ago.
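A small sketch shows why typing statuses are time-bound: an event that is too old should simply stop counting. The five-second window and the `TypingTracker` name are assumptions chosen for illustration, not values taken from any provider.

```python
TYPING_TTL_SECONDS = 5.0  # assumed freshness window, tune to taste


class TypingTracker:
    """Tracks who is typing and forgets stale typing events."""

    def __init__(self, ttl=TYPING_TTL_SECONDS):
        self._ttl = ttl
        self._last_seen = {}  # user_id -> timestamp of latest typing event

    def on_typing_event(self, user_id, sent_at):
        self._last_seen[user_id] = sent_at

    def currently_typing(self, now):
        """Return users whose latest typing event is still fresh."""
        return sorted(
            user for user, sent_at in self._last_seen.items()
            if now - sent_at <= self._ttl
        )
```

With this window, an event received at second 100 still counts at second 104 but not at second 106, so a delayed delivery quickly loses its value.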

You could go as far as to say real-time infrastructure is the best solution for this use case. Everything in a chat application is an event, and you want all those events delivered quickly and successfully to all interested parties. However, this does not necessarily mean it is a great idea to start rolling your own chat feature from the ground up on top of RTI, as you will still run into many development hurdles that could be avoided by using a chat API.

At Stream, our developers design chat components for developers and are familiar with the challenges you will face. Our advice as experts in the chat space is to spend your time and effort where you can make a difference for your app in the competitive market.

Building a globe-spanning chat infrastructure that facilitates low-latency interaction no matter where in the world you are takes a humongous amount of effort. By sharing the cost of creating and maintaining such an infrastructure as a Stream customer, you get the features, benefits, and performance you simply cannot create on your own any time soon.

Building chat in-house is an amazing challenge to get right. We know this because we've spent years building, optimizing, and iterating on our Chat API. A good chat API provider supports you with amazing infrastructure and performance. A great chat API provider accelerates the implementation of your chat interface beyond your wildest dreams and allows you to add a robust and full-featured experience to your product in days, not months.

Let's take a look at how complex this process can be from a timeline and financial perspective by exploring the details of some existing real-time providers.

Evaluating Three Real-Time Providers for Chat

Lest we dismiss our rivals too quickly, we shall pit Stream Chat against the motley crew of PubNub, Ably, and Firebase. Sure, it's a bit like comparing apples to oranges to pineapples to dragonfruit, but our job is to discern whether a platform custom-crafted for chatting holds any true advantage. Buckle up as we embark on an odyssey through cost, speed, reliability, scalability, and the enigmatic realm of developer experience in the land of chat platforms.

As we mentioned, real-time providers allow you to access infrastructure at a scale you could not easily achieve independently. Real-time providers provide infrastructure and APIs at varying levels of complexity; some will require you to build your features on their platform, while others offer a platform specifically tailored to specific use cases, and some even take care of much of the work you must do on the end-user client devices. Let's look at a few RTI providers at different levels of this complexity curve.

1. PubNub

PubNub describes itself as a platform for building real-time applications. It promises low latency, high reliability, and scalability via a globally distributed set of data centers. It is compatible with various platforms across web, desktop, mobile, and IoT, with various SDKs for rapid development of clients and servers. PubNub apps rely on a pub/sub (publish/subscribe) architecture, providing communication between server and subscribed clients and facilities for handling authentication, detecting user presence, sending push notifications, reacting to events, and more.

To start with PubNub, you must authenticate and subscribe your clients to receive messages from a channel on the PubNub platform. You can allow a client to publish messages on a channel if needed.

After connecting to PubNub's service, a device or process can send messages to, and receive messages from, any subscribed channel. Messages sent over HTTP long polling use JSON by default, but any text-based format can serve as a message format. On top of message-related events, PubNub also supports other event types you can define yourself. Features like user presence are built on events generated by monitoring users subscribing to and unsubscribing from channels.
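The pub/sub pattern with presence events can be sketched in a few lines. This is a generic in-memory model and deliberately not PubNub's SDK: the `Broker` class, its methods, and the event shapes are all hypothetical.

```python
from collections import defaultdict


class Broker:
    """Toy publish/subscribe broker that also emits presence events."""

    def __init__(self):
        # channel name -> {client_id: callback}
        self._subscribers = defaultdict(dict)

    def subscribe(self, channel, client_id, callback):
        # Tell existing subscribers that someone joined, then add them.
        self._emit(channel, {"type": "presence", "action": "join", "client": client_id})
        self._subscribers[channel][client_id] = callback

    def unsubscribe(self, channel, client_id):
        # Remove the client, then tell the remaining subscribers.
        self._subscribers[channel].pop(client_id, None)
        self._emit(channel, {"type": "presence", "action": "leave", "client": client_id})

    def publish(self, channel, sender, text):
        self._emit(channel, {"type": "message", "sender": sender, "text": text})

    def _emit(self, channel, event):
        for callback in list(self._subscribers[channel].values()):
            callback(event)


broker = Broker()
alice_inbox, bob_inbox = [], []
broker.subscribe("room-1", "alice", alice_inbox.append)
broker.subscribe("room-1", "bob", bob_inbox.append)
broker.publish("room-1", "bob", "hello")
broker.unsubscribe("room-1", "bob")
```

After this run, `alice_inbox` holds a join event for bob, the message, and a leave event for bob, while `bob_inbox` holds only the message. Everything beyond this plumbing (message format, history, UI) is still yours to build.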

Now, while PubNub promises a grand spectacle, don't be fooled into thinking it's all fun and games. Crafting a full-featured chat experience with this platform requires careful planning and a keen eye for detail. From defining your message format to wrangling user interfaces across multiple platforms, it's a labor of love with few shortcuts or UI SDKs to ease the burden, as PubNub only provides them for React and React Native, unlike Stream, a PubNub alternative, which provides Android, iOS, Flutter, React, React Native, Angular, Unreal, and Unity SDKs.


2. Ably

Ably is a cloud-based pub/sub platform-as-a-service (PaaS) largely known for its do-it-yourself mentality for building chat. It outlines the details of its platform in its documentation and offers lower-level client SDKs for six platforms. Similarly to PubNub, messages published to Ably are delivered to subscribed devices in real-time. However, Ably's presence feature is more robust and allows for more detailed events when a client subscribes or unsubscribes from a channel. Ably does not have any dedicated chat SDKs available, so be prepared for a fair amount of trial and error while adapting Ably to your chat use case.

3. Firebase

Firebase, the Swiss Army knife of app development platforms, offers many cloud-hosted services and components, and Firebase Realtime Database can be used to develop the core of a custom chat feature. Keep in mind that Firebase does not have any dedicated chat SDKs available.

On top of Firebase Realtime Database to sync data between devices, you would also need Firebase Authentication to allow users to sign in and Cloud Storage for Firebase to store binary files. You would have to design everything from the ground up. Google publishes several codelabs, such as one for Android, that recommend Firebase Realtime Database.

But here's the kicker: Google also recommends against using Firebase Realtime Database when true scalability is a concern, making it suitable for applications with simple data models requiring simple lookups and low-latency synchronization with limited scalability. So beware, there are limits with Firebase Realtime Database, and these are hard limits. You will see error responses when hitting peak performance with Firebase Realtime Database.

Firebase as a whole is a marvel to behold. Yet, like a financial tightrope walker navigating the perils of Amazon Web Services, tread cautiously regarding cost control. It's all too easy to watch your monthly bill skyrocket faster than a SpaceX launch because you cannot set a cost cap or limit its scale, and it's very difficult to get in touch with someone from the correct team to state your case in the event you see a bigger user influx than expected. Apps with live event or live streaming use cases should proceed cautiously, as you could wind up spending a fortune in one fell swoop for little ROI. 

Stream Chat

Finally, we've arrived at Stream, a platform dedicated to bringing dynamic and engaging communication experiences to end users with ease. Stream offers flexible, performant, and easily customizable APIs and SDKs for chat, activity feeds, moderation, and video and audio functionality.

Stream uses real-time infrastructure under the hood to power our Chat API. However, unlike the other providers mentioned above, our reusable components are specifically optimized for chat, from our infrastructure to our SDKs. Our robust solutions and their UI kits provide all of the features end users expect out of the box on top of an already stellar RTI, allowing developers to focus on the core competencies of what sets their app apart. Stream also offers a world-class developer experience, making implementation, maintenance, and iterations a breeze.

Comparing Pricing

To level the playing field, we have used two usage examples: 10,000 and 50,000 monthly active users (MAU). Keep in mind:

  • Each user sends an average of seven or fewer messages each day.
    • 10k users send 2 million messages a month.
    • 50k users send 10 million messages a month.
  • We assume 5% of your users will be active at the same time.
    • 10k total users would require 500 concurrent connections.
    • 50k total users would require 2500 concurrent connections. 
  • A message, on average, is 200 characters long. With boilerplate included, each message is half a kilobyte, and 5% of all messages include an attachment of, on average, two megabytes. Let's assume that each monthly active user downloads 20% of all attachments and 20% of all messages.
    • 10k users would require:
      • An additional one gigabyte in message-related data for each month.
      • About one and a half gigabytes of attachment-related data each month.
    • 50k users would require:
      • An additional five gigabytes in message-related data for each month.
      • About seven and a half gigabytes of attachment-related data each month.
      • These numbers grow month over month.
Provider | 10k MAU | 50k MAU | Notes
Ably | $20 | $100 | Does not include message retention or attachment handling.
PubNub | $499 | $2,499 | Does not include message retention or attachment handling.
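The message and connection figures in the usage assumptions can be reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch: the 30-day month and the `monthly_usage` helper are our assumptions, seven messages a day is treated as the upper bound, and attachment traffic is left out because it depends on how the download assumptions are applied.

```python
MESSAGES_PER_USER_PER_DAY = 7   # "seven or fewer", so this is an upper bound
DAYS_PER_MONTH = 30             # assumption
CONCURRENCY_PERCENT = 5         # share of users online at the same time
MESSAGE_SIZE_BYTES = 500        # roughly half a kilobyte with boilerplate


def monthly_usage(mau):
    """Estimate monthly messages, peak connections, and message data."""
    messages = mau * MESSAGES_PER_USER_PER_DAY * DAYS_PER_MONTH
    return {
        "messages": messages,
        "concurrent_connections": mau * CONCURRENCY_PERCENT // 100,
        "message_data_gb": messages * MESSAGE_SIZE_BYTES / 1e9,
    }


small = monthly_usage(10_000)   # 2,100,000 messages, 500 connections
large = monthly_usage(50_000)   # 10,500,000 messages, 2,500 connections
```

Rounded down, these match the roughly 2 million and 10 million messages, and the one and five gigabytes of message data, used in the comparison.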

Developer Experience 

When evaluating the developer experience of our contenders, our criteria revolve around three things:

  1. Cognitive load
  2. Feedback loops
  3. Flow state

An API that offers a lower cognitive load, quick feedback loops, and a sense of control with frequent victories gives developers the best experience they can have.

Cognitive load can be controlled by ensuring clear boundaries and abstractions are in place in an implementation. Doing so allows a developer to dive into the deep end of the pool when needed in specific areas; the rest of the time, they should be shielded from complexity. It is important the implementation behaves as expected in all cases. Unexpected behavior results in mistrusting an implementation, a higher cognitive load, and an incurable headache.

The same rings true for feedback loops; quick feedback loops make it easier for a developer to trust an implementation because it tells them right away when something is wrong. It gets bonus points for saying what is actually wrong and how to fix it. Also, quick feedback loops allow a developer to keep flowing while implementing features.

Firebase, Ably, and PubNub allow you massive control over the implementation, and you are also in control of the developer experience of whatever you create. But remember, this is because you have to write large quantities of code to get to some level of feature parity with more full-featured chat implementations. You are pretty much going to be building everything yourself beyond the base infrastructure provided by these lower-level providers. Again, with great power comes great responsibility (and great room to create bugs as you go).

Stream offers many SDKs with which you can build any chat-related use case you can imagine. As mentioned before, Stream does this while handling your cognitive load, giving you quick feedback and allowing you to keep your flow state. A clear example is the customization options available in the Stream Chat SDKs. You can drop in the provided components like a hot potato or style and theme them in various ways. Even if a specific component does not conform to your requirements, you can replace parts of Stream's UI components with your implementations.

Weighing Feature Trade-Offs

Every rose has its thorn. But pricing is not the only serious factor to consider when deciding how to build your chat feature. Or, actually, maybe it is? After all, time is money, and you might be spending a lot of your time building out a fully loaded chat experience.

You'll want to ensure your chat has everything users expect — and as you might have heard, everything is quite a lot. Of the providers listed in this article, only one has prebuilt components available: Stream.

You must build every little detail yourself when using PubNub, Ably, or Firebase: think typing indicators, attachment handling, emoji support, and animated GIF support, to name a few. These features will require serious time, effort, and product roadmap real estate to build yourself.

Is it possible to build expected user-facing features yourself from scratch? Yes. But should you?


So, dear developers, as you stand at the crossroads of your real-time infrastructure or API integration, consider your needs carefully.

Are you in need of lightning-fast data transmission and instant updates? Is your budget infinite? Is your time-to-market date 50 years or 50 days in the future? Do you crave the ease of use that APIs provide? If so, sign up for a free account to try Stream.

The choice is yours, but remember, there can only be one victor in the battle of RTI vs. API. Choose wisely.
