Stream Blog

Vision Agents v0.5.0 Release: Local Hardware I/O, Anam Avatars, and Faster Deepgram TTS

It's been a busy period since our last release, and now it's time to share Vision Agents v0.5.0: a step toward making production-grade multimodal AI agents easy to build and deploy. While previous versions laid the groundwork for real-time voice, video, and Vision Agents, v0.5.0 focuses on stability at scale and even more expressive

5 min read

Stream's AI Moderation Roadmap: What We're Building Next

Moderation has quietly become one of the hardest problems in modern apps. As chat, feeds, and real-time video interactions expand globally, the challenge isn't just catching bad content; it's doing it in real time, across languages, with context, and at scale. At Stream, we've been investing deeply in solving that problem. This roadmap is a

5 min read

How to Build an App Like TikTok Shop (+ Turn Livestreams into Revenue)

Livestream shopping is changing how people discover and buy products online by combining real-time video with instant purchasing. Platforms like TikTok have popularised this model, enabling creators and brands to showcase products live while viewers shop without leaving the stream. In this tutorial, you'll learn how to build a TikTok-style livestream shopping application using Next.js.

26 min read

The 8 Best Platforms To Build Voice AI Agents

Voice assistants like Siri and Alexa are great for non-trivial everyday personal assistive tasks. However, they are limited in providing accurate answers to complex questions, real-time information, handling turns, and user interruptions. Try asking Siri about the best things

17 min read

How Italy's Largest Marketplace Automated 99.5% of Moderation and Scaled Messaging Without Adding Headcount

Subito, Italy's largest re-commerce marketplace, moved buyer-seller messaging for 10M+ users onto Stream and now automates 99.5% of moderation, without standing up a chat infrastructure team. Here is how the build-vs-buy decision, migration, and trust and safety automation played out.

8 min read

How a Top-5 Gaming Publisher Migrated Off Community Sift

A top-5 mobile gaming publisher had six months to migrate 750M+ players off Community Sift with no drop in moderation quality across a dozen languages. Here is how the RFP, large-scale testing, on-site workshop, and shadow-testing migration to Stream AI Moderation actually played out.

12 min read

Stream Skills: Build a Marketplace App To Buy, Sell, and Shop Online

Let's build an online marketplace platform that enables safe, secure buying and selling, combining Stream's AI agent skills for chat, activity feed, moderation, and video into a unified product. What You Can Build: Agent Skills improve developers' productivity and help them integrate features more quickly and build complex applications from scratch. Stream now has skills

14 min read

How Do I Choose Between Different Chat API Providers for a Chatbot?

Picking a chat infrastructure provider sounds like a feature comparison, but it's mostly a question of fit. Pricing, SDK depth, and AI capabilities matter, but they are downstream of the higher-order questions about what you're building and what trade-offs you can tolerate. Should I Build Chat In-House or Use a Provider? Building chat from scratch

9 min read

Using AI Agent Skills: Build an iOS Chat Messaging App With a Single Prompt

As developers, we typically spend time reading docs and tutorials, and watching YouTube videos to integrate APIs and SDKs to add specific functionality to apps and services. These integrations can now be completed much more quickly using AI Agent Skills. Agent Skills are sets of instructions, scripts, and reference documents that equip AI models to

16 min read

The End of the Orb: Building AI Agents That Feel Present

TLDR: Agents these days are blind and not very engaging, so we decided to team up with Anam and Inworld to build an agent using Vision Agents that feels personal and aware of the world around you. Most voice agents today are blind. They hear words, convert them to text,

12 min read

Gemini Live API & Lyria 3: Generate Music From Text, Phone & Video Calls

The instrumental background music in the video below is AI-generated using Lyria 3 by Google DeepMind. Lyria 3 allows anyone to generate AI music from text and image prompts. The music demos in this article take it further by adding another input prompt modality, your voice. Let's proceed to generate your first music with Lyria

17 min read

Chat Application Architecture, Explained

TLDR: Wide-column stores like Cassandra handle messages while Redis holds read state, because each subsystem's access patterns differ significantly. Presence alone generates a write on every connect, disconnect, and heartbeat, making it orders of magnitude more write-heavy than messages. End-to-end encryption prevents the server from searching, moderating, or generating push notification previews, which transport-plus-at-rest encryption

31 min read