Build multi-modal AI applications using our new open-source Vision AI SDK.

Stream Blog

Vision Agents v0.5.0 Release: Local Hardware I/O, Anam Avatars, and Faster Deepgram TTS

It's been a busy period since our last release, and now it’s time to share Vision Agents v0.5.0 — a step toward making production-grade multimodal AI agents easy to build and deploy. While previous versions laid the groundwork for real-time voice, video, and Vision Agents, v0.5.0 focuses on stability at scale and even more expressive
Read more
4 min read

Stream’s AI Moderation Roadmap: What We’re Building Next

Moderation has quietly become one of the hardest problems in modern apps. As chat, feeds, and real-time video interactions expand globally, the challenge isn’t just catching bad content; it’s doing it in real time, across languages, with context, and at scale. At Stream, we’ve been investing deeply in solving that problem. This roadmap is a
Read more
4 min

How to Build an App Like TikTok Shop (+ Turn Livestreams into Revenue)

Livestream shopping is changing how people discover and buy products online by combining real-time video with instant purchasing. Platforms like TikTok have popularised this model, enabling creators and brands to showcase products live while viewers shop without leaving the stream. In this tutorial, you’ll learn how to build a TikTok-style livestream shopping application using Next.js.
Read more
19 min

The 8 Best Platforms To Build Voice AI Agents

Voice assistants like Siri and Alexa are great for non-trivial everyday personal assistive tasks. However, they are limited when it comes to answering complex questions accurately, providing real-time information, and handling turns and user interruptions. Try asking Siri about the best things
Read more
13 min

How To Design AI Voices in Minutes Using Qwen3-TTS

Before You Start: To begin, ensure that you meet these requirements and have the following credentials: Python 3.13 or a later version; an Apple Silicon Mac (recommended) or any modern laptop; Stream API credentials (for realtime audio and video communication); a HuggingFace account and access token (HF_TOKEN); a Deepgram API key (for speech-to-text); a Google

Read more
11 min read

Shipping WebRTC Video From a $10 Microcontroller: Challenges Building the Stream Video ESP32 SDK

We recently open-sourced the Stream Video ESP32 SDK — an SDK that lets an ESP32-S3 or ESP32-P4 join a Stream Video call, capture camera and microphone input, encode H.264 + Opus in real-time, and publish it over WebRTC. Someone on a browser or mobile device can then see and hear the ESP32 live. If you’re

Read more
15 min read

Where LLM Training Data Comes From (And Why It Matters)

Everyone talks about models. New architectures, larger parameter counts, faster inference—those tend to dominate the conversation. But if you’re actually building AI systems (or evaluating vendors), you quickly realize something else matters more: The data. Not just how much of it you have, but where it comes from, how it’s processed, and how it evolves

Read more
4 min read

HIPAA-Compliant Chat: How to Build Secure Messaging for Telemedicine Apps

Every message in a healthcare chat contains protected health information the moment a clinician pairs a patient name with a diagnosis, lab result, or appointment. That makes your chat infrastructure a HIPAA compliance surface, whether you designed it to be one or not. This guide covers what HIPAA actually requires of a messaging system, how

Read more
15 min read

The 6 Best On-Device TTS Models for Voice AI

When building voice AI applications, you have industry-leading cloud options for text-to-speech, such as Cartesia Sonic 3 and Grok TTS. For privacy and to avoid sharing your business’s data with these commercial text-to-speech (TTS) providers, your team may want to use free, open-source solutions that run locally on mobile and desktop devices. Continue reading to

Read more
21 min read

Scaling Event-Driven Systems Without Compromising Mobile App Stability

Event-driven architecture is nothing new. IBM MQ shipped in 1993. JMS has been around since 1998. Kafka launched in 2011. But for most of that history, event-driven patterns were for specialized domains. Most developers never touched them. That’s changed. Real-time mobile features, such as chat, activity feeds, live collaboration, or presence indicators, have pushed event-driven

Read more
17 min read