Stream Blog

Open Vision Agents by Stream: Open Source SDK for Building Low-Latency Vision AI Apps

Vision Agents is a new, open-source framework from Stream that helps developers quickly build low-latency vision AI applications. The project is completely open source and ships with over ten out-of-the-box integrations, including day-one support for leading real-time voice and video models like OpenAI Realtime and Gemini Live. Text-to-speech, speech-to-text, and speech-to-speech models are also natively
Read more ->
4 min read

The 8 Best Platforms To Build Voice AI Agents

Voice assistants like Siri and Alexa are great for non-trivial everyday personal assistive tasks. However, they are limited when it comes to answering complex questions accurately, providing real-time information, and handling turns and user interruptions. Try asking Siri about the best things
Read more ->
13 min read

The 6 Best LLM Tools To Run Models Locally

Running large language models (LLMs) like DeepSeek Chat, ChatGPT, and Claude usually involves sending data to servers managed by DeepSeek, OpenAI, and other AI model providers. While these services are secure, some businesses prefer to keep their data offline for greater privacy.
Read more ->
12 min read

Using Stream to Build a Livestream Chat App in Next.js

I always wondered how to create the dynamic chat experience of livestreams, like those found on YouTube, but with the added convenience of allowing anyone to participate without logging in. With Next.js and Stream, I was able to create that experience.
Read more ->
8 min read

How Text-to-Speech Works: Neural Models, Latency, and Deployment

Not long ago, text-to-speech (TTS) was a laughing stock. Robotic, obviously synthetic output that made customer service jokes write themselves and relegated TTS to accessibility contexts where users had no alternative. Now, you may have listened to text-to-speech today without even realizing it. AI-generated podcasts, automated customer service calls, voice assistants that actually sound like assistants.

Read more ->
17 min read

Marketplace Content Moderation: How to Build Trust and Prevent Abuse at Scale

Marketplaces only work when people trust each other. Buyers trust that listings accurately represent what they’re purchasing. Sellers trust they won’t be scammed, harassed, or pushed off the platform by bad actors. And both trust that the marketplace itself is actively protecting them, not reacting after damage is already done. As marketplaces scale, maintaining that

Read more ->
9 min read

Edge-Optimized Speech Workflows: Combining Deepgram Nova-3 STT with Fish Speech V1.5 TTS

AI won’t stay online. It won’t stay on your laptop. It won’t stay centralized. It will move to every device and to the edge of every network, into your earbuds, your car, your factory floor, and your doorbell. This opens up a remarkable number of use cases. A fitness coach who listens continuously, counts your

Read more ->
15 min read

Building A2UI-Powered Interfaces with Stream Chat

A2UI (Agent-to-UI) is a protocol designed by Google to standardize how AI agents communicate with user interfaces. Instead of tightly coupling agents to specific frontends, A2UI defines a clear contract for intent, state, and actions – making it easier to build interactive, agent-driven experiences that are portable, composable, and UI-agnostic. As AI systems move from

Read more ->
9 min read

Scaling Activity Feeds to 100M Users: Stream’s Latest Benchmarks

Stream has reached a major milestone in activity feed infrastructure, successfully benchmarking over 37 million operations with a 10% write and 90% read workload distribution across a dataset of 100M users, 500M activities, and 200M follow relationships. Each scenario was tested at 500, 1,000, and 1,500 requests per second to measure performance under increasing load.

Read more ->
2 min read

Scaling WebRTC Video to 100,000 Participants: Stream’s Latest Video Benchmarks

Stream has reached a major milestone in real-time video infrastructure: successfully scaling a single WebRTC-based livestream to 100,000 concurrent participants while maintaining ultra-low latency, stable frame rates, and zero packet loss. Today, Stream powers real-time chat, activity feeds, moderation, audio, and video for applications serving over one billion end users worldwide, backed by a 99.999%

Read more ->
2 min read

Visual Intelligence in Claude: Interpreting Documents and Structured Content

Claude isn’t the model most users turn to when they need visual capabilities. Rather than optimizing primarily for object detection or scene description, Claude processes visual content through the same reasoning architecture it uses for text. This design choice has significant implications for developers: Claude excels at tasks requiring interpretation and explanation rather than pure perception.

Read more ->
15 min read

How to Build a Local AI Voice Agent with Pocket TTS

Voice agents are getting better, but most text-to-speech pipelines still assume you’re okay with cloud APIs, large models, and unpredictable latency. If you want fast, natural-sounding speech that runs entirely on your own hardware (no GPU, no network calls), you need a different approach. In this tutorial, you’ll build a real-time AI voice agent that

Read more ->
9 min read