
Stream Blog

Open Vision Agents by Stream: Open Source SDK for Building Low-Latency Vision AI Apps

Vision Agents is a new, open-source framework from Stream that helps developers quickly build low-latency vision AI applications. The project is completely open source and ships with over ten out-of-the-box integrations, including day-one support for leading real-time voice and video models like OpenAI Realtime and Gemini Live. Text-to-speech, speech-to-text, and speech-to-speech models are also natively
4 min read

The 8 Best Platforms To Build Voice AI Agents

Voice assistants like Siri and Alexa are great for simple, everyday personal assistive tasks. However, they struggle with accurate answers to complex questions, real-time information, turn-taking, and user interruptions. Try asking Siri about the best things
13 min read

The 6 Best LLM Tools To Run Models Locally

Running large language models (LLMs) like DeepSeek Chat, ChatGPT, and Claude usually involves sending data to servers managed by DeepSeek, OpenAI, and other AI model providers. While these services are secure, some businesses prefer to keep their data offline for greater privacy.
12 min read

Using Stream to Build a Livestream Chat App in Next.js

I always wondered how to create the dynamic chat experience of livestreams, like those found on YouTube, but with the added convenience of allowing anyone to participate without logging in. With Next.js and Stream, I was able to create that experience.
8 min read

The 10 Essential Tools of the Modern Chat Moderation Stack

There was a time when the only weapon in the chat moderator’s arsenal was a simple keyword list. You would be adding new words and phrases to your filters as they came up, always in reactive mode. Maybe you have regexes to help. Perhaps you build out a team. But you’re always chasing the latest

9 min read

Best 5 Frameworks To Build Multi-Agent AI Applications

This article aims to help you build AI agents powered by memory, a knowledge base, tools, and reasoning, and chat with them using the command line and beautiful agent UIs. What is an Agent? Large language models (LLMs) can automate complex and sequential workflows and tasks. For example, you can use LLMs to build assistants that can

17 min read

The Rise of Multimodal AI Agents

A technician stands in front of a malfunctioning pump at a manufacturing plant. The pump is old, with scattered documentation, and the plant manager needs it running in two hours. The tech raises her phone, and the camera scans the nameplate. Her AI agent sparks to life, cross-references the pump model against the facility’s asset

5 min read

Build an AI Voice Yoga Instructor in Python

Large language models (LLMs) have improved rapidly and are now widely used to build conversational applications involving speech and transcription. From answering location-based questions to managing a work calendar, voice AI assistants are becoming an everyday part of both personal and professional life. In this tutorial, we’ll take those same technologies a step further, using

8 min read

How Low-Latency Video Streaming Works

These days, low-latency video streaming is so deeply embedded in current culture that, for those accustomed to TikTok, Twitch, YouTube, or even straightforward video chats, the idea that video could be anything other than millisecond-perfect seems ridiculous. Of course, all those platforms, and really the entire concept of video streaming and video chat, are only

10 min read

How We Tested Our Video SDK with TestDevLab

Building a Video SDK is an interesting engineering challenge, a geek’s dream of lower-level concurrency primitives, synchronization mechanisms, and intelligent throttling, all working in tight loops. Additionally, you’re negotiating codecs (H.264, VP8/9, AV1), managing lock-free queues to maintain frame flow, and performing bandwidth estimation to ensure video remains smooth and audio stays in sync. Video Challenges

8 min read

Build Voice Agents With MCP: The Top 4 Frameworks and APIs

Voice AI technologies have recently become central to communication between customers, small businesses, and enterprises. To extend the capabilities of these systems, the Model Context Protocol (MCP) is becoming a must-have. Using MCP can help voice systems give users more satisfactory responses. Continue reading to discover the APIs, open-source frameworks,

12 min read

Shipping Real-Time Therapy Conversations with Stream

A senior Android engineer at Argentina’s second-largest fintech company, Juan Andrade spent his days shipping features used by millions. But at night, he opened Xcode, taught himself SwiftUI from scratch, and coded until the early hours. His goal was to create a mental health companion that went beyond surface-level validation to provide evidence-based support. This

4 min read