Stream Blog

Open Vision Agents by Stream: Open Source SDK for Building Low-Latency Vision AI Apps

Vision Agents is a new, open-source framework from Stream that helps developers quickly build low-latency vision AI applications. The project is completely open source and ships with over ten out-of-the-box integrations, including day-one support for leading real-time voice and video models like OpenAI Realtime and Gemini Live. Text-to-speech, speech-to-text, and speech-to-speech models are also natively
Read more ->
4 min read

The 8 Best Platforms To Build Voice AI Agents

Voice assistants like Siri and Alexa are great for non-trivial everyday personal assistive tasks. However, they are limited in answering complex questions accurately, providing real-time information, and handling turns and user interruptions. Try asking Siri about the best things
Read more ->
13 min read

The 6 Best LLM Tools To Run Models Locally

Running large language models (LLMs) like DeepSeek Chat, ChatGPT, and Claude usually involves sending data to servers managed by DeepSeek, OpenAI, and other AI model providers. While these services are secure, some businesses prefer to keep their data offline for greater privacy.
Read more ->
12 min read

Using Stream to Build a Livestream Chat App in Next.js

I always wondered how to create the dynamic chat experience of livestreams, like those found on YouTube, but with the added convenience of allowing anyone to participate without logging in. With Next.js and Stream, I was able to successfully create that experience.
Read more ->
8 min read

Build a Vision AI Agent with Gemini 3 in < 3 Minutes

We released support for Google’s new Gemini 3 models inside Vision Agents — the open-source Python framework for building real-time voice and video AI applications. In this 3-minute video demo, you’ll see how to spin up a fully functional vision-enabled voice agent that can see your screen (or webcam), reason with Gemini 3 Pro Preview,

Read more ->
2 min read

Build an Electronics Setup & Repair Assistant Using Baseten and Qwen3-VL

This tutorial demonstrates how to build an electronic device setup and repair assistant in Python with voice capabilities using Qwen3-VL hosted on Baseten. The assistant analyzes what a user shows on camera (like cables, ports, device components, or error states) and guides them step-by-step through setup or repair tasks. It’s designed to reduce confusion during

Read more ->
8 min read

Staying Competitive in a Rapid-Fire AI Landscape

Velocity is one of those words that shows up in every leadership deck and every product kickoff. But in practice, it behaves more like bubbles escaping a can of La Croix. The moment you try to hold onto it, it’s gone. What remains is a backlog that looks less like a roadmap and more like

Read more ->
3 min read

What is MCP: The Infrastructure Powering Agentic AI

You might not be using AI agents yet, but you will soon. They’ll schedule your meetings, analyze your data, write your code, and automate your workflows. However, to accomplish any of this, they need to access your calendar, data, code, and systems. Large language models can do this the same way any software talks to

Read more ->
10 min read

Vision Agents v0.2 Release

It’s been just over a month since we released the first version of Vision Agents, our new open-source framework designed to help developers quickly build video AI applications using their favourite AI tool and Stream. Since the initial release, we’ve been hard at work adding new plugins, simplifying the code, and working with the community

Read more ->
2 min read

Why Real-Time Is the Missing Piece in Today’s AI Agents

Thinking… Ruminating… Billowing… Wibbling… Cerebrating… These words invented by AI companies to mask processing are all very cute, but in reality, they’re all just apologetic loading states. When ChatGPT shows "thinking" or Claude displays "ruminating," they’re admitting their models aren’t ready to interact with you yet. For text chat, a few seconds of delay feels

Read more ->
6 min read

The 10 Essential Tools of the Modern Chat Moderation Stack

There was a time when the only weapon in the chat moderator’s arsenal was a simple keyword list. You would be adding new words and phrases to your filters as they came up, always in reactive mode. Maybe you have regexes to help. Perhaps you build out a team. But you’re always chasing the latest

Read more ->
9 min read

Best 5 Frameworks To Build Multi-Agent AI Applications

This article aims to help you build AI agents powered by memory, a knowledge base, tools, and reasoning, and chat with them using the command line and beautiful agent UIs. What is an Agent? Large language models (LLMs) can automate complex and sequential workflows and tasks. For example, you can use LLMs to build assistants that can

Read more ->
17 min read