Stream Blog

Open Vision Agents by Stream: Open Source SDK for Building Low-Latency Vision AI Apps

Vision Agents is a new, open-source framework from Stream that helps developers quickly build low-latency vision AI applications. The project is completely open-source and ships with over ten out-of-the-box integrations, including day-one support for leading real-time voice and video models like OpenAI Realtime and Gemini Live. Text-to-speech, speech-to-text, and speech-to-speech models are also natively
Read more ->
4 min read

The 8 Best Platforms To Build Voice AI Agents

Voice assistants like Siri and Alexa are great for non-trivial everyday personal assistive tasks. However, they are limited in providing accurate answers to complex questions, real-time information, handling turns, and user interruptions. Try asking Siri about the best things
Read more ->
13 min read

The 6 Best LLM Tools To Run Models Locally

Running large language models (LLMs) like DeepSeek Chat, ChatGPT, and Claude usually involves sending data to servers managed by DeepSeek, OpenAI, and other AI model providers. While these services are secure, some businesses prefer to keep their data offline for greater privacy.
Read more ->
12 min read

Using Stream to Build a Livestream Chat App in Next.js

I always wondered how to create the dynamic chat experience of livestreams, like those found on YouTube, but with the added convenience of allowing anyone to participate without logging in. With Next.js and Stream, I was able to successfully create that experience.
Read more ->
8 min read

Build a Voice-Controlled GitHub Agent in Python (MCP + Vision Agents)

Turn any GitHub repo into a voice assistant: ask about branches, open issues, create pull requests, list contributors, all via natural conversation. Powered by OpenAI’s Realtime API for low-latency voice, GitHub’s Model Context Protocol (MCP) for secure repo actions, and Vision Agents for seamless orchestration. In the demo, the agent understands spoken repo names (even when

Read more ->
4 min read

Content Moderation Circumvention: Algospeak, Obfuscation, and Adversarial Tactics

As online platforms strengthen their safety frameworks, malicious users respond with increasingly creative ways to evade detection. The rise of content moderation circumvention is not a surprise. Modern apps support global conversations at scale, and as moderation becomes more effective, the incentive to outsmart it grows. But circumvention isn’t increasing solely because moderation is improving.

Read more ->
7 min read

The Moderation Metrics Every Trust & Safety Team Should Track

Trust and safety teams sit at the intersection of user experience, legal risk, and community health. Yet many teams still struggle to answer basic performance questions like: Is our moderation platform actually catching harmful content? Are we overblocking and frustrating good users? Are our tools paying off in time and cost savings? The only way

Read more ->
8 min read

Build a Drive-Thru Voice AI Ordering System With Gemini Live Speech-to-Speech

Drive-thru ordering is a deceptively hard real-time problem. Background noise, interruptions, fast-paced conversations, and the need for low-latency responses all push traditional voice systems to their limits. Modern speech-to-speech models change that equation by making natural, interruptible conversations possible without stitching together separate STT, LLM, and TTS pipelines. In this tutorial, you’ll create a real-time

Read more ->
9 min read

Seeing Like Gemini: Building Vision Applications with Google’s Multimodal Models

Google just dropped Gemini 3. The early impression is that it’s impressive, and not just with words. The coolest concepts making the rounds are the ones that showcase the fundamental trait of the Gemini family of models: multimodality. From their inception, the Gemini models have been built differently. Unlike GPT-4o or Claude, which bolt vision encoders onto

Read more ->
11 min read

Build a Realtime Video Restyling Agent with Gemini 3 + Decart AI

Google’s Gemini 3, released November 18, 2025, gives you multimodal reasoning and tool-use for building response-accurate AI applications. Let’s combine it with Decart AI and other leading LLM services to turn casual voice commands into artistic live video style changes, no extra scaffolding required. Pair it with Decart AI’s Mirage LSD, the first live-stream diffusion

Read more ->
4 min read

The Future of Content Moderation: Key Trends Shaping 2026 & Beyond

Content moderation is at a critical juncture. The amount of user-generated content has exploded across chat, activity feeds, gaming environments, livestreams, and marketplaces. Every interaction has become an opportunity for connection, but also an opportunity for harm. At the same time, regulation is increasing worldwide, and user expectations for safety are higher than ever. Trust

Read more ->
6 min read

Build an AI Math & Physics Agent with DeepSeek v3.2

DeepSeek recently released a powerful new model, DeepSeek-V3.2, that’s now instantly accessible via OpenRouter. In under 5 minutes, you can turn it into a real-time, voice-enabled math and physics agent that not only solves problems but also explains its reasoning out loud. DeepSeek’s latest open-source reasoning and agent-AI model, V3.2, leverages the new DeepSeek Sparse

Read more ->
4 min read