Stream Blog
Open Vision Agents by Stream: Open Source SDK for Building Low-Latency Vision AI Apps
The 8 Best Platforms To Build Voice AI Agents
The 6 Best LLM Tools To Run Models Locally
Using Stream to Build a Livestream Chat App in Next.js
Build a Voice-Controlled GitHub Agent in Python (MCP + Vision Agents)
Turn any GitHub repo into a voice assistant: ask about branches, open issues, create pull requests, list contributors—all via natural conversation. Powered by OpenAI’s Realtime API for low-latency voice, GitHub’s Model Context Protocol (MCP) for secure repo actions, and Vision Agents for seamless orchestration. In the demo, the agent understands spoken repo names (even when
Content Moderation Circumvention: Algospeak, Obfuscation, and Adversarial Tactics
As online platforms strengthen their safety frameworks, malicious users respond with increasingly creative ways to evade detection. The rise of content moderation circumvention is not a surprise. Modern apps support global conversations at scale, and as moderation becomes more effective, the incentive to outsmart it grows. But circumvention isn’t increasing solely because moderation is improving.
The Moderation Metrics Every Trust & Safety Team Should Track
Trust and safety teams sit at the intersection of user experience, legal risk, and community health. Yet many teams still struggle to answer basic performance questions like: Is our moderation platform actually catching harmful content? Are we overblocking and frustrating good users? Are our tools paying off in time and cost savings? The only way
Build a Drive-Thru Voice AI Ordering System With Gemini Live Speech-to-Speech
Drive-thru ordering is a deceptively hard real-time problem. Background noise, interruptions, fast-paced conversations, and the need for low-latency responses all push traditional voice systems to their limits. Modern speech-to-speech models change that equation by making natural, interruptible conversations possible without stitching together separate STT, LLM, and TTS pipelines. In this tutorial, you’ll create a real-time
Seeing Like Gemini: Building Vision Applications with Google’s Multimodal Models
Google just dropped Gemini 3. The impression is it’s impressive, and not just with words. The coolest concepts making the rounds are the ones that showcase the fundamental trait of the Gemini family of models: multimodality. From their inception, the Gemini models have been built different. Unlike GPT-4o or Claude, which bolt vision encoders onto
Build a Realtime Video Restyling Agent with Gemini 3 + Decart AI
Google’s Gemini 3, released November 18, 2025, gives you multimodal reasoning and tool-use for building response-accurate AI applications. Let’s combine it with Decart AI and other leading LLM services to turn casual voice commands into artistic live video style changes, no extra scaffolding required. Pair it with Decart AI’s Mirage LSD, the first live-stream diffusion
The Future of Content Moderation: Key Trends Shaping 2026 & Beyond
Content moderation is at a critical juncture. The amount of user-generated content has exploded across chat, activity feeds, gaming environments, livestreams, and marketplaces. Every interaction has become an opportunity for connection, but also an opportunity for harm. At the same time, regulation is increasing worldwide, and user expectations for safety are higher than ever. Trust
Build an AI Math & Physics Agent with DeepSeek v3.2
DeepSeek recently released a powerful new model, DeepSeek-V3.2, that’s now instantly accessible via OpenRouter. In under 5 minutes, you can turn it into a real-time, voice-enabled math and physics agent that not only solves problems but also explains its reasoning out loud. DeepSeek’s latest open-source reasoning and agent-AI model, V3.2, leverages the new DeepSeek Sparse
