
Engineering: AI

Vision Agents v0.2 Release

It's been just over a month since we released the first version of Vision Agents, our new open-source framework designed to help developers quickly build video AI applications using their favourite AI tool and Stream. Since the initial release, we've been hard at work adding new plugins, simplifying the code, and working with the community
2 min read

Why Real-Time Is the Missing Piece in Today's AI Agents

Thinking... Ruminating... Billowing... Wibbling... Cerebrating... These words invented by AI companies to mask processing are all very cute, but in reality, they're all just apologetic loading states. When ChatGPT shows "thinking" or Claude displays "ruminating," they're admitting their models aren't ready to interact with you yet. For text chat, a few seconds of delay feels
6 min read

Best 5 Frameworks To Build Multi-Agent AI Applications

This article aims to help you build AI agents equipped with memory, a knowledge base, tools, and reasoning, and chat with them using the command line and beautiful agent UIs. What is an Agent? Large language models (LLMs) can automate complex, sequential workflows and tasks. For example, you can use LLMs to build assistants that can
17 min read

DeepSeek R1 - The Best Local LLM Tools To Run Offline

Many people (especially developers) want to use the new DeepSeek R1 thinking model but are concerned about sending their data to DeepSeek. Read this article to learn how to run the DeepSeek R1 reasoning model locally, without an Internet connection, or through a trusted hosting service. You run the model offline, so your
6 min read

The Rise of Multimodal AI Agents

A technician stands in front of a malfunctioning pump at a manufacturing plant. The pump is old, with scattered documentation, and the plant manager needs it running in two hours. The tech raises her phone, and the camera scans the nameplate. Her AI agent sparks to life, cross-references the pump model against the facility's asset
5 min read

Build Voice Agents With MCP: The Top 4 Frameworks and APIs

Voice AI technologies have recently become central to communication between customers, small businesses, and enterprises. The Model Context Protocol (MCP) has become a must-have for extending the capabilities of these systems. Using MCP helps voice systems provide users with satisfactory responses. Continue reading to discover the APIs, open-source frameworks,
12 min read

Open Vision Agents by Stream: Open Source SDK for Building Low-Latency Vision AI Apps

Vision Agents is a new, open-source framework from Stream that helps developers quickly build low-latency vision AI applications. The project is completely open source and ships with over ten out-of-the-box integrations, including day-one support for leading real-time voice and video models like OpenAI Realtime and Gemini Live. Text-to-speech, speech-to-text, and speech-to-speech models are also natively
4 min read

Top 5 Real-Time Speech-to-Speech APIs and Libraries To Build Voice Agents

There are two ways to build conversational voice agents for enterprise and production use cases. Developers can use a real-time speech-to-speech (STS) API, which takes audio input from a user and sends it to a large language model (LLM) to return a voice output. Or they can use a turn-based architecture, which consists of
13 min read