
Tutorials

Build a Stream Chat App With Xcode 26.3's Coding Agent

Xcode 26.3 ships with a built-in coding agent backed by Claude. Pair it with Stream's Agent Skills and you can build a working SwiftUI chat app (channel list, message thread, and composer) from a single goal-level prompt, then iterate without leaving the IDE. This is a build log of exactly that, including the WebSocket race condition the agent diagnosed by reading the SDK source.

10 min read

How to Build a Background Removal Tool with Segment Anything & Vision Agents

A step-by-step guide to building a real-time background removal tool with SAM 2, YOLO, and Vision Agents. Runs on a CPU, no GPU required.

22 min read

Stream Skills: Build a Marketplace App To Buy, Sell, and Shop Online

Let's build an online marketplace platform that enables safe, secure buying and selling, combining Stream's AI agent skills for chat, activity feed, moderation, and video into a unified product. What You Can Build: Agent Skills improve developers' productivity, helping them integrate features more quickly and build complex applications from scratch. Stream now has skills

14 min read

Using AI Agent Skills: Build an iOS Chat Messaging App With a Single Prompt

As developers, we typically spend time reading docs and tutorials and watching YouTube videos to integrate the APIs and SDKs that add specific functionality to apps and services. These integrations can now be completed much more quickly using AI Agent Skills. Agent Skills are sets of instructions, scripts, and reference documents that equip AI models to

16 min read

Gemini Live API & Lyria 3: Generate Music From Text, Phone & Video Calls

The instrumental background music in the video below is AI-generated using Lyria 3 by Google DeepMind. Lyria 3 allows anyone to generate AI music from text and image prompts. The music demos in this article take it further by adding another input modality: your voice. Let's proceed to generate your first music with Lyria

17 min read

How to Clone Any Voice in Minutes Using Voxtral TTS

What You Will Build: This tutorial demonstrates how to build an AI speech app with in-app voice cloning support. You can clone your favorite voice by supplying about 3 seconds of reference audio. Here is a demo. [Demo: voice cloning example demonstrating reference and agent's output voices]

11 min read

How To Design AI Voices in Minutes Using Qwen3-TTS

Before You Start: To begin, ensure that you meet these requirements and have the following credentials: Python 3.13 or later; an Apple Silicon Mac (recommended) or any modern laptop; Stream API credentials (for realtime audio and video communication); a Hugging Face account and access token (HF_TOKEN); a Deepgram API key (for speech-to-text); a Google

14 min read

HIPAA-Compliant Chat: How to Build Secure Messaging for Telemedicine Apps

TL;DR: Any chat message pairing a patient identifier with health information is PHI, making your entire messaging infrastructure a HIPAA compliance surface. HIPAA's three rules translate into concrete engineering requirements: role-based access, encryption, tamper-evident audit logs, and PHI-safe push notifications. Building compliant chat from scratch needs 2-3 dedicated engineers; a HIPAA-eligible API like Stream compresses

21 min read