Tutorials: Vision
Build a Vision AI Agent with Gemini 3 in < 3 Minutes
We released support for Google's new Gemini 3 models inside Vision Agents — the open-source Python framework for building real-time voice and video AI applications. In this 3-minute video demo, you'll see how to spin up a fully functional vision-enabled voice agent that can see your screen (or webcam), reason with Gemini 3 Pro Preview,
2 min read
Build an Electronics Setup & Repair Assistant Using Baseten and Qwen3-VL
This tutorial demonstrates how to build an electronic device setup and repair assistant in Python with voice capabilities using Qwen3-VL hosted on Baseten. The assistant analyzes what a user shows on camera (like cables, ports, device components, or error states) and guides them step-by-step through setup or repair tasks. It’s designed to reduce confusion during
8 min read
Build an AI Voice Yoga Instructor in Python
Large language models (LLMs) have improved rapidly and now power many conversational applications built on speech and transcription. From answering location-based questions to managing a work calendar, voice AI assistants are becoming an everyday part of both personal and professional life. In this tutorial, we’ll take those same technologies a step further, using
8 min read