Build low-latency Vision AI applications using our new open-source Vision AI SDK. ⭐️ on GitHub
Multi-modal AI agents that see, hear, & remember.
Open-source. Edge-agnostic. Low-latency.
An open-source framework and developer platform for building, testing, deploying, scaling, and observing agents in production.
Customer support bots, phone assistants, and voice interfaces using OpenAI Realtime, Gemini, or STT + LLM + TTS pipelines.
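A cascaded STT + LLM + TTS pipeline like the one above can be sketched in a few lines. This is a minimal illustration with stub components, not the Vision Agents SDK API; in production each stub would be replaced by a real provider client (e.g. a speech-to-text service, OpenAI or Gemini, and a TTS engine).

```python
from dataclasses import dataclass

# Hypothetical stand-ins for real STT / LLM / TTS providers.
class StubSTT:
    def transcribe(self, audio: bytes) -> str:
        # Pretend the audio decodes to a fixed caller utterance.
        return "what are your opening hours"

class StubLLM:
    def complete(self, prompt: str) -> str:
        return f"You asked: '{prompt}'. We're open 9am-5pm."

class StubTTS:
    def synthesize(self, text: str) -> bytes:
        # A real TTS engine would return PCM/Opus audio frames.
        return text.encode("utf-8")

@dataclass
class VoicePipeline:
    stt: StubSTT
    llm: StubLLM
    tts: StubTTS

    def handle_turn(self, audio_in: bytes) -> bytes:
        # One conversational turn: speech in -> text -> reply -> speech out.
        transcript = self.stt.transcribe(audio_in)
        reply = self.llm.complete(transcript)
        return self.tts.synthesize(reply)

pipeline = VoicePipeline(StubSTT(), StubLLM(), StubTTS())
audio_out = pipeline.handle_turn(b"\x00\x01")
print(audio_out.decode("utf-8"))
```

Realtime APIs such as OpenAI Realtime collapse these three stages into a single speech-to-speech model, trading the flexibility of swapping components for lower latency.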
Sports coaching, surveillance, & manufacturing workflows. Combine YOLO, Roboflow, or Moondream with Gemini or OpenAI vision.
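A common pattern behind these workflows is feeding structured detections from a fast local model (YOLO-style) into the prompt of a vision-capable LLM for higher-level reasoning. The sketch below is illustrative only: the detector is faked with canned output, and real code would call Ultralytics YOLO and the Gemini or OpenAI APIs instead.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    confidence: float

def fake_detector(frame_id: int) -> list[Detection]:
    # Stand-in for model.predict(frame); returns canned detections.
    return [Detection("person", 0.92), Detection("forklift", 0.81)]

def build_vision_prompt(detections: list[Detection]) -> str:
    # Summarize structured detections into text the LLM can reason over.
    summary = ", ".join(f"{d.label} ({d.confidence:.0%})" for d in detections)
    return (
        "You are a safety monitor for a warehouse camera. "
        f"Current detections: {summary}. "
        "Flag any unsafe proximity between people and vehicles."
    )

prompt = build_vision_prompt(fake_detector(0))
print(prompt)
```

Running the detector locally keeps per-frame latency low, while the LLM is only invoked for judgment calls on the summarized detections.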
Inbound and outbound calling via Twilio. Build phone bots with RAG-powered knowledge bases.
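The retrieval step behind a RAG-powered phone bot can be shown in miniature. This sketch scores knowledge-base entries by simple word overlap with the caller's question; a real system would use embeddings and a vector store, and the knowledge-base contents here are invented for illustration.

```python
# Toy knowledge base a phone bot might answer from.
KNOWLEDGE_BASE = {
    "hours": "We are open Monday to Friday, 9am to 5pm.",
    "returns": "Items can be returned within 30 days with a receipt.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

def retrieve(question: str) -> str:
    # Rank entries by word overlap with the question (an embedding
    # similarity search would replace this in production).
    q = set(question.lower().replace("?", "").split())
    def score(key: str) -> int:
        doc = set(key.split()) | set(KNOWLEDGE_BASE[key].lower().split())
        return len(q & doc)
    best = max(KNOWLEDGE_BASE, key=score)
    return KNOWLEDGE_BASE[best]

print(retrieve("what are your opening hours?"))
```

The retrieved passage is then injected into the LLM prompt as grounding context before the bot speaks its answer back over the call.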
Real-time interactive avatars with HeyGen or video style transfer with Decart.
Building models, tools, or platforms that work with real-time voice or video AI?
We’re actively adding first-party integrations and partnering on co-building and co-marketing.



Track pose and give feedback in real time. Uses Gemini Live + Ultralytics YOLO.

Build an agent to greet users, perform basic MCP functions, and observe a camera feed.
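The shape of that quickstart agent can be sketched as three capabilities on one object: a greeting, a callable tool (standing in for an MCP function), and a camera observer. All class and method names below are illustrative, not the Vision Agents SDK API, and the camera feed is faked with strings.

```python
class ToyAgent:
    """Illustrative agent: greet, call a tool, observe a camera feed."""

    def __init__(self):
        # Stand-in for tools exposed over MCP.
        self.tools = {"get_time": lambda: "12:00"}

    def greet(self) -> str:
        return "Hi! I can tell the time and watch the camera."

    def call_tool(self, name: str) -> str:
        # A real agent would dispatch this through an MCP client.
        return self.tools[name]()

    def observe(self, frames: list) -> str:
        # A real agent would send frames to a vision LLM; we just count them.
        return f"Observed {len(frames)} frame(s)."

agent = ToyAgent()
print(agent.greet())
print(agent.call_tool("get_time"))
print(agent.observe(["frame0", "frame1", "frame2"]))
```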
Community & Open Source
Follow Stream on X, star the Vision Agents GitHub repo, and join the discussion on Discord to try demos, share feedback, and contribute.
