Build low-latency Vision AI applications with our new open-source Vision AI SDK. Star it on GitHub.

Real-Time Vision AI Agents

Multi-modal AI agents that see, hear, & remember.
Open-source. Edge-agnostic. Low-latency.

Vision Agents Playground

Join Our Partner Ecosystem

Building models, tools, or platforms that work with real-time voice or video AI?

We’re actively adding first-party integrations, as well as co-building and co-marketing with partners, including:

  • Model providers (STT, TTS, LLM, STS, etc.)
  • Competing video edge networks
  • Avatar and visual-effects companies
  • Hosting providers, both AI-specific and general-purpose

See Vision Agents in Action

SIP & RAG

Connect voice agents to a phone network with real-time knowledge retrieval.

Golf Coach

Track a golfer's pose and give feedback in real time using Gemini Live and Ultralytics YOLO.

Voice Agent

Build an agent to greet users, perform basic MCP functions, and observe a camera feed.

Community & Open Source

Join the Community

Follow Stream on X, star the Vision Agents GitHub repo, and join the discussion on Discord to try demos, share feedback, and contribute.