Most LLMs are great at thinking, but making them speak naturally is a different challenge. Gemini 3.1 Pro changes that.
This new model from Google brings significantly improved reasoning, longer context, and better tool-use capabilities, making it one of the best choices (at the time of writing) for building conversational voice agents.
In this guide, we’ll use Gemini 3.1 Pro (both the standard preview and custom tools variant) as the brain for a real-time voice AI agent and travel advisor built with Vision Agents.
What You’ll Build
- A real-time voice AI agent that tells a charming story on command and gives travel advice about Europe
- Natural, coherent spoken responses with strong reasoning and storytelling ability
- Support for both Gemini 3.1 Pro Preview and the Custom Tools variant
- Low-latency voice pipeline using Vision Agents and Stream
The Stack
- LLM → Gemini 3.1 Pro (via Vision Agents Gemini plugin)
- TTS → ElevenLabs
- STT → Deepgram
- Turn Detection → Smart-Turn
- Transport → Stream WebRTC
- Framework → Vision Agents (open-source)
Requirements
- Google API key (from Google AI Studio)
- ElevenLabs API key
- Deepgram API key
- Stream API key & secret
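Before wiring everything together, it can help to fail fast if any key is missing. Here's a minimal sketch (not part of the official setup) that checks the environment for the variable names used later in this guide:

```python
import os

# Keys required by this guide's stack; names match the export
# commands used later when running the agent.
REQUIRED_KEYS = [
    "GEMINI_API_KEY",
    "ELEVENLABS_API_KEY",
    "DEEPGRAM_API_KEY",
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
]

def missing_keys(env=None):
    """Return the names of required keys that are unset or empty."""
    if env is None:
        env = os.environ
    return [k for k in REQUIRED_KEYS if not env.get(k)]

if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        raise SystemExit(f"Missing API keys: {', '.join(missing)}")
```

Run it once before starting the agent to catch typos in your shell exports early.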
Step 1: Install the Plugins
```shell
uv add vision-agents
uv add "vision-agents[getstream, gemini, deepgram, elevenlabs, smart-turn]"
```
Step 2: Full Working Code (main.py)
```python
from vision_agents.core import Agent, Runner, User
from vision_agents.core.agents import AgentLauncher
from vision_agents.plugins import deepgram, gemini, getstream, elevenlabs


async def create_agent(**kwargs) -> Agent:
    return Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Assistant", id="agent"),
        instructions=(
            "You're a helpful voice/vision AI assistant powered by Gemini 3.1 Pro. "
            "Keep replies short and conversational. Be concise and to the point. "
            "Always describe what you see in the user's video camera feed."
        ),
        stt=deepgram.STT(eager_turn_detection=True),
        tts=elevenlabs.TTS(),
        llm=gemini.LLM("gemini-3.1-pro-preview"),
    )


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    await agent.create_user()
    call = await agent.create_call(call_type, call_id)
    async with agent.join(call):
        await agent.simple_response("Greet the user")
        await agent.finish()


if __name__ == "__main__":
    Runner(AgentLauncher(create_agent=create_agent, join_call=join_call)).cli()
```
Step 3: Run It
```shell
export GEMINI_API_KEY=...
export ELEVENLABS_API_KEY=...
export DEEPGRAM_API_KEY=...
export STREAM_API_KEY=...
export STREAM_API_SECRET=...

EXAMPLE_BASE_URL=https://demo.visionagents.ai uv run main.py run
```
Join the call and ask the agent to tell a story or give travel advice. You’ll hear Gemini 3.1 Pro respond clearly and naturally.
Why We Love This Setup
Gemini 3.1 Pro brings noticeably better reasoning and coherence to voice agents compared to previous versions.
The Vision Agents Gemini plugin makes switching between the standard preview and custom tools variant extremely easy: just change the model name.
You get strong storytelling, advice-giving, and conversational ability with minimal code.
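To illustrate the "just change the model name" point, here's a tiny sketch of keeping the variant choice in one place. The `gemini-3.1-pro-preview` id comes from the code above; the custom-tools id below is a hypothetical placeholder, since this guide doesn't spell out the exact string:

```python
# Only the model name changes between variants. "gemini-3.1-pro-preview"
# matches the earlier code; the custom-tools id is an assumed placeholder,
# not a confirmed model name.
GEMINI_MODELS = {
    "preview": "gemini-3.1-pro-preview",
    "custom-tools": "gemini-3.1-pro-custom-tools",  # assumed placeholder
}

def gemini_model(variant: str = "preview") -> str:
    """Return the model string to pass to gemini.LLM(...)."""
    return GEMINI_MODELS[variant]

# In create_agent(), you would then write something like:
#   llm=gemini.LLM(gemini_model("custom-tools"))
```

Centralizing the name like this makes A/B-testing the two variants a one-argument change.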
Try asking the agent for travel recommendations in Europe, and test both Gemini 3.1 Pro variants to see which one feels more natural in conversation. ✈️