
Build a Realtime Video Restyling Agent with Gemini 3 + Decart AI

4 min read
Amos G.
Published December 16, 2025

Google's Gemini 3, released November 18, 2025, brings multimodal reasoning and tool use for building responsive, accurate AI applications. Let's combine it with Decart AI and other leading AI services to turn casual voice commands into artistic live video style changes, no extra scaffolding required.

Pair it with Decart AI's Mirage LSD, the first live-stream diffusion model for zero-latency video restyling at 24 FPS and <40ms per frame, and you can build an agent that instantly applies artistic styles (Neon nostalgia, Studio Ghibli, Cyberpunk) to your camera feed based on voice prompts.

Combine the two with speech-to-text (STT) and text-to-speech (TTS) models, and you can spin up a real-time demo that turns your webcam into an infinite, temporally coherent art generator in under five minutes.

In this demo, the agent restyles the live camera feed from "Neon Nostalgia" to "Studio Ghibli" to "War Zone" in response to voice commands, all with seamless, real-time transitions and no lag.

Here's exactly how to build the same agent yourself. You may also watch this step-by-step YouTube tutorial to create the demo in under 9 minutes. 

What You'll Build

Diagram showing the build of a realtime restyling agent with Gemini 3 and Decart AI

In just a few minutes, you'll create a real-time video restyling agent that transforms your camera feed into artistic styles via voice prompts.

The stack:

  • LLM → Gemini 3 Pro (via the Google API) for prompt understanding and agentic control

  • Video processing → Decart AI (Mirage LSD for zero-latency restyling)

  • Speech-to-text (STT) → Deepgram

  • Text-to-speech (TTS) → ElevenLabs

  • Real-time audio/video transport → Stream

  • Built with the open-source Vision Agents framework

Requirements (API Keys)

You'll need API keys from:

  • Google (Gemini API)

  • Decart AI

  • Deepgram

  • ElevenLabs

  • Stream (API key and secret)
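
The sample code in Step 2 calls load_dotenv(), so instead of exporting the keys in your shell, you can drop them into a .env file at the project root. A minimal sketch, using the same variable names as the export commands in Step 3:

```bash
# .env — replace the placeholders with your real keys
GOOGLE_API_KEY=your_key
DECART_API_KEY=your_key
ELEVENLABS_API_KEY=your_key
DEEPGRAM_API_KEY=your_key
STREAM_API_KEY=your_key
STREAM_API_SECRET=your_secret
```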

Step 1: Set Up the Python Project

```bash
# Initialize a Python project
uv init realtime-video-restyling
cd realtime-video-restyling

# Activate your environment
uv venv && source .venv/bin/activate

# Install Vision Agents and required plugins
uv add vision-agents
uv add "vision-agents[getstream, gemini, elevenlabs, deepgram]"

# Install Decart AI with uv and pip
uv pip install vision-agents-plugins-decart
```
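
Optionally, sanity-check the install before moving on. This one-liner assumes the plugin import path used by the sample code in Step 2:

```bash
# Prints "plugins OK" if all five plugin packages resolved
uv run python -c "from vision_agents.plugins import decart, getstream, gemini, elevenlabs, deepgram; print('plugins OK')"
```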

Step 2: Full Working Code (main.py)

In the root of your generated uv project, replace the contents of main.py with the following code listing.

```python
import logging

from dotenv import load_dotenv

from vision_agents.core import User, Agent, cli
from vision_agents.core.agents import AgentLauncher
from vision_agents.plugins import decart, getstream, gemini, elevenlabs, deepgram

logger = logging.getLogger(__name__)

load_dotenv()


async def create_agent(**kwargs) -> Agent:
    # Decart's Mirage model restyles the live video feed
    processor = decart.RestylingProcessor(
        initial_prompt="Change the video style to a cute animated movie with vibrant colours",
        model="mirage_v2",
    )
    # Gemini 3 Pro interprets voice prompts and decides when to call tools
    llm = gemini.LLM(model="gemini-3-pro-preview")
    agent = Agent(
        edge=getstream.Edge(),
        agent_user=User(name="Story teller", id="agent"),
        instructions="You will use the Decart processor to change the style of the video and the user's background.",
        llm=llm,
        tts=elevenlabs.TTS(voice_id="N2lVS1w4EtoT3dr4eOWO"),
        stt=deepgram.STT(),
        processors=[processor],
    )

    # Expose a tool the LLM can call to change the restyling prompt
    @llm.register_function(
        description="This function changes the prompt of the Decart processor which in turn changes the style of the video and user's background"
    )
    async def change_prompt(prompt: str) -> str:
        await processor.update_prompt(prompt)
        return f"Prompt changed to {prompt}"

    return agent


async def join_call(agent: Agent, call_type: str, call_id: str, **kwargs) -> None:
    """Join the call and start the agent."""
    # Ensure the agent user is created
    await agent.create_user()

    # Create a call
    call = await agent.create_call(call_type, call_id)

    logger.info("🤖 Starting Agent...")

    # Have the agent join the call/room
    with await agent.join(call):
        logger.info("Joining call")
        logger.info("LLM ready")
        await agent.finish()  # Run till the call ends


if __name__ == "__main__":
    cli(AgentLauncher(create_agent=create_agent, join_call=join_call))
```
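
The change_prompt tool accepts any free-form style description. If you'd rather have Gemini choose from a fixed menu of styles, one option is to register a second tool inside create_agent, where llm and processor are in scope. This is a hypothetical sketch: the preset names and prompt strings are made up, but it reuses only the @llm.register_function decorator and processor.update_prompt() call from the listing above.

```python
# Hypothetical style presets — tune the prompt strings to taste
STYLE_PRESETS = {
    "neon nostalgia": "1980s neon nostalgia, glowing magenta and cyan, VHS grain",
    "studio ghibli": "hand-painted animated movie style, soft watercolor light",
    "war zone": "gritty war zone, desaturated colors, smoke and dust in the air",
}

@llm.register_function(
    description="Apply one of the named style presets to the video"
)
async def apply_style_preset(name: str) -> str:
    prompt = STYLE_PRESETS.get(name.lower())
    if prompt is None:
        return f"Unknown preset '{name}'. Available: {', '.join(STYLE_PRESETS)}"
    await processor.update_prompt(prompt)
    return f"Applied the {name} preset"
```

Because the tool returns the list of available presets on a miss, the agent can recover gracefully when you ask for a style it doesn't know, while free-form restyling stays available through change_prompt.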

Step 3: Run It

Execute the following commands in your terminal to store the required API credentials and run the Python script. Alternatively, add the API keys to a .env file in your project's root, as sketched in the Requirements section above.

```bash
export GOOGLE_API_KEY=your_key
export DECART_API_KEY=your_key
export ELEVENLABS_API_KEY=your_key
export DEEPGRAM_API_KEY=your_key
export STREAM_API_KEY=your_key
export STREAM_API_SECRET=your_secret

cd realtime-video-restyling
uv run main.py
```

A browser tab opens with a video call interface that joins you automatically. Allow camera and mic access, then say "Make my video Studio Ghibli" and watch your camera feed transform live!

Example interaction from the video:

You: "Make it Neon Nostalgia."
Agent: "OK, I've updated the video style to Neon Nostalgia."
You: "Make it a War Zone."
Agent: "OK, I've updated the video style to a War Zone."

What Makes This Stack So Powerful

This stack is one of the fastest ways for developers to ship a fully featured, low-latency video AI agent, in pure Python and under 100 lines.

Vision Agents and its integrated voice AI models abstract away turn detection, streaming, and interruption handling. Google's Gemini 3 brings agentic reasoning for prompt interpretation, and Decart's production-proven API delivers <40ms restyling without losing temporal coherence.

It's open-source, local-first (except API calls), and scalable from prototype to production.

Give it a spin and see what wild style you like best. Maybe... post-apocalyptic Paris or Van Gogh's starry night? 🎨
