This tutorial teaches you how to quickly build a production-ready voice AI agent with the OpenAI Realtime API using Stream’s video edge network, Flutter, and Node.js.
- The instructions for the agent are sent server-side (Node), so you can do function calling or RAG
- The integration uses Stream’s video edge network (for low latency) and WebRTC (so it works under slow/unreliable network conditions)
- You have full control over the AI setup and visualization
The result will look something like this:
While this tutorial uses Node and Flutter, you could achieve something similar with any other backend language and Stream SDK (Swift, Kotlin, React, JS, Flutter, React Native, Unity, etc.).
Step 1 - Credentials and Backend setup
First, we are going to set up the Node.js backend and get Stream and OpenAI credentials.
Step 1.1 - OpenAI and Stream credentials
To get started, you need an OpenAI account and an API key. Please note that the OpenAI credentials are never shared client-side; they are only exchanged between your server and Stream’s servers.
Additionally, you will need a Stream account and use the API key and secret from the Stream dashboard.
Step 1.2 - Create the Node.js project
Make sure that you are using a recent version of Node.js, such as 22 or later; you can check your installed version with node -v.
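For example, running the check should print something like the following (the exact version on your machine will differ):

```bash
node -v
# should print v22.x.x or newer
```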
First, let’s create a new folder called “openai-audio-tutorial”. From the terminal, go to the folder, and run the following command:
```bash
npm init -y
```
This command generates a package.json file with default settings.
Step 1.3 - Installing the dependencies
Next, let’s update the generated package.json with the following content:
```json
{
  "name": "@stream-io/video-ai-demo-server",
  "type": "module",
  "dependencies": {
    "@hono/node-server": "^1.13.8",
    "@stream-io/node-sdk": "^0.4.17",
    "@stream-io/openai-realtime-api": "^0.1.0",
    "dotenv": "^16.3.1",
    "hono": "^4.7.4",
    "open": "^10.1.0"
  },
  "scripts": {
    "server": "node ./server.mjs",
    "standalone-ui": "node ./standalone.mjs"
  }
}
```
Then, run the following command to install the dependencies:
```bash
npm install
```
Step 1.4 - Setup the credentials
Create a .env file in the project root with the following variables:
```
# Stream API credentials
STREAM_API_KEY=REPLACE_WITH_API_KEY
STREAM_API_SECRET=REPLACE_WITH_TOKEN

# OpenAI API key
OPENAI_API_KEY=your_openai_api_key
```
Then edit the .env file with your actual API keys from Step 1.1.
Step 1.5 - Implement the standalone-ui script
Before diving into the Flutter integration, we are going to build a simple server script that shows how to connect the AI agent to a call and join that call from a simple web app.
Create a file called standalone.mjs and paste in the following content:
```js
import { config } from 'dotenv';
import { StreamClient } from '@stream-io/node-sdk';
import open from 'open';
import crypto from 'crypto';

// load config from dotenv
config();

async function main() {
  // Get environment variables
  const streamApiKey = process.env.STREAM_API_KEY;
  const streamApiSecret = process.env.STREAM_API_SECRET;
  const openAiApiKey = process.env.OPENAI_API_KEY;

  // Check if all required environment variables are set
  if (!streamApiKey || !streamApiSecret || !openAiApiKey) {
    console.error("Error: Missing required environment variables, make sure to have a .env file in the project root, check .env.example for reference");
    process.exit(1);
  }

  const streamClient = new StreamClient(streamApiKey, streamApiSecret);
  const call = streamClient.video.call("default", crypto.randomUUID());

  // realtimeClient is an instance of https://github.com/openai/openai-realtime-api-beta
  const realtimeClient = await streamClient.video.connectOpenAi({
    call,
    openAiApiKey,
    agentUserId: "lucy",
  });

  // Set up event handling, all events from the OpenAI Realtime API are available here,
  // see: https://platform.openai.com/docs/api-reference/realtime-server-events
  realtimeClient.on('realtime.event', ({ time, source, event }) => {
    console.log(`got an event from OpenAI ${event.type}`);
    if (event.type === 'response.audio_transcript.done') {
      console.log(`got a transcript from OpenAI ${event.transcript}`);
    }
  });

  realtimeClient.updateSession({
    instructions: "You are a helpful assistant that can answer questions and help with tasks.",
  });

  // Get token for the call
  const token = streamClient.generateUserToken({ user_id: "theodore" });

  // Construct the URL, TODO: replace this with
  const callUrl = `https://pronto.getstream.io/join/${call.id}?type=default&api_key=${streamClient.apiKey}&token=${token}&skip_lobby=true`;

  // Open the browser
  console.log(`Opening browser to join the call... ${callUrl}`);
  await open(callUrl);
}

main().catch(error => {
  console.error("Error:", error);
  process.exit(1);
});
```
Step 1.6 - Running the sample
At this point, we can run the script with this command:
```bash
npm run standalone-ui
```
This will open your browser and connect you to a call where you can talk to the OpenAI agent. As you talk to the agent, you will notice that your shell logs each event that OpenAI sends.
Let’s take a quick look at what is happening in the server-side code we just added:
- Here we instantiate the Stream Node SDK with the API credentials and then use it to create a new call object. That call will be used to host the conversation between the user and the AI agent.
```js
const streamClient = new StreamClient(streamApiKey, streamApiSecret);
const call = streamClient.video.call("default", crypto.randomUUID());
```
- The next step is to have the agent connect to the call and obtain an OpenAI Realtime API client. The connectOpenAi function does two things: it instantiates the Realtime API client and then uses the Stream API to connect the agent to the call. The agent will join the call as a user with ID "lucy".
```js
const realtimeClient = await streamClient.video.connectOpenAi({
  call,
  openAiApiKey,
  agentUserId: "lucy",
});
```
- We then use the realtimeClient object to pass instructions to OpenAI and to listen to events emitted by OpenAI. The interesting bit here is that realtimeClient is an instance of OpenAI’s official API client, which gives you full control over what you can do with OpenAI.
```js
realtimeClient.on('realtime.event', ({ time, source, event }) => {
  console.log(`got an event from OpenAI ${event.type}`);
  if (event.type === 'response.audio_transcript.done') {
    console.log(`got a transcript from OpenAI ${event.transcript}`);
  }
});

realtimeClient.updateSession({
  instructions: "You are a helpful assistant that can answer questions and help with tasks.",
});
```
Step 2 - Setup your server-side integration
This example was pretty simple to set up and showcases how easy it is to add an AI bot to a Stream call. When building a real application, you will need your backend to handle authentication for your clients as well as send instructions to OpenAI (RAG and function calling, in most applications, need to run on your backend).
So the backend we are going to build will take care of two things:
- Generate a valid token for the Flutter app to join the call running on Stream
- Use Stream APIs to join the same call with the AI agent and set it up with instructions
Step 2.1 - Implement the server.mjs
Create a new file in the same project, called server.mjs, and add the following code:
```js
import { serve } from "@hono/node-server";
import { StreamClient } from "@stream-io/node-sdk";
import { Hono } from "hono";
import { cors } from "hono/cors";
import crypto from 'crypto';
import { config } from 'dotenv';

// load config from dotenv
config();

// Get environment variables
const streamApiKey = process.env.STREAM_API_KEY;
const streamApiSecret = process.env.STREAM_API_SECRET;
const openAiApiKey = process.env.OPENAI_API_KEY;

// Check if all required environment variables are set
if (!streamApiKey || !streamApiSecret || !openAiApiKey) {
  console.error("Error: Missing required environment variables, make sure to have a .env file in the project root, check .env.example for reference");
  process.exit(1);
}

const app = new Hono();
app.use(cors());

const streamClient = new StreamClient(streamApiKey, streamApiSecret);

/**
 * Endpoint to generate credentials for a new video call.
 * Creates a unique call ID, generates a token, and returns necessary connection details.
 */
app.get("/credentials", (c) => {
  console.log("got a request for credentials");

  // Generate a shorter UUID for callId (first 12 chars)
  const callId = crypto.randomUUID().replace(/-/g, '').substring(0, 12);

  // Generate a shorter UUID for userId (first 8 chars with prefix)
  const userId = `user-${crypto.randomUUID().replace(/-/g, '').substring(0, 8)}`;

  const callType = "default";
  const token = streamClient.generateUserToken({
    user_id: userId,
  });

  return c.json({ apiKey: streamApiKey, token, callType, callId, userId });
});

/**
 * Endpoint to connect an AI agent to an existing video call.
 * Takes call type and ID parameters, connects the OpenAI agent to the call,
 * sets up the real-time client with event handlers and tools,
 * and returns a success response when complete.
 */
app.post("/:callType/:callId/connect", async (c) => {
  console.log("got a request for connect");

  const callType = c.req.param("callType");
  const callId = c.req.param("callId");

  const call = streamClient.video.call(callType, callId);
  const realtimeClient = await streamClient.video.connectOpenAi({
    call,
    openAiApiKey,
    agentUserId: "lucy",
  });

  await setupRealtimeClient(realtimeClient);
  console.log("agent is connected now");

  return c.json({ ok: true });
});

async function setupRealtimeClient(realtimeClient) {
  realtimeClient.on("error", (event) => {
    console.error("Error:", event);
  });

  realtimeClient.on("session.update", (event) => {
    console.log("Realtime session update:", event);
  });

  realtimeClient.updateSession({
    instructions: "You are a helpful assistant that can answer questions and help with tasks.",
  });

  realtimeClient.addTool(
    {
      name: "get_weather",
      description:
        "Call this function to retrieve current weather information for a specific location. Provide the city name.",
      parameters: {
        type: "object",
        properties: {
          city: {
            type: "string",
            description: "The name of the city to get weather information for",
          },
        },
        required: ["city"],
      },
    },
    async ({ city, country, units = "metric" }) => {
      console.log("get_weather request", { city, country, units });
      try {
        // This is a placeholder for actual weather API implementation
        // In a real implementation, you would call a weather API service here
        const weatherData = {
          location: country ? `${city}, ${country}` : city,
          temperature: 22,
          units: units === "imperial" ? "°F" : "°C",
          condition: "Partly Cloudy",
          humidity: 65,
          windSpeed: 10,
        };
        return weatherData;
      } catch (error) {
        console.error("Error fetching weather data:", error);
        return { error: "Failed to retrieve weather information" };
      }
    },
  );

  return realtimeClient;
}

// Start the server
serve({
  fetch: app.fetch,
  hostname: "0.0.0.0",
  port: 3000,
});

console.log(`Server started on :3000`);
```
In the code above, we set up two endpoints: /credentials, which generates a unique call ID and authentication token, and /:callType/:callId/connect, which connects the AI agent (that we call “lucy”) to a specific video call. The assistant follows predefined instructions, in this case trying to be helpful with tasks. Based on the purpose of your AI bot, you should update these instructions accordingly. We also show an example of a function call, using the get_weather tool.
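Since realtimeClient wraps OpenAI’s Realtime session, updateSession accepts more than just instructions. As a rough sketch (the instruction text and the voice value below are only examples, assuming the voice you pick is available on your OpenAI account), you could tailor the agent inside setupRealtimeClient like this:

```js
// Sketch only: example instructions and voice for a weather-assistant use case
realtimeClient.updateSession({
  instructions:
    "You are a concise weather assistant. Offer to call get_weather whenever a city is mentioned.",
  voice: "alloy", // example voice name from the OpenAI Realtime API
});
```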
Step 2.2 - Running the server
We can now run the server; it will listen on port 3000:
```bash
npm run server
```
To make sure everything is working as expected, you can run a curl GET request from your terminal:
```bash
curl -X GET http://localhost:3000/credentials
```
As a result, you should see the credentials required to join the call. With that, we’re all set up server-side!
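Optionally, you can also exercise the agent endpoint from the terminal before wiring up the app. The call type and call ID below are placeholders; substitute the callId returned by the /credentials request, and the agent will join that call:

```bash
# Placeholder values: use the callId returned by /credentials
curl -X POST http://localhost:3000/default/REPLACE_WITH_CALL_ID/connect
```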
Step 3 - Setting up the Flutter project
Now, let’s switch to the Flutter app, which will connect to this API and provide visualizations of the AI’s audio levels.
Step 3.1 - Adding the Stream Video dependency
Let’s create a new project, for example ai_video_demo, and add the Stream Video Flutter SDK along with the other dependencies needed for the demo:
```bash
flutter create ai_video_demo
cd ai_video_demo
flutter pub add stream_video_flutter collection http permission_handler
```
Open the project in your favorite IDE. You should now have the following dependencies in your pubspec.yaml file:
```yaml
dependencies:
  flutter:
    sdk: flutter
  # The following adds the Cupertino Icons font to your application.
  # Use with the CupertinoIcons class for iOS-style icons.
  cupertino_icons: ^1.0.8
  stream_video_flutter: ^0.8.3
  collection: ^1.19.1
  http: ^1.3.0
  permission_handler: ^11.4.0
```
Update the minimum iOS version by setting the platform :ios version in ios/Podfile to at least 13.0:
```ruby
# platform :ios, '12.0'
platform :ios, '13.0'
```
Step 3.2 - Setup Microphone Permissions
Since you will be using the microphone to communicate with the AI bot, we need to make sure we request permission to use it.
For iOS you need to add the “Privacy - Microphone Usage Description” permission in your Info.plist file, which you can find in ios/Runner. For example, you can use the following text as a description: “Microphone access needed for talking with AI.”
```xml
<plist version="1.0">
<dict>
  (...)
  <key>NSMicrophoneUsageDescription</key>
  <string>Microphone access needed for talking with AI.</string>
</dict>
</plist>
```
For Android, we need to add the permission in the AndroidManifest.xml file, which you can find in android/app/src/main. We also need to add the internet permission here.
```xml
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <application
    (...)
```
Step 4 - Stream Video Setup
Step 4.1 - Setup basic app
It’s time to write some Dart code. Head over to the generated main.dart file and replace its contents with the following code:
```dart
import 'package:flutter/material.dart';

import 'ai_demo_controller.dart';
import 'home_page.dart';

void main() {
  final aiController = AiDemoController();

  runApp(AIVideoDemoApp(aiController));
}

class AIVideoDemoApp extends StatelessWidget {
  const AIVideoDemoApp(this.aiDemoController, {super.key});

  final AiDemoController aiDemoController;

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'AI Video Demo App',
      theme: ThemeData(
        colorScheme: ColorScheme.fromSeed(
          seedColor: Colors.blueAccent,
          brightness: Brightness.dark,
        ),
      ),
      home: HomePage(aiDemoController),
    );
  }
}
```
In the main.dart, we create a controller and set up the MaterialApp with a HomePage. In the next steps, we’ll create the controller and HomePage.
Step 4.2 - Create AiDemoController
Create a file ai_demo_controller.dart in the lib folder.
First, we’ll add the base URL for the server:
```dart
import 'dart:io';

String get _baseURL => Platform.isAndroid ? _baseUrlAndroid : _baseURLiOS;

const _baseURLiOS = "http://localhost:3000";
const _baseUrlAndroid = 'http://10.0.2.2:3000';
```
Note: We are using “localhost” here (as defined in the _baseURL property). The simplest way to test this is to run on an iOS simulator or Android emulator. On Android emulators, “localhost” refers to the emulator itself, so for Android we use “10.0.2.2”, which refers to the machine running the emulator. You can also test this on a real device; to do that, set _baseURL to your local network IP address instead. Additionally, your device and your computer should be on the same WiFi network, and you need to allow “Arbitrary Loads” and “Local Networking” in your Info.plist for iOS (the local server uses HTTP, not HTTPS).
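For reference, the App Transport Security exception for plain-HTTP local testing could look like the snippet below in your Info.plist (a sketch for development only; remove it for production builds):

```xml
<!-- Development only: allow plain HTTP to your local server -->
<key>NSAppTransportSecurity</key>
<dict>
  <key>NSAllowsArbitraryLoads</key>
  <true/>
  <key>NSAllowsLocalNetworking</key>
  <true/>
</dict>
```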
Create the basic controller with the required properties.
```dart
import 'package:flutter/foundation.dart';
import 'package:stream_video_flutter/stream_video_flutter.dart' as stream_video;

enum AICallState { idle, joining, active }

class AiDemoController extends ChangeNotifier {
  AiDemoController() {}

  AICallState _callState = AICallState.idle;

  set callState(AICallState callState) {
    _callState = callState;
    notifyListeners();
  }

  AICallState get callState => _callState;

  Credentials? credentials;
  stream_video.StreamVideo? streamVideo;
  stream_video.Call? call;
}
```
We use the ChangeNotifier from Flutter to manage our state and simplify the UI. We’ve created an AICallState enum to update the UI with the right screen, and we use the callState setter in our controller to notify the UI.
After declaring these properties, we need to set them up. We will use the streamVideo object to communicate with Stream’s Video API and store the relevant call information in the call object. We will fetch the credentials required to set up the streamVideo object and the call from the Node.js server API we created above, and populate the credentials value with the server’s response.
Let’s add the Credentials model that reflects this response:
```dart
import 'dart:convert';

class Credentials {
  Credentials.fromJson(Map<String, dynamic> json)
      : apiKey = json['apiKey'],
        token = json['token'],
        callType = json['callType'],
        callId = json['callId'],
        userId = json['userId'];

  final String apiKey;
  final String token;
  final String callType;
  final String callId;
  final String userId;
}
```
Step 4.3 - Fetching the credentials
Now, we can create a method to fetch the credentials from our server. Add the following code in the ai_demo_controller.dart file, inside the AiDemoController.
```dart
import 'package:http/http.dart' as http;

class AiDemoController extends ChangeNotifier {
  (...)

  Future<Credentials?> _fetchCredentials() async {
    final url = Uri.parse('$_baseURL/credentials');

    try {
      final result = await http.get(url);
      final json = jsonDecode(result.body) as Map<String, dynamic>;
      return Credentials.fromJson(json);
    } catch (e) {
      FlutterError.dumpErrorToConsole(
        FlutterErrorDetails(exception: e, silent: true),
      );
      return null;
    }
  }
```
This method sends a GET request to fetch the credentials to set up the StreamVideo object and get the call data.
Step 4.4 - Connecting to Stream Video
We now have the credentials, and we can connect to Stream Video. To do this, add the following code in AiDemoController:
```dart
import 'dart:async';

class AiDemoController extends ChangeNotifier {
  (...)

  final Completer<void> _connectCompleter = Completer();

  Future<void> _connect() async {
    final credentials = await _fetchCredentials();
    if (credentials == null) {
      _connectCompleter.completeError(Exception('No valid credentials'));
      return;
    }

    streamVideo = stream_video.StreamVideo(
      credentials.apiKey,
      user: stream_video.User.regular(userId: credentials.userId),
      userToken: credentials.token,
    );

    this.credentials = credentials;

    await streamVideo!.connect();
    _connectCompleter.complete();
  }
```
This method fetches the credentials, creates a StreamVideo object, and connects to it. We keep track of the _connectCompleter for later use.
We will call this method directly in the constructor of the controller:
```dart
class AiDemoController extends ChangeNotifier {
  AiDemoController() {
    _connect();
  }
```
Step 5 - Building the UI
We can now start building the UI for our app. Create a file home_page.dart and add the following content:
```dart
import 'package:flutter/material.dart';
import 'package:stream_video_flutter/stream_video_flutter.dart' as stream_video;

import 'ai_demo_controller.dart';
import 'ai_speaking_view.dart';

class HomePage extends StatelessWidget {
  const HomePage(this.controller, {super.key});

  final AiDemoController controller;

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: SizedBox.expand(
        child: LayoutBuilder(
          builder: (context, constraints) => ListenableBuilder(
            listenable: controller,
            builder: (context, _) => switch (controller.callState) {
              AICallState.idle => GestureDetector(
                  onTap: controller.joinCall,
                  child: Center(child: Text('Click to talk to AI')),
                ),
              AICallState.joining => Center(
                  child: Column(
                    mainAxisAlignment: MainAxisAlignment.center,
                    children: [
                      Text('Waiting for AI agent to join...'),
                      SizedBox(height: 8),
                      CircularProgressIndicator(),
                    ],
                  ),
                ),
              AICallState.active => Stack(
                  children: [
                    AiSpeakingView(
                      controller.call!,
                      boxConstraints: constraints,
                    ),
                    Align(
                      alignment: Alignment.bottomRight,
                      child: SafeArea(
                        child: Padding(
                          padding: const EdgeInsets.all(8.0),
                          child: stream_video.LeaveCallOption(
                            call: controller.call!,
                            onLeaveCallTap: controller.leaveCall,
                          ),
                        ),
                      ),
                    ),
                  ],
                ),
            },
          ),
        ),
      ),
    );
  }
}
```
The HomePage uses a Scaffold and a ListenableBuilder that updates the UI based on the callState in our controller. The AiDemoController is already constructed in our main.dart before runApp, so it directly connects to our backend to fetch credentials. However, in our controller, we don’t change the callState yet, so the app will always show the AICallState.idle state.
When the state is .active, we show a new AiSpeakingView. This view will show a nice audio visualization when the current user and AI speak. We will provide more details for this view in the next section. For now, it’s enough to declare it with a simple Placeholder. Create a file ai_speaking_view.dart and add the following:
```dart
import 'package:flutter/material.dart';
import 'package:stream_video_flutter/stream_video_flutter.dart';

class AiSpeakingView extends StatefulWidget {
  const AiSpeakingView(this.call, {required this.boxConstraints, super.key});

  final Call call;
  final BoxConstraints boxConstraints;

  @override
  State<AiSpeakingView> createState() => _AiSpeakingViewState();
}

class _AiSpeakingViewState extends State<AiSpeakingView> {
  @override
  Widget build(BuildContext context) {
    return const Placeholder();
  }
}
```
Additionally, we are adding an overlay that shows a button for leaving the call. We are using the LeaveCallOption from the Flutter Video SDK for this.
Next, when the state is .joining, we show appropriate text and a progress view.
When the state is .idle, we show a button with the text “Click to talk to AI.” When the button is tapped, we call the joinCall method, which joins the call with the AI bot.
Add the following methods in our AiDemoController:
```dart
import 'package:permission_handler/permission_handler.dart';

Future<void> joinCall() async {
  try {
    callState = AICallState.joining;

    if (Platform.isAndroid) {
      final hasMicrophonePermission =
          await Permission.microphone.request().isGranted;
      if (!hasMicrophonePermission) {
        callState = AICallState.idle;
        return;
      }
    }

    await _connectCompleter.future;

    final credentials = this.credentials;
    final streamVideo = this.streamVideo;
    if (credentials == null || streamVideo == null) {
      callState = AICallState.idle;
      return;
    }

    final call = streamVideo.makeCall(
      callType: stream_video.StreamCallType.fromString(credentials.callType),
      id: credentials.callId,
    );
    this.call = call;

    await call.getOrCreate();

    await _connectAi(
      callType: credentials.callType,
      callId: credentials.callId,
    );

    await call.join();

    callState = AICallState.active;
  } catch (e) {
    FlutterError.dumpErrorToConsole(
      FlutterErrorDetails(exception: e, silent: true),
    );
    callState = AICallState.idle;
  }
}

Future _connectAi({required String callType, required String callId}) async {
  final url = Uri.parse('$_baseURL/$callType/$callId/connect');
  await http.post(url);
}
```
Let’s go through the joinCall method step by step.
- We update the callState so the home_page widget will update the UI.
- On Android, we need to request permission to use the microphone; on iOS, this is done automatically when we start using it. If we don’t get the permission, we go back to the idle state.
- Next, we ensure we are connected to our server by awaiting the _connectCompleter set in the _connect method.
- If we have the credentials, we continue to create the call using the makeCall and getOrCreate methods of the stream_video_flutter SDK.
- After the call is created, we call our server to connect the AI agent to the call, and then we join the call.
- We update the callState to .active, or log an error when something went wrong.
Also add the following leaveCall method to the same controller:
```dart
Future<void> leaveCall() async {
  final call = this.call;
  if (call == null) return;

  await call.leave();
  this.call = null;

  callState = AICallState.idle;
}
```
The leaveCall method leaves the existing call and updates the UI state to .idle.
At this point, you can run the app, join a call, and converse with the AI agent. However, we can take this a step further and show nice visualizations based on the participants' audio levels. If you want to give it a try now, make sure your local server is running.
Step 6 - Visualizing the audio levels
Let’s implement the AiSpeakingView next. This view will listen to the audio levels provided by the call state for each of its participants. It will then visualize them with a nice glowing animation that expands and contracts based on the user’s voice amplitude. Additionally, it subtly rotates and changes shape.
Step 6.1 - AiSpeakingView
We want to animate our AiSpeakingView based on the amplitude of the speaker and animate over time. Add the following code to the _AiSpeakingViewState:
```dart
import 'dart:async';

class _AiSpeakingViewState extends State<AiSpeakingView>
    with TickerProviderStateMixin {
  static const _agentId = "lucy";

  var _speakerState = AISpeakerState.idle;
  var _currentAmplitude = 0.0;

  late AnimationController _timeController;
  late AnimationController _amplitudeController;
  late StreamSubscription<CallState> _callStateSubscription;

  @override
  void initState() {
    super.initState();

    _timeController = AnimationController(
      duration: const Duration(seconds: 10),
      vsync: this,
    )..addListener(() {
        setState(() {
          // The state that has changed here is the animation object's value.
        });
      });
    _timeController.repeat();

    _amplitudeController = AnimationController(
      duration: const Duration(milliseconds: 300),
      vsync: this,
      lowerBound: 0.0,
      upperBound: _currentAmplitude,
    );

    _updateSpeakerState(widget.call.state.value);
    _listenToCallState();
  }

  @override
  void didUpdateWidget(covariant AiSpeakingView oldWidget) {
    super.didUpdateWidget(oldWidget);

    if (oldWidget.call != widget.call) {
      _callStateSubscription.cancel();
      _listenToCallState();
    }
  }

  @override
  void dispose() {
    _timeController.dispose();
    _amplitudeController.dispose();
    _callStateSubscription.cancel();
    super.dispose();
  }
```
First, notice that we’ve added the TickerProviderStateMixin to the widget State for the animations.
We created an AnimationController to loop the time between 0 and 1 every 10 seconds and an AnimationController to animate the amplitude. At the start, the amplitude is 0, so we don’t start the controller for the amplitude, but the _timeController is always repeating.
When the widget is updated with a new call, we reset the listeners for the call state, and in dispose, we dispose of the _callStateSubscription and both AnimationControllers.
Let’s add the missing AISpeakerState enum and the _listenToCallState method:
```dart
enum AISpeakerState { aiSpeaking, userSpeaking, idle }

class _AiSpeakingViewState extends State<AiSpeakingView>
    with TickerProviderStateMixin {
  (...)

  void _listenToCallState() {
    _callStateSubscription = widget.call.state.asStream().listen((callState) {
      _updateSpeakerState(callState);
    });
  }
}
```
Now we listen to updates from the call state and call _updateSpeakerState every time the call state changes. Add all the methods needed to update the speaker state inside _AiSpeakingViewState:
```dart
import 'dart:math' as math;

import 'package:collection/collection.dart';

void _updateSpeakerState(CallState callState) {
  final activeSpeakers = callState.activeSpeakers;
  final agent = activeSpeakers.firstWhereOrNull(
    (p) => p.userId.contains(_agentId),
  );
  final user = activeSpeakers.firstWhereOrNull(
    (p) => p.userId == callState.localParticipant?.userId,
  );

  List<double> audioLevels;
  if (agent != null && agent.isSpeaking) {
    _speakerState = AISpeakerState.aiSpeaking;
    audioLevels = agent.audioLevels
        .map((e) => e / (math.Random().nextInt(2) + 1))
        .toList();
  } else if (user != null && user.isSpeaking) {
    _speakerState = AISpeakerState.userSpeaking;
    audioLevels = user.audioLevels;
  } else {
    _speakerState = AISpeakerState.idle;
    audioLevels = [];
  }

  final amplitude = _computeSingleAmplitude(audioLevels);
  _updateAmplitudeAnimation(amplitude);
}

double _computeSingleAmplitude(List<double> audioLevels) {
  final normalized = _normalizePeak(audioLevels);
  if (normalized.isEmpty) return 0;

  final sum = normalized.reduce((value, element) => value + element);
  final average = sum / normalized.length;
  return average;
}

List<double> _normalizePeak(List<double> audioLevels) {
  final max = audioLevels.fold(
    0.0,
    (value, element) => math.max(value, element),
  );
  if (max == 0.0) return audioLevels;

  return audioLevels.map((e) => e / max).toList();
}

void _updateAmplitudeAnimation(double newAmplitude) {
  if (_currentAmplitude != newAmplitude) {
    var currentAnimationState = _amplitudeController.value;
    _amplitudeController.dispose();

    final reverse = currentAnimationState > newAmplitude;

    _amplitudeController = AnimationController(
      duration: const Duration(milliseconds: 500),
      vsync: this,
      lowerBound: reverse ? newAmplitude : currentAnimationState,
      upperBound: reverse ? currentAnimationState : newAmplitude,
    );

    _amplitudeController.addListener(() {
      setState(() {
        // The state that has changed here is the animation object's value.
      });
    });

    if (currentAnimationState != newAmplitude) {
      if (reverse) {
        _amplitudeController.reverse(from: currentAnimationState);
      } else {
        _amplitudeController.forward();
      }
    }
  }

  _currentAmplitude = newAmplitude;
}
```
We call our agent “lucy”, and based on that user ID we distinguish the agent from the current user. The amplitude is calculated from the speaker’s last 10 audio levels and normalized to a value between 0 and 1. That amplitude is smoothly animated with a duration of 500ms.
Now replace the Placeholder in the build method with a Stack that contains three GlowLayer widgets driven by the animation values. Together, the three layers (large and small) form the AI glow.
```dart
@override
Widget build(BuildContext context) {
  final size = Size(
    widget.boxConstraints.maxWidth,
    widget.boxConstraints.maxHeight,
  );

  final time = _timeController.value;
  final amplitude = _amplitudeController.value;

  return Stack(
    children: [
      GlowLayer(
        baseRadiusMax: 1,
        baseRadiusMin: 3 / 5,
        baseOpacity: 0.35,
        scaleRange: 0.3,
        waveRangeMin: 0.2,
        waveRangeMax: 0.02,
        amplitude: amplitude,
        time: time,
        size: size,
        speakerState: _speakerState,
      ),
      GlowLayer(
        baseRadiusMax: 3 / 5,
        baseRadiusMin: 2 / 5,
        baseOpacity: 0.35,
        scaleRange: 0.3,
        waveRangeMin: 0.15,
        waveRangeMax: 0.03,
        amplitude: amplitude,
        time: time,
        size: size,
        speakerState: _speakerState,
      ),
      GlowLayer(
        baseRadiusMax: 1 / 5,
        baseRadiusMin: 2 / 5,
        baseOpacity: 0.9,
        scaleRange: 0.5,
        waveRangeMin: 0.35,
        waveRangeMax: 0.05,
        amplitude: amplitude,
        time: time,
        size: size,
        speakerState: _speakerState,
      ),
    ],
  );
}
```
Step 6.2 - GlowLayer
Let’s implement the GlowLayer next. For each of the layers, we define minimum and maximum values for the radius size, brightness, opacity, and wavelength. Feel free to adjust these values to customize the animation. Add the GlowLayer and the new import to the ai_speaking_view.dart file.
```dart
import 'dart:ui';

class GlowLayer extends StatelessWidget {
  const GlowLayer({
    required this.speakerState,
    required this.baseRadiusMin,
    required this.baseRadiusMax,
    required this.baseOpacity,
    required this.scaleRange,
    required this.waveRangeMin,
    required this.waveRangeMax,
    required this.amplitude,
    required this.time,
    required this.size,
    super.key,
  });

  final AISpeakerState speakerState;
  final double baseRadiusMin;
  final double baseRadiusMax;
  final double baseOpacity;
  final double scaleRange;
  final double waveRangeMin;
  final double waveRangeMax;
  final double amplitude;
  final double time;
  final Size size;

  @override
  Widget build(BuildContext context) {
    // The actual radius = lerp from min->max based on amplitude
    final baseRadius = lerpDouble(baseRadiusMin, baseRadiusMax, amplitude)!;

    // The waveRange also "lerps," but we want a big wave at low amplitude => waveRangeMin at amplitude=1
    // => just invert the parameter. Another approach:
    // waveRange = waveRangeMax + (waveRangeMin - waveRangeMax) * (1 - amplitude).
    final waveRange = lerpDouble(waveRangeMax, waveRangeMin, (1 - amplitude))!;

    final radius = baseRadius * math.min(size.width, size.height);

    // Subtle elliptical warping from sin/cos
    final shapeWaveSin = math.sin(2 * math.pi * time);
    final shapeWaveCos = math.cos(2 * math.pi * time);

    // scale from amplitude
    final amplitudeScale = 1.0 + scaleRange * amplitude;

    // final x/y scale => merges amplitude + wave
    final xScale = amplitudeScale + waveRange * shapeWaveSin;
    final yScale = amplitudeScale + waveRange * shapeWaveCos;

    return Center(
      child: Opacity(
        opacity: baseOpacity,
        child: Transform.scale(
          scaleY: yScale,
          scaleX: xScale,
          child: SizedBox(
            height: radius,
            width: radius,
            child: DecoratedBox(
              decoration: BoxDecoration(
                gradient: RadialGradient(
                  radius: 0.5,
                  colors: speakerState.gradientColors,
                  stops: <double>[0.0, 1.0],
                ),
              ),
            ),
          ),
        ),
      ),
    );
  }
}

extension on AISpeakerState {
  List<Color> get gradientColors => switch (this) {
        AISpeakerState.userSpeaking => [Colors.red, Colors.red.withAlpha(0)],
        _ => [
            Color.from(red: 0.0, green: 0.976, blue: 1.0, alpha: 1.0),
            Color.from(red: 0.0, green: 0.227, blue: 1.0, alpha: 0.0),
          ],
      };
}
```
We show a different color depending on who is speaking. When the AI speaks, we show a blue gradient; when the current user is speaking, we use a red one instead. When the amplitude is lower, the wave animation is stronger.
Now, you can run the app, talk to the AI, and see beautiful visualizations while the participants speak.
You can find the source code of the Node.js backend here, while the completed Flutter tutorial can be found on the following page.
Recap
In this tutorial, we have built an example of an app that lets you talk with an AI bot using OpenAI Realtime and Stream’s video edge infrastructure. The integration uses WebRTC for the best latency and quality even with poor connections.
We have shown you how to use OpenAI’s real-time API and provide the agent with custom instructions, voice, and function calls. On the Flutter side, we have shown you how to join the call and build an animation using the audio levels.
Both the video SDK for Flutter and the API have plenty more features available to support more advanced use cases.
Next Steps
- Explore the tutorials for other platforms: React, iOS, Android, React Native.
- Check the Backend documentation with more examples in JS and Python.
- Read the Flutter SDK documentation to learn about additional features.