
Flutter AI Voice Assistant Tutorial

This tutorial teaches you how to quickly build a production-ready voice AI agent with OpenAI Realtime using Stream’s video edge network, Flutter, and Node.

  • The instructions to the agent are sent server-side (Node), so you can do function calling or RAG
  • The integration uses Stream’s video edge network (for low latency) and WebRTC (so it works under slow/unreliable network conditions)
  • You have full control over the AI setup and visualization

The result will look something like this:

While this tutorial uses Node + Flutter, you could achieve something similar with any other backend language and Stream SDK (Swift, Kotlin, React, JS, Flutter, React Native, Unity, etc.).

Step 1 - Credentials and Backend setup

First, we are going to set up the Node.js backend and get Stream and OpenAI credentials.

Step 1.1 - OpenAI and Stream credentials

To get started, you need an OpenAI account and an API key. Please note that the OpenAI credentials will never be shared client-side; they are only exchanged between your server and Stream’s servers.

Additionally, you will need a Stream account and use the API key and secret from the Stream dashboard.

Step 1.2 - Create the Node.js project

Make sure you are using a recent version of Node.js, such as 22 or later; you can check this with node -v, as shown below.
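For example, the check might look like this (your exact version will differ):

bash
node -v
# prints something like v22.x.x; anything from Node 22 onwards is fine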

First, let’s create a new folder called “openai-audio-tutorial”. From the terminal, go to the folder, and run the following command:

bash
npm init -y

This command generates a package.json file with default settings.

Step 1.3 - Installing the dependencies

Next, let’s update the generated package.json with the following content:

package.json (json)
{
  "name": "@stream-io/video-ai-demo-server",
  "type": "module",
  "dependencies": {
    "@hono/node-server": "^1.13.8",
    "@stream-io/node-sdk": "^0.4.17",
    "@stream-io/openai-realtime-api": "^0.1.0",
    "dotenv": "^16.3.1",
    "hono": "^4.7.4",
    "open": "^10.1.0"
  },
  "scripts": {
    "server": "node ./server.mjs",
    "standalone-ui": "node ./standalone.mjs"
  }
}

Then, run the following command to install the dependencies:

bash
npm install

Step 1.4 - Setup the credentials

Create a .env file in the project root with the following variables:

.env (text)
# Stream API credentials
STREAM_API_KEY=REPLACE_WITH_API_KEY
STREAM_API_SECRET=REPLACE_WITH_TOKEN

# OpenAI API key
OPENAI_API_KEY=your_openai_api_key

Then edit the .env file with your actual API keys from Step 1.1.

Step 1.5 - Implement the standalone-ui script

Before diving into the Flutter integration, we are going to build a simple server-side script that shows how to connect the AI agent to a call and then join that call from a simple web app.

Create a file called standalone.mjs and paste in the following content:

standalone.mjs (js)
import { config } from 'dotenv';
import { StreamClient } from '@stream-io/node-sdk';
import open from 'open';
import crypto from 'crypto';

// load config from dotenv
config();

async function main() {
  // Get environment variables
  const streamApiKey = process.env.STREAM_API_KEY;
  const streamApiSecret = process.env.STREAM_API_SECRET;
  const openAiApiKey = process.env.OPENAI_API_KEY;

  // Check if all required environment variables are set
  if (!streamApiKey || !streamApiSecret || !openAiApiKey) {
    console.error("Error: Missing required environment variables, make sure to have a .env file in the project root, check .env.example for reference");
    process.exit(1);
  }

  const streamClient = new StreamClient(streamApiKey, streamApiSecret);
  const call = streamClient.video.call("default", crypto.randomUUID());

  // realtimeClient is an instance of openai/openai-realtime-api-beta
  // https://github.com/openai/openai-realtime-api-beta
  const realtimeClient = await streamClient.video.connectOpenAi({
    call,
    openAiApiKey,
    agentUserId: "lucy",
  });

  // Set up event handling; all events from the OpenAI Realtime API are available here, see:
  // https://platform.openai.com/docs/api-reference/realtime-server-events
  realtimeClient.on('realtime.event', ({ time, source, event }) => {
    console.log(`got an event from OpenAI ${event.type}`);
    if (event.type === 'response.audio_transcript.done') {
      console.log(`got a transcript from OpenAI ${event.transcript}`);
    }
  });

  realtimeClient.updateSession({
    instructions: "You are a helpful assistant that can answer questions and help with tasks.",
  });

  // Get token for the call
  const token = streamClient.generateUserToken({ user_id: "theodore" });

  // Construct the URL, TODO: replace this with
  const callUrl = `https://pronto.getstream.io/join/${call.id}?type=default&api_key=${streamClient.apiKey}&token=${token}&skip_lobby=true`;

  // Open the browser
  console.log(`Opening browser to join the call... ${callUrl}`);
  await open(callUrl);
}

main().catch(error => {
  console.error("Error:", error);
  process.exit(1);
});

Step 1.6 - Running the sample

At this point, we can run the script with this command:

bash
npm run standalone-ui

This will open your browser and connect you to a call where you can talk to the OpenAI agent. As you talk to the agent, you will notice your shell will contain logs for each event that OpenAI is sending.
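The exact output depends on your conversation, but with the logging in standalone.mjs it will look roughly like this (the event names come from OpenAI's Realtime API; the transcript line is illustrative):

text
got an event from OpenAI input_audio_buffer.speech_started
got an event from OpenAI input_audio_buffer.speech_stopped
got an event from OpenAI response.created
got an event from OpenAI response.audio_transcript.done
got a transcript from OpenAI Hi there! How can I help you today?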

Let’s take a quick look at what is happening in the server-side code we just added:

  1. Here we instantiate the Stream Node SDK with the API credentials and then use it to create a new call object. That call will host the conversation between the user and the AI agent.
js
const streamClient = new StreamClient(streamApiKey, streamApiSecret);
const call = streamClient.video.call("default", crypto.randomUUID());
  2. The next step is to have the agent connect to the call and obtain an OpenAI Realtime API client. The connectOpenAi function does two things: it instantiates the Realtime API client and then uses the Stream API to connect the agent to the call. The agent joins the call as a user with ID "lucy".
js
const realtimeClient = await streamClient.video.connectOpenAi({
  call,
  openAiApiKey,
  agentUserId: "lucy",
});
  3. We then use the realtimeClient object to pass instructions to OpenAI and to listen to events emitted by OpenAI. The interesting bit here is that realtimeClient is an instance of OpenAI’s official API client, which gives you full control over what you can do with OpenAI (see the sketch after the snippet below).
js
realtimeClient.on('realtime.event', ({ time, source, event }) => {
  console.log(`got an event from OpenAI ${event.type}`);
  if (event.type === 'response.audio_transcript.done') {
    console.log(`got a transcript from OpenAI ${event.transcript}`);
  }
});

realtimeClient.updateSession({
  instructions: "You are a helpful assistant that can answer questions and help with tasks.",
});
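Because it is the official client, the same realtimeClient can also be used to adjust other session parameters or register tools for function calling. The snippet below is only a sketch: the voice value and the get_time tool are illustrative, and the addTool pattern is the same one used in the server integration later in this tutorial.

js
// Pick one of the voices supported by the OpenAI Realtime API (illustrative value)
realtimeClient.updateSession({ voice: 'alloy' });

// Register a function the model can call; the handler runs on your server
realtimeClient.addTool(
  {
    name: 'get_time',
    description: 'Returns the current server time as an ISO 8601 string.',
    parameters: { type: 'object', properties: {} },
  },
  async () => ({ now: new Date().toISOString() }),
);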

Step 2 - Setup your server-side integration

This example was pretty simple to set up and showcases how easy it is to add an AI bot to a Stream call. When building a real application, you will need your backend to handle authentication for your clients as well as send instructions to OpenAI (RAG and function calling, which most applications need, should run on your backend).

So the backend we are going to build will take care of two things:

  1. Generate a valid token for the Flutter app to join the call running on Stream
  2. Use Stream APIs to join the same call with the AI agent and set it up with instructions

Step 2.1 - Implement the server.mjs

Create a new file in the same project, called server.mjs, and add the following code:

server.mjs (js)
import { serve } from "@hono/node-server";
import { StreamClient } from "@stream-io/node-sdk";
import { Hono } from "hono";
import { cors } from "hono/cors";
import crypto from 'crypto';
import { config } from 'dotenv';

// load config from dotenv
config();

// Get environment variables
const streamApiKey = process.env.STREAM_API_KEY;
const streamApiSecret = process.env.STREAM_API_SECRET;
const openAiApiKey = process.env.OPENAI_API_KEY;

// Check if all required environment variables are set
if (!streamApiKey || !streamApiSecret || !openAiApiKey) {
  console.error("Error: Missing required environment variables, make sure to have a .env file in the project root, check .env.example for reference");
  process.exit(1);
}

const app = new Hono();
app.use(cors());

const streamClient = new StreamClient(streamApiKey, streamApiSecret);

/**
 * Endpoint to generate credentials for a new video call.
 * Creates a unique call ID, generates a token, and returns necessary connection details.
 */
app.get("/credentials", (c) => {
  console.log("got a request for credentials");

  // Generate a shorter UUID for callId (first 12 chars)
  const callId = crypto.randomUUID().replace(/-/g, '').substring(0, 12);

  // Generate a shorter UUID for userId (first 8 chars with prefix)
  const userId = `user-${crypto.randomUUID().replace(/-/g, '').substring(0, 8)}`;

  const callType = "default";
  const token = streamClient.generateUserToken({
    user_id: userId,
  });

  return c.json({ apiKey: streamApiKey, token, callType, callId, userId });
});

/**
 * Endpoint to connect an AI agent to an existing video call.
 * Takes call type and ID parameters, connects the OpenAI agent to the call,
 * sets up the real-time client with event handlers and tools,
 * and returns a success response when complete.
 */
app.post("/:callType/:callId/connect", async (c) => {
  console.log("got a request for connect");
  const callType = c.req.param("callType");
  const callId = c.req.param("callId");

  const call = streamClient.video.call(callType, callId);
  const realtimeClient = await streamClient.video.connectOpenAi({
    call,
    openAiApiKey,
    agentUserId: "lucy",
  });

  await setupRealtimeClient(realtimeClient);
  console.log("agent is connected now");

  return c.json({ ok: true });
});

async function setupRealtimeClient(realtimeClient) {
  realtimeClient.on("error", (event) => {
    console.error("Error:", event);
  });

  realtimeClient.on("session.update", (event) => {
    console.log("Realtime session update:", event);
  });

  realtimeClient.updateSession({
    instructions: "You are a helpful assistant that can answer questions and help with tasks.",
  });

  realtimeClient.addTool(
    {
      name: "get_weather",
      description:
        "Call this function to retrieve current weather information for a specific location. Provide the city name.",
      parameters: {
        type: "object",
        properties: {
          city: {
            type: "string",
            description: "The name of the city to get weather information for",
          },
        },
        required: ["city"],
      },
    },
    async ({ city, country, units = "metric" }) => {
      console.log("get_weather request", { city, country, units });
      try {
        // This is a placeholder for actual weather API implementation
        // In a real implementation, you would call a weather API service here
        const weatherData = {
          location: country ? `${city}, ${country}` : city,
          temperature: 22,
          units: units === "imperial" ? "°F" : "°C",
          condition: "Partly Cloudy",
          humidity: 65,
          windSpeed: 10,
        };
        return weatherData;
      } catch (error) {
        console.error("Error fetching weather data:", error);
        return { error: "Failed to retrieve weather information" };
      }
    },
  );

  return realtimeClient;
}

// Start the server
serve({
  fetch: app.fetch,
  hostname: "0.0.0.0",
  port: 3000,
});

console.log(`Server started on :3000`);

In the code above, we set up two endpoints: /credentials, which generates a unique call ID and authentication token, and /:callType/:callId/connect, which connects the AI agent (that we call “lucy”) to a specific video call. The assistant follows predefined instructions, in this case trying to be helpful with tasks. Based on the purpose of your AI bot, you should update these instructions accordingly. We also show an example of a function call, using the get_weather tool.

Step 2.2 - Running the server

We can run the server now; this will launch it and have it listen on port 3000:

bash
npm run server

To make sure everything is working as expected, you can run a curl GET request from your terminal.

bash
curl -X GET http://localhost:3000/credentials

As a result, you should see the credentials required to join the call.
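You can also exercise the connect endpoint from the terminal by plugging the callType and callId values returned by /credentials into a POST request (the call ID below is a placeholder):

bash
# Use the callType and callId values returned by the /credentials request
curl -X POST http://localhost:3000/default/REPLACE_WITH_CALL_ID/connect

With that, we’re all set up server-side!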

Step 3 - Setting up the Flutter project

Now, let’s switch to the Flutter app, which will connect to this API and provide visualizations of the AI’s audio levels.

Step 3.1 - Adding the Stream Video dependency

Let’s create a new project, for example, ai_video_demo, and add the StreamVideo Flutter SDK.

Run the following commands to create the project and add the dependencies needed for the demo:

bash
flutter create ai_video_demo
cd ai_video_demo
flutter pub add stream_video_flutter collection http permission_handler

Open the project in your favorite IDE. You should now have the following dependencies in your pubspec.yaml file:

pubspec.yaml (yaml)
dependencies:
  flutter:
    sdk: flutter

  # The following adds the Cupertino Icons font to your application.
  # Use with the CupertinoIcons class for iOS-style icons.
  cupertino_icons: ^1.0.8
  stream_video_flutter: ^0.8.3
  collection: ^1.19.1
  http: ^1.3.0
  permission_handler: ^11.4.0

Update the minimum iOS version by setting platform :ios in ios/Podfile to at least 13.0:

Podfile (ruby)
# platform :ios, '12.0'
platform :ios, '13.0'

Step 3.2 - Setup Microphone Permissions

Since you will be using the microphone to communicate with the AI bot, we need to make sure we request permission to use it.

For iOS you need to add the “Privacy - Microphone Usage Description” permission in your Info.plist file, which you can find in ios/Runner. For example, you can use the following text as the description: “Microphone access needed for talking with AI.”

Info.plist (xml)
<plist version="1.0">
<dict>
  (...)
  <key>NSMicrophoneUsageDescription</key>
  <string>Microphone access needed for talking with AI.</string>
</dict>
</plist>

For Android we need to add the permission in the AndroidManifest, which you can find in android/app/src/main. We also need to add the internet permission here.

AndroidManifest (xml)
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.RECORD_AUDIO" />

    <application
    (...)

Step 4 - Stream Video Setup

Step 4.1 - Setup basic app

It’s time to write some Dart code. Head over to the generated main.dart file and replace its contents with the following code:

main.dart (dart)
import 'package:flutter/material.dart';

import 'ai_demo_controller.dart';
import 'home_page.dart';

void main() {
  final aiController = AiDemoController();
  runApp(AIVideoDemoApp(aiController));
}

class AIVideoDemoApp extends StatelessWidget {
  const AIVideoDemoApp(this.aiDemoController, {super.key});

  final AiDemoController aiDemoController;

  @override
  Widget build(BuildContext context) {
    return MaterialApp(
      title: 'AI Video Demo App',
      theme: ThemeData(
        colorScheme: ColorScheme.fromSeed(
          seedColor: Colors.blueAccent,
          brightness: Brightness.dark,
        ),
      ),
      home: HomePage(aiDemoController),
    );
  }
}

In the main.dart, we create a controller and set up the MaterialApp with a HomePage. In the next steps, we’ll create the controller and HomePage.

Step 4.2 - Create AiDemoController

Create a file ai_demo_controller.dart in the lib folder.

First we’ll add the baseUrl for the server:

ai_demo_controller.dart (dart)
import 'dart:io';

String get _baseURL => Platform.isAndroid ? _baseUrlAndroid : _baseURLiOS;

const _baseURLiOS = "http://localhost:3000";
const _baseUrlAndroid = 'http://10.0.2.2:3000';

Note: We are using “localhost” here (as defined in the _baseURL property). The simplest way to test this is to run on an iOS simulator or Android emulator. On Android emulators, “localhost” refers to the emulator itself, so for Android we use “10.0.2.2”, which refers to the machine running the emulator. You can also test on a real device; to do that, set _baseURL to your local network IP address instead. Additionally, your device and your computer must be on the same WiFi network, and on iOS you need to allow “Arbitrary Loads” and “Local Networking” in your Info.plist (the local server uses HTTP, not HTTPS), as sketched below.
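If you do test on a physical device, the App Transport Security exception in ios/Runner/Info.plist would look roughly like the sketch below; treat it as a development-only setting and tighten it for production builds.

Info.plist (xml)
<key>NSAppTransportSecurity</key>
<dict>
  <key>NSAllowsArbitraryLoads</key>
  <true/>
  <key>NSAllowsLocalNetworking</key>
  <true/>
</dict>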

Create the basic controller with the required properties.

ai_demo_controller.dart (dart)
import 'package:flutter/foundation.dart';
import 'package:stream_video_flutter/stream_video_flutter.dart' as stream_video;

enum AICallState { idle, joining, active }

class AiDemoController extends ChangeNotifier {
  AiDemoController() {}

  AICallState _callState = AICallState.idle;

  set callState(AICallState callState) {
    _callState = callState;
    notifyListeners();
  }

  AICallState get callState => _callState;

  Credentials? credentials;
  stream_video.StreamVideo? streamVideo;
  stream_video.Call? call;
}

We use the ChangeNotifier from Flutter to manage our state and simplify the UI. We’ve created an AICallState to update the UI with the right screen and we use the callState setter in our controller to notify the UI.

After declaring these properties, we need to set them up. We will use the streamVideo object to communicate with Stream’s Video API and store the relevant call information in the call object.

We will fetch the credentials required to set up the streamVideo object and the call from the Node.js server API we created above and populate the credentials value with the response from the Node.js server.

Let’s add the Credentials model that reflects this response:

dart
import 'dart:convert';

class Credentials {
  Credentials.fromJson(Map<String, dynamic> json)
      : apiKey = json['apiKey'],
        token = json['token'],
        callType = json['callType'],
        callId = json['callId'],
        userId = json['userId'];

  final String apiKey;
  final String token;
  final String callType;
  final String callId;
  final String userId;
}

Step 4.3 - Fetching the credentials

Now, we can create a method to fetch the credentials from our server. Add the following code in the ai_demo_controller.dart file inside the AiDemoController.

ai_demo_controller.dart (dart)
import 'package:http/http.dart' as http;

class AiDemoController extends ChangeNotifier {
  (...)

  Future<Credentials?> _fetchCredentials() async {
    final url = Uri.parse('$_baseURL/credentials');

    try {
      final result = await http.get(url);
      final json = jsonDecode(result.body) as Map<String, dynamic>;
      return Credentials.fromJson(json);
    } catch (e) {
      FlutterError.dumpErrorToConsole(
        FlutterErrorDetails(exception: e, silent: true),
      );
      return null;
    }
  }
}

This method sends a GET request to fetch the credentials to set up the StreamVideo object and get the call data.
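Based on the /credentials endpoint we built in Step 2.1, the JSON this method parses looks roughly like this (all values are illustrative):

json
{
  "apiKey": "REPLACE_WITH_API_KEY",
  "token": "eyJhbGciOiJIUzI1NiIs...",
  "callType": "default",
  "callId": "a1b2c3d4e5f6",
  "userId": "user-9f8e7d6c"
}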

Step 4.4 - Connecting to Stream Video

We now have the credentials, and we can connect to Stream Video. To do this, add the following code in AiDemoController:

dart
import 'dart:async';

class AiDemoController extends ChangeNotifier {
  (...)

  final Completer<void> _connectCompleter = Completer();

  Future<void> _connect() async {
    final credentials = await _fetchCredentials();
    if (credentials == null) {
      _connectCompleter.completeError(Exception('No valid credentials'));
      return;
    }

    streamVideo = stream_video.StreamVideo(
      credentials.apiKey,
      user: stream_video.User.regular(userId: credentials.userId),
      userToken: credentials.token,
    );
    this.credentials = credentials;

    await streamVideo!.connect();
    _connectCompleter.complete();
  }
}

This method fetches the credentials, creates a StreamVideo object, and connects to it. We keep track of the _connectCompleter for later use.

We will call this method directly in the constructor of the controller:

dart
class AiDemoController extends ChangeNotifier {
  AiDemoController() {
    _connect();
  }

Step 5 - Building the UI

We can now start building the UI for our app. Create a file home_page.dart and add the following content:

home_page.dart (dart)
import 'package:flutter/material.dart';
import 'package:stream_video_flutter/stream_video_flutter.dart' as stream_video;

import 'ai_demo_controller.dart';
import 'ai_speaking_view.dart';

class HomePage extends StatelessWidget {
  const HomePage(this.controller, {super.key});

  final AiDemoController controller;

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: SizedBox.expand(
        child: LayoutBuilder(
          builder: (context, constraints) => ListenableBuilder(
            listenable: controller,
            builder: (context, _) => switch (controller.callState) {
              AICallState.idle => GestureDetector(
                  onTap: controller.joinCall,
                  child: Center(child: Text('Click to talk to AI')),
                ),
              AICallState.joining => Center(
                  child: Column(
                    mainAxisAlignment: MainAxisAlignment.center,
                    children: [
                      Text('Waiting for AI agent to join...'),
                      SizedBox(height: 8),
                      CircularProgressIndicator(),
                    ],
                  ),
                ),
              AICallState.active => Stack(
                  children: [
                    AiSpeakingView(
                      controller.call!,
                      boxConstraints: constraints,
                    ),
                    Align(
                      alignment: Alignment.bottomRight,
                      child: SafeArea(
                        child: Padding(
                          padding: const EdgeInsets.all(8.0),
                          child: stream_video.LeaveCallOption(
                            call: controller.call!,
                            onLeaveCallTap: controller.leaveCall,
                          ),
                        ),
                      ),
                    ),
                  ],
                ),
            },
          ),
        ),
      ),
    );
  }
}

The HomePage uses a Scaffold and a ListenableBuilder that updates the UI based on the callState in our controller. The AiDemoController is already constructed in our main.dart before runApp, so it directly connects to our backend to fetch credentials. However, in our controller, we don’t change the callState yet, so the app will always show the AICallState.idle state.

When the state is .active, we show a new AiSpeakingView. This view will show a nice audio visualization when the current user and AI speak. We will provide more details for this view in the next section. For now, it’s enough to declare it with a simple Placeholder. Create a file ai_speaking_view.dart and add the following:

ai_speaking_view.dart (dart)
import 'package:flutter/material.dart';
import 'package:stream_video_flutter/stream_video_flutter.dart';

class AiSpeakingView extends StatefulWidget {
  const AiSpeakingView(this.call, {required this.boxConstraints, super.key});

  final Call call;
  final BoxConstraints boxConstraints;

  @override
  State<AiSpeakingView> createState() => _AiSpeakingViewState();
}

class _AiSpeakingViewState extends State<AiSpeakingView> {
  @override
  Widget build(BuildContext context) {
    return const Placeholder();
  }
}

Additionally, we are adding an overlay that shows a button for leaving the call. We are using the LeaveCallOption from the Flutter Video SDK for this.

Next, when the state is .joining, we show appropriate text and a progress view.

When the state is .idle, we show a button with the text “Click to talk to AI.” When the button is tapped, we call the joinCall method, which joins the call with the AI bot.

Add the following methods in our AiDemoController:

dart
// Add this import at the top of ai_demo_controller.dart:
// import 'package:permission_handler/permission_handler.dart';

Future<void> joinCall() async {
  try {
    callState = AICallState.joining;

    if (Platform.isAndroid) {
      final hasMicrophonePermission =
          await Permission.microphone.request().isGranted;
      if (!hasMicrophonePermission) {
        callState = AICallState.idle;
        return;
      }
    }

    await _connectCompleter.future;

    final credentials = this.credentials;
    final streamVideo = this.streamVideo;
    if (credentials == null || streamVideo == null) {
      callState = AICallState.idle;
      return;
    }

    final call = streamVideo.makeCall(
      callType: stream_video.StreamCallType.fromString(credentials.callType),
      id: credentials.callId,
    );
    this.call = call;

    await call.getOrCreate();
    await _connectAi(
      callType: credentials.callType,
      callId: credentials.callId,
    );
    await call.join();

    callState = AICallState.active;
  } catch (e) {
    FlutterError.dumpErrorToConsole(
      FlutterErrorDetails(exception: e, silent: true),
    );
    callState = AICallState.idle;
  }
}

Future _connectAi({required String callType, required String callId}) async {
  final url = Uri.parse('$_baseURL/$callType/$callId/connect');
  await http.post(url);
}

Let’s go through the joinCall method step by step.

  1. We update the callState so the home_page widget will update the UI.
  2. On Android, we need to request permission to use the microphone; on iOS, this happens automatically when we start using it. If permission is not granted, we go back to the idle state.
  3. Next, we ensure we are connected to our server by awaiting the _connectCompleter set in the _connect method.
  4. If we have the credentials, we create the call using the makeCall and getOrCreate methods of the stream_video_flutter SDK.
  5. After the call is created, we ask our server to connect the AI agent to it, and then we join the call ourselves.
  6. Finally, we update the callState to .active, or log an error if something went wrong.

Also add the following leaveCall method to the same controller:

dart
Future<void> leaveCall() async {
  final call = this.call;
  if (call == null) return;

  await call.leave();
  this.call = null;

  callState = AICallState.idle;
}

The leaveCall method leaves the existing call and updates the UI state to .idle.

At this point, you can run the app, join a call, and converse with the AI agent. However, we can take this a step further and show nice visualizations based on the participants' audio levels. If you want to give it a try now, make sure your local server is running, then launch the app:
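bash
# Terminal 1: in the openai-audio-tutorial folder
npm run server

# Terminal 2: in the ai_video_demo folder
flutter run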

Step 6 - Visualizing the audio levels

Let’s implement the AiSpeakingView next. This view will listen to the audio levels provided by the call state for each of its participants. It will then visualize them with a nice glowing animation that expands and contracts based on the user’s voice amplitude. Additionally, it subtly rotates and changes shape.

Step 6.1 - AiSpeakingView

We want to animate our AiSpeakingView based on the amplitude of the speaker's voice and over time. Add the following code to the _AiSpeakingViewState:

dart
import 'dart:async';

class _AiSpeakingViewState extends State<AiSpeakingView>
    with TickerProviderStateMixin {
  static const _agentId = "lucy";

  var _speakerState = AISpeakerState.idle;
  var _currentAmplitude = 0.0;

  late AnimationController _timeController;
  late AnimationController _amplitudeController;
  late StreamSubscription<CallState> _callStateSubscription;

  @override
  void initState() {
    super.initState();

    _timeController = AnimationController(
      duration: const Duration(seconds: 10),
      vsync: this,
    )..addListener(() {
        setState(() {
          // The state that has changed here is the animation object's value.
        });
      });
    _timeController.repeat();

    _amplitudeController = AnimationController(
      duration: const Duration(milliseconds: 300),
      vsync: this,
      lowerBound: 0.0,
      upperBound: _currentAmplitude,
    );

    _updateSpeakerState(widget.call.state.value);
    _listenToCallState();
  }

  @override
  void didUpdateWidget(covariant AiSpeakingView oldWidget) {
    super.didUpdateWidget(oldWidget);
    if (oldWidget.call != widget.call) {
      _callStateSubscription.cancel();
      _listenToCallState();
    }
  }

  @override
  void dispose() {
    _timeController.dispose();
    _amplitudeController.dispose();
    _callStateSubscription.cancel();
    super.dispose();
  }

First, notice that we’ve added the TickerProviderStateMixin to the widget’s State to drive the animations.

We created an AnimationController to loop the time between 0 and 1 every 10 seconds and an AnimationController to animate the amplitude. At the start, the amplitude is 0, so we don’t start the controller for the amplitude, but the _timeController is always repeating.

When the widget is updated with a new call, we reset the listeners for the call state, and in the dispose, we dispose of the _callStateSubscription and both AnimationControllers.

Let’s add the missing AISpeakerState enum and the _listenToCallState method:

dart
enum AISpeakerState { aiSpeaking, userSpeaking, idle }

class _AiSpeakingViewState extends State<AiSpeakingView>
    with TickerProviderStateMixin {
  (...)

  void _listenToCallState() {
    _callStateSubscription = widget.call.state.asStream().listen((callState) {
      _updateSpeakerState(callState);
    });
  }
}

Now we listen to updates from the call state and call _updateSpeakerState every time it changes. Add the methods needed to update the speaker state inside _AiSpeakingViewState:

dart
import 'dart:math' as math;

import 'package:collection/collection.dart';

void _updateSpeakerState(CallState callState) {
  final activeSpeakers = callState.activeSpeakers;
  final agent = activeSpeakers.firstWhereOrNull(
    (p) => p.userId.contains(_agentId),
  );
  final user = activeSpeakers.firstWhereOrNull(
    (p) => p.userId == callState.localParticipant?.userId,
  );

  List<double> audioLevels;
  if (agent != null && agent.isSpeaking) {
    _speakerState = AISpeakerState.aiSpeaking;
    audioLevels = agent.audioLevels
        .map((e) => e / (math.Random().nextInt(2) + 1))
        .toList();
  } else if (user != null && user.isSpeaking) {
    _speakerState = AISpeakerState.userSpeaking;
    audioLevels = user.audioLevels;
  } else {
    _speakerState = AISpeakerState.idle;
    audioLevels = [];
  }

  final amplitude = _computeSingleAmplitude(audioLevels);
  _updateAmplitudeAnimation(amplitude);
}

double _computeSingleAmplitude(List<double> audioLevels) {
  final normalized = _normalizePeak(audioLevels);
  if (normalized.isEmpty) return 0;

  final sum = normalized.reduce((value, element) => value + element);
  final average = sum / normalized.length;
  return average;
}

List<double> _normalizePeak(List<double> audioLevels) {
  final max = audioLevels.fold(
    0.0,
    (value, element) => math.max(value, element),
  );
  if (max == 0.0) return audioLevels;
  return audioLevels.map((e) => e / max).toList();
}

void _updateAmplitudeAnimation(double newAmplitude) {
  if (_currentAmplitude != newAmplitude) {
    var currentAnimationState = _amplitudeController.value;
    _amplitudeController.dispose();

    final reverse = currentAnimationState > newAmplitude;

    _amplitudeController = AnimationController(
      duration: const Duration(milliseconds: 500),
      vsync: this,
      lowerBound: reverse ? newAmplitude : currentAnimationState,
      upperBound: reverse ? currentAnimationState : newAmplitude,
    );

    _amplitudeController.addListener(() {
      setState(() {
        // The state that has changed here is the animation object's value.
      });
    });

    if (currentAnimationState != newAmplitude) {
      if (reverse) {
        _amplitudeController.reverse(from: currentAnimationState);
      } else {
        _amplitudeController.forward();
      }
    }
  }

  _currentAmplitude = newAmplitude;
}

We call our agent “lucy”, and we use that ID to tell the agent apart from the current user among the active speakers. The amplitude is calculated from the speaker’s last 10 audio levels: they are normalized against their peak and averaged to a value between 0 and 1 (for example, levels of 0.2, 0.4, and 0.8 normalize to 0.25, 0.5, and 1.0, which average to roughly 0.58). That amplitude is then animated smoothly with a duration of 500ms.

Now replace the Placeholder in the build method with a Stack of three GlowLayer widgets that use the animation values. Together, the three layers of different sizes form the AI glow.

dart
@override
Widget build(BuildContext context) {
  final size = Size(
    widget.boxConstraints.maxWidth,
    widget.boxConstraints.maxHeight,
  );
  final time = _timeController.value;
  final amplitude = _amplitudeController.value;

  return Stack(
    children: [
      GlowLayer(
        baseRadiusMax: 1,
        baseRadiusMin: 3 / 5,
        baseOpacity: 0.35,
        scaleRange: 0.3,
        waveRangeMin: 0.2,
        waveRangeMax: 0.02,
        amplitude: amplitude,
        time: time,
        size: size,
        speakerState: _speakerState,
      ),
      GlowLayer(
        baseRadiusMax: 3 / 5,
        baseRadiusMin: 2 / 5,
        baseOpacity: 0.35,
        scaleRange: 0.3,
        waveRangeMin: 0.15,
        waveRangeMax: 0.03,
        amplitude: amplitude,
        time: time,
        size: size,
        speakerState: _speakerState,
      ),
      GlowLayer(
        baseRadiusMax: 1 / 5,
        baseRadiusMin: 2 / 5,
        baseOpacity: 0.9,
        scaleRange: 0.5,
        waveRangeMin: 0.35,
        waveRangeMax: 0.05,
        amplitude: amplitude,
        time: time,
        size: size,
        speakerState: _speakerState,
      ),
    ],
  );
}

Step 6.2 - GlowLayer

Let’s implement the GlowLayer next. For each layer, we define minimum and maximum values for the radius, the base opacity, the scale range, and the wave range. Feel free to adjust these values to customize the animation. Add the GlowLayer and the new import to the ai_speaking_view.dart file.

ai_speaking_view.dart (dart)
import 'dart:ui';

class GlowLayer extends StatelessWidget {
  const GlowLayer({
    required this.speakerState,
    required this.baseRadiusMin,
    required this.baseRadiusMax,
    required this.baseOpacity,
    required this.scaleRange,
    required this.waveRangeMin,
    required this.waveRangeMax,
    required this.amplitude,
    required this.time,
    required this.size,
    super.key,
  });

  final AISpeakerState speakerState;
  final double baseRadiusMin;
  final double baseRadiusMax;
  final double baseOpacity;
  final double scaleRange;
  final double waveRangeMin;
  final double waveRangeMax;
  final double amplitude;
  final double time;
  final Size size;

  @override
  Widget build(BuildContext context) {
    // The actual radius = lerp from min->max based on amplitude
    final baseRadius = lerpDouble(baseRadiusMin, baseRadiusMax, amplitude)!;

    // The waveRange also "lerps," but we want big wave at low amplitude => waveRangeMin at amplitude=1
    // => just invert the parameter. Another approach:
    // waveRange = waveRangeMax + (waveRangeMin - waveRangeMax) * (1 - amplitude).
    final waveRange = lerpDouble(waveRangeMax, waveRangeMin, (1 - amplitude))!;

    final radius = baseRadius * math.min(size.width, size.height);

    // Subtle elliptical warping from sin/cos
    final shapeWaveSin = math.sin(2 * math.pi * time);
    final shapeWaveCos = math.cos(2 * math.pi * time);

    // scale from amplitude
    final amplitudeScale = 1.0 + scaleRange * amplitude;

    // final x/y scale => merges amplitude + wave
    final xScale = amplitudeScale + waveRange * shapeWaveSin;
    final yScale = amplitudeScale + waveRange * shapeWaveCos;

    return Center(
      child: Opacity(
        opacity: baseOpacity,
        child: Transform.scale(
          scaleY: yScale,
          scaleX: xScale,
          child: SizedBox(
            height: radius,
            width: radius,
            child: DecoratedBox(
              decoration: BoxDecoration(
                gradient: RadialGradient(
                  radius: 0.5,
                  colors: speakerState.gradientColors,
                  stops: <double>[0.0, 1.0],
                ),
              ),
            ),
          ),
        ),
      ),
    );
  }
}

extension on AISpeakerState {
  List<Color> get gradientColors => switch (this) {
        AISpeakerState.userSpeaking => [Colors.red, Colors.red.withAlpha(0)],
        _ => [
            Color.from(red: 0.0, green: 0.976, blue: 1.0, alpha: 1.0),
            Color.from(red: 0.0, green: 0.227, blue: 1.0, alpha: 0.0),
          ],
      };
}

We show a different color depending on who is speaking: a blue gradient while the AI is speaking, and a red one while the current user is speaking. When the amplitude is lower, the wave animation is stronger.

Now, you can run the app, talk to the AI, and see beautiful visualizations while the participants speak.

You can find the source code of the Node.js backend here, while the completed Flutter tutorial can be found on the following page.

Recap

In this tutorial, we have built an example of an app that lets you talk with an AI bot using OpenAI Realtime and Stream’s video edge infrastructure. The integration uses WebRTC for the best latency and quality even with poor connections.

We have shown you how to use OpenAI’s real-time API and provide the agent with custom instructions, voice, and function calls. On the Flutter side, we have shown you how to join the call and build an animation using the audio levels.

Both the video SDK for Flutter and the API have plenty more features available to support more advanced use cases.

Next Steps

Give us feedback!

Did you find this tutorial helpful in getting you up and running with your project? Either good or bad, we're looking for your honest feedback so we can improve.
