# How to Add Function-Calling Bots with MCP

This how-to guide shows you how to add function-calling capabilities to AI bots in your Stream video calls using the Model Context Protocol (MCP).

```mermaid
flowchart LR
    User((User Audio)) -->|audio| Bot[Bot]
    Bot -->|PCM data| STT[STT]
    STT -->|transcript| LLM[LLM]
    LLM -->|MCP| Func[Python Function]
    Func -->|result| Out[Chat & TTS]
    Out -->|message & audio| User
```
You will create a small assistant that:
- Joins a Stream video call
- Listens to participants’ speech
- Converts speech → text with a Deepgram STT plugin
- Sends the transcript to an OpenAI LLM that may call MCP tools
- Executes the function and speaks the result back with ElevenLabs TTS
## Prerequisites
- A Stream account and Stream app: create a Stream app in the Stream Dashboard and copy your `STREAM_API_KEY` and `STREAM_API_SECRET`
- API keys for Deepgram, OpenAI, and ElevenLabs
- Python 3.10+
- The `uv` package manager (recommended; install command below)
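If you don't have `uv` yet, it can be installed via pip or via Astral's standalone installer:

```bash
# Either option works; pick one.
pip install uv
# or: curl -LsSf https://astral.sh/uv/install.sh | sh
```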
## Create a project folder
In your terminal, create a project folder:

```bash
mkdir mcp-bot && cd mcp-bot
```
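`uv add` records dependencies in a `pyproject.toml`, so if the folder is brand new, initialize a project first:

```bash
uv init
```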
## Add dependencies
```bash
uv add \
  "getstream[webrtc]>=2.3.0a4" \
  "getstream-plugins-deepgram>=0.1.1" \
  "getstream-plugins-elevenlabs>=0.1.0" \
  "fastmcp>=2.10.1" \
  "openai>=1.93.0" \
  "httpx>=0.28.1" \
  "python-dotenv>=1.1.1"
```

The quotes keep your shell from treating `>` as output redirection and `[...]` as a glob pattern.
## Set Environment Variables
Copy the template below to a file called `.env` and fill in your actual keys.
```bash
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret
EXAMPLE_BASE_URL=https://pronto.getstream.io
DEEPGRAM_API_KEY=your_deepgram_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key
OPENAI_API_KEY=your_openai_api_key
```
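Optionally, verify the file is picked up before going further. A minimal sanity check, assuming `.env` sits in the project root (the `check_env.py` name is just for illustration):

```python
# check_env.py -- optional sanity check, not part of the bot itself
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory

required = [
    "STREAM_API_KEY", "STREAM_API_SECRET",
    "DEEPGRAM_API_KEY", "ELEVENLABS_API_KEY", "OPENAI_API_KEY",
]
missing = [key for key in required if not os.getenv(key)]
print("Missing keys:", ", ".join(missing) if missing else "none")
```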
## Create an MCP Server (`server.py`)

Create a file called `server.py` and add the following:
```python
# server.py
from fastmcp import FastMCP
import httpx

mcp = FastMCP("Stream MCP Server")

# Define a tool that can be used by your MCP client
@mcp.tool()
def get_forecast(city: str) -> str:
    """Return today's weather for <city>."""
    data = httpx.get(f"https://wttr.in/{city}?format=%C+%t").text
    return data.strip()

if __name__ == "__main__":
    mcp.run()
```
`get_forecast` will be the tool called by your agent.
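You can sanity-check the server before wiring it into the bot. A minimal sketch using the same `fastmcp.Client` the bot will use later (the `test_tool.py` name is made up for illustration):

```python
# test_tool.py -- quick check that server.py exposes and runs the tool
import asyncio
import fastmcp

async def main():
    # Passing the script path spawns server.py as a subprocess over stdio.
    async with fastmcp.Client("server.py") as client:
        tools = await client.list_tools()
        print([t.name for t in tools])  # expect: ['get_forecast']
        result = await client.call_tool("get_forecast", {"city": "London"})
        print(result.data)  # e.g. "Partly cloudy +12°C"

asyncio.run(main())
```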
## Create an LLM helper (`agent.py`)

Create a file called `agent.py`. This will contain the logic for the MCP client and define how the LLM works with your tool.
```python
# agent.py
import json
import logging
import os
import re

import fastmcp
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]
logging.basicConfig(level=logging.INFO)

# The system prompt that will be passed to the LLM
SYSTEM = """
You are a helpful assistant. When the user needs real-world weather data, call the get_forecast tool.
If you call a tool, use tool_name["arg":"value"] on a single line.
You can chain up to 3 tool calls.
"""


def _chat(messages):
    resp = openai.chat.completions.create(model="gpt-4o", messages=messages)
    return resp.choices[0].message.content


async def chat_with_tools(prompt: str, client: fastmcp.Client) -> str:
    tools = await client.list_tools()
    tool_names = {t.name for t in tools}
    history = [
        {"role": "system", "content": SYSTEM},
        {"role": "user", "content": prompt},
    ]
    # Allow up to 3 chained tool calls, matching the system prompt.
    for _ in range(3):
        reply = _chat(history)
        # A tool call looks like: get_forecast["city":"London"]
        m = re.match(r"^(\w+)\[(.*)\]$", reply.strip())
        if not (m and m.group(1) in tool_names):
            return reply  # plain answer, no (valid) tool call
        name, args_json = m.groups()
        args = json.loads(f"{{{args_json}}}")  # "city":"London" -> {"city": "London"}
        result = await client.call_tool(name, args)
        # Record the tool call and feed the result back to the model.
        history.append({"role": "assistant", "content": reply})
        history.append({"role": "user", "content": f"Tool {name} returned: {result.data}"})
    # Out of tool-call budget: ask the model for a final answer.
    return _chat(history)
```
When your bot sends the audio transcript of your speech to the LLM, this tool-calling loop runs as well, giving the LLM the option of calling the tool you defined above. You can also exercise the loop on its own, as sketched below.
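A minimal sketch, assuming `server.py` and `agent.py` sit in the same folder (the `try_agent.py` name is hypothetical):

```python
# try_agent.py -- exercise the tool-calling loop without a video call
import asyncio
import fastmcp
from agent import chat_with_tools

async def main():
    async with fastmcp.Client("server.py") as client:
        print(await chat_with_tools("What's the weather in Paris?", client))

asyncio.run(main())
```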
## Write the Bot Logic (`main.py`)

Now we're ready for the main bot. Create a file called `main.py` and add the following sections.
### Import Relevant Objects
```python
# main.py
import os, uuid, asyncio, logging, webbrowser
from urllib.parse import urlencode

import fastmcp
from dotenv import load_dotenv
from getstream.stream import Stream
from getstream.models import UserRequest
from getstream.video import rtc
from getstream.video.rtc import audio_track
from getstream.video.rtc.track_util import PcmData
from getstream.plugins.deepgram.stt import DeepgramSTT
from getstream.plugins.elevenlabs.tts import ElevenLabsTTS

from agent import chat_with_tools
```
### Set Logger and Create a Client
```python
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
load_dotenv()

CLIENT = Stream.from_env()
```
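`Stream.from_env()` reads `STREAM_API_KEY` and `STREAM_API_SECRET` from the environment, which is why `load_dotenv()` runs first. If you prefer passing credentials explicitly, something like the following should also work (constructor parameter names assumed; check the SDK if in doubt):

```python
# Alternative: construct the client explicitly instead of reading the environment.
CLIENT = Stream(
    api_key=os.environ["STREAM_API_KEY"],       # assumed parameter name
    api_secret=os.environ["STREAM_API_SECRET"],  # assumed parameter name
)
```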
### Define Utility Methods

Two small helpers: one registers a user with Stream, the other opens the example call UI in your browser so you can join as the human participant.
```python
def _create_user(user_id: str, name: str):
    CLIENT.upsert_users(UserRequest(id=user_id, name=name))


def _open_browser(call_id: str, token: str):
    # Open the hosted example UI so you can join the call as the human user.
    base = f"{os.getenv('EXAMPLE_BASE_URL')}/join/"
    params = {"api_key": CLIENT.api_key, "token": token, "skip_lobby": "true"}
    url = f"{base}{call_id}?{urlencode(params)}"
    webbrowser.open(url)
```
### Create a Function to Run the Bot

`_run_bot` wires the pipeline together: incoming call audio goes to Deepgram, final transcripts go through `chat_with_tools` (which may call your MCP tool), and the LLM's answer is spoken back through the ElevenLabs output track.
```python
async def _run_bot(call, bot_id):
    stt = DeepgramSTT()
    tts = ElevenLabsTTS()
    track = audio_track.AudioStreamTrack(framerate=16000)
    tts.set_output_track(track)

    async with await rtc.join(call, bot_id) as conn:
        await conn.add_tracks(audio=track)

        # Start the MCP client (spawns server.py over stdio) for the call's lifetime.
        async with fastmcp.Client("server.py") as mcp_client:

            @conn.on("audio")
            async def on_audio(pcm: PcmData, user):
                # Forward raw PCM audio from participants to Deepgram.
                await stt.process_audio(pcm, user)

            @stt.on("transcript")
            async def on_transcript(text, user, meta):
                # Only react to final transcripts, not interim results.
                if not meta.get("is_final", False):
                    return
                answer = await chat_with_tools(text, mcp_client)
                if answer:
                    await tts.send(answer)

            await conn.wait()
```
### Create the Main Loop
```python
async def main():
    call_id = str(uuid.uuid4())
    human_id = f"user-{uuid.uuid4()}"
    bot_id = f"mcp-bot-{uuid.uuid4()}"

    _create_user(human_id, "Human")
    _create_user(bot_id, "MCP Bot")

    token = CLIENT.create_token(human_id, expiration=3600)
    call = CLIENT.video.call("default", call_id)
    call.get_or_create(data={"created_by_id": bot_id})

    _open_browser(call_id, token)
    try:
        await _run_bot(call, bot_id)
    finally:
        # Clean up the temporary users when the bot exits.
        CLIENT.delete_users([human_id, bot_id])


if __name__ == "__main__":
    asyncio.run(main())
```
## Run Everything
Now we're ready to run the code:

```bash
uv run main.py  # starts the bot and opens the browser
```
You can now speak in the call. Ask the bot about the weather in a city, e.g. "What's the weather like in London?" The bot calls `get_forecast`, gets the weather, and replies aloud. When you stop the bot, the `finally` block deletes the temporary users.
## Next Steps

You have now built an MCP-enabled assistant bot that runs entirely on Stream's Python SDK. You can:

- Add more `@mcp.tool()` functions in `server.py` (a sketch follows below)
- Swap providers: any STT/TTS plugin will work!
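For example, a second tool is just another decorated function in `server.py`. A minimal sketch (the `get_time` tool is made up for illustration):

```python
# server.py (addition) -- a hypothetical second tool
from datetime import datetime, timedelta, timezone

@mcp.tool()
def get_time(utc_offset_hours: int = 0) -> str:
    """Return the current time at the given UTC offset."""
    now = datetime.now(timezone.utc) + timedelta(hours=utc_offset_hours)
    return now.strftime("%H:%M")
```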