Add Audio Moderation

This how-to guide shows you how to build real-time audio moderation into a Stream Video call using the Stream Python AI SDK and the Stream Moderation API.

For a complete runnable example, see the GitHub sample.

Need an overview first? See the Moderating Video Calls explanation.

What You Will Build

A small “moderation bot” that:

Joins a call as a participant
Listens to raw PCM audio from every speaker
Transcribes speech with a Speech-to-Text (STT) plugin (in our case, Deepgram)
Sends each utterance to the Moderation API
Outputs the moderation verdict

Note: we’re using Deepgram as our speech-to-text provider here, but you can use any provider you like.

The audio moderation workflow as a diagram

Set Up Your Environment

Install the SDK and STT Plugin

Note: we’re using uv for package/dependency management, but you can use another tool if you prefer.

uv add "getstream>=2.3.0a4" "getstream-plugins-deepgram>=0.1.1"

Create a .env file in your project root:

# Stream credentials
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret
EXAMPLE_BASE_URL=https://pronto.getstream.io

# STT provider (example for Deepgram)
DEEPGRAM_API_KEY=your_deepgram_api_key
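A missing credential tends to surface later as a confusing authentication error, so it can help to check the environment up front. The sketch below is one way to do that. The helper name missing_env_vars is hypothetical, and the variable names are the ones from the .env file above.

```python
import os

# Variable names taken from the .env file above.
REQUIRED_VARS = (
    "STREAM_API_KEY",
    "STREAM_API_SECRET",
    "EXAMPLE_BASE_URL",
    "DEEPGRAM_API_KEY",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Call it once the .env file has been loaded, and exit with a clear message if the returned list is not empty.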

Initialise the Core Clients

Create a file called main.py and add the following content:

import logging
import os
import uuid

from dotenv import load_dotenv
from getstream.stream import Stream
from getstream.video import rtc
from getstream.plugins.deepgram import DeepgramSTT

load_dotenv()
logging.basicConfig(level=logging.INFO)  # make the logging.info calls below visible

client = Stream.from_env()   # reads STREAM_API_KEY and STREAM_API_SECRET from the environment
stt = DeepgramSTT()          # reads DEEPGRAM_API_KEY from the environment

Create Users and Call

call_id = str(uuid.uuid4())
print(f"📞 Call ID: {call_id}")

user_id = f"user-{uuid.uuid4()}"
create_user(client, user_id, "My User")
logging.info("👤 Created user: %s", user_id)

user_token = client.create_token(user_id, expiration=3600)
logging.info("🔑 Created token for user: %s", user_id)

bot_user_id = f"moderation-bot-{uuid.uuid4()}"
create_user(client, bot_user_id, "Moderation Bot")
logging.info("🤖 Created bot user: %s", bot_user_id)

# Create the call
call = client.video.call("default", call_id)
call.get_or_create(data={"created_by_id": bot_user_id})
print(f"📞 Call created: {call_id}")
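The snippet above calls a create_user helper that this guide does not define. A minimal sketch follows, assuming the SDK provides a UserRequest model in getstream.models and an upsert_users method on the client. Neither is confirmed by this guide, so check the GitHub sample for the exact calls.

```python
def create_user(client, user_id, name):
    """Create or update a Stream user (sketch; the API names are assumptions)."""
    # Assumption: getstream.models provides UserRequest. It is imported inside
    # the function to keep the sketch self-contained.
    from getstream.models import UserRequest

    # Assumption: the client exposes upsert_users(), which accepts UserRequest objects.
    client.upsert_users(UserRequest(id=user_id, name=name))
```

Define it before the first create_user call in main.py.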

Create a Moderation Policy Configuration

Create a Moderation policy config with the key custom:python-ai-test. The policy below will flag text deemed insulting.

client = Stream.from_env()
client.moderation.upsert_config(
    key="custom:python-ai-test",
    ai_text_config={
        "rules": [{"label": "INSULT", "action": "flag"}],
    },
)

Alternatively, create the moderation policy config in the Stream dashboard: create an app, then go to Moderation > Policies > Add New.

The audio moderation menu in the Stream dashboard.

The audio moderation policy page in the Stream dashboard.

Open a Web Browser for Testing

import webbrowser
from urllib.parse import urlencode

base_url = f"{os.getenv('EXAMPLE_BASE_URL')}/join/"
params = {"api_key": os.getenv("STREAM_API_KEY"), "token": user_token, "skip_lobby": "true"}

url = f"{base_url}{call_id}?{urlencode(params)}"
print(f"Opening browser to: {url}")

webbrowser.open(url)

This will open your browser and authenticate you as the user you created earlier.
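Wrapping the URL construction in a function makes it easier to test and reuse. Here is a minimal sketch, assuming the same /join/ path and query parameters as the snippet above. The name build_join_url is hypothetical and not part of the SDK.

```python
from urllib.parse import urlencode


def build_join_url(base_url, call_id, api_key, token, skip_lobby=True):
    """Build the browser URL used to join a call (helper name is hypothetical)."""
    params = {
        "api_key": api_key,
        "token": token,
        "skip_lobby": "true" if skip_lobby else "false",
    }
    # rstrip avoids a double slash when base_url already ends with "/".
    return f"{base_url.rstrip('/')}/join/{call_id}?{urlencode(params)}"
```

urlencode percent-encodes any reserved characters in the values, so the token and API key arrive intact in the query string.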

Add the Moderation Bot and Process Audio

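Once the moderation bot joins the call, it listens for audio events emitted by the Python AI SDK. Each audio event carries PCM data, which the bot passes to the STT plugin; the plugin then emits a transcript event with the recognised text. Because async with and await are only valid inside a coroutine, place the code in this section and the next inside an async function (for example, async def main()) and run it with asyncio.run().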

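import asyncio
import time
import uuid

from getstream.models import ModerationPayload
from getstream.video.rtc.track_util import PcmData  # assumed import path; confirm against the GitHub sample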

async with rtc.join(call, bot_user_id) as connection:
    @connection.on("audio")
    async def on_audio(pcm: PcmData, user):
        # Process audio through STT
        await stt.process_audio(pcm, user)

Add the Moderation Check

Add this event handler inside the same async with block. It runs whenever the STT plugin emits a transcript event.

    @stt.on("transcript")
    async def on_transcript(text: str, user, metadata: dict):
        timestamp = time.strftime("%H:%M:%S")
        user_info = user.name if user and hasattr(user, "name") else "unknown"
        print(f"[{timestamp}] {user_info}: {text}")
        if metadata.get("confidence"):
            print(f"    └─ confidence: {metadata['confidence']:.2%}")

        # Moderation check (executed in a background thread to avoid blocking)
        moderation = await asyncio.to_thread(moderate, client, text, user_info)
        print(
            f"    └─ moderation recommended action: {moderation.recommended_action} for transcript: {text}"
        )

    # Keep the connection alive and wait for audio
    await connection.wait()
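The handler above calls a moderate helper that this guide does not define. One possible shape is sketched below, assuming the SDK's moderation client exposes a check method that accepts these keyword arguments and returns a response whose data attribute carries recommended_action. Those names, the "transcript" entity type, and the use of the speaker name as the creator ID are assumptions; the GitHub sample has the authoritative version.

```python
import uuid


def moderate(client, text, user_name, config_key="custom:python-ai-test"):
    """Send one utterance to the Moderation API and return the verdict (sketch)."""
    # ModerationPayload comes from getstream.models, as imported earlier in this
    # guide; it is imported here as well so the function is self-contained.
    from getstream.models import ModerationPayload

    # Assumption: client.moderation.check() accepts these keyword arguments.
    response = client.moderation.check(
        entity_type="transcript",      # hypothetical label for this kind of content
        entity_id=str(uuid.uuid4()),   # unique ID per utterance
        entity_creator_id=user_name,   # the speaker
        moderation_payload=ModerationPayload(texts=[text]),
        config_key=config_key,         # the policy config created earlier
    )
    # Assumption: the verdict, including recommended_action, is on .data.
    return response.data
```

Because the call is a blocking network request, the handler runs it through asyncio.to_thread so the event loop keeps processing audio while the check is in flight.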

That’s it! Your bot now moderates speech in real time.

Run the Code

Run the code with:

uv run main.py

The script opens a web browser so you can join the call, and you will see the bot join as a participant. Speak into your microphone and try both friendly and insulting sentences to test the moderation policy.

You should see output like the following in your terminal:

An innocent sentence which is reviewed as safe:

Terminal output of a "non-offensive" sentence, the Moderation API recommends keeping the text and taking no action

An insulting sentence which is reviewed as potentially infringing the moderation policy:

Terminal output of an "insult" sentence, the Moderation API recommends flagging the text for review

Review the Content

When the Moderation API recommends flagging a sentence, that sentence also appears in the review queue. You can access the queue via the Python SDK or the Dashboard.

Access it via the SDK with:

from getstream.models import QueryReviewQueueResponse, ReviewQueueItemResponse

response: QueryReviewQueueResponse = client.moderation.query_review_queue().data
item: ReviewQueueItemResponse

for item in response.items:
    print("--------------------------------")
    print(f"Flagged audio transcript with ID: {item.id}")
    print(f"Recommended action: {item.recommended_action}")
    if item.recommended_action != "keep":
        print(f"Transcript: {item.moderation_payload.texts[0]}")
    print(f"Created at: {item.created_at.isoformat()}")

Terminal output when querying the review queue, containing references to utterances by the speaker

You will then see the items the Moderation API has sent to the review queue.
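When the queue grows, a per-action tally is often more useful than printing every item. The sketch below counts items by recommended_action. It relies only on that attribute, which the loop above already reads, and count_by_action is a hypothetical helper name.

```python
from collections import Counter


def count_by_action(items):
    """Count review-queue items per recommended action."""
    return Counter(item.recommended_action for item in items)
```

For example, count_by_action(response.items) returns a Counter that you can print or log.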

You can also review this content in the Dashboard, where you can take further actions such as banning or deleting users.

Screenshot from the Stream dashboard when querying the review queue, containing references to utterances by the speaker

Next Steps

You can find a full, working example on GitHub.
There’s much more you can do with the Moderation API! Check out the Moderation API documentation.
