What Is a Moderation Policy?
A moderation policy is the foundation of Stream AI Moderation. It’s a set of rules that define what kinds of content your AI should detect and how the system should respond when harmful content is found.
Think of a policy as a playbook: it tells the AI which harms to look for (e.g., harassment, hate speech, spam) and what actions to take (e.g., flag for review, block automatically).
Why Policies Matter
Without policies, the AI has no guidance on what is considered acceptable or harmful in your community. By designing policies that reflect your platform’s standards, you ensure moderation decisions are aligned with your goals, your user base, and any compliance requirements.
Creating a Policy in the Dashboard
To create a new policy:
- Go to Moderation → Policies in the Stream Dashboard.
- Click Create Policy and give your policy a clear, descriptive name (e.g., “Chat Policy – Production”).
- Choose the scope of the policy:
- All Chat Channels: applies globally across every channel in your app.
- Specific Channel Types: applies to categories like messaging, livestream, or other channel types you’ve defined.
- Specific Channel IDs: applies only to individual channels you specify (e.g., chat:messaging:general).
- Add one or more rules that define what the AI should detect. Rules can be AI-powered, blocklist-based, or regex-based (we’ll cover these in detail later).
- Assign actions to each rule, such as flag, block, shadowblock, bounce and flag, bounce and block, or no action.
- Save and activate the policy.
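Policies are configured in the Dashboard rather than in code, but it can help to picture the fields a policy ties together. The sketch below is purely illustrative and is not Stream's API; every property name here is an assumption used only to summarize the steps above.

```ts
// Illustrative only -- policies are created in the Dashboard, not via this shape.
// All property names are assumptions that summarize the creation steps above.
type PolicyScope =
  | { kind: 'all_chat_channels' }                  // every channel in the app
  | { kind: 'channel_types'; types: string[] }     // e.g., ['messaging', 'livestream']
  | { kind: 'channel_ids'; ids: string[] };        // e.g., ['chat:messaging:general']

type RuleAction =
  | 'flag' | 'block' | 'shadowblock'
  | 'bounce_flag' | 'bounce_block' | 'no_action';

interface ModerationPolicySketch {
  name: string;        // clear, descriptive name, e.g. 'Chat Policy – Production'
  scope: PolicyScope;  // where the policy applies
  rules: Array<{
    kind: 'ai' | 'blocklist' | 'regex';  // covered in detail later
    action: RuleAction;                  // what happens when the rule matches
  }>;
  active: boolean;     // save and activate
}
```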
Testing and Validating a Policy
Before applying a policy to your production environment, always test it:
- Use your staging app to send sample messages.
- Review how the AI categorizes and acts on the content.
- Adjust severity thresholds or rules if you see too many false positives or misses.
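One way to exercise a staging policy is to send sample messages server-side with the stream-chat Node SDK, then review how each one was categorized in the Dashboard's moderation queue. A minimal sketch, assuming a staging API key and secret and a throwaway test channel; the sample texts and IDs are placeholders.

```ts
import { StreamChat } from 'stream-chat';

// Use your *staging* app's credentials, never production.
const client = StreamChat.getInstance('STAGING_API_KEY', 'STAGING_API_SECRET');

async function sendSamples() {
  await client.upsertUser({ id: 'moderation-test-user' });

  const channel = client.channel('messaging', 'moderation-test', {
    created_by_id: 'moderation-test-user',
  });
  await channel.create();

  // Sample messages chosen to probe specific rules in the policy.
  const samples = [
    'Totally benign message',               // expect: no action
    'Click here to claim your free prize',  // expect: spam/scam rule to fire
  ];

  for (const text of samples) {
    const res = await channel.sendMessage({ text, user_id: 'moderation-test-user' });
    // Inspect the returned message, then confirm the decision (flag, block, etc.)
    // in the Dashboard's moderation review queue.
    console.log(res.message.id, res.message.type);
  }
}

sendSamples().catch(console.error);
```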
Rule Builder
The Rule Builder goes beyond single-message moderation. Instead of only acting on one piece of content at a time, it tracks patterns of behavior over time and takes action when thresholds are crossed. This makes it especially powerful for spotting spam bursts, repeat offenders, or coordinated abuse.
Two Types of Rules
- User-type rules: Watch a user’s activity over a time window and trigger actions when violations stack up.
- Example: “3 hate speech messages in 24 hours → flag user.”
- Content-type rules: Act instantly on the current message or media.
- Example: “Message contains a link → block content.”
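Conceptually, the two rule types differ in what they evaluate: a user-type rule counts a user's violations inside a rolling time window, while a content-type rule looks only at the current message. A rough sketch of that distinction, not Stream's internal engine; the thresholds and labels come from the examples above.

```ts
// Conceptual sketch only -- not Stream's internal rule engine.
const DAY_MS = 24 * 60 * 60 * 1000;

// User-type rule: "3 hate speech messages in 24 hours → flag user".
function userRuleTriggered(violationTimestamps: number[], now = Date.now()): boolean {
  const inWindow = violationTimestamps.filter((t) => now - t <= DAY_MS);
  return inWindow.length >= 3; // threshold crossed → flag the user
}

// Content-type rule: "message contains a link → block content".
function contentRuleTriggered(text: string): boolean {
  return /https?:\/\/\S+/i.test(text); // acts on this message alone
}
```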
Conditions You Can Mix & Match
You can build rules with flexible AND/OR logic:
- Text harms (HATE_SPEECH, SCAM, THREAT, etc.)
- Image/video harms (Explicit, Violence, Drugs, Hate Symbols, etc.)
- Total content count (e.g., “≥50 messages in 1 hour”)
- User attributes (e.g., “account created in the last 24 hours”)
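The AND/OR combinations can be thought of as a small condition tree over those building blocks. The shape below is an illustration of that idea, not a Dashboard export or Stream schema; the condition names simply mirror the list above.

```ts
// Illustrative condition tree -- names are assumptions, not Stream's schema.
type Condition =
  | { type: 'text_harm'; label: 'HATE_SPEECH' | 'SCAM' | 'THREAT'; count: number; windowMinutes: number }
  | { type: 'media_harm'; label: 'Explicit' | 'Violence' | 'Drugs' | 'Hate Symbols' }
  | { type: 'content_count'; min: number; windowMinutes: number }
  | { type: 'user_attribute'; attribute: 'account_age_hours'; lessThan: number }
  | { op: 'AND' | 'OR'; children: Condition[] };

// "Account created in the last 24h AND (≥50 messages in 1 hour OR SCAM ≥ 5 in 1 hour)"
const example: Condition = {
  op: 'AND',
  children: [
    { type: 'user_attribute', attribute: 'account_age_hours', lessThan: 24 },
    {
      op: 'OR',
      children: [
        { type: 'content_count', min: 50, windowMinutes: 60 },
        { type: 'text_harm', label: 'SCAM', count: 5, windowMinutes: 60 },
      ],
    },
  ],
};
```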
How a Rule is Created
Every Rule Builder automation follows a simple control flow:
IF one or more conditions are met (combined with AND/OR):
- Examples of conditions you can mix:
- LLM text labels counted within a time window (e.g., HATE_SPEECH ≥ 3 in 24h)
- Image/video harms (e.g., Explicit, Violence)
- Content patterns (links, regex matches, semantic filters)
- User attributes or states (e.g., account_age < 24h, trust_score = low)
- Conditions can include counters and time windows as part of the IF (e.g., “within 60 minutes”).
THEN take an action:
- User-level: flag user, temporary ban, shadowban, permanent ban, IP ban
- Content-level: flag content, block content
COOLDOWN (optional) to prevent re-triggering too quickly:
- After the action fires, the rule won’t trigger again for the same subject (user/content) until the cooldown expires.
- Useful to avoid repeated bans/blocks in bursty scenarios or after a manual unban.
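Putting the three parts together, the control flow amounts to: evaluate the combined conditions, apply the action, and record a cooldown so the same subject is not re-actioned immediately. A simplified sketch of that flow, again conceptual rather than Stream's implementation.

```ts
// Conceptual IF / THEN / COOLDOWN flow -- not Stream's implementation.
const cooldownUntil = new Map<string, number>(); // subject (user or content id) → expiry timestamp

function evaluateRule(
  subjectId: string,
  conditionsMet: boolean,   // result of the combined AND/OR conditions
  applyAction: () => void,  // e.g., flag user, temporary ban, block content
  cooldownMs: number,
  now = Date.now(),
): void {
  // COOLDOWN: skip if this subject was actioned recently.
  if ((cooldownUntil.get(subjectId) ?? 0) > now) return;

  // IF the conditions are met, THEN take the action and start the cooldown.
  if (conditionsMet) {
    applyAction();
    cooldownUntil.set(subjectId, now + cooldownMs);
  }
}
```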
Concrete examples
- Spam burst control (user)
- IF (SCAM ≥ 5 in 60m) OR (messages_total ≥ 50 in 60m) THEN ban user for 1h COOLDOWN 24h
- New-account harassment (user)
- IF (account_age < 24h) AND (HARASSMENT ≥ 2 in 2h) THEN shadowban user for 24h COOLDOWN 24h
- Phishing link blocker (content)
- IF (message contains link) AND (regex matches phishing pattern) THEN block content COOLDOWN 10m (optional, content rules often don’t need one)
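The first example above ("spam burst control") maps onto the earlier sketches roughly as follows, reusing the `client` and `evaluateRule` helpers from the previous snippets. The counting logic is illustrative; the ban itself is shown through the stream-chat Node SDK's banUser call, whose timeout is expressed in minutes.

```ts
// "Spam burst control": IF (SCAM ≥ 5 in 60m) OR (messages_total ≥ 50 in 60m)
// THEN ban user for 1h, COOLDOWN 24h. Counting here is illustrative only.
const HOUR_MS = 60 * 60 * 1000;

async function spamBurstCheck(
  userId: string,
  scamTimestamps: number[],     // timestamps of this user's messages labeled SCAM
  messageTimestamps: number[],  // timestamps of all messages from this user
) {
  const now = Date.now();
  const scamInHour = scamTimestamps.filter((t) => now - t <= HOUR_MS).length;
  const totalInHour = messageTimestamps.filter((t) => now - t <= HOUR_MS).length;

  evaluateRule(
    userId,
    scamInHour >= 5 || totalInHour >= 50,
    () => client.banUser(userId, { timeout: 60, reason: 'spam burst' }), // 1h ban
    24 * HOUR_MS, // 24h cooldown
  );
}
```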
Best Practices
- Keep rules simple at first: start with spam bursts and repeat harassment.
- Tune thresholds gradually to avoid false positives.
- Use cooldowns to prevent repeated enforcement immediately after a manual unban.
- Always test in staging with sample messages before moving to production.
A moderation policy is your blueprint for safe interactions. By creating, testing, and iterating on policies, you set the foundation for consistent, scalable moderation across your platform.
Next, we’ll look at different types of moderation rules and how to configure them effectively.