How Auto-Moderation Works
Stream’s auto-moderation system provides automated content filtering and moderation that help you maintain a safe and positive environment for your users. It analyzes content as it is submitted and takes predefined actions based on your moderation policies.
When you set up auto-moderation, you create moderation policies that define:
- What kind of content should be moderated based on your platform’s needs and community guidelines.
- What action to take when that content is detected (see the example after this list):
  - Flag: Marks the content for review but allows it to be posted.
  - Block: Prevents the content from being posted entirely.
  - Shadow Block: Makes the content appear as posted to the sender but hides it from other users.
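For a concrete, minimal example of a policy, the sketch below pairs a custom blocklist (the content to catch) with a behavior (the action to take). It assumes the stream-chat Node SDK; the credentials, list name, and words are placeholders, and the exact options should be verified against the current SDK reference.

```ts
import { StreamChat } from 'stream-chat';

// Server-side client (placeholder credentials).
const client = new StreamChat('API_KEY', 'API_SECRET');

async function configureBlocklistPolicy() {
  // 1. What to moderate: a custom list of prohibited words.
  await client.createBlockList({
    name: 'custom-profanity',        // hypothetical list name
    words: ['badword1', 'badword2'], // placeholder entries
  });

  // 2. What action to take when the list matches a message:
  //    'flag' sends it for review, 'block' rejects it outright.
  await client.updateChannelType('messaging', {
    blocklist: 'custom-profanity',
    blocklist_behavior: 'block',
  });
}

configureBlocklistPolicy().catch(console.error);
```

Swapping 'block' for 'flag' would mark matching messages for review instead of rejecting them.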
The auto-moderation system processes content in real time through multiple moderation engines, and their individual results are combined into a single moderation decision (a simplified sketch follows this list):
- AI Harm Detection Engine: Analyzes text for 40+ types of harmful content across 30+ languages, such as hate speech, self-harm, and sexual content.
- Semantic Filters: Detect content similar to predefined harmful phrases.
- Blocklists: Check against custom lists of prohibited words or patterns.
- Image Recognition: Analyzes images for inappropriate content, such as nudity and weapons.
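To make the decision step concrete, the sketch below uses entirely hypothetical TypeScript types (not Stream’s SDK) to show how per-engine results could be combined into a single moderation decision, with the most severe recommendation winning:

```ts
// Hypothetical shapes, not Stream's SDK types.
type Action = 'allow' | 'flag' | 'shadow_block' | 'block';

interface EngineResult {
  engine: string;   // e.g. 'ai_harm', 'semantic_filters', 'blocklist', 'image'
  action: Action;   // the action this engine recommends
  reason?: string;  // e.g. the matched category or phrase
}

// Severity ranking: a single 'block' outweighs any number of 'allow's.
const severity: Record<Action, number> = {
  allow: 0,
  flag: 1,
  shadow_block: 2,
  block: 3,
};

// Reduce all engine results to the single most severe recommendation.
function decide(results: EngineResult[]): EngineResult {
  const start: EngineResult = { engine: 'none', action: 'allow' };
  return results.reduce(
    (worst, current) =>
      severity[current.action] > severity[worst.action] ? current : worst,
    start,
  );
}

// Example: the blocklist passes the message, but the AI engine flags it.
const decision = decide([
  { engine: 'blocklist', action: 'allow' },
  { engine: 'ai_harm', action: 'flag', reason: 'hate_speech' },
]);
console.log(decision); // { engine: 'ai_harm', action: 'flag', reason: 'hate_speech' }
```

In Stream’s pipeline this decision is made server-side, and flagged or blocked messages are surfaced in the moderation dashboard, as the diagram below shows.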
The following diagram shows how Stream’s auto-moderation feature works for the chat product:
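```mermaid
graph TD
    subgraph B[Moderation Engines]
        C1[AI Text Harm Detection Engine]
        C2[AI Semantic Filters]
        C3[Blocklists]
        C4[AI Image Recognition]
    end
    A[User Sends Message] --> B
    B --> D[Moderation Decision]
    D --> E1[Allow Message]
    D --> E2[Flag for Review]
    D --> E3[Block Message]
    E2 --> F[Moderation Dashboard]
    E3 --> F
```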