Why Actions Matter
Detection is only half of moderation. Once the AI identifies harmful content, the system must decide what to do next. Actions are the bridge between classification (what harm was found) and enforcement (how your platform responds).
Stream provides several moderation actions you can configure at the policy level. Each action carries different implications for user experience, moderator workload, and community safety.
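To make the policy-to-action relationship concrete, here is a minimal sketch of what a mapping from harm category to action could look like in application code. The `ModerationAction` type and `policy` object are illustrative assumptions, not Stream's SDK or dashboard configuration format.

```typescript
// Illustrative only: these names are assumptions, not Stream's API types.
type ModerationAction =
  | "flag"          // send to review queue, no user-facing impact
  | "block"         // remove immediately, route to queue
  | "shadowblock"   // visible to the sender only
  | "bounce_flag"   // ask sender to revise; flag if resent unchanged
  | "bounce_block"  // ask sender to revise; block if resent unchanged
  | "mask_flag"     // hide matched terms, flag for review
  | "no_action";    // let through, not queued

// One possible shape for a policy: harm label -> action.
const policy: Record<string, ModerationAction> = {
  harassment: "flag",
  hate_speech: "block",
  spam: "shadowblock",
  profanity: "bounce_flag",
  self_harm: "flag",
};
```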
Flag
What It Does: Flagged content is marked as potentially harmful and sent to the moderation queue for human review. The message still exists in the app and is delivered to the channel as usual, so this action has no impact on the user experience.
When to Use:
- For harms that require human judgment.
- When you want to minimize false positives.
Best Practice: Start with Flag as your default for ambiguous harms. It gives moderators control while you gather data on how well your prompts and thresholds perform.
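Because every flag ends in a human decision, the review queue doubles as feedback on your prompts and thresholds. A rough sketch of computing flag precision from moderator outcomes; the `ReviewedItem` shape is hypothetical, not a Stream data model.

```typescript
// Hypothetical review-queue record: what was flagged and what a moderator decided.
interface ReviewedItem {
  harmLabel: string;
  moderatorUpheld: boolean; // true if the moderator agreed the content was harmful
}

// Share of flags that moderators upheld; a low value suggests the prompt
// or threshold for that harm is too aggressive.
function flagPrecision(items: ReviewedItem[]): number {
  if (items.length === 0) return NaN;
  const upheld = items.filter((i) => i.moderatorUpheld).length;
  return upheld / items.length;
}

// Example: 2 of 3 harassment flags upheld -> precision ≈ 0.67.
const harassmentFlags: ReviewedItem[] = [
  { harmLabel: "harassment", moderatorUpheld: true },
  { harmLabel: "harassment", moderatorUpheld: false },
  { harmLabel: "harassment", moderatorUpheld: true },
];
console.log(flagPrecision(harassmentFlags).toFixed(2));
```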
Block
What It Does: Blocked content is immediately removed from the platform and routed to the moderation queue. Users cannot see their own blocked content, and other participants never see it.
When to Use:
- For high-severity harms with zero tolerance.
- To protect your community from immediate harm.
Best Practice: Use Block for categories where community safety takes priority over leniency. Start with critical harms, then expand cautiously as you refine thresholds and prompts.
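On the client, a blocked send usually deserves some user-facing fallback, though whether and how to notify the sender is a product decision. The sketch below assumes the send call reports a moderation outcome; the `SendResult` shape and field names are assumptions, not Stream's response format.

```typescript
// Assumed response shape; check your SDK's actual fields before relying on these names.
interface SendResult {
  ok: boolean;
  moderationOutcome?: "blocked" | "flagged" | "shadowblocked";
}

function blockedSendNotice(result: SendResult): string {
  if (result.moderationOutcome === "blocked") {
    // The message was removed before anyone saw it; one option is to tell the sender plainly.
    return "Your message couldn't be posted because it violates our community guidelines.";
  }
  // Flagged and shadowblocked messages need no special handling for the sender.
  return "";
}
```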
Shadowblock
What It Does: Shadowblocked content remains visible to the sender but hidden from everyone else. It also appears in the moderation queue for review.
When to Use:
- For communities where bad actors are likely to test moderation boundaries.
- To reduce adversarial behavior, since the user may not realize their content has been hidden.
Best Practice: Shadowblock is highly effective against spam and trolling. However, be cautious with sensitive harms like self-harm, where users expect a real audience response.
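Shadowblocking comes down to how messages are rendered for each viewer. A minimal sketch, assuming each message carries a `shadowed` flag and a sender id; the field names are illustrative and may differ from your SDK's.

```typescript
// Illustrative message shape; your SDK's field names may differ.
interface ChatMessage {
  id: string;
  senderId: string;
  text: string;
  shadowed: boolean;
}

// Shadowed messages stay visible to their author but are filtered out for everyone else.
function visibleMessages(messages: ChatMessage[], viewerId: string): ChatMessage[] {
  return messages.filter((m) => !m.shadowed || m.senderId === viewerId);
}
```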
Bounce and Flag
What It Does: When a message violates a policy, it is bounced back to the sender with a prompt to correct it. If the user revises the message and resends it, the corrected version passes through normally. If they attempt to resend the original without changes, it is flagged in the queue for moderator review.
When to Use:
- For educational enforcement: reminding users of community standards without blocking conversations outright.
- For borderline harms like profanity or mild policy violations.
Best Practice: Use Bounce and Flag when you want to nudge users to self-correct while still capturing uncorrected violations for moderator oversight.
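On the client, a bounce is a prompt to edit rather than a rejection. Here is a sketch of that flow, assuming the send call reports a `bounced` outcome; the shapes and function names are hypothetical.

```typescript
// Hypothetical shapes for illustrating the bounce flow.
interface BounceResult {
  bounced: boolean;
  reason?: string; // e.g. "profanity"
}

type SendFn = (text: string) => Promise<BounceResult>;
type PromptFn = (originalText: string, reason: string) => Promise<string | null>;

// Ask the user to revise a bounced message. If they resend the original unchanged,
// the platform flags it for moderator review (Bounce and Flag behavior).
async function sendWithBounceHandling(
  text: string,
  send: SendFn,
  promptForRevision: PromptFn
): Promise<void> {
  const result = await send(text);
  if (!result.bounced) return;

  const revised = await promptForRevision(text, result.reason ?? "policy violation");
  if (revised !== null) {
    await send(revised); // a corrected message passes through normally
  }
}
```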
Bounce and Block
What It Does: As with Bounce and Flag, the message is first returned to the sender with a prompt to revise it. If the user corrects it, the message goes through as normal. But if they try to resend the original without changes, it is blocked entirely and appears in the moderation queue as blocked content.
When to Use:
- For categories where violations are unacceptable, but you still want to give users a chance to self-correct.
- Useful for spam, slurs, or repeated low-effort violations.
Best Practice: Use Bounce and Block for clear no-go content that users may accidentally include but should never be allowed to publish uncorrected.
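The only difference between the two bounce variants is what happens when the sender resubmits the original text unchanged. A small illustrative decision function (not Stream's implementation) makes the contrast explicit:

```typescript
type BounceVariant = "bounce_flag" | "bounce_block";
type ResendOutcome = "pass" | "flag" | "block";

// A revised message passes (subject to normal moderation checks); an unchanged
// resend is flagged or blocked depending on the configured variant.
function onResendAfterBounce(
  originalText: string,
  resentText: string,
  variant: BounceVariant
): ResendOutcome {
  if (resentText.trim() !== originalText.trim()) return "pass";
  return variant === "bounce_block" ? "block" : "flag";
}
```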
No Action
What It Does: Content passes through without moderation. It does not appear in the moderation queue.
When to Use:
- Only for the harms you intentionally want to ignore.
- Sometimes used temporarily for testing categories without enforcement.
Custom Severity
What It Does: Instead of applying a fixed action, you can map harms to a custom severity level (Low, Medium, High, Critical). Severity determines how content is prioritized in the queue and which thresholds trigger auto-blocking.
When to Use:
- For nuanced harms where some cases are mild and others critical (e.g., sexual content, violence).
- To route high-severity harms to senior moderators while letting junior staff handle low-risk cases.
Best Practice: Define severity mappings in advance so your team has clear triage guidelines; a code sketch of this mapping follows the list below. Example:
- Low = Flag for review
- Medium = Bounce and Flag
- High = Shadowblock
- Critical = Block
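A minimal sketch of that mapping in code, so triage rules live in one place; the `Severity` and `Action` names are illustrative, not Stream's configuration types.

```typescript
type Severity = "low" | "medium" | "high" | "critical";
type Action = "flag" | "bounce_flag" | "shadowblock" | "block";

// Example triage mapping from the list above; adjust per harm category as needed.
const severityToAction: Record<Severity, Action> = {
  low: "flag",
  medium: "bounce_flag",
  high: "shadowblock",
  critical: "block",
};

// Severity can also drive queue ordering so critical items surface first.
const queuePriority: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };
```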
Examples by Harm Type:
Harassment / Bullying
- Low: “You’re kind of annoying lol.” (mild insult)
- Medium: “Nobody likes you, just quit already.” (targeted ridicule)
- High: “Everyone should go report this loser until they leave.” (coordinated harassment)
- Critical: “I’m going to find you and beat you up.” (direct violent threat)
Violence
- Low: “That game was so good, it killed me.” (casual figurative expression)
- Medium: “I want to punch that guy in the face.” (non-graphic intent to harm)
- High: “Here’s a video of the fight with blood everywhere.” (graphic violence)
- Critical: “We should bomb that place tomorrow.” (terrorism / imminent violent threat)
Spam / Scams
- Low: “Check out my YouTube channel!” (harmless promotion)
- Medium: “Buy cheap followers here: www.spamlink.com.” (repeated spam links)
- High: “Congratulations, you won an iPhone! Enter your credit card here.” (fraud/phishing)
- Critical: Mass messages across multiple channels with scam links to steal identities. (coordinated scam campaign)
Mask and Flag
What It Does: Masking automatically hides the offending portion of a message (e.g., replacing a profanity with ****), while keeping the rest of the content visible. This action is only available for Blocklists & Regex Filters.
When to Use:
- For communities where you want to preserve most of the conversation, but filter offensive language.
- For profanity, spam links, or sensitive but non-critical terms.
Best Practice: Masking works best as a middle-ground enforcement: it keeps conversations flowing while signaling that content has been moderated.
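For intuition, here is a small sketch of how term masking works in principle. This is not Stream's implementation; the platform applies masking server-side based on your blocklists and regex filters.

```typescript
// Replace each blocklisted term with asterisks of the same length,
// keeping the rest of the message intact.
function maskTerms(text: string, blockedTerms: string[]): string {
  return blockedTerms.reduce((masked, term) => {
    const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    const pattern = new RegExp(`\\b${escaped}\\b`, "gi");
    return masked.replace(pattern, (match) => "*".repeat(match.length));
  }, text);
}

// "that was a damn good game" -> "that was a **** good game"
console.log(maskTerms("that was a damn good game", ["damn"]));
```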
Next, we’ll cover managing confidence scores and severity levels, fine-tuning the thresholds that determine when content is flagged or blocked, and balancing automation with human review.